CN116471173A - Network fault troubleshooting method and device and terminal equipment - Google Patents

Publication number
CN116471173A
Authority
CN
China
Prior art keywords
network
fault
network device
determining
network equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310450024.6A
Other languages
Chinese (zh)
Inventor
姚成龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202310450024.6A priority Critical patent/CN116471173A/en
Publication of CN116471173A publication Critical patent/CN116471173A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L41/0677 Localisation of faults

Abstract

The application provides a network fault troubleshooting method and device. The method comprises: when a network fault troubleshooting condition is met, determining an access path to be checked in a network, the access path comprising a plurality of first network devices for forwarding messages; sending a detection instruction to each first network device, instructing each first network device to send detection messages to the next-stage first network device; obtaining log information reported by each first network device, the log information containing a history record of the detection messages sent and/or received by that device; and determining the faulty network device according to the log information reported by each first network device. In this way, different access paths to be checked are determined in a targeted manner according to different troubleshooting conditions, and the first network devices themselves initiate the detection messages used to locate the fault point. This reduces the cost and security risk of large-scale network fault troubleshooting while improving its accuracy and efficiency.

Description

Network fault troubleshooting method and device and terminal equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a network fault troubleshooting method, device, and terminal device.
Background
A server in a data center may exhibit anomalies such as slow service response or access failure while running, and these anomalies may be caused by a fault in the network on which the server depends. In that case, network fault troubleshooting needs to be performed on the data center network to ensure that the server operates normally.
In the related art, network fault detection is performed through detection agents deployed on the servers, which carries security risks and a high deployment cost.
Disclosure of Invention
The application provides a network fault investigation method and device, so as to reduce the safety risk and cost of fault investigation. The technical scheme of the application is as follows:
According to a first aspect of the embodiments of the present application, a network fault troubleshooting method is provided, including:
determining an access path to be checked in a network when a network fault troubleshooting condition is met, wherein the access path comprises a plurality of first network devices for forwarding messages;
sending a detection instruction to each first network device, instructing each first network device to send a detection message to the next-stage first network device;
acquiring log information reported by each first network device, wherein the log information comprises a history record of the first network device sending and/or receiving the detection message;
and determining the faulty network device according to the log information reported by each first network device.
According to a second aspect of the embodiments of the present application, the embodiments of the present application provide a network fault troubleshooting device, including:
the first determining module is used for determining an access path to be checked in the network under the condition that the network fault checking condition is met, wherein the access path comprises a plurality of first network devices used for forwarding messages;
the sending module is used for sending a detection instruction to each first network device and indicating each first network device to send a detection message to the next-stage first network device;
the acquisition module is used for acquiring log information reported by each first network device, wherein the log information comprises a history record of the first network device sending and/or receiving the detection message;
and the second determining module is used for determining the fault network equipment according to the log information reported by each first network equipment.
According to a third aspect of embodiments of the present application, there is provided a terminal device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the network fault troubleshooting method according to the embodiments of the first aspect described above.
According to a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing instructions which, when executed by a processor of a terminal device, enable the terminal device to perform the network fault troubleshooting method according to the embodiments of the first aspect described above.
According to a fifth aspect of the embodiments of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the network fault troubleshooting method according to the embodiments of the first aspect described above.
The technical solution provided by the embodiments of the present application brings at least the following beneficial effects: when a network fault troubleshooting condition is met, an access path to be checked in the network is determined, the access path comprising a plurality of first network devices for forwarding messages; a detection instruction is then sent to each first network device, instructing it to send detection messages to the next-stage first network device; log information reported by each first network device, containing a history record of the detection messages sent and/or received by that device, is then obtained; and the faulty network device is determined according to the log information reported by each first network device. In this way, different access paths to be checked are determined in a targeted manner according to different troubleshooting conditions, and the first network devices themselves initiate the detection messages used to locate the fault point. This reduces the cost and security risk of large-scale network fault troubleshooting while improving its accuracy and efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application and do not constitute an undue limitation on the application.
Fig. 1 is a schematic flow chart of a network fault detection method according to a first embodiment of the present application;
Fig. 2 is a flow chart of another network fault detection method according to the second embodiment of the present application;
Fig. 3 is a flow chart of another network fault detection method according to the third embodiment of the present application;
Fig. 4 is a schematic structural diagram of a network fault checking device according to a fifth embodiment of the present application;
Fig. 5 is a block diagram illustrating a terminal device for network troubleshooting, according to an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
In the related art, detection agents are typically deployed on service servers at large scale, and the agents simulate service messages to initiate massive, indiscriminate Ping and Traceroute probes in order to locate network faults. However, an abnormal detection agent may itself cause service anomalies, so deploying detection agents is not only costly but may also pose a security risk to the normal operation of the service. In addition, when the data center network uses equal-cost multi-path (Equal Cost Multi Path, ECMP) routing, the detection messages often cannot cover all paths, which leaves link detection blind spots and makes network fault location inaccurate.
In the present application, different access paths to be checked are determined in a targeted manner according to different network fault troubleshooting conditions, and each first network device in the access path is instructed to send a detection message to the next-stage first network device so as to locate the fault point. Because the method does not rely on detection agents deployed on servers and all detection messages are initiated by the first network devices in the access path, the development, operation, and maintenance cost and the security risk of introducing detection agents at large scale are reduced. At the same time, the first network devices in the access path to be checked are checked hop by hop, which avoids link detection blind spots and improves the accuracy and efficiency of troubleshooting.
The network fault troubleshooting method and device of the embodiment of the application are described below with reference to the accompanying drawings. The network fault checking method provided by the application can be executed by the network fault checking device (hereinafter referred to as checking device).
Fig. 1 is a flowchart of a network fault detection method provided in an embodiment of the present application, including the following steps:
Step 101, determining an access path to be checked in a network under the condition that the network fault checking condition is met, wherein the access path comprises a plurality of first network devices for forwarding messages.
The network fault troubleshooting condition may be that request information for network fault troubleshooting is received, where the request information includes a first identifier of a source server, a second identifier of a target server, and a fault time point. The first identifier and the second identifier may be any information that uniquely identifies a server, such as an Internet Protocol (IP) address or a Media Access Control (MAC) address. Alternatively, the condition may be that the network health status index of any network device in the network is monitored to be smaller than a first threshold, where the network health status index indicates, for example, the congestion degree or the delay of the network device. Alternatively, the condition may be that the current time is a preset time point. The present application is not limited in this regard.
In the present application, when a network fault troubleshooting condition is met, a different access path to be checked can be determined depending on which condition is met, so that targeted troubleshooting is performed on some or all of the network devices in the network. This improves troubleshooting efficiency while avoiding missed access paths and improving troubleshooting accuracy.
When request information for network fault troubleshooting is received, the network topology corresponding to the network at the fault time point is queried according to the first identifier of the source server, the second identifier of the target server, and the fault time point carried in the request information, and the network devices on the connection paths between the source server and the target server are determined as second network devices. The first identifier and the second identifier are then matched against the message forwarding tables reported by the second network devices at the fault time point, so as to determine the access path to be checked.
Alternatively, when the network health status index of any network device in the network is monitored to be smaller than the first threshold, the network topology corresponding to the network at the current moment is queried, and the connection path containing that network device is determined as the access path to be checked.
Alternatively, when the current time is a preset time point, troubleshooting is performed on the whole network so that faulty network devices are found in time. For example, at 24:00 every day, each connection path in the network topology corresponding to the network at that moment is determined as an access path to be checked, so that all connection paths in the whole network are checked. This helps ensure the operational reliability of the network devices.
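As an illustration only, the three troubleshooting conditions above could be dispatched roughly as follows. This is a minimal sketch, not the applicant's implementation: the `Trigger` class, the `kind` labels, and the representation of a connection path as an ordered list of device identifiers are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

Path = List[str]  # assumed model: ordered device identifiers along one connection path


@dataclass
class Trigger:
    """Hypothetical description of which troubleshooting condition was met."""
    kind: str                            # "request", "health", or "scheduled"
    device: Optional[str] = None         # device whose health index fell below the first threshold
    request_path: Optional[Path] = None  # path already resolved from a troubleshooting request


def paths_to_check(trigger: Trigger, topology_paths: List[Path]) -> List[Path]:
    """Select the access paths to be checked for the given trigger."""
    if trigger.kind == "request":
        # The path is resolved separately from the topology and forwarding tables.
        return [trigger.request_path] if trigger.request_path else []
    if trigger.kind == "health":
        # Every connection path that contains the unhealthy device.
        return [p for p in topology_paths if trigger.device in p]
    if trigger.kind == "scheduled":
        # Whole-network check at the preset time point.
        return list(topology_paths)
    raise ValueError(f"unknown trigger kind: {trigger.kind}")
```

For a topology with the paths `["sw1", "sw2"]` and `["sw1", "sw3", "sw4"]`, a health trigger on `sw3` selects only the second path, while a scheduled trigger selects both.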
Step 102, sending a detection instruction to each first network device, and instructing each first network device to send a detection message to the next-stage first network device.
In the present application, the detection message may be sent in the following manner: a detection instruction is generated and sent to each first network device. The detection instruction may include information on the access path to be checked, so that each first network device can determine its next-stage first network device from the access path and send a plurality of detection messages to it, in order to determine which of the first network devices is faulty.
Alternatively, the detection message may be sent in the following manner: the first of the first network devices in the access path sends a detection message to the next-stage first network device, and each next-stage first network device is instructed to forward the detection message in turn along the access path until it reaches the last first network device in the access path, in order to determine which of the first network devices is faulty.
In addition, a detection instruction can be sent to the first network device in the access path to instruct the first network device to send a detection message to the source server so as to detect the network state between the first network device and the source server.
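The first sending manner can be pictured as one instruction per hop. The sketch below is an assumption-laden illustration: the dictionary fields (`device`, `next_hop`, `count`, `path`) and the default of five detection messages per hop are hypothetical and do not come from the application.

```python
from typing import Dict, List


def build_probe_instructions(access_path: List[str], probe_count: int = 5) -> List[Dict]:
    """Create one detection instruction for every device that has a next-stage device."""
    instructions = []
    # Pair each device with the device that follows it on the access path.
    for sender, receiver in zip(access_path, access_path[1:]):
        instructions.append({
            "device": sender,           # device that receives this instruction
            "next_hop": receiver,       # next-stage first network device to probe
            "count": probe_count,       # number of detection messages to send
            "path": list(access_path),  # access path information carried in the instruction
        })
    return instructions
```

The last device on the path has no next-stage device, so a path of three devices yields two instructions.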
Step 103, obtaining log information reported by each first network device, wherein the log information comprises a history record of sending and/or receiving the detection message by the first network device.
In the present application, a log reporting instruction may be sent to each first network device, instructing it to report, in its log, the history record of the detection messages it sends and receives. Alternatively, each first network device may be configured to report log information in real time. In this way, the history record of the detection messages sent and/or received by each first network device can be obtained. The history record may include the time point at which a detection message was sent or received, identification information of the first network device that sent it, identification information of the first network device that received it, and the like.
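To make the role of the history records concrete, the sketch below derives a per-link packet loss rate from them. The `ProbeRecord` layout is hypothetical: the time point and the sender and receiver identifiers follow the description above, while the `event` and `probe_id` fields are assumptions added so that sent and received records can be paired.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class ProbeRecord:
    timestamp: float  # time point of sending or receiving
    sender: str       # identifier of the device that sent the detection message
    receiver: str     # identifier of the device that should receive it
    event: str        # "sent" or "received" (assumed field)
    probe_id: int     # assumed per-message identifier


def loss_rate_per_link(records: Iterable[ProbeRecord]) -> Dict[Tuple[str, str], float]:
    """Return the fraction of sent detection messages never logged as received, per link."""
    sent: Dict[Tuple[str, str], Set[int]] = {}
    received: Dict[Tuple[str, str], Set[int]] = {}
    for r in records:
        bucket = sent if r.event == "sent" else received
        bucket.setdefault((r.sender, r.receiver), set()).add(r.probe_id)
    return {
        link: 1 - len(received.get(link, set()) & ids) / len(ids)
        for link, ids in sent.items()
    }
```

With four messages logged as sent from `a` to `b` and three logged as received, the loss rate for the link `("a", "b")` is 0.25.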
Step 104, determining the faulty network device according to the log information reported by each first network device.
In the present application, when the detection message is sent in the second manner of step 102, the history records of sent and/or received detection messages in each piece of log information are parsed to determine a fault index for each first network device. If the fault index of a first network device is greater than or equal to a second threshold, the device may be abnormal and is marked as a third network device. A third network device located later in the access path may itself be normal and show a high fault index only because it is affected by a third network device located earlier. For example, if the first third network device loses three data packets, the second third network device never receives those three packets, so its measured packet loss rate is also high. Therefore, the first third network device in the access path may be determined to be the faulty network device. The fault index may be delay, packet loss rate, or the like, which is not limited in this application.
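The rule of blaming the earliest suspect device can be sketched as a single pass over the access path. The function name, the dictionary of fault indexes, and the default index of 0.0 for devices without data are assumptions made for this example.

```python
from typing import Dict, List, Optional


def first_suspect(access_path: List[str],
                  fault_index: Dict[str, float],
                  second_threshold: float) -> Optional[str]:
    """Return the first device on the path whose fault index reaches the second threshold."""
    for device in access_path:
        # Devices further along the path may only look bad because of this one.
        if fault_index.get(device, 0.0) >= second_threshold:
            return device
    return None  # no device reached the threshold
```

For a path `["d1", "d2", "d3"]` with packet loss rates 0.0, 0.3, and 0.35 and a second threshold of 0.1, the sketch returns `d2`, even though `d3` shows the higher loss.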
Optionally, for two adjacent third network devices, if the fault index of the former is greater than the second threshold and the difference between the fault index of the latter and that of the former is greater than the preset threshold, both adjacent third network devices are determined to be faulty network devices.
Optionally, when the detection message is sent in the manner described in step 102, if the fault index of each first network device is smaller than the second threshold and the sum of the fault indexes of the plurality of first network devices in the access path is greater than or equal to the third threshold, the transmission performance of the whole access path is poor, and the plurality of first network devices may be determined to be faulty network devices.
Optionally, the history records of sent and/or received detection messages in each piece of log information may be parsed to determine the fault index of each first network device. If the fault index of a first network device is greater than or equal to the second threshold, the device may be abnormal and may be determined to be a faulty network device.
Optionally, when the fault index of each first network device is smaller than the second threshold and the sum of the fault indexes of the plurality of first network devices in the access path is smaller than the fifth threshold, none of the first network devices is abnormal, and the network can be determined to be normal.
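One way to read the optional rules together is as a path-level classification. The sketch below combines them into a single function, which is an assumption: the application presents them as independent options, and the returned labels, including the "undetermined" case for sums between the fifth and third thresholds, are hypothetical.

```python
from typing import List


def classify_path(fault_indexes: List[float],
                  second_threshold: float,
                  third_threshold: float,
                  fifth_threshold: float) -> str:
    """Classify an access path from the fault indexes of its first network devices."""
    if any(v >= second_threshold for v in fault_indexes):
        return "device_fault"    # at least one device is individually suspect
    total = sum(fault_indexes)
    if total >= third_threshold:
        return "path_degraded"   # no single device stands out, but the path as a whole is poor
    if total < fifth_threshold:
        return "normal"
    return "undetermined"        # assumed label for the gap between the two sum thresholds
```

With a second threshold of 0.5, a third threshold of 0.75, and a fifth threshold of 0.5, three devices at 0.25 each are classified as `path_degraded`, and two devices at 0.125 each as `normal`.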
In the present application, when a network fault troubleshooting condition is met, an access path to be checked in the network is determined, the access path comprising a plurality of first network devices for forwarding messages; a detection instruction is then sent to each first network device, instructing it to send detection messages to the next-stage first network device; log information reported by each first network device, containing a history record of the detection messages sent and/or received by that device, is then obtained; and the faulty network device is determined according to the log information reported by each first network device. In this way, different access paths to be checked are determined in a targeted manner according to different troubleshooting conditions, and the first network devices themselves initiate the detection messages used to locate the fault point. This reduces the cost and security risk of large-scale network fault troubleshooting while improving its accuracy and efficiency.
Fig. 2 is a flowchart of a network fault detection method provided in an embodiment of the present application, including the following steps:
Step 201, receiving request information for network fault troubleshooting, querying the network topology corresponding to the network at the fault time point, and determining the network devices on the connection paths between the source server and the target server as second network devices, wherein the request information comprises a first identifier of the source server, a second identifier of the target server, and the fault time point.
In the present application, when a service anomaly occurs, such as a service request that stalls, service operation and maintenance personnel can determine the first identifier of the source server and the second identifier of the target server corresponding to the stalled service request, together with the fault time point of that request. They can then enter the first identifier of the source server, the second identifier of the target server, and the fault time point into the client terminal corresponding to the checking device. The client terminal generates request information containing the first identifier, the second identifier, and the fault time point and sends it to the checking device, which can then perform network fault troubleshooting according to the first identifier of the source server, the second identifier of the target server, and the fault time point in the request.
In the application, the fault time point can be compared with the time points corresponding to the network topologies stored in the system, and the network topology corresponding to the network at the fault time point can be determined. And then, inquiring the network topology to determine a connection path between the source server and the target server, and determining the network equipment on the connection path as second network equipment.
In the present application, the number of network devices in the network or the connections between them may change because a connection between network devices fails, network devices are added when service traffic increases, or some network devices are removed when service traffic decreases. As a result, the topology corresponding to the network differs at different moments.
Therefore, each network device can be configured to report its terminal table entries to the checking device at preset time intervals, or to report the changed terminal table entries to the checking device whenever they change. The checking device can then store the terminal table entries reported by each network device in the system in association with their reporting time. Next, the terminal table entries corresponding to each time point are parsed to determine the network topology corresponding to the network at that time point, and the topology is stored in the system. The terminal table entries indicate the connection relationships between network devices and may include Link Layer Discovery Protocol (LLDP), MAC, Address Resolution Protocol (ARP), routing table (Route), and other table entries.
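Looking up the topology that was in effect at the fault time point amounts to a search over timestamped snapshots. The sketch below assumes the snapshots are kept sorted by reporting time and that the applicable snapshot is the latest one reported at or before the fault time point; both are assumptions, since the application only states that the fault time point is compared with the stored time points.

```python
import bisect
from typing import Any, List, Optional, Tuple


def snapshot_at(snapshots: List[Tuple[float, Any]], fault_time: float) -> Optional[Any]:
    """Return the latest snapshot reported at or before fault_time, or None if there is none.

    snapshots must be sorted by reporting time in ascending order.
    """
    times = [t for t, _ in snapshots]
    i = bisect.bisect_right(times, fault_time)  # number of snapshots with time <= fault_time
    return snapshots[i - 1][1] if i else None
```

The same lookup would apply to stored message forwarding tables, which are also associated with their reporting time.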
Step 202, the first identifier and the second identifier are matched with a message forwarding table reported by the second network device at a fault time point, and the first network device is determined from a plurality of second network devices.
In the present application, although there may be multiple connection paths between the source server and the target server, at any given time the network allocates only one connection path for transmitting messages between them. The message forwarding table (Forwarding Information Base, FIB) of a network device at a given moment indicates the source server and the target server of the messages that the network device can forward at that moment.
Therefore, the fault time point can be matched with the time point corresponding to the message forwarding table of each second network device stored in the system, and the message forwarding table of each second network device at the fault time point can be determined. And then, matching the first identifier and the second identifier with a message forwarding table reported by the second network equipment at a fault time point, and determining the first network equipment from the plurality of second network equipment.
For example, suppose there are three connection paths between the source server and the target server. Connection path 1 connects the source server and the target server through second network device 1 and second network device 2 in sequence. Connection path 2 connects them through second network device 1, second network device 3, and second network device 4 in sequence. Connection path 3 connects them through second network device 1, second network device 3, and second network device 5 in sequence. The similarity between the first and second identifiers and each network segment address under the destination field of the message forwarding table reported by second network device 1 at the fault time point can be calculated. When the first identifier and the second identifier have the highest similarity to the same network segment address, and that network segment address corresponds to exactly one next-hop network device, that next-hop network device is determined as a first network device. When the next-hop network device is second network device 3, the first identifier and the second identifier continue to be matched against the message forwarding table of second network device 3 at the fault time point to determine the next-hop network device of second network device 3 (that is, another first network device), and so on until all first network devices among the plurality of second network devices are determined. In addition, because second network device 1 is a network device that traffic from the source server to the target server must pass through, second network device 1 is also determined as a first network device.
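The hop-by-hop matching in this example can be sketched as a walk over per-device forwarding tables. Two simplifications are assumed here and are not taken from the application: the "highest similarity" match is replaced by a standard longest-prefix match on the destination address only, and each forwarding table is modeled as a dictionary from destination prefix to a list of next-hop device identifiers. The device names and prefixes are made up.

```python
import ipaddress
from typing import Dict, List

Fib = Dict[str, List[str]]  # assumed model: destination prefix -> next-hop device identifiers


def next_hops(fib: Fib, dst_ip: str) -> List[str]:
    """Longest-prefix match of dst_ip against the prefixes in one forwarding table."""
    addr = ipaddress.ip_address(dst_ip)
    best_key, best_len = None, -1
    for prefix in fib:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_key, best_len = prefix, net.prefixlen
    return fib[best_key] if best_key is not None else []


def walk_path(fibs: Dict[str, Fib], first_device: str, dst_ip: str,
              max_hops: int = 16) -> List[str]:
    """Follow single next hops from first_device until no further single hop is found."""
    path = [first_device]
    current = first_device
    for _ in range(max_hops):  # guard against forwarding loops
        hops = next_hops(fibs.get(current, {}), dst_ip)
        if len(hops) != 1:
            break  # no route entry, or several next hops (ECMP) that need a device-side query
        current = hops[0]
        path.append(current)
    return path
```

With `dev1` holding `10.0.2.0/24 -> dev3` and `10.0.0.0/16 -> dev2`, and `dev3` holding `10.0.2.0/24 -> dev4`, walking toward `10.0.2.7` from `dev1` yields `dev1`, `dev3`, `dev4`, which mirrors connection path 2 above.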
In an ECMP scenario, the same network segment address corresponds to multiple next-hop network devices. For example, the next-hop network devices corresponding to the same network segment address are second network device 2 and second network device 3. In this case, the checking device may call the load-balancing interface of second network device 1 and send it the first identifier and the second identifier. Second network device 1 may invoke its preset load balancing algorithm to determine the next-hop network device and feed back the identifier of that device to the checking device, which then determines that next-hop network device as a first network device.
In addition, each network device may be configured to report its message forwarding table to the checking device at preset time intervals, or to report the changed message forwarding table to the checking device whenever it changes. The checking device can then store the message forwarding table reported by each network device in the system in association with its reporting time.
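For intuition about why the checking device has to ask the network device itself, the sketch below shows a stand-in for a device-side load balancing decision. It is purely illustrative: real devices use vendor-specific hash inputs and algorithms, and the CRC32 hash over the two identifiers used here is an assumption, not the algorithm of any particular device or of the application.

```python
import zlib
from typing import List


def pick_ecmp_next_hop(src_id: str, dst_id: str, candidates: List[str]) -> str:
    """Deterministically map a (source, target) pair onto one of the equal-cost next hops."""
    if not candidates:
        raise ValueError("no candidate next hops")
    key = f"{src_id}->{dst_id}".encode("utf-8")
    # The same pair always hashes to the same candidate, so the flow stays on one path.
    return candidates[zlib.crc32(key) % len(candidates)]
```

Because the outcome depends on the device's own hash, the checking device cannot infer it from the forwarding table alone, which is why the description above has the device compute the next hop and report it back.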
Step 203, determining the path formed by the first network devices in the network topology as the access path to be checked.
In the present application, the path formed by the first network devices in the network topology is the path through which messages transmitted between the source server and the target server flowed at the fault time point. This path can therefore be determined as the access path to be checked. This reduces the number of network devices to be checked while accurately identifying the access path in which the fault may have occurred, improving both the accuracy and the efficiency of troubleshooting.
Alternatively, in the case where there is only one connection path between the source server and the target server, the connection path is determined as the access path to be examined.
Step 204, a detection instruction is sent to each first network device, and each first network device is instructed to send a detection message to the next-stage first network device.
Step 205, obtaining log information reported by each first network device, where the log information includes a history record of sending and/or receiving a detection message by the first network device.
Step 206, determining the faulty network device according to the log information reported by each first network device.
In the present application, for the specific process of step 204 to step 206, reference may be made to any embodiment of the present application; details are not repeated here.
In the present application, after request information for network fault troubleshooting is received, containing the first identifier of the source server, the second identifier of the target server, and the fault time point, the network topology corresponding to the network at the fault time point is queried, and the network devices on the connection paths between the source server and the target server are determined as second network devices. The first identifier and the second identifier are matched against the message forwarding tables reported by the second network devices at the fault time point, and the first network devices are determined from the plurality of second network devices. The path formed by the first network devices in the network topology is then determined as the access path to be checked. A detection instruction is sent to each first network device, instructing it to send a detection message to the next-stage first network device; log information reported by each first network device, containing a history record of the detection messages sent and/or received by that device, is obtained; and the faulty network device is determined according to that log information. Because the first network devices themselves initiate the detection messages used to locate the fault point, the cost and security risk of large-scale network fault troubleshooting are reduced while its accuracy and efficiency are improved.
Fig. 3 is a flowchart of a network fault detection method provided in an embodiment of the present application, including the following steps:
Step 301, in response to monitoring that the network health status index of any network device in the network is smaller than a first threshold, querying the network topology corresponding to the network at the current moment, and determining a connection path containing that network device as an access path to be checked, wherein the access path includes a plurality of first network devices for forwarding messages.
In this application, to ensure stable network operation, a network health status monitoring tool may be used to monitor each network device in the network and obtain its network health status index. When the network health status index of a network device is smaller than the first threshold, the overall communication quality of the connection path to which that device belongs may be poor, or the device's index may have been lowered by an abnormality in another network device on the same connection path. Therefore, the network topology corresponding to the network at the current moment can be queried, and the connection path containing that network device is determined as the access path to be checked. This narrows the set of access paths to be checked, improving troubleshooting efficiency while maintaining accuracy.
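The narrowing described above can be sketched as follows. The health status values, the threshold, the device names, and the function name `paths_to_check` are hypothetical; the application does not specify how the health status index is computed or how connection paths are stored.

```python
def paths_to_check(health_index, connection_paths, first_threshold):
    """Return every connection path containing at least one device whose
    network health status index is below first_threshold."""
    unhealthy = {
        device
        for device, score in health_index.items()
        if score < first_threshold
    }
    return [path for path in connection_paths if unhealthy.intersection(path)]


# Assumed health status index per device (higher means healthier).
health = {"sw-1": 0.95, "sw-2": 0.40, "sw-3": 0.90, "sw-4": 0.92}

# Connection paths read from the topology at the current moment (assumed).
paths = [
    ["sw-1", "sw-2", "sw-3"],
    ["sw-1", "sw-4", "sw-3"],
]

to_check = paths_to_check(health, paths, first_threshold=0.6)
print(to_check)  # [['sw-1', 'sw-2', 'sw-3']]
```

Only the path through `sw-2` is selected, so the second path is not probed at all.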
Step 302, a detection instruction is sent to each first network device, and each first network device is instructed to send a detection message to a next-stage first network device.
Step 303, obtaining log information reported by each first network device, where the log information includes a history record of sending and/or receiving a detection message by the first network device.
Step 304, determining the faulty network device according to the log information reported by each first network device.
For a specific process from step 302 to step 304, reference may be made to the detailed description of any embodiment of the present application, and the detailed description is omitted herein.
In this application, when the network health status index of any network device in the network is monitored to be smaller than the first threshold, the network topology corresponding to the network at the current moment is queried, and the connection path containing that network device is determined to be the access path to be checked. A detection instruction is then sent to each first network device in the access path, instructing it to send a detection message to the next-stage first network device. Log information reported by each first network device, containing a history record of the detection messages sent and/or received by that device, is obtained, and the faulty network device is determined according to this log information. Because the first network devices in the access path initiate the detection messages used to locate the fault point, the cost and security risk of large-scale network troubleshooting are reduced, while the accuracy and efficiency of troubleshooting are improved.
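As a rough illustration of the detection instructions, the sketch below pairs each first network device with its next-stage first network device. The dictionary layout and the function name `build_detection_instructions` are assumptions, and the sketch further assumes that the last device on the path has no next-stage device to probe; the application does not state how that device is handled.

```python
def build_detection_instructions(access_path):
    """Pair each first network device with the next-stage device it should
    send detection messages to. The last device has no next stage here."""
    return [
        {"device": current, "probe_target": next_stage}
        for current, next_stage in zip(access_path, access_path[1:])
    ]


instructions = build_detection_instructions(["leaf-1", "spine-1", "leaf-2"])
for instruction in instructions:
    print(instruction)
```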
Fig. 4 is a block diagram illustrating a network troubleshooting device, according to an example embodiment. Referring to fig. 4, the apparatus includes a first determining module 410, a transmitting module 420, an acquiring module 430, and a second determining module 440.
The first determining module is used for determining an access path to be checked in the network under the condition that the network fault checking condition is met, wherein the access path comprises a plurality of first network devices used for forwarding messages;
the sending module is used for sending a detection instruction to each first network device and indicating each first network device to send a detection message to the next-stage first network device;
the acquisition module is used for acquiring log information reported by each first network device, wherein the log information comprises a history record of the first network device sending and/or receiving the detection message;
and the second determining module is used for determining the faulty network device according to the log information reported by each first network device.
In one possible implementation manner of the embodiment of the present application, the network fault troubleshooting condition includes:
receiving request information for network troubleshooting, wherein the request information comprises a first identifier of a source server, a second identifier of a target server, and a fault time point; or
monitoring that the network health status index of any network device in the network is smaller than a first threshold; or
the current time is a preset time point.
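The three troubleshooting conditions can be sketched as a simple check. The enumeration names, the argument layout, and the order in which the conditions are evaluated are assumptions for illustration; the application lists the conditions as alternatives without ranking them.

```python
from enum import Enum, auto


class Trigger(Enum):
    REQUEST_RECEIVED = auto()        # request with source, target, fault time
    HEALTH_BELOW_THRESHOLD = auto()  # some device's health index is too low
    SCHEDULED_TIME = auto()          # current time is a preset time point


def check_trigger(request, health_index, first_threshold, now, preset_times):
    """Return the troubleshooting condition that is met, or None.
    The evaluation order below is an assumption, not part of the source."""
    if request is not None:
        return Trigger.REQUEST_RECEIVED
    if any(score < first_threshold for score in health_index.values()):
        return Trigger.HEALTH_BELOW_THRESHOLD
    if now in preset_times:
        return Trigger.SCHEDULED_TIME
    return None


healthy = {"sw-1": 0.95, "sw-2": 0.90}
degraded = {"sw-1": 0.95, "sw-2": 0.40}
request = {"src": "srv-a", "dst": "srv-b", "fault_time": "2023-04-24T10:00"}

print(check_trigger(request, healthy, 0.6, "09:30", {"02:00"}))
print(check_trigger(None, degraded, 0.6, "09:30", {"02:00"}))
print(check_trigger(None, healthy, 0.6, "02:00", {"02:00"}))
```

Each condition leads to a different way of choosing the access path to be checked, which is why the result is returned as a distinct value rather than a boolean.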
In one possible implementation manner of the embodiment of the present application, the network failure troubleshooting condition is that the request information for network failure troubleshooting is received, and the first determining module 410 is configured to:
querying the network topology corresponding to the network at the fault time point, and determining the network devices on a connection path between the source server and the target server as second network devices;
matching the first identifier and the second identifier against the message forwarding table reported by each second network device at the fault time point, and determining the first network devices from the plurality of second network devices;
and determining the path formed by the first network devices in the network topology as the access path to be checked.
In one possible implementation manner of this embodiment of the present application, the network failure detection condition is that it is monitored that a network health status indicator of any network device in the network is smaller than a first threshold, and the first determining module 410 is configured to:
querying the network topology corresponding to the network at the current moment, and determining the connection path containing that network device as the access path to be checked.
In one possible implementation manner of the embodiment of the present application, the network fault detection condition is that the current time is a preset time point, and the first determining module 410 is configured to:
and determining each connection path in the network topology corresponding to the network at the current moment as an access path to be examined.
In one possible implementation manner of the embodiment of the present application, the obtaining module 430 is further configured to:
acquiring the terminal table entries reported by each network device in the network at each time point;
the network troubleshooting device further comprises:
a third determining module, used for analyzing the terminal table entries corresponding to each time point and determining the network topology corresponding to the network at each time point.
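The application does not specify the layout of a terminal table entry or how it is analyzed. Purely as an illustration, the sketch below assumes each reported entry names the reporting device and one directly connected neighbor, and builds one adjacency map per time point; the tuple layout and the function name `build_topologies` are hypothetical.

```python
from collections import defaultdict


def build_topologies(reports):
    """Build one adjacency map per time point.

    reports: iterable of (time_point, device, neighbor) tuples, an assumed
    simplification of the terminal table entries reported by each device.
    """
    topologies = defaultdict(lambda: defaultdict(set))
    for time_point, device, neighbor in reports:
        # Links are treated as bidirectional in this sketch.
        topologies[time_point][device].add(neighbor)
        topologies[time_point][neighbor].add(device)
    return {
        time_point: {dev: sorted(nbrs) for dev, nbrs in adjacency.items()}
        for time_point, adjacency in topologies.items()
    }


reports = [
    ("10:00", "leaf-1", "spine-1"),
    ("10:00", "spine-1", "leaf-2"),
    ("10:05", "leaf-1", "spine-2"),
]

topologies = build_topologies(reports)
print(topologies["10:00"])
print(topologies["10:05"])
```

Keeping a snapshot per time point is what allows the topology at a past fault time point to be queried later.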
In one possible implementation manner of the embodiment of the present application, the second determining module 440 is configured to:
analyzing the history record of sent and/or received detection messages in each piece of log information, and determining a fault index of each first network device, wherein the fault index is delay or packet loss rate;
determining any first network device whose fault index is greater than or equal to a second threshold as a third network device;
and determining the first of the third network devices along the access path as the faulty network device.
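The threshold rule can be sketched as follows, using packet loss rate as the fault index. The log layout (counts of detection messages sent and acknowledged per device), the threshold value, and the function names are assumptions for the example.

```python
def packet_loss_rate(sent, received):
    """Fraction of detection messages that were lost (0.0 if none were sent)."""
    if sent == 0:
        return 0.0
    return (sent - received) / sent


def find_faulty_device(access_path, fault_index, second_threshold):
    """Return the first device along the access path whose fault index is
    greater than or equal to second_threshold, or None if there is none."""
    for device in access_path:
        if fault_index[device] >= second_threshold:
            return device
    return None


access_path = ["leaf-1", "spine-1", "leaf-2"]

# Assumed log summary per device: (detection messages sent, acknowledged).
logs = {"leaf-1": (100, 100), "spine-1": (100, 70), "leaf-2": (100, 60)}

fault_index = {dev: packet_loss_rate(*counts) for dev, counts in logs.items()}
faulty = find_faulty_device(access_path, fault_index, second_threshold=0.2)
print(faulty)  # spine-1
```

Both `spine-1` and `leaf-2` exceed the threshold here, but only the first of them along the path is reported, on the reasoning that a fault upstream can degrade the measurements of every device after it.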
In one possible implementation manner of the embodiment of the present application, the second determining module 440 is further configured to:
determining the plurality of first network devices as faulty network devices when the fault index of each first network device is smaller than the second threshold and the sum of the fault indexes of the plurality of first network devices in the access path is greater than or equal to a third threshold.
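The cumulative rule can be sketched in the same style, here with delay as the fault index. The delay values, both thresholds, and the function name `cumulative_fault` are hypothetical, and the sketch assumes that the "plurality of first network devices" means every first network device on the access path.

```python
def cumulative_fault(access_path, fault_index, second_threshold, third_threshold):
    """Return all devices on the path when no single device reaches
    second_threshold but their summed fault index reaches third_threshold;
    otherwise return an empty list."""
    values = [fault_index[device] for device in access_path]
    below_individually = all(value < second_threshold for value in values)
    if below_individually and sum(values) >= third_threshold:
        return list(access_path)
    return []


access_path = ["leaf-1", "spine-1", "leaf-2"]

# Assumed per-device delay in milliseconds derived from the logs.
delay_ms = {"leaf-1": 4, "spine-1": 5, "leaf-2": 4}

suspects = cumulative_fault(
    access_path, delay_ms, second_threshold=10, third_threshold=12
)
print(suspects)  # ['leaf-1', 'spine-1', 'leaf-2']
```

No device is slow enough on its own, yet the end-to-end delay of 13 ms exceeds the path-level limit, so the whole path is flagged.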
The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and will not be repeated here.
In this application, when a network troubleshooting condition is met, an access path to be checked in the network is determined, the access path including a plurality of first network devices for forwarding messages. A detection instruction is then sent to each first network device, instructing it to send a detection message to the next-stage first network device. Log information reported by each first network device, containing a history record of the detection messages sent and/or received by that device, is obtained, and the faulty network device is determined according to this log information. In this way, the access path to be checked is determined in a targeted manner according to the troubleshooting condition that was met, and the first network devices initiate the detection messages used to locate the fault point. This reduces the cost and security risk of large-scale network troubleshooting while improving its accuracy and efficiency.
Fig. 5 is a block diagram illustrating a terminal device for network troubleshooting, according to an exemplary embodiment.
As shown in fig. 5, the terminal device 500 includes:
a memory 510, a processor 520, and a bus 530 connecting the different components (including the memory 510 and the processor 520). The memory 510 stores a computer program, and the processor 520 executes the program to implement the network troubleshooting method according to the embodiments of the present application.
Bus 530 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Terminal device 500 typically includes a variety of electronic device readable media. Such media can be any available media that is accessible by terminal device 500 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 510 may also include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 540 and/or cache memory 550. The terminal device 500 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 560 may be used to read from or write to a non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 530 through one or more data media interfaces. Memory 510 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the present application.
A program/utility 580 having a set (at least one) of program modules 570 may be stored in, for example, memory 510, such program modules 570 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 570 generally perform the functions and/or methodologies in the embodiments described herein.
The terminal device 500 may also communicate with one or more external devices 590 (e.g., keyboard, pointing device, display 591, etc.), one or more devices that enable a user to interact with the terminal device 500, and/or any devices (e.g., network card, modem, etc.) that enable the terminal device 500 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 592. Also, terminal device 500 can communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, via network adapter 593. As shown, network adapter 593 communicates with other modules of terminal device 500 via bus 530. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with terminal device 500, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 520 executes various functional applications and data processing by running programs stored in the memory 510.
It should be noted that, for the implementation process and technical principle of the terminal device in this embodiment, reference may be made to the foregoing explanation of the network troubleshooting method in the embodiments of the present application, which is not repeated here.
In this application, when a network troubleshooting condition is met, an access path to be checked in the network is determined, the access path including a plurality of first network devices for forwarding messages. A detection instruction is then sent to each first network device, instructing it to send a detection message to the next-stage first network device. Log information reported by each first network device, containing a history record of the detection messages sent and/or received by that device, is obtained, and the faulty network device is determined according to this log information. In this way, the access path to be checked is determined in a targeted manner according to the troubleshooting condition that was met, and the first network devices initiate the detection messages used to locate the fault point. This reduces the cost and security risk of large-scale network troubleshooting while improving its accuracy and efficiency.
In an exemplary embodiment, the application also provides a computer-readable storage medium comprising instructions, e.g. a memory comprising instructions, executable by a processor of a terminal device to perform the above-mentioned method. Optionally, the computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
In order to implement the above embodiments, the present application also provides a computer program product which, when executed by a processor of a terminal device, enables the terminal device to perform the network troubleshooting method as described above.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (12)

1. A network failure troubleshooting method, comprising:
determining an access path to be checked in a network under the condition that a network fault checking condition is met, wherein the access path comprises a plurality of first network devices for forwarding messages;
sending a detection instruction to each first network device, and indicating each first network device to send a detection message to a next-stage first network device;
acquiring log information reported by each first network device, wherein the log information comprises a history record of sending and/or receiving detection messages by the first network device;
and determining the faulty network device according to the log information reported by each first network device.
2. The method of claim 1, wherein the network troubleshooting condition comprises:
receiving request information for network troubleshooting, wherein the request information comprises a first identifier of a source server, a second identifier of a target server, and a fault time point; or
monitoring that the network health status index of any network device in the network is smaller than a first threshold; or
the current time is a preset time point.
3. The method of claim 2, wherein the network troubleshooting condition is that request information for network troubleshooting is received, and the determining an access path to be troubleshooted in the network includes:
inquiring network topology corresponding to the network at the fault time point, and determining network equipment on a connecting path between the source server and the target server as second network equipment;
matching the first identifier and the second identifier with a message forwarding table reported by the second network equipment at the fault time point, and determining a first network equipment from a plurality of second network equipment;
and determining a path formed by each first network device in the network topology as an access path to be examined.
4. The method of claim 2, wherein the network troubleshooting condition is that a network health indicator of any network device in the network is monitored to be less than a first threshold, and the determining the access path to be troubleshooted in the network comprises:
querying the network topology corresponding to the network at the current moment, and determining the connection path containing that network device as the access path to be checked.
5. The method of claim 2, wherein the network troubleshooting condition is that the current time is a preset time point, and the determining the access path to be troubleshooted in the network includes:
and determining each connection path in the network topology corresponding to the network at the current moment as an access path to be examined.
6. The method of any one of claims 3-5, further comprising:
acquiring the terminal table entries reported by each network device in the network at each time point;
and analyzing the terminal table entries corresponding to each time point, and determining the network topology corresponding to the network at each time point.
7. The method of claim 1, wherein said determining a failed network device based on the log information reported by each of the first network devices comprises:
analyzing the history record of sent and/or received detection messages in each piece of log information, and determining a fault index of each first network device, wherein the fault index is delay or packet loss rate;
determining any first network device whose fault index is greater than or equal to a second threshold as a third network device;
and determining the first of the third network devices along the access path as the faulty network device.
8. The method as recited in claim 6, further comprising:
and determining that the plurality of first network devices are faulty network devices under the condition that the fault index of each first network device is smaller than a second threshold value and the sum of the fault indexes of the plurality of first network devices in the access path is larger than or equal to a third threshold value.
9. A network failure troubleshooting apparatus, comprising:
the first determining module is used for determining an access path to be checked in the network under the condition that the network fault checking condition is met, wherein the access path comprises a plurality of first network devices used for forwarding messages;
the sending module is used for sending a detection instruction to each first network device and indicating each first network device to send a detection message to the next-stage first network device;
the acquisition module is used for acquiring log information reported by each first network device, wherein the log information comprises a history record of the first network device sending and/or receiving detection messages;
and the second determining module is used for determining the faulty network device according to the log information reported by each first network device.
10. A terminal device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the network troubleshooting method of any one of claims 1-8.
11. A computer-readable storage medium storing instructions which, when executed by a processor of a terminal device, cause the terminal device to perform the network troubleshooting method of any one of claims 1-8.
12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the network troubleshooting method of any one of claims 1-8.
CN202310450024.6A 2023-04-24 2023-04-24 Network fault troubleshooting method and device and terminal equipment Pending CN116471173A (en)

Publications (1)

Publication Number Publication Date
CN116471173A true CN116471173A (en) 2023-07-21

Family

ID=87182181



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination