CN110943864A - Network anomaly positioning method and device of distributed storage system - Google Patents

Network anomaly positioning method and device of distributed storage system

Info

Publication number
CN110943864A
Authority
CN
China
Prior art keywords
detection
detection result
network
determining
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911211569.1A
Other languages
Chinese (zh)
Other versions
CN110943864B (en)
Inventor
魏子昂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd, Beijing Kingsoft Cloud Technology Co Ltd
Priority to CN201911211569.1A
Publication of CN110943864A
Application granted
Publication of CN110943864B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L43/0829 Packet loss
    • H04L43/0847 Transmission error
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/16 Threshold monitoring
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a method and device for locating network anomalies in a distributed storage system, and relates to the field of data processing. The method comprises: periodically performing first-level detection on the network of a storage server to be detected in the distributed storage system to obtain a first detection result; determining, based on the first detection result, whether a first-level network anomaly exists in the network; if so, determining a target second-level detection mode corresponding to the first detection result; detecting the network according to the target second-level detection mode to obtain a second detection result; and determining the cause of the network anomaly corresponding to the second detection result. Because the network of the storage server to be detected is examined in two levels, anomaly location follows a systematic procedure and the underlying fault is located more quickly, which improves the efficiency of fault location and, in turn, of fault resolution.

Description

Network anomaly positioning method and device of distributed storage system
Technical Field
The present application relates to the field of data processing, and in particular, to a method and an apparatus for locating a network anomaly in a distributed storage system.
Background
A traditional network storage system uses a centralized storage server to store all data, but the storage space of a centralized storage server is limited and cannot meet the requirements of large-scale storage applications. A distributed network storage system adopts a scalable architecture and spreads data across multiple independent storage servers, thereby providing a solution for large-scale data storage. However, in a distributed storage system, when the network of a storage server fails, the efficiency of the system is greatly affected if the failure is not discovered and resolved in time.
At present, the mainstream approach is manual troubleshooting based on the experience of network and system operations staff. Because this approach relies on past experience, it offers no systematic procedure for examining the system comprehensively.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for locating a network anomaly in a distributed storage system, so as to scientifically and effectively locate the network anomaly and improve the efficiency of solving the network anomaly.
In a first aspect, an embodiment provides a method for locating a network anomaly of a distributed storage system, including: periodically performing first-level detection on the network of a storage server to be detected in the distributed storage system to obtain a first detection result, where the first-level detection comprises SSH detection, PING detection and packet loss rate detection; determining, based on the first detection result, whether a first-level network anomaly exists in the network; if a first-level network anomaly exists, determining a target second-level detection mode corresponding to the first detection result based on a predetermined correspondence between first-level network anomalies and second-level detection modes; detecting the network according to the target second-level detection mode to obtain a second detection result; and determining the cause of the network anomaly corresponding to the second detection result based on a predetermined correspondence between detection results of the second-level detection and anomaly causes.
In an optional implementation, the step of determining, if a first-level network anomaly exists, the target second-level detection mode corresponding to the first detection result based on the predetermined correspondence between first-level network anomalies and second-level detection modes includes: if the first detection result is that PING detection succeeds and SSH detection fails, determining, based on the predetermined correspondence, that the target second-level detection mode is detecting whether port 22 is open.
In an optional implementation, the step of determining the cause of the network anomaly corresponding to the second detection result based on the predetermined correspondence between detection results of the second-level detection and anomaly causes includes: if the second detection result is that port 22 is not open, determining, based on the predetermined correspondence, that the cause of the network anomaly is blocking by the peer's firewall; and if the second detection result is that port 22 is open, determining, based on the predetermined correspondence, that the cause of the network anomaly is an SSH service failure.
In an optional implementation, the step of determining, if a first-level network anomaly exists, the target second-level detection mode corresponding to the first detection result based on the predetermined correspondence between first-level network anomalies and second-level detection modes includes: if the first detection result is that PING detection succeeds and packet loss exceeds a threshold, determining, based on the predetermined correspondence, that the target second-level detection mode is determining whether CRC errors exceed a threshold and/or whether the brightness of the switch-side indicator light is normal.
In an optional implementation, the step of determining the cause of the network anomaly corresponding to the second detection result based on the predetermined correspondence between detection results of the second-level detection and anomaly causes includes: if the second detection result is that CRC errors exceed the threshold, determining, based on the predetermined correspondence, that the cause of the network anomaly is an optical module or network cable fault; if the second detection result is that the brightness of the switch-side indicator light is abnormal, determining, based on the predetermined correspondence, that the cause of the network anomaly is an optical module or network cable fault; and if the second detection result is that CRC errors do not exceed the threshold and the brightness of the switch-side indicator light is normal, determining, based on the predetermined correspondence, that the cause of the network anomaly is data volume overload.
In an optional implementation, the step of determining, if a first-level network anomaly exists, the target second-level detection mode corresponding to the first detection result based on the predetermined correspondence between first-level network anomalies and second-level detection modes includes: if the first detection result is that PING detection fails, determining, based on the predetermined correspondence, that the target second-level detection mode is detecting whether a route to the peer exists in the local routing table.
In an optional implementation, the step of determining the cause of the network anomaly corresponding to the second detection result based on the predetermined correspondence between detection results of the second-level detection and anomaly causes includes: if the second detection result is that no route to the peer exists, determining, based on the predetermined correspondence, that the cause of the network anomaly is an incorrectly configured local routing table; and if the second detection result is that a route to the peer exists, determining, based on the predetermined correspondence, that the cause of the network anomaly is an incorrectly configured switch routing table or a switch outage.
In a second aspect, an embodiment provides a network anomaly locating device for a distributed storage system, including: a first detection module, configured to periodically perform first-level detection on the network of a storage server to be detected in the distributed storage system to obtain a first detection result, where the first-level detection comprises SSH detection, PING detection and packet loss rate detection; a first determining module, configured to determine, based on the first detection result, whether a first-level network anomaly exists in the network; a second determining module, configured to determine, if a first-level network anomaly exists, a target second-level detection mode corresponding to the first detection result based on a predetermined correspondence between first-level network anomalies and second-level detection modes; a second detection module, configured to detect the network according to the target second-level detection mode to obtain a second detection result; and a third determining module, configured to determine the cause of the network anomaly corresponding to the second detection result based on a predetermined correspondence between detection results of the second-level detection and anomaly causes.
In an optional embodiment, the second determining module is specifically configured to: if the first detection result is that PING detection succeeds and SSH detection fails, determine, based on the predetermined correspondence between first-level network anomalies and second-level detection modes, that the target second-level detection mode corresponding to the first detection result is detecting whether port 22 is open.
In an optional embodiment, the third determining module is specifically configured to: if the second detection result is that port 22 is not open, determine, based on the predetermined correspondence between detection results of the second-level detection and anomaly causes, that the cause of the network anomaly is blocking by the peer's firewall; and if the second detection result is that port 22 is open, determine, based on the same correspondence, that the cause of the network anomaly is an SSH service failure.
In an optional embodiment, the second determining module is specifically configured to: if the first detection result is that PING detection succeeds and packet loss exceeds a threshold, determine, based on the predetermined correspondence between first-level network anomalies and second-level detection modes, that the target second-level detection mode corresponding to the first detection result is determining whether CRC errors exceed a threshold and/or whether the brightness of the switch-side indicator light is normal.
In an optional embodiment, the third determining module is specifically configured to: if the second detection result is that CRC errors exceed the threshold, determine, based on the predetermined correspondence between detection results of the second-level detection and anomaly causes, that the cause of the network anomaly is an optical module or network cable fault; if the second detection result is that the brightness of the switch-side indicator light is abnormal, determine, based on the same correspondence, that the cause of the network anomaly is an optical module or network cable fault; and if the second detection result is that CRC errors do not exceed the threshold and the brightness of the switch-side indicator light is normal, determine, based on the same correspondence, that the cause of the network anomaly is data volume overload.
In an optional embodiment, the second determining module is specifically configured to: if the first detection result is that PING detection fails, determine, based on the predetermined correspondence between first-level network anomalies and second-level detection modes, that the target second-level detection mode corresponding to the first detection result is detecting whether a route to the peer exists in the local routing table.
In an optional embodiment, the third determining module is specifically configured to: if the second detection result is that no route to the peer exists, determine, based on the predetermined correspondence between detection results of the second-level detection and anomaly causes, that the cause of the network anomaly is an incorrectly configured local routing table; and if the second detection result is that a route to the peer exists, determine, based on the same correspondence, that the cause of the network anomaly is an incorrectly configured switch routing table or a switch outage.
In a third aspect, an embodiment provides an electronic device, including a memory and a processor, where the memory stores a computer program operable on the processor, and the processor implements the steps of the method described in any one of the foregoing embodiments when executing the computer program.
In a fourth aspect, embodiments provide a computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to perform the method of any of the preceding embodiments.
The embodiments of the application bring the following beneficial effects:
According to the method and device for locating network anomalies in a distributed storage system provided by the embodiments, the network of the storage server to be detected is examined in two levels. Anomaly location therefore follows a systematic procedure, and the underlying fault is located more quickly, which improves the efficiency of fault location and, in turn, of fault resolution.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a network anomaly positioning method of a distributed storage system according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a network anomaly positioning apparatus of a distributed storage system according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "connected" and "connected" are to be interpreted broadly, e.g., as meaning either a fixed connection, a detachable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Fig. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application. As shown in Fig. 1, the distributed storage system may include at least one management node 100 and at least one storage node 200, where a storage node may also be referred to as a storage server. Each storage node 200 may be connected to the management node 100 either directly or through an intermediate node. The intermediate node may be another management node or storage node, or may be a relay device (e.g., a switch). The management node 100 may monitor the network status of the distributed storage system so that a failure can be discovered and resolved in time. It should be noted that the structure shown in Fig. 1 is only one example provided for clarity of description, and the distributed storage system may have other structures in practical applications.
In some embodiments, users may purchase the business processing services provided by these proprietary cloud servers. For example, the user side determines a service to be processed, and the dedicated cloud server may obtain the service to be processed from the user side and process the service.
Fig. 2 is a schematic flowchart of a network anomaly positioning method of a distributed storage system according to an embodiment of the present application. As shown in fig. 2, the method includes:
S210, periodically performing first-level detection on the network of the storage server to be detected in the distributed storage system to obtain a first detection result. The first-level detection comprises SSH detection, PING detection and packet loss rate detection, and the first detection result may indicate whether SSH detection succeeds, whether PING detection succeeds, and whether packet loss exceeds a threshold.
The method may be applied to a management node in the distributed storage system, and the network of the storage server to be detected may be the link between the management node and that storage server.
The first-level detection is performed periodically, and the period may be chosen according to actual needs; for example, it may be on the order of days, hours, or minutes.
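As a minimal sketch of how a management node might assemble the first detection result, the Python fragment below reads the packet loss percentage from the summary line printed by the Linux iputils `ping` command and combines it with the outcome of an SSH probe. The names `FirstResult`, `parse_packet_loss` and `build_first_result`, the 1% threshold, and the choice to treat 100% loss as a failed PING are illustrative assumptions and are not taken from the application.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold; the application does not specify a value.
LOSS_THRESHOLD_PERCENT = 1.0

# Matches the summary printed by iputils ping, e.g. "0% packet loss".
_LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


@dataclass(frozen=True)
class FirstResult:
    ping_ok: bool
    ssh_ok: bool
    loss_exceeds_threshold: bool


def parse_packet_loss(ping_output: str) -> float:
    """Return the packet loss percentage found in ping's summary line."""
    match = _LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet loss summary found")
    return float(match.group(1))


def build_first_result(ping_output: str, ssh_ok: bool,
                       threshold: float = LOSS_THRESHOLD_PERCENT) -> FirstResult:
    """Combine the three first-level probes into one record."""
    loss = parse_packet_loss(ping_output)
    # Assumption: losing every packet means the PING probe failed outright.
    ping_ok = loss < 100.0
    return FirstResult(ping_ok, ssh_ok, ping_ok and loss > threshold)
```

In this sketch the SSH outcome is passed in as a boolean, so the caller is free to obtain it however the deployment allows.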
S220, determining whether the network has a first-level network anomaly or not based on the first detection result.
After the first-level detection is performed on the network of the storage server to be detected, the first detection result may be any one of the following:
PING detection succeeds, SSH detection succeeds, and packet loss does not exceed the threshold;
PING detection fails;
PING detection succeeds, SSH detection fails, and packet loss does not exceed the threshold;
PING detection succeeds, SSH detection succeeds, and packet loss exceeds the threshold;
PING detection succeeds, SSH detection fails, and packet loss exceeds the threshold.
The network of the storage server to be detected is considered faulty only when SSH detection fails, PING detection fails, or packet loss exceeds the threshold.
If the determination in step S220 is that no first-level network anomaly exists, the current cycle ends, and steps S210 and S220 are executed again in the next cycle. If an anomaly exists, the following step S230 is executed.
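The decision in S220 and the return to S210 in the next cycle can be sketched as follows. This is an illustration under stated assumptions: the tuple layout of the first detection result, the function names `has_first_level_anomaly` and `monitor`, and the injectable `sleep` parameter are hypothetical and do not come from the application.

```python
import time
from typing import Callable, Tuple

# Assumed layout: (ping_ok, ssh_ok, loss_exceeds_threshold)
FirstResult = Tuple[bool, bool, bool]


def has_first_level_anomaly(result: FirstResult) -> bool:
    """S220: faulty if PING fails, SSH fails, or packet loss is too high."""
    ping_ok, ssh_ok, loss_exceeds_threshold = result
    return (not ping_ok) or (not ssh_ok) or loss_exceeds_threshold


def monitor(probe: Callable[[], FirstResult],
            on_anomaly: Callable[[FirstResult], None],
            cycles: int,
            period_seconds: float,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `cycles` detection cycles and return how many found an anomaly."""
    found = 0
    for i in range(cycles):
        result = probe()                      # S210
        if has_first_level_anomaly(result):   # S220
            found += 1
            on_anomaly(result)                # hand over to S230 and later steps
        if i < cycles - 1:
            sleep(period_seconds)             # wait for the next cycle
    return found
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a production loop would more likely run until stopped rather than for a fixed number of cycles.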
S230, if a first-level network anomaly exists, determining a target second-level detection mode corresponding to the first detection result based on the predetermined correspondence between first-level network anomalies and second-level detection modes.
The predetermined correspondence between first-level network anomalies and second-level detection modes may include any one or more of the following:
the first-level network anomaly is that PING detection succeeds and SSH detection fails, and the corresponding second-level detection mode is detecting whether port 22 is open;
the first-level network anomaly is that PING detection succeeds and packet loss exceeds the threshold, and the corresponding second-level detection mode is determining whether CRC errors exceed a threshold and/or whether the brightness of the switch-side indicator light is normal;
the first-level network anomaly is that PING detection fails, and the corresponding second-level detection mode is detecting whether a route to the peer exists in the local routing table.
Based on this, step S230 may be implemented by the following steps:
Step 1.1), if the first detection result is that PING detection succeeds and SSH detection fails, determining, based on the predetermined correspondence between first-level network anomalies and second-level detection modes, that the target second-level detection mode is detecting whether port 22 is open.
Step 1.2), if the first detection result is that PING detection succeeds and packet loss exceeds the threshold, determining, based on the predetermined correspondence, that the target second-level detection mode is determining whether CRC errors exceed a threshold and/or whether the brightness of the switch-side indicator light is normal.
Step 1.3), if the first detection result is that PING detection fails, determining, based on the predetermined correspondence, that the target second-level detection mode is detecting whether a route to the peer exists in the local routing table.
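Steps 1.1 to 1.3 amount to a small lookup from the first detection result to one or more second-level detection modes. The sketch below is one possible encoding; the mode identifiers and the function name `select_second_level_modes` are hypothetical, and returning both modes when SSH fails and packet loss is high at the same time is an assumption about how the combined case might be handled.

```python
from typing import List

# Hypothetical identifiers for the three second-level detection modes.
CHECK_PORT_22 = "check_port_22_open"
CHECK_CRC_AND_INDICATOR = "check_crc_errors_and_switch_indicator"
CHECK_LOCAL_ROUTE = "check_local_route_to_peer"


def select_second_level_modes(ping_ok: bool, ssh_ok: bool,
                              loss_exceeds_threshold: bool) -> List[str]:
    """Apply steps 1.1 to 1.3; an empty list means no first-level anomaly."""
    if not ping_ok:
        return [CHECK_LOCAL_ROUTE]             # step 1.3
    modes = []
    if not ssh_ok:
        modes.append(CHECK_PORT_22)            # step 1.1
    if loss_exceeds_threshold:
        modes.append(CHECK_CRC_AND_INDICATOR)  # step 1.2
    return modes
```

For example, `select_second_level_modes(True, False, False)` selects only the port 22 check.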
S240, detecting the network according to the target second-level detection mode to obtain a second detection result.
S250, determining the cause of the network anomaly corresponding to the second detection result based on the predetermined correspondence between detection results of the second-level detection and anomaly causes.
Because the network of the storage server to be detected is examined in two levels, anomaly location follows a systematic procedure and the underlying fault is located more quickly, which improves the efficiency of fault location and, in turn, of fault resolution.
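The control flow of one detection cycle, S210 through S250, can be sketched with the probes and the two correspondence tables passed in as parameters. Everything named here (`locate_anomaly`, the dictionary-based tables, the string labels used in the usage example) is a hypothetical illustration rather than the implementation described in the application.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple


def locate_anomaly(
    first_probe: Callable[[], Hashable],
    mode_for: Dict[Hashable, str],
    second_probes: Dict[str, Callable[[], Hashable]],
    cause_for: Dict[Tuple[str, Hashable], str],
) -> Optional[str]:
    """Run one two-level detection cycle and return the cause, if any."""
    first_result = first_probe()             # S210
    mode = mode_for.get(first_result)        # S220 and S230
    if mode is None:
        return None                          # no first-level anomaly this cycle
    second_result = second_probes[mode]()    # S240
    return cause_for[(mode, second_result)]  # S250


# Usage with stub probes standing in for real network checks.
mode_table = {"ping_ok_ssh_failed": "port_22"}
cause_table = {
    ("port_22", "open"): "SSH service failure",
    ("port_22", "closed"): "blocked by peer firewall",
}
cause = locate_anomaly(
    lambda: "ping_ok_ssh_failed",
    mode_table,
    {"port_22": lambda: "closed"},
    cause_table,
)
```

Keeping both correspondences as plain tables mirrors the "predetermined correspondence" wording: new anomaly types can be added as data without changing the control flow.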
In some embodiments, the predetermined correspondence between the detection result of the second-stage detection and the cause of the abnormality may include one or more of the following:
the detection result of the second-level detection being that port 22 is not open corresponds to the abnormality cause being blocking by the opposite-end firewall;
the detection result of the second-level detection being that port 22 is open corresponds to the abnormality cause being an SSH service failure;
the detection result of the second-level detection being that the CRC errors exceed the threshold corresponds to the abnormality cause being an optical module or network cable fault;
the detection result of the second-level detection being that the brightness of the indicator lamp on the switch side is abnormal corresponds to the abnormality cause being an optical module or network cable fault;
the detection result of the second-level detection being that the CRC errors do not exceed the threshold and the brightness of the indicator lamp on the switch side is normal corresponds to the abnormality cause being data volume overload;
the detection result of the second-level detection being that no route to the opposite end exists corresponds to the abnormality cause being an incorrectly configured local routing table;
the detection result of the second-level detection being that a route to the opposite end exists corresponds to the abnormality cause being an incorrectly configured switch routing table or a downtime fault.
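One way to hold such a predetermined correspondence is a plain lookup table. The sketch below is illustrative only; the key strings, the table name `SECOND_LEVEL_CAUSES` and the helper `cause_for` are assumptions, not names used by the embodiments.

```python
from typing import Dict, Tuple

# Keys are (second-level detection mode, detection result); values are abnormality causes.
# All identifiers here are hypothetical labels chosen for illustration.
SECOND_LEVEL_CAUSES: Dict[Tuple[str, str], str] = {
    ("port_22", "not_open"): "blocked by opposite-end firewall",
    ("port_22", "open"): "SSH service failure",
    ("crc_indicator", "crc_exceeds_threshold"): "optical module or network cable fault",
    ("crc_indicator", "indicator_abnormal"): "optical module or network cable fault",
    ("crc_indicator", "crc_and_indicator_normal"): "data volume overload",
    ("local_route", "no_route"): "local routing table configured incorrectly",
    ("local_route", "route_exists"): "switch routing table configured incorrectly or downtime fault",
}


def cause_for(mode: str, result: str) -> str:
    """Look up the abnormality cause; unknown combinations are reported as such."""
    return SECOND_LEVEL_CAUSES.get((mode, result), "unknown")
```

Keeping the correspondence as data rather than branching logic means an entry can be added or changed without touching the detection code.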
Based on this, S250 described above may include various implementations.
As an example, if the target second-level detection manner determined based on the steps S210 to S240 is to detect whether the port 22 is open, the step S250 may further include the following steps:
step 2.1), if the second detection result is that port 22 is not open, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is blocking by the opposite-end firewall;
step 2.2), if the second detection result is that port 22 is open, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an SSH service failure.
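Steps 2.1 and 2.2 could be sketched with a plain TCP connection attempt, as below. The function names and the 3-second timeout are assumptions. Note also that a connection attempt cannot by itself tell a port that is filtered by a firewall from one that simply has no listener; the sketch follows the correspondence in the description and treats both as "not open".

```python
import socket


def is_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


def diagnose_ssh_failure(host: str, port: int = 22, timeout: float = 3.0) -> str:
    """Apply steps 2.1 / 2.2: port open -> SSH service failure, otherwise firewall."""
    if is_port_open(host, port, timeout):
        return "SSH service failure"
    return "blocked by opposite-end firewall"
```

This check is only meaningful after the first-level detection has already shown that PING succeeds while SSH login fails.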
As another example, if the target second-level detection manner determined based on the above steps S210 to S240 is to determine whether the CRC errors exceed a threshold and/or whether the brightness of the indicator lamp on the switch side is normal, the above step S250 may further include the following steps:
step 3.1), if the second detection result is that the CRC errors exceed the threshold, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an optical module or network cable fault;
step 3.2), if the second detection result is that the brightness of the indicator lamp on the switch side is abnormal, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an optical module or network cable fault;
step 3.3), if the second detection result is that the CRC errors do not exceed the threshold and the brightness of the indicator lamp on the switch side is normal, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is data volume overload.
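Steps 3.1 to 3.3 reduce to a single "either condition" test. The sketch below assumes the CRC error count and the indicator lamp state have already been collected by some other means; the function name and parameters are hypothetical.

```python
def diagnose_packet_loss(crc_errors: int, crc_threshold: int,
                         indicator_normal: bool) -> str:
    """Classify high packet loss as a physical-link fault or as overload.

    Either a CRC error count above the threshold (step 3.1) or an abnormal
    switch-side indicator lamp (step 3.2) points to the optical module or
    network cable; only when both are normal (step 3.3) is overload reported.
    """
    if crc_errors > crc_threshold or not indicator_normal:
        return "optical module or network cable fault"
    return "data volume overload"
```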
As another example, if the target second-level detection manner determined based on the above steps S210 to S240 is to detect whether a route to the opposite end exists in the local routing table, the above step S250 may further include the following steps:
step 4.1), if the second detection result is that no route to the opposite end exists, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an incorrectly configured local routing table;
step 4.2), if the second detection result is that a route to the opposite end exists, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an incorrectly configured switch routing table or a downtime fault.
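The route check can be illustrated with Python's standard `ipaddress` module. This sketch assumes the local routing table has already been read into a list of destination prefixes (how it is read is platform-specific and not shown), and it treats any covering prefix, including a default route, as "a route to the opposite end exists"; both points are assumptions, as are the function names.

```python
import ipaddress
from typing import Iterable, Optional


def find_route(peer_ip: str, route_prefixes: Iterable[str]) -> Optional[str]:
    """Return the longest prefix that covers peer_ip, or None if there is none."""
    peer = ipaddress.ip_address(peer_ip)
    best = None
    for prefix in route_prefixes:
        net = ipaddress.ip_network(prefix, strict=False)
        if net.version == peer.version and peer in net:
            if best is None or net.prefixlen > best.prefixlen:
                best = net
    return str(best) if best is not None else None


def diagnose_ping_failure(peer_ip: str, route_prefixes: Iterable[str]) -> str:
    """Apply steps 4.1 / 4.2 to a PING failure."""
    if find_route(peer_ip, route_prefixes) is None:
        return "local routing table configured incorrectly"
    return "switch routing table configured incorrectly or downtime fault"
```

For example, with prefixes `10.0.0.0/8` and `10.1.2.0/24`, the address `10.1.2.7` matches the more specific `/24` route, while `172.16.0.1` matches nothing and is attributed to the local routing table.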
Based on the abnormality cause determined by the above steps, a conventional resolution strategy corresponding to that cause can be provided to the user; this information can be sent to the client by email or shown on a display interface that interacts with the client.
It should be noted that, for network-related failures, the three basic elements are PING detection, SSH detection (port 22 detection), and the magnitude of the data packet loss.
Suppose the conclusion of the first-level detection is that PING succeeds but SSH login fails. There are mainly two network anomalies that cause this: one is a problem with the SSH service of the storage server to be detected, and the other is a firewall filtering rule that restricts the SSH protocol on port 22. The two types of fault can then be distinguished by detecting whether port 22 is open (this belongs to the second-level detection). If port 22 is not open, the network anomaly can be determined to be blocking by the opposite-end firewall; if port 22 is open, the network anomaly is determined to be an SSH service failure.
If PING fails, there are several possibilities. The first is that a firewall blocks the Internet Control Message Protocol (ICMP). The second is that the routing table of the local node or of an intermediate-node switch is faulty. The third is that the storage server to be detected may simply be down. These faults can be narrowed down by checking whether a route to the opposite end exists in the local routing table (this belongs to the second-level detection). If no such route exists, the local routing table is configured incorrectly. If a local route to the opposite-end storage server to be detected exists, then the opposite-end storage server may be down, the intermediate-node switch may lack a route to the opposite end, or a firewall policy may be filtering ICMP; the final fault can be determined through further analysis or by troubleshooting these possibilities one by one.
For the TCP packet loss rate, a high value indicates that many packets are being dropped. One possible reason is that too many packets are being transmitted and some are temporarily dropped by the distributed storage system. Another is that a problem with the optical module or the network cable causes CRC errors in the packets. The two faults can be distinguished by further analyzing whether the number of CRC errors is large and whether the brightness of the indicator lamp on the storage server to be detected or on the switch side is normal (this belongs to the second-level detection).
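As a small worked example of the packet loss element, the loss rate can be computed from the number of packets sent and received and compared with a threshold. The function names and the default threshold of 0.05 are assumptions; the embodiments do not fix a particular threshold value.

```python
def packet_loss_rate(sent: int, received: int) -> float:
    """Fraction of packets lost, in the range [0, 1]."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return (sent - received) / sent


def loss_exceeds_threshold(sent: int, received: int,
                           threshold: float = 0.05) -> bool:
    """True when the measured loss rate is strictly above the threshold."""
    return packet_loss_rate(sent, received) > threshold
```

With 100 packets sent and 90 received, the loss rate is 0.1, which exceeds the assumed threshold and would trigger the CRC and indicator lamp check.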
By judging these elements and checking level by level in this way, the related network anomaly is eventually located and resolved.
Fig. 3 is a schematic structural diagram of a network anomaly positioning device of a distributed storage system according to an embodiment of the present application. As shown in fig. 3, the apparatus may include:
the first detection module 301 is configured to periodically perform first-level detection on a network of a storage server to be detected in the distributed storage system to obtain a first detection result; the first-stage detection comprises SSH detection, PING detection and packet loss rate detection;
a first determining module 302, configured to determine whether a first-level network anomaly exists in the network based on the first detection result;
a second determining module 303, configured to determine, if the first-level network anomaly exists, a target second-level detection mode corresponding to the first detection result based on a predetermined correspondence between the first-level network anomaly and the second-level detection mode;
the second detection module 304 is configured to detect the network according to a target second-stage detection manner, so as to obtain a second detection result;
a third determining module 305, configured to determine, based on a predetermined correspondence between the detection result of the second-level detection and the abnormality cause, the abnormality cause of the network corresponding to the second detection result.
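The division into modules 301 to 305 can be pictured as a small pipeline whose parts are injected as callables and tables. The sketch below is one possible arrangement under that assumption; the class `AnomalyLocator`, its field names and the string labels are hypothetical, and periodic scheduling is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class AnomalyLocator:
    # Modules 301 and 302: run first-level detection and return an anomaly
    # label, or None when no first-level network anomaly exists.
    first_level_probe: Callable[[], Optional[str]]
    # Module 303: correspondence from anomaly label to second-level mode.
    mode_for_anomaly: Dict[str, str]
    # Module 304: one probe per second-level mode, returning a result label.
    second_level_probes: Dict[str, Callable[[], str]]
    # Module 305: correspondence from (mode, result) to abnormality cause.
    cause_for_result: Dict[Tuple[str, str], str]

    def run_once(self) -> Optional[str]:
        """Run one detection round; return the cause, or None if healthy."""
        anomaly = self.first_level_probe()
        if anomaly is None:
            return None
        mode = self.mode_for_anomaly[anomaly]
        result = self.second_level_probes[mode]()
        return self.cause_for_result.get((mode, result), "unknown cause")
```

A caller would invoke `run_once()` on a timer to obtain the periodic behaviour described for module 301; stub probes can be substituted for the real ones when testing the wiring.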
In some embodiments, the second determining module 303 is specifically configured to: if the first detection result is that the PING detection succeeds and the SSH detection fails, determine, based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode, that the target second-level detection mode corresponding to the first detection result is to detect whether port 22 is open.
In some embodiments, the third determining module 305 is specifically configured to:
if the second detection result is that port 22 is not open, determine, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is blocking by the opposite-end firewall;
and if the second detection result is that port 22 is open, determine, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an SSH service failure.
In some embodiments, the second determining module 303 is specifically configured to: if the first detection result is that the PING detection succeeds and the packet loss exceeds a threshold, determine, based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode, that the target second-level detection mode corresponding to the first detection result is to determine whether the CRC errors exceed a threshold and/or whether the brightness of the indicator lamp on the switch side is normal.
In some embodiments, the third determining module 305 is specifically configured to:
if the second detection result is that the CRC errors exceed the threshold, determine, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an optical module or network cable fault;
if the second detection result is that the brightness of the indicator lamp on the switch side is abnormal, determine, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an optical module or network cable fault;
and if the second detection result is that the CRC errors do not exceed the threshold and the brightness of the indicator lamp on the switch side is normal, determine, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is data volume overload.
In some embodiments, the second determining module 303 is specifically configured to: if the first detection result is that the PING detection fails, determine, based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode, that the target second-level detection mode corresponding to the first detection result is to detect whether a route to the opposite end exists in the local routing table.
In some embodiments, the third determining module 305 is specifically configured to:
if the second detection result is that no route to the opposite end exists, determine, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an incorrectly configured local routing table;
and if the second detection result is that a route to the opposite end exists, determine, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an incorrectly configured switch routing table or a downtime fault.
The network anomaly positioning device of the distributed storage system provided by the embodiment of the application has the same technical characteristics as the network anomaly positioning method of the distributed storage system provided by the embodiment, so that the same technical problems can be solved, and the same technical effect can be achieved.
As shown in fig. 4, an electronic device 400 provided in an embodiment of the present application includes: a processor 401, a memory 402 and a bus, wherein the memory 402 stores machine-readable instructions executable by the processor 401, when the electronic device is operated, the processor 401 and the memory 402 communicate with each other through the bus, and the processor 401 executes the machine-readable instructions to execute the steps of the network anomaly locating method of the distributed storage system.
In practical applications, the memory 402 and the processor 401 can be general memories and processors, which are not limited in particular, and when the processor 401 runs the computer program stored in the memory 402, the network anomaly locating method of the distributed storage system can be executed.
Specifically, the electronic device may further include a communication interface, and the processor, the communication interface, and the memory are connected by a bus; the processor is used to execute executable modules, such as computer programs, stored in the memory.
The Memory may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network and the like can be used.
The bus may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory is used to store a program, and the processor executes the program after receiving an execution instruction; the method disclosed in any of the foregoing embodiments of the present application may be applied to, or implemented by, the processor.
The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
Corresponding to the network anomaly positioning method of the distributed storage system, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the network anomaly positioning method of the distributed storage system.
The network anomaly locating device of the distributed storage system provided by the embodiment of the application can be specific hardware on a device, or software or firmware installed on a device, and the like. The device provided by the embodiment of the present application has the same implementation principle and technical effect as the foregoing method embodiments; for brevity, for anything not mentioned in the device embodiments, reference may be made to the corresponding contents in the foregoing method embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the network anomaly positioning method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the scope of protection of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the scope of the embodiments of the present application and are intended to be covered by the scope of protection of the present application.

Claims (10)

1. A network anomaly positioning method of a distributed storage system is characterized by comprising the following steps:
periodically carrying out first-level detection on a network of a storage server to be detected in the distributed storage system to obtain a first detection result; the first-stage detection comprises SSH detection, PING detection and packet loss rate detection;
determining whether a first-level network anomaly exists in the network based on the first detection result;
if the first-level network anomaly exists, determining a target second-level detection mode corresponding to the first detection result based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode;
detecting the network according to the target second-stage detection mode to obtain a second detection result;
and determining the abnormal reason of the network corresponding to the second detection result based on the corresponding relation between the predetermined detection result of the second-stage detection and the abnormal reason.
2. The method according to claim 1, wherein the step of determining, if the first-level network anomaly exists, a target second-level detection mode corresponding to the first detection result based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode comprises:
if the first detection result is that the PING detection succeeds and the SSH detection fails, determining, based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode, that the target second-level detection mode corresponding to the first detection result is to detect whether port 22 is open.
3. The method according to claim 2, wherein the step of determining the abnormality cause of the network corresponding to the second detection result based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause comprises:
if the second detection result is that port 22 is not open, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is blocking by the opposite-end firewall;
and if the second detection result is that port 22 is open, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an SSH service failure.
4. The method according to claim 1, wherein the step of determining, if the first-level network anomaly exists, a target second-level detection mode corresponding to the first detection result based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode comprises:
if the first detection result is that the PING detection succeeds and the packet loss exceeds a threshold, determining, based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode, that the target second-level detection mode corresponding to the first detection result is to determine whether the CRC errors exceed a threshold and/or whether the brightness of the indicator lamp on the switch side is normal.
5. The method according to claim 4, wherein the step of determining the abnormality cause of the network corresponding to the second detection result based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause comprises:
if the second detection result is that the CRC errors exceed the threshold, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an optical module or network cable fault;
if the second detection result is that the brightness of the indicator lamp on the switch side is abnormal, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an optical module or network cable fault;
and if the second detection result is that the CRC errors do not exceed the threshold and the brightness of the indicator lamp on the switch side is normal, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is data volume overload.
6. The method according to claim 1, wherein the step of determining, if the first-level network anomaly exists, a target second-level detection mode corresponding to the first detection result based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode comprises:
if the first detection result is that the PING detection fails, determining, based on the predetermined correspondence between the first-level network anomaly and the second-level detection mode, that the target second-level detection mode corresponding to the first detection result is to detect whether a route to the opposite end exists in the local routing table.
7. The method according to claim 6, wherein the step of determining the abnormality cause of the network corresponding to the second detection result based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause comprises:
if the second detection result is that no route to the opposite end exists, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an incorrectly configured local routing table;
and if the second detection result is that a route to the opposite end exists, determining, based on the predetermined correspondence between the detection result of the second-level detection and the abnormality cause, that the abnormality cause of the network is an incorrectly configured switch routing table or a downtime fault.
8. A network anomaly locating device for a distributed storage system, comprising:
the first detection module is used for periodically carrying out first-level detection on the network of the storage server to be detected in the distributed storage system to obtain a first detection result; the first-stage detection comprises SSH detection, PING detection and packet loss rate detection;
a first determining module, configured to determine whether a first-level network anomaly exists in the network based on the first detection result;
a second determining module, configured to determine, if the first-level network anomaly exists, a target second-level detection mode corresponding to the first detection result based on a predetermined correspondence between the first-level network anomaly and the second-level detection mode;
the second detection module is used for detecting the network according to the target second-level detection mode to obtain a second detection result;
and the third determining module is used for determining the abnormal reason of the network corresponding to the second detection result based on the corresponding relation between the predetermined detection result of the second-level detection and the abnormal reason.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor implements the steps of the method of any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to perform the method of any of claims 1 to 7.
CN201911211569.1A 2019-11-29 2019-11-29 Network anomaly positioning method and device of distributed storage system Active CN110943864B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911211569.1A CN110943864B (en) 2019-11-29 2019-11-29 Network anomaly positioning method and device of distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911211569.1A CN110943864B (en) 2019-11-29 2019-11-29 Network anomaly positioning method and device of distributed storage system

Publications (2)

Publication Number Publication Date
CN110943864A true CN110943864A (en) 2020-03-31
CN110943864B CN110943864B (en) 2023-04-07

Family

ID=69908443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911211569.1A Active CN110943864B (en) 2019-11-29 2019-11-29 Network anomaly positioning method and device of distributed storage system

Country Status (1)

Country Link
CN (1) CN110943864B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113114413A (en) * 2021-03-03 2021-07-13 杭州迪普科技股份有限公司 Indicator lamp control method and device
CN114172825A (en) * 2022-01-17 2022-03-11 福建超智集团有限公司 Network equipment abnormality detection method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018145560A1 (en) * 2017-02-13 2018-08-16 中兴通讯股份有限公司 Method and device for link failure diagnosis
CN108696400A (en) * 2017-04-12 2018-10-23 北京京东尚科信息技术有限公司 network monitoring method and device
CN109039825A (en) * 2018-08-30 2018-12-18 湖北微源卓越科技有限公司 A kind of network data protection device and method
US20190123955A1 (en) * 2016-12-29 2019-04-25 Pismo Labs Technology Limited Methods and systems for restarting one or more components of a network device based on conditions
CN109714209A (en) * 2018-12-29 2019-05-03 中国科学院计算技术研究所 A kind of diagnostic method and system of website visiting failure
CN110311812A (en) * 2019-06-24 2019-10-08 深圳市腾讯计算机系统有限公司 A kind of network analysis method, device and storage medium

Also Published As

Publication number Publication date
CN110943864B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN109495322B (en) Network fault positioning method, related equipment and computer storage medium
CN111008380A (en) Method and device for detecting industrial control system bugs and electronic equipment
CN114172794B (en) Network fault positioning method and server
CN110943864B (en) Network anomaly positioning method and device of distributed storage system
CN113938395B (en) Data analysis method, system, equipment and storage medium
CN114363151A (en) Fault detection method and device, electronic equipment and storage medium
CN114598506B (en) Industrial control network security risk tracing method and device, electronic equipment and storage medium
CN109818808B (en) Fault diagnosis method and device and electronic equipment
CN110071843B (en) Fault positioning method and device based on flow path analysis
CN108650134B (en) Network fault positioning method and device and electronic equipment
CN111526109A (en) Method and device for automatically detecting running state of web threat recognition defense system
CN110896368A (en) Network quality monitoring method and device
CN111654405A (en) Method, device, equipment and storage medium for fault node of communication link
CN108900488B (en) Decentralization abnormal terminal discovery method and device in scene of Internet of things
JP2017199250A (en) Computer system, analysis method of data, and computer
CN113810332B (en) Encrypted data message judging method and device and computer equipment
CN114172796A (en) Fault positioning method and related device for communication network
CN113434369A (en) Health detection method and system for network equipment alarm
CN113656302A (en) WAF rule automatic testing method, system, storage medium and terminal equipment
US20200382439A1 (en) Communication system and communication method
CN117255005B (en) CDN-based service alarm processing method, device, equipment and medium
US11329868B2 (en) Automated network monitoring and control
CN113472567B (en) Network SLA calculation method and device
CN116506327B (en) Physical node monitoring method, device, computer equipment and storage medium
CN115883434A (en) Routing inspection method and device for router, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant