CN114844760B - Network fault sensing and positioning method, device, terminal and storage medium - Google Patents


Info

Publication number
CN114844760B
Authority
CN
China
Prior art keywords
transmission path
information
flow
data
inferred
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210484276.6A
Other languages
Chinese (zh)
Other versions
CN114844760A (en
Inventor
李清
左旭东
赵丹
蒋长林
江勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Peng Cheng Laboratory
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University, Peng Cheng Laboratory filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202210484276.6A priority Critical patent/CN114844760B/en
Publication of CN114844760A publication Critical patent/CN114844760A/en
Application granted granted Critical
Publication of CN114844760B publication Critical patent/CN114844760B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0631Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0677Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network fault sensing and positioning method, device, terminal and storage medium. The method comprises: extracting features from the data flows passing through a switch to obtain flow-level features, and sending the flow-level features to a classifier according to a preset period; generating inferred information for each data flow with the classifier; and determining the transmission path information of abnormal flows according to the inferred information, and giving an inferred result of the fault location in combination with that transmission path information. Extracting features from the data flows passing through the switch and inferring the fault location from the transmission path information addresses the technical problem that end-side information is difficult to acquire. Using a flow state classifier based on a decision tree model and deploying it on the switch addresses the weak sensing capability of the network side. Storing local inferred information in normal data packets allows inferences to be aggregated and decided within the network, which reduces the communication burden on the network.

Description

Network fault sensing and positioning method, device, terminal and storage medium
Technical Field
The present invention relates to the field of network fault early warning, and in particular, to a network fault sensing and positioning method and apparatus, a terminal, and a storage medium.
Background
Failures are among the biggest enemies of computer network management teams. Today's web applications may respond to access requests from tens of thousands of end hosts within tens of milliseconds, or handle the computation and distribution of tens of gigabytes of data. This means that if a computer network fails and the network management team cannot complete troubleshooting in a short time, users will suffer significant losses and their experience will degrade sharply.
When a fault is found in a computer network, resolving it is often relatively simple once its location is known. Fault sensing and positioning therefore aims to give the location of a fault at the moment the fault is perceived. The accuracy and speed of fault sensing and positioning directly determine the efficiency of the overall troubleshooting process.
Existing fault sensing and positioning mechanisms still have some defects, especially when applied to general networks. Most of them are designed for data center networks and rely on the regularity and symmetry of those networks, so their effectiveness is often greatly reduced on general network topologies with poor regularity. In terms of deployment location, many existing mechanisms require monitoring modules to be deployed on end hosts in order to use information such as retransmitted packets, which can be difficult in real general networks.
Accordingly, there is a need in the art for improvement.
Disclosure of Invention
The technical problem to be solved by the invention is that fault sensing and positioning mechanisms perform poorly and are difficult to deploy in general networks. In view of these defects of the prior art, the invention provides a network fault sensing and positioning method and device, a terminal and a storage medium.
The technical scheme adopted for solving the technical problems is as follows:
a network failure awareness and localization method, the network failure awareness and localization method comprising:
extracting features from the data flows passing through the switch to obtain flow-level features, and sending the flow-level features to a classifier according to a preset period;
generating inferred information of the data stream by the classifier; the inferred information is generated after the classifier judges whether the data flow is normal or not according to the characteristics;
and determining transmission path information of the abnormal flow according to the inferred information, and giving an inferred result of the fault position by combining the transmission path information of the abnormal flow.
As a further improved technical solution, the extracting features of the data flow of the path switch to obtain features of the flow level, and sending the features of the flow level to the classifier according to a preset period includes:
extracting the features by rewriting the packet header parser of the switch, and maintaining the flow-level features of all data flows passing through the switch according to a time window;
the classifier is a flow state classifier, and the flow-level features are stored in registers and handed to the flow state classifier periodically.
As a further improvement, generating, by the classifier, inferred information of the data stream includes:
inputting corresponding classification rules and network path information into the flow state classifier in advance;
the flow state classifier generates inferred information for the unidirectional data flows passing through the switch node according to the classification rules, wherein the classification rules include classifying data flows according to the principle that normal data flows have stability.
As a further improvement, determining abnormal stream transmission path information according to the inferred information, and giving an inferred result of a fault location in combination with the transmission path information of the abnormal stream includes:
determining abnormal streaming path information from the network path information according to inferred information;
and the switch gives inferred information about the fault location based on the relative positional relationship between the transmission path in the abnormal flow transmission path information and the node where the switch is located.
As a further improvement, after the abnormal flow transmission path information is determined according to the inferred information and the inferred result of the fault location is given in combination with the transmission path information of the abnormal flow, local inferred information is generated and transmitted to other switch nodes by means of existing data packets in the network.
As a further improvement, the transmitting the local inferred information to the other switch nodes by means of the existing data packets in the network includes:
storing the local inference information in a normal data packet;
the local inference information is transmitted to other switch nodes by means of the flow of the data packets.
As a further improvement, after the local inferred information is transmitted to other switch nodes by means of existing data packets in the network, each of the other switches parses the inferred information contained in a received data packet and aggregates it with its own local inferred information to obtain fault inference information.
As a further improvement, after obtaining the fault inference information, the switch checks the strength of the fault inference information; if the strength reaches a preset threshold, a subsequent processing flow is triggered; if the strength does not reach the preset threshold, the fault inference information is stored back into the data packet, and the data packet is transmitted along its original transmission path.
As a further improvement technical scheme, the triggering subsequent processing flow comprises broadcasting fault warning information.
As a further improved technical solution, the classifier is a flow state classifier adopting a decision tree model, and training of the decision tree model includes:
offline training, in which the decision tree model is deployed on the switch and trained with labeled data;
and online training, in which the classifier receives the extracted flow-level features, classifies each unidirectional data flow passing through the switch, generates a classification result, and generates local inferred information according to the classification result.
A network failure sensing and localization apparatus, the network failure sensing and localization apparatus comprising:
a feature extraction module, used for extracting flow-level features from the data flows passing through the switch and delivering the flow-level features to the classifier at regular intervals;
a data flow state inference module, used for generating, by the classifier, an inference of whether the state of each data flow is normal;
and a fault location inference module, used for giving an inference of the fault location in combination with the transmission path information of abnormal flows.
A terminal, comprising: the system comprises a memory and a processor, wherein the memory stores a network fault sensing and positioning program, and the network fault sensing and positioning program realizes the following steps when being executed by the processor:
extracting features from the data flows passing through the switch to obtain flow-level features, and sending the flow-level features to a classifier according to a preset period;
generating inferred information of the data stream by the classifier; the inferred information is generated after the classifier judges whether the data flow is normal or not according to the characteristics;
and determining transmission path information of the abnormal flow according to the inferred information, and giving an inferred result of the fault position by combining the transmission path information of the abnormal flow.
A computer storage medium, the storage medium being a computer readable storage medium, the computer storage medium storing a network failure sensing and localization program, the network failure sensing and localization program implementing the following steps when executed by a processor:
extracting features from the data flows passing through the switch to obtain flow-level features, and sending the flow-level features to a classifier according to a preset period;
generating inferred information of the data stream by the classifier; the inferred information is generated after the classifier judges whether the data flow is normal or not according to the characteristics;
and determining transmission path information of the abnormal flow according to the inferred information, and giving an inferred result of the fault position by combining the transmission path information of the abnormal flow.
The technical scheme adopted by the invention has the following effects:
the invention obtains the characteristics of the flow level by extracting the characteristics of the data flow of the path switch, and generates the inference of the fault position by combining the data transmission path information, thereby solving the technical problem that the end side information is difficult to obtain; by adopting the flow state classifier of the decision tree model and disposing the classifier on the switch, the problem of weaker network side perceptibility is solved, and quick and accurate fault perception and positioning are realized through limited information; the local inferred information is stored in the normal data packet, and is transmitted to other switch nodes by means of the flow of the data packet, so that the inferred aggregation and decision in the network are completed, the burden of data communication on the network is reduced, and the additional expense caused by introducing additional data collection and analysis nodes is avoided.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a network failure sensing and localization method of the present invention.
Fig. 2 is a block diagram of a network failure sensing and localization method according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The inventor finds that if the quick and accurate fault sensing and positioning are to be realized on a general network, the following three technical problems need to be overcome:
1. End-side information is difficult to acquire: if monitoring modules can be deployed on the end hosts in the network, fault sensing and positioning can be performed using the state information of data transmission on the end side. However, the end side can only sense whether the data transmission process as a whole is abnormal; it cannot sense on which part of the transmission path the fault occurred. Moreover, in a general network the network administrator may not have permission to deploy monitoring modules on all end hosts, so the information acquired on the end side is often incomplete, and sometimes no information can be acquired at all.
2. Network-side sensing capability is weaker: a monitoring module deployed on an end host can conveniently obtain state information of a data flow, such as retransmitted packets, which are a strong signal for fault sensing. If the monitoring module is deployed on the network side, this information is difficult to obtain. How to use the limited information available to a monitoring module on a switch to achieve fast and accurate fault sensing and positioning is a major difficulty of network-side sensing.
3. Additional overhead to the network: after monitoring modules are deployed on switches, the modules need to integrate the information obtained by one another to improve the accuracy of fault sensing and positioning. The common approach today is to introduce an extra node into the network, with which the monitoring module on each switch communicates periodically to complete centralized data collection and analysis. This places an extra communication burden on the network, increases network congestion, and incurs extra cost for introducing and maintaining the centralized data collection and analysis node.
In order to solve the problems, in the embodiment of the application, the characteristics of the flow level are obtained by extracting the characteristics of the data flow of the path switch, and the inference on the fault position is generated by combining the data transmission path information, so that the technical problem that the end-side information is difficult to acquire is solved; by adopting the flow state classifier of the decision tree model and disposing the classifier on the switch, the problem of weaker network side perceptibility is solved, and quick and accurate fault perception and positioning are realized through limited information; the local inferred information is stored in the normal data packet, and is transmitted to other switch nodes by means of the flow of the data packet, so that the inferred aggregation and decision in the network are completed, the burden of data communication on the network is reduced, and the additional expense caused by introducing additional data collection and analysis nodes is avoided.
Various non-limiting embodiments of the present application are described in detail below with reference to the attached drawing figures.
As shown in fig. 1, a network fault sensing and positioning method includes:
S1, extracting features from the data flows passing through the switch to obtain flow-level features, and sending the flow-level features to a classifier according to a preset period.
Depending on the functionality, the switch may be notionally divided into a control plane and a data plane. The control plane is responsible for tasks such as generation and issuing of a flow table and runs on a CPU of the switch; the data plane is responsible for processing the data packet according to the flow table issued by the control plane and operates on a special hardware forwarding structure.
Specifically, extracting features from the data flows passing through the switch to obtain flow-level features, and sending the flow-level features to the classifier according to a preset period, includes:
S101, extracting the features by rewriting the packet header parser of the switch, and maintaining the flow-level features of all data flows passing through the switch according to a time window;
S102, the classifier is a flow state classifier, and the flow-level features are stored in registers and handed to the flow state classifier periodically.
In actual implementation, the invention rewrites the packet header parser in the switch data plane to implement feature extraction and uses registers to store the features. A timer is maintained on the control plane of the switch; when the timer expires, the data plane takes out all buffered features and sends them to the downstream flow state classifier for processing.
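The windowed feature extraction described above can be sketched in software as follows. This is only an illustrative model, not the switch implementation: the class names `FlowStats` and `FeatureExtractor`, the choice of counters, and the use of a Python dictionary in place of hardware registers are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Per-flow counters for the current time window (stand-in for switch registers)."""
    packets: int = 0
    byte_count: int = 0
    max_packet_size: int = 0


class FeatureExtractor:
    """Hypothetical software model of the data-plane feature extraction step."""

    def __init__(self):
        # One entry per unidirectional flow, keyed by (source IP, destination IP).
        self.registers = defaultdict(FlowStats)

    def on_packet(self, src_ip, dst_ip, size):
        """Update the counters of the flow that this packet belongs to."""
        stats = self.registers[(src_ip, dst_ip)]
        stats.packets += 1
        stats.byte_count += size
        stats.max_packet_size = max(stats.max_packet_size, size)

    def on_timer_expired(self):
        """Hand all buffered features to the classifier and start a new window."""
        snapshot = dict(self.registers)
        self.registers = defaultdict(FlowStats)
        return snapshot


extractor = FeatureExtractor()
extractor.on_packet("10.0.0.1", "10.0.0.2", 1500)
extractor.on_packet("10.0.0.1", "10.0.0.2", 400)
extractor.on_packet("10.0.0.2", "10.0.0.1", 64)
window = extractor.on_timer_expired()
```

Keying the registers by the ordered address pair keeps the two directions of a connection as separate unidirectional flows, which is what the later inference steps operate on.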
S2, generating inferred information of the data stream through the classifier; the inferred information is generated after the classifier judges whether the data flow is normal or not according to the characteristics.
Generating, by the classifier, inferred information for the data stream includes:
s201, inputting corresponding classification rules and network path information into the flow state classifier in advance;
the flow state classifier generates inferred information for unidirectional data flows passing through the switch nodes according to the classification rules, wherein the classification rules include classification of data flows according to the principle that normal data flows have stability.
Specifically, a data flow in a normal state should have certain stable properties, such as self-similarity, and the occurrence of a network failure breaks this stability. The invention therefore collects and examines certain features of the data flow in the data plane and trains a classifier to determine the state of the data flow accordingly.
In practical application, let $v_m$ be a switch node on which a monitoring module is deployed. For two IP addresses $IP_{src}$ and $IP_{dst}$, let $s_1: IP_{src} \to IP_{dst}$ and $s_2: IP_{dst} \to IP_{src}$ be the data flows determined by the two opposite directions. For $s_1$, the transmission path is $p_{link} = (l_1, l_2, \ldots, l_m, \ldots, l_n)$, the upstream path is $p_{up} = (l_1, l_2, \ldots, l_m)$, and the downstream path is $p_{down} = (l_{m+1}, l_{m+2}, \ldots, l_n)$. Regardless of whether the transport layer protocol used by $s_1$ and $s_2$ is TCP or UDP, if the upstream path $p_{up}$ fails, then as seen from $v_m$, $s_1$ becomes abnormal earlier than $s_2$. Thus, shortly after a fault occurs, an abnormality of $s_1$ points to a fault in the upstream path $p_{up}$, and an abnormality of $s_2$ points to a fault in the downstream path $p_{down}$.
S3, determining abnormal stream transmission path information according to the inferred information, and giving out an inferred result of the fault position by combining the transmission path information of the abnormal stream.
Determining abnormal stream transmission path information according to the inferred information, and giving an inferred result of a fault location in combination with the transmission path information of the abnormal stream includes:
S301, determining the abnormal flow transmission path information from the network path information according to the inferred information, and giving inferred information about the fault location.
Specifically, an inference about the fault location is given based on the relative positional relationship between the transmission path of the data flow and the node where the switch is located. The principle is that if a link fails, it affects the performance of every data flow whose path traverses that link. Thus, when a switch node on which the monitoring module is deployed senses that several data flows are abnormal, the intersection of their transmission paths has a high probability of containing the fault.
In actual implementation, consider the set of all unidirectional data flows, each determined by an IP source address and destination address, that pass through the switch node $v_m$. For a unidirectional data flow $s_i$ in this set, let its transmission path be $p_{node} = (v_0, v_1, \ldots, v_m, \ldots, v_n)$, with the corresponding link sequence $p_{link} = (l_1, l_2, \ldots, l_m, \ldots, l_n)$, where $l_i$ is the link determined by nodes $v_{i-1}$ and $v_i$. According to the positional relationship between the transmission path and the switch node, $p_{link}$ can be divided into the upstream path $p_{up} = (l_1, l_2, \ldots, l_m)$ and the downstream path $p_{down} = (l_{m+1}, l_{m+2}, \ldots, l_n)$.
If the flow state classifier considers $s_i$ an abnormal flow, the inference generation module attributes the abnormality to a fault in the upstream transmission path $p_{up} = (l_1, l_2, \ldots, l_m)$ and assigns each corresponding link a weight of 1; whether the downstream transmission path $p_{down} = (l_{m+1}, l_{m+2}, \ldots, l_n)$ has failed is unclear, so each corresponding link is assigned a weight of 0. The following inference is ultimately generated for the flow:
$predict_i = \{(l_1, 1), (l_2, 1), \ldots, (l_m, 1), (l_{m+1}, 0), \ldots, (l_n, 0)\}$
Accordingly, if the flow state classifier considers $s_i$ a normal flow, the inference generation module considers that no fault has occurred in the upstream transmission path $p_{up} = (l_1, l_2, \ldots, l_m)$ and assigns each corresponding link a weight of -1; whether the downstream transmission path $p_{down} = (l_{m+1}, l_{m+2}, \ldots, l_n)$ has failed is unclear, so each corresponding link is assigned a weight of 0. The following inference is ultimately generated for the flow:
$predict_i = \{(l_1, -1), (l_2, -1), \ldots, (l_m, -1), (l_{m+1}, 0), \ldots, (l_n, 0)\}$
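The per-flow inference rule above can be illustrated with a short sketch. The function names `split_path` and `flow_inference`, the representation of an inference as a dictionary from link name to weight, and the link names themselves are assumptions chosen for the example; the weights 1, -1 and 0 follow the text.

```python
def split_path(links, m):
    """Split p_link = (l_1, ..., l_n) at node v_m into (p_up, p_down)."""
    return links[:m], links[m:]


def flow_inference(links, m, is_abnormal):
    """Build predict_i for one unidirectional flow as a {link: weight} dict."""
    p_up, p_down = split_path(links, m)
    # Abnormal flow: suspect every upstream link. Normal flow: clear them.
    upstream_weight = 1 if is_abnormal else -1
    predict = {link: upstream_weight for link in p_up}
    # Nothing can be said about links the flow has not traversed yet.
    predict.update({link: 0 for link in p_down})
    return predict


links = ["l1", "l2", "l3", "l4"]
abnormal = flow_inference(links, m=2, is_abnormal=True)
normal = flow_inference(links, m=2, is_abnormal=False)
print(abnormal)  # {'l1': 1, 'l2': 1, 'l3': 0, 'l4': 0}
print(normal)    # {'l1': -1, 'l2': -1, 'l3': 0, 'l4': 0}
```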
Finally, the module aggregates the inferences generated for all data flows to obtain the local inference of the fault location for the time window, i.e. $predict_{local} = predict_1 \oplus predict_2 \oplus \cdots \oplus predict_k$, where the operator $\oplus$ represents the inference aggregation operation, in which the weights of the same link are simply added. In symbolic form, for a link $l$ that carries weight $w_a$ in one inference and weight $w_b$ in another:
$(l, w_a) \oplus (l, w_b) = (l, w_a + w_b)$
It is easy to verify that the aggregation operation is both commutative and associative, so the order in which inferences are aggregated does not change the final result.
Thus, for a given link $l_i$, the more often it appears in the upstream transmission path of abnormal data flows, the higher its weight; the more often it appears in the upstream transmission path of normal data flows, the lower its weight. The weight of a link in an inference can therefore serve as a measure of its likelihood of failure.
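A minimal sketch of the aggregation operation follows, assuming each inference is held as a dictionary from link name to weight and that a link missing from one inference contributes a weight of 0 (an assumption; the text only states that weights of the same link are added). The function name `aggregate` and the sample inferences are hypothetical.

```python
from functools import reduce


def aggregate(a, b):
    """Add the weights of two inferences link by link."""
    result = dict(a)
    for link, weight in b.items():
        # Assumption: a link absent from one inference counts as weight 0.
        result[link] = result.get(link, 0) + weight
    return result


predictions = [
    {"l1": 1, "l2": 1, "l3": 0},   # abnormal flow over l1, l2
    {"l2": 1, "l4": 1},            # abnormal flow over l2, l4
    {"l1": -1, "l5": 0},           # normal flow over l1
]

local = reduce(aggregate, predictions, {})
# Aggregating in the opposite order gives the same weights.
reordered = reduce(aggregate, reversed(predictions), {})
most_suspicious = max(local, key=local.get)
```

In this example `l2` appears upstream of two abnormal flows and ends with the highest weight, while the evidence for and against `l1` cancels out.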
After step S3 is completed, local inferred information has been generated. To increase the reliability of the final inference, the local inferred information of different switches may be aggregated together. To reduce overhead, the switch does not actively send its inference after generating it; instead, the local inferred information is carried to other switch nodes by existing data packets in the network.
Said transmitting the local inference information to other switch nodes by means of existing data packets in the network comprises:
storing the local inference information in a normal data packet;
the local inference information is transmitted to other switch nodes by means of the flow of data packets.
After the local inferred information is transmitted to other switch nodes by means of existing data packets in the network, each of the other switches parses the inferred information contained in a received data packet and aggregates it with its own local inferred information to obtain fault inference information.
In actual implementation, after receiving a new data packet, the switch data plane parses out the fault inference information contained in its header and aggregates with its own local inference information to obtain a new inference. It should be noted that to prevent the inferences from being excessively aggregated, the switch does not update the local inference information to an aggregated inference.
After obtaining the fault inference information, the switch checks its strength. If the strength reaches a preset threshold, a subsequent processing flow is triggered, which includes broadcasting fault warning information; the administrator can configure the specific subsequent processing flow in advance. If the strength does not reach the preset threshold, the fault inference information is stored in the header of the data packet, and the data packet is transmitted along its original transmission path.
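The in-network aggregation and threshold check can be sketched as below. The text does not define how the strength of an inference is measured, so this example assumes it is the largest link weight; the packet layout (a dictionary with an `inference` field), the function names, and the returned action labels are likewise hypothetical.

```python
def aggregate(a, b):
    """Add the weights of two inferences link by link (missing links count as 0)."""
    result = dict(a)
    for link, weight in b.items():
        result[link] = result.get(link, 0) + weight
    return result


def strength(inference):
    """Assumed strength measure: the largest link weight in the inference."""
    return max(inference.values(), default=0)


def process_packet(packet, local_inference, threshold):
    """Return ("alert", suspect_links) or ("forward", updated_packet)."""
    carried = packet.get("inference", {})
    # The switch's own local inference is left unchanged, so that the same
    # evidence is not aggregated repeatedly.
    combined = aggregate(carried, local_inference)
    if strength(combined) >= threshold:
        suspects = sorted(l for l, w in combined.items() if w >= threshold)
        return "alert", suspects
    # Below the threshold: write the inference back and keep forwarding.
    return "forward", dict(packet, inference=combined)


local_inference = {"l1": 1, "l2": 2}
packet = {"payload": "data", "inference": {"l2": 1}}

alert_action, suspects = process_packet(packet, local_inference, threshold=3)
forward_action, forwarded = process_packet(packet, local_inference, threshold=5)
```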
The classifier is a flow state classifier adopting a decision tree model, and training of the decision tree model comprises the following steps:
offline training, in which a decision tree model is deployed on the switch and trained with labeled data;
online training, in which the classifier receives the extracted flow-level features, classifies each unidirectional data flow passing through the switch, generates a classification result, and generates local inferred information according to the classification result.
Specifically, the classifier is a flow state classifier adopting a decision tree model, and compared with a neural network and other deep layer models, the decision tree model is light enough and has high feasibility of being deployed on a switch; compared with models such as logistic regression, the decision tree model can reach higher precision when having a certain depth; the decision tree model is a collection of classification rules that can be easily translated into flow table rules for the switch data plane.
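To illustrate why a decision tree maps naturally onto match rules, the sketch below flattens a small tree into a list of (conditions, label) rules, one per leaf. The tree structure, feature names and thresholds are invented for the example and are not taken from the patent; a real deployment would translate such rules into the switch's own flow table format.

```python
def tree_to_rules(node, conditions=()):
    """Flatten a decision tree into one (conditions, label) rule per leaf."""
    if "label" in node:
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    left = tree_to_rules(node["left"], conditions + ((feature, "<=", threshold),))
    right = tree_to_rules(node["right"], conditions + ((feature, ">", threshold),))
    return left + right


def classify(rules, features):
    """Return the label of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(
            (features[f] <= t) if op == "<=" else (features[f] > t)
            for f, op, t in conditions
        ):
            return label
    raise ValueError("no rule matched")


# Hypothetical tree: a flow that used to send many packets per window but has
# sent very few recently is treated as abnormal.
tree = {
    "feature": "recent_packets", "threshold": 10,
    "left": {
        "feature": "avg_packets", "threshold": 50,
        "left": {"label": "normal"},
        "right": {"label": "abnormal"},
    },
    "right": {"label": "normal"},
}

rules = tree_to_rules(tree)
stalled = classify(rules, {"recent_packets": 2, "avg_packets": 120})
healthy = classify(rules, {"recent_packets": 80, "avg_packets": 120})
```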
The administrator can configure the offline training. In the offline training stage, a decision tree model is deployed on the switch and trained with labeled data, which consists of unidirectional flows labeled as normal or abnormal;
in the online training stage, the classifier receives the extracted flow-level features, classifies each unidirectional data flow passing through the switch, generates a classification result, and generates local inferred information according to the classification result.
In order to have good versatility and scalability while distinguishing normal and abnormal flows, the following features of unidirectional flows may be taken as inputs:
topological properties of the data flow: the RTT of the data flow and the length of its transmission path;
steady-state performance of the data flow: the average number of data packets transmitted per window, the number of data bytes, the maximum data packet size, the burst sequence length, and the like;
recent performance of the data flow: the number of data packets transmitted in the last several windows, the number of data bytes, the maximum data packet size, the burst sequence length, and the like.
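The three feature groups listed above could be assembled into a classifier input roughly as follows. This is a sketch under assumptions: the function name `build_features`, the field names, the use of simple averages, and the default of three recent windows are all illustrative, and burst sequence length is omitted because the text does not define how it is measured.

```python
def build_features(rtt_ms, path_length, windows, recent=3):
    """Combine topological, steady-state and recent features of one flow.

    windows: list of per-window dicts with 'packets', 'bytes' and 'max_size',
    ordered from oldest to newest.
    """
    if not windows:
        raise ValueError("at least one window is required")
    last = windows[-recent:]
    return {
        # Topological properties.
        "rtt_ms": rtt_ms,
        "path_length": path_length,
        # Steady-state performance over the whole history.
        "avg_packets": sum(w["packets"] for w in windows) / len(windows),
        "avg_bytes": sum(w["bytes"] for w in windows) / len(windows),
        "max_size": max(w["max_size"] for w in windows),
        # Recent performance over the last few windows.
        "recent_packets": sum(w["packets"] for w in last) / len(last),
        "recent_bytes": sum(w["bytes"] for w in last) / len(last),
        "recent_max_size": max(w["max_size"] for w in last),
    }


history = [
    {"packets": 100, "bytes": 1000, "max_size": 1500},
    {"packets": 100, "bytes": 1000, "max_size": 1500},
    {"packets": 100, "bytes": 1000, "max_size": 1500},
    {"packets": 20, "bytes": 200, "max_size": 600},
]
features = build_features(rtt_ms=12.5, path_length=5, windows=history, recent=2)
```

Comparing the recent values with the steady-state values is one way to expose the loss of stability that the classifier is meant to detect.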
A network failure sensing and localization apparatus, the network failure sensing and localization apparatus comprising:
a feature extraction module, used for extracting flow-level features from the data flows passing through the switch and delivering the flow-level features to the classifier at regular intervals;
a data flow state inference module, used for generating, by the classifier, an inference of whether the state of each data flow is normal;
and a fault location inference module, used for giving an inference of the fault location in combination with the transmission path information of abnormal flows.
A terminal, comprising a memory and a processor, wherein the memory stores a network fault sensing and positioning program which, when executed by the processor, implements the following steps:
extracting features from the data flows passing through the switch to obtain flow-level features, and sending the flow-level features to a classifier according to a preset period;
generating inferred information of the data flow by the classifier, the inferred information being generated after the classifier judges, according to the features, whether the data flow is normal;
and determining the transmission path information of the abnormal flow according to the inferred information, and giving an inferred result of the fault location in combination with the transmission path information of the abnormal flow.
A computer storage medium storing a network fault sensing and positioning program which, when executed by a processor, implements the following steps:
extracting features from the data flows passing through the switch to obtain flow-level features, and sending the flow-level features to a classifier according to a preset period;
generating inferred information of the data flow by the classifier, the inferred information being generated after the classifier judges, according to the features, whether the data flow is normal;
and determining the transmission path information of the abnormal flow according to the inferred information, and giving an inferred result of the fault location in combination with the transmission path information of the abnormal flow.
In the embodiments of the application, flow-level features are obtained by extracting features from the data flows passing through the switch, and an inference of the fault location is generated in combination with the data transmission path information, which solves the technical problem that end-side information is difficult to acquire. By adopting a flow state classifier based on a decision tree model and deploying the classifier on the switch, the problem of weak network-side perception capability is solved, and fast and accurate fault sensing and positioning are achieved with limited information. The local inferred information is stored in normal data packets and carried to other switch nodes as those packets are forwarded, so that inference aggregation and decision-making are completed inside the network; this reduces the burden that data communication places on the network and avoids the extra overhead of introducing additional data collection and analysis nodes.
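The link weighting recited in claim 1 (upstream links of an abnormal flow weigh 1, upstream links of a normal flow weigh -1, downstream links weigh 0, and weights of the same link are summed) can be sketched as below. The node names and the function names are hypothetical. The patent does not spell out how received inferences are aggregated or how "strength" is measured, so merging by summation and comparing the largest link weight against the threshold are assumptions made only for this illustration.

```python
from collections import defaultdict


def link_weights(node, flows):
    """Sum per-link weights as seen from `node`.

    flows: iterable of (path, is_abnormal), where path is an ordered list
    of node names that contains `node`.
    """
    weights = defaultdict(int)
    for path, is_abnormal in flows:
        idx = path.index(node)
        links = list(zip(path, path[1:]))
        upstream, downstream = links[:idx], links[idx:]
        for link in upstream:
            weights[link] += 1 if is_abnormal else -1
        for link in downstream:
            weights[link] += 0  # downstream links weigh 0 for both kinds of flow
    return dict(weights)


def merge(local, carried):
    """Assumed aggregation: add weights carried in a packet to local ones."""
    merged = dict(carried)
    for link, weight in local.items():
        merged[link] = merged.get(link, 0) + weight
    return merged


def strongest(weights, threshold):
    """Assumed strength check: most suspect link, or None below threshold."""
    link, weight = max(weights.items(), key=lambda item: item[1])
    return link if weight >= threshold else None


# Flows observed at hypothetical switch S3; the first two were classified abnormal.
FLOWS = [
    (["S1", "S2", "S3", "S4"], True),
    (["S5", "S2", "S3", "S4"], True),
    (["S7", "S3", "S4"], False),
]
LOCAL = link_weights("S3", FLOWS)
```

In this example the link S2-S3 lies upstream of both abnormal flows and ends with weight 2, the link S7-S3 ends with weight -1, and the downstream link S3-S4 stays at 0, so S2-S3 is the most suspect link from the point of view of S3.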
It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims (12)

1. A network fault sensing and positioning method, characterized in that the method comprises:
extracting features from the data flows passing through the switch to obtain flow-level features, and sending the flow-level features to a classifier according to a preset period;
inputting corresponding classification rules and network path information into the classifier in advance;
generating inferred information of the data flow by the classifier, the inferred information being generated after the classifier judges, according to the features, whether the data flow is normal;
determining the transmission path information of the abnormal flow according to the inferred information, and giving an inferred result of the fault location in combination with the transmission path information of the abnormal flow;
wherein the determining the transmission path information of the abnormal flow according to the inferred information, and giving the inferred result of the fault location in combination with the transmission path information of the abnormal flow, comprises:
determining the transmission path information of the abnormal flow from the network path information according to the inferred information;
giving, by the switch, inferred information of the fault location based on the relative positional relationship between the transmission path in the transmission path information of the abnormal flow and the node where the switch is located, specifically:
dividing the transmission path into an upstream transmission path and a downstream transmission path;
for an abnormal data flow, the link weight of the upstream transmission path is 1 and the link weight of the downstream transmission path is 0; for a normal data flow, the link weight of the upstream transmission path is -1 and the link weight of the downstream transmission path is 0;
adding the weights that the same link receives from the transmission paths of different data flows to obtain the final weight of the link;
and measuring the fault likelihood of a link according to the number of times the link occurs in the upstream transmission paths of abnormal data flows, and giving the inferred information of the fault location, specifically:
for an abnormal flow, links in the upstream transmission path are given higher weights, while links in the downstream transmission path are given lower weights; the more times a link occurs in the upstream transmission paths of a plurality of abnormal flows, the higher the weight of the link and the higher the fault likelihood of the link;
for a normal flow, links in the upstream transmission path are given lower weights, and links in the downstream transmission path are given lower weights; the more times a link occurs in the upstream transmission paths of normal flows, the lower its weight and the lower the fault likelihood of the link.
2. The network fault sensing and positioning method according to claim 1, wherein the extracting features from the data flows passing through the switch to obtain flow-level features, and sending the flow-level features to the classifier according to the preset period, comprises:
extracting the features by rewriting the packet header parser of the switch, and maintaining, per time window, the flow-level features of all data flows passing through the switch;
the classifier being a flow state classifier, storing the flow-level features in registers and handing them to the flow state classifier periodically.
3. The network fault sensing and positioning method according to claim 2, wherein the generating inferred information of the data flow by the classifier comprises:
generating, by the flow state classifier, inferred information for the unidirectional data flows passing through the switch node according to the classification rules, wherein the classification rules include classifying data flows on the principle that normal data flows are stable.
4. The network fault sensing and positioning method according to claim 1, wherein, after the transmission path information of the abnormal flow is determined according to the inferred information and the inferred result of the fault location is given in combination with the transmission path information of the abnormal flow, local inferred information is generated, and the local inferred information is transmitted to other switch nodes by means of existing data packets in the network.
5. The network fault sensing and positioning method according to claim 4, wherein the transmitting the local inferred information to other switch nodes by means of existing data packets in the network comprises:
storing the local inferred information in a normal data packet;
transmitting the local inferred information to other switch nodes as the data packet is forwarded.
6. The network fault sensing and positioning method according to claim 5, wherein, after the local inferred information is transmitted to other switch nodes by means of existing data packets in the network, each of the other switches parses the local inferred information contained in a received data packet and aggregates it with its own local inferred information to obtain fault inference information.
7. The network fault sensing and positioning method according to claim 6, wherein, after the fault inference information is obtained, the switch checks the strength of the fault inference information; if the strength reaches a preset threshold, the switch triggers a subsequent processing flow; if the strength does not reach the preset threshold, the fault inference information is stored back into the data packet, and the data packet continues along its original transmission path.
8. The method of claim 7, wherein triggering a subsequent process flow includes broadcasting a fault warning message.
9. The network fault sensing and positioning method according to claim 8, wherein the classifier is a flow state classifier employing a decision tree model, and the training of the decision tree model comprises:
offline training, in which the decision tree model is deployed on the switch and trained with labeled data;
and online training, in which the classifier receives the extracted flow-level features, classifies each unidirectional data flow passing through the switch, generates a classification result, and generates local inference information according to the classification result.
10. A network fault sensing and positioning device, comprising:
a feature extraction module, configured to extract flow-level features from the data flows passing through the switch and deliver the flow-level features to the classifier at regular intervals;
corresponding classification rules and network path information being input into the classifier in advance;
a data flow state inference module, configured to have the classifier generate an inference of whether the state of each data flow is normal;
a fault location inference module, configured to give an inference of the fault location in combination with the transmission path information of the abnormal flow;
wherein the giving an inference of the fault location in combination with the transmission path information of the abnormal flow comprises:
determining the transmission path information of the abnormal flow from the network path information according to the inferred information;
giving, by the switch, inferred information of the fault location based on the relative positional relationship between the transmission path in the transmission path information of the abnormal flow and the node where the switch is located, specifically:
dividing the transmission path into an upstream transmission path and a downstream transmission path;
for an abnormal data flow, the link weight of the upstream transmission path is 1 and the link weight of the downstream transmission path is 0; for a normal data flow, the link weight of the upstream transmission path is -1 and the link weight of the downstream transmission path is 0;
adding the weights that the same link receives from the transmission paths of different data flows to obtain the final weight of the link;
and measuring the fault likelihood of a link according to the number of times the link occurs in the upstream transmission paths of abnormal data flows, and giving the inference of the fault location, specifically:
for an abnormal flow, links in the upstream transmission path are given higher weights, while links in the downstream transmission path are given lower weights; the more times a link occurs in the upstream transmission paths of a plurality of abnormal flows, the higher the weight of the link and the higher the fault likelihood of the link;
for a normal flow, links in the upstream transmission path are given lower weights, and links in the downstream transmission path are given lower weights; the more times a link occurs in the upstream transmission paths of normal flows, the lower its weight and the lower the fault likelihood of the link.
11. A terminal, comprising: a memory and a processor, the memory storing a network failure awareness and localization program that when executed by the processor is configured to implement the network failure awareness and localization method of any one of claims 1-9.
12. A computer storage medium, the storage medium being a computer readable storage medium, wherein the computer storage medium stores a network failure sensing and localization program, which when executed by a processor is adapted to implement the network failure sensing and localization method of any one of claims 1-9.
CN202210484276.6A 2022-05-05 2022-05-05 Network fault sensing and positioning method, device, terminal and storage medium Active CN114844760B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210484276.6A CN114844760B (en) 2022-05-05 2022-05-05 Network fault sensing and positioning method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210484276.6A CN114844760B (en) 2022-05-05 2022-05-05 Network fault sensing and positioning method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN114844760A CN114844760A (en) 2022-08-02
CN114844760B true CN114844760B (en) 2023-07-25

Family

ID=82568742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210484276.6A Active CN114844760B (en) 2022-05-05 2022-05-05 Network fault sensing and positioning method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN114844760B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116437162B (en) * 2023-06-12 2023-08-22 美视信息科技(常州)有限公司 Information transmission method and device, display and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020135339A1 (en) * 2018-12-29 2020-07-02 华为技术有限公司 Network path convergence method and related device
CN112532517A (en) * 2020-11-05 2021-03-19 东北大学 OSPF protocol configuration comprehensive scheme based on domain specific language

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106470168B (en) * 2015-08-22 2019-12-06 华为技术有限公司 data transmission method, switch using the method and network control system
CN108123824B (en) * 2016-11-30 2021-06-01 华为技术有限公司 Network fault detection method and device
CN111526096B (en) * 2020-03-13 2022-03-15 北京交通大学 Intelligent identification network state prediction and congestion control system
CN114172706A (en) * 2021-11-29 2022-03-11 广州大学 Method, system, equipment and medium for detecting network flow abnormity of intelligent sound box
CN114221858B (en) * 2021-12-15 2022-09-30 中山大学 SDN network fault positioning method, device, equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
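Model and algorithm for fault localization in large-scale multicast networks; Wang Yalei; Chen Jiajian; Li Chongrong; Bao Congxiao; Journal of Tsinghua University (Science and Technology) (10); pp. 159-162 *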

Also Published As

Publication number Publication date
CN114844760A (en) 2022-08-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant