CN112181758A - Fault root cause positioning method based on network topology and real-time alarm - Google Patents

Fault root cause positioning method based on network topology and real-time alarm

Info

Publication number
CN112181758A
Authority
CN
China
Prior art keywords
information
alarm
feature
node
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010835820.8A
Other languages
Chinese (zh)
Other versions
CN112181758B (en)
Inventor
徐康
李熠轩
刘海琦
张晓伟
叶宁
王汝传
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202010835820.8A
Publication of CN112181758A
Application granted
Publication of CN112181758B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
    • Y04S10/52 Outage or fault management, e.g. fault detection or location

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a fault root cause positioning method based on network topology and real-time alarms, comprising the following steps: inputting an alarm data set, performing data processing, extracting the features contained in the alarm information of the corresponding nodes as a feature set, and acquiring the time and node information in each piece of alarm information; obtaining the upper and lower nodes of the current node from its node information and the topological relation, screening out the alarm information of the upper and lower nodes within a certain time interval according to the time information, and constructing the alarm features of the upper and lower nodes in combination with the feature set of the current node; dividing the alarm data set into a training set and a test set, screening all of the obtained feature information, inputting it into a classification algorithm, taking the feature set with the best prediction performance as the model classification features, inputting the feature values contained in the training set into the classification algorithm to train a prediction model, predicting the data in the test set with the trained classification model and outputting a prediction result, and obtaining the final predicted root cause according to the number of candidate root causes in the prediction result and the time information.

Description

Fault root cause positioning method based on network topology and real-time alarm
Technical Field
The invention relates to the technical field of intelligent operation and maintenance, in particular to a fault root cause positioning method based on network topology and real-time alarm.
Background
A large e-commerce platform involves mutual calls among hundreds of methods internally and generates tens of thousands of alarm records every day. How to use the network topology information and the alarm data to filter and analyze alarms in a timely and effective way, and finally report the effective alarms and the suspected root cause, is a major challenge for network operation and maintenance. When a node in the network topology fails, the nodes connected to it often become abnormal as well, which produces a large number of alarms and buries the alarm of the true root cause. When a large number of alarms occur, they need to be analyzed and processed so that invalid alarms are filtered out, candidate root cause nodes are located accurately, and fault location time is shortened.
Disclosure of Invention
The invention aims to provide a fault root cause positioning method based on network topology and real-time alarms that locates network faults accurately and quickly, improves the efficiency of front-line network operation and maintenance, and reduces the losses caused by network faults.
To achieve this aim, the invention adopts the following technical scheme:
The invention provides a fault root cause positioning method based on network topology and real-time alarms, comprising the following steps:
inputting an alarm data set, performing data processing on it, extracting the features contained in all of the alarm information of the corresponding nodes as a feature set, acquiring the time and node information in each piece of alarm information, and extracting the features contained in each piece of alarm information in combination with the obtained feature set;
obtaining the upper and lower nodes of the current node from the processed node information and the topological relation, screening out the alarm information of the upper and lower nodes within a certain time interval according to the time information, and constructing the alarm features of the upper and lower nodes in combination with the feature set of the current node, so as to obtain the global feature information of each piece of alarm information;
dividing the alarm data set into a training set and a test set, screening all of the obtained feature information, inputting it into a classification algorithm, taking the feature set with the best prediction performance as the model classification features, inputting the feature values contained in the training set into the classification algorithm to train a prediction model, predicting the data in the test set with the trained classification model and outputting a prediction result, and obtaining the final predicted root cause according to the number of candidate root causes in the prediction result and the time information.
Further, inputting the alarm data set, performing data processing on it, and extracting the features contained in all of the alarm information of the corresponding nodes as a feature set specifically includes:
preprocessing the alarm data set that provides the alarm information, merging all files, removing all irrelevant information with a regular expression, extracting the features and feature values, removing duplicates to obtain all features, using these features as the feature matching set, and customizing, for each feature, a regular expression that extracts its feature value.
Further, acquiring the time and node information in each piece of alarm information specifically includes:
extracting the time and node information of each piece of alarm information with a regular expression and building a dictionary from them to facilitate searching and matching.
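As an illustration of this step, the following is a minimal sketch, not the patented implementation. It assumes a hypothetical line layout of `<time> | <trigger text>`, node names of the form `node_<number>`, and a `YYYY/M/D H:MM` timestamp; the function name `build_index` is likewise made up for the example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical alarm layout: "<time> | <trigger text>"; real files may differ.
ALARM_RE = re.compile(r"^(?P<time>\d{4}/\d{1,2}/\d{1,2} \d{1,2}:\d{2}) \| (?P<text>.*)$")
NODE_RE = re.compile(r"node_\d+")

def build_index(lines):
    """Map node name -> list of (timestamp, alarm text), in input order."""
    index = defaultdict(list)
    for line in lines:
        m = ALARM_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not look like alarms
        node = NODE_RE.search(m.group("text"))
        if node is None:
            continue  # no node name in the trigger text
        ts = datetime.strptime(m.group("time"), "%Y/%m/%d %H:%M")
        index[node.group(0)].append((ts, m.group("text")))
    return dict(index)

sample = [
    "2019/6/4 1:14 | host node_61 FullGC average elapsed time: 2118ms (greater than threshold: 1000ms)",
    "2019/6/4 1:14 | host node_60 port 80 communication exception",
    "not an alarm line",
]
index = build_index(sample)
```

Keying the dictionary by node keeps the later lookup of a neighbouring node's alarms to a single dictionary access.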
Further, extracting the features contained in each piece of alarm information in combination with the obtained feature set specifically includes:
traversing the processed file line by line, matching each piece of alarm information against the features in the feature set, and, where a feature matches, filling the extracted feature value into the corresponding feature of that piece of alarm information.
Further, screening out the alarm information of the upper and lower nodes within a certain time interval according to the time information specifically includes:
for each piece of alarm information, traversing the adjacent alarm information within one minute before and after it, reading the node information of that adjacent alarm information, screening out the alarm information of the upper and lower nodes from it, and performing feature matching on all associated nodes to obtain their features.
Further, constructing the alarm features of the upper and lower nodes in combination with the feature set to obtain the global feature information of each piece of alarm information specifically includes:
processing the feature set into four data sets: T0, containing only the local node features; T1, containing the local node features and the upper node features; T2, containing the local node features and the lower node features; and T3, containing the local node, upper node, and lower node features.
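The four data sets differ only in which neighbour features are appended to the features of the alarm's own node. A minimal sketch for a single alarm follows; the `upper:` and `lower:` column prefixes and the function name `build_datasets` are assumptions made for illustration and do not come from the patent.

```python
def build_datasets(local, upper, lower):
    """Return the four feature rows T0-T3 for one alarm.

    Each argument is a dict feature_name -> value. Prefixes keep the
    columns of the own node, upper nodes and lower nodes apart.
    """
    up = {f"upper:{k}": v for k, v in upper.items()}
    low = {f"lower:{k}": v for k, v in lower.items()}
    return {
        "T0": dict(local),             # own node only
        "T1": {**local, **up},         # own node + upper nodes
        "T2": {**local, **low},        # own node + lower nodes
        "T3": {**local, **up, **low},  # own node + upper + lower nodes
    }

rows = build_datasets(
    local={"port 80 communication exception": 1},
    upper={},
    lower={"FullGC average elapsed time": 2118},
)
```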
Further, dividing the alarm data set into a training set and a test set, screening all of the obtained feature information, inputting it into a classification algorithm, taking the feature set with the best prediction performance as the model classification features, inputting the feature values contained in the training set into the classification algorithm to train a prediction model, predicting the data in the test set with the trained classification model and outputting a prediction result, and obtaining the final predicted root cause according to the number of candidate root causes in the prediction result and the time information specifically comprises the following steps:
selecting the T2 data set, which contains the local node and lower node features, as the input of an XGBoost classification model that is trained to produce root cause predictions;
balancing the data set with Borderline SMOTE;
training the XGBoost classification model on the training set of the T2 data set, and then performing root cause prediction on the test set to obtain all candidate root cause information;
and judging the candidate root causes by combining their occurrence time and number of occurrences to obtain the root cause prediction result.
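To make the balancing step concrete, here is a simplified pure-Python sketch of the Borderline-SMOTE idea: only minority samples whose nearest neighbours are mostly, but not entirely, from the majority class are used as seeds, and new samples are interpolated between a seed and a nearby minority sample. The function names, the parameters `k` and `n_new`, and the toy data are all hypothetical; a real pipeline would more likely rely on an existing library implementation.

```python
import math
import random

def _knn(point, pool, k):
    """Indices of the k points in pool closest to point (Euclidean)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]

def borderline_smote(X, y, minority=1, k=3, n_new=4, seed=0):
    """Return synthetic minority samples built only from borderline seeds."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    danger = []
    for p in minority_pts:
        others = [(x, label) for x, label in zip(X, y) if x is not p]
        neigh = _knn(p, [x for x, _ in others], k)
        n_major = sum(1 for i in neigh if others[i][1] != minority)
        # Borderline: at least half, but not all, neighbours are majority.
        if k / 2 <= n_major < k:
            danger.append(p)
    synthetic = []
    for _ in range(n_new if danger else 0):
        p = rng.choice(danger)
        mates = [q for q in minority_pts if q is not p]
        if not mates:
            break
        q = mates[rng.choice(_knn(p, mates, min(k, len(mates))))]
        gap = rng.random()  # position on the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic

# Toy data: label 1 marks the rare "root cause" class.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (1.2, 1.2), (1.4, 1.0), (5, 5)]
y = [0, 0, 0, 0, 0, 1, 1, 1]
new_points = borderline_smote(X, y, minority=1, k=3, n_new=4, seed=0)
```

In this toy data the isolated minority point (5, 5) is not a seed, because only one of its three nearest neighbours belongs to the majority class; the two minority points that sit next to the majority cluster are.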
The invention has the following beneficial effects:
The invention locates the root cause node that triggers the alarms and outputs its alarm information, which makes it easier for operation and maintenance personnel to troubleshoot the fault. The method is flexible and extensible: the classification algorithm can be replaced, and the accuracy of root cause positioning can be further improved by choosing a classification algorithm better suited to a given working environment and by training the model on different training sets.
Drawings
FIG. 1 is a schematic overall flow chart according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of step S10 according to the embodiment of the present invention;
FIG. 3 is a schematic flow chart of step S20 according to the embodiment of the present invention;
FIG. 4 is a schematic flow chart of step S30 according to the embodiment of the present invention.
Detailed Description
The invention is further described with reference to specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The invention relates to a fault root cause positioning method based on network topology and real-time alarms, which performs root cause prediction on alarm information that carries time information and a topological relation. The features of the alarm information are extracted and input into a classification algorithm, and a model that predicts the root cause node is obtained through training. To improve prediction accuracy, the topological relation between the alarm nodes is used as well: the upper and lower nodes of each alarm node are looked up in the topological graph. Once the associated nodes are located, the alarm information that may be causally connected is screened out by its time information, the features it contains are determined, and the upper and lower node feature information is added to the input features of the classification algorithm. All of the obtained features are combined and screened, the feature information with the highest F1 value is kept, a data set is constructed from it, and the data set is balanced with Borderline SMOTE. The data set is input into a machine learning classification algorithm, suspected root cause information is obtained after classification, and the root cause node is located by combining the time information and the number of classifications. The invention outputs the root cause node that triggers the alarms together with its alarm information, which makes it easier for operation and maintenance personnel to troubleshoot the fault.
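The F1 value used to choose among feature sets is the harmonic mean of precision and recall on the root cause class. The sketch below shows that selection under the assumption that a model has already been trained and evaluated on each candidate feature set; the predictions are made-up numbers and the function names are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_feature_set(y_true, predictions):
    """predictions: dict name -> labels predicted by a model trained on that set."""
    scores = {name: f1_score(y_true, pred) for name, pred in predictions.items()}
    return max(scores, key=scores.get), scores

y_true = [1, 0, 0, 1, 0]
predictions = {  # made-up predictions, for illustration only
    "T0": [1, 1, 0, 0, 0],
    "T2": [1, 0, 0, 1, 1],
}
best, scores = best_feature_set(y_true, predictions)
```

F1 is a reasonable criterion here because root cause alarms are rare, so plain accuracy would reward a model that never predicts a root cause.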
This embodiment applies to filtering and analyzing alarms using network topology information and alarm data in order to report the effective alarms and the suspected root cause. The method can be executed by a machine learning module, which can be implemented in software and/or hardware, and can be applied to alarm handling on, for example, an e-commerce platform. FIG. 1 is a flow diagram of the method according to an embodiment of the invention, which specifically comprises the following steps:
In step S10, an alarm data set is input and processed, the features contained in all of the alarm information are extracted as a feature set, the alarm information is traversed to obtain the time and node information of each alarm, and the features contained in each piece of alarm information are extracted in combination with the obtained feature set;
In step S20, the upper and lower nodes are obtained from the node information produced in S10 and the topological relation, the alarm information of the upper and lower nodes within a certain time interval is screened out according to the time information, and the alarm features of the upper and lower nodes are constructed in combination with the feature set from S10, so as to obtain the global feature information of each piece of alarm information;
In step S30, the alarm data set is divided into a training set and a test set, all of the feature information obtained in step S20 is screened to remove noise and input into a classification algorithm, the feature set with the best prediction performance is used as the model classification features, the feature values of the training set are input into the classification algorithm to train a prediction model, the trained classification model predicts the data in the test set and outputs a prediction result, and the final predicted root cause is obtained according to the number of candidate root causes in the prediction result and the time information.
Preferably, the alarm text data is processed with regular expressions, and a feature dictionary is constructed to make feature extraction more efficient. As shown in FIG. 2, this specifically comprises:
In step S101, the test set files and training set files that provide the alarm information are preprocessed: all files are merged, all irrelevant information is removed with a regular expression, the features and feature values are extracted, duplicates are removed to obtain all features, these features are used as the feature matching set, and a regular expression that extracts the feature value is customized for each feature;
In step S102, the time and node information of each piece of alarm information is extracted with a regular expression, and a dictionary is built from them to facilitate searching and matching;
In step S103, the processed file is traversed line by line, each piece of alarm information is matched against the features in the feature set, and, where a feature matches, the extracted feature value is filled into the corresponding feature of that piece of alarm information.
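Steps S101 and S103 can be sketched as follows. This is an illustrative simplification rather than the patented implementation: it assumes English trigger texts with a `host node_<number>` prefix, treats the first number followed by `ms` as the feature value, and records features without a numeric value as 1.0. The names `to_feature` and `extract` are hypothetical.

```python
import re

HOST_PREFIX = re.compile(r"host node_\d+\s*")   # assumed English form of the prefix
NUMBER = re.compile(r"\d+(?:\.\d+)?(?=ms\b)")   # a measured value directly before "ms"

def to_feature(trigger):
    """Return (feature template, feature value or None) for one trigger text."""
    rest = HOST_PREFIX.sub("", trigger, count=1)
    m = NUMBER.search(rest)
    if m is None:
        return rest, None
    # Cut the measured value out so equal alarm types share one template.
    return rest[:m.start()] + rest[m.end():], float(m.group(0))

def extract(triggers):
    """Return (de-duplicated feature set, one {feature: value} row per alarm)."""
    feature_set, rows = [], []
    for trigger in triggers:
        template, value = to_feature(trigger)
        if template not in feature_set:
            feature_set.append(template)
        rows.append({template: 1.0 if value is None else value})
    return feature_set, rows

triggers = [
    "host node_61 FullGC average elapsed time: 2118ms (greater than threshold: 1000ms)",
    "host node_60 port 80 communication exception",
    "host node_62 FullGC average elapsed time: 1500ms (greater than threshold: 1000ms)",
]
feature_set, rows = extract(triggers)
```

The two FullGC alarms collapse into a single feature whose value differs per alarm, which is the effect the de-duplication in S101 aims for.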
Preferably, obtaining the features of the upper and lower nodes of the current node from the topological relation in step S20, as shown in FIG. 3, specifically comprises the following steps:
In step S201, the upper and lower nodes of a node are located using the topological relation between nodes, and the upper and lower node information of the current node is output;
In step S202, for each piece of alarm information, the adjacent alarm information within one minute before and after it is traversed, its node information is read, the alarm information of the upper and lower nodes is screened out from it, and feature matching is performed on all associated nodes to obtain their features;
In step S203, the feature set is processed into four data sets: T0, containing only the local node features; T1, containing the local node and upper node features; T2, containing the local node and lower node features; and T3, containing the local node, upper node, and lower node features. All data sets share the same format, and each column is labeled with the feature it contains.
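The screening in step S202 amounts to a time-window filter combined with a topology lookup. A minimal sketch follows, assuming the upper and lower node maps are already available as dictionaries; the alarm records, the node `node_9`, and the function name `related_alarms` are made up for the example.

```python
from datetime import datetime, timedelta

def related_alarms(alarm, alarms, upper_of, lower_of, window=timedelta(minutes=1)):
    """Group alarms within `window` of `alarm` by upper/lower node membership."""
    upper = set(upper_of.get(alarm["node"], []))
    lower = set(lower_of.get(alarm["node"], []))
    found = {"upper": [], "lower": []}
    for other in alarms:
        # Skip the alarm itself and anything outside the time window.
        if other is alarm or abs(other["time"] - alarm["time"]) > window:
            continue
        if other["node"] in upper:
            found["upper"].append(other)
        elif other["node"] in lower:
            found["lower"].append(other)
    return found

t0 = datetime(2019, 6, 4, 1, 14)
alarms = [
    {"node": "node_50", "time": t0, "text": "port 80 communication exception"},
    {"node": "node_4", "time": t0 + timedelta(seconds=30), "text": "FullGC slow"},
    {"node": "node_83", "time": t0 + timedelta(minutes=5), "text": "FullGC slow"},
    {"node": "node_9", "time": t0 - timedelta(seconds=20), "text": "CPU high"},
]
found = related_alarms(
    alarms[0],
    alarms,
    upper_of={"node_50": ["node_9"]},            # hypothetical upper node
    lower_of={"node_50": ["node_4", "node_83"]},
)
```

The alarm on node_83 is a lower node alarm but falls outside the one-minute window, so it contributes no features to the current alarm.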
Step S30, as shown in FIG. 4, specifically includes:
In step S301, to reduce noise and adjust the input features, the T2 data set, which contains the local node and lower node features, is selected as the input of the XGBoost classification model that is trained to produce root cause predictions;
In step S302, because the data set is class-imbalanced with respect to whether an alarm is the root cause, Borderline SMOTE is used to balance the data set;
In step S303, the XGBoost classification model is trained on the training set of the T2 data set, and root cause prediction is then performed on the test set to obtain all candidate root cause information;
In step S304, because each file contains only one root cause, the candidate root causes are judged by combining their occurrence time and number of occurrences, and the root cause prediction result is obtained.
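The text does not spell out how occurrence time and occurrence count are combined in step S304. The sketch below therefore assumes one plausible rule, purely for illustration: the node predicted as root cause most often wins, and ties go to the node whose first candidate alarm is earliest. The function name `pick_root_cause` and the sample candidates are hypothetical.

```python
from collections import Counter
from datetime import datetime

def pick_root_cause(candidates):
    """candidates: list of (node, timestamp) predicted as root cause in one file."""
    if not candidates:
        return None
    counts = Counter(node for node, _ in candidates)
    first_seen = {}
    for node, ts in candidates:
        if node not in first_seen or ts < first_seen[node]:
            first_seen[node] = ts
    # Assumed rule: most occurrences first; earliest first alarm breaks ties.
    return min(counts, key=lambda n: (-counts[n], first_seen[n]))

candidates = [
    ("node_60", datetime(2019, 6, 4, 1, 14)),
    ("node_61", datetime(2019, 6, 4, 1, 13)),
    ("node_60", datetime(2019, 6, 4, 1, 15)),
]
root = pick_root_cause(candidates)
```

Reducing the candidates to a single node matches the premise that each file contains exactly one root cause.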
Operation example:
Assume a training set is given that contains node, time, and alarm information and labels the root cause alarm nodes. Two records from the training set of this embodiment are:
Example 1, time: "2019/6/4 1:14", triggername: "host node_61 FullGC average elapsed time: 2118ms (greater than threshold: 1000ms)", is_root: "0".
Example 2, time: "2019/6/4 1:14", triggername: "host node_60 port 80 communication exception", is_root: "1".
The topological relation between the nodes is also given. For example, each node and its lower nodes are stored as a dictionary: {"node_50": ["node_4", "node_83", "node_33", "node_17"], "node_0": ["node_4", "node_83", "node_33", "node_17"]}, and so on.
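Because the topology is stored as a node-to-lower-nodes dictionary, the upper nodes of a node are not directly available and have to be derived by inverting the mapping. A short sketch using the dictionary from the example is shown below; the helper name `invert` is an assumption.

```python
lower_of = {
    "node_50": ["node_4", "node_83", "node_33", "node_17"],
    "node_0": ["node_4", "node_83", "node_33", "node_17"],
}

def invert(lower_of):
    """Build node -> sorted list of upper nodes from node -> lower nodes."""
    upper_of = {}
    for parent, children in lower_of.items():
        for child in children:
            upper_of.setdefault(child, set()).add(parent)
    return {child: sorted(parents) for child, parents in upper_of.items()}

upper_of = invert(lower_of)
```

Building the inverse once avoids scanning the whole topology for every alarm when upper node features are needed.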
S1: Local feature extraction. Step (1): traverse the alarm information and process each alarm with a regular expression. For example, the expression r'host node_\d+' removes the host prefix from the triggername of Example 1, leaving: "FullGC average elapsed time: 2118ms (greater than threshold: 1000ms)". Step (2): process the remaining text again with a regular expression customized for each kind of information. For the remainder "FullGC average elapsed time: 2118ms (greater than threshold: 1000ms)" of Example 1, a regular expression beginning r'[FullGC average elapsed time: is customized; the value the text contains can be read out through this expression, and the specific feature value is replaced by a '+'. Step (3): de-duplicate the information contained in the processed triggernames to obtain all features as the feature set, with which the features contained in each piece of alarm information can be extracted. For example, the alarm "2019/6/4 1:14, host node_61 FullGC average elapsed time: 2118ms (greater than threshold: 1000ms)" contains the feature "FullGC average elapsed time: ms (greater than threshold: 1000ms)" with feature value 2118;
S2: Customize regular expressions according to the feature set obtained in S1 and match them against each piece of alarm information to obtain the features contained in every alarm. The given topological relation yields the specific associations between nodes, and therefore the upper and lower nodes of the current node. For example, if the node is "node_50", its lower nodes are "node_4", "node_83", "node_33", and "node_17". Combined with the time information, the alarms of all upper and lower nodes within one minute of a given alarm can be located, here the alarms of "node_4", "node_83", "node_33", and "node_17" that occur in the same minute. The features these alarms contain are then screened out with the feature set obtained in S1, giving the upper and lower node features of the alarm;
S3: Screen all of the feature information obtained in S2 and use the data set containing the local node and lower node features as the training set. Balance the data set with Borderline SMOTE. Using the XGBoost algorithm, train on the training set, then perform root cause prediction on the processed test set containing the local node and lower node features to obtain all candidate root cause information, and judge the candidate root causes by combining their occurrence time and number of occurrences to obtain the root cause prediction result.
During data preprocessing, the method extracts the alarm features from all data sets and merges all files into one, which makes it easy to locate alarm information accurately, to inspect the root cause nodes at a glance, and to add further alarm features or perform file operations later. When a classification algorithm is used to predict the root cause, the features extracted from the alarm information serve as its input: the alarm information is traversed, and the features it contains are extracted with regular expressions and used as the basis for classification. Because nodes are causally associated, the topological relation between nodes is used as well: the upper and lower nodes of each alarm are located in the topological graph, the alarm information of these related nodes is obtained using the alarm time information, and its features are extracted. Adding these features significantly improves prediction accuracy. Processing the data yields four data sets; to reduce noise and keep the added feature information effective, different data sets can be used in different situations, which improves prediction accuracy.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (7)

1. A fault root cause positioning method based on network topology and real-time alarm is characterized by comprising the following steps:
inputting an alarm data set, performing data processing on the alarm data set, extracting features contained in a current corresponding node from all alarm information to be used as a feature set, acquiring time and node information in each piece of alarm information, and extracting the features contained in each piece of alarm information by combining the obtained feature set; according to the processed current node information, upper and lower nodes are obtained by combining the topological relation, alarm information of the upper and lower nodes in a certain time interval is screened out according to time information, alarm characteristics of the upper and lower nodes can be constructed by combining a characteristic set of the current node, and global characteristic information of each alarm information is obtained;
dividing an alarm data set into a training set and a testing set, then screening all obtained characteristic information, inputting a classification algorithm, taking a characteristic set with the best prediction performance as a model classification characteristic, inputting the characteristic value contained in the training set into the classification algorithm to train to obtain a prediction model, predicting data in the testing set by using the trained classification model and outputting a prediction result, and obtaining a final prediction root result according to the number of candidate roots and time information in the prediction result.
2. The method for locating the fault root cause based on the network topology and the real-time alarm according to claim 1, wherein the method for inputting the alarm data set, processing the alarm data set, and extracting the features contained in the current corresponding nodes from all the alarm information as the feature set specifically comprises:
preprocessing an alarm data set providing alarm information, combining all files, removing all irrelevant information by using a regular expression, extracting features and feature values, removing duplication to obtain all features, using the features as a feature matching set, and customizing a regular expression for extracting the feature values for each feature.
3. The method for locating the fault root cause based on the network topology and the real-time alarm according to claim 2, wherein the method for obtaining the time and node information in each piece of alarm information specifically comprises:
and extracting the time and node information of each piece of alarm information by using a regular expression, so as to establish a dictionary to facilitate searching and matching.
4. The method for locating the fault root cause based on the network topology and the real-time alarm according to claim 3, wherein the method for extracting the features included in each piece of alarm information by combining the obtained feature set specifically comprises:
and matching each piece of alarm information with the features in the feature set according to the file after line traversal processing, and filling the extracted feature values into the features corresponding to each piece of alarm information if the judgment is consistent.
5. The method for locating the fault root cause based on the network topology and the real-time alarm according to claim 1, wherein the method for screening out the alarm information of the upper node and the lower node within a certain time interval according to the time information specifically comprises:
traversing the adjacent alarm information within one minute before and after each piece of alarm information, inputting the node information of the adjacent alarm information, screening out the alarm information of the upper and lower nodes from the adjacent alarm information, and performing feature matching on all associated nodes to obtain the features of the associated nodes.
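The one-minute screening of claim 5 can be sketched as below. The topology tables `UPPER` and `LOWER`, the alarm records and the function name `neighbours` are hypothetical; only the one-minute window and the upper/lower distinction come from the claim.

```python
from datetime import datetime, timedelta

# Hypothetical topology: for each node, its upper and lower neighbours.
UPPER = {"R2": {"R1"}, "R3": {"R2"}}
LOWER = {"R1": {"R2"}, "R2": {"R3"}}

WINDOW = timedelta(minutes=1)  # one minute before and after, per the claim

alarms = [
    {"id": 0, "node": "R1", "time": datetime(2020, 3, 1, 10, 0, 5)},
    {"id": 1, "node": "R2", "time": datetime(2020, 3, 1, 10, 0, 20)},
    {"id": 2, "node": "R3", "time": datetime(2020, 3, 1, 10, 0, 50)},
    {"id": 3, "node": "R3", "time": datetime(2020, 3, 1, 10, 5, 0)},
]


def neighbours(alarm, all_alarms):
    """Return ids of upper-node and lower-node alarms inside the time window."""
    upper, lower = [], []
    for other in all_alarms:
        if other is alarm or abs(other["time"] - alarm["time"]) > WINDOW:
            continue  # skip the alarm itself and anything outside the window
        if other["node"] in UPPER.get(alarm["node"], set()):
            upper.append(other["id"])
        elif other["node"] in LOWER.get(alarm["node"], set()):
            lower.append(other["id"])
    return upper, lower


upper_ids, lower_ids = neighbours(alarms[1], alarms)
```

This linear scan is quadratic over the whole data set; with alarms sorted by time, a sliding window would avoid rescanning, but that optimisation is not described in the source.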
6. The method for locating the fault root cause based on the network topology and the real-time alarm according to claim 5, wherein the method for constructing the alarm features of the upper and lower nodes in combination with the feature set of the current node, so as to obtain the global feature information of each piece of alarm information, specifically comprises:
the feature set is processed into four data sets: T0, containing only the home node features; T1, containing the home node features and the upper node features; T2, containing the home node features and the lower node features; and T3, containing the home node features, the upper node features and the lower node features.
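One possible way to assemble the four data sets is sketched below. The row layout (a dict with `home`, `upper` and `lower` sub-dicts) and the column prefixing are assumptions of this sketch, not details given in the source.

```python
# Which feature groups each data set contains, following the claim.
LAYOUTS = {
    "T0": ("home",),
    "T1": ("home", "upper"),
    "T2": ("home", "lower"),
    "T3": ("home", "upper", "lower"),
}


def merge_parts(row, parts):
    """Flatten the selected feature groups into one dict with prefixed keys."""
    merged = {}
    for part in parts:
        for key, value in row[part].items():
            merged[f"{part}_{key}"] = value
    return merged


def build_datasets(rows):
    """Return a dict mapping T0..T3 to their lists of flattened rows."""
    return {name: [merge_parts(row, parts) for row in rows]
            for name, parts in LAYOUTS.items()}


# Hypothetical global features for a single alarm.
rows = [{
    "home": {"type": "PORT_ERR"},
    "upper": {"type": "LINK_DOWN"},
    "lower": {"type": "PORT_ERR"},
}]
datasets = build_datasets(rows)
```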
7. The method according to claim 1, wherein the steps of dividing the alarm data set into a training set and a test set, screening all of the obtained feature information by inputting it into a classification algorithm and taking the feature set with the best prediction performance as the model classification features, inputting the feature values contained in the training set into the classification algorithm to train a prediction model, predicting the data in the test set with the trained classification model and outputting a prediction result, and obtaining the final predicted root cause according to the number of candidate root causes and the time information in the prediction result specifically comprise:
selecting the T2 data set, which contains the home node features and the lower node features, inputting it into an XGBoost classification model, and training to obtain a root cause prediction result; and using Borderline-SMOTE to balance the data set;
training the XGBoost classification model on the training set of the T2 data set, which contains the home node features and the lower node features, and then performing root cause prediction on the test set to obtain all candidate root cause information;
and judging the candidate root causes by combining their occurrence times and numbers of occurrences to obtain the root cause prediction result.
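The final judgment step could be sketched as follows. The claim states only that occurrence time and number of occurrences are combined; the specific rule used here (most occurrences first, earliest first occurrence as tie-breaker) and the function name `pick_root_cause` are assumptions. The candidates are taken to be the alarms that the trained classifier labelled as root causes.

```python
from collections import Counter
from datetime import datetime


def pick_root_cause(candidates):
    """Pick one node from (node, time) candidates; return None if there are none.

    Assumed rule: highest occurrence count wins, ties go to the node
    whose first candidate alarm occurred earliest.
    """
    if not candidates:
        return None
    counts = Counter(node for node, _ in candidates)
    first_seen = {}
    for node, timestamp in candidates:
        if node not in first_seen or timestamp < first_seen[node]:
            first_seen[node] = timestamp
    return min(counts, key=lambda node: (-counts[node], first_seen[node]))


# Hypothetical classifier output: alarms predicted to be root causes.
candidates = [
    ("R2", datetime(2020, 3, 1, 10, 0, 20)),
    ("R1", datetime(2020, 3, 1, 10, 0, 5)),
    ("R2", datetime(2020, 3, 1, 10, 0, 40)),
    ("R1", datetime(2020, 3, 1, 10, 0, 30)),
    ("R3", datetime(2020, 3, 1, 10, 0, 1)),
]
root = pick_root_cause(candidates)
```

In this example R1 and R2 each appear twice, and R1 is chosen because its first candidate alarm is earlier; R3 appears earliest overall but only once.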
CN202010835820.8A 2020-08-19 2020-08-19 Fault root cause positioning method based on network topology and real-time alarm Active CN112181758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010835820.8A CN112181758B (en) 2020-08-19 2020-08-19 Fault root cause positioning method based on network topology and real-time alarm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010835820.8A CN112181758B (en) 2020-08-19 2020-08-19 Fault root cause positioning method based on network topology and real-time alarm

Publications (2)

Publication Number Publication Date
CN112181758A (en) 2021-01-05
CN112181758B (en) 2023-07-28

Family

ID=73919464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010835820.8A Active CN112181758B (en) 2020-08-19 2020-08-19 Fault root cause positioning method based on network topology and real-time alarm

Country Status (1)

Country Link
CN (1) CN112181758B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112822052A (en) * 2021-01-08 2021-05-18 河海大学 Network fault root cause positioning method based on network topology and alarm
CN113240139A (en) * 2021-06-03 2021-08-10 南京中兴新软件有限责任公司 Alarm cause and effect evaluation method, fault root cause positioning method and electronic equipment
CN113268370A (en) * 2021-05-11 2021-08-17 西安交通大学 Root cause alarm analysis method, system, equipment and storage medium
CN113791926A (en) * 2021-09-18 2021-12-14 平安普惠企业管理有限公司 Intelligent alarm analysis method, device, equipment and storage medium
CN114978878A (en) * 2022-05-12 2022-08-30 亚信科技(中国)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium
WO2023040381A1 (en) * 2021-09-18 2023-03-23 中兴通讯股份有限公司 Alarm causal relationship mining method, alarm causal mining apparatus, and storage medium
CN116074180A (en) * 2023-02-20 2023-05-05 中国联合网络通信集团有限公司 Fault location method, fault repair method, device and storage medium
CN117544482A (en) * 2024-01-05 2024-02-09 北京神州泰岳软件股份有限公司 Operation and maintenance fault determining method, device, equipment and storage medium based on AI

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101325520A (en) * 2008-06-17 2008-12-17 南京邮电大学 Method for locating and analyzing fault of intelligent self-adapting network based on log
US7986639B1 (en) * 2004-10-26 2011-07-26 Sprint Communications Company L.P. Topology management of a communications network
US20140189436A1 (en) * 2013-01-02 2014-07-03 Tata Consultancy Services Limited Fault detection and localization in data centers
CN105391579A (en) * 2015-11-25 2016-03-09 国家电网公司 Electric power communication network fault positioning method based on key alarm sets and supervised classification
CN106295692A (en) * 2016-08-05 2017-01-04 北京航空航天大学 Product initial failure root primordium recognition methods based on dimensionality reduction Yu support vector machine
CN106909487A (en) * 2017-01-18 2017-06-30 北京盛世全景科技股份有限公司 It is applied to the method for early warning and device of information system
CN109756376A (en) * 2019-01-11 2019-05-14 中电福富信息科技有限公司 Alarm correlation analysis method based on diagram data model
CN110309009A (en) * 2019-05-21 2019-10-08 北京云集智造科技有限公司 Situation-based operation and maintenance fault root cause positioning method, device, equipment and medium
CN110855502A (en) * 2019-11-22 2020-02-28 叶晓斌 Fault cause determination method and system based on time-space analysis log
CN110888755A (en) * 2019-11-15 2020-03-17 亚信科技(中国)有限公司 Method and device for searching abnormal root node of micro-service system
US20200202179A1 (en) * 2018-12-21 2020-06-25 Capital One Services, Llc Methods and arrangements to identify feature contributions to erroneous predictions
CN111342997A (en) * 2020-02-06 2020-06-26 烽火通信科技股份有限公司 Construction method of deep neural network model, fault diagnosis method and system

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7986639B1 (en) * 2004-10-26 2011-07-26 Sprint Communications Company L.P. Topology management of a communications network
CN101325520A (en) * 2008-06-17 2008-12-17 南京邮电大学 Method for locating and analyzing fault of intelligent self-adapting network based on log
US20140189436A1 (en) * 2013-01-02 2014-07-03 Tata Consultancy Services Limited Fault detection and localization in data centers
CN105391579A (en) * 2015-11-25 2016-03-09 国家电网公司 Electric power communication network fault positioning method based on key alarm sets and supervised classification
CN106295692A (en) * 2016-08-05 2017-01-04 北京航空航天大学 Product initial failure root primordium recognition methods based on dimensionality reduction Yu support vector machine
CN106909487A (en) * 2017-01-18 2017-06-30 北京盛世全景科技股份有限公司 It is applied to the method for early warning and device of information system
US20200202179A1 (en) * 2018-12-21 2020-06-25 Capital One Services, Llc Methods and arrangements to identify feature contributions to erroneous predictions
CN109756376A (en) * 2019-01-11 2019-05-14 中电福富信息科技有限公司 Alarm correlation analysis method based on diagram data model
CN110309009A (en) * 2019-05-21 2019-10-08 北京云集智造科技有限公司 Situation-based operation and maintenance fault root cause positioning method, device, equipment and medium
CN110888755A (en) * 2019-11-15 2020-03-17 亚信科技(中国)有限公司 Method and device for searching abnormal root node of micro-service system
CN110855502A (en) * 2019-11-22 2020-02-28 叶晓斌 Fault cause determination method and system based on time-space analysis log
CN111342997A (en) * 2020-02-06 2020-06-26 烽火通信科技股份有限公司 Construction method of deep neural network model, fault diagnosis method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
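九离 (Jiuli): "Implementation and algorithm research of a fault root cause positioning system based on network topology and alarms: competition problem analysis", 《HTTPSWWW.CNBLOGS.COMHUAN-CHP12805284.HTML》 *
Huang Bingming (黄兵明) et al.: "Research on the application of artificial intelligence in fault tracing of communication networks", Designing Techniques of Posts and Telecommunications (《邮电设计技术》) *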

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112822052A (en) * 2021-01-08 2021-05-18 河海大学 Network fault root cause positioning method based on network topology and alarm
CN112822052B (en) * 2021-01-08 2022-03-29 河海大学 Network fault root cause positioning method based on network topology and alarm
CN113268370A (en) * 2021-05-11 2021-08-17 西安交通大学 Root cause alarm analysis method, system, equipment and storage medium
CN113240139A (en) * 2021-06-03 2021-08-10 南京中兴新软件有限责任公司 Alarm cause and effect evaluation method, fault root cause positioning method and electronic equipment
CN113240139B (en) * 2021-06-03 2023-09-26 南京中兴新软件有限责任公司 Alarm cause and effect evaluation method, fault root cause positioning method and electronic equipment
CN113791926A (en) * 2021-09-18 2021-12-14 平安普惠企业管理有限公司 Intelligent alarm analysis method, device, equipment and storage medium
WO2023040381A1 (en) * 2021-09-18 2023-03-23 中兴通讯股份有限公司 Alarm causal relationship mining method, alarm causal mining apparatus, and storage medium
CN114978878A (en) * 2022-05-12 2022-08-30 亚信科技(中国)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium
CN114978878B (en) * 2022-05-12 2024-03-08 亚信科技(中国)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium
CN116074180A (en) * 2023-02-20 2023-05-05 中国联合网络通信集团有限公司 Fault location method, fault repair method, device and storage medium
CN117544482A (en) * 2024-01-05 2024-02-09 北京神州泰岳软件股份有限公司 Operation and maintenance fault determining method, device, equipment and storage medium based on AI

Also Published As

Publication number Publication date
CN112181758B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
CN112181758B (en) Fault root cause positioning method based on network topology and real-time alarm
US11947438B2 (en) Operation and maintenance system and method
CN111158977A (en) Abnormal event root cause positioning method and device
CN112148772A (en) Alarm root cause identification method, device, equipment and storage medium
CN112152830A (en) Intelligent fault root cause analysis method and system
CN113935497A (en) Intelligent operation and maintenance fault processing method, device and equipment and storage medium thereof
CN111726248A (en) Alarm root cause positioning method and device
CN105577440A (en) Network fault time location method and analyzing device
CN114465874B (en) Fault prediction method, device, electronic equipment and storage medium
CN115981984A (en) Equipment fault detection method, device, equipment and storage medium
CN108021509B (en) Test case dynamic sequencing method based on program behavior network aggregation
CN105630656A (en) Log model based system robustness analysis method and apparatus
CN112988509A (en) Alarm message filtering method and device, electronic equipment and storage medium
CN106878038A (en) Fault Locating Method and device in a kind of communication network
CN107111609A (en) Lexical analyzer for neural language performance identifying system
CN115576834A (en) Software test multiplexing method, system, terminal and medium for supporting fault recovery
CN115981902A (en) Fine-grained distributed micro-service system abnormal root cause positioning method and device
CN117009180A (en) Log and abnormal alarm information processing method and device
CN112416800A (en) Intelligent contract testing method, device, equipment and storage medium
CN111352820A (en) Method, equipment and device for predicting and monitoring running state of high-performance application
CN116225752A (en) Fault root cause analysis method and system for micro-service system based on fault mode library
CN114629776B (en) Fault analysis method and device based on graph model
CN115016782A (en) vue component generation method and device
CN113240140A (en) Fault detection method, device, equipment and storage medium of physical equipment
CN113807462A (en) AI-based network equipment fault reason positioning method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant