CN114629776A

CN114629776A - Fault analysis method and device based on graph model

Info

Publication number: CN114629776A
Application number: CN202011453509.3A
Authority: CN
Inventors: 张勉知; 刘惜吾; 程亚锋; 叶晓斌; 陈孟尝; 曾昭才; 张园
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2022-06-14
Anticipated expiration: 2040-12-11
Also published as: CN114629776B

Abstract

The invention provides a fault analysis method and device based on a graph model. The method comprises the following steps: acquiring a first real-time log, and preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence; performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism to determine a log sequence to be processed; performing fault analysis on the log sequence to be processed according to a preset graph model, and determining the faulty root cause device and the root cause fault of the root cause device, wherein the graph model represents the device topological relation and the transfer relation; and determining the predicted faulty device and the predicted fault information according to the root cause device, the root cause fault and the graph model. The fault analysis method provided by the embodiment of the invention locates the details of fault occurrence accurately and lays a foundation for fault root cause diagnosis and rapid service recovery.

Description

Fault analysis method and device based on graph model

Technical Field

The invention relates to the technical field of communication, in particular to a fault analysis method and device based on a graph model.

Background

With the continuous expansion of network scale, more and more network devices operate in different scenarios. During operation, network devices generate system logs that reflect real-time changes in the running state and service state of the devices, and fault points can be located through these logs. As the volume of system logs grows explosively, however, efficient and accurate fault diagnosis and analysis can no longer rely on an experience base built by experts in the traditional way. Using Artificial Intelligence (AI) algorithms to troubleshoot hidden dangers and locate faults from system logs has therefore become a research hotspot in the field of mobile communication.

In the prior art, AI algorithms are mainly applied to the fault analysis of logs, and no complete framework has been formed for mining the fault propagation relationships in massive logs. For example, faults in which the logs of several network events are interleaved at the same time are difficult to locate in a timely and accurate manner.

Therefore, how to locate faults in a timely and accurate manner when the logs of multiple network events are interleaved is an urgent problem to be solved.

Disclosure of Invention

The invention provides a fault analysis method based on a graph model, which aims to realize accurate positioning of fault occurrence details and lay a foundation for fault root cause diagnosis and rapid service recovery.

In a first aspect, the present invention provides a fault analysis method based on a graph model, including:

acquiring a first real-time log, and preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence, wherein the first real-time log is used for recording real-time change information of an equipment running state and a service state;

performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining a log sequence to be processed, wherein the log sequence to be processed is used for recording the anomalous log sequence screened out by the anomaly detection mechanism;

performing fault analysis on the log sequence to be processed according to a preset graph model, and determining root cause equipment with faults and root cause faults of the root cause equipment, wherein the graph model is used for representing equipment topological relation and transfer relation;

and determining the predicted fault equipment and the predicted fault information according to the root cause equipment, the root cause fault and the graph model.

In one possible design, performing fault analysis on the log sequence to be processed according to the preset graph model, and determining the faulty root cause device and the root cause fault of the root cause device, includes:

the graph model comprises a device topological graph, a fault transfer graph and a device transfer graph;

determining a first device topological graph corresponding to the log sequence to be processed according to the log sequence to be processed and the device topological graph; the device topological graph corresponds to all devices in a preset network coverage range;

determining a first fault transfer diagram and a first equipment transfer diagram corresponding to the first equipment topological diagram according to the first equipment topological diagram, the fault transfer diagram and the equipment transfer diagram; the equipment transfer graph corresponds to all equipment in a preset network coverage range, and the fault transfer graph corresponds to the equipment transfer graph;

and determining the root cause device and the root cause fault according to the first fault transfer diagram and the first equipment transfer diagram.

In one possible design, the pre-processing the first real-time log according to a preset processing rule to obtain a first log sequence, including:

extracting information of the first real-time log according to a preset key field, and determining the first real-time log after the information is extracted;

according to a preset filtering rule, filtering the first real-time log, and determining the filtered first real-time log;

and dividing a fault analysis domain and segmenting a time sequence of the equipment corresponding to the first real-time log according to a preset network planning rule and a clustering algorithm to obtain a first log sequence.

In one possible design, performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining a log sequence to be processed, includes:

and performing key field matching processing on the first log sequence according to a preset abnormal log level, and determining a log sequence to be processed containing the abnormal log level.

In one possible design, before obtaining the first real-time log, the method further includes:

acquiring a historical log training set;

preprocessing the historical logs in the historical log training set according to a preset processing rule to obtain a historical log sequence;

performing anomaly detection on the historical log sequence according to a preset anomaly detection mechanism, and determining the historical log sequence to be processed; the history log sequence to be processed is used for recording the abnormal history log sequence screened by the abnormal detection mechanism;

determining an equipment transfer graph according to a historical log sequence to be processed and a preset equipment topological graph;

and determining a fault transfer diagram according to the historical log sequence to be processed, the equipment topological diagram and the equipment transfer diagram.

In a second aspect, the present invention provides a failure analysis apparatus based on a graph model, including:

the first processing module is used for acquiring a first real-time log and preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence, wherein the first real-time log is used for recording real-time change information of an equipment running state and a service state;

the second processing module is used for carrying out abnormity detection on the first log sequence according to a preset abnormity detection mechanism and determining a log sequence to be processed, wherein the log sequence to be processed is used for recording the log sequence containing the abnormal log level; the anomaly log level is determined by an anomaly detection mechanism;

the first determining module is used for performing fault analysis on the log sequence to be processed according to a preset graph model, determining root cause equipment with faults and root cause faults of the root cause equipment, wherein the graph model is used for representing equipment topological relation and transfer relation;

and the second determining module is used for determining the predicted fault equipment and the predicted fault information according to the root cause equipment, the root cause fault and the graph model.

In one possible design, the first determining module is specifically configured to:

the graph model comprises an equipment topological graph, a fault transfer graph and an equipment transfer graph;

In one possible design, a first processing module is to:

In one possible design, the second processing module is specifically configured to:

In one possible design, the first processing module is further configured to:

acquiring a historical log training set;

In a third aspect, the present invention further provides a fault analysis platform, including:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform any one of the graph model based fault analysis methods of the first aspect via execution of executable instructions.

In a fourth aspect, an embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the graph model-based fault analysis methods in the first aspect.

In a fifth aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program, and when the computer program is executed by a processor, the computer program implements any one of the failure analysis methods based on the graph model in the first aspect.

The invention provides a fault analysis method and a fault analysis device based on a graph model. A first real-time log is acquired and preprocessed according to a preset processing rule to obtain a first log sequence, wherein the first real-time log is used for recording real-time change information of an equipment running state and a service state; anomaly detection is performed on the first log sequence according to a preset anomaly detection mechanism to determine a log sequence to be processed, wherein the log sequence to be processed is used for recording the anomalous log sequence screened out by the anomaly detection mechanism; fault analysis is performed on the log sequence to be processed according to a preset graph model to determine the faulty root cause equipment and the root cause fault of the root cause equipment, wherein the graph model is used for representing the equipment topological relation and the transfer relation; and the predicted fault equipment and the predicted fault information are determined according to the root cause equipment, the root cause fault and the graph model. In this way, faults in which the logs of multiple network events are interleaved can be located in a timely and accurate manner.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a diagram illustrating an application scenario of a graph model-based fault analysis method according to an exemplary embodiment of the present invention;

FIG. 2 is a flow diagram illustrating a graph model-based fault analysis method in accordance with an exemplary embodiment of the present invention;

FIG. 3 is a schematic flow chart illustrating log preprocessing in a graph model-based fault analysis method according to an exemplary embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating a domain partitioning of a graph model based fault analysis method according to an exemplary embodiment of the present invention;

FIG. 5 is a schematic diagram illustrating log sequence segmentation in a graph model-based fault analysis method according to an example embodiment of the present invention;

FIG. 6 is a schematic diagram illustrating a device topology in a graph model-based fault analysis method according to an example embodiment of the present invention;

FIG. 7 is a schematic diagram illustrating a device transition in a graph model-based failure analysis method according to an example embodiment of the present invention;

FIG. 8 is a schematic diagram illustrating a failover in a graph model-based failure analysis method according to an example embodiment of the present invention;

FIG. 9 is a schematic diagram of a graph model-based fault analysis apparatus according to an exemplary embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a fault analysis platform according to an exemplary embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The following describes the technical solutions of the present invention and how to solve the above technical problems with specific embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.

FIG. 1 is a diagram of an application scenario of a graph model-based fault analysis method according to an exemplary embodiment of the present invention. As shown in FIG. 1, in the training stage, a history log 101 passes through log preprocessing 103 to obtain a historical log sequence, and anomaly detection 104 then determines the abnormal historical log sequences to be processed. The inference model 105 processes the abnormal historical log sequences: fault correlation statistics are performed on them according to a preset device topology graph, that is, for each device that fails in an abnormal historical log sequence, the probability that each related device is the next to fail is counted, which yields the device transfer graph 106. The fault transfer graph 107 is then determined from the abnormal historical log sequences, the device topology graph, and the device transfer graph 106. The device topology graph, the device transfer graph 106, and the fault transfer graph 107 together constitute the graph model. In the fault location stage, the real-time log 102 passes through log preprocessing 103 to obtain a real-time log sequence, and anomaly detection 104 determines the abnormal log sequence to be processed. Different key fields are selected to perform event separation 108 on the abnormal log sequence, and root cause reasoning 109 is performed according to the preset graph model to determine the faulty root cause device and the root cause fault of that device. Root cause prediction 110 can then be performed according to the root cause device, the root cause fault, and the graph model, so as to predict the device that will fail next and the fault cause information of that device.

Fig. 2 is a schematic flowchart of a fault analysis method based on a graph model according to an exemplary embodiment of the present invention, and as shown in fig. 2, the fault analysis method based on a graph model according to the present embodiment includes:

step 201, obtaining a first real-time log, and preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence, where the first real-time log is used to record real-time change information of an equipment running state and a service state.

Specifically, fig. 3 is a schematic flowchart illustrating a log preprocessing process in a graph model-based fault analysis method according to an exemplary embodiment of the present invention, and as shown in fig. 3, key field extraction, data filtering, and data sequence processing are performed on a first real-time log, and a detailed processing procedure is as follows.

Because log formats differ across manufacturers and devices, unified downstream processing and analysis is inconvenient. Key information in the log therefore needs to be identified according to the log specification, a log structure constructed, and the log converted into a universal format. The first real-time log is acquired, information is extracted from it according to preset key fields, and the first real-time log after information extraction is determined. For example, taking the device log of one manufacturer, the structure of the parsed log is shown in Table 1.

Table 1

Timestamp | Host name | Module name | Log level | Information summary | Log identifier | Information count | Detailed information
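The key field extraction described above can be illustrated with a minimal sketch. The raw line layout, the regular expression, and the names `LogRecord` and `parse_log_line` are assumptions made purely for illustration; real vendor log formats differ and would each need their own parsing rule.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical raw layout (an assumption, not a real vendor format):
#   <timestamp> <host> %%<module>/<level>/<summary>(<log_id>)[<count>]: <detail>
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"%%(?P<module>[^/]+)/(?P<level>\d)/(?P<summary>[^(]+)"
    r"\((?P<log_id>[^)]+)\)\[(?P<count>\d+)\]: "
    r"(?P<detail>.*)$"
)


@dataclass
class LogRecord:
    """One parsed log with the eight fields listed in Table 1."""
    timestamp: datetime
    host: str
    module: str
    level: int
    summary: str
    log_id: str
    count: int
    detail: str


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Return a structured record, or None if the line does not match."""
    m = LINE_PATTERN.match(line.strip())
    if m is None:
        return None
    return LogRecord(
        timestamp=datetime.strptime(m.group("timestamp"), "%Y-%m-%d %H:%M:%S"),
        host=m.group("host"),
        module=m.group("module"),
        level=int(m.group("level")),
        summary=m.group("summary"),
        log_id=m.group("log_id"),
        count=int(m.group("count")),
        detail=m.group("detail"),
    )


# Made-up sample line in the hypothetical layout above.
sample = "2020-12-11 08:15:02 AGG-1 %%IFNET/3/LINK_DOWN(l)[7]: Interface GE0/0/1 went down"
record = parse_log_line(sample)
```

Lines that do not match the assumed layout return `None`, so they can be set aside rather than silently mis-parsed.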

The first real-time log is filtered according to a preset filtering rule, and the filtered first real-time log is determined; the filtering rules are compiled in advance from expert experience. For example, because the volume of original log data is large, it contains a certain amount of invalid data, such as alarms whose occurrence time is an illegal value or whose source information is undefined. In addition, some logs are irrelevant to the fault root causes to be analyzed. Data cleaning is therefore performed on the original logs to filter out interference items, noise, and the like.
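As a minimal sketch of such a filtering rule, the following assumes each log is a dictionary with `timestamp`, `host`, and `module` keys. The key names, the timestamp format, and the contents of `IRRELEVANT_MODULES` are hypothetical; in practice the rules would come from the expert experience mentioned above.

```python
from datetime import datetime

# Hypothetical list of modules considered irrelevant to root cause analysis.
IRRELEVANT_MODULES = {"SHELL", "SSH"}


def has_valid_time(value) -> bool:
    """True if the occurrence time parses under the assumed format."""
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        return True
    except (ValueError, TypeError):
        return False


def keep_log(log: dict) -> bool:
    """Apply the three example rules: legal time, defined source, relevant module."""
    if not has_valid_time(log.get("timestamp")):
        return False  # illegal occurrence time
    if not log.get("host"):
        return False  # undefined alarm source
    if log.get("module") in IRRELEVANT_MODULES:
        return False  # irrelevant to the root causes being analyzed
    return True


raw_logs = [
    {"timestamp": "2020-12-11 08:15:02", "host": "AGG-1", "module": "IFNET"},
    {"timestamp": "2020-13-45 00:00:00", "host": "AGG-1", "module": "IFNET"},
    {"timestamp": "2020-12-11 08:15:03", "host": "", "module": "IFNET"},
    {"timestamp": "2020-12-11 08:15:04", "host": "ACC-2", "module": "SSH"},
]
filtered_logs = [log for log in raw_logs if keep_log(log)]
```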

According to a preset network planning rule and a clustering algorithm, fault analysis domains are divided and time-series segmentation is performed for the devices corresponding to the first real-time log, so as to obtain the first log sequence. Specifically, according to the preset network planning rule, related devices whose topological relations belong to the same minimum management domain are placed in one fault analysis domain, and the logs of all devices in the domain are sorted by fault occurrence time, so that the fault propagation relations among the devices can be mined subsequently. For example, FIG. 4 is a schematic diagram illustrating fault analysis domain division in a fault analysis method based on a graph model according to an exemplary embodiment of the present invention. As shown in FIG. 4, aggregation device 1 and aggregation device 2 are shared aggregation devices. Aggregation device 1, aggregation device 2, access device 1, access device 2, access device 3, and access device 4 form access ring 1; aggregation device 1, aggregation device 2, access device 5, access device 6, access device 7, and access device 8 form access ring 2. Apart from the shared aggregation device 1 and aggregation device 2, access ring 1 and access ring 2 do not intersect and are not associated, so when a device in one access ring fails, the fault of that ring does not propagate to the other ring. Access ring 1 and access ring 2 are therefore independent fault analysis domains.
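The domain division of FIG. 4 can be sketched as follows. The device names, the `DEVICE_TO_RINGS` planning table, and the numeric `time` field are hypothetical. Copying the logs of the shared aggregation devices into both rings is an assumption of this sketch; the text does not state how such logs are assigned.

```python
from collections import defaultdict

# Hypothetical planning data mirroring FIG. 4: two access rings that share
# the two aggregation devices and nothing else.
DEVICE_TO_RINGS = {
    "agg-1": ["ring-1", "ring-2"],
    "agg-2": ["ring-1", "ring-2"],
    "acc-1": ["ring-1"], "acc-2": ["ring-1"],
    "acc-3": ["ring-1"], "acc-4": ["ring-1"],
    "acc-5": ["ring-2"], "acc-6": ["ring-2"],
    "acc-7": ["ring-2"], "acc-8": ["ring-2"],
}


def split_into_domains(logs, device_to_rings):
    """Group logs by fault analysis domain and sort each group by time."""
    domains = defaultdict(list)
    for log in logs:
        # Devices missing from the planning table are skipped.
        for ring in device_to_rings.get(log["host"], []):
            domains[ring].append(log)
    for ring_logs in domains.values():
        ring_logs.sort(key=lambda log: log["time"])
    return dict(domains)


logs = [
    {"host": "acc-5", "time": 30, "summary": "LINK_DOWN"},
    {"host": "acc-1", "time": 20, "summary": "LINK_DOWN"},
    {"host": "agg-1", "time": 10, "summary": "PORT_ERR"},
]
domains = split_into_domains(logs, DEVICE_TO_RINGS)
```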

Then, a clustering algorithm such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used to derive, from the density-reachability relation, the maximal sets of density-connected log samples. Algorithm parameters such as the sample neighborhood distance threshold and the sample number threshold are adjusted iteratively according to evaluation measures such as the silhouette coefficient, the number of clusters, and the actual segmentation effect, until optimal values are determined. For example, FIG. 5 is a schematic diagram illustrating log sequence segmentation in a graph model-based fault analysis method according to an exemplary embodiment of the present invention. As shown in FIG. 5, each dot can be regarded as an abstraction of one log, and the logs are arranged in order of printing time. According to the DBSCAN algorithm, the logs within one circle are grouped into one cluster, which realizes the time-series segmentation of the logs and separates logs from different time periods, so that the faulty root cause device and the root cause fault can subsequently be located more accurately.
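The time-series segmentation can be sketched with a small pure-Python DBSCAN over one-dimensional timestamps. The function names and the sample values are hypothetical; `eps` plays the role of the sample neighborhood distance threshold and `min_samples` the sample number threshold. This is an illustrative sketch with a quadratic neighbor search, not the implementation used in the embodiment.

```python
def dbscan_1d(times, eps, min_samples):
    """Label each timestamp with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(times)
    # Neighborhood of each point, including the point itself.
    neighbors = [
        [j for j in range(n) if abs(times[i] - times[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels


def segments_from_labels(times, labels):
    """Collect the timestamps of each cluster, dropping noise points."""
    segments = {}
    for t, label in zip(times, labels):
        if label != -1:
            segments.setdefault(label, []).append(t)
    return [segments[k] for k in sorted(segments)]


# Hypothetical log print times in seconds: two bursts and one stray log.
times = [0, 1, 2, 3, 60, 61, 62, 200]
labels = dbscan_1d(times, eps=5, min_samples=3)
segments = segments_from_labels(times, labels)
```

With these sample values the two bursts become two segments and the isolated log at 200 is treated as noise. In practice a library implementation such as scikit-learn's `DBSCAN`, which exposes `eps` and `min_samples` parameters, would likely be preferred.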

Step 202, performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining a log sequence to be processed, wherein the log sequence to be processed is used for recording the anomalous log sequence screened out by the anomaly detection mechanism.

Specifically, key field matching is performed on the first log sequence according to a preset abnormal log level, and a log sequence to be processed that contains the abnormal log level is determined. For example, different vendors define device log levels in a standardized way. Taking one vendor as an example, log levels 0 (Emergency), 1 (Alert), 2 (Critical), and 3 (Errors) respectively represent an extremely urgent error, an error that needs to be corrected immediately, a more serious error, and an error. One sign of equipment failure is the generation of a large number of high-severity logs, i.e., logs at levels 0 to 3. On this basis, the log level is selected as the key field for triggering an anomaly: when the first log sequence contains logs at these levels, the log sequence is determined to be a log sequence to be processed, since it is the most likely to contain a fault.
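A minimal sketch of this level-based screening, assuming each log is a dictionary with an integer `level` key (the key name and the sample data are hypothetical):

```python
# Levels treated as abnormal in the example above: 0 (Emergency) to 3 (Errors).
ABNORMAL_LEVELS = frozenset({0, 1, 2, 3})


def is_abnormal_sequence(sequence, abnormal_levels=ABNORMAL_LEVELS):
    """True if any log in the sequence carries an abnormal level."""
    return any(log["level"] in abnormal_levels for log in sequence)


def select_pending_sequences(sequences):
    """Keep only the log sequences that should go on to fault analysis."""
    return [seq for seq in sequences if is_abnormal_sequence(seq)]


sequences = [
    [{"host": "A", "level": 6}, {"host": "A", "level": 5}],  # informational only
    [{"host": "B", "level": 4}, {"host": "B", "level": 2}],  # contains a level 2 log
]
pending = select_pending_sequences(sequences)
```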

Step 203, performing fault analysis on the log sequence to be processed according to a preset graph model, and determining the faulty root cause device and the root cause fault of the root cause device, wherein the graph model is used for representing the device topological relation and the transfer relation.

Specifically, the graph model comprises a device topology graph, a fault transfer graph, and a device transfer graph. A first device topology graph corresponding to the log sequence to be processed is determined according to the log sequence to be processed and the device topology graph, where the device topology graph corresponds to all devices in a preset network coverage range. A first fault transfer graph and a first device transfer graph corresponding to the first device topology graph are determined according to the first device topology graph, the fault transfer graph, and the device transfer graph, where the device transfer graph corresponds to all devices in the preset network coverage range and the fault transfer graph corresponds to the device transfer graph. The root cause device and the root cause fault are then determined according to the first fault transfer graph and the first device transfer graph.

For example, FIG. 6 is a schematic diagram of a device topology in the graph model-based fault analysis method according to an exemplary embodiment of the present invention. As shown in FIG. 6, the first device topology graph corresponding to the log sequence to be processed contains 7 devices: device A, device B, device C, device D, device E, device F, and device G. FIG. 7 is a schematic diagram of the device transfer graph corresponding to the topology of these 7 devices, and FIG. 8 is a schematic diagram of the corresponding fault transfer graph. The abnormal log set of the log sequence to be processed is denoted {A:a, A:b, A:c, B:a, B:b, B:c, B:d, B:e, C:d, C:e}; that is, the network event generated 10 logs in total: device A has faults a, b, and c, device B has faults a, b, c, d, and e, and device C has faults d and e. The faults of these 3 devices are analyzed according to the fault transfer graph of FIG. 8, and the probability values of the directed edges of the fault set {a, b, c} of device A are compared. For example, the probability values of the directed edges leaving each fault are: a->a is 0.5, a->b is 0.1, a->c is 0.2, a->d is 0.1, and a->e is 0.1; b->a is 0.1, b->b is 0.2, b->c is 0.4, b->d is 0.1, and b->e is 0.2; c->a is 0.2, c->b is 0.1, c->c is 0.2, c->d is 0.3, and c->e is 0.2. The directed edge with the highest probability value is a->a, so the root cause fault of device A is determined to be a. Similarly, the root cause fault of device B is determined to be a and the root cause fault of device C is determined to be e. The fault transfer relationship between the devices is then analyzed according to the device transfer graph of FIG. 7 by comparing the probability values of the directed edges of the device set {A, B, C}; if the directed edge with the maximum probability value is A->B, the root cause device is determined to be device A. Combining these results, the root cause log of the network event is {A:a}, that is, the root cause device is device A and the root cause fault is fault a, and the remaining 9 logs are associated logs.
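The comparison of directed edges in this example can be sketched as follows. The fault transfer probabilities are the ones quoted in the example; the device transfer probabilities in `DEVICE_TRANSFER` are hypothetical values chosen so that A->B is the largest edge, since the text gives no numbers for FIG. 7. Reading "root cause" as the source node of the highest-probability edge leaving any candidate is one interpretation of the example, not a confirmed definition.

```python
# Fault transfer probabilities quoted in the example for the faults of device A.
FAULT_TRANSFER = {
    "a": {"a": 0.5, "b": 0.1, "c": 0.2, "d": 0.1, "e": 0.1},
    "b": {"a": 0.1, "b": 0.2, "c": 0.4, "d": 0.1, "e": 0.2},
    "c": {"a": 0.2, "b": 0.1, "c": 0.2, "d": 0.3, "e": 0.2},
}

# Hypothetical device transfer probabilities (not given in the text).
DEVICE_TRANSFER = {
    "A": {"A": 0.2, "B": 0.5, "C": 0.3},
    "B": {"A": 0.3, "B": 0.3, "C": 0.4},
    "C": {"A": 0.4, "B": 0.3, "C": 0.3},
}


def root_of(candidates, transfer):
    """Return the source of the highest-probability edge leaving any candidate."""
    edges = (
        (src, dst, prob)
        for src in candidates
        for dst, prob in transfer[src].items()
    )
    best_src, _best_dst, _best_prob = max(edges, key=lambda edge: edge[2])
    return best_src


root_fault_of_a = root_of(["a", "b", "c"], FAULT_TRANSFER)  # faults seen on device A
root_device = root_of(["A", "B", "C"], DEVICE_TRANSFER)     # devices in the event
root_cause_log = (root_device, root_fault_of_a)
```

With the quoted fault probabilities the largest edge is a->a (0.5), and with the assumed device probabilities the largest edge is A->B (0.5), which reproduces the root cause log {A:a} of the example.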

In one possible design, the log sequence to be processed that remains after anomaly detection and screening may be a log set covering several events, each of which produces its own root cause log and set of associated logs. Separating the logs within a time slice according to the characteristics of the events allows each network event to be analyzed in depth more effectively, so that the root cause log of each event can be inferred and the possible faults of each event predicted. For example, different keywords are selected as indexes for separating logs by event according to the characteristics of the different event types in the network, so as to locate the cause of a fault more accurately; for common network faults, the keywords are usually chosen from fields such as port and Internet Protocol (IP) address.
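Event separation by a key field can be sketched as a simple grouping. The field names `port` and `ip` and the sample logs are hypothetical; which keyword suits which event type is left to the embodiment.

```python
from collections import defaultdict


def separate_events(logs, key_field):
    """Group the logs of one time slice by the value of the chosen key field."""
    events = defaultdict(list)
    for log in logs:
        # Logs lacking the field are collected under a placeholder key.
        events[log.get(key_field, "unknown")].append(log)
    return dict(events)


time_slice = [
    {"host": "A", "port": "GE0/0/1", "summary": "LINK_DOWN"},
    {"host": "B", "port": "GE0/0/2", "summary": "CRC_ERROR"},
    {"host": "B", "port": "GE0/0/1", "summary": "NEIGHBOR_LOST"},
]
events_by_port = separate_events(time_slice, "port")
```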

Step 204, determining the predicted faulty device and the predicted fault information according to the root cause device, the root cause fault and the graph model.

Specifically, for example, on the basis of the root cause analysis result {A:a}, the directed edge with the maximum probability value from device A to other devices is obtained from the device transfer graph. When that edge points to device B, the directed edge with the maximum probability value from fault a to other faults is determined according to the fault transfer graph; when that edge points to fault b, the predicted faulty device is determined to be device B and the predicted fault information is determined to be fault b.
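The prediction step can be sketched as picking the most probable successor in each graph. All probability values below are hypothetical and chosen so that the result matches the device B and fault b case described above. Excluding self-loops follows the wording "to other devices" and "to other faults" and is an interpretation.

```python
def most_likely_next(node, transfer):
    """Return the highest-probability successor other than the node itself."""
    successors = {dst: p for dst, p in transfer[node].items() if dst != node}
    return max(successors, key=successors.get)


# Hypothetical outgoing edges of the root cause device A and root cause fault a.
DEVICE_TRANSFER = {"A": {"A": 0.2, "B": 0.5, "C": 0.3}}
FAULT_TRANSFER = {"a": {"a": 0.3, "b": 0.4, "c": 0.2, "d": 0.1}}

root_device, root_fault = "A", "a"
predicted_device = most_likely_next(root_device, DEVICE_TRANSFER)
predicted_fault = most_likely_next(root_fault, FAULT_TRANSFER)
```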

The processing of steps 201 to 204 preprocesses the real-time log, performs anomaly detection, and then performs fault root cause reasoning and fault prediction on the log according to a graph model based on the topological and transfer relations of the devices, so that the specific information of the fault can be located more accurately.

The graph model is obtained by training on historical logs. Specifically, a historical log training set is acquired; the historical logs in the historical log training set are preprocessed according to the preset processing rule to obtain a historical log sequence; anomaly detection is performed on the historical log sequence according to the preset anomaly detection mechanism to determine a historical log sequence to be processed, where the historical log sequence to be processed is used for recording the abnormal historical log sequence screened out by the anomaly detection mechanism; an equipment transfer graph is determined according to the historical log sequence to be processed and a preset equipment topological graph; and a fault transfer diagram is determined according to the historical log sequence to be processed, the equipment topological graph and the equipment transfer graph. The preprocessing and the anomaly detection may be performed according to the processing of steps 201-202.

After the historical log sequence to be processed is determined, the probability that, when each device has a fault, the next fault occurs in a related device is counted, and a relation graph between the faulty devices is obtained, in which each directed edge corresponds to a probability value. Taking devices A and B in Figs. 6 and 7 as an example, the edge pointing from A to B represents the historical statistical probability that any fault occurring on device A leads to any fault occurring on device B; the edge pointing from A to A represents the historical statistical probability that a fault occurring on device A leads to another fault on device A itself.

Referring to Fig. 8, the fault transfer diagram is then determined. When considering whether two faults are connected, the fault transfer relationship between the two devices in the equipment transfer graph of Fig. 7 must also be taken into account. For example, if device A is topologically connected only to devices B and C, the fault immediately following a given fault on device A can only be a fault on one of the three devices (A | B | C). Taking faults A and B as an example, the edge pointing from A to B represents the historical statistical probability, over all devices (i.e. A/B/C/D/E/F/G), that fault A occurs on a device and fault B then occurs on another device.
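The topology constraint described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the abnormal historical log sequence has already been reduced to a time-ordered list of (device, fault) pairs and that two consecutive entries count as one transfer. The function name build_device_transfer_graph, the device names and the fault labels are hypothetical and are not taken from the embodiment.

```python
from collections import Counter, defaultdict


def build_device_transfer_graph(topology, fault_events):
    """Count device-to-device fault transfers that the topology allows.

    topology: dict mapping a device to the set of devices it is connected to.
    fault_events: time-ordered list of (device, fault) tuples (assumed input form).
    Returns: dict src_device -> {dst_device: historical probability}.
    """
    counts = defaultdict(Counter)
    # Walk over consecutive pairs of fault events.
    for (src, _), (dst, _) in zip(fault_events, fault_events[1:]):
        # The next fault may only occur on the device itself or on a neighbour.
        allowed = topology.get(src, set()) | {src}
        if dst in allowed:
            counts[src][dst] += 1

    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: c / total for dst, c in dsts.items()}
    return graph


# Hypothetical topology: A is connected only to B and C; D is isolated.
topology = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": set()}
events = [("A", "f1"), ("B", "f2"), ("A", "f1"), ("A", "f3"),
          ("D", "f4"), ("A", "f1"), ("C", "f2")]
device_graph = build_device_transfer_graph(topology, events)
```

In this sketch the pair A followed by D is discarded because D is not topologically connected to A, so the outgoing edges of A cover only A, B and C, as in the example in the text.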

For example, the historical frequency with which fault B is the next fault after fault A is counted, and the historical statistical probability value is obtained from that frequency. Suppose the historical counts of the faults that follow fault A are: n for A → B, m for A → C, and k for A → D. Then the historical statistical probability values of the directed edges leaving fault A in these 3 cases are:

P(A → B) = n / (n + m + k), P(A → C) = m / (n + m + k), P(A → D) = k / (n + m + k).

Similarly, the historical statistical probability values of all directed edges of all the devices are obtained by counting the frequency of transfers between devices when faults occur.
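As a worked illustration of the frequency-based probabilities, the following Python sketch normalizes the transfer counts of one fault into edge probabilities. The helper name edge_probabilities and the concrete counts (n = 6, m = 3, k = 1) are assumptions chosen only for the example.

```python
from collections import Counter


def edge_probabilities(next_fault_counts):
    """Turn 'how often each fault followed' counts into edge probabilities."""
    total = sum(next_fault_counts.values())
    if total == 0:
        return {}  # no observed transfers, hence no outgoing edges
    return {fault: count / total for fault, count in next_fault_counts.items()}


# Hypothetical history for fault A: A -> B seen 6 times, A -> C 3 times, A -> D once.
counts_after_A = Counter({"B": 6, "C": 3, "D": 1})
probs_after_A = edge_probabilities(counts_after_A)
```

With these counts the three outgoing edges of fault A carry the values 6/10, 3/10 and 1/10, which sum to 1.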

With the method of steps 201-204, a first real-time log is acquired and preprocessed according to a preset processing rule to obtain a first log sequence, where the first real-time log is used to record real-time change information of the equipment running state and the service state; anomaly detection is performed on the first log sequence according to a preset anomaly detection mechanism to determine a log sequence to be processed, where the log sequence to be processed is used for recording the abnormal log sequence screened out by the anomaly detection mechanism; fault analysis is performed on the log sequence to be processed according to a preset graph model to determine the root cause equipment that has a fault and the root cause fault of the root cause equipment, where the graph model is used for representing the equipment topological relation and transfer relation; and the predicted fault equipment and the predicted fault information are determined according to the root cause equipment, the root cause fault and the graph model. In this way the details of the fault occurrence can be located accurately, laying a foundation for fault root cause diagnosis and rapid service recovery.
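The final prediction step can be sketched as follows. This is only one possible reading: it assumes that prediction selects the highest-probability outgoing edge of the root cause equipment in the equipment transfer graph and of the root cause fault in the fault transfer diagram, which this passage does not state. The function name predict_next and all device and fault labels are hypothetical.

```python
def predict_next(root_device, root_fault, device_graph, fault_graph):
    """Pick the most likely next device and next fault from the graph model.

    device_graph: dict device -> {next_device: probability}
    fault_graph:  dict fault  -> {next_fault: probability}
    Returns (next_device, device_prob, next_fault, fault_prob), or None when
    the root cause has no outgoing edge in either graph.
    """
    device_edges = device_graph.get(root_device, {})
    fault_edges = fault_graph.get(root_fault, {})
    if not device_edges or not fault_edges:
        return None
    # Assumption: the prediction is the highest-probability outgoing edge.
    next_device = max(device_edges, key=device_edges.get)
    next_fault = max(fault_edges, key=fault_edges.get)
    return (next_device, device_edges[next_device],
            next_fault, fault_edges[next_fault])


# Hypothetical graph model fragments.
example_device_graph = {"A": {"A": 0.3, "B": 0.7}}
example_fault_graph = {"link_down": {"bgp_flap": 0.8, "cpu_high": 0.2}}
prediction = predict_next("A", "link_down",
                          example_device_graph, example_fault_graph)
```

The two probabilities are returned separately rather than multiplied, because treating them as independent would be a further assumption.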

Fig. 9 is a schematic structural diagram of a graph model-based fault analysis apparatus according to an exemplary embodiment of the present invention. As shown in Fig. 9, the graph model-based fault analysis apparatus 90 provided in this embodiment includes:

a first processing module 901, configured to obtain a first real-time log, and pre-process the first real-time log according to a preset processing rule to obtain a first log sequence, where the first real-time log is used to record real-time change information of an operating state and a service state of a device;

a second processing module 902, configured to perform anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determine a log sequence to be processed, where the log sequence to be processed is used to record a log sequence including an anomaly log level; the anomaly log level is determined by an anomaly detection mechanism;

a first determining module 903, configured to perform fault analysis on a log sequence to be processed according to a preset graph model, and determine a root cause device that has a fault and a root cause fault of the root cause device, where the graph model is used to represent a device topology relationship and a transfer relationship;

a second determining module 904, configured to determine a predicted failure device and predicted failure information according to the root cause device, the root cause failure, and the graph model.

In one possible design, the first determining module 903 is specifically configured to:

In one possible design, the first processing module 901 is specifically configured to:

In one possible design, the second processing module 902 is specifically configured to:

In one possible design, the first processing module 901 is further configured to:

acquiring a historical log training set;

Fig. 10 is a schematic structural diagram of a fault analysis platform according to an exemplary embodiment of the present invention. As shown in fig. 10, the fault analysis platform 10 provided in this embodiment includes:

a processor 1001; and

a memory 1002 for storing executable instructions of the processor, where the memory may also be a flash memory;

wherein the processor 1001 is configured to perform the steps of the above-described method by executing the executable instructions. For details, reference may be made to the description of the foregoing method embodiments.

Optionally, the memory 1002 may be either independent of the processor 1001 or integrated with it.

When the memory 1002 is a device independent of the processor 1001, the fault analysis platform 10 may further include:

a bus 1003 for connecting the processor 1001 and the memory 1002.

In addition, embodiments of the present application further provide a computer-readable storage medium, in which computer-executable instructions are stored, and when at least one processor of the user equipment executes the computer-executable instructions, the user equipment performs the above-mentioned various possible methods.

Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in a communication device.

Those of ordinary skill in the art will understand that all or a portion of the steps of the above-described method embodiments may be implemented by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above; and the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A fault analysis method based on a graph model is characterized by comprising the following steps:

performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining a log sequence to be processed, wherein the log sequence to be processed is used for recording the anomaly log sequence screened out by the anomaly detection mechanism;

and determining predicted fault equipment and predicted fault information according to the root cause equipment, the root cause fault and the graph model.

2. The method according to claim 1, wherein the performing fault analysis on the log sequence to be processed according to a preset graph model to determine a root cause device with a fault and a root cause fault of the root cause device comprises:

determining a first fault transfer diagram and a first equipment transfer diagram corresponding to the first equipment topological diagram according to the first equipment topological diagram, the fault transfer diagram and the equipment transfer diagram; the equipment transfer graph corresponds to all equipment in the preset network coverage range, and the fault transfer graph corresponds to the equipment transfer graph;

determining the root cause device and the root cause fault according to the first fault transfer diagram and the first equipment transfer diagram.

3. The method according to claim 1, wherein the preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence comprises:

extracting information of the first real-time log according to a preset key field, and determining the first real-time log after information extraction;

according to a preset filtering rule, filtering the first real-time log, and determining the first real-time log after filtering;

4. The method according to claim 3, wherein the performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism to determine a log sequence to be processed comprises:

5. The method of any of claims 1-4, wherein prior to obtaining the first real-time log, further comprising:

acquiring a historical log training set;

performing anomaly detection on the historical log sequence according to a preset anomaly detection mechanism, and determining a historical log sequence to be processed; the historical log sequence to be processed is used for recording the abnormal historical log sequence screened out by the anomaly detection mechanism;

determining the equipment transfer graph according to the historical log sequence to be processed and the preset equipment topological graph;

and determining the fault transfer diagram according to the historical log sequence to be processed, the equipment topological diagram and the equipment transfer diagram.

6. A graph model-based fault analysis apparatus, comprising:

the second processing module is used for performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism and determining a log sequence to be processed, wherein the log sequence to be processed is used for recording the log sequence containing the anomaly log level; the anomaly log level is determined by the anomaly detection mechanism;

7. The apparatus of claim 6, wherein the first determining module is specifically configured to:

8. A fault analysis platform, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the graph model based fault analysis method of any of claims 1-5 via execution of the executable instructions.

9. A storage medium on which a computer program is stored, the program, when executed by a processor, implementing the graph model-based fault analysis method of any one of claims 1 to 5.

10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the graph model-based fault analysis method according to any one of claims 1 to 5.