CN114629776B

CN114629776B - Fault analysis method and device based on graph model

Info

Publication number: CN114629776B
Application number: CN202011453509.3A
Authority: CN
Inventors: 张勉知; 刘惜吾; 程亚锋; 叶晓斌; 陈孟尝; 曾昭才; 张园
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2023-05-30
Anticipated expiration: 2040-12-11
Also published as: CN114629776A

Abstract

The invention provides a fault analysis method and device based on a graph model. The method provided by the embodiment comprises the following steps: acquiring a first real-time log, and preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence; performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining a log sequence to be processed; performing fault analysis on the log sequence to be processed according to a preset graph model, and determining the root cause device that has failed and the root cause fault of the root cause device, wherein the graph model represents the device topological relation and transfer relation; and determining the predicted fault device and the predicted fault information according to the root cause device, the root cause fault and the graph model. With this method, the details of a fault occurrence are located accurately, which lays a foundation for fault root cause diagnosis and rapid service recovery.

Description

Fault analysis method and device based on graph model

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a fault analysis method and apparatus based on a graph model.

Background

With the continuous expansion of network scale, more and more network devices run in different scenarios. During operation, network devices generate system logs, which reflect real-time changes in the running state and service state of the devices, and fault points can be located from these logs. As the number of system logs grows explosively, efficient and accurate fault diagnosis and analysis can no longer rely on an experience base built by experts in the traditional way. Using artificial intelligence (AI) algorithms to investigate hidden dangers and locate faults from system logs has therefore become a research hotspot in the field of mobile communication.

In the prior art, AI-based fault analysis of logs mainly targets single network events, and no complete framework has been formed for mining fault propagation relations from massive logs. For example, faults that occur at the same moment and fall in the intersection of multiple network event logs are difficult to resolve in a timely and accurate manner.

Therefore, how to timely and accurately locate the fault problem of the intersection of multiple network event logs is a problem to be solved.

Disclosure of Invention

The invention provides a fault analysis method based on a graph model, which is used for realizing the accurate positioning of fault occurrence details and laying a foundation for fault root cause diagnosis and service quick recovery.

In a first aspect, the present invention provides a fault analysis method based on a graph model, including:

acquiring a first real-time log, preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence, wherein the first real-time log is used for recording real-time change information of equipment operation state and service state;

performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining a log sequence to be processed, wherein the log sequence to be processed is used for recording the anomaly log sequence screened by the anomaly detection mechanism;

performing fault analysis on the log sequence to be processed according to a preset graph model, and determining root cause equipment with faults and root cause faults of the root cause equipment, wherein the graph model is used for representing equipment topological relation and transfer relation;

and determining the predicted fault equipment and the predicted fault information according to the root cause equipment, the root cause fault and the graph model.

In one possible design, performing fault analysis on the log sequence to be processed according to the preset graph model, and determining the root cause device that has failed and the root cause fault of the root cause device, includes:

the graph model comprises a device topology graph, a fault transfer graph and a device transfer graph;

determining a first equipment topological graph corresponding to the log sequence to be processed according to the log sequence to be processed and the equipment topological graph; the equipment topological graph corresponds to all equipment in a preset network coverage area;

determining a first fault transfer diagram and a first equipment transfer diagram corresponding to the first equipment topological diagram according to the first equipment topological diagram, the fault transfer diagram and the equipment transfer diagram; the equipment transfer diagram corresponds to all equipment in a preset network coverage area, and the fault transfer diagram corresponds to the equipment transfer diagram;

determining the root cause device and the root cause fault according to the first fault transfer graph and the first device transfer graph.

In one possible design, preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence, including:

information extraction is carried out on the first real-time log according to a preset key field, and the first real-time log after information extraction is determined;

according to a preset filtering rule, filtering the first real-time log, and determining the first real-time log after the filtering;

and dividing fault analysis domains and segmenting time sequences of equipment corresponding to the first real-time log according to a preset network planning rule and a clustering algorithm to obtain a first log sequence.

In one possible design, performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining the log sequence to be processed, includes:

and carrying out key field matching processing on the first log sequence according to the preset abnormal log level, and determining a log sequence to be processed containing the abnormal log level.

In one possible design, before the first real-time log is obtained, the method further includes:

acquiring a history log training set;

preprocessing a history log in a history log training set according to a preset processing rule to obtain a history log sequence;

performing anomaly detection on the history log sequence according to a preset anomaly detection mechanism, and determining a history log sequence to be processed; the history log sequence to be processed is used for recording the abnormal history log sequence screened out by the abnormality detection mechanism;

determining a device transfer diagram according to the history log sequence to be processed and a preset device topological diagram;

and determining a fault transfer diagram according to the history log sequence to be processed, the equipment topological diagram and the equipment transfer diagram.

In a second aspect, the present invention provides a fault analysis apparatus based on a graph model, including:

the first processing module is used for acquiring a first real-time log, preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence, and recording real-time change information of equipment operation state and service state by the first real-time log;

the second processing module is used for carrying out anomaly detection on the first log sequence according to a preset anomaly detection mechanism, determining a log sequence to be processed, and the log sequence to be processed is used for recording log sequences containing anomaly log levels; the exception log level is determined by an exception detection mechanism;

the first determining module is used for carrying out fault analysis on the log sequence to be processed according to a preset graph model, determining the root cause equipment with faults and the root cause faults of the root cause equipment, and the graph model is used for representing the topological relation and the transfer relation of the equipment;

and the second determining module is used for determining the predicted fault equipment and the predicted fault information according to the root cause equipment, the root cause fault and the graph model.

In one possible design, the first determining module is specifically configured to:

In one possible design, the first processing module is configured to:

In one possible design, the second processing module is specifically configured to:

In one possible design, the first processing module is further configured to:

acquiring a history log training set;

In a third aspect, the present invention also provides a fault analysis platform, including:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform any of the graph model based fault analysis methods of the first aspect via execution of the executable instructions.

In a fourth aspect, an embodiment of the present invention further provides a storage medium having stored thereon a computer program, which when executed by a processor implements any one of the graph model-based fault analysis methods of the first aspect.

In a fifth aspect, an embodiment of the present invention further provides a computer program product, which includes a computer program, where the program when executed by a processor implements any one of the fault analysis methods based on the graph model in the first aspect.

The invention provides a fault analysis method and device based on a graph model, which are characterized in that a first real-time log is obtained, and is preprocessed according to a preset processing rule to obtain a first log sequence, wherein the first real-time log is used for recording real-time change information of equipment running state and service state; performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining a log sequence to be processed, wherein the log sequence to be processed is used for recording the anomaly log sequence screened by the anomaly detection mechanism; performing fault analysis on the log sequence to be processed according to a preset graph model, and determining root cause equipment with faults and root cause faults of the root cause equipment, wherein the graph model is used for representing equipment topological relation and transfer relation; according to the root cause equipment, the root cause fault and the graph model, the prediction fault equipment and the prediction fault information are determined, so that the fault problem caused by the intersection of a plurality of network event logs can be timely and accurately positioned.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it will be obvious that the drawings in the following description are some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is an application scenario diagram of a graph model-based fault analysis method according to an example embodiment of the present invention;

FIG. 2 is a flow chart of a graph model-based fault analysis method according to an example embodiment of the present invention;

FIG. 3 is a flow chart illustrating log preprocessing in a graph model-based fault analysis method according to an example embodiment of the present invention;

FIG. 4 is a schematic diagram of fault analysis domain partitioning in a graph model-based fault analysis method according to an example embodiment of the present invention;

FIG. 5 is a log sequence segmentation schematic diagram of a graph model-based fault analysis method according to an example embodiment of the present invention;

FIG. 6 is a schematic diagram of a device topology in a graph model-based fault analysis method according to an example embodiment of the present invention;

FIG. 7 is a schematic diagram illustrating a device transition in a graph model-based fault analysis method according to an example embodiment of the present invention;

FIG. 8 is a schematic diagram illustrating failover in a graph model-based failure analysis method according to an example embodiment of the present invention;

FIG. 9 is a schematic diagram of a graph model-based fault analysis apparatus according to an example embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a fault analysis platform according to an example embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The following describes the technical scheme of the present invention and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.

FIG. 1 is an application scenario diagram of a graph model-based fault analysis method according to an example embodiment of the present invention. As shown in FIG. 1, a history log 101 undergoes log preprocessing 103 to obtain a history log sequence, and anomaly detection 104 then determines the abnormal history log sequence to be processed. An inference model 105 is built from the abnormal history log sequence: fault correlation statistics are computed over the abnormal history log sequence according to a preset device topology graph, that is, for each device that fails in the sequence, the probability that each related device fails next is counted, which yields a device transfer graph 106. A fault transfer graph 107 is then determined according to the abnormal history log sequence, the device topology graph and the device transfer graph 106. The device topology graph, the device transfer graph 106 and the fault transfer graph 107 together form the graph model. When fault location is performed on the real-time log 102, log preprocessing 103 is first applied to the real-time log 102 to obtain a real-time log sequence, and anomaly detection 104 is then applied to determine the abnormal log sequence to be processed. Event separation 108 is performed on the abnormal log sequence using selected key fields, and root cause reasoning 109 is performed according to the preset graph model to determine the root cause device that has failed and the root cause fault of that device. Root cause prediction 110 can then be performed according to the root cause device, the root cause fault and the graph model, so as to predict the device that will fail next and the fault information of that device.

Fig. 2 is a flow chart of a graph model-based fault analysis method according to an exemplary embodiment of the present invention, as shown in fig. 2, where the graph model-based fault analysis method provided in the present embodiment includes:

step 201, a first real-time log is obtained, and is preprocessed according to a preset processing rule, so that a first log sequence is obtained, and the first real-time log is used for recording real-time change information of equipment running state and service state.

Specifically, fig. 3 is a schematic flow chart of log preprocessing in a fault analysis method based on a graph model according to an exemplary embodiment of the present invention, and as shown in fig. 3, key field extraction, data filtering and data sequence processing are performed on a first real-time log, and detailed processing procedures are described below.

Because the log formats of different manufacturers and different devices are different, the subsequent unified processing analysis is inconvenient, and therefore, key information in the log is required to be identified according to the log specification, a log structure is constructed, and the key information is converted into a general format. Acquiring a first real-time log, extracting information from the first real-time log according to a preset key field, and determining the first real-time log after information extraction; for example, taking a log of a certain manufacturer device as an example, the log structure after parsing is shown in table one.

Table one

Time stamp | Host name | Module name | Log level | Information abstract | Log identification | Information count | Detailed information

The first real-time log is then filtered according to a preset filtering rule to obtain the filtered first real-time log; the filtering rules are compiled in advance from expert experience. For example, because the volume of raw log data is large, it contains a certain amount of invalid data, such as alarms whose occurrence time is an illegal value or whose source information is undefined. In addition, some logs are irrelevant to the fault root cause being analyzed. The raw logs therefore need to be cleaned to filter out interference items, noise and the like.
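The extraction and filtering steps above can be sketched as follows. This is a minimal illustration only: the pipe-delimited raw line layout, the field names (modelled on table one) and the two filtering rules are assumptions chosen for the example, not the format or rules of any particular manufacturer.

```python
from datetime import datetime
from typing import Optional

# Field names mirror the parsed log structure of table one. The raw layout
# "timestamp|host|module|level|digest|log_id|count|detail" is hypothetical.
FIELDS = ["timestamp", "host", "module", "level", "digest", "log_id", "count", "detail"]


def parse_line(line: str) -> Optional[dict]:
    """Extract the key fields from one raw line; return None if it is malformed."""
    parts = line.strip().split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    try:
        record["timestamp"] = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
        record["level"] = int(record["level"])
    except ValueError:
        return None  # illegal occurrence time or non-numeric level
    return record


def keep(record: Optional[dict]) -> bool:
    """Illustrative filtering rule: drop malformed records and undefined sources."""
    return record is not None and record["host"] not in ("", "undefined")


def preprocess(lines):
    """Parse every line and keep only the records that pass the filter."""
    return [r for r in map(parse_line, lines) if keep(r)]
```

In practice the filtering rules would come from the expert-compiled rule set described in the text; the two checks here only stand in for the "illegal time" and "undefined source" cases.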

Next, the devices corresponding to the first real-time log are divided into fault analysis domains and the logs are segmented into time sequences according to a preset network planning rule and a clustering algorithm, to obtain the first log sequence. Specifically, according to the preset network planning rule, related devices whose topological relation places them in the same minimum management domain are assigned to one fault analysis domain, and the logs of all devices in the domain are sorted by fault occurrence time, which facilitates the subsequent mining of fault propagation relations among the devices. For example, FIG. 4 is a schematic diagram of fault analysis domain division in a graph model-based fault analysis method according to an example embodiment of the present invention. As shown in FIG. 4, aggregation device 1 and aggregation device 2 are shared aggregation devices. Aggregation device 1, aggregation device 2, access device 1, access device 2, access device 3 and access device 4 form access ring 1; aggregation device 1, aggregation device 2, access device 5, access device 6, access device 7 and access device 8 form access ring 2. Apart from the shared aggregation devices 1 and 2, access rings 1 and 2 have no intersection or association, so when a device in one access ring fails, the fault of that ring does not propagate to other rings. Access rings 1 and 2 are therefore independent fault analysis domains.
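The domain division can be sketched as a grouping step followed by a sort. The device names, the `DEVICE_RINGS` mapping and the decision to copy logs of the shared aggregation devices into both rings are assumptions made for illustration; the text does not specify how shared devices are handled.

```python
from collections import defaultdict

# Hypothetical device-to-ring mapping modelled on FIG. 4; the two aggregation
# devices are shared, so they are listed under both access rings.
DEVICE_RINGS = {
    "agg1": ["ring1", "ring2"],
    "agg2": ["ring1", "ring2"],
    **{f"acc{i}": ["ring1"] for i in range(1, 5)},
    **{f"acc{i}": ["ring2"] for i in range(5, 9)},
}


def split_into_domains(records, device_rings=DEVICE_RINGS):
    """Group log records by fault analysis domain and sort each group by time."""
    domains = defaultdict(list)
    for rec in records:
        # Logs from unknown devices are skipped; shared devices land in each ring.
        for ring in device_rings.get(rec["host"], []):
            domains[ring].append(rec)
    for recs in domains.values():
        recs.sort(key=lambda r: r["timestamp"])
    return dict(domains)
```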

A clustering algorithm, such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), is then used to derive maximal density-connected sets of log samples from the density-reachability relation. Algorithm parameters such as the neighborhood distance threshold and the minimum sample count threshold are adjusted iteratively according to evaluation measures such as the silhouette coefficient, the number of clusters and the actual segmentation effect, until suitable values are determined. For example, FIG. 5 is a log sequence segmentation schematic diagram in a graph model-based fault analysis method according to an example embodiment of the present invention. As shown in FIG. 5, each dot can be regarded as an abstraction of one log, and the logs are arranged in the order in which they were printed. According to the DBSCAN algorithm, the logs inside one circle are grouped into one cluster, which segments the logs into time sequences and separates logs from different time periods, so that the root cause device and the root cause fault can subsequently be located more accurately.
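To make the segmentation step concrete, the following is a small pure-Python DBSCAN over one-dimensional timestamps. It is a sketch under simplifying assumptions: timestamps are plain numbers (for example, seconds), the neighbourhood search is a brute-force scan, and the parameter names `eps` and `min_samples` are chosen here for illustration. A production system would more likely use a library implementation and tune the parameters as described above.

```python
def dbscan_1d(times, eps, min_samples):
    """Label each timestamp with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(times)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if abs(times[j] - times[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_samples:
                queue.extend(nb)  # j is a core point, so keep expanding
    return labels
```

Each distinct non-negative label then corresponds to one time-sequence segment; logs labelled -1 are isolated in time and can be handled separately.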

Step 202, performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining a log sequence to be processed, wherein the log sequence to be processed is used for recording the anomaly log sequence screened by the anomaly detection mechanism.

Specifically, key field matching is performed on the first log sequence according to a preset abnormal log level, and a log sequence to be processed that contains the abnormal log level is determined. For example, manufacturers define standardized log levels for their devices. Taking one manufacturer as an example, log levels 0 (Emergency), 1 (Alert), 2 (Critical) and 3 (Error) respectively denote extremely urgent errors, errors that must be corrected immediately, relatively serious errors, and ordinary errors. One symptom of a device fault is that a large number of high-severity logs, i.e., levels 0-3, are generated. On this basis, the log level is selected as the key field that triggers anomaly detection: when the first log sequence contains logs of these levels, it is determined to be a log sequence to be processed, i.e., a sequence in which a fault has most likely occurred.
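A minimal sketch of this level-based screening is shown below. It assumes each log record is a dictionary with an integer `level` field and that a log sequence is a list of such records; both are illustrative choices rather than a prescribed data model.

```python
# Levels 0-3 are treated as abnormal, following the example in the text.
ABNORMAL_LEVELS = frozenset({0, 1, 2, 3})


def select_pending_sequences(sequences, abnormal_levels=ABNORMAL_LEVELS):
    """Keep only the sequences that contain at least one abnormal-level log."""
    return [
        seq for seq in sequences
        if any(rec["level"] in abnormal_levels for rec in seq)
    ]
```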

Step 203, performing fault analysis on the log sequence to be processed according to a preset graph model, and determining the root cause equipment with faults and the root cause faults of the root cause equipment, wherein the graph model is used for representing the topological relation and the transfer relation of the equipment;

Specifically, the graph model includes a device topology graph, a fault transfer graph and a device transfer graph. A first device topology graph corresponding to the log sequence to be processed is determined according to the log sequence to be processed and the device topology graph; the device topology graph corresponds to all devices in a preset network coverage area. A first fault transfer graph and a first device transfer graph corresponding to the first device topology graph are determined according to the first device topology graph, the fault transfer graph and the device transfer graph; the device transfer graph corresponds to all devices in the preset network coverage area, and the fault transfer graph corresponds to the device transfer graph. The root cause device and the root cause fault are then determined from the first fault transfer graph and the first device transfer graph.

For example, FIG. 6 is a schematic diagram of a device topology in a graph model-based fault analysis method according to an example embodiment of the present invention. As shown in FIG. 6, the first device topology graph corresponding to the log sequence to be processed contains 7 devices: device A, device B, device C, device D, device E, device F and device G. The device transfer graph corresponding to the topology of these 7 devices is shown in FIG. 7, which is a schematic diagram of device transfer in the graph model-based fault analysis method according to an example embodiment of the present invention. The fault transfer graph corresponding to the topology of the 7 devices is shown in FIG. 8, which is a schematic diagram of fault transfer in the graph model-based fault analysis method according to an example embodiment of the present invention. The abnormal log set of the log sequence to be processed is {A:a, A:b, A:c, B:a, B:b, B:c, B:d, B:e, C:d, C:e}; that is, the network event generated 10 logs in total: device A reported faults a, b and c; device B reported faults a, b, c, d and e; and device C reported faults d and e. Fault analysis is performed on the 3 devices according to the fault transfer graph of FIG. 8. For device A, the probability values of the directed edges of its fault set {a, b, c} are compared. For example, suppose the probability values of the directed edges are: a->a 0.5, a->b 0.1, a->c 0.2, a->d 0.1, a->e 0.1; b->a 0.1, b->b 0.2, b->c 0.4, b->d 0.1, b->e 0.2; c->a 0.2, c->b 0.1, c->c 0.2, c->d 0.3, c->e 0.2. The directed edge with the highest probability value is a->a, so the root cause fault of device A is determined to be fault a.
Similarly, root cause fault a is determined for device B and root cause fault e for device C. The fault transfer relation between devices is then analyzed according to the device transfer graph of FIG. 7 by comparing the probability values of the directed edges of the device set {A, B, C}; if the directed edge with the maximum probability value is A->B, the root cause device is determined to be device A. In summary, the root cause log of the network event is {A:a}; that is, the root cause device is device A, the root cause fault is fault a, and the other 9 logs are associated logs.
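The selection rule used in this worked example (take the source of the highest-probability directed edge) can be sketched as below. The fault edge probabilities are the ones quoted in the example; the device edge probabilities are hypothetical, since the text only states the conditional case where A->B is the maximum. The helper name `root_of` and the rule of considering every edge that leaves a candidate node are illustrative readings of the example, not a definitive statement of the patented procedure.

```python
def root_of(candidates, edges):
    """Return the source node of the highest-probability directed edge
    whose source is one of the candidate nodes."""
    best_edge = max(
        (edge for edge in edges if edge[0] in candidates),
        key=lambda edge: edges[edge],
    )
    return best_edge[0]


# Fault transfer probabilities quoted in the example for faults a, b and c.
FAULT_EDGES = {
    ("a", "a"): 0.5, ("a", "b"): 0.1, ("a", "c"): 0.2, ("a", "d"): 0.1, ("a", "e"): 0.1,
    ("b", "a"): 0.1, ("b", "b"): 0.2, ("b", "c"): 0.4, ("b", "d"): 0.1, ("b", "e"): 0.2,
    ("c", "a"): 0.2, ("c", "b"): 0.1, ("c", "c"): 0.2, ("c", "d"): 0.3, ("c", "e"): 0.2,
}

# Hypothetical device transfer probabilities, chosen so that A->B is the maximum.
DEVICE_EDGES = {("A", "B"): 0.6, ("B", "C"): 0.3, ("C", "A"): 0.1}

root_fault = root_of({"a", "b", "c"}, FAULT_EDGES)    # fault analysis for device A
root_device = root_of({"A", "B", "C"}, DEVICE_EDGES)  # device analysis
```

With these inputs the sketch selects fault a for device A and device A as the root cause device, matching the {A:a} result stated in the text.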

In one possible design, the log sequence to be processed obtained after anomaly detection may be a log set covering multiple events, each of which generates its own root cause log and associated log set. Separating the logs within a time slice according to event characteristics allows each network event to be analyzed in depth, so that the root cause log of each event is inferred and the faults that each event may cause are predicted. For example, for different types of events in a network, different keywords are selected as the index for log separation so as to locate the fault source more precisely; for common network faults, the keywords are typically the port, the Internet Protocol (IP) address, and the like.
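One simple way to picture event separation is to bucket the logs of a time slice by the values of the selected key fields. The field names `port` and `ip` and the exact-match grouping rule are assumptions for illustration; the text does not specify how keywords are matched.

```python
from collections import defaultdict


def separate_events(records, key_fields=("port", "ip")):
    """Group logs that share the same values for the selected key fields."""
    events = defaultdict(list)
    for rec in records:
        # Missing fields become None, so such logs form their own bucket.
        key = tuple(rec.get(field) for field in key_fields)
        events[key].append(rec)
    return dict(events)
```

Each resulting bucket can then be passed through root cause reasoning separately.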

Step 204, according to the root cause device, the root cause fault and the graph model, the predicted fault device and the predicted fault information are determined.

Specifically, for example, on the basis of the root cause analysis result {A:a}, the directed edge with the maximum probability value from device A to other devices is obtained from the device transfer graph. When that edge points to device B, the directed edge with the maximum probability value from fault a to other faults is determined from the fault transfer graph. When that edge points to fault b, the predicted fault device is device B and the predicted fault information is fault b.
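The prediction step can be sketched as two lookups of the most probable outgoing edge, one in the device transfer graph and one in the fault transfer graph. All probability values below are hypothetical and were chosen only to reproduce the case described in the text, where device A points to device B and fault a points to fault b; self-loops are excluded because the text speaks of edges pointing to other devices and other faults.

```python
def most_likely_next(node, edges):
    """Return the target of the highest-probability edge from `node`
    to a different node, or None if no such edge exists."""
    outgoing = {dst: p for (src, dst), p in edges.items() if src == node and dst != node}
    return max(outgoing, key=outgoing.get) if outgoing else None


def predict(root_device, root_fault, device_edges, fault_edges):
    """Predict the next faulty device and its fault from the root cause."""
    return (
        most_likely_next(root_device, device_edges),
        most_likely_next(root_fault, fault_edges),
    )


# Hypothetical transfer probabilities for illustration only.
DEVICE_EDGES = {("A", "A"): 0.2, ("A", "B"): 0.6, ("A", "C"): 0.2}
FAULT_EDGES = {("a", "a"): 0.3, ("a", "b"): 0.5, ("a", "c"): 0.2}

predicted = predict("A", "a", DEVICE_EDGES, FAULT_EDGES)
```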

The processing method of steps 201-204 can more accurately locate specific information of faults by preprocessing the real-time log, detecting abnormality, and then carrying out fault root cause reasoning and fault prediction on the log according to a graph model based on the topological relation and the transfer relation of the equipment.

The graph model is obtained by training on history logs. Specifically, a history log training set is acquired; the history logs in the training set are preprocessed according to the preset processing rule to obtain a history log sequence; anomaly detection is performed on the history log sequence according to the preset anomaly detection mechanism to determine a history log sequence to be processed, which records the abnormal history log sequence screened out by the anomaly detection mechanism; a device transfer graph is determined according to the history log sequence to be processed and a preset device topology graph; and a fault transfer graph is determined according to the history log sequence to be processed, the device topology graph and the device transfer graph. For the preprocessing and anomaly detection, refer to steps 201-202. After the history log sequence to be processed is determined, the probability that each related device fails next when a given device fails is counted, which yields a relation graph among the faulty devices in which each directed edge carries a probability value. Taking devices A and B in FIG. 6 and FIG. 7 as an example, the edge from A to B represents the historical statistical probability that any fault of device A leads to any fault of device B; the edge from A to itself represents the historical statistical probability that a fault of device A leads to another fault of device A. The fault transfer graph of FIG. 8 is determined in combination with the device transfer graph: when deciding whether two faults are connected, the transfer relation between the two corresponding devices in the device transfer graph of FIG. 7 is taken into account.
For example, if device A is topologically connected only to devices B and C, then the fault following a fault of device A can only occur in one of the three devices (A|B|C). Taking faults a and b as an example, the edge from a to b represents the historical statistical probability that fault a on any device (i.e., A/B/C/D/E/F/G) leads to fault b on another device.

For example, when fault A occurs, the historical frequency with which fault B occurs next is counted, and a historical statistical probability value is obtained from that frequency. Suppose the historical frequencies with which fault A is followed by other faults are: n occurrences of A-B, m occurrences of A-C, and k occurrences of A-D. The historical statistical probability values on the directed edges leaving fault A for these 3 cases are as follows:
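Assuming that A-B, A-C and A-D are the only transitions observed after fault A (an assumption; the text lists no others), the relative-frequency estimates for the three directed edges would be:

```latex
P(A \to B) = \frac{n}{n + m + k}, \qquad
P(A \to C) = \frac{m}{n + m + k}, \qquad
P(A \to D) = \frac{k}{n + m + k}
```

Under this reading the three values sum to 1, so the outgoing edges of fault A form a probability distribution over the next fault.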

Similarly, the historical statistical probability value on each directed edge of each device is obtained by counting the transition frequencies among the devices when faults occur.
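The counting procedure described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `build_transfer_graph`, the list-of-lists input format and the sample data are assumptions made for illustration. It counts consecutive transitions in the abnormal history sequences, optionally discards transitions between devices that are not topologically connected, and normalizes the counts into edge probabilities.

```python
from collections import Counter, defaultdict


def build_transfer_graph(sequences, topology=None):
    """Estimate directed-edge probabilities from time-ordered sequences.

    sequences: list of lists of node labels (device names or fault names).
    topology:  optional dict mapping a node to the set of nodes it is
               connected to. When given, a transition to an unconnected
               node is ignored; a node may always follow itself.
    (Both input shapes are hypothetical, chosen for this sketch.)
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each item with its immediate successor in the sequence.
        for src, dst in zip(seq, seq[1:]):
            if topology is not None and dst != src \
                    and dst not in topology.get(src, set()):
                continue
            counts[src][dst] += 1

    # Normalize the outgoing counts of every node into probabilities.
    graph = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        graph[src] = {dst: c / total for dst, c in outgoing.items()}
    return graph


# Hypothetical abnormal history sequences of failing devices.
history = [["A", "B", "C"], ["A", "B"], ["A", "A", "C"], ["A", "D"]]
# Device A is connected only to B and C, so the A -> D transition is dropped.
topology = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
device_graph = build_transfer_graph(history, topology)
```

The same function could be applied to sequences of fault names to obtain a fault transfer graph, although the source additionally conditions that graph on the device transfer graph, which this sketch does not model.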

Using the method of steps 201-204, a first real-time log is acquired and preprocessed according to the preset processing rule to obtain a first log sequence, where the first real-time log records real-time change information of the device running state and service state. Anomaly detection is performed on the first log sequence according to the preset anomaly detection mechanism to determine a log sequence to be processed, which records the abnormal log sequences screened out by the anomaly detection mechanism. Fault analysis is performed on the log sequence to be processed according to the preset graph model to determine the faulty root cause device and the root cause fault of that device, where the graph model represents the device topology relation and the transfer relation. The predicted fault device and predicted fault information are then determined according to the root cause device, the root cause fault and the graph model. In this way, the details of the fault occurrence are accurately located, laying a foundation for fault root cause diagnosis and rapid service recovery.
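The root cause and prediction steps can be sketched as follows. The source does not spell out the exact selection rule, so this sketch assumes one plausible reading: the root cause is the observed node whose outgoing edges to the other observed nodes carry the largest total probability, and the predicted fault device is the most probable successor of the root cause that has not yet appeared in the log sequence. The helper names `restrict`, `pick_root` and `predict_next` and the sample graph are hypothetical.

```python
def restrict(graph, nodes):
    """Keep only the edges whose two endpoints both appear in `nodes`."""
    nodes = set(nodes)
    return {
        src: {dst: p for dst, p in out.items() if dst in nodes}
        for src, out in graph.items()
        if src in nodes
    }


def pick_root(sub_graph, nodes):
    """Return the node with the largest total outgoing probability
    towards the other observed nodes (assumed selection rule)."""
    def score(node):
        return sum(p for dst, p in sub_graph.get(node, {}).items()
                   if dst != node)
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(set(nodes)), key=score)


def predict_next(graph, root, observed):
    """Return the most probable not-yet-observed successor of `root`
    and its probability, or (None, 0.0) if there is none."""
    seen = set(observed)
    candidates = {dst: p for dst, p in graph.get(root, {}).items()
                  if dst not in seen}
    if not candidates:
        return None, 0.0
    best = max(sorted(candidates), key=candidates.get)
    return best, candidates[best]


# Hypothetical device transfer graph learned from history logs.
device_graph = {
    "A": {"B": 0.5, "C": 0.3, "D": 0.2},
    "B": {"C": 0.7, "D": 0.3},
    "C": {"D": 1.0},
}
# Devices that appear in the log sequence to be processed.
observed_devices = ["B", "A", "C"]

first_device_graph = restrict(device_graph, observed_devices)
root_device = pick_root(first_device_graph, observed_devices)
predicted_device, probability = predict_next(
    device_graph, root_device, observed_devices)
```

With this sample data, device A is selected as the root cause device and device D as the predicted fault device. Under the same assumptions, `pick_root` could be applied to the fault set of a single device together with the restricted fault transfer graph to select a root cause fault.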

FIG. 9 is a schematic diagram of a graph model-based fault analysis apparatus according to an example embodiment of the present invention; as shown in fig. 9, the fault analysis apparatus 90 based on the graph model provided in the present embodiment includes:

the first processing module 901 is configured to obtain a first real-time log, and pre-process the first real-time log according to a preset processing rule to obtain a first log sequence, where the first real-time log is used for recording real-time change information of an equipment running state and a service state;

the second processing module 902 is configured to perform anomaly detection on the first log sequence according to a preset anomaly detection mechanism and determine a log sequence to be processed, where the log sequence to be processed is used to record a log sequence containing an anomaly log level; the anomaly log level is determined by the anomaly detection mechanism;

the first determining module 903 is configured to perform fault analysis on the log sequence to be processed according to a preset graph model, determine a root cause device with a fault and a root cause fault of the root cause device, where the graph model is used to represent a device topology relationship and a transfer relationship;

a second determining module 904, configured to determine a predicted failure device and predicted failure information according to the root cause device, the root cause failure, and the graph model.

In one possible design, the first determining module 903 is specifically configured to:

In one possible design, the first processing module 901 is specifically configured to:

In one possible design, the second processing module 902 is specifically configured to:

In one possible design, the first processing module 901 is further configured to:

acquiring a history log training set;

Fig. 10 is a schematic structural diagram of a fault analysis platform according to an exemplary embodiment of the present invention. As shown in fig. 10, the fault analysis platform 10 provided in this embodiment includes:

a processor 1001; and

a memory 1002 for storing executable instructions of the processor, where the memory may also be a flash memory;

wherein the processor 1001 is configured to perform the steps of the above-described method via execution of executable instructions. Reference may be made in particular to the description of the embodiments of the method described above.

Alternatively, the memory 1002 may be separate or integrated with the processor 1001.

When the memory 1002 is a device separate from the processor 1001, the fault analysis platform 10 may further include:

a bus 1003 for connecting the processor 1001 and the memory 1002.

In addition, an embodiment of the application further provides a computer-readable storage medium storing computer-executable instructions; when at least one processor of a user equipment executes the computer-executable instructions, the user equipment performs the methods described above.

Computer-readable media include computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in a user device. The processor and the storage medium may also reside as discrete components in a communication device.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A graph model-based fault analysis method, comprising:

acquiring a first real-time log, preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence, wherein the first real-time log is used for recording real-time change information of equipment running state and service state;

determining predicted fault equipment and predicted fault information according to the root cause equipment, the root cause fault and the graph model;

the fault analysis is performed on the log sequence to be processed according to a preset graph model, and the determination of the root cause equipment with fault and the root cause fault of the root cause equipment comprises the following steps:

the graph model comprises a device topology graph, a fault transfer graph and a device transfer graph;

determining a first equipment topological graph corresponding to the log sequence to be processed according to the log sequence to be processed and the equipment topological graph; the device topological graph corresponds to all devices in a preset network coverage area;

determining a first fault transfer graph and a first device transfer graph corresponding to the first device topology graph according to the first device topology graph, the fault transfer graph and the device transfer graph; the device transfer graph corresponds to all devices within the preset network coverage area, and the fault transfer graph corresponds to the device transfer graph;

determining the root cause device and the root cause fault according to the first fault transfer graph and the first device transfer graph;

the method for determining root cause equipment and root cause faults according to the first fault transfer diagram and the first equipment transfer diagram specifically comprises the following steps:

determining the root cause fault of any one device according to the fault set of the any one device and the probability value of each directed edge of each fault in the first fault transfer graph aiming at any one device contained in the log sequence to be processed;

according to the probability values of the directed edges of the devices contained in the log sequence to be processed and the devices in the first device transfer diagram, determining the device with the maximum probability value directed to the directed edge as the root cause device;

the first device transfer graph is obtained by counting the probability that the next fault occurs in the related device when each device fails.

2. The method of claim 1, wherein preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence comprises:

and dividing fault analysis domains and segmenting time sequences of the equipment corresponding to the first real-time log according to a preset network planning rule and a clustering algorithm to obtain a first log sequence.

3. The method according to claim 2, wherein the performing anomaly detection on the first log sequence according to a preset anomaly detection mechanism, and determining a log sequence to be processed includes:

and carrying out key field matching processing on the first log sequence according to a preset abnormal log level, and determining a log sequence to be processed containing the abnormal log level.

4. A method according to any one of claims 1-3, wherein prior to said obtaining the first real-time log, further comprising:

acquiring a history log training set;

preprocessing the history logs in the history log training set according to a preset processing rule to obtain a history log sequence;

performing anomaly detection on the history log sequence according to a preset anomaly detection mechanism, and determining a history log sequence to be processed; the history log sequence to be processed is used for recording the abnormal history log sequence screened out by the abnormal detection mechanism;

determining the equipment transfer diagram according to the history log sequence to be processed and the preset equipment topological diagram;

and determining the fault transfer diagram according to the history log sequence to be processed, the equipment topological diagram and the equipment transfer diagram.

5. A graph model-based fault analysis apparatus, comprising:

the first processing module is used for acquiring a first real-time log and preprocessing the first real-time log according to a preset processing rule to obtain a first log sequence, wherein the first real-time log is used for recording real-time change information of equipment running state and service state;

the second processing module is used for carrying out anomaly detection on the first log sequence according to a preset anomaly detection mechanism and determining a log sequence to be processed, wherein the log sequence to be processed is used for recording a log sequence containing the anomaly log level; the anomaly log level is determined by the anomaly detection mechanism;

the first determining module is used for carrying out fault analysis on the log sequence to be processed according to a preset graph model, determining root cause equipment with faults and root cause faults of the root cause equipment, wherein the graph model is used for representing equipment topological relation and transfer relation;

the second determining module is used for determining prediction fault equipment and prediction fault information according to the root cause equipment, the root cause fault and the graph model; the first determining module is specifically configured to:

the first determining module is further configured to:

6. The apparatus of claim 5, wherein the first determining module is further configured to:

7. The apparatus of claim 6, wherein the second determining module is specifically configured to:

8. A fault analysis platform, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the graph model-based fault analysis method of any one of claims 1 to 4 via execution of the executable instructions.

9. A storage medium having stored thereon a computer program, which when executed by a processor implements the graph model based fault analysis method of any one of claims 1 to 4.