CN111585799A - Network fault prediction model establishing method and device - Google Patents

Network fault prediction model establishing method and device

Info

Publication number
CN111585799A
Authority
CN
China
Prior art keywords
network
fault
log
log data
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010354291.XA
Other languages
Chinese (zh)
Inventor
杨印州
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou DPTech Technologies Co Ltd
Original Assignee
Hangzhou DPTech Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou DPTech Technologies Co Ltd filed Critical Hangzhou DPTech Technologies Co Ltd
Priority to CN202010354291.XA priority Critical patent/CN111585799A/en
Publication of CN111585799A publication Critical patent/CN111585799A/en
Withdrawn legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/069Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0604Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H04L41/0622Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time based on time
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/147Network analysis or design for predicting network behaviour

Abstract

The disclosure relates to a network fault prediction model establishing method and device, electronic equipment and a computer readable medium. The method comprises the following steps: acquiring a plurality of log information of a plurality of network devices in a network system; preprocessing the plurality of log information based on the alarm information characteristics in the log information to generate a plurality of log data; dividing the plurality of log data into a plurality of log data sets according to a preset time window; and establishing a fault prediction model by a classification prediction method based on the plurality of log data sets, wherein the fault prediction model is used for predicting the overall network fault or the fault equipment. The network fault prediction model establishing method, the network fault prediction model establishing device, the electronic equipment and the computer readable medium can improve the reliability of a network system, reduce the loss caused by network equipment faults and improve the efficiency of network management.

Description

Network fault prediction model establishing method and device
Technical Field
The disclosure relates to the field of computer information processing, and in particular relates to a network fault prediction model establishing method and device, electronic equipment and a computer readable medium.
Background
With the spread of the internet and ever-faster broadband, and with the growth of enterprise networking and internet applications, a single computer system can no longer meet today's diverse and ubiquitous network application requirements. As new network technologies such as cloud computing and the internet of things are adopted, computer networks are becoming larger in scale and more complex in structure. As more and more devices are connected to the network, failures inevitably occur.
A network device can be divided broadly into hardware and software, and network faults can correspondingly be divided into two categories: hardware faults and software faults. Hardware faults include line faults such as electromagnetic interference, port faults such as loose plugs, faults of hubs or routers, physical faults of a host, and the like. Software faults include router logic faults such as configuration errors or critical processes or ports being shut down, host logic faults such as a missing network card driver, and the like. Other types of faults also exist. These software and hardware faults seriously affect the reliability of the network system and disrupt people's daily life and work. In some special environments, such as traffic management systems, aircraft navigation systems, and military weapon systems, the reliability of the network system is critical. During the execution of a critical task, even a slight network fault may cause significant loss, such as the loss of important information or the failure of the task. There is therefore an urgent need to reduce the impact of such network faults on system reliability.
Therefore, a new network failure prediction model building method, device, electronic device and computer readable medium are needed.
The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for building a network failure prediction model, an electronic device, and a computer readable medium, which can improve the reliability of a network system, reduce loss caused by network device failure, and improve the efficiency of network management.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, a method for building a network fault prediction model is provided, where the method includes: acquiring a plurality of log information of a plurality of network devices in a network system; preprocessing the plurality of log information based on the alarm information characteristics in the log information to generate a plurality of log data; dividing the plurality of log data into a plurality of log data sets according to a preset time window; and establishing a fault prediction model by a classification prediction method based on the plurality of log data sets, wherein the fault prediction model is used for predicting the overall network fault or the fault equipment.
In an exemplary embodiment of the present disclosure, the method further includes: acquiring real-time log information in the network system; and inputting the real-time log information into the fault prediction model, and outputting the overall network fault probability or the identifier of the faulty device.
In an exemplary embodiment of the present disclosure, preprocessing the plurality of log information based on alarm information features in the log information to generate a plurality of log data includes: filtering out derived alarm information in the plurality of log information; and/or filtering the alarm information of which the time interval is smaller than a preset value in the plurality of log information; and/or filtering out repeated alarm information in the plurality of log information based on a time threshold.
In an exemplary embodiment of the present disclosure, filtering out repeated alarm information in the plurality of log information based on a time threshold includes: sorting the alarm information by occurrence time and category; determining, based on the sorting and according to at least one time threshold, whether alarm information of a given category is repeated information; and deleting the alarm information of that category when it is determined to be repeated information.
In an exemplary embodiment of the present disclosure, dividing the plurality of log data into a plurality of log data sets according to a preset time window includes: determining the ranges of an observation time window, a prediction time window and a current time window; and dividing the plurality of log data into an observation time window log data set, a prediction time window log data set and a current time window log data set based on the observation time window, the prediction time window and the current time window.
In an exemplary embodiment of the present disclosure, establishing a fault prediction model by a classification prediction method based on the plurality of log data sets includes: establishing a network fault prediction model for predicting the overall network fault through a classification prediction method based on the plurality of log data sets; and/or establishing an equipment fault prediction model for predicting the fault equipment by a classification prediction method based on the plurality of log data sets.
In an exemplary embodiment of the present disclosure, building a network failure prediction model for predicting a network overall failure by a classification prediction method based on the plurality of log data sets includes: extracting network alarm information related to network faults from an observation time window log data set; extracting network fault information related to the network fault from the prediction time window log data set; and establishing the network fault prediction model based on the network alarm information, the network fault information and a classification prediction method.
In an exemplary embodiment of the present disclosure, building an equipment failure prediction model for predicting a failed equipment by a classification prediction method based on the plurality of log data sets includes: extracting equipment alarm information related to equipment faults from an observation time window log data set; extracting equipment fault information related to equipment faults from a prediction time window log data set; and establishing the equipment fault prediction model based on the equipment alarm information, the equipment fault information and the classification prediction method.
In an exemplary embodiment of the present disclosure, the classification prediction method includes: a rule-based repeated incremental pruning (RIPPER) classification algorithm; and/or a Bayesian network classification algorithm; and/or a decision-tree-based random forest algorithm.
According to an aspect of the present disclosure, a network failure prediction model establishing apparatus is provided, the apparatus including: a log module, configured to acquire a plurality of log information of a plurality of network devices in a network system; a processing module, configured to preprocess the plurality of log information based on the alarm information characteristics in the log information to generate a plurality of log data; a collection module, configured to divide the plurality of log data into a plurality of log data sets according to a preset time window; and a model module, configured to establish a fault prediction model by a classification prediction method based on the plurality of log data sets, the fault prediction model being used for predicting the overall network fault or the faulty device.
In an exemplary embodiment of the present disclosure, the apparatus further includes: a real-time module, configured to acquire real-time log information in the network system; and a prediction module, configured to input the real-time log information into the fault prediction model and output the overall network fault probability or the identifier of the faulty device.
According to an aspect of the present disclosure, an electronic device is provided, the electronic device including: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to an aspect of the disclosure, a computer-readable medium is proposed, on which a computer program is stored, which program, when being executed by a processor, carries out the method as above.
According to the network fault prediction model establishing method, the network fault prediction model establishing device, the electronic equipment and the computer readable medium, the plurality of log information are preprocessed based on the alarm information characteristics in the log information to generate a plurality of log data; dividing the plurality of log data into a plurality of log data sets according to a preset time window; and establishing a fault prediction model based on the plurality of log data sets through a classification prediction method, wherein the fault prediction model is used for predicting the whole network fault or fault equipment, so that the reliability of a network system can be improved, the loss caused by the network equipment fault can be reduced, and the network management efficiency can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely some embodiments of the present disclosure, and other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.
Fig. 1 is a system block diagram illustrating a network failure prediction model building method and apparatus according to an exemplary embodiment.
FIG. 2 is a flow chart illustrating a method of network fault prediction model building in accordance with an exemplary embodiment.
Fig. 3 is a schematic diagram illustrating a network failure prediction model building method according to an example embodiment.
Fig. 4 is a flow chart illustrating a method of network fault prediction model establishment according to another exemplary embodiment.
Fig. 5 is a flow chart illustrating a method of network fault prediction model establishment according to another exemplary embodiment.
Fig. 6 is a block diagram illustrating a network failure prediction model creation apparatus in accordance with an example embodiment.
FIG. 7 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 8 is a block diagram illustrating a computer-readable medium in accordance with an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another. Thus, a first component discussed below may be termed a second component without departing from the teachings of the disclosed concept. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It is to be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present disclosure and are, therefore, not intended to limit the scope of the present disclosure.
Fig. 1 is a system block diagram illustrating a network failure prediction model building method, apparatus, electronic device, and computer readable medium according to an example embodiment.
As shown in fig. 1, system architecture 10 may include network devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between network devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
Network devices 101, 102, 103 may interact with server 105 over network 104 to receive or transmit log information and the like. Various data transceiving applications may be installed on the network devices 101, 102, 103. A large amount of log information is generated during the operation of the network device. The log information may include: run logs, fault logs, debug logs, and the like.
When a network device fails, log analysis is an important part of system fault analysis. A system administrator needs to collect, summarize, and analyze the device operation logs; through log analysis, user operations can be effectively monitored, external attacks detected, system performance bottlenecks found, and faults in the system diagnosed.
The server 105 may be a server that provides various services, such as a background management server that analyzes log information generated by the network devices 101, 102, 103. The server 105 may, for example, obtain a plurality of log information for a plurality of network devices in the network system; the server 105 may pre-process the plurality of log information to generate a plurality of log data, for example, based on alarm information characteristics in the log information; the server 105 may divide the plurality of log data into a plurality of log data sets according to a preset time window, for example; server 105 may build a fault prediction model for predicting a network overall fault or a faulty device, for example, by a classification prediction method based on the plurality of log data sets.
The server 105 may also, for example, obtain real-time log information in the network system, input the real-time log information into the fault prediction model, and output the overall network fault probability or the identifier of the faulty device.
The server 105 may be a single physical server or may be composed of a plurality of servers. It should be noted that the network failure prediction model establishing method provided by the embodiments of the present disclosure may be executed by the server 105, and accordingly, the network failure prediction model establishing apparatus may be disposed in the server 105.
FIG. 2 is a flow chart illustrating a method of network fault prediction model building in accordance with an exemplary embodiment. The network failure prediction model building method 20 includes at least steps S202 to S208.
As shown in fig. 2, in S202, a plurality of log information of a plurality of network devices in the network system is acquired.
In S204, the plurality of log information is preprocessed based on the alarm information features in the log information to generate a plurality of log data. This may include, for example: filtering out derived alarm information in the plurality of log information; and/or filtering out alarm information whose time interval is smaller than a preset value in the plurality of log information; and/or filtering out repeated alarm information in the plurality of log information based on a time threshold.
The alarm information in the log information mainly comprises the following fields:
Level: indicates the urgency of the alarm information. From low to high, the levels are: prompt, secondary, important, and urgent. In the study herein, the goal of fault prediction is to predict the occurrence of urgent-level alarms.
Name: describes the specific type of the alarm information, mainly including: link down, OSPF interface state change, OSPF neighbor state change, device offline, and the like.
Alarm source: indicates the number of the specific device that raised the alarm. Devices are numbered from 0, for example device 0, device 1, and device 86. In fault prediction, if the specific device that may experience an urgent-level fault needs to be predicted, the prediction target is the value of this attribute.
Network element type: the model of the device that sent the alarm information, such as S5300 S5328C-EI, S9300 S9303, and the like. There may be some correlation between the type of a network device and its faults, and a certain type of device may often experience a certain type of fault.
Positioning information: describes the location of the failed device or interface, such as 8 interface name GigabitEthernet0/0 and the like.
Occurrence time: indicates the time at which the alarm information was recorded, accurate to the second, for example 01/29/2013 08:29:04.
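As a minimal sketch, the alarm fields listed above could be held in a record type like the one below. The class name, attribute names, and helper functions are hypothetical and chosen only for illustration; the timestamp layout is assumed to be MM/DD/YYYY HH:MM:SS, based on the single example given.

```python
from dataclasses import dataclass
from datetime import datetime

# Alarm levels from lowest to highest urgency, as listed in the text.
LEVELS = ("prompt", "secondary", "important", "urgent")


@dataclass(frozen=True)
class Alarm:
    level: str          # one of LEVELS
    name: str           # e.g. "link down", "device offline"
    source: int         # device number, counted from 0
    element_type: str   # device model string
    location: str       # positioning information
    occurred: datetime  # occurrence time, accurate to the second


def parse_time(text: str) -> datetime:
    # Assumed layout: MM/DD/YYYY HH:MM:SS.
    return datetime.strptime(text, "%m/%d/%Y %H:%M:%S")


def is_urgent(alarm: Alarm) -> bool:
    # Urgent-level alarms are the events the model tries to predict.
    return alarm.level == "urgent"


alarm = Alarm(
    level="urgent",
    name="link down",
    source=86,
    element_type="S9300 S9303",
    location="GigabitEthernet0/0",
    occurred=parse_time("01/29/2013 08:29:04"),
)
```

A frozen dataclass is used here so that records can be sorted, compared, and placed in sets during the later filtering steps.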
Based on the above data, the statistical distribution of alarms of different levels, the relationship between derived alarms and root alarms, flash alarms, and the variation of urgent-level alarms over time are analyzed, and the plurality of log information is then preprocessed to generate a plurality of log data; for details, refer to the embodiment shown in fig. 4.
In S206, the plurality of log data is divided into a plurality of log data sets according to a preset time window. This includes: determining the ranges of an observation time window, a prediction time window, and a current time window; and dividing the plurality of log data into an observation time window log data set, a prediction time window log data set, and a current time window log data set based on the observation time window, the prediction time window, and the current time window.
As shown in fig. 3, in the embodiment of the present disclosure, the time axis may be divided into time windows of a certain size (for example, 1 hour). The goal of fault prediction is to determine whether a fault event occurs within the prediction time window; if the prediction targets specific devices, the devices that may fail within the prediction time window need to be predicted.
The time windows may be defined as follows: the size of the unit time window is Δ; the prediction time window is the unit time window for which it must be predicted whether a fault event will occur, and its size is Δ; the current time window is the unit time window immediately before the prediction time window, and its size is Δ; the observation time window consists of the n unit time windows before the prediction time window, including the current time window, and its size is n·Δ; the sample window is a finer division of the unit time window, and the number of sample windows in a unit time window is Δ divided by the sample window size.
The log data is then divided into an observation time window log data set, a prediction time window log data set, and a current time window log data set based on these time windows.
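The window division can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the function name and the record layout (timestamp, payload) are hypothetical, and each window is treated as a half-open interval that includes its start and excludes its end.

```python
from datetime import datetime, timedelta


def split_by_windows(records, prediction_start, delta, n):
    """Partition (timestamp, payload) records into the three window sets.

    observation: [prediction_start - n*delta, prediction_start)
    current:     [prediction_start - delta,   prediction_start)
    prediction:  [prediction_start,           prediction_start + delta)
    The current window is the last unit window of the observation window.
    """
    obs_start = prediction_start - n * delta
    cur_start = prediction_start - delta
    pred_end = prediction_start + delta
    observation, current, prediction = [], [], []
    for ts, payload in records:
        if obs_start <= ts < prediction_start:
            observation.append((ts, payload))
            if ts >= cur_start:
                current.append((ts, payload))
        elif prediction_start <= ts < pred_end:
            prediction.append((ts, payload))
    return observation, current, prediction


t0 = datetime(2013, 1, 29, 8, 0, 0)
records = [
    (t0 - timedelta(hours=3, minutes=30), "too old"),
    (t0 - timedelta(hours=2, minutes=10), "link down"),
    (t0 - timedelta(minutes=20), "OSPF neighbor state change"),
    (t0 + timedelta(minutes=5), "device offline"),
]
obs, cur, pred = split_by_windows(records, t0, timedelta(hours=1), n=3)
```

With a one-hour unit window and n = 3, the record from three and a half hours earlier falls outside every window and is dropped, while the record from twenty minutes earlier belongs to both the observation and the current window.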
In S208, a fault prediction model is established by a classification prediction method based on the plurality of log data sets, and the fault prediction model is used for predicting the overall network fault or the faulty device. This includes: establishing a network fault prediction model for predicting the overall network fault by a classification prediction method based on the plurality of log data sets; and/or establishing a device fault prediction model for predicting the faulty device by a classification prediction method based on the plurality of log data sets.
In fault prediction, the prediction target is either whether a fault occurs or which specific device fails. If the alarm information within the time windows is used as the sample features, and the prediction result (whether a fault occurs, or the number of the specific device) is used as the class label, fault prediction becomes equivalent to a classification problem in data mining. Because classification prediction methods are widely used and many mature, well-performing classification algorithms exist, a fault prediction model can be established by a classification prediction technique. The classification prediction method may include: a rule-based repeated incremental pruning (RIPPER) classification algorithm; and/or a Bayesian network classification algorithm; and/or a decision-tree-based random forest algorithm.
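To make the classification framing concrete, the sketch below turns windowed alarms into (features, label) pairs for the overall network fault case. It assumes, for illustration only, that alarms have already been assigned integer unit-window indices, that features are per-name alarm counts over the n preceding windows, and that the label is 1 when any urgent-level alarm falls in the prediction window. The function name and tuple layout are hypothetical, and no classifier is trained here; the resulting pairs could be passed to any of the algorithms named above.

```python
from collections import Counter


def build_samples(alarms, n, num_windows):
    """Build one (features, label) pair per candidate prediction window.

    alarms: list of (window_index, name, level) tuples.
    features: alarm-name counts over the n windows before window k.
    label: 1 if an urgent-level alarm occurs in window k, else 0.
    """
    samples = []
    for k in range(n, num_windows):
        observed = Counter(
            name for w, name, _ in alarms if k - n <= w < k
        )
        label = int(
            any(w == k and level == "urgent" for w, _, level in alarms)
        )
        samples.append((dict(observed), label))
    return samples


alarms = [
    (0, "OSPF neighbor state change", "important"),
    (1, "OSPF neighbor state change", "important"),
    (1, "link down", "important"),
    (2, "device offline", "urgent"),
    (3, "link down", "secondary"),
]
samples = build_samples(alarms, n=2, num_windows=4)
```

For the device fault case, the same loop could instead emit the device number of the urgent-level alarm as the class label.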
In one embodiment, the method further includes: acquiring real-time log information in the network system; and inputting the real-time log information into the fault prediction model, and outputting the overall network fault probability or the identifier of the faulty device. After the fault prediction model is successfully established, the real-time log information can be input into the fault prediction model to generate a fault prediction result.
According to the network fault prediction model establishing method, a plurality of log information of a plurality of network devices in a network system is obtained; preprocessing the plurality of log information based on the alarm information characteristics in the log information to generate a plurality of log data; dividing the plurality of log data into a plurality of log data sets according to a preset time window; and establishing a fault prediction model based on the plurality of log data sets through a classification prediction method, wherein the fault prediction model is used for predicting the whole network fault or fault equipment, so that the reliability of a network system can be improved, the loss caused by the network equipment fault can be reduced, and the network management efficiency can be improved.
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Fig. 4 is a flow chart illustrating a method of network fault prediction model establishment according to another exemplary embodiment. The flow 40 shown in fig. 4 is a detailed description of S204 "preprocessing the plurality of log information based on the alarm information features in the log information to generate a plurality of log data" in the flow shown in fig. 2.
The inventor of the present disclosure analyzes the statistical distribution of different levels of alarms, the change rules of derivative alarms and root alarms, flash alarms and emergency level alarms over time, and the analysis results are summarized as follows:
First, statistics on alarms of different levels show that the number of prompt-level alarms is very large; because their urgency is low and their influence on the network environment is small, most prompt alarms can be filtered out.
Secondly, analysis of the relationship between derived alarms and root alarms shows that a derived alarm generally appears immediately after its root alarm, with basically the same occurrence time, probably because of the structure of the hardware equipment; derived alarms can therefore be filtered out in the log filtering process.
Thirdly, statistical analysis of the interval between the occurrence time and the clearing time of alarms shows that a large number of alarms are cleared within 21 seconds; these alarms are regarded as flash alarms and can be filtered out in the log filtering process.
Fourthly, analysis of how emergency-level alarms change over time shows that certain alarms, such as link disconnection and equipment offline, have a certain time correlation and peak at certain moments; time characteristics can therefore be added when establishing the fault prediction model to improve the performance of the model.
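The time correlation noted in the fourth finding can be checked with a simple hour-of-day histogram. The following is a minimal sketch, not the analysis code of the disclosure; the record layout (a tuple of occurrence time and level) and the level names "urgent" and "prompt" are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def hourly_alarm_counts(alarms, level="urgent"):
    """Count alarms of one level per hour of day (0-23).

    `alarms` is an iterable of (occurred_at, level) tuples; this layout is
    a hypothetical one chosen for the example.
    """
    counts = Counter(ts.hour for ts, lvl in alarms if lvl == level)
    # Return a dense 24-slot list so that hours without alarms show up as 0.
    return [counts.get(hour, 0) for hour in range(24)]


sample = [
    (datetime(2020, 4, 1, 9, 5), "urgent"),
    (datetime(2020, 4, 1, 9, 40), "urgent"),
    (datetime(2020, 4, 2, 9, 15), "urgent"),
    (datetime(2020, 4, 2, 14, 0), "urgent"),
    (datetime(2020, 4, 2, 14, 1), "prompt"),
]
histogram = hourly_alarm_counts(sample)
# Hour of day with the most urgent alarms in the sample data.
peak_hour = max(range(24), key=histogram.__getitem__)
```

A pronounced peak in such a histogram is what would motivate adding a time-of-day feature to the model.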
Based on the above analysis, log information is filtered as follows:
As shown in fig. 4, in S402, the derived alarm information in the plurality of log information is filtered out. According to the relationship between derived alarms and root alarms, the derived alarms in the alarm log can be deleted first in the log filtering process.
In S404, alarm information in the plurality of log information whose time interval is smaller than a preset value is filtered out. According to the characteristics of flash alarms, a time interval threshold Trp is set to 22 seconds; the alarm log is then traversed and the time interval t between the occurrence time and the clearing time of each alarm is calculated. If t is less than Trp, the alarm record is deleted; otherwise, the record is kept.
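The flash alarm filtering of S404 can be sketched as follows. This is an illustrative sketch only: the field names "occurred_at" and "cleared_at" are hypothetical, and keeping alarms that have not yet been cleared is an assumption, since the description does not say how such records are treated.

```python
from datetime import datetime, timedelta


def filter_flash_alarms(alarms, threshold=timedelta(seconds=22)):
    """Drop alarms whose interval t between occurrence and clearing is < Trp.

    Each alarm is a dict with "occurred_at" and "cleared_at" keys
    (hypothetical field names). Alarms with cleared_at set to None are
    kept; that choice is an assumption of this sketch.
    """
    kept = []
    for alarm in alarms:
        cleared = alarm["cleared_at"]
        if cleared is not None and cleared - alarm["occurred_at"] < threshold:
            continue  # t < Trp: treated as a flash alarm and deleted
        kept.append(alarm)
    return kept


t0 = datetime(2020, 4, 1, 12, 0, 0)
alarms = [
    {"name": "link_flap", "occurred_at": t0, "cleared_at": t0 + timedelta(seconds=5)},
    {"name": "port_down", "occurred_at": t0, "cleared_at": t0 + timedelta(seconds=22)},
    {"name": "device_offline", "occurred_at": t0, "cleared_at": t0 + timedelta(minutes=1)},
    {"name": "fan_fault", "occurred_at": t0, "cleared_at": None},
]
kept_names = [a["name"] for a in filter_flash_alarms(alarms)]
```

Because the comparison is strict, an alarm cleared after exactly Trp is kept.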
In S406, repeated alarm information in the plurality of log information is filtered out based on a time threshold. The method comprises the following steps: sorting the alarm information according to the occurrence time and the category; judging whether the alarm information of one category is repeated information according to at least one time threshold value based on the sorting; and deleting the alarm information when the preset type of alarm information is the repeated information.
Other types of redundant data still exist in the alarm log. For example, an alarm whose duration is greater than the threshold Trp is not filtered out in S404, but some types of alarms may occur repeatedly; alarms that recur within a short time can be regarded as redundant alarms and filtered out by an alarm log filtering algorithm based on the time interval.
The alarm log filtering algorithm based on the time interval is as follows:
Alarm records of different levels may be filtered by setting different time thresholds. Specifically, four time thresholds T1, T2, T3 and T4 may be set from low to high according to the degree of urgency, to filter alarm records of the prompt, secondary, important and urgent levels respectively. The algorithm is described in detail as follows:
Step 1: initialize a data structure X for storing the latest occurrence time of each type of alarm of the current alarm source N, and sort the unfiltered alarm records (SourceAlerts) by alarm source and occurrence time;
Step 2: traverse each record in SourceAlerts, and let the current record be x;
Step 3: judge whether the alarm source of record x is the same as alarm source N; if so, continue; otherwise, update N to the alarm source of record x, empty X, and go to step 2;
Step 4: judge whether a record of the same type (name and positioning information) as record x exists in X; if so, continue; otherwise, add record x to X and go to step 2;
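The traversal described above can be sketched as follows. The description breaks off after the type check, so the remaining logic here is an assumption: a record is dropped when it follows the previous record of the same type from the same source by less than the threshold of its level, and X is updated with every record seen. As a further assumption, the first record of a new alarm source is stored in X right after the reset. The field names and the threshold values are hypothetical placeholders; the description names T1 to T4 but gives no numbers.

```python
from datetime import datetime, timedelta


def filter_repeated_alarms(alarms, thresholds):
    """Time-interval based filtering of repeated alarms (illustrative sketch)."""
    # Sort the unfiltered records by alarm source and occurrence time.
    source_alerts = sorted(alarms, key=lambda a: (a["source"], a["occurred_at"]))
    current_source = None  # alarm source N
    latest = {}            # data structure X: alarm type -> latest occurrence time
    kept = []
    for x in source_alerts:
        if x["source"] != current_source:
            # New alarm source: update N and empty X.
            current_source = x["source"]
            latest = {}
        key = (x["name"], x["location"])  # type = name + positioning information
        previous = latest.get(key)
        if previous is None:
            kept.append(x)  # first record of this type for this source
        elif x["occurred_at"] - previous >= thresholds[x["level"]]:
            kept.append(x)  # assumed rule: far enough apart, so not redundant
        latest[key] = x["occurred_at"]  # assumed: X always tracks the latest time
    return kept


t0 = datetime(2020, 4, 1, 8, 0)
# Placeholder values only; they are not taken from the description.
thresholds = {
    "prompt": timedelta(minutes=30),
    "secondary": timedelta(minutes=20),
    "important": timedelta(minutes=10),
    "urgent": timedelta(minutes=5),
}


def make_alarm(source, minutes):
    return {"source": source, "name": "link_down", "location": "port-1",
            "level": "urgent", "occurred_at": t0 + timedelta(minutes=minutes)}


alarms = [make_alarm("A", 0), make_alarm("A", 1), make_alarm("A", 10), make_alarm("B", 2)]
kept = filter_repeated_alarms(alarms, thresholds)
```

In the sample data, the second record from source A is dropped because it repeats the first within one minute, while the record from source B is kept because X is emptied when the source changes.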
Fig. 5 is a flow chart illustrating a method of network fault prediction model establishment according to another exemplary embodiment. The process 50 shown in fig. 5 is a detailed description of S208 "building a fault prediction model by a classification prediction method based on the plurality of log data sets" in the process shown in fig. 2.
As shown in fig. 5, in S502, a failure prediction model is established by a classification prediction method based on the plurality of log data sets. The classification prediction method comprises: a rule-based repeated incremental pruning classification algorithm; and/or a Bayesian network classification algorithm; and/or a random forest algorithm based on decision trees. More specifically, the fault prediction model may be built with the rule-based repeated incremental pruning (RIPPER) algorithm, the Bayesian network (Bayes Net) algorithm based on probability theory, or the Random Forest algorithm based on decision tree classification.
The significance of the fault prediction is to predict the occurrence of the fault in advance, and then prevent the occurrence of the fault or reduce the loss caused by the occurrence of the fault to the maximum extent through some preventive measures such as task scheduling and the like. The method combines the characteristics of the network environment and the network equipment alarm log to establish two prediction targets: failure prediction for the network as a whole, and failure prediction for specific devices.
The invention also defines the following characteristic extraction rules for the overall network fault prediction model: the first type of feature: the current time window can represent the current operation state of the network system, and is closest to the prediction time window, so that alarm events of various levels and various types in the current time window are counted as characteristic items.
The second kind of characteristics: in consideration of the interconnectivity and openness of the network system, a certain degree of association relationship exists between network faults, which is shown in an alarm log, namely the association between alarm events, so that the quantity of alarm events of each level and each type in an observation time window is counted as a feature item.
The third kind of characteristics: the unit time window (Δ) is divided into sample windows () with smaller time intervals; for example, a unit time window of 1 hour can be divided into 4 sample windows of 15 minutes.
The fourth kind of characteristics: the statistical distribution of alarm events of each level and each type over all sample windows in the whole observation time window.
The fifth kind of characteristics: if no fault event has occurred in the network for a long time, the probability of an impending fault event is considered relatively high, so the number of unit time windows between the current time window and the last fault time window is extracted as a feature item.
The sixth kind of characteristics: according to the statistics of the 24-hour variation of alarm records described above, the time factor also has a certain influence on the occurrence of faults, so the time corresponding to the midpoint of the prediction time window is selected as a feature item.
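The six kinds of characteristics can be illustrated with a short feature extraction sketch. Several points are assumptions rather than statements of the disclosure: the observation time window is taken to be the last few unit windows ending with the current time window, the prediction time window is taken to be the unit window immediately after it, the fourth kind is simplified to total alarm counts per sample window, and all field and feature names are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def count_by(alarms, start, end, field):
    """Count alarms with start <= occurred_at < end, grouped by `field`."""
    return Counter(a[field] for a in alarms if start <= a["occurred_at"] < end)


def extract_features(alarms, fault_times, current_start, unit, observation_units, sample):
    current_end = current_start + unit
    observation_start = current_end - observation_units * unit  # assumed layout
    features = {}
    # Kinds 1 and 2: counts per level and per type in the current and observation windows.
    for scope, start in (("cur", current_start), ("obs", observation_start)):
        for field in ("level", "name"):
            for value, n in count_by(alarms, start, current_end, field).items():
                features[f"{scope}_{field}_{value}"] = n
    # Kinds 3 and 4 (simplified): total alarm count in each sample window.
    per_sample = [0] * math.ceil((current_end - observation_start) / sample)
    for a in alarms:
        if observation_start <= a["occurred_at"] < current_end:
            per_sample[int((a["occurred_at"] - observation_start) / sample)] += 1
    features["sample_counts"] = per_sample
    # Kind 5: whole unit windows elapsed since the last fault before the current window.
    past = [t for t in fault_times if t < current_start]
    features["windows_since_fault"] = int((current_start - max(past)) / unit) if past else None
    # Kind 6: time of day at the midpoint of the (assumed) prediction window, in hours.
    midpoint = current_end + unit / 2
    features["prediction_midpoint_hour"] = midpoint.hour + midpoint.minute / 60
    return features


day = datetime(2020, 4, 1)
alarms = [
    {"occurred_at": day.replace(hour=7, minute=0), "level": "urgent", "name": "link_down"},
    {"occurred_at": day.replace(hour=8, minute=10), "level": "urgent", "name": "link_down"},
    {"occurred_at": day.replace(hour=10, minute=5), "level": "prompt", "name": "cpu_high"},
    {"occurred_at": day.replace(hour=10, minute=50), "level": "urgent", "name": "link_down"},
]
features = extract_features(
    alarms,
    fault_times=[day.replace(hour=6, minute=30)],
    current_start=day.replace(hour=10),
    unit=timedelta(hours=1),
    observation_units=3,
    sample=timedelta(minutes=15),
)
```

Returning a dictionary keeps the sketch readable; a real pipeline would flatten it into a fixed-length vector before training.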
In S504, a network failure prediction model for predicting the overall network failure is established by a classification prediction method based on the plurality of log data sets. The method comprises the following steps: extracting network alarm information related to network faults from an observation time window log data set; extracting network fault information related to the network fault from the prediction time window log data set; and establishing the network fault prediction model based on the network alarm information, the network fault information and a classification prediction method.
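The pairing of an observation time window with a prediction time window can be sketched as the construction of labeled training samples. In this illustrative sketch the features are simply alarm counts per level in the observation window, and the label records whether any fault falls in the prediction window; the sliding step, the field names and the level names are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

LEVELS = ("prompt", "secondary", "important", "urgent")  # assumed level names


def build_network_samples(alarms, fault_times, start, end, observation, prediction, step):
    """Slide over [start, end) and emit (features, label) pairs."""
    samples = []
    t = start + observation  # boundary between observation and prediction windows
    while t + prediction <= end:
        counts = Counter(
            a["level"] for a in alarms if t - observation <= a["occurred_at"] < t
        )
        x = [counts.get(level, 0) for level in LEVELS]
        # Label is 1 when at least one network fault occurs in the prediction window.
        y = int(any(t <= f < t + prediction for f in fault_times))
        samples.append((x, y))
        t += step
    return samples


day = datetime(2020, 4, 1)
alarms = [
    {"occurred_at": day.replace(hour=0, minute=30), "level": "urgent"},
    {"occurred_at": day.replace(hour=1, minute=30), "level": "prompt"},
    {"occurred_at": day.replace(hour=2, minute=10), "level": "urgent"},
]
samples = build_network_samples(
    alarms,
    fault_times=[day.replace(hour=3, minute=20)],
    start=day,
    end=day.replace(hour=4),
    observation=timedelta(hours=2),
    prediction=timedelta(hours=1),
    step=timedelta(hours=1),
)
```

The resulting pairs could then be passed to any of the classification algorithms named in S502.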
Overall network fault prediction predicts whether the network system as a whole will fail within a following period of time. Once a fault is predicted, the loss it causes can be reduced through data backup, task scheduling, restarting of core equipment and the like, which improves the reliability of the network system to a certain extent. The significance of this prediction mode lies in improving the reliability of the network system and the efficiency of network management, thereby saving resources. Because it only needs to predict whether the network system will fail, the prediction can be carried out by establishing a fault prediction model.
In S506, an equipment failure prediction model for predicting the failed equipment is established by a classification prediction method based on the plurality of log data sets. The method comprises the following steps: extracting equipment alarm information related to equipment faults from an observation time window log data set; extracting equipment fault information related to equipment faults from a prediction time window log data set; and establishing the equipment fault prediction model based on the equipment alarm information, the equipment fault information and the classification prediction method.
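For the equipment-level model, one training row per device can be built in a similar way. The sketch below is illustrative only: the field names are hypothetical, the features are alarm counts per level, and devices with no alarm in the observation window are simply omitted, which is an assumption of the sketch.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

LEVELS = ("prompt", "secondary", "important", "urgent")  # assumed level names


def build_device_samples(alarms, faults, t, observation, prediction):
    """Return (device, features, label) rows for the window boundary t.

    `faults` is a list of (device, fault_time) tuples (hypothetical layout).
    """
    per_device = defaultdict(Counter)
    for a in alarms:
        if t - observation <= a["occurred_at"] < t:
            per_device[a["device"]][a["level"]] += 1
    # Devices that actually fail inside the prediction window.
    failing = {device for device, when in faults if t <= when < t + prediction}
    return [
        (device, [counts.get(level, 0) for level in LEVELS], int(device in failing))
        for device, counts in sorted(per_device.items())
    ]


day = datetime(2020, 4, 1)
alarms = [
    {"device": "sw1", "occurred_at": day.replace(hour=8, minute=30), "level": "urgent"},
    {"device": "sw1", "occurred_at": day.replace(hour=9, minute=50), "level": "urgent"},
    {"device": "sw2", "occurred_at": day.replace(hour=9, minute=0), "level": "prompt"},
    {"device": "sw3", "occurred_at": day.replace(hour=7, minute=0), "level": "urgent"},
]
faults = [("sw1", day.replace(hour=10, minute=20)), ("sw2", day.replace(hour=12))]
rows = build_device_samples(
    alarms, faults, t=day.replace(hour=10),
    observation=timedelta(hours=2), prediction=timedelta(hours=1),
)
```

Because most devices do not fail in any given window, rows built this way are likely to be heavily imbalanced toward label 0.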
Fault prediction for a particular device predicts the specific devices that are likely to fail within a subsequent period of time. If the devices that are likely to fail can be accurately predicted, operations such as security checks or restarts can be performed on them in advance, so that the fault can be avoided to a certain extent and the reliability of the network system can be improved to a greater extent. The significance of this prediction mode is more obvious than that of prediction for the whole network; however, because network systems are often large and contain many network devices, establishing a fault prediction model that can accurately predict faults for a large number of network devices is very challenging.
The idea of classification prediction in data mining can be applied: a suitable classification algorithm is selected and trained on a training set, so that a classification-based fault prediction model is established to realize fault prediction. In the classification prediction process, the most critical points are the selection of sample features and of the classification algorithm. A group of highly discriminative features allows effective classification, and finding such features is the difficulty of feature selection. The classification algorithm is the core of the classifier; a suitable classification algorithm not only classifies better, but is also more efficient in terms of classification time and resource consumption.
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, the programs perform the functions defined by the above-described methods provided by the present disclosure. The programs may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 6 is a block diagram illustrating a network failure prediction model creation apparatus according to another exemplary embodiment. As shown in fig. 6, the network failure prediction model creation device 60 includes: a log module 602, a processing module 604, an aggregation module 606, and a model module 608.
The log module 602 is configured to obtain a plurality of log information of a plurality of network devices in the network system;
the processing module 604 is configured to pre-process the log information based on alarm information characteristics in the log information to generate a plurality of log data; the processing module 604 is further configured to filter out derived alarm information in the plurality of log information; and/or filtering the alarm information of which the time interval is smaller than a preset value in the plurality of log information; and/or filtering out repeated alarm information in the plurality of log information based on a time threshold.
The aggregation module 606 is configured to divide the plurality of log data into a plurality of log data sets according to a preset time window; the aggregation module 606 is further configured to determine the ranges of the observation time window, the prediction time window, and the current time window; and to divide the plurality of log data into an observation time window log data set, a prediction time window log data set and a current time window log data set based on the observation time window, the prediction time window and the current time window.
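The division performed by the aggregation module can be sketched as follows. The window layout used here is an assumption: the observation time window covers the last few unit windows and ends with the current time window, and the prediction time window is the unit window that follows. The field name "occurred_at" is hypothetical.

```python
from datetime import datetime, timedelta


def split_into_window_sets(log_data, current_start, unit, observation_units):
    """Divide log data into observation, current and prediction window sets."""
    current_end = current_start + unit
    windows = {
        "observation": (current_end - observation_units * unit, current_end),
        "current": (current_start, current_end),
        "prediction": (current_end, current_end + unit),
    }
    # Each window is half-open: start <= occurred_at < end.
    return {
        name: [r for r in log_data if start <= r["occurred_at"] < end]
        for name, (start, end) in windows.items()
    }


day = datetime(2020, 4, 1)
log_data = [
    {"occurred_at": day.replace(hour=7, minute=30)},
    {"occurred_at": day.replace(hour=8, minute=15)},
    {"occurred_at": day.replace(hour=10, minute=30)},
    {"occurred_at": day.replace(hour=11, minute=45)},
]
sets = split_into_window_sets(
    log_data, current_start=day.replace(hour=10),
    unit=timedelta(hours=1), observation_units=3,
)
```

Under this layout a record in the current time window also belongs to the observation time window set.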
The model module 608 is configured to build a fault prediction model by a classification prediction method based on the plurality of log data sets, where the fault prediction model is used to predict a network overall fault or a faulty device. The model module 608 is further configured to establish a network failure prediction model for predicting a network overall failure through a classification prediction method based on the plurality of log data sets; and/or establishing an equipment fault prediction model for predicting the fault equipment by a classification prediction method based on the plurality of log data sets.
The network failure prediction model creation means 60 may further include: a real-time module, configured to acquire real-time log information in the network system; and a prediction module, configured to input the real-time log information into the fault prediction model and output the overall network fault probability or the fault equipment identification.
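How a prediction module might wrap a trained model can be sketched as follows. The class name, the feature layout and the decision threshold are hypothetical, and the placeholder model is a stand-in used only to make the sketch runnable; it is not a trained classifier and not the model of the disclosure.

```python
from collections import Counter

LEVELS = ("prompt", "secondary", "important", "urgent")  # assumed level names


class PredictionModule:
    """Feeds real-time log records to a model and returns a fault probability."""

    def __init__(self, model, threshold=0.5):
        self.model = model          # callable: feature list -> probability
        self.threshold = threshold  # hypothetical decision threshold

    def predict(self, realtime_logs):
        counts = Counter(record["level"] for record in realtime_logs)
        features = [counts.get(level, 0) for level in LEVELS]
        probability = self.model(features)
        return probability, probability >= self.threshold


def placeholder_model(features):
    """Stand-in for a trained classifier: more urgent alarms, higher score."""
    urgent = features[LEVELS.index("urgent")]
    return min(1.0, urgent / 10)


module = PredictionModule(placeholder_model)
realtime_logs = [{"level": "urgent"}] * 6 + [{"level": "prompt"}] * 3
probability, will_fail = module.predict(realtime_logs)
```

Passing the model in as a callable keeps the module independent of whichever classification algorithm was used to build it.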
According to the network fault prediction model establishing device disclosed by the invention, the plurality of log information are preprocessed based on the alarm information characteristics in the log information to generate a plurality of log data; dividing the plurality of log data into a plurality of log data sets according to a preset time window; and establishing a fault prediction model based on the plurality of log data sets through a classification prediction method, wherein the fault prediction model is used for predicting the whole network fault or fault equipment, so that the reliability of a network system can be improved, the loss caused by the network equipment fault can be reduced, and the network management efficiency can be improved.
FIG. 7 is a block diagram illustrating an electronic device in accordance with an example embodiment.
An electronic device 700 according to this embodiment of the disclosure is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one memory unit 720, a bus 730 that connects the various system components (including the memory unit 720 and the processing unit 710), a display unit 740, and the like.
Wherein the storage unit stores program code executable by the processing unit 710 to cause the processing unit 710 to perform the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned network fault prediction model establishing method section of the present specification. For example, the processing unit 710 may perform the steps as shown in fig. 2, 3, 4.
The memory unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)7201 and/or a cache memory unit 7202, and may further include a read only memory unit (ROM) 7203.
The memory unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 730 may be any representation of one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 700 may also communicate with one or more external devices 700' (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 700 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 750. Also, the electronic device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 760. The network adapter 760 may communicate with other modules of the electronic device 700 via the bus 730. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, as shown in fig. 8, the technical solution according to the embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above method according to the embodiment of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the functions of: acquiring a plurality of log information of a plurality of network devices in a network system; preprocessing the plurality of log information based on the alarm information characteristics in the log information to generate a plurality of log data; dividing the plurality of log data into a plurality of log data sets according to a preset time window; and establishing a fault prediction model by a classification prediction method based on the plurality of log data sets, wherein the fault prediction model is used for predicting the overall network fault or the fault equipment.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that the present disclosure is not limited to the precise arrangements or instrumentalities described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (11)

1. A network fault prediction model building method is characterized by comprising the following steps:
acquiring a plurality of log information of a plurality of network devices in a network system;
preprocessing the plurality of log information based on the alarm information characteristics in the log information to generate a plurality of log data;
dividing the plurality of log data into a plurality of log data sets according to a preset time window;
and establishing a fault prediction model by a classification prediction method based on the plurality of log data sets, wherein the fault prediction model is used for predicting the overall network fault or the fault equipment.
2. The method of claim 1, further comprising:
acquiring real-time log information in a network system;
and inputting the real-time log information into the fault prediction model, and outputting the overall network fault probability or the fault equipment identification.
3. The method of claim 1, wherein preprocessing the plurality of log information based on alarm information characteristics in the log information generates a plurality of log data, comprising:
filtering out derived alarm information in the plurality of log information; and/or
Filtering alarm information of which the time interval is smaller than a preset value in the plurality of log information; and/or
And filtering repeated alarm information in the plurality of log information based on a time threshold.
4. The method of claim 3, wherein filtering out duplicate alarm information in the plurality of log information based on a time threshold comprises:
sorting the alarm information according to the occurrence time and the category;
judging whether the alarm information of one category is repeated information according to at least one time threshold value based on the sorting;
and deleting the alarm information when the preset type of alarm information is the repeated information.
5. The method of claim 1, wherein dividing the plurality of log data into a plurality of log data sets according to a preset time window comprises:
determining the ranges of an observation time window, a prediction time window and a current time window;
and dividing the plurality of log data into an observation time window log data set, a prediction time window log data set and a current time window log data set based on the observation time window, the prediction time window and the current time window.
6. The method of claim 1, wherein building a fault prediction model based on the plurality of log data sets by a classification prediction method comprises:
establishing a network fault prediction model for predicting the overall network fault through a classification prediction method based on the plurality of log data sets; and/or
And establishing an equipment fault prediction model for predicting the fault equipment by a classification prediction method based on the plurality of log data sets.
7. The method of claim 1, wherein building a network failure prediction model for predicting a network global failure through a classification prediction method based on the plurality of log data sets comprises:
extracting network alarm information related to network faults from an observation time window log data set;
extracting network fault information related to the network fault from the prediction time window log data set;
and establishing the network fault prediction model based on the network alarm information, the network fault information and a classification prediction method.
8. The method of claim 1, wherein building an equipment failure prediction model for predicting failed equipment based on the plurality of log data sets by a classification prediction method comprises:
extracting equipment alarm information related to equipment faults from an observation time window log data set;
extracting equipment fault information related to equipment faults from a prediction time window log data set;
and establishing the equipment fault prediction model based on the equipment alarm information, the equipment fault information and the classification prediction method.
9. The method of claim 7 or 8, wherein the classification prediction method comprises:
a rule-based repeated incremental pruning classification algorithm; and/or
a Bayesian network classification algorithm; and/or
a random forest algorithm based on a decision tree.
10. A network failure prediction model creation apparatus, comprising:
the log module is used for acquiring a plurality of log information of a plurality of network devices in a network system;
the processing module is used for preprocessing the log information based on the alarm information characteristics in the log information to generate a plurality of log data;
the collection module is used for dividing the plurality of log data into a plurality of log data collections according to a preset time window;
and the model module is used for establishing a fault prediction model through a classification prediction method based on the plurality of log data sets, and the fault prediction model is used for predicting the overall network fault or the fault equipment.
11. The apparatus of claim 10, further comprising:
the real-time module is used for acquiring real-time log information in the network system;
and the prediction module is used for inputting the real-time log information into the fault prediction model and outputting the overall network fault probability or the fault equipment identification.
CN202010354291.XA 2020-04-29 2020-04-29 Network fault prediction model establishing method and device Withdrawn CN111585799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010354291.XA CN111585799A (en) 2020-04-29 2020-04-29 Network fault prediction model establishing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010354291.XA CN111585799A (en) 2020-04-29 2020-04-29 Network fault prediction model establishing method and device

Publications (1)

Publication Number Publication Date
CN111585799A true CN111585799A (en) 2020-08-25

Family

ID=72111862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010354291.XA Withdrawn CN111585799A (en) 2020-04-29 2020-04-29 Network fault prediction model establishing method and device

Country Status (1)

Country Link
CN (1) CN111585799A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113395182A (en) * 2021-06-21 2021-09-14 山东八五信息技术有限公司 Intelligent network equipment management system and method with fault prediction
CN114095333A (en) * 2021-11-23 2022-02-25 天翼数字生活科技有限公司 Network troubleshooting method, device, equipment and readable storage medium
CN114385551A (en) * 2021-12-20 2022-04-22 武汉物易云通网络科技有限公司 Log time-sharing management method, device, equipment and storage medium
WO2022089202A1 (en) * 2020-10-27 2022-05-05 深圳前海微众银行股份有限公司 Fault identification model training method, fault identification method, apparatus and electronic device
CN115190519A (en) * 2022-07-08 2022-10-14 唐尚禹 Industrial application communication method and system oriented to industrial application intelligent double-transmission selective-reception
CN117176480A (en) * 2023-11-03 2023-12-05 北京锐服信科技有限公司 Method and system for tracing attack event

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038049A (en) * 2017-12-13 2018-05-15 西安电子科技大学 Real-time logs control system and control method, cloud computing system and server
CN108123834A (en) * 2017-12-18 2018-06-05 佛山市米良仓科技有限公司 Log analysis system based on big data platform
CN109885456A (en) * 2019-02-20 2019-06-14 武汉大学 A kind of polymorphic type event of failure prediction technique and device based on system log cluster
CN110958136A (en) * 2019-11-11 2020-04-03 国网山东省电力公司信息通信公司 Deep learning-based log analysis early warning method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
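Ni Zhen et al.: "Ensemble prediction algorithm for anomaly monitoring on an electric power big data log analysis platform", Journal of Nanjing University of Science and Technology *
Zhong Jiang et al.: "Network fault prediction based on alarm logs", Journal of Computer Applications *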

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022089202A1 (en) * 2020-10-27 2022-05-05 深圳前海微众银行股份有限公司 Fault identification model training method, fault identification method, apparatus and electronic device
CN113395182A (en) * 2021-06-21 2021-09-14 山东八五信息技术有限公司 Intelligent network equipment management system and method with fault prediction
CN114095333A (en) * 2021-11-23 2022-02-25 天翼数字生活科技有限公司 Network troubleshooting method, device, equipment and readable storage medium
CN114385551A (en) * 2021-12-20 2022-04-22 武汉物易云通网络科技有限公司 Log time-sharing management method, device, equipment and storage medium
CN115190519A (en) * 2022-07-08 2022-10-14 唐尚禹 Industrial application communication method and system oriented to industrial application intelligent double-transmission selective-reception
CN117176480A (en) * 2023-11-03 2023-12-05 北京锐服信科技有限公司 Method and system for tracing attack event
CN117176480B (en) * 2023-11-03 2024-01-09 北京锐服信科技有限公司 Method and system for tracing attack event

Similar Documents

Publication Publication Date Title
CN111585799A (en) Network fault prediction model establishing method and device
US11689557B2 (en) Autonomous report composer
CN110865929B (en) Abnormality detection early warning method and system
US11522881B2 (en) Structural graph neural networks for suspicious event detection
CN103513983B (en) method and system for predictive alert threshold determination tool
CN110958136A (en) Deep learning-based log analysis early warning method
US8918345B2 (en) Network analysis system
US20240129327A1 (en) Context informed abnormal endpoint behavior detection
CN111858526B (en) Failure time space prediction method and system based on information system log
CN114465874B (en) Fault prediction method, device, electronic equipment and storage medium
CN109992484B (en) Network alarm correlation analysis method, device and medium
CN114785666B (en) Network troubleshooting method and system
CN115225536B (en) Virtual machine abnormality detection method and system based on unsupervised learning
CN115809183A (en) Method for discovering and disposing information-creating terminal fault based on knowledge graph
CN111459692A (en) Method, apparatus and computer program product for predicting drive failure
CN113282920B (en) Log abnormality detection method, device, computer equipment and storage medium
CN116882756B (en) Power safety control method based on block chain
CN115659351B (en) Information security analysis method, system and equipment based on big data office
CN116842520A (en) Anomaly perception method, device, equipment and medium based on detection model
CN114816962B (en) ATTENTION-LSTM-based network fault prediction method
CN114500075A (en) User abnormal behavior detection method and device, electronic equipment and storage medium
CN115145623A (en) White box monitoring method, device, equipment and storage medium of software business system
CN113778792A (en) Alarm classification method and system for IT equipment
CN113472582A (en) System and method for alarm correlation and alarm aggregation in information technology monitoring
CN115484150B (en) Alarm information processing method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200825