CN107918629B - Correlation method and device for alarm fault - Google Patents


Info

Publication number
CN107918629B
Authority
CN
China
Prior art keywords
alarm
fault
event
data
events
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610887618.3A
Other languages
Chinese (zh)
Other versions
CN107918629A (en)
Inventor
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shenzhou Taiyue Software Co Ltd
Original Assignee
Beijing Shenzhou Taiyue Software Co Ltd
Application filed by Beijing Shenzhou Taiyue Software Co Ltd
Priority to CN201610887618.3A
Publication of CN107918629A
Application granted
Publication of CN107918629B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • G06F16/212Schema design and management with details for data modelling support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Telephonic Communication Services (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and a device for associating alarm faults. The method comprises the following steps: establishing an alarm fault association model according to the occurrence period information of the alarm events and the occurrence period information of the fault events in the training set, and performing association processing on the alarm events and the fault events in the training set according to the alarm fault association model to obtain association events; receiving an alarm event from an alarm system or a fault event from a fault system; selecting a related event corresponding to the alarm or fault event from the related events according to the received occurrence time interval information of the alarm or fault event; and determining a fault event corresponding to the alarm event or an alarm event corresponding to the fault event according to the selected associated event. The technical scheme of the invention is based on the idea of machine learning, can automatically obtain the associated event which represents the association between the alarm event and the corresponding fault event, and does not need manual configuration, so the invention can quickly and uninterruptedly associate the alarm event or the fault event.

Description

Correlation method and device for alarm fault
Technical Field
The present invention relates to the field of network management technologies, and in particular, to a method and an apparatus for associating alarm faults.
Background
Telecommunication networks keep growing in scale and complexity, and faults of many kinds occur on them at any moment. Each fault can cause the system to send one or more alarms to notify network operation and maintenance personnel, so the fault source must be located quickly among massive volumes of alarm data.
In the prior art, different alarm rules are set, topology analysis is performed on the alarms that conform to the alarm rules, association relations among the alarms are found, and the fault source is finally located according to the alarm information found. However, the fault location method in the prior art has at least the following problem:
when the network topology changes, the alarm rules must be reconfigured manually for each network environment, so alarms and faults cannot be associated quickly and without interruption, and rapid fault location cannot be achieved.
Disclosure of Invention
The invention provides an association method and an association system for alarm faults, which aim to solve the problem that alarm and faults cannot be associated rapidly and uninterruptedly due to the fact that alarm rules need to be configured manually in the prior art.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
In one aspect, the present invention provides a method for associating alarm faults, where the method includes:
establishing an alarm fault association model according to the occurrence period information of the alarm events in the training set and the occurrence period information of the fault events, and performing association processing on the alarm events and the fault events in the training set according to the alarm fault association model to obtain an association event representing the association between the alarm events and the corresponding fault events;
receiving an alarm event from an alarm system or a fault event from a fault system;
selecting, from the association events, the association event corresponding to the received alarm event or the received fault event according to the occurrence period information of the received alarm event or fault event;
and determining the fault event corresponding to the received alarm event or the alarm event corresponding to the received fault event according to the selected associated event.
In another aspect, the present invention further provides an apparatus for associating alarm faults, where the apparatus includes:
the model establishing unit is used for establishing an alarm fault association model according to the occurrence period information of the alarm event and the occurrence period information of the fault event in the training set;
the correlation event acquisition unit is used for performing correlation processing on the alarm events and the fault events in the training set according to the alarm fault correlation model to obtain correlation events representing the correlation between the alarm events and the corresponding fault events;
the event receiving unit is used for receiving the alarm event from the alarm system or the fault event from the fault system;
the associated event selection unit is used for selecting, from the associated events, the associated event corresponding to the received alarm event or the received fault event according to the occurrence period information of the received alarm event or fault event;
and the event determining unit is used for determining the fault event corresponding to the received alarm event or the alarm event corresponding to the received fault event according to the selected associated event.
The invention has the beneficial effects that: the invention is based on a machine learning method, by establishing an alarm fault association model and utilizing the established alarm fault association model to perform association processing on alarm events and fault events in a training set, the association events are automatically obtained without manual configuration, and therefore, when the alarm or fault events are received, the alarm events or fault events can be quickly associated.
Drawings
FIG. 1 is a flow chart of a method for associating alarm faults according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of alarm fault association analysis according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an alarm fault association apparatus according to another embodiment of the present invention;
FIG. 4 is a block diagram of the unit modules of the alarm fault association apparatus of FIG. 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for associating an alarm fault according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
S100, establishing an alarm fault association model according to the occurrence period information of the alarm events and the occurrence period information of the fault events in the training set, and performing association processing on the alarm events and the fault events in the training set according to the alarm fault association model to obtain an association event representing the association between the alarm events and the corresponding fault events.
S120, receiving an alarm event from the alarm system or a fault event from the fault system.
S140, selecting, from the association events, the association event corresponding to the received alarm event or the received fault event according to the occurrence period information of the received alarm event or fault event.
S160, determining, according to the selected association event, the fault event corresponding to the received alarm event or the alarm event corresponding to the received fault event, thereby realizing the association between the alarm and the fault.
In the embodiment, based on the machine learning method, the alarm fault association model is established, and the established alarm fault association model is used for performing association processing on the alarm events and the fault events in the training set, so that the association events are automatically obtained without manual configuration, and therefore, when the alarm or fault events are received, the alarm events or the fault events can be quickly associated.
The alarm events and fault events in the training set are generally alarm events generated by an alarm system and fault events generated by a fault system, and the alarm data corresponding to the alarm events generated by the alarm system generally include various data contents.
In one implementation of this embodiment, the alarm fault association model is established by the following method: preprocessing the original alarm data corresponding to the alarm events and the original fault data corresponding to the fault events in the training set to generate alarm data and fault data in a preset format, wherein the preset format at least comprises occurrence period information. It should be noted that, so that the alarm data and the fault data can be identified, each generated alarm data record and fault data record carries a unique identifier.
Performing data integration on the fault data and the alarm data whose occurrence period information has a time intersection, to obtain integrated data and the period intersection duration information of the integrated data. It should be noted that, during data integration, alarm data and fault data may be integrated in a one-to-many manner or in a many-to-one manner; that is, one alarm data record may be integrated with a plurality of fault data records, and one fault data record may be integrated with a plurality of alarm data records.
Establishing, according to Bayes' theorem, a posterior probability model of the period intersection duration information of the integrated data and the occurrence period information of the alarm data in the integrated data, wherein the posterior probability model is the alarm fault association model.
The preprocessing of the original alarm data corresponding to the alarm event in the training set and the original fault data corresponding to the fault event specifically comprises the following steps:
analyzing whether original alarm data and original fault data in the training set lack attribute field content or not, and filtering the original alarm data and the original fault data lacking the attribute field content to obtain filtered alarm data and filtered fault data; performing secondary filtering on the filtered alarm data and the filtered fault data according to the attribute field to be mined to generate alarm data and fault data in a preset format; the attribute field to be mined is consistent with the attribute field in the preset format, and the preset format also comprises an equipment name, an alarm type and a fault type; namely, the alarm data after the secondary filtering includes device name information, alarm type information and alarm occurrence time period information, and the fault data after the secondary filtering includes device name information, fault category information and fault occurrence time period information.
After the original data in the training set has been filtered twice, the effective content of the alarm data and the fault data is simplified, so some alarm data records share the same equipment name information and alarm type information, and some fault data records share the same equipment name information and fault category information. Therefore, after the integrated data and the period intersection duration information of the integrated data are obtained, the integrated data can be aggregated, specifically:
aggregating the integrated data with the same equipment name, alarm type and fault type to obtain aggregated data, and calculating the total alarm occurrence period duration, the total fault occurrence period duration and the total period intersection duration of the aggregated data;
in this case, when the alarm fault association model is established, a posterior probability model of the total period intersection duration of the aggregated data and the total alarm occurrence period duration of the aggregated data is established according to Bayes' theorem, and the posterior probability model obtained in this way is used as the alarm fault association model.
After obtaining the alarm fault association model, obtaining an association event representing the association between the alarm event and the corresponding fault event by the following method:
calculating a confidence coefficient value and/or a support degree value corresponding to the integrated data according to the posterior probability model;
cutting off the integrated data which do not accord with the set confidence coefficient value and/or the set support degree value, and taking the rest integrated data as the associated data; at this time, the alarm event and the fault event corresponding to the cut integrated data can be removed from the training set;
and obtaining the associated event according to the alarm event corresponding to the alarm data in the associated data and the fault event corresponding to the fault data.
It should be noted that, because the obtained associated event is actually a result of screening the integrated data obtained in the alarm fault model building process, and because the alarm data and the fault data in the integrated data may be one-to-many or many-to-one, in the obtained associated event, the alarm data and the fault data may also be in a one-to-many or many-to-one form.
It should be further noted that, if data aggregation is performed on the integrated data while the alarm fault association model is being built, the associated events are obtained as follows: a confidence value and/or a support value corresponding to the aggregated data is calculated according to the posterior probability model, the aggregated data that does not meet the set confidence value and/or the set support value is pruned, and the remaining aggregated data is used as the associated data.
In addition, in the design process, the remaining integrated data can be supervised and corrected according to cross validation or expert knowledge, and the supervised and corrected integrated data is used as associated data, so that the accuracy of the obtained associated data is improved.
In another implementation of this embodiment, the associated event corresponding to the received alarm event or fault event is selected by the following method:
acquiring the equipment name, the alarm type and the alarm occurrence period duration information of the received alarm event, or acquiring the equipment name, the fault type and the fault occurrence period duration information of the received fault event;
traversing the associated events according to the acquired equipment name and alarm type of the alarm event, or the acquired equipment name and fault type of the fault event, to obtain the associated events corresponding to the received alarm event or the received fault event;
determining the posterior probability value of each obtained associated event according to the occurrence period information of the received alarm event or fault event and the alarm fault association model;
and screening the obtained associated events according to the posterior probability value, and determining the fault event corresponding to the received alarm event or determining the alarm event corresponding to the received fault event according to the screened associated events.
In the associated events determined from the alarm events and the fault events in the training set, the alarm data and the fault data may be in a one-to-many or many-to-one form. Consequently, when the associated events are determined according to the occurrence period information of a received alarm event or fault event, the received event may correspond to a plurality of associated events. Taking a received alarm event as an example, the posterior probability value of each associated event can be obtained according to the alarm fault association model, which yields a plurality of posterior probability values. These values can be screened against a set probability threshold: the associated events whose posterior probability values are larger than the threshold are kept as the screened associated events, and the fault events in the screened associated events are then associated with the received alarm event.
In another implementation of this embodiment, the alarm fault association model may be further optimized by the following method to update the association event:
collecting alarm events generated by an alarm system and fault events generated by a fault system in batches at regular or irregular intervals;
adding the collected alarm events and fault events into a training set, and optimizing an alarm fault correlation model by using the collected alarm events and fault events to obtain an optimized alarm fault correlation model;
and performing relevance processing on the alarm events and the fault events in the training set according to the optimized alarm fault relevance model to obtain updated relevance events.
To describe the association method between the alarm and the fault in this embodiment in more detail, a specific embodiment is described below.
Fig. 2 is a schematic diagram of alarm fault association analysis provided in the embodiment of the present invention, and as shown in fig. 2, the analysis process is as follows:
S200, alarm events generated by the alarm system and fault events generated by the fault system are collected in batches, and the collected alarm events and fault events are used as original training data in a training set.
In this embodiment, the Hadoop Distributed File System (HDFS) may be used to hold the training set: the original alarm data corresponding to the collected alarm events and the original fault data corresponding to the collected fault events are added to HDFS.
S201, loading the original alarm data and the original fault data from HDFS, analyzing whether they lack attribute field content, and filtering out the original alarm data and the original fault data that lack attribute field content to obtain filtered alarm data and filtered fault data.
Illustratively, if a piece of original alarm data in HDFS lacks alarm occurrence period information, that piece of original alarm data is filtered out; this filtering operation may also be referred to as a cleaning operation.
S202, secondary filtering is carried out on the filtered alarm data and the filtered fault data according to the attribute field to be mined, and alarm data and fault data in a preset format are generated.
The attribute field to be mined is consistent with the attribute field with the preset format, illustratively, the alarm data after the secondary filtering includes the device name, the alarm type and the alarm occurrence period information, the fault data after the secondary filtering includes the device name, the fault category and the fault occurrence period information, and this filtering operation may also be referred to as a selection operation.
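The cleaning and selection operations of steps S201 and S202 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary keys (`device`, `alarm_type`, `start`, `end`) and the function name `preprocess` are hypothetical, and the sketch assumes that only the attribute fields to be mined are checked for missing content.

```python
from typing import Iterable

# Hypothetical key names; the text names the attributes but not how they are stored.
ALARM_FIELDS = ("device", "alarm_type", "start", "end")
FAULT_FIELDS = ("device", "fault_type", "start", "end")


def preprocess(records: Iterable[dict], fields: tuple) -> list:
    """Apply the cleaning (S201) and selection (S202) operations in one pass."""
    result = []
    for rec in records:
        # Cleaning: drop records that lack content in a required attribute field.
        if any(rec.get(f) in (None, "") for f in fields):
            continue
        # Selection: keep only the attribute fields to be mined.
        result.append({f: rec[f] for f in fields})
    return result


raw_alarms = [
    {"device": "router-1", "alarm_type": "LINK_DOWN", "start": 0, "end": 60, "vendor": "x"},
    {"device": "router-2", "alarm_type": "LINK_DOWN", "start": None, "end": 30},
]
alarms = preprocess(raw_alarms, ALARM_FIELDS)
```

The second record is dropped because its occurrence period is incomplete, and the extra `vendor` field is removed from the first.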
S203, performing data integration on the fault data and the alarm data whose occurrence period information has a time intersection, to obtain integrated data and the period intersection duration information of the integrated data.
Illustratively, if the occurrence period information of alarm data A after the secondary filtering and that of fault data B after the secondary filtering have a time intersection, alarm data A and fault data B are combined into one piece of integrated data. The integrated data retains the respective data content of alarm data A and fault data B, and the duration of their time intersection is added as a new field.
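The integration in step S203 amounts to pairing every alarm record with every fault record whose occurrence period overlaps it. The sketch below assumes that periods are stored as numeric `start` and `end` timestamps in seconds; the field names and the function names `overlap_seconds` and `integrate` are hypothetical.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two closed time intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def integrate(alarms, faults):
    """Pair each alarm with each fault whose occurrence period intersects it."""
    integrated = []
    for alarm in alarms:
        for fault in faults:
            duration = overlap_seconds(alarm["start"], alarm["end"],
                                       fault["start"], fault["end"])
            if duration > 0:
                # Keep both original records and add the intersection duration.
                integrated.append({"alarm": alarm, "fault": fault, "overlap": duration})
    return integrated


alarms = [{"device": "router-1", "alarm_type": "LINK_DOWN", "start": 0, "end": 60}]
faults = [
    {"device": "router-1", "fault_type": "FIBER_CUT", "start": 30, "end": 90},
    {"device": "router-1", "fault_type": "POWER_LOSS", "start": 100, "end": 120},
]
integrated = integrate(alarms, faults)
```

Because the pairing is a plain nested loop, one alarm may appear in several integrated records and vice versa, which matches the one-to-many and many-to-one cases. The nested loop is quadratic; a sort-and-sweep over interval endpoints would be a natural replacement for large training sets.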
S204, aggregating the integrated data of the same equipment name, alarm type and fault type to obtain aggregated data, and calculating the total alarm occurrence time interval duration, the total fault occurrence time interval duration and the total time interval intersection duration of the aggregated data.
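Step S204 can be pictured as a group-by over the key (equipment name, alarm type, fault type) with three running sums. The following sketch assumes flat integrated records with hypothetical keys `alarm_duration`, `fault_duration` and `overlap`; the result keys `t_alarm`, `t_accident` and `t_union` are illustrative names for the three totals.

```python
from collections import defaultdict


def aggregate(integrated):
    """Sum the three durations per (device, alarm type, fault type) key."""
    totals = defaultdict(lambda: {"t_alarm": 0.0, "t_accident": 0.0, "t_union": 0.0})
    for row in integrated:
        key = (row["device"], row["alarm_type"], row["fault_type"])
        totals[key]["t_alarm"] += row["alarm_duration"]
        totals[key]["t_accident"] += row["fault_duration"]
        totals[key]["t_union"] += row["overlap"]
    return dict(totals)


rows = [
    {"device": "router-1", "alarm_type": "LINK_DOWN", "fault_type": "FIBER_CUT",
     "alarm_duration": 60, "fault_duration": 120, "overlap": 30},
    {"device": "router-1", "alarm_type": "LINK_DOWN", "fault_type": "FIBER_CUT",
     "alarm_duration": 40, "fault_duration": 50, "overlap": 20},
    {"device": "router-2", "alarm_type": "HIGH_TEMP", "fault_type": "FAN_FAILURE",
     "alarm_duration": 10, "fault_duration": 10, "overlap": 5},
]
aggregated = aggregate(rows)
```

One simplification to be aware of: an alarm that was integrated with several faults of the same type contributes its duration once per integrated record. Whether the patent counts such an alarm once or several times is not stated.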
S205, establishing a posterior probability model of the total period intersection duration of the aggregated data and the total alarm occurrence period duration of the aggregated data, and taking the posterior probability model as the alarm fault association model.
Suppose the total period intersection duration of the aggregated data is T_union, the total alarm occurrence period duration of the aggregated data is T_alarm, and the total fault occurrence period duration of the aggregated data is T_accident. The following posterior probability formula is established according to the Bayesian formula:
[Formula image BDA0001128729270000071, not reproduced in the extracted text]
This posterior probability formula is the alarm fault association model.
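The formula itself survives only as an image reference in this text. One reading that is consistent with the three durations defined above, offered here as an assumption and not as the patent's own formula, estimates each probability as a fraction of an observation window of length T (T is a symbol introduced only for this sketch) and applies Bayes' theorem:

```latex
P(\text{accident} \mid \text{alarm})
  = \frac{P(\text{alarm} \mid \text{accident})\, P(\text{accident})}{P(\text{alarm})}
  \approx \frac{\dfrac{T_{union}}{T_{accident}} \cdot \dfrac{T_{accident}}{T}}{\dfrac{T_{alarm}}{T}}
  = \frac{T_{union}}{T_{alarm}}
```

Under this reading the window length T and the fault total T_accident cancel, so the posterior reduces to the share of the alarm time during which the fault was also present.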
S206, calculating a confidence value and/or a support value corresponding to the aggregated data according to the posterior probability formula in step S205, pruning the aggregated data that does not meet the set confidence value and/or the set support value, and taking the remaining aggregated data as initial associated data.
S207, performing supervision and correction on the initial associated data according to cross validation or expert knowledge to obtain final associated data.
The associated data in this step represents the association of the alarm event with the corresponding fault event.
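The pruning in step S206 can be sketched as a threshold filter over the aggregated records. The text does not define how confidence and support are computed, so the definitions below are assumptions borrowed from association-rule mining: confidence is taken as t_union / t_alarm and support as t_union divided by the length of the observation window. The function name `prune`, the parameter `window` and the default thresholds are hypothetical, and the sketch requires both thresholds to hold although the text allows either.

```python
def prune(aggregated, window, min_confidence=0.5, min_support=0.01):
    """Keep aggregated records that meet both thresholds (assumed definitions)."""
    kept = {}
    for key, totals in aggregated.items():
        if totals["t_alarm"] <= 0:
            continue  # no alarm time recorded, confidence is undefined
        confidence = totals["t_union"] / totals["t_alarm"]  # assumed definition
        support = totals["t_union"] / window                # assumed definition
        if confidence >= min_confidence and support >= min_support:
            kept[key] = {**totals, "confidence": confidence, "support": support}
    return kept


aggregated = {
    ("router-1", "LINK_DOWN", "FIBER_CUT"): {"t_alarm": 100.0, "t_union": 80.0},
    ("router-1", "LINK_DOWN", "POWER_LOSS"): {"t_alarm": 100.0, "t_union": 5.0},
}
associated = prune(aggregated, window=1000.0)
```

With these sample numbers only the FIBER_CUT record survives; the supervision and correction of step S207 would then be applied to the surviving records.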
S208, receiving the alarm event from the alarm system, and acquiring the device name, the alarm type and the alarm occurrence time period duration information of the received alarm event.
Since the method for associating a received fault event with an alarm event is the same as the method for associating a received alarm event with a fault event, the present embodiment takes the association of a received alarm event with a fault event as an example for explanation.
S209, traversing the associated event obtained in the step S207 according to the obtained device name and alarm type of the alarm event, and obtaining the associated event corresponding to the received alarm event.
S210, calculating the posterior probability value of each obtained associated event according to the occurrence period information of the received alarm event and the posterior probability formula in step S205.
S211, screening the obtained associated events according to the posterior probability value, and determining the fault event corresponding to the received alarm event according to the screened associated events.
The present embodiment determines the fault event associated with the received alarm event through the step S209, and then determines the degree of association between the received alarm event and the fault event through the step S210, thereby determining the fault event corresponding to the alarm event.
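Steps S208 to S211 can be sketched as a lookup followed by threshold screening. The text does not spell out how the occurrence period of the received alarm enters the posterior formula, so this sketch makes a simplifying assumption: each associated event stores the training totals, and its posterior is estimated as t_union / t_alarm. The class `AssociatedEvent`, the function `match_faults` and the default threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociatedEvent:
    device: str
    alarm_type: str
    fault_type: str
    t_union: float  # total period intersection duration from training
    t_alarm: float  # total alarm occurrence period duration from training

    @property
    def posterior(self) -> float:
        # Assumed estimate of the posterior; see the lead-in above.
        return self.t_union / self.t_alarm if self.t_alarm > 0 else 0.0


def match_faults(alarm, associated_events, threshold=0.5):
    """S209: look up by device and alarm type; S210-S211: screen by posterior."""
    candidates = [e for e in associated_events
                  if e.device == alarm["device"] and e.alarm_type == alarm["alarm_type"]]
    kept = [e for e in candidates if e.posterior > threshold]
    return sorted(kept, key=lambda e: e.posterior, reverse=True)


events = [
    AssociatedEvent("router-1", "LINK_DOWN", "FIBER_CUT", 80.0, 100.0),
    AssociatedEvent("router-1", "LINK_DOWN", "POWER_LOSS", 30.0, 100.0),
    AssociatedEvent("router-2", "LINK_DOWN", "FIBER_CUT", 90.0, 100.0),
]
received = {"device": "router-1", "alarm_type": "LINK_DOWN"}
matches = match_faults(received, events)
```

Sorting by posterior puts the most strongly associated fault first, which is convenient when several candidates pass the threshold.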
Thus, the present embodiment implements alarm fault association analysis through the above steps S200 to S211.
Based on the same technical concept as the method, the invention also provides a device for correlating the alarm fault.
Fig. 3 is a schematic structural diagram of an alarm fault association apparatus provided in an embodiment of the present invention, and fig. 4 is a block diagram of unit modules of the alarm fault association apparatus in fig. 3, as shown in fig. 3 and fig. 4, the apparatus includes: a model establishing unit 300, an associated event acquiring unit 320, an event receiving unit 340, an associated event selecting unit 360 and an event determining unit 380;
the model establishing unit 300 is configured to establish an alarm fault association model according to the occurrence period information of the alarm event and the occurrence period information of the fault event in the training set;
the associated event obtaining unit 320 is configured to perform association processing on the alarm event and the fault event in the training set according to the alarm fault association model to obtain an associated event indicating the association between the alarm event and the corresponding fault event;
an event receiving unit 340 for receiving an alarm event from an alarm system or a fault event from a fault system;
the associated event selecting unit 360 is configured to select, from the associated events, the associated event corresponding to the received alarm event or the received fault event according to the occurrence period information of the received alarm event or fault event;
the event determining unit 380 is configured to determine, according to the selected associated event, a fault event corresponding to the received alarm event or an alarm event corresponding to the received fault event.
In an implementation scheme of this embodiment, the model building unit 300 includes a preprocessing module 301, an integrating module 302, and a building module 303, and the associated event obtaining unit 320 includes a first calculating module 321, a pruning module 322, and a first obtaining module 323;
the preprocessing module 301 is configured to preprocess original alarm data corresponding to an alarm event in a training set and original fault data corresponding to a fault event, and generate alarm data and fault data in a predetermined format, where the predetermined format at least includes occurrence period information;
the integration module 302 is configured to perform data integration on the fault data and the alarm data where the time intersection exists in the occurrence time interval information, so as to obtain integrated data and time interval intersection duration information of the integrated data;
the establishing module 303 is configured to establish a posterior probability model of the time interval intersection duration information of the integrated data and the occurrence time interval information of the alarm data in the integrated data according to bayesian theorem, where the posterior probability model is an alarm fault association model;
the first calculating module 321 is configured to calculate a confidence value and/or a support value corresponding to the integrated data according to the posterior probability model;
the pruning module 322 is configured to prune the integrated data that does not meet the set confidence value and/or the set support value, and use the remaining integrated data as associated data;
the first obtaining module 323 is configured to obtain a correlation event according to an alarm event corresponding to alarm data in the correlation data and a fault event corresponding to fault data.
Preferably, the associated event acquiring unit 320 further includes a correction module 324;
the correction module 324 is configured to perform supervision and correction on the remaining integrated data according to cross validation, and use the integrated data after supervision and correction as the associated data.
In another implementation of this embodiment, the preprocessing module 301 is specifically configured to analyze whether the original alarm data and the original fault data lack the content of the attribute field, and filter the original alarm data and the original fault data that lack the content of the attribute field to obtain filtered alarm data and filtered fault data; performing secondary filtering on the filtered alarm data and the filtered fault data according to the attribute field to be mined to generate alarm data and fault data in a preset format; the attribute field to be mined is consistent with the attribute field in the preset format, and the preset format also comprises the equipment name, the alarm type and the fault category.
In this implementation, the model building unit 300 further includes an aggregation module 304;
the aggregation module 304 is configured to aggregate the integrated data that has the same device name, alarm type and fault type to obtain aggregated data, and to calculate the total alarm occurrence period duration, the total fault occurrence period duration and the total period intersection duration of the aggregated data;
the establishing module 303 is further configured to establish, according to Bayes' theorem, a posterior probability model of the total period intersection duration of the aggregated data with respect to the total alarm occurrence period duration of the aggregated data, and to use the posterior probability model as the alarm fault association model;
the first calculating module 321 is further configured to calculate a confidence value and/or a support value for the aggregated data according to the posterior probability model;
the pruning module 322 is further configured to prune the aggregated data that does not meet the set confidence value and/or the set support value, and to use the remaining aggregated data as the associated data.
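A minimal sketch of the aggregation step follows, assuming each integrated row already carries its alarm duration, fault duration and overlap in seconds. The row layout and names are hypothetical, and the posterior is assumed to be the total period intersection duration divided by the total alarm occurrence period duration.

```python
from collections import defaultdict


def aggregate(integrated_rows):
    """Sum durations per (device name, alarm type, fault type) key."""
    totals = defaultdict(
        lambda: {"alarm_total": 0.0, "fault_total": 0.0, "overlap_total": 0.0}
    )
    for row in integrated_rows:
        key = (row["device_name"], row["alarm_type"], row["fault_type"])
        totals[key]["alarm_total"] += row["alarm_duration"]
        totals[key]["fault_total"] += row["fault_duration"]
        totals[key]["overlap_total"] += row["overlap"]
    return dict(totals)


def posterior(entry):
    # Assumed model: total overlap relative to total alarm time for the key.
    if entry["alarm_total"] <= 0:
        return 0.0
    return entry["overlap_total"] / entry["alarm_total"]


rows = [
    {"device_name": "router-1", "alarm_type": "LINK_DOWN", "fault_type": "PORT_FAILURE",
     "alarm_duration": 100.0, "fault_duration": 200.0, "overlap": 60.0},
    {"device_name": "router-1", "alarm_type": "LINK_DOWN", "fault_type": "PORT_FAILURE",
     "alarm_duration": 50.0, "fault_duration": 80.0, "overlap": 30.0},
]
aggregated = aggregate(rows)
```

Note that summing per row counts an alarm once for every fault it overlaps; whether that is the intended behaviour is not stated here, so a real implementation may need to deduplicate alarm periods first.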
In another implementation of this embodiment, the associated event selecting unit 360 includes: a second obtaining module 361, a traversing module 362, a second calculating module 363, and a selecting module 364;
the second obtaining module 361 is configured to obtain the device name, the alarm type and the alarm occurrence period duration of the received alarm event, or to obtain the device name, the fault type and the fault occurrence period duration of the received fault event;
the traversing module 362 is configured to traverse the associated events according to the obtained device name and alarm type of the alarm event, or according to the obtained device name and fault type of the fault event, to obtain the associated events corresponding to the received alarm event or the received fault event;
the second calculating module 363 is configured to calculate a posterior probability value for each obtained associated event according to the occurrence period information of the received alarm event or the received fault event and the alarm fault association model;
the selecting module 364 is configured to screen the obtained associated events according to the posterior probability values, and to determine, from the screened associated events, the fault event corresponding to the received alarm event or the alarm event corresponding to the received fault event.
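The lookup-and-screen flow for a received alarm event can be sketched as follows. All names and values are hypothetical. How the received event's occurrence period enters the posterior is not spelled out in this passage, so the sketch simply reuses a posterior value stored with each associated event and screens against an assumed threshold.

```python
def select_associations(event, associations, threshold):
    """Return associated events matching the alarm's key, best posterior first."""
    key = (event["device_name"], event["alarm_type"])
    # Traversal: keep associated events for the same device name and alarm type.
    candidates = [
        a for a in associations
        if (a["device_name"], a["alarm_type"]) == key
    ]
    # Screening: drop candidates whose posterior falls below the threshold.
    kept = [a for a in candidates if a["posterior"] >= threshold]
    return sorted(kept, key=lambda a: a["posterior"], reverse=True)


associations = [
    {"device_name": "router-1", "alarm_type": "LINK_DOWN",
     "fault_type": "PORT_FAILURE", "posterior": 0.6},
    {"device_name": "router-1", "alarm_type": "LINK_DOWN",
     "fault_type": "FIBER_CUT", "posterior": 0.9},
    {"device_name": "router-1", "alarm_type": "LINK_DOWN",
     "fault_type": "POWER_LOSS", "posterior": 0.1},
    {"device_name": "router-2", "alarm_type": "LINK_DOWN",
     "fault_type": "FIBER_CUT", "posterior": 0.95},
]
event = {"device_name": "router-1", "alarm_type": "LINK_DOWN", "duration": 300}
likely_faults = [a["fault_type"] for a in select_associations(event, associations, 0.5)]
```

A received fault event would be handled symmetrically, keyed on device name and fault type and returning candidate alarm types.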
In another implementation of this embodiment, the apparatus further includes a collecting unit and an adding unit;
the collecting unit is configured to collect, in batches and at regular or irregular intervals, alarm events generated by the alarm system and fault events generated by the fault system;
the adding unit is configured to add the collected alarm events and fault events to the training set;
the model establishing unit is further configured to optimize the alarm fault association model using the collected alarm events and fault events, to obtain an optimized alarm fault association model;
and the associated event acquiring unit is further configured to perform association processing on the alarm events and fault events in the training set according to the optimized alarm fault association model, to obtain updated associated events.
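The batch update loop can be sketched as below. The class and function names are hypothetical, and the sketch assumes that "optimizing" the model means refitting the overlap-to-alarm-duration ratios on the enlarged training set; the source does not say whether the update is a full refit or an incremental one.

```python
from collections import defaultdict


def fit(training_set):
    """Assumed model: per-key ratio of total overlap to total alarm duration."""
    overlap = defaultdict(float)
    alarm = defaultdict(float)
    for row in training_set:
        key = (row["device_name"], row["alarm_type"], row["fault_type"])
        overlap[key] += row["overlap"]
        alarm[key] += row["alarm_duration"]
    return {k: overlap[k] / alarm[k] for k in alarm if alarm[k] > 0}


class AssociationModel:
    def __init__(self):
        self.training_set = []
        self.posteriors = {}

    def update(self, batch):
        # Add the newly collected batch, then rebuild the model on everything seen so far.
        self.training_set.extend(batch)
        self.posteriors = fit(self.training_set)
        return self.posteriors


key = ("router-1", "LINK_DOWN", "PORT_FAILURE")
model = AssociationModel()
model.update([{"device_name": "router-1", "alarm_type": "LINK_DOWN",
               "fault_type": "PORT_FAILURE", "alarm_duration": 100.0, "overlap": 80.0}])
first = model.posteriors[key]
model.update([{"device_name": "router-1", "alarm_type": "LINK_DOWN",
               "fault_type": "PORT_FAILURE", "alarm_duration": 100.0, "overlap": 20.0}])
second = model.posteriors[key]
```

Each call to `update` corresponds to one collected batch, so the stored ratios drift as new alarm and fault events arrive.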
In summary, embodiments of the present invention provide a method and an apparatus for associating alarm faults. Based on a machine learning approach, an alarm fault association model is established and used to perform association processing on the alarm events and fault events in a training set, so that associated events are obtained automatically, without manual configuration. As a result, when an alarm event or a fault event is received, it can be associated quickly.
To describe the technical solutions of the embodiments of the present invention clearly, the words "first", "second" and the like are used in the embodiments to distinguish identical or similar items that have substantially the same function and effect. Those skilled in the art will understand that the words "first", "second" and the like do not limit quantity or order of execution.
While the foregoing is directed to embodiments of the present invention, other modifications and variations of the present invention may be devised by those skilled in the art in light of the above teachings. It should be understood by those skilled in the art that the foregoing detailed description is for the purpose of better explaining the present invention, and the scope of the present invention should be determined by the scope of the appended claims.

Claims (8)

1. A method for associating alarm faults, characterized by comprising the following steps:
establishing an alarm fault association model according to the occurrence period information of the alarm events and the occurrence period information of the fault events in a training set, and performing association processing on the alarm events and the fault events in the training set according to the alarm fault association model to obtain associated events representing the association between the alarm events and the corresponding fault events;
receiving an alarm event from an alarm system or a fault event from a fault system;
selecting, from the associated events, the associated event corresponding to the received alarm event or the received fault event according to the occurrence period information of the received alarm event or fault event;
determining the fault event corresponding to the received alarm event, or the alarm event corresponding to the received fault event, according to the selected associated event;
wherein the establishing of the alarm fault association model according to the occurrence period information of the alarm events and the occurrence period information of the fault events in the training set, and the performing of association processing on the alarm events and the fault events in the training set according to the alarm fault association model to obtain associated events representing the association between the alarm events and the corresponding fault events, comprise:
preprocessing the original alarm data corresponding to the alarm events in the training set and the original fault data corresponding to the fault events to generate alarm data and fault data in a preset format, wherein the preset format at least comprises occurrence period information;
performing data integration on the fault data and alarm data whose occurrence period information intersects in time, to obtain integrated data and period intersection duration information of the integrated data;
establishing, according to Bayes' theorem, a posterior probability model of the period intersection duration information of the integrated data with respect to the occurrence period information of the alarm data in the integrated data, wherein the posterior probability model is the alarm fault association model;
calculating a confidence value and/or a support value for the integrated data according to the posterior probability model;
pruning the integrated data that does not meet the set confidence value and/or the set support value, and using the remaining integrated data as associated data;
and obtaining the associated events from the alarm events corresponding to the alarm data and the fault events corresponding to the fault data in the associated data.
2. The method for associating alarm faults according to claim 1, wherein using the remaining integrated data as the associated data comprises:
performing supervised correction on the remaining integrated data according to cross validation or expert knowledge, and using the corrected integrated data as the associated data.
3. The method according to claim 1, wherein preprocessing the original alarm data corresponding to the alarm events in the training set and the original fault data corresponding to the fault events comprises:
checking whether the original alarm data and the original fault data are missing attribute field content, and filtering out the original alarm data and original fault data that are missing attribute field content, to obtain filtered alarm data and filtered fault data;
performing secondary filtering on the filtered alarm data and the filtered fault data according to the attribute fields to be mined, to generate alarm data and fault data in the preset format; wherein the attribute fields to be mined are the same as the attribute fields of the preset format, and the preset format further comprises a device name, an alarm type and a fault type.
4. The method for associating alarm faults according to claim 3, wherein after obtaining the integrated data and the period intersection duration information of the integrated data, the method further comprises:
aggregating the integrated data that has the same device name, alarm type and fault type to obtain aggregated data, and calculating the total alarm occurrence period duration, the total fault occurrence period duration and the total period intersection duration of the aggregated data;
establishing, according to Bayes' theorem, a posterior probability model of the total period intersection duration of the aggregated data with respect to the total alarm occurrence period duration of the aggregated data, and using the posterior probability model as the alarm fault association model;
calculating a confidence value and/or a support value for the aggregated data according to the posterior probability model;
and pruning the aggregated data that does not meet the set confidence value and/or the set support value, and using the remaining aggregated data as the associated data.
5. The method according to claim 4, wherein selecting, from the associated events, the associated event corresponding to the received alarm event or the received fault event according to the occurrence period information of the received alarm event or fault event comprises:
obtaining the device name, the alarm type and the alarm occurrence period duration of the received alarm event, or obtaining the device name, the fault type and the fault occurrence period duration of the received fault event;
traversing the associated events according to the obtained device name and alarm type of the alarm event, or according to the obtained device name and fault type of the fault event, to obtain the associated events corresponding to the received alarm event or the received fault event;
calculating a posterior probability value for each obtained associated event according to the occurrence period information of the received alarm event or the received fault event and the alarm fault association model;
and screening the obtained associated events according to the posterior probability values, and determining, from the screened associated events, the fault event corresponding to the received alarm event or the alarm event corresponding to the received fault event.
6. The method for associating alarm faults according to claim 1, further comprising:
collecting, in batches and at regular or irregular intervals, alarm events generated by the alarm system and fault events generated by the fault system;
adding the collected alarm events and fault events to the training set, and optimizing the alarm fault association model using the collected alarm events and fault events to obtain an optimized alarm fault association model;
and performing association processing on the alarm events and the fault events in the training set according to the optimized alarm fault association model to obtain updated associated events.
7. An apparatus for associating alarm faults, the apparatus comprising:
a model establishing unit, configured to establish an alarm fault association model according to the occurrence period information of the alarm events and the occurrence period information of the fault events in a training set;
an associated event acquiring unit, configured to perform association processing on the alarm events and the fault events in the training set according to the alarm fault association model to obtain associated events representing the association between the alarm events and the corresponding fault events;
an event receiving unit, configured to receive an alarm event from an alarm system or a fault event from a fault system;
an associated event selecting unit, configured to select, from the associated events, the associated event corresponding to the received alarm event or the received fault event according to the occurrence period information of the received alarm event or fault event;
an event determining unit, configured to determine the fault event corresponding to the received alarm event, or the alarm event corresponding to the received fault event, according to the selected associated event;
wherein the model establishing unit comprises a preprocessing module, an integration module and an establishing module, and the associated event acquiring unit comprises a first calculating module, a pruning module and a first obtaining module;
the preprocessing module is configured to preprocess the original alarm data corresponding to the alarm events in the training set and the original fault data corresponding to the fault events to generate alarm data and fault data in a preset format, wherein the preset format at least comprises occurrence period information;
the integration module is configured to perform data integration on the fault data and alarm data whose occurrence period information intersects in time, to obtain integrated data and period intersection duration information of the integrated data;
the establishing module is configured to establish, according to Bayes' theorem, a posterior probability model of the period intersection duration information of the integrated data with respect to the occurrence period information of the alarm data in the integrated data, wherein the posterior probability model is the alarm fault association model;
the first calculating module is configured to calculate a confidence value and/or a support value for the integrated data according to the posterior probability model;
the pruning module is configured to prune the integrated data that does not meet the set confidence value and/or the set support value, and to use the remaining integrated data as associated data;
the first obtaining module is configured to obtain the associated events from the alarm events corresponding to the alarm data and the fault events corresponding to the fault data in the associated data.
8. The apparatus for associating alarm faults according to claim 7, wherein the model establishing unit further comprises an aggregation module;
the aggregation module is configured to aggregate the integrated data that has the same device name, alarm type and fault type to obtain aggregated data, and to calculate the total alarm occurrence period duration, the total fault occurrence period duration and the total period intersection duration of the aggregated data;
the establishing module is further configured to establish, according to Bayes' theorem, a posterior probability model of the total period intersection duration of the aggregated data with respect to the total alarm occurrence period duration of the aggregated data, and to use the posterior probability model as the alarm fault association model;
the first calculating module is further configured to calculate a confidence value and/or a support value for the aggregated data according to the posterior probability model;
the pruning module is further configured to prune the aggregated data that does not meet the set confidence value and/or the set support value, and to use the remaining aggregated data as the associated data;
the associated event selecting unit comprises: a second obtaining module, a traversing module, a second calculating module and a selecting module;
the second obtaining module is configured to obtain the device name, the alarm type and the alarm occurrence period duration of the received alarm event, or to obtain the device name, the fault type and the fault occurrence period duration of the received fault event;
the traversing module is configured to traverse the associated events according to the obtained device name and alarm type of the alarm event, or according to the obtained device name and fault type of the fault event, to obtain the associated events corresponding to the received alarm event or the received fault event;
the second calculating module is configured to calculate a posterior probability value for each obtained associated event according to the occurrence period information of the received alarm event or the received fault event and the alarm fault association model;
and the selecting module is configured to screen the obtained associated events according to the posterior probability values, and to determine, from the screened associated events, the fault event corresponding to the received alarm event or the alarm event corresponding to the received fault event.
CN201610887618.3A 2016-10-11 2016-10-11 Correlation method and device for alarm fault Active CN107918629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610887618.3A CN107918629B (en) 2016-10-11 2016-10-11 Correlation method and device for alarm fault


Publications (2)

Publication Number Publication Date
CN107918629A CN107918629A (en) 2018-04-17
CN107918629B true CN107918629B (en) 2020-09-04

Family

ID=61891940


Country Status (1)

Country Link
CN (1) CN107918629B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Room 818, 8/F, 34 Haidian Street, Haidian District, Beijing 100080

Patentee after: BEIJING ULTRAPOWER SOFTWARE Co.,Ltd.

Address before: Room 601, Block A, 6/F, Wanliu New Building, No. 28 Wanquanzhuang Road, Haidian District, Beijing 100089

Patentee before: BEIJING ULTRAPOWER SOFTWARE Co.,Ltd.