CN110727538A - Fault positioning system and method based on model hit probability distribution - Google Patents

Publication number
CN110727538A
Authority
CN
China
Prior art keywords
fault
abnormal data
model
module
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911305679.4A
Other languages
Chinese (zh)
Other versions
CN110727538B (en
Inventor
陈晓莉
王俊
纪坤鹏
刘刚
徐菁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Ponshine Information Technology Co Ltd
Original Assignee
Zhejiang Ponshine Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Ponshine Information Technology Co Ltd filed Critical Zhejiang Ponshine Information Technology Co Ltd
Priority to CN201911305679.4A priority Critical patent/CN110727538B/en
Publication of CN110727538A publication Critical patent/CN110727538A/en
Application granted granted Critical
Publication of CN110727538B publication Critical patent/CN110727538B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26 Functional testing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention discloses a fault positioning system and method based on model hit probability distribution. The fault positioning system comprises: a creating module, used for creating a fault model judgment rule base and a fault model base in advance; a receiving module, used for receiving abnormal data; an analysis module, used for analyzing the received abnormal data and matching it against the pre-created fault model judgment rule base; a generating module, used for generating a troubleshooting task according to the received abnormal data, the troubleshooting task comprising a plurality of abnormal data; a matching module, used for matching, for each abnormal data in the generated troubleshooting task, a plurality of corresponding fault models; a calculation module, used for performing fault delimitation on the current troubleshooting task according to the matched fault models and calculating a delimitation probability; and a correcting module, used for confirming or correcting the obtained fault delimitation probability to obtain the final fault delimitation probability.

Description

Fault positioning system and method based on model hit probability distribution
Technical Field
The invention relates to the technical field of fault analysis, in particular to a fault positioning system and method based on model hit probability distribution.
Background
With the continuous development of the information age, IT operation and maintenance has become an important component of IT services, and IT operation and maintenance management is one of the most discussed topics in the IT field. As IT construction deepens and matures, the operation and maintenance of computer hardware and software systems has become a problem of general concern that leaders and information service departments across industries and universities struggle to cope with. Because this problem is new and has arisen with the deepening application of computer information technology, knowledge and technology for effective IT operation and maintenance management are only beginning to accumulate. Research and exploration in this field have broad development prospects and great practical significance. IT operation and maintenance management refers to the comprehensive management, by an organization's IT department, of the software and hardware IT operating environment (software environment, network environment, etc.), the IT business systems and the IT operation and maintenance personnel, using relevant methods, means, techniques, systems, processes, documents, etc.
Faced with increasingly complex services and increasingly diverse user requirements, continuously expanding IT applications need an increasingly sound approach to ensure that IT services are delivered flexibly, conveniently, safely and stably; the guarantee factor in that approach is IT operation and maintenance. As deployments have grown from a handful of servers to huge data centers, manual work alone can no longer meet the technical, business and management requirements, and factors that reduce IT service cost, such as standardization, automation, architecture optimization and process optimization, receive more and more attention. Among them, automation, which begins by replacing manual operation, has been widely studied and applied.
Disclosure of Invention
The invention aims to provide, in view of the defects of the prior art, a fault positioning system and method based on model hit probability distribution, which can actively detect and analyze abnormal data, quickly delimit a fault by using fault models, and indicate the troubleshooting direction. Meanwhile, following the analysis steps, the possible causes of the fault are checked one by one through continued trial and error, the troubleshooting scope is narrowed, and finally the possible causes of the fault are given together with reasonable evidence.
In order to achieve the purpose, the invention adopts the following technical scheme:
a fault positioning system based on model hit probability distribution, comprising: a creating module, a receiving module, an analysis module, a generating module, a matching module, a calculation module and a correcting module;
the creating module is used for creating a fault model judgment rule base and a fault model base in advance;
the receiving module is used for receiving abnormal data;
the analysis module is used for analyzing the received abnormal data and matching the abnormal data with the pre-created fault model judgment rule base;
the generating module is used for generating a troubleshooting task according to the received abnormal data, the troubleshooting task comprising a plurality of abnormal data;
the matching module is used for matching a plurality of fault models corresponding to each abnormal data according to the plurality of abnormal data in the generated troubleshooting task;
the calculation module is used for performing fault delimitation on the current troubleshooting task according to the matched plurality of fault models and obtaining a delimitation probability through calculation;
and the correcting module is used for confirming or correcting the obtained fault delimitation probability to obtain the final fault delimitation probability.
Further, the generating module also records the number of hits when a plurality of abnormal data hit the same rule.
Further, the calculation module calculates, according to the matched plurality of fault models, the activity of each of those fault models.
Further, the generating module further comprises:
a judging module, used for judging, when new abnormal data is generated, whether an unfinished troubleshooting task exists for the same device as the new abnormal data or for an associated device.
Further, if abnormal data in the generated troubleshooting task does not match any fault model in the pre-created fault model base, the matching module automatically simulates a fault model according to the unmatched abnormal data and stores the automatically simulated fault model in the fault model base.
Correspondingly, the fault positioning method based on the model hit probability distribution comprises the following steps:
S0. creating a fault model judgment rule base and a fault model base in advance;
S1. receiving abnormal data;
S2. analyzing the received abnormal data, and matching the abnormal data with the pre-created fault model judgment rule base; if the matching is successful, going to step S3;
S3. generating a troubleshooting task according to the received abnormal data, wherein the troubleshooting task comprises a plurality of abnormal data;
S4. matching a plurality of fault models corresponding to each abnormal data according to the plurality of abnormal data in the generated troubleshooting task;
S5. performing fault delimitation on the current troubleshooting task according to the matched fault models, and obtaining a delimitation probability through calculation;
S6. confirming or correcting the obtained fault delimitation probability to obtain the final fault delimitation probability.
Further, the step S3 includes recording the number of hits if a plurality of abnormal data hit the same rule.
Further, the step S4 further includes:
calculating, according to the matched plurality of fault models, the activity of each of those fault models.
Further, the step S3 further includes:
when new abnormal data is generated, judging whether an unfinished troubleshooting task exists for the same device as the new abnormal data or for an associated device; if such a task exists, the new abnormal data is assigned to that troubleshooting task of the same or associated device; if no such task exists, a new troubleshooting task is generated from the new abnormal data.
Further, the step S3 includes, if abnormal data in the generated troubleshooting task does not match any fault model in the pre-created fault model base, automatically simulating a fault model according to the unmatched abnormal data, and storing the automatically simulated fault model in the fault model base.
Compared with the prior art, the invention can actively detect and analyze abnormal data, quickly delimit a fault by using fault models, and indicate the troubleshooting direction. Meanwhile, following the analysis steps, the possible causes of the fault are checked one by one through continued trial and error, the troubleshooting scope is narrowed, and finally the possible causes of the fault are given together with reasonable evidence. The invention is specific to the IT operation and maintenance field and targets fault location for IaaS-layer equipment (IT equipment).
Drawings
FIG. 1 is a block diagram of a fault location system based on a model hit probability distribution according to an embodiment;
FIG. 2 is a schematic diagram of a fault locating process based on model hit probability distribution according to an embodiment;
FIG. 3 is a flowchart illustrating a method for locating a fault based on a model hit probability distribution according to an embodiment.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
The invention aims to provide a fault positioning system and method based on model hit probability distribution, aiming at the defects of the prior art.
It should be noted that the present invention analyzes and processes the period from the occurrence of a fault in an IT device to the resolution of that fault.
Example one
The present embodiment provides a fault positioning system based on model hit probability distribution which, as shown in FIG. 1, comprises a creating module 11, a receiving module 12, an analysis module 13, a generating module 14, a matching module 15, a calculation module 16, and a correcting module 17;
the creating module 11 is configured to create a fault model judgment rule base and a fault model base in advance;
the receiving module 12 is configured to receive abnormal data;
the analysis module 13 is configured to analyze the received abnormal data, and match the received abnormal data with the pre-created fault model judgment rule base;
the generating module 14 is configured to generate a troubleshooting task according to the received abnormal data, where the troubleshooting task includes a plurality of abnormal data;
the matching module 15 is configured to match a plurality of fault models corresponding to each abnormal data according to the plurality of abnormal data in the generated troubleshooting task;
the calculation module 16 is configured to perform fault delimitation on the current troubleshooting task according to the matched multiple fault models, and obtain a delimitation probability through calculation;
and the correcting module 17 is configured to confirm or correct the obtained fault delimitation probability to obtain a final fault delimitation probability.
The creating module 11 is used for creating a fault model judgment rule base and a fault model base in advance.
Specifically, the rule base is established by a system administrator presetting fault model judgment rules, wherein the rule information includes:
(1) rule name: such as rule a, rule b, rule c, etc.;
(2) indicator type: status indicator, threshold indicator, or keyword indicator;
(3) indicator name: such as CPU utilization;
(4) rule comparison: equal to, greater than, less than, not equal to, contains, does not contain, between;
(5) comparison value: the judgment standard corresponding to each indicator;
(6) number of occurrences: the number of times abnormal data hits the rule within the same troubleshooting task.
Specifically, the model base is established by a system administrator combining the preset rules to generate fault models, wherein the fault model information includes:
(1) model name: such as model A, model B, model C, etc.;
(2) model description;
(3) the rules or rule set contained: e.g. model A contains rules (a, b, c), etc.;
(4) the probability of each judgment result.
In this embodiment, the pre-created fault model determination rule base and the fault model base are stored in the system, which is convenient for subsequent use.
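As an illustration only, the rule and fault model records described above could be held in structures like the following. This is a minimal sketch: the class and field names (`IndicatorType`, `Rule`, `FaultModel`, `result_probabilities`) are hypothetical and do not come from the patent, and a single result group per model is assumed for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, FrozenSet


class IndicatorType(Enum):
    """The three indicator types listed for a rule."""
    STATUS = "status"
    THRESHOLD = "threshold"
    KEYWORD = "keyword"


@dataclass(frozen=True)
class Rule:
    name: str                      # rule name, e.g. "a"
    indicator_type: IndicatorType  # status, threshold or keyword
    indicator_name: str            # e.g. "CPU utilization"
    comparison: str                # e.g. "greater than"
    comparison_value: Any          # judgment standard for the indicator
    occurrences: int = 1           # hits of this rule within one troubleshooting task


@dataclass
class FaultModel:
    name: str                               # model name, e.g. "A"
    description: str                        # model description
    rule_names: FrozenSet[str]              # rules contained in the model
    result_probabilities: Dict[str, float]  # judgment result -> probability

    def __post_init__(self) -> None:
        # The probabilities of all results in one result group sum to 100%.
        total = sum(self.result_probabilities.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"result probabilities sum to {total}, expected 1.0")


cpu_rule = Rule("a", IndicatorType.THRESHOLD, "CPU utilization", "greater than", 90)
model_a = FaultModel(
    name="A",
    description="illustrative model",
    rule_names=frozenset({"a", "b", "c"}),
    result_probabilities={"host failure": 0.8, "digital communication fault": 0.2},
)
```

Validating the probability sum at construction time is a design choice of this sketch, so that a malformed model is rejected before it is stored in the model base.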
In the receiving module 12, exception data is received.
In this embodiment, when the device generates abnormal data, the system receives the abnormal data.
The analysis module 13 is configured to analyze the received abnormal data, and match the abnormal data with a fault model determination rule base created in advance.
Specifically, when the system receives abnormal data, it performs text analysis on the abnormal data and extracts key information from the abnormal data text, including the configuration item number, data type, rule comparison, comparison value, abnormal alarm content, etc., and then starts the anomaly analysis based on the extracted key information. The anomaly analysis matches the parsed abnormal data against the pre-created fault model judgment rule base.
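The rule matching step can be sketched as a lookup from comparison name to predicate. This is an assumption-laden illustration: the dictionary keys, the `hits_rule` function and the shape of the parsed abnormal data (an `indicator_name` plus an observed `value`) are hypothetical, since the patent does not specify a data format.

```python
import operator

# Hypothetical mapping from the rule comparison names to predicates.
# Each predicate receives (observed value, comparison value).
COMPARATORS = {
    "equal to": operator.eq,
    "not equal to": operator.ne,
    "greater than": operator.gt,
    "less than": operator.lt,
    "contains": lambda value, ref: ref in value,
    "does not contain": lambda value, ref: ref not in value,
    "between": lambda value, ref: ref[0] <= value <= ref[1],
}


def hits_rule(rule: dict, anomaly: dict) -> bool:
    """Return True if the parsed abnormal data satisfies the rule."""
    if anomaly["indicator_name"] != rule["indicator_name"]:
        return False
    compare = COMPARATORS[rule["comparison"]]
    return compare(anomaly["value"], rule["comparison_value"])


cpu_rule = {
    "indicator_name": "CPU utilization",
    "comparison": "greater than",
    "comparison_value": 90,
}
anomaly = {"indicator_name": "CPU utilization", "value": 95}
print(hits_rule(cpu_rule, anomaly))  # True
```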
In the generating module 14, a troubleshooting task is generated according to the received abnormal data, where the troubleshooting task includes a plurality of abnormal data.
The parsed abnormal data is matched against the system rule base; if a rule is hit, a troubleshooting task is generated; if a plurality of abnormal data hit the same rule, the number of hits is recorded.
A plurality of abnormal data, such as abnormal data 1, abnormal data 2, abnormal data 3, etc., may be included in one troubleshooting task.
It should be noted that the troubleshooting task of the embodiment relates to multiple faults, and the multiple related faults are aggregated into one abnormal event (troubleshooting task).
In the matching module 15, a plurality of fault models corresponding to each abnormal data are matched according to a plurality of abnormal data in the generated troubleshooting task.
After generating the troubleshooting task, matching each abnormal data in the troubleshooting task with a fault model:
(1) when abnormal data 1 matches a certain rule (such as rule a), a troubleshooting task is generated, and all fault models containing that rule (such as model A and model B) are searched to obtain all possible fault models for abnormal data 1;
(2) when abnormal data 2 in the same troubleshooting task is received and matches a rule (such as rule b), all fault models containing both rules (rule a and rule b), such as model A, are searched; these are all the possible fault models for abnormal data 2;
(3) and so on. Each time abnormal data arrives, the judgment probability of the related abnormal event is recalculated, so that the troubleshooting task page always displays the latest system judgment result.
(4) If abnormal data in the generated troubleshooting task does not match any fault model in the pre-created fault model base, the matching module 15 automatically simulates a fault model according to the unmatched abnormal data and stores the automatically simulated fault model in the fault model base.
Specifically, if the abnormal data in the troubleshooting task does not match any existing fault model, the fault model matching state of the troubleshooting task is marked as "fault model missing", and the system automatically simulates a new fault model from the rules for which a model is missing. The system then calculates a simulated activity from how often abnormal data hits the simulated fault model; when the simulated activity of the fault model exceeds a certain threshold, the system actively adds the fault model and retains its hit records and model activity.
(5) When a troubleshooting task receives no new abnormal data within 30 minutes, the troubleshooting task ends and fault model matching is complete.
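The narrowing of candidate models described in the steps above can be sketched as a subset test: a model remains a candidate only while its rule set contains every rule hit so far in the task. This is a minimal sketch under that reading; the function names are hypothetical, and the model rule sets are the illustrative ones used in Example three of this document.

```python
from datetime import datetime, timedelta

# A troubleshooting task ends after 30 minutes without new abnormal data.
IDLE_TIMEOUT = timedelta(minutes=30)

# Illustrative model base: model name -> set of rule names it contains.
MODELS = {"A": {"a", "b", "c"}, "B": {"a", "b"}, "C": {"a", "c"}}


def candidate_models(models: dict, hit_rules: set) -> list:
    """Names of the models whose rule set contains every rule hit so far."""
    return sorted(name for name, rules in models.items() if hit_rules <= rules)


def task_finished(last_received: datetime, now: datetime) -> bool:
    """True once no abnormal data has arrived for the idle timeout."""
    return now - last_received >= IDLE_TIMEOUT


hit = set()
for rule in ["a", "b", "c"]:
    hit.add(rule)
    print(rule, candidate_models(MODELS, hit))
# a ['A', 'B', 'C']
# b ['A', 'B']
# c ['A']
```

An empty candidate list corresponds to the "fault model missing" state, at which point a simulated model would be created; that step is not sketched here because the patent does not describe how the simulated activity threshold is chosen.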
In the calculation module 16, fault delimitation is performed on the current troubleshooting task according to the matched plurality of fault models, and delimitation probability is obtained through calculation.
In this embodiment, the calculating module further calculates the activity of the fault model corresponding to each of the plurality of fault models according to the plurality of matched fault models.
Specifically: after all abnormal data in a troubleshooting task has been matched with fault models, the system performs fault delimitation on the current abnormal event according to the matching result and outputs the delimitation probability. The delimitation probability is determined according to the following principles (activity change principles):
(1) In principle, each hit rule in a troubleshooting task increases the activity of the fault models it hits by 1. That is, within the same troubleshooting task, when a plurality of abnormal data hit the same rule, the activity of the fault model hit by that rule is increased by only 1.
(2) When a single abnormal data hits a plurality of rules, those rules are necessarily in an inclusion relationship (a special case), so if a fault model containing those rules is matched, the activity of that fault model is increased by only 1.
(3) When merging abnormal data changes the merged occurrence count in the troubleshooting task, so that the hit rule changes and therefore the matched fault model changes, the activity of the originally matched fault model is restored, and the activity of the fault model containing the newly hit rule is increased by 1.
For example, suppose the number of occurrences preset in the rule base for a rule is 1, while the abnormal data currently merged in the same troubleshooting task has occurred 3 times. Because this differs from the number of occurrences preset in the rule base, the hit rule changes, the matched fault model changes, and a new fault model must be determined again according to the currently merged abnormal data.
(4) Whenever the set of abnormal data belonging to the troubleshooting task changes, the probability is recalculated.
Assuming that one troubleshooting task contains n possible fault models, the bounding probability of the result A is as follows:
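$$P_A = \sum_{i=1}^{n} \frac{S_i}{\sum_{j=1}^{n} S_j} \, P_{Ai}$$
where: $P_A$ denotes the delimitation probability of result A in the troubleshooting task; $S_i$ denotes the activity (liveness) of the i-th fault model hit in the troubleshooting task; $P_{Ai}$ denotes the probability with which the i-th fault model itself points to result A; and n is the number of possible fault models in the troubleshooting task.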
In the correcting module 17, the obtained fault delimitation probability is confirmed or corrected to obtain the final fault delimitation probability.
In this embodiment, after a troubleshooting task is finished, the delimiting result given by the system can be confirmed or corrected manually according to the actual equipment failure condition.
In this embodiment, the system further includes an updating module, configured to update the fault model library.
The fault model is provided with the reference probability, and the reference probability is perfected through continuous manual feedback (manual confirmation when the fault removal task is finished), so that the judgment of the fault model is more accurate.
The sum of the probabilities of all the outcome groups contained in a fault model is 100%, i.e. the sum of the probabilities of all possible outcomes in the outcome groups contained in the fault model is 100%.
A fault model can be associated with a plurality of result groups, and each result group is independent. Because a fault has causes along several dimensions, a troubleshooting task can have results from several result groups (multi-dimensional); when the troubleshooting task produces a delimitation result, the probability base (cardinality) of that result in each result group of the hit model is increased by 1.
When several results in a result group tie for the highest probability in the system's judgment, the increment of 1 to the probability base in that result group is divided equally among the tied results.
Fig. 2 is a schematic diagram of a fault locating process of a model hit probability distribution.
It should be noted that receiving fault abnormal data for a device in this embodiment does not necessarily mean that the device itself is abnormal; a device associated with it may be abnormal and may have cut off the device's network communication. Associated devices are therefore also treated as analysis objects in the troubleshooting task.
This embodiment can actively detect and analyze abnormal data, quickly delimit a fault by using fault models, and indicate the troubleshooting direction. Meanwhile, following the analysis steps, the possible causes of the fault are checked one by one through continued trial and error, the troubleshooting scope is narrowed, and finally the possible causes of the fault are given together with reasonable evidence.
Correspondingly, the present embodiment further provides a fault location method based on model hit probability distribution, as shown in fig. 3, including the steps of:
S10. creating a fault model judgment rule base and a fault model base in advance;
S11. receiving abnormal data;
S12. analyzing the received abnormal data, and matching the abnormal data with the pre-created fault model judgment rule base; if the matching is successful, going to step S13;
S13. generating a troubleshooting task according to the received abnormal data, wherein the troubleshooting task comprises a plurality of abnormal data;
S14. matching a plurality of fault models corresponding to each abnormal data according to the plurality of abnormal data in the generated troubleshooting task;
S15. performing fault delimitation on the current troubleshooting task according to the matched fault models, and obtaining a delimitation probability through calculation;
S16. confirming or correcting the obtained fault delimitation probability to obtain the final fault delimitation probability.
Further, the step S13 includes recording the number of hits if a plurality of abnormal data hit the same rule.
Further, the step S14 further includes:
calculating, according to the matched plurality of fault models, the activity of each of those fault models.
Further, the step S13 further includes:
when new abnormal data is generated, judging whether an unfinished troubleshooting task exists for the same device as the new abnormal data or for an associated device; if such a task exists, the new abnormal data is assigned to that troubleshooting task of the same or associated device; if no such task exists, a new troubleshooting task is generated from the new abnormal data.
Further, the step S13 includes, if abnormal data in the generated troubleshooting task does not match any fault model in the pre-created fault model base, automatically simulating a fault model according to the unmatched abnormal data, and storing the automatically simulated fault model in the fault model base.
Example two
The difference between the fault location system based on the model hit probability distribution provided in this embodiment and the first embodiment is that:
in the generating module, a troubleshooting task is generated according to the received abnormal data, wherein the troubleshooting task comprises a plurality of abnormal data.
The parsed abnormal data is matched against the system rule base; if a rule is hit, a troubleshooting task is generated; if a plurality of abnormal data hit the same rule, the number of hits is recorded.
A plurality of abnormal data, such as abnormal data 1, abnormal data 2, abnormal data 3, etc., may be included in one troubleshooting task.
When new abnormal data is generated, it must first be judged whether an unfinished troubleshooting task currently exists for the same device as the abnormal data or for an associated device (the judgment is made according to the configuration item number in the abnormal data and the device topology association relationships recorded in the CMDB). If such association information exists, the newly arrived abnormal data is treated as belonging to the same troubleshooting task as the same or associated device; if no association information exists, a new troubleshooting task is generated from the abnormal data.
In this embodiment, by evaluating new abnormal data in this way, the troubleshooting task to which the currently generated abnormal data belongs can be determined accurately, and the troubleshooting task can be completed quickly.
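The association check of this embodiment can be sketched as a set intersection between the devices related to the new abnormal data and the devices already attached to each open task. This is a hypothetical illustration: the `topology` dictionary stands in for the association relationships recorded in the CMDB, and the function names and task identifiers are assumptions rather than part of the patent.

```python
import itertools

_task_numbers = itertools.count(1)  # source of new, illustrative task identifiers


def find_open_task(ci_number, open_tasks, topology):
    """Return the id of an unfinished task covering this device or an associated one."""
    related = {ci_number} | set(topology.get(ci_number, ()))
    for task_id, devices in open_tasks.items():
        if related & devices:
            return task_id
    return None


def assign_task(ci_number, open_tasks, topology):
    """Attach new abnormal data to an existing task, or open a new one."""
    task_id = find_open_task(ci_number, open_tasks, topology)
    if task_id is None:
        task_id = f"task-{next(_task_numbers)}"
        open_tasks[task_id] = set()
    open_tasks[task_id].add(ci_number)
    return task_id


# Stand-in for CMDB topology: configuration item number -> associated items.
topology = {
    "host-1": {"switch-1"},
    "host-2": {"switch-1"},
    "switch-1": {"host-1", "host-2"},
}
open_tasks = {}
first = assign_task("host-1", open_tasks, topology)     # no open task yet: new task
second = assign_task("switch-1", open_tasks, topology)  # associated with host-1: same task
third = assign_task("host-9", open_tasks, topology)     # unrelated device: new task
```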
Example three
The difference between the fault location system based on the model hit probability distribution in this embodiment and the first embodiment is that:
this embodiment illustrates how the final fault delimitation probability is obtained in a troubleshooting task and how the obtained fault delimitation probability is corrected.
1. Fault model delimitation probability
In a certain troubleshooting task, rule a corresponding to abnormal event 1 is matched with fault models A, B and C, where fault model A contains rules (a, b, c), fault model B contains rules (a, b), and fault model C contains rules (a, c); the activities of fault models A, B and C are therefore each increased by 1. When rule b corresponding to abnormal event 2 occurs, it is matched with fault models A and B, and the activities of fault models A and B are each increased by 1. When rule c corresponding to abnormal event 3 occurs, it is matched with fault model A, and the activity of fault model A is increased by 1. The activities of fault models A, B and C are now 3, 2 and 1, respectively. The delimitation probabilities of fault model A are set to host failure 80% and digital communication fault 20%; those of fault model B to host failure 60% and digital communication fault 40%; and those of fault model C to host failure 70% and digital communication fault 30%. Then, according to the formula:
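$$P_A = \sum_{i=1}^{n} \frac{S_i}{\sum_{j=1}^{n} S_j} \, P_{Ai}$$
where: $P_A$ denotes the delimitation probability of result A in the troubleshooting task; $S_i$ denotes the activity (liveness) of the i-th fault model hit in the troubleshooting task; $P_{Ai}$ denotes the probability with which the i-th fault model itself points to result A; and n is the number of possible fault models in the troubleshooting task.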
(1) When an abnormal event occurs again and rule a is hit, the activities of all matched fault models A, B and C are each increased by 1, and the delimitation probabilities are:
host failure:
4/(4+3+2) × 80% + 3/(4+3+2) × 60% + 2/(4+3+2) × 70% = 71.11%;
digital communication fault:
4/(4+3+2) × 20% + 3/(4+3+2) × 40% + 2/(4+3+2) × 30% = 28.89%;
(2) When an abnormal event occurs again and rule b is hit, the activities of all matched fault models A and B are each increased by 1, and the delimitation probabilities are:
host failure: 5/(5+4) × 80% + 4/(5+4) × 60% = 71.11%;
digital communication fault: 5/(5+4) × 20% + 4/(5+4) × 40% = 28.89%;
(3) When an abnormal event occurs again and rule c is hit, the activity of the matched fault model A is increased by 1, and the delimitation probabilities are:
host failure: 6/6 × 80% = 80%;
digital communication fault: 6/6 × 20% = 20%.
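The three calculations above can be reproduced with a short script. This is a minimal sketch, not the patented implementation: the function name `delimitation_probability` and the dictionary layout are hypothetical, and the activities passed in are simply the values stated in steps (1) to (3) for the models matched at each step.

```python
def delimitation_probability(activity: dict, model_probs: dict) -> dict:
    """Weight each matched model's result probabilities by its share of total activity."""
    total = sum(activity.values())
    results = {}
    for model, s in activity.items():
        for result, p in model_probs[model].items():
            results[result] = results.get(result, 0.0) + s / total * p
    return results


# Per-model delimitation probabilities from the example.
MODEL_PROBS = {
    "A": {"host failure": 0.80, "digital communication fault": 0.20},
    "B": {"host failure": 0.60, "digital communication fault": 0.40},
    "C": {"host failure": 0.70, "digital communication fault": 0.30},
}

# Activities of the matched models at steps (1), (2) and (3).
STEPS = [
    {"A": 4, "B": 3, "C": 2},
    {"A": 5, "B": 4},
    {"A": 6},
]

for activity in STEPS:
    probs = delimitation_probability(activity, MODEL_PROBS)
    print({result: f"{p:.2%}" for result, p in probs.items()})
# {'host failure': '71.11%', 'digital communication fault': '28.89%'}
# {'host failure': '71.11%', 'digital communication fault': '28.89%'}
# {'host failure': '80.00%', 'digital communication fault': '20.00%'}
```

Because each model's result probabilities sum to 100% and the activity weights sum to 1, the delimitation probabilities of all results also sum to 100% at every step.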
2. Bounded probability result correction
(1) When a fault model is configured with result A for the first time, the probability base of result A for that fault model is set to 1, and the reference probability of the model is result A: 100%;
(2) When fault model A is matched for the second time, its reference probability is result A: 100%. If the result after manual confirmation is result A, the probability base of result A for the fault model is increased by 1 to 2;
(3) When fault model A is matched for the third time, its reference probability is still result A: 100%. If the result after manual confirmation is result B, the probability base of result A for the fault model remains 2 and the probability base of result B becomes 1;
(4) When fault model A is matched for the fourth time, its reference probability is result A: 66.7%, that is, the share of result A's probability base in the sum of the probability bases of all results in the corresponding result group, calculated from the previous three results (100% × 2/3 = 66.7%). If the result after manual confirmation is result A, the probability base of result A is increased by 1 to 3, and the probability base of result B remains 1;
(5) And so on. A result that has not been confirmed is not counted in the probability base. Adding a model is treated as a manual confirmation of the model's original result.
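The correction steps above amount to keeping a per-model tally of manually confirmed results and reading the reference probability off that tally. The sketch below is a minimal illustration under that reading. The class name `ResultBase` and its methods are hypothetical and do not come from the patent.

```python
from collections import Counter


class ResultBase:
    """Per-model tally of manually confirmed results (the "probability base")."""

    def __init__(self, initial_result):
        # Configuring a result for the first time sets its base to 1.
        self.base = Counter({initial_result: 1})

    def reference_probability(self):
        # Share of each result's base in the sum of all bases.
        total = sum(self.base.values())
        return {result: count / total for result, count in self.base.items()}

    def confirm(self, result):
        # Only manually confirmed results are counted; unconfirmed
        # matches simply never call this method.
        self.base[result] += 1


model_a = ResultBase("A")                # step (1): reference A = 100%
model_a.confirm("A")                     # step (2): base A = 2
model_a.confirm("B")                     # step (3): base A = 2, B = 1
ref = model_a.reference_probability()    # step (4): reference A = 2/3
model_a.confirm("A")                     # after step (4): base A = 3, B = 1
print({k: f"{v:.1%}" for k, v in ref.items()})
```

Storing counts rather than percentages keeps each confirmation a constant-time update and lets the reference probability be recomputed on demand.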
This embodiment can actively detect and analyze abnormal data, quickly delimit faults by using the fault models, and indicate the direction of troubleshooting. Meanwhile, following the analysis steps, the possible causes of the fault are checked one by one through continuous trial and error, the troubleshooting scope is narrowed, and the possible causes of the fault are finally given together with supporting evidence.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A fault positioning system based on model hit probability distribution, characterized by comprising: a creating module, a receiving module, an analysis module, a generating module, a matching module, a calculation module, and a correction module;
the creating module is used for creating a fault model judgment rule base and a fault model library in advance;
the receiving module is used for receiving abnormal data;
the analysis module is used for analyzing the received abnormal data and matching the abnormal data against the pre-created fault model judgment rule base;
the generating module is used for generating a troubleshooting task according to the received abnormal data, the troubleshooting task comprising a plurality of abnormal data;
the matching module is used for matching a plurality of fault models corresponding to each abnormal data according to the plurality of abnormal data in the generated troubleshooting task;
the calculation module is used for performing fault delimitation on the current troubleshooting task according to the matched plurality of fault models and obtaining the delimiting probability through calculation;
and the correction module is used for confirming or correcting the obtained fault delimiting probability to obtain the final fault delimiting probability.
2. The system of claim 1, wherein the generating module further records the number of hits if multiple abnormal data hit the same rule base.
3. The system of claim 2, wherein the calculation module further calculates, according to the matched plurality of fault models, the fault model activity corresponding to each of the plurality of fault models.
4. The system of claim 2, wherein the generating module further comprises:
a judging module used for judging, when new abnormal data is generated, whether an unfinished troubleshooting task exists for the same equipment as the new abnormal data or for associated equipment.
5. The system of claim 4, wherein the matching module is further used for, if abnormal data in the generated troubleshooting task is not matched to any fault model in the pre-created fault model library, automatically simulating a fault model according to the unmatched abnormal data and storing the automatically simulated fault model in the fault model library.
6. A fault positioning method based on model hit probability distribution, characterized by comprising the following steps:
S0. creating a fault model judgment rule base and a fault model library in advance;
S1. receiving abnormal data;
S2. analyzing the received abnormal data and matching the abnormal data against the pre-created fault model judgment rule base; if the matching succeeds, proceeding to step S3;
S3. generating a troubleshooting task according to the received abnormal data, the troubleshooting task comprising a plurality of abnormal data;
S4. matching a plurality of fault models corresponding to each abnormal data according to the plurality of abnormal data in the generated troubleshooting task;
S5. performing fault delimitation on the current troubleshooting task according to the matched plurality of fault models and obtaining the delimiting probability through calculation;
S6. confirming or correcting the obtained fault delimiting probability to obtain the final fault delimiting probability.
7. The method of claim 6, wherein step S3 further comprises recording the number of hits if multiple abnormal data hit the same rule base.
8. The fault positioning method based on model hit probability distribution of claim 7, wherein step S4 further comprises:
calculating, according to the matched plurality of fault models, the fault model activity corresponding to each of the plurality of fault models.
9. The fault positioning method based on model hit probability distribution of claim 7, wherein step S3 further comprises:
when new abnormal data is generated, judging whether an unfinished troubleshooting task exists for the same equipment as the new abnormal data or for associated equipment; if such a task exists, the new abnormal data is assigned to that troubleshooting task of the same or associated equipment; and if no such task exists, a new troubleshooting task is generated from the new abnormal data.
10. The fault positioning method based on model hit probability distribution of claim 9, wherein step S3 further comprises, if abnormal data in the generated troubleshooting task is not matched to any fault model in the pre-created fault model library, automatically simulating a fault model according to the unmatched abnormal data and storing the automatically simulated fault model in the fault model library.
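For illustration only, the task-assignment decision recited for step S3 (attach new abnormal data to an unfinished troubleshooting task of the same or associated equipment, otherwise open a new task) might be sketched as below. The `Task` dataclass, the `assign_anomaly` function, the `associated` mapping of equipment to associated equipment, and the equipment names are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    equipment: str
    anomalies: list = field(default_factory=list)
    finished: bool = False


def assign_anomaly(tasks, anomaly, equipment, associated):
    """Attach the anomaly to an unfinished task of the same or associated
    equipment; if none exists, open a new task for it."""
    related = {equipment} | associated.get(equipment, set())
    for task in tasks:
        if not task.finished and task.equipment in related:
            task.anomalies.append(anomaly)
            return task
    task = Task(equipment=equipment, anomalies=[anomaly])
    tasks.append(task)
    return task


tasks = []
# Hypothetical association between two pieces of equipment.
associated = {"switch-1": {"host-1"}, "host-1": {"switch-1"}}
t1 = assign_anomaly(tasks, "link down", "switch-1", associated)
t2 = assign_anomaly(tasks, "ping timeout", "host-1", associated)  # joins t1
t3 = assign_anomaly(tasks, "disk full", "host-9", associated)     # new task
print(len(tasks), t1.anomalies)
```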
CN201911305679.4A 2019-12-18 2019-12-18 Fault positioning system and method based on model hit probability distribution Active CN110727538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911305679.4A CN110727538B (en) 2019-12-18 2019-12-18 Fault positioning system and method based on model hit probability distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911305679.4A CN110727538B (en) 2019-12-18 2019-12-18 Fault positioning system and method based on model hit probability distribution

Publications (2)

Publication Number Publication Date
CN110727538A true CN110727538A (en) 2020-01-24
CN110727538B CN110727538B (en) 2020-04-07

Family

ID=69226054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911305679.4A Active CN110727538B (en) 2019-12-18 2019-12-18 Fault positioning system and method based on model hit probability distribution

Country Status (1)

Country Link
CN (1) CN110727538B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418449A (en) * 2020-10-13 2021-02-26 国网山东省电力公司莘县供电公司 Generation method, positioning method and device of power supply line fault positioning model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120233112A1 (en) * 2011-03-10 2012-09-13 GM Global Technology Operations LLC Developing fault model from unstructured text documents
CN105245185A (en) * 2015-09-30 2016-01-13 南京南瑞集团公司 Regional distributed photovoltaic fault diagnosis system and method for access power distribution network
CN106789243A (en) * 2016-12-22 2017-05-31 烟台东方纵横科技股份有限公司 A kind of IT operational systems with intelligent trouble analytic function
CN107846016A (en) * 2017-11-16 2018-03-27 中国南方电网有限责任公司 A kind of Distribution Network Failure localization method and equipment based on Bayes and Complex event processing
CN109840157A (en) * 2017-11-28 2019-06-04 中国移动通信集团浙江有限公司 Method, apparatus, electronic equipment and the storage medium of fault diagnosis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨朝鹏 (Yang Chaopeng) et al.: "Research and Application of a Log-Based Machine Learning Method for Rapid Fault Delimitation", 《智能网络》 *

Also Published As

Publication number Publication date
CN110727538B (en) 2020-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant