CN107357730B - System fault diagnosis and repair method and device - Google Patents

System fault diagnosis and repair method and device


Publication number
CN107357730B
Authority
CN
China
Prior art keywords
fault
data
type
diagnosis model
establishing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710580322.1A
Other languages
Chinese (zh)
Other versions
CN107357730A (en)
Inventor
王慧锋
王晓通
张凯顺
郭锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201710580322.1A
Publication of CN107357730A
Application granted
Publication of CN107357730B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/366 Software debugging using diagnostics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The patent refers to the field of 'transmission of digital information'. Disclosed herein is a system fault diagnosis and repair method, comprising: respectively extracting the characteristics of fault data of each fault type through historical fault data, and establishing a fault diagnosis model; when the system has a fault, inputting the data of the current fault into the fault diagnosis model, and analyzing and determining the fault type of the current fault; and calling a fault processing strategy corresponding to the fault type of the current fault to repair the fault. A system fault diagnosis and repair device is also provided.

Description

System fault diagnosis and repair method and device
Technical Field
The invention relates to cloud computing data center technology, and in particular to an automatic fault diagnosis and repair scheme for an operation and maintenance automation platform system.
Background
As companies become more information-driven, the number of business modules grows rapidly and operation and maintenance becomes more difficult. The complexity of the business modules makes system fault handling harder. Automatically diagnosing and repairing system faults, so as to reduce operation and maintenance costs and the losses that system faults cause to companies, has therefore become important.
Disclosure of Invention
The invention aims to solve the technical problem of providing a system fault diagnosis and repair method and device, which can improve the system fault diagnosis efficiency.
In order to solve the technical problem, the invention discloses a system fault diagnosis and repair method, which comprises the following steps:
respectively extracting the characteristics of fault data of each fault type through historical fault data, and establishing a fault diagnosis model;
when the system has a fault, inputting the data of the current fault into the fault diagnosis model, and analyzing and determining the fault type of the current fault;
and calling a fault processing strategy corresponding to the fault type of the current fault to repair the fault.
Optionally, in the method, extracting the features of the fault data of each fault type from the historical fault data and establishing the fault diagnosis model includes:
extracting the features of the fault data of different fault types from the historical fault data by using contrast patterns, and establishing the fault diagnosis model according to the extracted features.
Optionally, in the above method, extracting the features of the fault data of different fault types from the historical fault data by using contrast patterns includes:
calculating the support $\mathrm{sup}(P, D_i)$ of a contrast pattern $P$ by the following formula, where the contrast pattern $P = I_1 I_2 I_3 \ldots I_{|P|}$ is a pattern that occurs frequently in the data set of one fault type and infrequently in the data sets of the other fault types, and serves as a feature of the fault data of that fault type:
$\mathrm{sup}(P, D_i) = \dfrac{|\{S \mid S \in D_i \text{ and } P \text{ appears in } S\}|}{|D_i|}, \quad i \in [1, k]$
where $D_i$ denotes the fault data set of the i-th fault type, and $k$ is the total number of fault data types;
and $\mathrm{sup}(P, D_i)$ is greater than the first support threshold $\alpha$ and less than the second support threshold $\beta$.
Optionally, in the method, the establishing a fault diagnosis model includes:
establishing a fault diagnosis model $\Phi$ according to the following formulas:
$\Phi = \{F_1, F_2, \ldots, F_k\}$;
$F_i = \{f(P) \mid P \in T_i\}$;
Figure BDA0001352098880000021
$w_i$ is the weight of the function $f(P_i)$;
$T_i = \{P_1, P_2, \ldots, P_n\}$ is the pattern set of the i-th type fault data set $D_i$ ($i \in [1, k]$), $n = k$.
Optionally, the method further includes:
after calling the fault handling strategy corresponding to the fault type of the current fault and performing fault repair, if the fault is not effectively resolved, sending the fault data to operation and maintenance personnel for manual intervention.
There is also provided a system fault diagnosis and repair apparatus comprising:
the first unit is used for respectively extracting the characteristics of fault data of each fault type through historical fault data and establishing a fault diagnosis model;
the second unit is used for inputting the data of the current fault into the fault diagnosis model when the system has the fault, and analyzing and determining the fault type of the current fault;
and the third unit is used for calling a fault processing strategy corresponding to the fault type of the current fault to carry out fault repair.
Optionally, in the above apparatus, the first unit extracting the features of the fault data of each fault type from the historical fault data and establishing the fault diagnosis model includes:
extracting the features of the fault data of different fault types from the historical fault data by using contrast patterns, and establishing the fault diagnosis model according to the extracted features.
Optionally, in the foregoing apparatus, extracting the features of the fault data of different fault types from the historical fault data by using contrast patterns includes:
calculating the support $\mathrm{sup}(P, D_i)$ of a contrast pattern $P$ by the following formula, where the contrast pattern $P = I_1 I_2 I_3 \ldots I_{|P|}$ is a pattern that occurs frequently in the data set of one fault type and infrequently in the data sets of the other fault types, and serves as a feature of the fault data of that fault type:
$\mathrm{sup}(P, D_i) = \dfrac{|\{S \mid S \in D_i \text{ and } P \text{ appears in } S\}|}{|D_i|}, \quad i \in [1, k]$
where $D_i$ denotes the fault data set of the i-th fault type, and $k$ is the total number of fault data types;
and $\mathrm{sup}(P, D_i)$ is greater than the first support threshold $\alpha$ and less than the second support threshold $\beta$.
Optionally, in the above apparatus, the establishing a fault diagnosis model from the extracted features includes:
establishing a fault diagnosis model $\Phi$ according to the following formulas:
$\Phi = \{F_1, F_2, \ldots, F_k\}$;
$F_i = \{f(P) \mid P \in T_i\}$;
Figure BDA0001352098880000031
$w_i$ is the weight of the function $f(P_i)$;
$T_i = \{P_1, P_2, \ldots, P_n\}$ is the pattern set of the i-th type fault data set $D_i$ ($i \in [1, k]$), $n = k$.
Optionally, in the above apparatus, the third unit calls a fault handling policy corresponding to a fault type of the current fault, and after the fault is repaired, if the fault cannot be effectively solved, sends the fault data to an operation and maintenance worker for manual intervention.
According to this technical solution, on the one hand, establishing a multi-level index achieves efficient access to fault data; on the other hand, semi-supervised machine learning addresses the difficulty of classifying fault data manually. Efficient access to and automatic classification of fault data are thereby achieved, which reduces the time spent on manual troubleshooting and handling and reduces company losses.
Drawings
Fig. 1 is a flowchart of a system fault diagnosis and repair method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be further described in detail with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments of the present application may be arbitrarily combined with each other without conflict.
This embodiment provides a system fault diagnosis and repair method which, as shown in Fig. 1, mainly includes the following operations:
Step 100: extracting the features of the fault data of each fault type from historical fault data, and establishing a fault diagnosis model;
In this step, the purpose of extracting the features of the fault data is to perform fault classification. The fault types referred to here may include disk faults (e.g., insufficient disk space), CPU faults (e.g., full CPU load), and traffic faults (e.g., abnormal traffic types).
In this embodiment, because contrast patterns are inherently well suited to describing the characteristics of different classes of samples, the contrast patterns of the various fault types may be selected as the features of the fault data of the different fault types when establishing the fault diagnosis model and performing fault repair.
The specific extraction method is as follows. For convenience of description, let $D = \{D_1, D_2, \ldots, D_k\}$ denote the set of fault data, where $D_k$ is the fault data set of the k-th fault type. A contrast pattern $P = I_1 I_2 I_3 \ldots I_{|P|}$ describes a pattern that occurs frequently in one class of fault data and infrequently in the other classes. The support of a pattern, $\mathrm{sup}(P, D_i)$, measures how frequently the pattern $P$ occurs in the data set $D_i$, and may be calculated using Equation 1 below.
$\mathrm{sup}(P, D_i) = \dfrac{|\{S \mid S \in D_i \text{ and } P \text{ appears in } S\}|}{|D_i|}$ (Equation 1)
where $i \in [1, k]$, and the support of the contrast pattern $P$ must satisfy the following requirements:
$\mathrm{sup}(P, D_i) > \alpha$; $\mathrm{sup}(P, D_j) < \beta$ ($j \in [1, k] \wedge j \neq i$);
That is, the support of $P$ must exceed the first support threshold $\alpha$ in $D_i$ and remain below the second support threshold $\beta$ in every other data set $D_j$. Here $\alpha$ is the minimum support value in the fault data sets of the various fault types, and $\beta$ is the maximum support value in the fault data sets of the various fault types. The fault diagnosis model is built on a fault data warehouse, and the features of the fault data in the warehouse are obtained in the manner described above.
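The support calculation and the two threshold checks can be illustrated with a minimal Python sketch. It assumes that "P appears in S" means ordered subsequence containment, which the text does not define; the function names, the toy event sequences and the threshold values are hypothetical and are not part of the disclosed method.

```python
from typing import Hashable, List, Sequence

Record = Sequence[Hashable]


def appears_in(pattern: Record, record: Record) -> bool:
    """True if `pattern` occurs in `record` as an ordered subsequence.

    ASSUMPTION: the matching rule is not defined in the source text;
    subsequence containment is used here for illustration only.
    """
    it = iter(record)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(pattern: Record, dataset: Sequence[Record]) -> float:
    """sup(P, D_i): fraction of records in `dataset` that contain `pattern`."""
    if not dataset:
        return 0.0
    return sum(appears_in(pattern, s) for s in dataset) / len(dataset)


def is_contrast_pattern(pattern: Record, datasets: List[Sequence[Record]],
                        i: int, alpha: float, beta: float) -> bool:
    """Check sup(P, D_i) > alpha and sup(P, D_j) < beta for every j != i."""
    if support(pattern, datasets[i]) <= alpha:
        return False
    return all(support(pattern, d) < beta
               for j, d in enumerate(datasets) if j != i)


# Hypothetical event sequences for two fault types.
disk_faults = [["io_wait", "disk_full", "write_err"],
               ["disk_full", "write_err"],
               ["io_wait", "write_err"]]
cpu_faults = [["load_high", "throttle"],
              ["io_wait", "load_high"]]
datasets = [disk_faults, cpu_faults]

pattern = ["disk_full", "write_err"]
print(support(pattern, disk_faults))
print(is_contrast_pattern(pattern, datasets, i=0, alpha=0.5, beta=0.2))
```

In this toy data the pattern occurs in two of the three disk records and in none of the CPU records, so it qualifies as a contrast pattern for the disk fault type under the chosen thresholds.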
Given the pattern set $T_i = \{P_1, P_2, \ldots, P_n\}$ of a fault data set $D_i$ ($i \in [1, k]$), the fault diagnosis model is $\Phi = \{F_1, F_2, \ldots, F_k\}$, in which $F_i = \{f(P) \mid P \in T_i\}$ represents the mathematical model of the i-th type of fault data, where
Figure BDA0001352098880000051
The pattern set $T$ can be obtained by a data mining algorithm, such as the DPMiner algorithm or the MDSP-CGC algorithm. The weight $w_i$ of the function $f(P_i)$ can be obtained by a corresponding weight learning algorithm. In summary, a mathematical model $\Phi$ for fault diagnosis can be obtained.
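One possible reading of the model $\Phi$ is sketched below in Python. The formula for $f(P)$ is available only as an image, so the sketch assumes that $f(P)$ is an indicator of whether the pattern appears in the record, and that the score of each fault type is the weighted sum over its pattern set; this is an illustrative assumption, not the patented formula. The patterns and weights are hand-written stand-ins for the output of a mining algorithm and a weight learning algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def appears_in(pattern: Sequence[str], record: Sequence[str]) -> bool:
    """True if `pattern` is an ordered subsequence of `record` (assumed rule)."""
    it = iter(record)
    return all(item in it for item in pattern)


# Hypothetical model Phi: fault type -> list of (pattern P, weight w).
# ASSUMPTION: f(P) is treated as a 0/1 indicator; the source gives the
# actual definition only as a figure, which is not reproduced here.
PHI: Dict[str, List[Tuple[Pattern, float]]] = {
    "disk": [(("disk_full", "write_err"), 0.7), (("io_wait",), 0.3)],
    "cpu": [(("load_high",), 0.6), (("load_high", "throttle"), 0.4)],
}


def score(record: Sequence[str],
          weighted_patterns: List[Tuple[Pattern, float]]) -> float:
    """Sum the weights of all patterns that appear in the record."""
    return sum(w for p, w in weighted_patterns if appears_in(p, record))


def diagnose(record: Sequence[str],
             model: Dict[str, List[Tuple[Pattern, float]]] = PHI) -> str:
    """Return the fault type whose pattern set scores highest."""
    scores = {ftype: score(record, pats) for ftype, pats in model.items()}
    return max(scores, key=scores.get)


current_fault = ["io_wait", "disk_full", "write_err"]
print(diagnose(current_fault))
```

Taking the highest-scoring type is only one way to turn per-type scores into a decision; a real system would also need a rule for records that match no pattern at all.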
Step 200: when the system has a fault, inputting the data of the current fault into a fault diagnosis model, and analyzing and confirming the fault type of the current fault;
When the system has a fault, fault information is collected (that is, the data of the current fault is obtained from the fault log) and input into the fault diagnosis model $\Phi$, which determines the fault type.
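How fault data is obtained from a fault log is not specified. As a rough Python sketch, a log could be reduced to an ordered sequence of event names that is then fed to the model. The log line layout, the severity levels and the event names below are all hypothetical.

```python
import re
from typing import List, Sequence

# Hypothetical log line layout: "<timestamp> <LEVEL> <event> [details...]".
LINE_RE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+(?P<event>\w+)")


def log_to_items(log_text: str,
                 levels: Sequence[str] = ("WARN", "ERROR")) -> List[str]:
    """Extract the ordered event names of warning and error lines."""
    items = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels:
            items.append(m.group("event"))
    return items


sample_log = """\
2017-07-17T10:00:01 INFO heartbeat node=a1
2017-07-17T10:00:05 WARN io_wait node=a1 pct=91
2017-07-17T10:00:09 ERROR disk_full node=a1 mount=/data
2017-07-17T10:00:10 ERROR write_err node=a1
"""
print(log_to_items(sample_log))  # ['io_wait', 'disk_full', 'write_err']
```

The resulting list is the kind of record $S$ against which contrast patterns can be matched.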
Step 300: and calling a fault processing strategy corresponding to the fault type of the current fault to repair the fault.
In this step, after the fault type is obtained, the method built into the system for resolving that kind of fault, that is, the fault handling strategy corresponding to the fault type, may be called to perform the corresponding repair operation. If the built-in fault handling strategy fails to resolve the fault effectively, the fault information can be sent to the relevant operation and maintenance personnel for manual intervention. The fault handling strategies corresponding to the different fault types may be implemented in any existing manner, and this embodiment places no particular limitation on them.
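The dispatch-then-escalate behaviour of Step 300 can be sketched in Python as a lookup table of strategies with a fallback. The strategy functions, the fault dictionary fields and the `escalated` list (a stand-in for notifying operation and maintenance personnel) are hypothetical; the text leaves the strategies themselves open.

```python
from typing import Callable, Dict, List

# Hypothetical repair strategies; each returns True when the fault is resolved.
def clean_disk(fault: dict) -> bool:
    return fault.get("reclaimable_gb", 0) > 0


def restart_hot_process(fault: dict) -> bool:
    return fault.get("restartable", False)


# Fault type -> built-in fault handling strategy.
STRATEGIES: Dict[str, Callable[[dict], bool]] = {
    "disk": clean_disk,
    "cpu": restart_hot_process,
}

# Stand-in for sending fault data to operation and maintenance personnel.
escalated: List[dict] = []


def repair(fault_type: str, fault: dict) -> str:
    """Run the strategy for `fault_type`; escalate if it is missing or fails."""
    strategy = STRATEGIES.get(fault_type)
    try:
        resolved = strategy(fault) if strategy else False
    except Exception:
        # A strategy that raises is treated as not having resolved the fault.
        resolved = False
    if resolved:
        return "repaired"
    escalated.append({"type": fault_type, **fault})
    return "escalated"


print(repair("disk", {"reclaimable_gb": 12}))   # repaired
print(repair("cpu", {"restartable": False}))    # escalated
print(repair("traffic", {}))                    # escalated (no strategy)
```

Keeping the strategies in a table means a new fault type only needs a new entry, and any type without an entry falls through to manual handling.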
The embodiment also provides a system fault diagnosis and repair device, which at least comprises the following units.
The first unit is used for respectively extracting the characteristics of fault data of each fault type through historical fault data and establishing a fault diagnosis model;
Optionally, the first unit may extract the features of the fault data of different fault types from the historical fault data by using contrast patterns, and build the fault diagnosis model from the extracted features.
Specifically, the support $\mathrm{sup}(P, D_i)$ of a contrast pattern $P$ is calculated using the following formula, where the contrast pattern $P = I_1 I_2 I_3 \ldots I_{|P|}$ is a pattern that occurs frequently in the data set of one fault type and infrequently in the data sets of the other fault types, and serves as a feature of the fault data of that fault type:
$\mathrm{sup}(P, D_i) = \dfrac{|\{S \mid S \in D_i \text{ and } P \text{ appears in } S\}|}{|D_i|}, \quad i \in [1, k]$
where $D_i$ denotes the fault data set of the i-th fault type, and $k$ is the total number of fault data types;
and $\mathrm{sup}(P, D_i)$ is greater than the first support threshold $\alpha$ and less than the second support threshold $\beta$.
Then, establishing the fault diagnosis model according to the extracted features of the fault data of each fault type includes:
establishing a fault diagnosis model $\Phi$ according to the following formulas:
$\Phi = \{F_1, F_2, \ldots, F_k\}$;
$F_i = \{f(P) \mid P \in T_i\}$;
Figure BDA0001352098880000061
$w_i$ is the weight of the function $f(P_i)$;
$T_i = \{P_1, P_2, \ldots, P_n\}$ is the pattern set of the i-th type fault data set $D_i$ ($i \in [1, k]$), $n = k$.
The second unit is used for inputting the data of the current fault into the fault diagnosis model when the system has the fault, and analyzing and determining the fault type of the current fault;
and the third unit is used for calling a fault processing strategy corresponding to the fault type of the current fault to carry out fault repair.
It should be noted that, after the fault handling strategy corresponding to the fault type of the current fault is called and fault repair is performed, if the fault is not effectively resolved, the fault data is sent to operation and maintenance personnel for manual intervention. The fault handling strategies corresponding to the different fault types referred to here may be implemented in any existing manner, and this embodiment places no particular limitation on them.
In addition, the apparatus can implement the system fault diagnosis and repair method described in the above embodiment, and therefore, for some specific operation details of the apparatus, reference may be made to corresponding contents of the above method embodiment, which is not described herein again.
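The division of the device into three units can be pictured as three pluggable components behind one object. The following Python sketch is only an architectural illustration: the class name, the callable signatures and the deliberately simple stand-in logic (single distinguishing items instead of mined contrast patterns) are assumptions, not the disclosed implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Set

Record = Sequence[str]
History = Dict[str, List[Record]]   # fault type -> historical records
Model = Dict[str, Set[str]]         # fault type -> characteristic items


@dataclass
class DiagnosisRepairDevice:
    build_model: Callable[[History], Model]      # first unit
    classify: Callable[[Model, Record], str]     # second unit
    repair: Callable[[str, Record], bool]        # third unit
    notify_ops: Callable[[str, Record], None]    # manual-intervention hook
    model: Model = field(default_factory=dict)

    def train(self, history: History) -> None:
        self.model = self.build_model(history)

    def handle(self, record: Record) -> str:
        fault_type = self.classify(self.model, record)
        if self.repair(fault_type, record):
            return f"{fault_type}: repaired"
        self.notify_ops(fault_type, record)
        return f"{fault_type}: escalated"


def build(history: History) -> Model:
    """Stand-in for feature extraction: keep items unique to each type."""
    items = {t: {i for r in recs for i in r} for t, recs in history.items()}
    return {t: s - set().union(*(o for u, o in items.items() if u != t))
            for t, s in items.items()}


def classify(model: Model, record: Record) -> str:
    """Stand-in for diagnosis: pick the type sharing the most items."""
    return max(model, key=lambda t: len(model[t] & set(record)))


tickets: List[str] = []
device = DiagnosisRepairDevice(
    build_model=build,
    classify=classify,
    repair=lambda fault_type, record: fault_type == "disk",
    notify_ops=lambda fault_type, record: tickets.append(fault_type),
)
device.train({"disk": [["io_wait", "disk_full"]],
              "cpu": [["io_wait", "load_high"]]})
print(device.handle(["disk_full", "write_err"]))  # disk: repaired
print(device.handle(["load_high"]))               # cpu: escalated
```

Passing the units in as callables mirrors the statement that each unit may be realized in hardware or as a software module without fixing a particular combination.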
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present application is not limited to any specific form of hardware or software combination.
The above description is only a preferred example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (4)

1. A system fault diagnosis and repair method is characterized by comprising the following steps:
respectively extracting the characteristics of fault data of each fault type through historical fault data, and establishing a fault diagnosis model;
when the system has a fault, inputting the data of the current fault into the fault diagnosis model, and analyzing and determining the fault type of the current fault;
calling the fault processing strategy corresponding to the fault type of the current fault to carry out fault repair,
wherein extracting the features of the fault data of each fault type from the historical fault data and establishing the fault diagnosis model comprises:
extracting the features of the fault data of different fault types from the historical fault data by using contrast patterns, and establishing the fault diagnosis model according to the extracted features,
wherein extracting the features of the fault data of different fault types from the historical fault data by using contrast patterns comprises:
calculating the support $\mathrm{sup}(P, D_i)$ of a contrast pattern $P$ by the following formula, where the contrast pattern $P = I_1 I_2 I_3 \ldots I_{|P|}$ is a pattern that occurs frequently in the data set of one fault type and infrequently in the data sets of the other fault types, and serves as a feature of the fault data of that fault type:
$\mathrm{sup}(P, D_i) = \dfrac{|\{S \mid S \in D_i \text{ and } P \text{ appears in } S\}|}{|D_i|}, \quad i \in [1, k]$
where $D_i$ denotes the fault data set of the i-th fault type, and $k$ is the total number of fault data types;
and $\mathrm{sup}(P, D_i)$ is greater than the first support threshold $\alpha$ and less than the second support threshold $\beta$,
and establishing the fault diagnosis model comprises:
establishing a fault diagnosis model $\Phi$ according to the following formulas:
$\Phi = \{F_1, F_2, \ldots, F_k\}$;
$F_i = \{f(P) \mid P \in T_i\}$;
Figure FDA0002820810290000011
$w_i$ is the weight of the function $f(P_i)$;
$T_i = \{P_1, P_2, \ldots, P_n\}$ is the pattern set of the i-th type fault data set $D_i$ ($i \in [1, k]$), $n = k$.
2. The method of claim 1, further comprising:
after calling the fault handling strategy corresponding to the fault type of the current fault and performing fault repair, if the fault is not effectively resolved, sending the fault data to operation and maintenance personnel for manual intervention.
3. A system fault diagnosis repair apparatus, comprising:
the first unit is used for respectively extracting the characteristics of fault data of each fault type through historical fault data and establishing a fault diagnosis model;
the second unit is used for inputting the data of the current fault into the fault diagnosis model when the system has the fault, and analyzing and determining the fault type of the current fault;
a third unit for calling the fault processing strategy corresponding to the fault type of the current fault to carry out fault repair,
wherein the first unit extracting the features of the fault data of each fault type from the historical fault data and establishing the fault diagnosis model comprises:
extracting the features of the fault data of different fault types from the historical fault data by using contrast patterns, and establishing the fault diagnosis model according to the extracted features,
wherein extracting the features of the fault data of different fault types from the historical fault data by using contrast patterns comprises:
calculating the support $\mathrm{sup}(P, D_i)$ of a contrast pattern $P$ by the following formula, where the contrast pattern $P = I_1 I_2 I_3 \ldots I_{|P|}$ is a pattern that occurs frequently in the data set of one fault type and infrequently in the data sets of the other fault types, and serves as a feature of the fault data of that fault type:
$\mathrm{sup}(P, D_i) = \dfrac{|\{S \mid S \in D_i \text{ and } P \text{ appears in } S\}|}{|D_i|}, \quad i \in [1, k]$
where $D_i$ denotes the fault data set of the i-th fault type, and $k$ is the total number of fault data types;
and $\mathrm{sup}(P, D_i)$ is greater than the first support threshold $\alpha$ and less than the second support threshold $\beta$,
and establishing the fault diagnosis model from the extracted features comprises:
establishing a fault diagnosis model $\Phi$ according to the following formulas:
$\Phi = \{F_1, F_2, \ldots, F_k\}$;
$F_i = \{f(P) \mid P \in T_i\}$;
Figure FDA0002820810290000031
$w_i$ is the weight of the function $f(P_i)$;
$T_i = \{P_1, P_2, \ldots, P_n\}$ is the pattern set of the i-th type fault data set $D_i$ ($i \in [1, k]$), $n = k$.
4. The apparatus of claim 3,
and the third unit calls a fault processing strategy corresponding to the fault type of the current fault, and after fault repair is carried out, if the fault cannot be effectively solved, the fault data is sent to operation and maintenance personnel for manual intervention processing.
CN201710580322.1A 2017-07-17 2017-07-17 System fault diagnosis and repair method and device Active CN107357730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710580322.1A CN107357730B (en) 2017-07-17 2017-07-17 System fault diagnosis and repair method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710580322.1A CN107357730B (en) 2017-07-17 2017-07-17 System fault diagnosis and repair method and device

Publications (2)

Publication Number Publication Date
CN107357730A CN107357730A (en) 2017-11-17
CN107357730B true CN107357730B (en) 2021-03-19

Family

ID=60293294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710580322.1A Active CN107357730B (en) 2017-07-17 2017-07-17 System fault diagnosis and repair method and device

Country Status (1)

Country Link
CN (1) CN107357730B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108054734B (en) * 2017-11-22 2019-10-22 深圳供电局有限公司 Distribution network protection method and system based on fault feature matching
CN108322345B (en) * 2018-02-07 2020-08-21 平安科技(深圳)有限公司 Method for issuing fault repair data packet and server
CN108334427B (en) * 2018-02-24 2022-03-25 腾讯科技(深圳)有限公司 Fault diagnosis method and device in storage system
CN109088773B (en) * 2018-08-24 2022-03-11 广州视源电子科技股份有限公司 Fault self-healing method and device, server and storage medium
CN110011825A (en) * 2019-02-26 2019-07-12 贵阳忆联网络有限公司 A kind of network failure automatic intelligent processing method and system
CN110191003A (en) * 2019-06-18 2019-08-30 北京达佳互联信息技术有限公司 Fault repairing method, device, computer equipment and storage medium
CN112630657B (en) * 2019-09-24 2024-06-21 上海汽车集团股份有限公司 Method and device for determining power battery fault
CN111752963A (en) * 2020-06-28 2020-10-09 中国银行股份有限公司 System problem processing method and device
CN112084100B (en) * 2020-09-11 2023-02-28 山东英信计算机技术有限公司 Server operation and maintenance method, device and equipment and readable storage medium
CN115616423B (en) * 2022-12-20 2023-05-23 广东采日能源科技有限公司 Liquid cooling energy storage system and state detection method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101819411B (en) * 2010-03-17 2011-06-15 燕山大学 GPU-based equipment fault early-warning and diagnosis method for improving weighted association rules
US8301333B2 (en) * 2010-03-24 2012-10-30 GM Global Technology Operations LLC Event-driven fault diagnosis framework for automotive systems
CN103760901B (en) * 2013-12-31 2016-06-29 北京泰乐德信息技术有限公司 A kind of rail transit fault identification method based on Classification of Association Rules device
CN103901298A (en) * 2014-03-13 2014-07-02 广东电网公司电力科学研究院 Method and system for detecting operating states of substation equipment
US10180867B2 (en) * 2014-06-11 2019-01-15 Leviathan Security Group, Inc. System and method for bruteforce intrusion detection
CN105372557A (en) * 2015-12-03 2016-03-02 国家电网公司 Power grid resource fault diagnosis method based on association rules
CN106201828A (en) * 2016-07-18 2016-12-07 云南电网有限责任公司信息中心 A kind of virtual-machine fail detection method based on data mining and system
CN106326426A (en) * 2016-08-24 2017-01-11 四川大学 Comparison sequence pattern mining method by adopting item sets as sequential elements

Also Published As

Publication number Publication date
CN107357730A (en) 2017-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210204

Address after: Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Applicant after: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: Room 1601, floor 16, 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province

Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant