CN108776625A - Method, device and storage medium for repairing a service fault - Google Patents

Method, device and storage medium for repairing a service fault

Info

Publication number
CN108776625A
Authority
CN
China
Prior art keywords
failure
service
fault
solution
knowledge library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810665134.3A
Other languages
Chinese (zh)
Inventor
钟以冠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201810665134.3A
Publication of CN108776625A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy

Abstract

The invention discloses a method, device and storage medium for repairing a service fault. The method includes: monitoring the running of a service in real time; when the service is in a fault state, determining whether the fault already exists in a fault knowledge base; and, when it is determined that the fault does not exist in the fault knowledge base, recovering from the fault according to a preset strategy. Compared with conventional manual handling, real-time monitoring and intelligent repair allow a service to be repaired automatically when it fails.

Description

Method, device and storage medium for repairing a service fault
Technical field
The present invention relates to computer technology, and in particular to a method, device and storage medium for repairing a service fault.
Background
In the information age, monitoring application services has become increasingly important, because the normal operation of application services generates substantial economic benefit for a company. Faults inevitably occur while an application is running, and application faults are mostly handled manually. Manual handling is labor-intensive and inefficient, and the time between discovering a problem and resolving it disrupts normal use of the application and noticeably affects the company's economic performance.
Summary of the invention
To solve the above technical problem, the present invention provides a method, device and storage medium for repairing a service fault, which can repair service faults intelligently.
To achieve the object of the invention, the present invention provides a method for repairing a service fault, the method including:
monitoring the running of a service in real time; when the service is in a fault state, determining whether the fault already exists in a fault knowledge base; and, when it is determined that the fault does not exist in the fault knowledge base, recovering from the fault according to a preset strategy.
Further, the method also includes:
when it is determined that the fault exists in the fault knowledge base, recovering from the fault according to the solution recorded in the fault knowledge base.
Further, the method also includes: recording running log information of the service;
Recovering from the fault according to the preset strategy includes:
obtaining the running log information of the service;
obtaining the cause of the fault from the log information;
querying the solutions preset in the program according to the cause of the fault;
recovering from the fault according to the solution found.
Further, after the fault is recovered according to the preset strategy, the method also includes:
obtaining an evaluation result of the fault handling and, when the evaluation result is correct, storing the fault cause and the solution of the fault in the fault knowledge base.
Further, when the service is in a fault state, a preset timer is started;
when the timer reaches the preset time and the fault has still not been cleared, an alarm requesting manual handling of the fault is issued;
the fault is recovered according to the instruction that is input.
To achieve the object of the invention, the present invention also provides a device for repairing a service fault, the device including a monitoring module and a recovery module, wherein:
the monitoring module is used to monitor the running of a service in real time;
the recovery module is used to determine, when the service is in a fault state, whether the fault already exists in a fault knowledge base and, when it is determined that the fault does not exist in the fault knowledge base, to recover from the fault according to a preset strategy.
Further, in the device:
when the recovery module determines that the fault exists in the fault knowledge base, the recovery module recovers from the fault according to the solution recorded in the fault knowledge base.
Further, the device also includes a logging module, which is used to record running log information of the service;
The recovery module recovering from the fault according to the preset strategy includes:
the recovery module obtains the running log information of the service;
the recovery module obtains the cause of the fault from the log information;
the recovery module queries the solutions preset in the program according to the cause of the fault;
the recovery module recovers from the fault according to the solution found.
Further, the device also includes a deposit module that acts after the fault is recovered according to the preset strategy:
the deposit module obtains an evaluation result of the fault handling and, when the evaluation result is correct, stores the fault cause and the solution of the fault in the fault knowledge base.
To achieve the object of the invention, the present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the above method are implemented.
Compared with the prior art, the present invention monitors the running of a service in real time; when the service is in a fault state, it determines whether the fault already exists in a fault knowledge base and, when it is determined that the fault does not exist in the fault knowledge base, recovers from the fault according to a preset strategy. Real-time monitoring and intelligent repair allow the service to be repaired automatically when it fails.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the description, the claims and the accompanying drawings.
Description of the drawings
The accompanying drawings provide a further understanding of the technical solution of the present invention and form part of the description. Together with the embodiments of the present application, they serve to explain the technical solution of the invention and do not limit it.
Fig. 1 is a flowchart of the service fault repair method of Embodiment One of the present invention;
Fig. 2 is another flowchart of the service fault repair method of Embodiment Two of the present invention;
Fig. 3 is a schematic structural diagram of the service fault repair device of Embodiment Three of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments in the present application and the features in those embodiments may be combined with one another arbitrarily.
The steps shown in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
Embodiment one
The present invention provides a method for repairing a service fault. As shown in Fig. 1, the method includes steps S11-S12:
S11: monitor the running of the service in real time;
S12: when the service is in a fault state, determine whether the fault already exists in the fault knowledge base; when it is determined that the fault does not exist in the fault knowledge base, recover from the fault according to a preset strategy.
In this embodiment of the present invention, when the service is in a fault state, the fault is recovered according to a preset strategy. Compared with conventional manual handling, service faults can be repaired intelligently.
When the service is in a fault state, the method also includes: issuing a fault alarm.
This embodiment also includes:
when it is determined that the fault exists in the fault knowledge base, recovering from the fault according to the solution recorded in the fault knowledge base.
Recovering from the fault according to the solution recorded in the fault knowledge base includes: obtaining the fault cause of the failed service, finding the corresponding solution in the fault knowledge base according to that cause, obtaining the repair program that matches the server state, and performing the repair automatically.
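The branch between a known fault and an unknown fault can be sketched in Java as follows. This is only an illustration under stated assumptions: faults are identified by a cause string, and `FaultKnowledgeBase`, `RecoveryDispatcher` and the returned labels are hypothetical names that do not appear in the source, which does not specify any data structures.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical in-memory knowledge base: fault cause -> name of a repair action.
final class FaultKnowledgeBase {
    private final Map<String, String> solutions = new ConcurrentHashMap<>();

    void store(String cause, String solution) {
        solutions.put(cause, solution);
    }

    Optional<String> lookup(String cause) {
        return Optional.ofNullable(solutions.get(cause));
    }
}

// Chooses between the recorded solution and the preset strategy.
final class RecoveryDispatcher {
    private final FaultKnowledgeBase knowledgeBase;
    private final Function<String, String> presetStrategy; // assumed fallback: cause -> solution

    RecoveryDispatcher(FaultKnowledgeBase knowledgeBase, Function<String, String> presetStrategy) {
        this.knowledgeBase = knowledgeBase;
        this.presetStrategy = presetStrategy;
    }

    // Returns the chosen solution, prefixed with where it came from (for illustration only).
    String chooseSolution(String cause) {
        return knowledgeBase.lookup(cause)
                .map(solution -> "knowledge-base:" + solution)
                .orElseGet(() -> "preset-strategy:" + presetStrategy.apply(cause));
    }
}
```

With a knowledge base that holds an entry for a cause, `chooseSolution` returns the recorded solution; for any other cause it falls back to whatever the preset strategy function returns.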
This embodiment also includes: recording running log information of the service;
Recovering from the fault according to the preset strategy includes:
obtaining the running log information of the service;
obtaining the cause of the fault from the log information;
querying the solutions preset in the program according to the cause of the fault;
recovering from the fault according to the solution found.
Fault causes and the solutions for those causes are preset in the program. Keyword information is extracted from the log, the fault type and fault cause are analyzed, a solution is matched according to the fault cause, and the repair is carried out automatically according to that solution.
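The keyword-driven preset strategy can be sketched as two lookup tables, one from log keywords to fault causes and one from causes to solutions. The class name `LogFaultAnalyzer`, the keywords, the cause codes and the solution names below are all invented for illustration; the source does not list any concrete keywords or causes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class LogFaultAnalyzer {
    // Preset tables (example entries are assumptions, not taken from the source).
    private final Map<String, String> keywordToCause = new LinkedHashMap<>();
    private final Map<String, String> causeToSolution = new LinkedHashMap<>();

    LogFaultAnalyzer() {
        keywordToCause.put("Address already in use", "PORT_CONFLICT");
        keywordToCause.put("OutOfMemoryError", "OUT_OF_MEMORY");
        causeToSolution.put("PORT_CONFLICT", "free-port-and-restart");
        causeToSolution.put("OUT_OF_MEMORY", "restart-service");
    }

    // Scans the log lines in order and returns the cause of the first matching keyword.
    Optional<String> findCause(List<String> logLines) {
        for (String line : logLines) {
            for (Map.Entry<String, String> entry : keywordToCause.entrySet()) {
                if (line.contains(entry.getKey())) {
                    return Optional.of(entry.getValue());
                }
            }
        }
        return Optional.empty();
    }

    // Maps the detected cause to its preset solution; empty if nothing matches.
    Optional<String> findSolution(List<String> logLines) {
        return findCause(logLines).map(causeToSolution::get);
    }
}
```

Plain substring matching is used here only because it is the simplest reading of "keyword information"; a real analyzer might need patterns, severity levels or time windows.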
Optionally, after the fault is recovered according to the preset strategy, the method also includes:
obtaining an evaluation result of the fault handling and, when the evaluation result is correct, storing the fault cause and the solution of the fault in the fault knowledge base.
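A minimal sketch of this evaluation gate, assuming the evaluation arrives as a simple boolean. `EvaluatedKnowledgeStore` and its methods are hypothetical names; how the evaluation is actually collected is not described in the source.

```java
import java.util.HashMap;
import java.util.Map;

final class EvaluatedKnowledgeStore {
    // Fault cause -> solution, populated only from handling that was evaluated as correct.
    private final Map<String, String> knowledgeBase = new HashMap<>();

    // Stores the pair only when the evaluation is positive; returns whether it was stored.
    boolean recordIfCorrect(String cause, String solution, boolean evaluatedCorrect) {
        if (!evaluatedCorrect) {
            return false; // rejected handling never enters the knowledge base
        }
        knowledgeBase.put(cause, solution);
        return true;
    }

    boolean contains(String cause) {
        return knowledgeBase.containsKey(cause);
    }

    String solutionFor(String cause) {
        return knowledgeBase.get(cause);
    }
}
```

Gating on the evaluation keeps an incorrect automatic repair from being replayed the next time the same fault occurs.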
In an optional embodiment, when the service is in a fault state, a preset timer is started;
when the timer reaches the preset time and the fault has still not been cleared, an alarm requesting manual handling of the fault is issued;
the fault is recovered according to the instruction that is input.
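The timer check can be expressed as a small deadline object. The sketch below passes the current time in explicitly so that the logic is easy to verify; `FaultDeadline` and `shouldRaiseManualAlarm` are hypothetical names, and the timeout value is left to the caller because the source does not give one.

```java
final class FaultDeadline {
    private final long startedAtMillis; // when the fault state was detected
    private final long timeoutMillis;   // the preset time allowed for automatic repair

    FaultDeadline(long startedAtMillis, long timeoutMillis) {
        this.startedAtMillis = startedAtMillis;
        this.timeoutMillis = timeoutMillis;
    }

    // True when the preset time has elapsed and the fault has still not been cleared.
    boolean shouldRaiseManualAlarm(long nowMillis, boolean faultCleared) {
        return !faultCleared && nowMillis - startedAtMillis >= timeoutMillis;
    }
}
```

In a running system this check could be driven by a scheduler such as `java.util.concurrent.ScheduledExecutorService`, with `System.currentTimeMillis()` supplying `nowMillis`; that wiring is omitted here.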
Optionally, the service is a Web service.
In this embodiment of the present invention, the running of the service is monitored in real time; when the service is in a fault state, it is determined whether the fault already exists in the fault knowledge base, and when it does not, the fault is recovered according to a preset strategy. Compared with conventional manual handling, alarms for the service can be raised in real time, the cause of the service fault can be diagnosed and analyzed, and the fault can be repaired automatically so that the service resumes normal operation. Service faults are thus resolved promptly and effectively, which reduces the impact they cause.
Embodiment two
This embodiment describes the method of the above embodiment in more detail.
First, the monitoring module monitors the status information of the service in real time. When the service fails, it issues an alarm and analyzes the fault cause from information such as the service state and the logs. Once the cause of the fault has been located and a solution found, the fault is repaired automatically so that the service runs normally. During fault handling, information such as the fault cause, the analysis result and the solution is saved. The user can evaluate the fault handling result; when the handling is correct, the program can apply the same handling the next time such a fault occurs, further shortening the fault handling time.
As shown in Fig. 2, the monitoring and intelligent repair method for a web service mainly includes the following steps:
(1) Monitor the status information of the web service in real time;
(2) When the service fails, issue a fault alarm in real time; while the user is notified, fault analysis and handling proceed automatically as the next step;
(3) Analyze the cause of the service fault. The cause is analyzed from information such as the status information of the service and the running log. If the same fault exists in the fault knowledge base, the fault is repaired directly; if it is a new fault, the fault is added to the fault knowledge base;
(4) Locate the fault cause and determine the solution. The fault of the application service is handled: based on the fault cause found by analyzing the logs, the fault is repaired in a targeted way so that the service resumes normal operation;
(5) The user can evaluate the fault handling. If the fault cause and the repair method are correct, the same fault is later handled directly in the way recorded in the fault knowledge base.
The present invention can be implemented in Java. The monitoring module collects the status information of the service in real time and uses it to determine whether the service is running normally. When the service fails, an alarm is sent to notify the user and intelligent fault analysis is carried out: the analysis module analyzes the fault from information such as the service state and the running log and finds the fault cause and a solution. The analyzed fault and its solution can be archived, so that the same fault can be handled directly the next time it occurs. Once the fault cause and solution for the service have been located, the fault is repaired automatically according to its cause.
For example, when the service is found to have stopped running, it can be started; when an error is found in the service's configuration file, the error can be repaired automatically and the service started so that it runs normally. During fault handling, information such as the fault cause, the analysis result and the solution is saved. The user can evaluate the fault handling result; when the handling is correct, the program can apply the same handling the next time such a fault occurs, further shortening the fault handling time.
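The two examples above (restarting a stopped service, and repairing a configuration error before starting) can be sketched as a single monitoring pass. Everything here is an assumption made for illustration: the `ManagedService` interface, the `InMemoryService` stand-in and the action labels are hypothetical, and a real implementation would inspect an actual process and configuration file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction of a monitored service; not defined in the source.
interface ManagedService {
    boolean isRunning();
    boolean isConfigValid();
    void repairConfig();
    void start();
}

// In-memory stand-in used only to exercise the sketch.
final class InMemoryService implements ManagedService {
    private boolean running;
    private boolean configValid;

    InMemoryService(boolean running, boolean configValid) {
        this.running = running;
        this.configValid = configValid;
    }

    @Override public boolean isRunning() { return running; }
    @Override public boolean isConfigValid() { return configValid; }
    @Override public void repairConfig() { configValid = true; }
    @Override public void start() { running = configValid; } // starts only with a valid config
}

final class AutoRepairer {
    // Runs one monitoring pass and returns the actions taken, in order.
    List<String> checkAndRepair(ManagedService service) {
        List<String> actions = new ArrayList<>();
        if (service.isRunning()) {
            return actions; // healthy: nothing to do
        }
        actions.add("alert"); // notify the user before repairing
        if (!service.isConfigValid()) {
            service.repairConfig();
            actions.add("repair-config");
        }
        service.start();
        actions.add("start");
        return actions;
    }
}
```

Returning the list of actions makes it straightforward to log what was done, which is the information the description says is saved for later evaluation.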
Embodiment three
To achieve the object of the invention, the present invention also provides a device for repairing a service fault. As shown in Fig. 3, the device includes a monitoring module 31 and a recovery module 32, wherein:
the monitoring module 31 is used to monitor the running of the service in real time;
the recovery module 32 is used to determine, when the service is in a fault state, whether the fault already exists in the fault knowledge base and, when it is determined that the fault does not exist in the fault knowledge base, to recover from the fault according to a preset strategy.
In this embodiment, in the device:
when the recovery module 32 determines that the fault exists in the fault knowledge base, the recovery module 32 recovers from the fault according to the solution recorded in the fault knowledge base.
In this embodiment, the device also includes a logging module 33, which is used to record running log information of the service;
The recovery module 32 recovering from the fault according to the preset strategy includes:
the recovery module 32 obtains the running log information of the service;
the recovery module 32 obtains the cause of the fault from the log information;
the recovery module 32 queries the solutions preset in the program according to the cause of the fault;
the recovery module 32 recovers from the fault according to the solution found.
Optionally, the device also includes a deposit module 34 that acts after the fault is recovered according to the preset strategy:
the deposit module 34 obtains an evaluation result of the fault handling and, when the evaluation result is correct, stores the fault cause and the solution of the fault in the fault knowledge base.
An embodiment of the present invention also provides a computer storage medium that stores a computer program; when the computer program is executed, the service fault repair method provided by the foregoing embodiments can be implemented, for example the method shown in Fig. 1.
Although the embodiments disclosed herein are as described above, they are given only to aid understanding of the present invention and are not intended to limit it. Any person skilled in the art to which the invention belongs may make modifications and changes to the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention shall still be as defined by the appended claims.

Claims (10)

1. A method for repairing a service fault, characterized in that the method includes:
monitoring the running of a service in real time; when the service is in a fault state, determining whether the fault already exists in a fault knowledge base; and, when it is determined that the fault does not exist in the fault knowledge base, recovering from the fault according to a preset strategy.
2. The method according to claim 1, characterized in that the method also includes:
when it is determined that the fault exists in the fault knowledge base, recovering from the fault according to the solution recorded in the fault knowledge base.
3. The method according to claim 1, characterized in that the method also includes: recording running log information of the service;
recovering from the fault according to the preset strategy includes:
obtaining the running log information of the service;
obtaining the cause of the fault from the log information;
querying the solutions preset in the program according to the cause of the fault;
recovering from the fault according to the solution found.
4. The method according to claim 3, characterized in that, after the fault is recovered according to the preset strategy, the method also includes:
obtaining an evaluation result of the fault handling and, when the evaluation result is correct, storing the fault cause and the solution of the fault in the fault knowledge base.
5. The method according to claim 1, characterized in that, when the service is in a fault state, a preset timer is started;
when the timer reaches the preset time and the fault has still not been cleared, an alarm requesting manual handling of the fault is issued;
the fault is recovered according to the instruction that is input.
6. A device for repairing a service fault, characterized in that the device includes a monitoring module and a recovery module, wherein:
the monitoring module is used to monitor the running of a service in real time;
the recovery module is used to determine, when the service is in a fault state, whether the fault already exists in a fault knowledge base and, when it is determined that the fault does not exist in the fault knowledge base, to recover from the fault according to a preset strategy.
7. The device according to claim 6, characterized in that:
when the recovery module determines that the fault exists in the fault knowledge base, the recovery module recovers from the fault according to the solution recorded in the fault knowledge base.
8. The device according to claim 6, characterized in that the device also includes a logging module, which is used to record running log information of the service;
the recovery module recovering from the fault according to the preset strategy includes:
the recovery module obtains the running log information of the service;
the recovery module obtains the cause of the fault from the log information;
the recovery module queries the solutions preset in the program according to the cause of the fault;
the recovery module recovers from the fault according to the solution found.
9. The device according to claim 8, characterized in that the device also includes a deposit module that acts after the fault is recovered according to the preset strategy:
the deposit module obtains an evaluation result of the fault handling and, when the evaluation result is correct, stores the fault cause and the solution of the fault in the fault knowledge base.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1-5 are implemented.
CN201810665134.3A 2018-06-26 2018-06-26 A kind of restorative procedure of service fault, device and storage medium Pending CN108776625A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810665134.3A CN108776625A (en) 2018-06-26 2018-06-26 A kind of restorative procedure of service fault, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810665134.3A CN108776625A (en) 2018-06-26 2018-06-26 A kind of restorative procedure of service fault, device and storage medium

Publications (1)

Publication Number Publication Date
CN108776625A true CN108776625A (en) 2018-11-09

Family

ID=64026382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810665134.3A Pending CN108776625A (en) 2018-06-26 2018-06-26 A kind of restorative procedure of service fault, device and storage medium

Country Status (1)

Country Link
CN (1) CN108776625A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130286859A1 (en) * 2011-04-21 2013-10-31 Huawei Technologies Co., Ltd. Fault detection method and system
CN103838637A (en) * 2014-03-03 2014-06-04 江苏智联天地科技有限公司 Terminal automatic fault diagnosis and restoration method on basis of data mining
CN105262616A (en) * 2015-09-21 2016-01-20 浪潮集团有限公司 Failure repository-based automated failure processing system and method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109474470A (en) * 2018-11-27 2019-03-15 郑州云海信息技术有限公司 One kind is from monitoring method and device
CN109757771A (en) * 2019-02-22 2019-05-17 红云红河烟草(集团)有限责任公司 Filter-stick forming device shuts down duration calculation method and computing device
CN110011854A (en) * 2019-04-12 2019-07-12 苏州浪潮智能科技有限公司 MDS fault handling method, device, storage system and computer readable storage medium
CN110011854B (en) * 2019-04-12 2022-03-04 苏州浪潮智能科技有限公司 MDS fault processing method, device, storage system and computer readable storage medium
WO2021143483A1 (en) * 2020-01-17 2021-07-22 中兴通讯股份有限公司 System maintenance method and apparatus, device, and storage medium
CN112286797A (en) * 2020-09-29 2021-01-29 长沙市到家悠享网络科技有限公司 Service monitoring method and device, electronic equipment and storage medium
CN112286797B (en) * 2020-09-29 2024-05-03 长沙市到家悠享网络科技有限公司 Service monitoring method and device, electronic equipment and storage medium
CN112988537A (en) * 2021-03-11 2021-06-18 山东英信计算机技术有限公司 Server fault diagnosis method and device and related equipment

Similar Documents

Publication Publication Date Title
CN108776625A (en) A kind of restorative procedure of service fault, device and storage medium
CN105337765B (en) A kind of distribution hadoop cluster automatic fault diagnosis repair system
US10545807B2 (en) Method and system for acquiring parameter sets at a preset time interval and matching parameters to obtain a fault scenario type
CN107612756A (en) A kind of operation management system with intelligent trouble analyzing and processing function
CN104125085B (en) A kind of data management-control method and device based on ESB
CN106649040A (en) Automatic monitoring method and device for performance of Weblogic middleware
CN104750596B (en) A kind of alarm information processing method and service subsystem
CN110581773A (en) automatic service monitoring and alarm management system
CN106789306A (en) Restoration methods and system are collected in communication equipment software fault detect
CN109462490B (en) Video monitoring system and fault analysis method
CN107995255A (en) A kind of method and its system of remote monitoring intelligent cabinet
US20210271555A1 (en) Traffic data self-recovery processing method, readable storage medium, server and apparatus
CN106452811B (en) A kind of malfunction elimination method and system
CN107766208A (en) A kind of method, system and device of monitoring business system
CN110808856A (en) Big data operation and maintenance method and system based on data center
CN113485220A (en) Cloud cooperation method and system for simplifying field network diagnosis of operation and maintenance personnel
CN109032058A (en) A kind of device management method, device, system and storage medium
CN107846314A (en) A kind of intelligent operation management system
CN108681780A (en) A kind of device management method, apparatus and system based on collection control big data
CN114493203A (en) Method and device for safety arrangement and automatic response
CN110311802A (en) Network operation method, device, electronic equipment and storage medium
CN113471864A (en) Transformer substation secondary equipment field maintenance device and method
CN116560893B (en) Computer application program operation data fault processing system
CN112803587A (en) Intelligent inspection method for state of automatic equipment based on diagnosis decision library
CN113760634A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181109