CN117234855A

CN117234855A - Method, device and storage medium for processing faults

Info

Publication number: CN117234855A
Application number: CN202311409012.5A
Authority: CN
Inventors: 王东伟
Original assignee: CCB Finetech Co Ltd
Current assignee: CCB Finetech Co Ltd
Priority date: 2023-10-27
Filing date: 2023-10-27
Publication date: 2023-12-15

Abstract

The embodiment of the application provides a method, a device and a storage medium for processing faults, wherein the method comprises the following steps: monitoring log information of a service, and acquiring fault information in the log information; matching the fault information against a scheme library to obtain at least one processing scheme, wherein the scheme library comprises a plurality of fault types and processing schemes, and each fault type corresponds to one or more processing schemes; selecting the processing scheme with the highest credibility according to the credibility of the at least one processing scheme; and solving the fault of the service through the processing scheme with the highest credibility. The method realizes quick response and fault recovery.

Description

Method, device and storage medium for processing faults

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a storage medium for processing a fault.

Background

A fault handling method in the prior art comprises the following steps: monitoring the application system and, when the system fails, having a worker search a knowledge base for a solution. Because the existing knowledge base is static, the scheme to be adopted must be determined and applied manually, and sometimes several schemes must be tried, so that the response time to faults is too long and quick recovery cannot be achieved.

Disclosure of Invention

The embodiment of the application aims to provide a method, a device and a storage medium for processing faults, which realize quick response and fault recovery.

To achieve the above object, a first aspect of the present application provides a method of handling a fault, the method comprising:

monitoring log information of a service, and acquiring fault information in the log information;

matching the fault information against a scheme library to obtain at least one processing scheme, wherein the scheme library comprises a plurality of fault types and processing schemes, and each fault type corresponds to one or more processing schemes;

selecting a processing scheme with highest credibility according to the credibility of the at least one processing scheme;

and solving the fault of the service through the processing scheme with the highest credibility.

Optionally, the scheme library is constructed through a path ranking algorithm in a knowledge graph.

Optionally, the step of matching the fault information against the scheme library to obtain at least one processing scheme includes:

extracting a fault fingerprint ID in the fault information and matching the fault fingerprint ID with the scheme library;

if the scheme library has a fault node matched with the fault fingerprint ID, selecting a processing scheme corresponding to the fault node;

and if no fault node matched with the fault fingerprint ID exists in the scheme library, alarming.

Optionally, the credibility is:

$$C = \sum_{i} W_i F_i$$

wherein $C$ is the credibility, $W_i$ is the weight and $F_i$ is the score of the $i$-th scoring item, and the scoring items comprise expert score, scheme source and success rate.

Optionally, the method further comprises:

updating parameters of the processing scheme according to log information after the fault of the service has been handled;

the parameters include at least one of the number of times the scheme has been executed, the number of times the fault has been successfully eliminated, and the credibility.

Optionally, the updating of the parameters of the processing scheme according to the log information after the fault of the service has been handled comprises:

determining whether the fault is solved according to the log information, and if the fault is solved, increasing the credibility of the processing scheme;

if the fault is not solved, reducing the credibility of the processing scheme.

Optionally, the fault information includes at least one of the number of occurrences of the fault, the fault level, a description of the fault phenomenon, whether the fault has been eliminated, and the processing scheme used for the fault.

A second aspect of the present application provides an apparatus for handling faults, the apparatus comprising:

the acquisition module is used for monitoring log information of the service and acquiring fault information in the log information;

the first processing module is used for matching the fault information against a scheme library to obtain at least one processing scheme, wherein the scheme library comprises a plurality of fault types and processing schemes, and each fault type corresponds to one or more processing schemes;

the second processing module is used for selecting a processing scheme with highest credibility according to the credibility of the at least one processing scheme;

and the third processing module is used for solving the fault of the service through the processing scheme with the highest credibility.

Optionally, the scheme library is constructed through a path ranking algorithm in a knowledge graph.

Optionally, the apparatus further comprises:

and the updating module is used for updating parameters of the processing scheme according to the log information after the fault of the service has been handled, wherein the parameters comprise at least one of the number of times the scheme has been executed, the number of times the fault has been successfully eliminated, and the credibility.

A third aspect of the present application provides a machine-readable storage medium having stored thereon instructions for causing a machine to perform the method of handling a fault described above.

A fourth aspect of the application provides a processor for running a program, wherein the program, when run, performs the method of handling a fault described above.

A fifth aspect of the application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of handling a fault described above.

Through the above technical scheme, the fault processing method provided by the application comprises the following steps: monitoring log information of a service, and acquiring fault information in the log information; matching the fault information against a scheme library to obtain at least one processing scheme, wherein the scheme library comprises a plurality of fault types and processing schemes, and each fault type corresponds to one or more processing schemes; selecting the processing scheme with the highest credibility according to the credibility of the at least one processing scheme; and solving the fault of the service through the processing scheme with the highest credibility. The application organically combines fault monitoring, the knowledge base and fault treatment in the form of a knowledge graph. A closed loop of fault discovery, fault handling and status checking is formed, which realizes quick response and fault recovery.

Additional features and advantages of embodiments of the application will be set forth in the detailed description which follows.

Drawings

The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain, without limitation, the embodiments of the application. In the drawings:

FIG. 1 schematically illustrates a flow diagram of a method of handling a fault according to an embodiment of the application;

FIG. 2 schematically illustrates a flow diagram according to a specific embodiment of the present application;

FIG. 3 schematically illustrates a monitoring module processing logic diagram in accordance with the present application;

FIG. 4 schematically illustrates a flow diagram of an automatic recovery module according to the present application;

FIG. 5 schematically illustrates a flow diagram of a node management module according to the present application;

FIG. 6 schematically shows an internal structural view of a computer device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the detailed description described herein is merely for illustrating and explaining the embodiments of the present application, and is not intended to limit the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present application, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.

In addition, if there is a description of "first", "second", etc. in the embodiments of the present application, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and is not within the scope of protection claimed in the present application.

Fig. 1 schematically shows a flow diagram of a method of handling a fault according to an embodiment of the application. As shown in fig. 1, in one embodiment of the present application, there is provided a method for handling a fault, including the steps of:

Step 101, monitoring log information of a service, and acquiring fault information in the log information. Specifically, the fault information includes at least one of the number of occurrences of the fault, the fault level, a description of the fault phenomenon, whether the fault has been eliminated, and the processing scheme used for the fault. The fault information serves as input information of the knowledge graph, and the various pieces of scheme knowledge in the knowledge base serve as nodes of the knowledge graph.

Step 102, matching the fault information against a scheme library to obtain at least one processing scheme, wherein the scheme library comprises a plurality of fault types and processing schemes, and each fault type corresponds to one or more processing schemes. Specifically, the scheme library is constructed through a path ranking algorithm in a knowledge graph. A knowledge graph is a graph database used to represent and store structured, semi-structured and unstructured data. It consists of three parts, namely entities, attributes and relations, wherein an entity is an object with a unique identifier, an attribute is a characteristic or description of an entity, and a relation is a connection between entities.

The scheme library is presented in the form of a knowledge graph, and is created and expanded based on the Path Ranking Algorithm (PRA). Specifically, triples (entity, attribute, relation) are extracted from the source information, the knowledge is sorted according to the relations (also called paths) to form coherent knowledge nodes, and the knowledge nodes are finally presented using a visualization technology.
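The triple-and-path idea above can be illustrated with a toy adjacency structure. This is only a sketch under stated assumptions: the triples are written here as (head, relation, tail), all node and relation names are hypothetical, and following a fixed relation path is a simplification that does not implement the Path Ranking Algorithm itself.

```python
from collections import defaultdict

# Hypothetical triples (head, relation, tail); the values are illustrative only.
TRIPLES = [
    ("order-service", "has_fault", "db-connection-timeout"),
    ("db-connection-timeout", "resolved_by", "restart-connection-pool"),
    ("db-connection-timeout", "resolved_by", "failover-to-replica"),
]


def build_graph(triples):
    """Group triples into an adjacency map: head -> relation -> [tails]."""
    graph = defaultdict(lambda: defaultdict(list))
    for head, relation, tail in triples:
        graph[head][relation].append(tail)
    return graph


def follow_path(graph, start, relations):
    """Follow a sequence of relations from `start`; return the reachable nodes."""
    frontier = [start]
    for relation in relations:
        frontier = [tail for node in frontier for tail in graph[node][relation]]
    return frontier


graph = build_graph(TRIPLES)
# Path: application system -> fault -> processing schemes.
schemes = follow_path(graph, "order-service", ["has_fault", "resolved_by"])
```

Walking the path `has_fault` then `resolved_by` from an application node yields the candidate processing schemes for that application's faults, which is the kind of coherent chain of knowledge nodes the paragraph describes.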

According to a specific embodiment, the step of matching the fault information against the scheme library to obtain at least one processing scheme includes: extracting a fault fingerprint ID from the fault information and matching the fault fingerprint ID with the scheme library; if the scheme library has a fault node matching the fault fingerprint ID, selecting a processing scheme corresponding to the fault node; and if no fault node matching the fault fingerprint ID exists in the scheme library, raising an alarm.
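A minimal sketch of fingerprint matching follows. The application does not specify how the fingerprint ID is computed; hashing the application system, fault level and fault message with SHA-256 is an assumption made here for illustration, and the function names, field names and sample values are hypothetical.

```python
import hashlib


def fingerprint_id(system, level, msg):
    """Derive a stable ID from system, level and message (hashing is an assumption)."""
    raw = "|".join([system, level, msg])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def match_fault(scheme_library, fault):
    """Return the schemes of the matching fault node, or None to signal an alarm."""
    fid = fingerprint_id(fault["system"], fault["level"], fault["msg"])
    return scheme_library.get(fid)


# Hypothetical sample data.
known = {"system": "order-service", "level": "P2", "msg": "db connection timeout"}
library = {fingerprint_id(**known): ["restart-connection-pool"]}

matched = match_fault(library, known)
unmatched = match_fault(library, {"system": "x", "level": "P1", "msg": "y"})
```

A `None` result corresponds to the branch in which no fault node matches and an alarm is raised.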

Step 103, selecting the processing scheme with the highest credibility according to the credibility of the at least one processing scheme. Specifically, the credibility is as follows: $C = \sum_{i} W_i F_i$, wherein $C$ is the credibility, $W_i$ is the weight and $F_i$ is the score of the $i$-th scoring item, and the scoring items comprise expert score, scheme source and success rate.

Step 104, solving the fault of the service through the processing scheme with the highest credibility. When a fault occurs, the device analyzes the fault and matches it against the existing fault information in the knowledge graph. When an existing fault type is matched, the corresponding processing scheme is adopted for fault treatment, and the effect of the treatment is evaluated. When the fault is resolved, the credibility of the processing scheme is increased; when the fault is not resolved, the credibility of the processing scheme is reduced and the alarm is escalated to facilitate manual intervention.

According to a specific embodiment, the process of matching fault information to a processing scheme specifically includes the following steps. First, contents such as the application system information, the fault level and the fault message (msg) are extracted from the fault information and compiled into a fault fingerprint ID, which serves as the input information; if the same fingerprint ID already exists, the number of remaining retries for the fault is reduced by 1, and once the number of retries for the same fault reaches 0, the alarm module is entered. Second, the fault node matching the fault fingerprint ID is looked up; if no fault node matches, the alarm module is entered. Third, if the fault node found in the second step has a history processing scheme, that scheme is used as the optimal processing scheme for fault treatment. Fourth, if no history processing scheme exists, processing schemes are searched according to the fault fingerprint ID and sorted by credibility, and the processing scheme with the highest credibility is selected as the optimal processing scheme. Fifth, if the fault is recovered, the credibility of the selected processing scheme is increased by updating the success rate information; if the fault is not recovered, the credibility of the selected processing scheme is reduced by updating the success rate information. In this way, a fault can be recovered automatically according to prior knowledge after it occurs, which saves investigation time and manual intervention cost. In addition, the existing knowledge can be dynamically evaluated according to the treatment result.

The method further includes: updating parameters of the processing scheme according to log information after the fault of the service has been handled; the parameters include at least one of the number of times the scheme has been executed, the number of times the fault has been successfully eliminated, and the credibility.

Specifically, the updating of the parameters of the processing scheme according to the log information after the fault of the service has been handled includes determining whether the fault is solved according to the log information, and if the fault is solved, increasing the credibility of the processing scheme; if the fault is not solved, reducing the credibility of the processing scheme.
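The parameter update can be sketched as two counters from which the success rate is derived. The class and function names below are hypothetical, and how "fault solved" is detected from the logs is left outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class SchemeStats:
    """Counters for one processing scheme (hypothetical field names)."""
    executed: int = 0
    succeeded: int = 0

    @property
    def success_rate(self):
        # Fraction of executions that eliminated the fault; 0.0 before any run.
        return self.succeeded / self.executed if self.executed else 0.0


def record_outcome(stats, fault_resolved):
    """Update the counters after one treatment and return the new success rate."""
    stats.executed += 1
    if fault_resolved:
        stats.succeeded += 1
    return stats.success_rate


stats = SchemeStats(executed=4, succeeded=3)
rate_after_failure = record_outcome(stats, fault_resolved=False)  # 3 of 5
rate_after_success = record_outcome(stats, fault_resolved=True)   # 4 of 6
```

Because the success rate feeds into the credibility, a failed treatment lowers the credibility and a successful one raises it, without any separate adjustment rule.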

FIG. 2 schematically shows a flow chart according to a specific embodiment of the application. As shown in FIG. 2, the application provides three modules: a monitoring module for monitoring the logs of each service system, a self-recovery module for carrying out fault treatment, and an alarm module for raising alarms on the identified faults.

The monitoring module is mainly responsible for monitoring the log information of each service. It reads the formatted log information according to predefined monitoring rules, so that fault information is captured quickly when a service system fails, and the fault level is then determined. As shown in FIG. 3, after the fault information is transmitted to the self-recovery module, the self-recovery module performs the subsequent fault handling. The monitoring module also receives records of newly added or changed application information and passes them to the node management module for processing.
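As an illustration of a predefined monitoring rule, the sketch below parses formatted log lines with a regular expression and maps log levels to fault levels. The log layout, the level mapping and all names are assumptions; the application does not define a log format.

```python
import re

# Assumed log layout: "<timestamp> <LEVEL> [<system>] <message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) \[(?P<system>[^\]]+)\] (?P<msg>.*)$"
)
# Assumed mapping from log level to fault level.
FAULT_LEVELS = {"ERROR": "P2", "FATAL": "P1"}


def extract_faults(lines):
    """Return one fault record per log line whose level counts as a fault."""
    faults = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("level") in FAULT_LEVELS:
            faults.append({
                "system": match.group("system"),
                "level": FAULT_LEVELS[match.group("level")],
                "msg": match.group("msg"),
            })
    return faults


sample = [
    "2023-10-27T10:00:00 INFO [order-service] started",
    "2023-10-27T10:00:05 ERROR [order-service] db connection timeout",
]
faults = extract_faults(sample)
```

Lines that do not match the rule, or whose level is not a fault level, are ignored rather than forwarded to the self-recovery module.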

As shown in FIG. 4, after receiving new fault information, the self-recovery module applies the following processing logic. First, a solution for the fault is sought only while the number of recovery attempts is within the limit; if the maximum number of recovery attempts has been reached, the alarm module is entered and the fault is escalated for subsequent treatment. Second, when the number of recovery attempts has not reached the maximum, the fault is compared and matched against the existing fault information in the experience library; when no existing fault is matched, the alarm module is entered and the fault is escalated for subsequent treatment. Third, when an existing fault is matched, it is checked whether a history processing scheme exists for that fault. If so, that processing scheme is used for self-recovery of the fault. When no history processing scheme exists, other related processing schemes are searched for; if none exists, the alarm module is entered and the fault is escalated for subsequent treatment. When related processing schemes exist, they are sorted by credibility, and the processing scheme with the highest credibility is selected for fault recovery.
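The decision flow of the self-recovery module can be sketched as a single function that returns either an alarm or a scheme to execute. The data layout (`history`, `related`), the return values and the sample entries are hypothetical; only the order of the checks follows the description.

```python
def decide(fault_id, attempts, max_attempts, experience):
    """Return ("execute", scheme) or ("alarm", reason) for one incoming fault."""
    # 1. Stop once the maximum number of recovery attempts is reached.
    if attempts >= max_attempts:
        return ("alarm", "maximum recovery attempts reached")
    # 2. Match against existing faults in the experience library.
    node = experience.get(fault_id)
    if node is None:
        return ("alarm", "no matching fault")
    # 3. Prefer the history processing scheme when one exists.
    if node.get("history"):
        return ("execute", node["history"])
    # 4. Otherwise pick the related scheme with the highest credibility.
    related = node.get("related", {})
    if not related:
        return ("alarm", "no processing scheme")
    return ("execute", max(related, key=related.get))


# Hypothetical experience library: fault ID -> history scheme and related schemes.
experience = {
    "f1": {"history": "restart", "related": {"a": 60.0}},
    "f2": {"history": None, "related": {"a": 60.0, "b": 75.5}},
    "f3": {"history": None, "related": {}},
}
```

For example, `decide("f2", 1, 3, experience)` selects scheme `b`, the related scheme with the higher credibility, while every alarm branch carries a reason that the alarm module could forward to operations staff.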

The credibility calculation formula is as follows:

$$C = \sum_{i} W_i F_i$$

wherein $C$ is the credibility, $W_i$ is the weight and $F_i$ is the score of the $i$-th scoring item.

The score comprises three parts: expert score, scheme source and success rate. The expert score has a weight of 30%, and its value is the sum of all expert scores (on a percentage scale) divided by the total number of experts. For example, when 5 experts give scores of 80%, 80%, 90%, 90% and 85%, the final score is (80% + 80% + 90% + 90% + 85%)/5 = 85%. The scheme source has a weight of 20%, and its score is divided into three tiers: 100% for an official script, 80% for an authenticated script, and 60% for a personally written script. The success rate has a weight of 50%, and its score is the total number of successes divided by the total number of executions, multiplied by 100%.

For example, if a script has been executed 5 times, succeeding 3 times and failing 2 times, the success rate score is (3/5) × 100% = 60%. The credibility of the processing scheme is the sum of the products of the three scores and their respective weights.
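The weighted sum can be checked with a short calculation. The weights (30%, 20%, 50%) and the source tiers (100%, 80%, 60%) come from the description above; the function and dictionary names are hypothetical, and combining the two worked examples with an official script is an illustrative choice.

```python
# Weights and source tiers as stated in the description (scores on a 0-100 scale).
WEIGHTS = {"expert": 0.30, "source": 0.20, "success": 0.50}
SOURCE_SCORES = {"official": 100.0, "authenticated": 80.0, "personal": 60.0}


def credibility(expert_scores, source, successes, executions):
    """Weighted sum of the expert, source and success-rate scores."""
    expert = sum(expert_scores) / len(expert_scores)
    success = 100.0 * successes / executions if executions else 0.0
    return (
        WEIGHTS["expert"] * expert
        + WEIGHTS["source"] * SOURCE_SCORES[source]
        + WEIGHTS["success"] * success
    )


# Five experts averaging 85, an official script, 3 successes out of 5 runs:
# 0.30 * 85 + 0.20 * 100 + 0.50 * 60 = 25.5 + 20 + 30 = 75.5
score = credibility([80, 80, 90, 90, 85], "official", 3, 5)
```

Since the success rate carries half of the total weight, the outcome of each treatment has the largest influence on how a scheme ranks in later recommendations.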

Finally, fault recovery and fault state recovery are performed using the processing scheme knowledge. The credibility of the processing scheme is updated according to the effect of the treatment: if the fault is solved, the credibility of the processing scheme is increased; if the fault is not solved, the credibility of the processing scheme is reduced and, after the history processing scheme is cleared, an alarm indicating that the fault has not been recovered is sent.

The alarm module receives fault alarms sent by the monitoring module and forwards them to the corresponding operation and maintenance personnel according to the configured address. It also sends alarm information when treatment by the self-recovery module does not achieve the intended effect, or when no fault or solution is matched, so that operation and maintenance personnel can intervene.

The node management module is responsible for newly adding and changing node information. As shown in fig. 5, after application information is obtained from the monitoring module or other docking system, the node management module first determines whether the application node belongs to a newly added application node or a changed application node. If the node belongs to the newly added application node information, the knowledge graph can check the node attribute and correlate the node attribute with the existing node. If the information belongs to the change application node information, the corresponding node attribute in the knowledge graph is updated.

The application associates application systems, fault information, processing scheme knowledge and the like in the form of a knowledge graph, and provides the scripts or knowledge required for quick fault recovery. The nodes in the knowledge graph comprise application system nodes, fault information nodes, processing scheme nodes and the like. An application system node contains a plurality of attributes such as IP, service module name and association relations.

There is an association between application systems and fault information: one application system may be associated with multiple pieces of fault information, and one piece of fault information may be associated with multiple application systems. A fault information node contains a plurality of attributes such as fault phenomenon, description, duration and last processing scheme. There is also an association between fault information and processing schemes: one piece of fault information may be associated with a plurality of processing schemes, and one processing scheme may be associated with a plurality of pieces of fault information. A processing scheme node includes the number of treatments, the credibility, script information and the like. The number of treatments is the number of times the processing scheme has been used so far.
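One way to picture the three node types and their many-to-many associations is with plain data classes holding sets of IDs. The class names, field names and sample values below are hypothetical; the attributes shown are a subset of those listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationNode:
    name: str
    ip: str
    service_module: str
    fault_ids: set = field(default_factory=set)   # associated fault nodes


@dataclass
class FaultNode:
    fault_id: str
    phenomenon: str
    last_scheme: str = ""
    scheme_ids: set = field(default_factory=set)  # associated processing schemes


@dataclass
class SchemeNode:
    scheme_id: str
    script: str
    times_used: int = 0
    credibility: float = 0.0
    fault_ids: set = field(default_factory=set)   # faults this scheme can treat


def link(fault, scheme):
    """Record the association on both sides so it can be walked either way."""
    fault.scheme_ids.add(scheme.scheme_id)
    scheme.fault_ids.add(fault.fault_id)


app = ApplicationNode("order-service", "192.0.2.10", "payment")
timeout = FaultNode("f1", "db connection timeout")
slow = FaultNode("f2", "slow queries")
restart = SchemeNode("s1", "restart_pool.sh", credibility=75.5)

app.fault_ids.update({timeout.fault_id, slow.fault_id})
link(timeout, restart)
link(slow, restart)  # one scheme associated with two faults
```

Storing the association on both nodes keeps lookups cheap in either direction, at the cost of having to update both sides together.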

The application also introduces the concept of credibility to realize dynamic evaluation of processing scheme knowledge, which a traditional knowledge base cannot do because it does not evaluate knowledge according to the effect of its implementation.

The initial value of the credibility is determined as the sum of the products of index scores, such as expert judgment, source and success rate, and their corresponding weights. The credibility can be updated by manually adjusting the scores of the expert judgment and other scoring indexes. The credibility of processing scheme knowledge can also be dynamically adjusted according to the treatment effect, that is, according to the increase or decrease of the success rate. The higher the credibility, the more likely the scheme knowledge is to be used preferentially in subsequent recommendations.

The application also continuously and automatically updates node information, such as the attribute information of newly added nodes and changed nodes. When new node information is received, the knowledge graph verifies the node attributes and associates the node with existing nodes. For example, when the new node is a mysql database node, the knowledge graph associates it with the mysql configuration nodes and, at the same time, with the mysql-related fault nodes, so that application nodes are added automatically. If an attribute of an existing application is changed, such as an adjustment of memory or disk capacity, the node management module automatically updates the attribute information of the application node when the node information is received.
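The add-or-change behaviour of node management can be sketched as an upsert over a dictionary-based graph. The graph layout, function name and node names are hypothetical, and attribute verification is reduced here to linking only to nodes that already exist.

```python
def upsert_node(graph, node_id, attributes, related_ids=()):
    """Add a new application node or update the attributes of an existing one."""
    if node_id in graph:
        # Changed node: only the attributes are refreshed.
        graph[node_id]["attrs"].update(attributes)
        return "updated"
    # New node: associate it only with nodes that already exist in the graph.
    links = {rid for rid in related_ids if rid in graph}
    graph[node_id] = {"attrs": dict(attributes), "links": links}
    for rid in links:
        graph[rid]["links"].add(node_id)
    return "added"


# Hypothetical existing nodes for a mysql configuration and a mysql-related fault.
graph = {
    "mysql-config": {"attrs": {}, "links": set()},
    "mysql-connection-fault": {"attrs": {}, "links": set()},
}

first = upsert_node(
    graph, "mysql-01", {"memory_gb": 8},
    related_ids=["mysql-config", "mysql-connection-fault", "missing-node"],
)
second = upsert_node(graph, "mysql-01", {"memory_gb": 16})
```

The first call adds the database node and links it to both existing mysql nodes; the second call only changes its memory attribute.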

The application organically combines the monitoring system, the knowledge base and the treatment system in the form of a knowledge graph. Information such as faults from the monitoring system serves as input information of the knowledge graph, and the various pieces of scheme knowledge in the knowledge base serve as nodes of the knowledge graph. The treatment system receives the actions defined by the scheme knowledge and applies them directly to the application system, and the feedback information of the application system is captured by the monitoring system as new input information, thereby forming a closed loop of fault discovery, fault treatment and state inspection. After one round of fault treatment is completed, the knowledge graph updates the fault information and scheme knowledge information, realizing self-updating of the knowledge base.

The updated fault information comprises: the number of occurrences of the fault, the fault level, the description of the fault phenomenon, whether the fault has been eliminated, the processing scheme knowledge used for the fault, and the like. The updated processing scheme knowledge comprises: the number of times the scheme has been executed, the number of times the fault has been successfully eliminated, the scheme credibility, and the like.

When a fault has a plurality of pieces of processing scheme knowledge, the most recently used processing scheme knowledge is selected for fault treatment. When the fault is not eliminated, the credibility of that processing scheme knowledge is reduced, and the processing scheme knowledge with the highest credibility among the remaining schemes is selected for fault treatment. When none of the associated processing scheme knowledge can resolve the fault, the fault is escalated to manual intervention. When a fault occurs and matches an existing fault, the corresponding processing scheme knowledge can be applied directly to the application system so as to realize quick response to and recovery from the fault.
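The fallback order described above can be sketched as a loop that tries the last-used scheme first and then the remaining schemes by descending credibility. The fixed `penalty` is an assumption made for brevity (the application adjusts credibility through the success rate), and `run_scheme` is a hypothetical callback that reports whether the fault was eliminated.

```python
def recover(schemes, last_used, run_scheme, penalty=5.0):
    """Try `last_used` first, then the rest by descending credibility.

    `schemes` maps scheme name -> credibility and is updated in place.
    Returns the name of the scheme that worked, or None to escalate.
    """
    order = sorted(schemes, key=schemes.get, reverse=True)
    if last_used in schemes:
        order.remove(last_used)
        order.insert(0, last_used)
    for name in order:
        if run_scheme(name):
            return name
        schemes[name] -= penalty  # assumed fixed penalty for a failed treatment
    return None  # no associated scheme resolved the fault: manual intervention


schemes = {"a": 70.0, "b": 90.0, "c": 80.0}
# Hypothetical executor: only scheme "c" eliminates the fault.
winner = recover(schemes, last_used="a", run_scheme=lambda name: name == "c")
escalated = recover({"x": 50.0}, last_used=None, run_scheme=lambda name: False)
```

In this run scheme `a` is tried first because it was used last, then `b` and `c` in credibility order; the two failed schemes lose credibility, so they rank lower the next time the fault appears.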

The method for processing faults provided by the application comprises the following steps: monitoring log information of a service, and acquiring fault information in the log information; matching the fault information against a scheme library to obtain at least one processing scheme, wherein the scheme library comprises a plurality of fault types and processing schemes, and each fault type corresponds to one or more processing schemes; selecting the processing scheme with the highest credibility according to the credibility of the at least one processing scheme; and solving the fault of the service through the processing scheme with the highest credibility. The application organically combines fault monitoring, the knowledge base and fault treatment in the form of a knowledge graph. A closed loop of fault discovery, fault handling and status checking is formed, which realizes quick response and fault recovery.

The embodiment of the application provides a storage medium, on which a program is stored, which when executed by a processor implements the above-described method of handling faults.

The embodiment of the application provides a processor for running a program, wherein the program runs to execute the fault processing method.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in FIG. 6. The computer device includes a processor a01, a network interface a02, a display screen a04, an input device a05, and a memory (not shown in the figure) which are connected through a system bus. The processor a01 of the computer device is adapted to provide computing and control capabilities. The memory of the computer device includes an internal memory a03 and a nonvolatile storage medium a06. The nonvolatile storage medium a06 stores an operating system B01 and a computer program B02. The internal memory a03 provides an environment for the operation of the operating system B01 and the computer program B02 in the nonvolatile storage medium a06. The network interface a02 of the computer device is used for communication with an external terminal through a network connection. The computer program, when executed by the processor a01, implements the above-described method of handling a fault. The display screen a04 of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device a05 of the computer device may be a touch layer covering the display screen, or may be a key, a track ball or a touch pad arranged on a casing of the computer device, or may be an external keyboard, touch pad or mouse.

It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

The embodiment of the application provides equipment, which comprises a processor, a memory and a program stored on the memory and capable of running on the processor, wherein the processor executes the program to treat faults according to the method for treating faults according to any embodiment of the application.

The application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the steps of the method of handling a fault according to any embodiment of the application.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall be included in the scope of the claims of the present application.

Claims

1. A method of handling a fault, the method comprising:

2. The method of claim 1, characterized in that,

the scheme library is constructed through a path ordering algorithm in the knowledge graph.

3. The method according to claim 1, wherein performing fault matching on the fault information in the scheme library to obtain at least one processing scheme comprises:

4. The method of claim 1, wherein the credibility is:

5. The method according to claim 1, characterized in that the method further comprises:

6. The method of claim 5, wherein updating parameters of the processing scheme based on log information after the fault of the service is solved comprises:

7. The method of claim 1, characterized in that,

the fault information comprises at least one of: the number of occurrences of the fault, the fault level, a description of the fault phenomenon, whether the fault has been eliminated, and the processing scheme used for the fault.

8. An apparatus for handling faults, the apparatus comprising:

9. The apparatus of claim 8, characterized in that,

the scheme library is constructed through a path ordering algorithm of the knowledge graph structure.

10. The apparatus of claim 8, wherein performing fault matching on the fault information in the scheme library to obtain at least one processing scheme comprises:

11. The apparatus of claim 8, wherein the apparatus further comprises:

12. The apparatus of claim 11, wherein updating the parameters of the processing scheme based on log information after the fault of the service is solved comprises:

13. A machine-readable storage medium having stored thereon instructions for causing a machine to perform the method of handling faults according to any one of claims 1 to 7.

14. A processor configured to execute a program, wherein the program, when executed, performs the method of handling faults according to any one of claims 1 to 7.

15. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of handling faults according to any one of claims 1 to 7.