WO2020119369A1

WO2020119369A1 - Intelligent IT operation and maintenance fault positioning method, apparatus and device, and readable storage medium

Info

Publication number: WO2020119369A1
Application number: PCT/CN2019/117548
Authority: WO
Inventors: 方振宇
Original assignee: 平安普惠企业管理有限公司
Priority date: 2018-12-13
Filing date: 2019-11-12
Publication date: 2020-06-18
Also published as: CN109633351A; CN109633351B

Abstract

An intelligent IT operation and maintenance fault positioning method, apparatus and device, and a readable storage medium. The method comprises: receiving a fault analysis report, and obtaining all potential fault points in the fault analysis report (S10); executing the following steps for each potential fault point: performing health detection on the potential fault point, and obtaining a first detection return value (S20); obtaining a new fault point specified by the first detection return value, performing health detection on the new fault point continuously until no new fault point is generated, determining the fault point for which no new fault point is generated as a target fault point, and obtaining a second detection return value of the target fault point (S30); and outputting the target fault point and the second detection return value (S40). The method addresses the technical problem that, when handling operation and maintenance faults in existing IT systems, locating the node to be repaired is inefficient and the repair cycle is too long.

Description

Intelligent IT operation and maintenance fault location method, device, equipment and readable storage medium

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 13, 2018, with application number 201811530943.X and the invention title "Intelligent IT O&M fault location method, device, equipment and readable storage medium", the entire contents of which are incorporated into this application by reference.

Technical field

This application relates to the field of computer technology, and in particular to an intelligent IT operation and maintenance fault location method, device, equipment, and readable storage medium.

Background art

At present, various faults inevitably occur during the operation and maintenance of IT systems. To minimize the losses caused by such faults, a fault must be located quickly so that a corresponding repair solution can be applied. Because traditional fault repair tools lack efficient, rapid locating sub-tools, faults are checked manually, which makes locating the fault repair node inefficient, consumes valuable repair time, extends the repair cycle, and degrades the user experience.

Summary of the invention

The main purpose of this application is to provide an intelligent IT operation and maintenance fault location method, device, equipment and readable storage medium, aiming to solve the technical problem that, when handling operation and maintenance faults in existing IT systems, locating the fault repair node is inefficient, resulting in a long repair cycle.

To achieve the above purpose, the present application provides an intelligent IT operation and maintenance fault location method. The intelligent IT operation and maintenance fault location method includes:

Receive a failure analysis report and obtain all potential failure points in the failure analysis report;

For each potential failure point, perform the following steps:

Health detection is performed on the potential failure point, and the first detection return value is obtained;

Obtain the new fault point specified by the first detection return value, and perform health detection on the new fault point continuously until no new fault point is generated; determine the fault point for which no new fault point is generated as the target fault point, and obtain the second detection return value of the target fault point;

The target fault point and the second detection return value are output.

Optionally, the step of performing health detection on the potential failure point and obtaining the first detection return value includes:

Locate the operation source and operation process referenced by the potential fault point, perform health detection on the operation source and operation process, and obtain the first detection return value corresponding to the operation source and each operation process;

Wherein the health detection includes performing detection using the pre-stored data I/O indicators of the operation source and each operation process in a normal state.

Optionally, the step of performing health detection on the potential failure point and obtaining the first detection return value further includes:

Obtain the node type of the potential failure point, and obtain the detection function corresponding to the node type from the preset tool library;

Obtain the detection index of the detection function, and perform a sniffing test on the index data in the potential fault point according to the detection index to obtain the first detection return value of the potential fault point.

Optionally, after the step of outputting the target fault point and the second detection return value, the method further includes:

According to the fault state of the target fault point, select the target emergency plan corresponding to the target fault point from the pre-stored plan database, and execute the target emergency plan on the target fault point;

After executing the target emergency plan, health detection is performed on the target fault point again.

Optionally, after the step of performing health detection on the target fault point again after executing the target emergency plan, the method further includes:

Acquiring a third detection return value obtained after performing health detection on the target fault point again, and determining whether the third detection return value points to a new fault point;

If the third detection return value points to a new fault point, a warning message indicating that the fault cannot be handled automatically is output.

Optionally, the step of selecting the target emergency plan corresponding to the target fault point from the pre-stored plan database according to the fault state of the target fault point includes:

Obtain all emergency plans corresponding to the target fault point according to the fault state of the target fault point; if there are multiple emergency plans, count, for each emergency plan, the passing frequency with which the target fault point successfully passed health detection after that plan was executed in a past historical time period;

The emergency plan with the highest passing frequency is selected from the pre-stored plan database, and the emergency plan with the highest passing frequency is set as the target emergency plan.

This application also provides an intelligent IT operation and maintenance fault locating device. The intelligent IT operation and maintenance fault locating device includes:

A receiving module, used to receive a failure analysis report and obtain all potential failure points in the failure analysis report;

An execution module, configured to operate on each potential failure point, the execution module including:

A health detection submodule, configured to perform health detection on the potential failure point and obtain a first detection return value;

A first obtaining submodule, configured to obtain the new fault point specified by the first detection return value, perform health detection on the new fault point continuously until no new fault point is generated, determine the fault point for which no new fault point is generated as the target fault point, and obtain the second detection return value of the target fault point;

The output submodule is used to output the target fault point and the second detection return value.

Optionally, the health detection sub-module includes:

A positioning unit, configured to locate the operation source and operation process cited by the potential failure point, perform health detection on the operation source and operation process, and obtain the first detection return value corresponding to the operation source and each operation process;

Optionally, the health detection sub-module includes:

A first obtaining unit, configured to obtain the node type of the potential failure point, and obtain a detection function corresponding to the node type from a preset tool library;

The second obtaining unit obtains the detection index of the detection function, and performs a sniffing test on the index data in the potential fault point according to the detection index to obtain a first detection return value of the potential fault point.

Optionally, the intelligent IT operation and maintenance fault locating device further includes:

The first obtaining module is used to select, according to the fault state of the target fault point, the target emergency plan corresponding to the target fault point from the pre-stored plan database, and to execute the target emergency plan on the target fault point;

The re-detection module is used to perform health detection on the target failure point again after the execution of the target emergency plan.

A second obtaining module, configured to obtain a third detection return value obtained after performing health detection on the target failure point again, and determine whether the third detection return value points to a new failure point;

The output module is configured to output a warning message indicating that the fault cannot be handled automatically if the third detection return value points to a new fault point.

Optionally, the first obtaining module includes:

The second obtaining submodule is used to obtain all the emergency plans corresponding to the target fault point according to the fault state of the target fault point and, if there are multiple emergency plans, to count, for each emergency plan, the passing frequency with which the target failure point successfully passed health detection after that plan was executed in a past historical time period;

A selection submodule is used to select the emergency plan with the highest passing frequency from the pre-stored plan database, and set the emergency plan with the highest passing frequency as the target emergency plan.

In addition, in order to achieve the above object, the present application also provides intelligent IT operation and maintenance fault locating equipment, which includes: a memory, a processor, a communication bus, and intelligent IT operation and maintenance fault locating computer-readable instructions stored on the memory,

The communication bus is used to realize the communication connection between the processor and the memory;

The processor is used to execute the intelligent IT operation and maintenance fault location computer readable instructions to achieve the following steps:

For each potential failure point, perform the following steps:

The target fault point and the second detection return value are output.

In addition, to achieve the above purpose, the present application also provides a readable storage medium that stores one or more computer-readable instructions, and the one or more computer-readable instructions may be executed by one or more processors to perform:

Receiving a failure analysis report and obtaining all potential failure points in the failure analysis report;

For each potential failure point, perform the following steps:

The target fault point and the second detection return value are output.

In this application, a failure analysis report is received and all potential failure points in the failure analysis report are obtained; for each potential failure point, the following steps are performed: health detection is performed on the potential failure point, and a first detection return value is obtained; a new fault point specified by the first detection return value is obtained, health detection is performed on the new fault point continuously until no new fault point is generated, the fault point for which no new fault point is generated is determined as the target fault point, and the second detection return value of the target fault point is obtained; the target fault point and the second detection return value are output. That is, after the failure analysis report is received, the potential failure points are obtained automatically and iterative health detection is performed on them automatically rather than manually, so that the target fault point is located quickly. Locating the target fault point quickly saves locating time and therefore repair time, and improves the experience of the users, that is, the operation and maintenance personnel. This solves the prior-art technical problem that locating the fault repair node is inefficient, which consumes valuable repair time, prolongs the repair cycle, and affects the user experience.

Brief description of the drawings

FIG. 1 is a schematic flowchart of a first embodiment of the intelligent IT operation and maintenance fault location method of this application;

FIG. 2 is a detailed flow diagram of the step of performing health detection on the potential fault point and obtaining the first detection return value in the intelligent IT operation and maintenance fault location method of this application;

FIG. 3 is a schematic diagram of the device structure of the hardware operating environment involved in the method of the embodiment of the present application.

The implementation, functional characteristics and advantages of the present application will be further described in conjunction with the embodiments and with reference to the drawings.

Detailed description

It should be understood that the specific embodiments described herein are only used to explain the present application, and are not used to limit the present application.

The present application provides an intelligent IT O&M fault location method. In the first embodiment of the present intelligent IT O&M fault location method, referring to FIG. 1, the intelligent IT O&M fault location method includes:

Step S10: Receive a failure analysis report and obtain all potential failure points in the failure analysis report;

For each potential failure point, perform the following steps:

Step S20, health detection is performed on the potential failure point, and a first detection return value is obtained;

Step S30: Obtain the new fault point specified by the first detection return value, and perform health detection on the new fault point continuously until no new fault point is generated; determine the fault point for which no new fault point is generated as the target fault point, and obtain the second detection return value of the target fault point;

Step S40, output the target fault point and the second detection return value.

Specific steps are as follows:

It should be noted that, in this embodiment, locating detection is performed on the plurality of potential fault points ranked highest in credibility, so as to finally determine the target fault point.

Specifically, the intelligent IT operation and maintenance fault location method is applied to an intelligent IT operation and maintenance fault location system. Before the fault analysis report is received, an intelligent IT operation and maintenance fault analysis system that communicates with the fault location system obtains the alarm information of all related parties within a certain time period. After obtaining this alarm information, the fault analysis system analyzes it according to pre-stored fault analysis computer-readable instructions, produces a fault analysis report, and sends the report to the intelligent IT operation and maintenance fault location system. The fault analysis report lists the potential fault points.

After receiving the fault analysis report, the intelligent IT operation and maintenance fault location system parses it to obtain all the potential fault points. For example, suppose node A in the current system cannot call node B, and the report lists both node A and node B as potential fault points. The system then obtains node A and node B as the potential fault points and performs locating detection on both, to determine whether node A is faulty, node B is faulty, or both are faulty, and further to determine the specific faulty process or source within the faulty node.
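The parsing in step S10 can be sketched as follows. This is only an illustration: the application does not specify the report format, so the JSON layout and the `potential_fault_points` key are assumptions, as is the function name.

```python
import json


def extract_potential_fault_points(report_text):
    """Return the de-duplicated node names listed as potential fault points.

    The JSON layout and the "potential_fault_points" key are assumptions
    made for illustration; the application does not define a report format.
    """
    report = json.loads(report_text)
    seen = []
    for node in report.get("potential_fault_points", []):
        if node not in seen:  # keep first occurrence, preserve report order
            seen.append(node)
    return seen


# Hypothetical report for the "node A cannot call node B" example.
example_report = json.dumps({
    "symptom": "node A cannot call node B",
    "potential_fault_points": ["A", "B", "A"],
})
points = extract_potential_fault_points(example_report)
```

Each name in `points` would then be passed to the per-point health detection of steps S20 and S30.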

For each potential failure point, perform the following steps:

There must be one or several fault points in the potential fault points that cause other nodes to fail. In this embodiment, the intelligent IT operation and maintenance fault location system performs health detection on each potential fault point to obtain each first detection return value.

Specifically, the step of performing health detection on the potential fault point and obtaining the first detection return value includes:

Step S21: Locate the operation source and operation process cited by the potential fault point, perform health detection on the operation source and operation process, and obtain the first detection return value corresponding to the operation source and each operation process;

In this embodiment, during health detection the potential failure node is first assumed to be a normal node. The operation source and each operation process referenced by the potential failure point are located, and health detection is performed on them; specifically, detection is performed using the pre-stored data I/O indicators of the operation source and each operation process in a normal state.

It should be noted that, in this embodiment, after the node type of the potential failure point is acquired, a detection function corresponding to the node type may be obtained from a preset tool library, and the detection index of the detection function may be obtained. A sniffing test is then performed on the operation source and operation processes in the potential fault point according to the detection index. After the sniffing test, the pre-stored data I/O indicators of the relevant operation source and operation processes in the normal state are selected precisely and used for detection, to obtain the first detection return value of the potential fault point, thereby shortening the detection process.

To illustrate with a specific example, suppose three sequential links A1, A2, and A3, that is, three operation processes, need to be executed in fault point A. The intelligent IT operation and maintenance fault location system starts from the A1 link. The preset starting parameters (data I/O indicators) are input into the A1 link, the A1 link produces an operation value, and the system judges whether the operation value is consistent with the preset result value. If they are consistent, there is no problem in the A1 link, and the system receives a first detection return value indicating that the A1 link is normal, such as a10. Otherwise, there is a problem in the A1 link, and the system receives a first detection return value indicating the problem, such as a11; that is, the system locates the operation source and operation processes referenced in the A1 link so as to obtain the corresponding first detection return values. The subsequent A2 link is then detected on the same principle as the A1 link. Finally, all the first detection return values in fault point A are obtained. If fault point A is entirely normal, then the error arises at fault point B when fault point A provides the value of the A3 link to it; in this case, the detection return value obtained for fault point B is b.
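The link-by-link check described above can be sketched as follows. The function name, the shape of the baseline table, and the rule that a return code ends in 0 for a healthy link and 1 for a faulty one are assumptions chosen to mirror the a10 / a11 example; the application does not fix an encoding.

```python
def check_links(links, baselines):
    """Run each link in order and compare its output with the stored baseline.

    links:     list of (link_name, callable) in execution order
    baselines: link_name -> (preset_input, expected_output), standing in for
               the pre-stored data I/O indicators of the link in normal state
    Returns a list of (link_name, return_code). A code ending in "0" marks a
    healthy link and "1" a faulty one (an assumed encoding).
    """
    results = []
    for link_name, func in links:
        preset_input, expected = baselines[link_name]
        try:
            healthy = func(preset_input) == expected
        except Exception:
            healthy = False  # a link that raises is treated as faulty
        results.append((link_name, f"{link_name.lower()}{0 if healthy else 1}"))
    return results


# Hypothetical links of fault point A; A2 deviates from its baseline.
links = [("A1", lambda x: x + 1), ("A2", lambda x: x * 2), ("A3", lambda x: x - 3)]
baselines = {"A1": (1, 2), "A2": (2, 5), "A3": (5, 2)}
report = check_links(links, baselines)
```

Here A2 is fed the preset input 2, produces 4 instead of the expected 5, and is therefore reported with a code ending in 1, while A1 and A3 match their baselines.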

Further, for ease of understanding, consider the following example: fault point A is the product order node, fault point B is the order database, and the fault condition is that fault point A cannot call the order content in the database at fault point B. The intelligent IT operation and maintenance fault location system determines whether the order number at fault point A is correct, and tests the order call of fault point A with a known order number to determine whether fault point A can retrieve the order number from node B normally. If the order number in node B can be retrieved normally, the system determines whether fault point A can use the order number to query the record. If the record can be queried, the system determines whether fault point A can pull the order content in the record. If the order content can be pulled normally, the system judges whether the content pulled at point A has changed. If the content has not changed, the system judges whether point A displays the order content normally. If the result of any step in the detection flow differs from the result that step should produce in the normal state, the system locates the step node whose result differs. For example, if the system finds that node B cannot return the corresponding record information when fault point A calls the order record in fault point B, the system detects which step in the calling process failed and returns a first detection return value representing the failure of that step.

The first detection return value may point to a new fault point. In that case, the intelligent IT operation and maintenance fault location system performs health detection on the new fault point to obtain a new first detection return value. If the new first detection return value points to yet another fault point, the above steps are iterated until no new fault point is generated. The fault point for which no new fault point is generated is determined as the target fault point, and the second detection return value of the target fault point is obtained.

In other words, the intelligent IT operation and maintenance fault location system needs to detect the potential fault points iteratively, that is, to obtain the first detection return value of each potential fault point iteratively. If a first detection return value points to a new fault point, detection has not yet reached the end of the chain; if no new fault point is generated, the system has traversed all the fault points that may currently be abnormal. After detecting the potential fault points to the end in this way, the system obtains the one or more target fault points to which the first detection return values repeatedly point; that is, the system has extracted the common fault points (target fault points) at which the chains from the potential fault points intersect. There can be more than one common fault point; the common fault points are the source of the data deviation for all associated fault points.
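The iteration of step S30 can be sketched as a loop that follows each detection return value to the fault point it names. The `health_check` callback and its `(return_value, next_point)` result are assumptions made for illustration, and the `visited` guard against circular references is an addition not described in the application.

```python
def locate_target_fault_points(potential_points, health_check):
    """Follow detection return values until no new fault point is produced.

    health_check(point) returns (return_value, next_point), where next_point
    is the new fault point the return value points to, or None if there is
    none. Returns a dict: target fault point -> its detection return value.
    """
    targets = {}
    for start in potential_points:
        current = start
        visited = {current}  # assumed safeguard against reference cycles
        while True:
            return_value, next_point = health_check(current)
            if next_point is None or next_point in visited:
                break  # no new fault point: current is the target
            visited.add(next_point)
            current = next_point
        targets[current] = return_value  # the "second detection return value"
    return targets


# Hypothetical chain: A points to B, B points to C, C points nowhere.
pointers = {"A": ("a11", "B"), "B": ("b1", "C"), "C": ("c1", None)}
result = locate_target_fault_points(["A", "B"], lambda p: pointers[p])
```

Both chains end at C, so C is the single common target fault point in this example.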

For example, if a fault occurs when A calls B and a fault occurs when A calls C, but a fault occurs when B calls C, then A, as the common intersection fault point of B and C, is the source fault point.

Step S40, output the target fault point and the second detection return value.

After the target fault point and the second detection return value are obtained, the target fault point and the second detection return value are output to prompt the user or the operation and maintenance personnel.

Further, referring to FIG. 2, the present application provides another embodiment of the intelligent IT operation and maintenance fault locating method. In this embodiment, the step of performing health detection on the potential fault point and obtaining the first detection return value further includes:

Step S22: Obtain the node type of the potential failure point, and obtain the detection function corresponding to the node type from the preset tool library;

In this embodiment, different potential fault points have their own node types in the intelligent IT operation and maintenance fault location system, and each node type has a corresponding dedicated detection function in the preset tool library of the system. The system obtains the node type of the potential failure point and obtains the detection function corresponding to the node type from the preset tool library. For example, if the potential failure point is a network communication node, the system obtains from the preset tool library the network detection function mapped to the network communication node type.

Step S23: Obtain the detection index of the detection function, and perform a sniffing test on the index data in the potential fault point according to the detection index to obtain the first detection return value of the potential fault point.

Different detection functions have different detection indicators. For example, the detection indicators of the network detection function include network link status and data transmission rate. In this embodiment, the detection function performs a sniffing test on the index data in the corresponding potential failure point. The sniffing test classifies and filters the index data in the potential failure point to select the index data of the same type as the detection index, and performs traceability detection on the selected index data. The first detection return value or second detection return value corresponding to the potential fault point is thereby obtained.

For example, if the current detection index concerns the network connection status of the potential fault point, the detection may include the following steps: the intelligent IT operation and maintenance fault location system determines the two endpoints of the network connection; for an instruction that initiates a network connection from node A to node B, the system determines the address ip1 of node A, obtains the address ip2 of node B, checks whether the DNS resolution service between node A and node B is working correctly, and so on. Through the detection function, the input and output of all network index data involved in the network connection are tested to confirm which part of the process has a problem.

In this embodiment, the node type of the potential failure point is obtained, and the detection function corresponding to the node type is obtained from a preset tool library; the detection index of the detection function is obtained, and a sniffing test is performed on the index data in the potential fault point according to the detection index to obtain the first detection return value of the potential fault point. The accurate sniffing test lays the foundation for locating the target fault point in an orderly and rapid manner.
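Steps S22 and S23 can be sketched as a lookup from node type to detection function, followed by a filter that keeps only the index data matching the detection indicators. The library contents, the indicator names, and the pass criterion below are all hypothetical; the application names only network link status and data transmission rate as example indicators.

```python
def network_check(data):
    """Hypothetical network detection function: link up and nonzero rate."""
    return data.get("link_status") == "up" and data.get("transfer_rate_mbps", 0) > 0


# node type -> (detection indicators, detection function); illustrative only
TOOL_LIBRARY = {
    "network": ({"link_status", "transfer_rate_mbps"}, network_check),
}


def sniff_test(node_type, index_data):
    """Filter index data down to the detection indicators, then run the check."""
    indicators, detect = TOOL_LIBRARY[node_type]
    # Keep only the index data of the same type as a detection indicator.
    relevant = {k: v for k, v in index_data.items() if k in indicators}
    return {"healthy": detect(relevant), "checked": sorted(relevant)}


# cpu_load is not a network indicator, so the sniffing test ignores it.
result = sniff_test(
    "network",
    {"link_status": "down", "transfer_rate_mbps": 0, "cpu_load": 0.3},
)
```

Restricting the check to matching indicators is what lets a node-type-specific function skip unrelated metrics.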

Further, this application provides another embodiment of the intelligent IT operation and maintenance fault locating method. In this embodiment, after the step of outputting the target fault point and the second detection return value, the method further includes:

Step S50: According to the fault state of the target fault point, select the target emergency plan corresponding to the target fault point from the pre-stored plan database, and execute the target emergency plan on the target fault point;

In this embodiment, a plan database is pre-stored. The plan database includes emergency plans for the various node types or fault states of a target fault point, which are used to resolve the fault at the target fault point. After determining the target fault point, the system directly retrieves the corresponding target emergency plan from the plan database and executes it.

Step S60, after executing the target emergency plan, perform health detection on the target fault point again.

After the system executes the target emergency plan, health detection is performed on the target fault point again to verify whether the problem at the target fault point has been solved; the steps are the same as the health detection steps described above.

After the step of performing health detection on the target fault point again after executing the target emergency plan, the method further includes:

Step S70: Obtain a third detection return value obtained after performing health detection on the target fault point again, and determine whether the third detection return value points to a new fault point;

In this embodiment, after health detection is performed on the target fault point again, a third detection return value is obtained and it is determined whether the third detection return value points to a new fault point. If a new fault point is obtained, the intelligent IT operation and maintenance fault location system has not resolved the fault state of the corresponding target fault point.

Step S80: If the third detection return value points to a new fault point, a warning message indicating that the fault cannot be handled automatically is output.

If the intelligent IT operation and maintenance fault locating system cannot automatically complete the handling of the fault state, it outputs a warning message indicating that the fault cannot be handled automatically, so that operation and maintenance personnel can handle it manually, which improves the fault tolerance of the system.

In this embodiment, the target emergency plan corresponding to the target fault point is selected from the pre-stored plan database according to the fault state of the target fault point and is executed on the target fault point; after the target emergency plan is executed, health detection is performed on the target fault point again. A possible mismatch between the target emergency plan and the fault state of the target fault point can therefore be guarded against, and the fault tolerance of the intelligent IT operation and maintenance fault location system is improved.
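Steps S50 to S80 can be sketched as a single remediation routine. The callbacks `select_plan` and `health_check`, the result dictionary, and the wording of the warning are assumptions made for illustration; the application does not define these interfaces.

```python
def remediate(target, select_plan, health_check):
    """Execute a plan, re-check the target, and warn if it is still faulty.

    select_plan(target) returns a callable emergency plan (hypothetical API).
    health_check(point) returns (return_value, next_point), where next_point
    is None when the return value points to no new fault point.
    """
    plan = select_plan(target)
    plan(target)                                   # step S50: execute the plan
    third_value, new_point = health_check(target)  # steps S60 and S70
    if new_point is not None:                      # step S80: escalate
        return {"resolved": False, "return_value": third_value,
                "warning": f"{target}: cannot be handled automatically"}
    return {"resolved": True, "return_value": third_value, "warning": None}


# Hypothetical target B that a restart plan repairs.
state = {"B": "faulty"}


def restart(point):
    state[point] = "ok"


def check(point):
    return ("b0", None) if state[point] == "ok" else ("b1", "C")


outcome = remediate("B", lambda t: restart, check)
# A plan that does nothing leaves the fault in place and triggers the warning.
state["B"] = "faulty"
failed = remediate("B", lambda t: (lambda p: None), check)
```

Returning the warning instead of raising keeps the sketch neutral about how operation and maintenance personnel are notified.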

Further, this application provides another embodiment of the intelligent IT operation and maintenance fault locating method. In this embodiment, the step of selecting the target emergency plan corresponding to the target fault point from the pre-stored plan database according to the fault state of the target fault point includes:

Step S51: Obtain all emergency plans corresponding to the target fault point according to the fault state of the target fault point; if there are multiple emergency plans, count, for each emergency plan, the passing frequency with which the target fault point successfully passed health detection after that plan was executed in a past historical time period;

In this embodiment, there may be multiple emergency plans corresponding to the target fault point. The system therefore counts, for each emergency plan, the frequency with which the target fault point passed health detection after that plan was executed in a past historical time period.

Step S52: Select the emergency plan with the highest passing frequency from the pre-stored plan database, and set the emergency plan with the highest passing frequency as the target emergency plan.

Specifically, the system automatically counts the number of times each emergency plan has led directly to a passed health detection, selects the emergency plan with the highest passing frequency from the pre-stored plan database, marks it as the highest-priority recommended plan, and recommends it first in future emergency plan matching. The system therefore sets the emergency plan with the highest passing frequency as the target emergency plan.

In this embodiment, all emergency plans corresponding to the target fault point are obtained according to the fault state of the target fault point; if there are multiple emergency plans, the passing frequency with which the target fault point successfully passed health detection after each emergency plan was executed in a past historical time period is counted; the emergency plan with the highest passing frequency is selected from the pre-stored plan database and set as the target emergency plan. The target fault point can therefore be resolved as quickly as possible, which improves the experience of the operation and maintenance personnel, that is, the users.
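Steps S51 and S52 can be illustrated with a short Python sketch under stated assumptions. The function name `select_target_plan`, the representation of the execution history as `(plan, passed)` pairs, and the tie-breaking rule (the first candidate wins) are hypothetical choices made here. The application itself does not specify how the history is stored or how ties are resolved.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def select_target_plan(
    candidate_plans: List[str],
    history: Iterable[Tuple[str, bool]],
) -> Optional[str]:
    """Pick the candidate plan with the most passed health detections.

    `history` holds one (plan, passed) record per past execution within
    the historical time period under consideration (assumed format).
    """
    if not candidate_plans:
        return None
    if len(candidate_plans) == 1:
        # Only one plan matches the fault state, so no counting is needed.
        return candidate_plans[0]

    # S51: count, per plan, how often the fault point passed afterwards.
    passes = Counter(plan for plan, passed in history if passed)

    # S52: choose the plan with the highest passing frequency.
    # max() returns the first maximal element, so ties go to the
    # earliest candidate (an assumption of this sketch).
    return max(candidate_plans, key=lambda plan: passes[plan])


if __name__ == "__main__":
    records = [
        ("restart_service", True),
        ("failover_to_standby", True),
        ("failover_to_standby", True),
        ("restart_service", False),
    ]
    print(select_target_plan(["restart_service", "failover_to_standby"], records))
```

Counting raw passes follows the wording of this embodiment. A deployment that executes some plans far more often than others might prefer a pass rate instead, which would be a departure from the text.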

Referring to FIG. 3, FIG. 3 is a schematic diagram of a device structure of a hardware operating environment involved in a solution of an embodiment of the present application.

The intelligent IT operation and maintenance fault locating device in the embodiment of the present application may be a PC, or may be a terminal device such as a smartphone, tablet computer, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player, or portable computer.

As shown in FIG. 3, the intelligent IT operation and maintenance fault locating device may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as disk storage. The memory 1005 may optionally be a storage device independent of the foregoing processor 1001.

Optionally, the intelligent IT operation and maintenance fault locating device may further include a target user interface, a network interface, a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and so on. The target user interface may include a display (Display) and an input sub-module such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).

Those skilled in the art can understand that the structure of the intelligent IT O&M fault locating device shown in FIG. 3 does not constitute a limitation on the intelligent IT O&M fault locating device, which may include more or fewer components than illustrated, combine certain components, or use a different component arrangement.

As shown in FIG. 3, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, and computer-readable instructions for intelligent IT operation and maintenance fault location. The operating system is a set of computer-readable instructions that manages and controls the hardware and software resources of the intelligent IT O&M fault locating device, and supports the running of the intelligent IT O&M fault location computer-readable instructions and of other software and/or computer-readable instructions. The network communication module is used to realize communication between various components inside the memory 1005, and to communicate with other hardware and software in the intelligent IT operation and maintenance fault locating device.

In the intelligent IT O&M fault locating device shown in FIG. 3, the processor 1001 is configured to execute the intelligent IT O&M fault location computer-readable instructions stored in the memory 1005 to implement the steps of the intelligent IT O&M fault locating method described in any one of the above.

The specific implementation of the intelligent IT operation and maintenance fault locating device of the present application is basically the same as the above-mentioned embodiments of the intelligent IT operation and maintenance fault locating method, which will not be repeated here.

This application also provides an intelligent IT operation and maintenance fault locating device. The specific implementation of the intelligent IT operation and maintenance fault locating device in this application is basically the same as the above-mentioned embodiments of the intelligent IT operation and maintenance fault locating method, and details are not described herein again.

The present application provides a readable storage medium. The readable storage medium may be a non-volatile readable storage medium. The readable storage medium stores one or more computer-readable instructions, and the one or more computer-readable instructions may be executed by one or more processors to implement the steps of the intelligent IT operation and maintenance fault locating method described in any one of the above.

The specific implementation of the readable storage medium of the present application is basically the same as the above embodiments of the intelligent IT operation and maintenance fault locating method, which will not be repeated here.

The above are only the preferred embodiments of the present application and do not limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the description and drawings of this application, or any direct or indirect use in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims

An intelligent IT operation and maintenance fault location method, wherein the intelligent IT operation and maintenance fault location method includes:

Receive a failure analysis report and obtain all potential failure points in the failure analysis report;

For each potential failure point, perform the following steps:

Health detection is performed on the potential failure point, and the first detection return value is obtained;

Acquiring a new fault point specified by the first detection return value, and performing continuous health detection on the new fault point until no new fault point is generated; determining the fault point at which no new fault point is generated as the target fault point, and obtaining the second detection return value of the target fault point;

The target fault point and the second detection return value are output.
The intelligent IT operation and maintenance fault locating method according to claim 1, wherein the step of performing health detection on the potential fault point and obtaining a first detection return value comprises:

Locate the operation source and operation process referenced by the potential fault point, perform health detection on the operation source and operation process, and obtain the first detection return value corresponding to the operation source and each operation process;

Wherein, the health detection includes the step of performing detection using the pre-stored data I/O indicators of the operation source and each operation process in a normal state.
The intelligent IT operation and maintenance fault locating method according to claim 1, wherein the step of performing health detection on the potential fault point and obtaining a first detection return value further comprises:

Obtain the node type of the potential failure point, and obtain the detection function corresponding to the node type from the preset tool library;

Obtain the detection index of the detection function, and perform a sniffing test on the index data in the potential fault point according to the detection index to obtain the first detection return value of the potential fault point.
The intelligent IT operation and maintenance fault locating method according to claim 1, wherein the step of outputting the target fault point and the second detection return value comprises:

Acquire and select the target emergency plan corresponding to the target fault point from the pre-stored plan database according to the fault state of the target fault point, and execute the target emergency plan for the target fault point;

After executing the target emergency plan, health detection is performed on the target fault point again.
The intelligent IT operation and maintenance fault locating method according to claim 4, wherein, after the step of performing health detection on the target fault point again after executing the target emergency plan, the method further includes:

Acquiring a third detection return value obtained after performing health detection on the target fault point again, and determining whether the third detection return value points to a new fault point;

If the third detection return value points to a new fault point, a warning message that cannot be automatically processed is output.
The intelligent IT operation and maintenance fault locating method according to claim 5, wherein the step of selecting a target emergency plan corresponding to the target fault point from a pre-stored plan database according to the fault state of the target fault point includes:

Obtain, according to the fault state of the target fault point, all emergency plans corresponding to the target fault point; if there are multiple emergency plans, count, for each emergency plan, the passing frequency with which the target fault point successfully passed health detection after that plan was executed in a past historical time period;

The emergency plan with the highest passing frequency is selected from the pre-stored plan database, and the emergency plan with the highest passing frequency is set as the target emergency plan.
An intelligent IT operation and maintenance fault locating device, wherein the intelligent IT operation and maintenance fault locating device includes:

A receiving module, used to receive a failure analysis report and obtain all potential failure points in the failure analysis report;

An execution module, used to perform processing for each potential failure point, the execution module including:

A health detection submodule, configured to perform health detection on the potential failure point and obtain a first detection return value;

A first obtaining submodule, configured to obtain a new fault point specified by the first detection return value, and perform continuous health detection on the new fault point until no new fault point is generated; determine the fault point at which no new fault point is generated as the target fault point, and obtain the second detection return value of the target fault point;

The output submodule is used to output the target fault point and the second detection return value.
The intelligent IT operation and maintenance fault locating device according to claim 7, wherein the health detection sub-module includes:

A positioning unit, configured to locate the operation source and operation process cited by the potential failure point, perform health detection on the operation source and operation process, and obtain the first detection return value corresponding to the operation source and each operation process;

Wherein, the health detection includes the step of performing detection using the pre-stored data I/O indicators of the operation source and each operation process in a normal state.
An intelligent IT operation and maintenance fault locating device, wherein the intelligent IT operation and maintenance fault locating device includes: a memory, a processor, a communication bus, and computer-readable instructions stored on the memory,

The communication bus is used to realize the communication connection between the processor and the memory;

The processor is used to execute the computer-readable instructions to implement the following steps:

Receive a failure analysis report and obtain all potential failure points in the failure analysis report;

For each potential failure point, perform the following steps:

Health detection is performed on the potential failure point, and the first detection return value is obtained;

Acquiring a new fault point specified by the first detection return value, and performing continuous health detection on the new fault point until no new fault point is generated; determining the fault point at which no new fault point is generated as the target fault point, and obtaining the second detection return value of the target fault point;

The target fault point and the second detection return value are output.
The intelligent IT operation and maintenance fault locating device according to claim 9, wherein the step of performing health detection on the potential fault point and obtaining the first detection return value comprises:

Locate the operation source and operation process referenced by the potential fault point, perform health detection on the operation source and operation process, and obtain the first detection return value corresponding to the operation source and each operation process;

Wherein, the health detection includes the step of performing detection using the pre-stored data I/O indicators of the operation source and each operation process in a normal state.
The intelligent IT operation and maintenance fault locating device according to claim 9, wherein the step of performing health detection on the potential fault point and obtaining a first detection return value further comprises:

Obtain the node type of the potential failure point, and obtain the detection function corresponding to the node type from the preset tool library;

Obtain the detection index of the detection function, and perform a sniffing test on the index data in the potential fault point according to the detection index to obtain the first detection return value of the potential fault point.
The intelligent IT operation and maintenance fault locating device according to claim 9, wherein the step of outputting the target fault point and the second detection return value comprises:

Acquire and select the target emergency plan corresponding to the target fault point from the pre-stored plan database according to the fault state of the target fault point, and execute the target emergency plan for the target fault point;

After executing the target emergency plan, health detection is performed on the target fault point again.
The intelligent IT operation and maintenance fault locating device according to claim 12, wherein, after the step of performing health detection on the target fault point again after executing the target emergency plan, the steps further include:

Acquiring a third detection return value obtained after performing health detection on the target fault point again, and determining whether the third detection return value points to a new fault point;

If the third detection return value points to a new fault point, a warning message that cannot be automatically processed is output.
The intelligent IT operation and maintenance fault locating device according to claim 13, wherein the step of selecting a target emergency plan corresponding to the target fault point from a pre-stored plan database according to the fault state of the target fault point includes:

Obtain, according to the fault state of the target fault point, all emergency plans corresponding to the target fault point; if there are multiple emergency plans, count, for each emergency plan, the passing frequency with which the target fault point successfully passed health detection after that plan was executed in a past historical time period;

The emergency plan with the highest passing frequency is selected from the pre-stored plan database, and the emergency plan with the highest passing frequency is set as the target emergency plan.
A readable storage medium, wherein the computer-readable instructions for intelligent IT operation and maintenance fault location are stored on the readable storage medium, and the following steps are implemented when the intelligent IT operation and maintenance fault location computer-readable instructions are executed by a processor:

Receive a failure analysis report and obtain all potential failure points in the failure analysis report;

For each potential failure point, perform the following steps:

Health detection is performed on the potential failure point, and the first detection return value is obtained;

Acquiring a new fault point specified by the first detection return value, and performing continuous health detection on the new fault point until no new fault point is generated; determining the fault point at which no new fault point is generated as the target fault point, and obtaining the second detection return value of the target fault point;

The target fault point and the second detection return value are output.
The readable storage medium of claim 15, wherein the step of performing health detection on the potential failure point and obtaining a first detection return value includes:

Locate the operation source and operation process referenced by the potential fault point, perform health detection on the operation source and operation process, and obtain the first detection return value corresponding to the operation source and each operation process;

Wherein, the health detection includes the step of performing detection using the pre-stored data I/O indicators of the operation source and each operation process in a normal state.
The readable storage medium of claim 15, wherein the step of performing health detection on the potential failure point and obtaining a first detection return value further comprises:

Obtain the node type of the potential failure point, and obtain the detection function corresponding to the node type from the preset tool library;

Obtain the detection index of the detection function, and perform a sniffing test on the index data in the potential fault point according to the detection index to obtain the first detection return value of the potential fault point.
The readable storage medium of claim 15, wherein the step of outputting the target fault point and the second detection return value comprises:

Acquire and select the target emergency plan corresponding to the target fault point from the pre-stored plan database according to the fault state of the target fault point, and execute the target emergency plan for the target fault point;

After executing the target emergency plan, health detection is performed on the target fault point again.
The readable storage medium according to claim 18, wherein, after the step of performing health detection on the target fault point again after executing the target emergency plan, the steps further include:

Acquiring a third detection return value obtained after performing health detection on the target fault point again, and determining whether the third detection return value points to a new fault point;

If the third detection return value points to a new fault point, a warning message that cannot be automatically processed is output.
The readable storage medium according to claim 19, wherein the step of acquiring and selecting a target emergency plan corresponding to the target fault point from a pre-stored plan database according to the fault state of the target fault point includes:

Obtain, according to the fault state of the target fault point, all emergency plans corresponding to the target fault point; if there are multiple emergency plans, count, for each emergency plan, the passing frequency with which the target fault point successfully passed health detection after that plan was executed in a past historical time period;

The emergency plan with the highest passing frequency is selected from the pre-stored plan database, and the emergency plan with the highest passing frequency is set as the target emergency plan.