CN111859047A - Fault solving method and device - Google Patents


Info

Publication number
CN111859047A
Authority
CN
China
Prior art keywords
solution
atomic
fault
anomaly
index
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910329934.2A
Other languages
Chinese (zh)
Inventor
席佼佼
袁健清
徐日东
张文革
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Huawei Technologies Co Ltd
Priority to CN201910329934.2A
Publication of CN111859047A
Legal status: Pending (current)

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/90 Details of database functions independent of the retrieved data types
                        • G06F16/903 Querying
                            • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
                        • G06F16/906 Clustering; Classification
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N20/00 Machine learning
                • G06N5/00 Computing arrangements using knowledge-based models
                    • G06N5/04 Inference or reasoning models
                        • G06N5/045 Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
            • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
                • G06Q10/00 Administration; Management
                    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
                        • G06Q10/063 Operations research, analysis or management
                            • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
                                • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
                    • G06Q10/20 Administration of product repair or maintenance

Abstract

The application discloses a fault resolution method and device. The method includes: receiving a solution matching request for resolving a fault, where the request includes one or more of the following parameters: a target system name, fault scenario type information, and an atomic anomaly type; when an existing solution is matched in the solution library, binding metric data; executing, according to the metric data and the atomic anomaly combination logic defined by the existing solution, the atomic algorithm corresponding to each atomic anomaly in that combination logic; and outputting the instantiated solution. The scheme can be used for cloud services. With it, operation and maintenance personnel facing complex problems do not need deep algorithm knowledge: they can perform fault analysis with the generated solution, so that various complex faults can be solved with an artificial intelligence scheme.

Description

Fault solving method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for solving a fault.
Background
Service providers now generally deliver services in a cloud service model rather than selling equipment. Beyond simple equipment failures, the dependencies between services are becoming increasingly complex, and some complex problems must be handled with intelligent analysis, such as performance degradation, reasonable load scheduling, key performance indicator (KPI) anomalies (for example, in the service success rate), and root cause analysis.
Intelligent fault management is a major operation and maintenance scenario: it uses artificial intelligence and automation to provide early warning, detection, diagnosis, delimitation, root cause analysis, and rapid recovery of faults. In a typical intelligent fault management scheme, an algorithm expert trains a model offline or online in advance, and once the model is released and deployed, operation and maintenance experts call it directly.
This division of roles makes it difficult for operation and maintenance experts who do not know the algorithms to use artificial intelligence (AI) simply and flexibly to solve problems, for the following reasons. Algorithm experts usually have an insufficient understanding of the business and model it starting from data patterns, so the model results may fit poorly at the business level or even contradict business logic. When operation and maintenance experts find during use that the model needs to be optimized, they understand only the service characteristics and not the algorithm, so the problem is handed back to algorithm experts who do not understand the service, and the process easily falls into an endless loop. The domain knowledge of business experts is not effectively converted into rules or patterns that a model can use and remains at the level of discussion.
To lower the threshold for using AI, machine learning platforms in the industry are moving toward automation. Leading examples include Google's AutoML, AWS's SageMaker, Microsoft's Azure Machine Learning, and Alibaba's PAI. Most of them share a common approach: using an orchestration tool (not supported by SageMaker), they atomize an algorithm expert's data analysis process and then build a workflow by graphical drag-and-drop modeling to automate the full flow. This includes codeless operation (no code needs to be written; only the analysis flow needs to be designed), automatic parameter tuning (for a chosen algorithm, the optimal parameter values are found automatically by a series of means), automatic model screening (models generated by different algorithms are evaluated automatically and the best one is selected), and automatic model integration (several models are combined to achieve the best end-to-end effect). However, only very low-level atomic algorithms, such as iForest, PCA, SVM, and LSTM, are available for orchestration, so these platforms are designed with algorithm experts as their users. Operation and maintenance experts, who have not mastered these algorithms, cannot use the platforms to build an artificial intelligence solution (AI solution) for operation and maintenance problems and must work together with algorithm experts.
Automated machine learning therefore does not close the knowledge gap between operation and maintenance experts and algorithm experts. Operation and maintenance experts still need a deep understanding of machine learning algorithms, for example how to select an algorithm and set its initial parameters. For algorithm experts, existing automated machine learning solutions only improve model training efficiency, while the operation and maintenance problem still has to be accurately decomposed and converted into an algorithm problem.
How to enable operation and maintenance experts to use AI simply and flexibly to solve operation and maintenance problems is therefore an urgent problem.
Disclosure of Invention
The application provides a fault resolution method and device that enable operation and maintenance experts to solve various complex faults simply and flexibly with an artificial intelligence scheme.
In a first aspect, a fault resolution method is provided. The method includes: receiving a solution matching request for resolving a fault, where the request includes one or more of the following parameters: a target system name, a fault scenario type, and an atomic anomaly type; when an existing solution is matched in the solution library, binding metric data; executing, according to the metric data and the atomic anomaly combination logic defined by the existing solution, the atomic algorithm corresponding to each atomic anomaly in that combination logic; and outputting the instantiated solution. With this aspect, complex problems can be analyzed with the generated solutions without deep algorithm knowledge, and various complex faults can be solved with artificial intelligence solutions.
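The matching step of the first aspect can be pictured with a minimal Python sketch. It assumes a hypothetical in-memory solution library in which each entry records a target system name, a fault scenario type, and the atomic anomaly types its combination logic uses; the class and function names (`SolutionEntry`, `MatchRequest`, `match_solutions`) and the string labels are illustrative, not taken from the application. Because the request carries "one or more" parameters, every parameter is optional, and an entry matches only if it agrees with all parameters that are present.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SolutionEntry:
    """One stored solution in a hypothetical in-memory solution library."""
    solution_id: str
    system_name: str
    scenario_type: str         # e.g. "success_rate", "latency", "resource", "performance"
    anomaly_types: frozenset   # atomic anomaly types used by the combination logic


@dataclass
class MatchRequest:
    """Solution matching request: each parameter is optional ("one or more of")."""
    system_name: Optional[str] = None
    scenario_type: Optional[str] = None
    anomaly_type: Optional[str] = None


def match_solutions(request: MatchRequest, library: List[SolutionEntry]) -> List[SolutionEntry]:
    """Return the entries that agree with every parameter present in the request."""
    if request.system_name is None and request.scenario_type is None and request.anomaly_type is None:
        raise ValueError("a matching request needs at least one parameter")
    matches = []
    for entry in library:
        if request.system_name is not None and entry.system_name != request.system_name:
            continue
        if request.scenario_type is not None and entry.scenario_type != request.scenario_type:
            continue
        if request.anomaly_type is not None and request.anomaly_type not in entry.anomaly_types:
            continue
        matches.append(entry)
    return matches


if __name__ == "__main__":
    library = [
        SolutionEntry("sol-1", "billing", "success_rate", frozenset({"single_metric_mutation"})),
        SolutionEntry("sol-2", "billing", "latency", frozenset({"single_metric_trend"})),
    ]
    hits = match_solutions(MatchRequest(system_name="billing", scenario_type="latency"), library)
    print([e.solution_id for e in hits])  # ['sol-2']
```

An empty result corresponds to the "not matched" branch, in which a new solution is generated from atomic algorithms instead.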
In one implementation, the method further includes storing the instantiated solution. In this implementation, a matched existing solution is instantiated directly and used to resolve the problem, which is convenient and direct.
In another implementation, when no existing solution is matched in the solution library, the fault is decomposed into a plurality of atomic anomalies according to its data pattern; the atomic algorithm corresponding to each atomic anomaly is looked up; and a solution to the fault is generated from these atomic algorithms, the solution being a combination of the atomic algorithms corresponding to the atomic anomalies. With this implementation, complex problems can be handled without deep algorithm knowledge: a solution can be generated simply and flexibly by orchestrating atomic algorithms, and various complex problems can be solved with an artificial intelligence scheme.
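As a rough illustration of the "not matched" branch, the sketch below represents a decomposed fault as a tree of atomic anomalies joined by AND/OR, looks up an atomic algorithm for each leaf in a hypothetical registry, and returns the combination as the generated solution. The registry contents, the algorithm names, and the AND/OR encoding of the "combination logic" are assumptions made for this example; the application does not specify them. The decomposition itself is supplied by the caller, mirroring an operation and maintenance expert arranging atomic anomalies.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple, Union

# Hypothetical atomic algorithm registry: atomic anomaly type -> algorithm name.
# The names are placeholders for illustration, not algorithms named in the application.
ATOMIC_ALGORITHMS: Dict[str, str] = {
    "single_metric_mutation": "level_shift_check",
    "single_metric_trend": "slope_change_check",
    "single_metric_periodic": "period_mismatch_check",
    "multi_metric_correlation": "correlation_drop_check",
}


@dataclass(frozen=True)
class Atom:
    """One atomic anomaly: its type and the metric it is checked on."""
    anomaly_type: str
    metric: str


@dataclass(frozen=True)
class Combine:
    """Assumed encoding of the combination logic: "and"/"or" over sub-expressions."""
    op: str
    parts: Tuple["Logic", ...]


Logic = Union[Atom, Combine]


def atoms(logic: Logic) -> Iterator[Atom]:
    """Yield the atomic anomalies of a combination, left to right."""
    if isinstance(logic, Atom):
        yield logic
    else:
        for part in logic.parts:
            yield from atoms(part)


def generate_solution(fault_name: str, logic: Logic) -> dict:
    """Look up an atomic algorithm for every atomic anomaly and return the combined solution."""
    steps = []
    for atom in atoms(logic):
        algorithm = ATOMIC_ALGORITHMS.get(atom.anomaly_type)
        if algorithm is None:
            raise KeyError(f"no atomic algorithm registered for {atom.anomaly_type!r}")
        steps.append((atom.anomaly_type, atom.metric, algorithm))
    return {"fault": fault_name, "logic": logic, "steps": steps}


def evaluate(logic: Logic, results: Dict[Atom, bool]) -> bool:
    """Combine per-atom detection results (True = anomalous) according to the logic."""
    if isinstance(logic, Atom):
        return results[logic]
    values = [evaluate(part, results) for part in logic.parts]
    if logic.op == "and":
        return all(values)
    if logic.op == "or":
        return any(values)
    raise ValueError(f"unknown operator {logic.op!r}")
```

For instance, a hypothetical success-rate fault could be written as `Combine("and", (Atom("single_metric_mutation", "success_rate"), Combine("or", (Atom("multi_metric_correlation", "requests~successes"), Atom("single_metric_trend", "latency")))))`; `generate_solution` then lists one atomic algorithm per leaf, and `evaluate` reports the fault only when the mutation and at least one of the other two atomic anomalies are detected.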
In another implementation, the method further includes: instantiating the generated solution to the fault; and storing the instantiated solution. In this implementation, the solution obtained by orchestrating atomic algorithms is instantiated so that it can be used to resolve actual problems.
In another implementation, instantiating the generated solution to the fault includes: binding the metric data; executing, according to the metric data, the atomic algorithm corresponding to each atomic anomaly; and outputting the instantiated solution.
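The three instantiation steps (bind metric data, execute each atomic algorithm, output the result) might look like the following sketch. It assumes a hypothetical metric store (a dict from metric name to a numeric series) and treats "executing" an atomic algorithm as fitting its parameters on the bound data; the two fitting functions and the `TRAINERS` registry are simple stand-ins chosen for illustration, not algorithms named in the application.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def fit_level_baseline(series: Sequence[float]) -> dict:
    """Stand-in atomic algorithm: learn the normal level (mean, standard deviation)."""
    return {"mean": mean(series), "std": stdev(series)}


def fit_linear_trend(series: Sequence[float]) -> dict:
    """Stand-in atomic algorithm: least-squares slope and intercept against the sample index."""
    n = len(series)
    x_bar, y_bar = (n - 1) / 2, mean(series)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(series))
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


# Hypothetical registry standing in for the machine learning platform's training capability.
TRAINERS: Dict[str, Callable[[Sequence[float]], dict]] = {
    "level_baseline": fit_level_baseline,
    "linear_trend": fit_linear_trend,
}


def instantiate(
    solution_id: str,
    steps: List[Tuple[str, str]],            # (metric name, atomic algorithm name), in order
    metric_db: Dict[str, Sequence[float]],
    store: Optional[Dict[str, dict]] = None,
) -> dict:
    """Bind metric data, execute each atomic algorithm in order, and output the instantiated solution."""
    trained = []
    for metric, algorithm in steps:
        if metric not in metric_db:          # binding fails if the metric is unavailable
            raise KeyError(f"metric {metric!r} cannot be bound: not in the metric store")
        params = TRAINERS[algorithm](metric_db[metric])
        trained.append({"metric": metric, "algorithm": algorithm, "params": params})
    instance = {"solution_id": solution_id, "steps": trained}
    if store is not None:
        store[solution_id] = instance        # optionally store the instantiated solution
    return instance
```

Keeping the atomic algorithms behind a registry means the same solution template can be re-instantiated against another system's metric data without changing the template itself.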
In another implementation, the fault scenario includes one or more of the following types: resource faults, success rate faults, latency faults, and performance faults. In this implementation, an explicit fault scenario allows the corresponding solution to be looked up and the fault to be decomposed into the corresponding atomic anomalies.
In another implementation, the atomic anomalies include one or more of the following types: single-metric mutation, single-metric trend anomaly, single-metric periodic anomaly, and multi-metric correlation anomaly. In this implementation, because each atomic anomaly type is defined, a fault can be decomposed into one or more types of atomic anomalies according to its fault type, and a solution to the problem can then be obtained by orchestrating the atomic algorithms corresponding to those atomic anomalies.
In a second aspect, a fault analysis apparatus is provided, including: a receiving unit configured to receive a solution matching request for resolving a fault, where the request includes one or more of the following parameters: a target system name, a fault scenario type, and an atomic anomaly type; a first binding unit configured to bind metric data when an existing solution is matched in the solution library; a first execution unit configured to execute, according to the metric data and the atomic anomaly combination logic defined by the existing solution, the atomic algorithm corresponding to each atomic anomaly in that combination logic; and a first output unit configured to output the instantiated solution.
In one implementation, the apparatus further includes a first storage unit configured to store the instantiated solution.
In another implementation, the apparatus further includes: a decomposition unit configured to decompose the fault into a plurality of atomic anomalies according to the data pattern of the fault when no existing solution is matched in the solution library; a search unit configured to look up the atomic algorithm corresponding to each atomic anomaly; and a generating unit configured to generate a solution to the fault according to those atomic algorithms, the solution being a combination of the atomic algorithms corresponding to the atomic anomalies.
In another implementation, the apparatus further includes: an instantiation unit configured to instantiate the generated solution to the fault; and a second storage unit configured to store the instantiated solution.
In another implementation, the instantiation unit includes: a second binding unit configured to bind the metric data; a second execution unit configured to execute, according to the metric data, the atomic algorithm corresponding to each atomic anomaly; and a second output unit configured to output the instantiated solution.
Based on the same inventive concept, the principle by which the apparatus solves the problem and its beneficial effects correspond to those of the method implementations, so the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
In a third aspect, a fault analysis apparatus is provided, including an input device, an output device, a memory, and a processor. The memory stores a set of program code, and the processor is configured to call the program code stored in the memory to perform the following operations: controlling the input device to receive a solution matching request for resolving a fault, where the request includes one or more of the following parameters: a target system name, a fault scenario type, and an atomic anomaly type; when an existing solution is matched in the solution library, binding metric data; executing, according to the metric data and the atomic anomaly combination logic defined by the existing solution, the atomic algorithm corresponding to each atomic anomaly in that combination logic; and controlling the output device to output the instantiated solution.
In one implementation, the processor further performs the following operation: storing the instantiated solution.
In another implementation, when no existing solution is matched in the solution library, the processor further performs the following operations: decomposing the fault into a plurality of atomic anomalies according to the data pattern of the fault; looking up the atomic algorithm corresponding to each atomic anomaly; and generating a solution to the fault according to those atomic algorithms, the solution being a combination of the atomic algorithms corresponding to the atomic anomalies.
In another implementation, the processor further performs the following operations: instantiating the generated solution to the fault; and storing the instantiated solution.
In another implementation, the operation, performed by the processor, of instantiating the generated solution to the fault includes: binding the metric data; executing, according to the metric data, the atomic algorithm corresponding to each atomic anomaly; and outputting the instantiated solution.
Based on the same inventive concept, the principle by which the apparatus solves the problem and its beneficial effects correspond to those of the method implementations, so the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
In a fourth aspect, a method for generating a solution to a fault is provided, including: obtaining the data pattern of a fault; decomposing the fault into a plurality of atomic anomalies; looking up the atomic algorithm corresponding to each atomic anomaly; and generating a solution to the fault according to those atomic algorithms, the solution being a combination of the atomic algorithms corresponding to the atomic anomalies. With this aspect, complex problems can be handled without deep algorithm knowledge: a solution can be generated simply and flexibly by orchestrating atomic algorithms, and various complex problems can be solved with an artificial intelligence scheme.
In one implementation, the atomic anomalies include one or more of the following types: single-metric mutation, single-metric trend anomaly, single-metric periodic anomaly, and multi-metric correlation anomaly. In this implementation, because each atomic anomaly type is defined, a fault can be decomposed into one or more types of atomic anomalies according to its fault type, and a solution to the problem can then be obtained by orchestrating the atomic algorithms corresponding to those atomic anomalies.
In another implementation, the method further includes: instantiating the solution; and storing the instantiated solution. In this implementation, the solution obtained by orchestrating atomic algorithms is instantiated so that it can be used to resolve actual problems.
In another implementation, instantiating the solution includes: binding the metric data; executing, according to the metric data, the atomic algorithm corresponding to each atomic anomaly; and outputting the instantiated solution.
In a fifth aspect, an apparatus for generating a solution to a fault is provided. The apparatus includes a solution orchestration module, which includes: an obtaining unit configured to obtain the data pattern of a fault; a decomposition unit configured to decompose the fault into a plurality of atomic anomalies; a search unit configured to look up the atomic algorithm corresponding to each atomic anomaly; and a generating unit configured to generate a solution to the fault according to those atomic algorithms, the solution being a combination of the atomic algorithms corresponding to the atomic anomalies.
In one implementation, the atomic anomalies include one or more of the following types: single-metric mutation, single-metric trend anomaly, single-metric periodic anomaly, and multi-metric correlation anomaly.
In another implementation, the apparatus further includes a solution instantiation module, which includes: an instantiation unit configured to instantiate the solution; and a storage unit configured to store the instantiated solution.
In another implementation, the instantiation unit is specifically configured to: bind the metric data; execute, according to the metric data, the atomic algorithm corresponding to each atomic anomaly; and output the instantiated solution.
Based on the same inventive concept, the principle by which the apparatus solves the problem and its beneficial effects correspond to those of the method implementations, so the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
In a sixth aspect, an apparatus for generating a solution to a fault is provided, including an input device, an output device, a memory, and a processor. The memory stores a set of program code, and the processor is configured to call the program code stored in the memory to perform the following operations: decomposing the fault into a plurality of atomic anomalies; looking up the atomic algorithm corresponding to each atomic anomaly; and generating a solution to the fault according to those atomic algorithms, the solution being a combination of the atomic algorithms corresponding to the atomic anomalies.
In one implementation, the atomic anomalies include one or more of the following types: single-metric mutation, single-metric trend anomaly, single-metric periodic anomaly, and multi-metric correlation anomaly.
In another implementation, the processor is further configured to perform the following operations: instantiating the solution; and storing the instantiated solution.
In another implementation, the step, performed by the processor, of instantiating the solution includes: binding the metric data; executing, according to the metric data, the atomic algorithm corresponding to each atomic anomaly; and outputting the instantiated solution.
Based on the same inventive concept, the principle by which the apparatus solves the problem and its beneficial effects correspond to those of the method implementations, so the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
In a seventh aspect, a computer-readable storage medium is provided that stores instructions which, when run on a computer, cause the computer to perform the method according to the first aspect, the fourth aspect, or any implementation thereof.
In an eighth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to perform the method according to the first aspect, the fourth aspect, or any implementation thereof.
Drawings
The drawings used in the embodiments or the background of this application are described below.
FIG. 1 is a schematic diagram of the relationships among roles in an intelligent operation and maintenance process;
FIG. 2 is a system architecture diagram according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a method for generating a solution to a fault according to an embodiment of this application;
FIG. 4 is a schematic diagram of a specific example of generating a solution to a fault;
FIG. 5 is a schematic flowchart of a fault resolution method according to an embodiment of this application;
FIG. 6 is a schematic flowchart of a fault solution according to an embodiment of this application;
FIG. 7 is a schematic flowchart of a fault solution according to an embodiment of this application;
FIG. 8 is an interface diagram of an example of solution orchestration;
FIG. 9 is a schematic flowchart showing further details of the fault resolution method shown in FIG. 5;
FIG. 10 is a schematic block diagram of an apparatus for generating a solution to a fault according to an embodiment of this application;
FIG. 11 is a schematic hardware structure diagram of an apparatus for generating a solution to a fault according to an embodiment of this application;
FIG. 12 is a schematic block diagram of a fault resolution apparatus according to an embodiment of this application;
FIG. 13 is a schematic hardware structure diagram of a fault resolution apparatus according to an embodiment of this application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings.
Several terms that may be used in this application are defined as follows:
A solution to a fault is generally an artificial intelligence solution (AI solution), that is, a solution that resolves faults through artificial intelligence; specifically, it can resolve various operation and maintenance faults in a manner similar to human intelligence. After the system obtains the solution to a fault, it can bind the corresponding metric data, execute, according to the metric data and the atomic anomaly combination logic defined by the solution, the atomic algorithm corresponding to each atomic anomaly in that logic, and finally output the instantiated solution. The user can use the instantiated solution for fault analysis to resolve various complex faults. When the solution library (AI Solution library) contains no existing solution to the fault, the fault can be decomposed into a plurality of atomic anomalies according to its data pattern, the atomic algorithm corresponding to each atomic anomaly can be looked up, and a solution to the fault can be generated from those atomic algorithms. Thus, complex problems can be handled without deep algorithm knowledge: a solution can be generated simply and flexibly by orchestrating atomic algorithms, and various complex problems can be solved with an artificial intelligence scheme. Faults can be classified by scenario into the following fault scenarios: success rate faults, latency faults, resource faults, and performance faults.
An atomic anomaly is an anomaly in any single dimension of a data pattern. For each type of fault scenario, the corresponding data pattern can be found. Atomic anomalies include the following types: multi-metric correlation anomaly, single-metric level component anomaly, single-metric trend component anomaly, and single-metric periodic component anomaly. Each fault scenario can be decomposed into a logical combination of several atomic anomalies.
An atomic algorithm is an underlying anomaly detection algorithm. In one usage, anomaly detection refers to determining an attack event by matching features against a feature library of attack behavior. It may also refer to detecting intrusion behavior from abnormal behavior (of the system or of users) and abnormal use of computer resources; the key is to establish a profile of the normal behavior of users and the system and to check whether actual activity deviates from that profile. An anomaly detection method first defines a set of data describing the system under normal conditions, such as central processing unit (CPU) utilization, memory utilization, and file checksums, and then analyzes whether an anomaly has occurred. Each atomic anomaly has a corresponding atomic algorithm, and similar atomic algorithms can be used for a given class of data patterns.
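The profile-based idea described above (define normal-condition data such as CPU and memory utilization, then check whether actual activity deviates from it) can be sketched as follows. The choice of a per-metric mean and standard deviation as the profile and of a k-sigma deviation rule is an assumption for illustration; the application does not prescribe a specific statistic. Non-numeric items such as file checksums are left out of this sketch.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

Profile = Dict[str, Tuple[float, float]]   # metric name -> (mean, standard deviation)


def build_profile(normal_data: Dict[str, Sequence[float]]) -> Profile:
    """Learn a normal-behaviour profile (needs at least two samples per metric)."""
    return {name: (mean(values), stdev(values)) for name, values in normal_data.items()}


def deviating_metrics(profile: Profile, observation: Dict[str, float], k: float = 3.0) -> List[str]:
    """Return the metrics whose observed value lies more than k standard deviations from normal."""
    flagged = []
    for name, value in observation.items():
        mu, sigma = profile[name]
        if sigma == 0:
            deviates = value != mu          # constant baseline: any change counts as a deviation
        else:
            deviates = abs(value - mu) > k * sigma
        if deviates:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    profile = build_profile({
        "cpu_utilization": [40, 42, 38, 41, 39],
        "memory_utilization": [60, 61, 59, 60, 60],
    })
    print(deviating_metrics(profile, {"cpu_utilization": 55, "memory_utilization": 60}))
    # ['cpu_utilization']
```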
Multiple roles take part in service operation and maintenance. As shown in the role relationship diagram of FIG. 1, the intelligent operation and maintenance process mainly involves business experts, operation and maintenance experts, and algorithm experts. Business experts are familiar with business domain knowledge but are often not adept at data analysis algorithms. Operation and maintenance experts are familiar with why problems arise during business operation and maintenance and with how they are handled, for example which problems the system generally has, which problems occur frequently, and how to analyze, investigate, and solve them; they also have some business domain knowledge, but they too are not adept at data analysis algorithms. Algorithm experts are skilled in data analysis algorithms but lack knowledge of the business domain and of operation and maintenance problems.
The current problems are as follows: algorithm experts usually have an insufficient understanding of the business and model it starting from data patterns, so the model results may not match the business level well or may even contradict business logic; when operation and maintenance experts find during use that the model needs to be optimized, they understand only the service characteristics and not the algorithm, so the problem is handed back to algorithm experts who do not understand the service, and the process easily falls into an endless loop.
FIG. 2 is a schematic architecture diagram of a troubleshooting system according to an embodiment of this application. The system may include a solution orchestration module 11 and a solution instantiation module 12. The solution orchestration module 11 is configured to obtain the data pattern of a fault, decompose the fault into a plurality of atomic anomalies, look up the atomic algorithm corresponding to each atomic anomaly, and generate a solution to the fault from these atomic algorithms, the solution being a combination of the atomic algorithms corresponding to the atomic anomalies. The generated solution is stored in a solution library. The solution instantiation module 12 is configured to instantiate the generated solution and store the instantiated solution. Specifically, the solution instantiation module 12 binds metric data, that is, data related to the solution, from a database; obtains the atomic algorithms corresponding to the atomic anomalies included in the solution from the atomic anomaly algorithm repository; and inputs the solution template, with its data matched (metric data bound), to the machine learning platform. The machine learning platform provides training capabilities for a variety of atomic algorithms. The data analysis process of a solution generally consists of one or more atomic algorithms. Following the configured flow, the instantiation module calls the atomic algorithm training capabilities of the machine learning platform one by one to complete model training for the whole solution, and stores the trained solution model in the fault scenario solution analysis library.
The solution template is instantiated with the data of a specific application system to obtain a solution model. Applying the solution model to a data set to be checked performs anomaly detection for the fault scenario.
When an O&M fault occurs, a solution is sought in the solution library, so the system may further include a solution matching module 13. The solution matching module 13 searches the solution library for a matching solution based on one or more of the following parameters: the name of the target system, the fault scenario type, and the atomic anomaly type. When an existing solution is matched in the solution library, the module binds index data; executes, according to the index data and the atomic anomaly combination logic defined by the existing solution, the atomic algorithm corresponding to each atomic anomaly in that logic; and outputs the instantiated solution.
With the method and device for generating a fault solution provided here, complex problems can be handled without deep algorithm knowledge: a solution can be generated simply and flexibly by orchestrating atomic algorithms, the generated solution can be used for fault analysis, and a variety of complex faults can be resolved with an artificial intelligence approach.
Fig. 3 is a schematic flowchart of a method for generating a solution to a fault according to an embodiment of this application. The method includes the following steps.
S101: Obtain the data rule of the fault.
O&M problems can be classified as resource problems (e.g., throughput), success rate problems, delay problems, and performance problems. Their data rules can likewise be classified as single-index mutation (also called a single-index level-component anomaly), single-index trend-component anomaly, single-index periodic-component anomaly, and multi-index correlation anomaly; these are referred to below as atomic anomalies. Table 1 lists the normal behavior and the abnormal behavior of each type of atomic anomaly.
Table 1: Normal and abnormal behavior of each type of atomic anomaly (table images not reproduced in this text)
From the algorithm perspective, one class of data rule can be handled with a similar class of algorithm model.
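To make the idea of one algorithm class per data rule concrete, here is a minimal sketch for the single-index mutation type. The source does not name an algorithm for this type; the mean-shift test below, the function name `level_shift`, and its `window` and `k` defaults are our own illustrative assumptions.

```python
from statistics import mean, stdev

def level_shift(series, window=5, k=3.0):
    """Flag a single-index mutation (level-component anomaly).

    Hypothetical detector: compares the mean of the most recent `window`
    points with the mean of the preceding `window` points, measured in
    units of the preceding window's standard deviation.
    """
    if len(series) < 2 * window:
        raise ValueError("need at least 2 * window points")
    before = series[-2 * window:-window]
    after = series[-window:]
    spread = stdev(before) or 1e-9  # avoid division by zero on a flat baseline
    score = abs(mean(after) - mean(before)) / spread
    return score > k, score
```

A jump from about 10 to about 30 in the last five samples is flagged, while a steady series is not; the same function could be reused for any index that follows this data rule.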
S102: Decompose the fault into a plurality of atomic anomalies.
On this principle, O&M problems with similar data rules can be turned, through orchestration, into a solution for one type of fault scenario: taking fault scenarios as input, each fault scenario is decomposed into a logical combination of several data-rule atomic anomalies. The fault scenario types include success rate faults, delay faults, resource faults, and performance faults.
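The decomposition step can be pictured as a lookup from a concrete fault to its scenario type and atomic anomalies. This is a minimal sketch, not the patent's data model: the enum and function names are hypothetical, the enum members come from the types listed above, and only the memory-leak rule is taken from the example given later in this document.

```python
from enum import Enum

class FaultScenario(Enum):
    SUCCESS_RATE = "success rate"
    DELAY = "delay"
    RESOURCE = "resource"
    PERFORMANCE = "performance"

class AtomicAnomaly(Enum):
    SINGLE_INDEX_MUTATION = "single-index mutation"
    SINGLE_INDEX_TREND = "single-index trend"
    SINGLE_INDEX_PERIODIC = "single-index periodic"
    MULTI_INDEX_CORRELATION = "multi-index correlation"

# Hypothetical decomposition rules: fault -> (scenario type, atomic anomalies).
DECOMPOSITION_RULES = {
    "memory leak": (FaultScenario.RESOURCE,
                    [AtomicAnomaly.MULTI_INDEX_CORRELATION,
                     AtomicAnomaly.SINGLE_INDEX_TREND]),
}

def decompose(fault_name):
    """Return the scenario type and atomic anomalies for a known fault."""
    try:
        return DECOMPOSITION_RULES[fault_name]
    except KeyError:
        raise KeyError(f"no decomposition rule for {fault_name!r}") from None
```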
S103: Search for the atomic algorithm corresponding to each atomic anomaly.
Each atomic anomaly has a corresponding atomic algorithm.
Specifically, the atomic algorithm corresponding to each atomic anomaly may be looked up in the atomic anomaly algorithm repository.
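A repository lookup of this kind might be as simple as a mapping from anomaly type to candidate algorithms. The sketch below is an assumption about structure only: `ALGORITHM_REPOSITORY` and `look_up` are hypothetical names, and the two entries reuse the algorithms named in the memory leak example in this document.

```python
# Hypothetical atomic anomaly algorithm repository: anomaly type -> algorithms.
ALGORITHM_REPOSITORY = {
    "multi-index correlation": ["pearson_correlation"],
    "single-index trend": ["static_threshold", "holt_winters_forecast"],
}

def look_up(anomalies, repository=ALGORITHM_REPOSITORY):
    """Return the algorithms registered for each anomaly, failing loudly
    if any anomaly has no registered algorithm."""
    missing = [a for a in anomalies if a not in repository]
    if missing:
        raise LookupError(f"no atomic algorithm registered for: {missing}")
    return {a: repository[a] for a in anomalies}
```

Failing on a missing entry, rather than silently skipping it, keeps an incomplete solution from being generated.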
S104: Generate a solution to the fault from the atomic algorithm corresponding to each atomic anomaly, where the solution is a combination of the atomic algorithms corresponding to the atomic anomalies.
Once the atomic algorithm for each atomic anomaly is found, the atomic algorithms are invoked according to a workflow, and the final output is a fault analysis and detection scheme consisting of a logical combination of algorithm models, namely an AI solution template.
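An AI solution template can be thought of as an ordered list of workflow steps plus the logic that combines their results. This is a minimal sketch under that assumption; `Step`, `SolutionTemplate`, `generate_template`, and the `"all"` combination rule are hypothetical and not defined by the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    anomaly: str     # atomic anomaly this step detects
    algorithm: str   # atomic algorithm chosen from the repository
    indices: tuple   # names of the indices (metrics) the step consumes

@dataclass(frozen=True)
class SolutionTemplate:
    fault: str
    steps: tuple       # executed in order, like a workflow
    logic: str = "all" # hypothetical rule: report the fault only if every step fires

def generate_template(fault, algorithms, indices):
    """algorithms: anomaly -> algorithm; indices: anomaly -> index names."""
    steps = tuple(Step(a, alg, tuple(indices[a])) for a, alg in algorithms.items())
    return SolutionTemplate(fault, steps)
```

The template holds no data yet; it only describes which algorithm runs on which index, which is why it can later be instantiated with the data of different systems.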
The method may further include the following steps:
S105: Instantiate the solution.
The output AI solution template is a process description for a batch of data and can be further instantiated; instantiating the AI solution specifies which concrete data it applies to. Accordingly, step S105 specifically includes: binding index data; executing, according to the index data, the atomic algorithm corresponding to each atomic anomaly; and outputting the instantiated solution.
In a specific implementation, index data, meaning the data related to the AI solution template, may be bound from the database. The atomic algorithms corresponding to the atomic anomalies included in the AI solution are obtained from the atomic anomaly algorithm repository, and the AI solution template with its index data bound is input into the machine learning platform. The machine learning platform provides training capabilities for a variety of atomic algorithms, and the data analysis process of an AI solution generally consists of one or more atomic algorithms. The instantiation module invokes these training capabilities one by one according to the set flow to complete model training of the whole AI solution.
Instantiating the AI solution template with the data of a specific application system yields an AI solution model. Applying the AI solution model to a data set to be checked performs anomaly detection for the fault scenario.
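The bind, train, and apply sequence described above can be sketched as follows. This is an illustrative assumption, not the machine learning platform's API: the in-memory `database` dict stands in for the real database, and `train_static_threshold`, `TRAINERS`, `instantiate`, and `detect` are hypothetical names. Each step is trained one by one on its bound history, and the resulting detectors form the instantiated solution.

```python
from statistics import mean, stdev

def train_static_threshold(series, k=3.0):
    """Hypothetical atomic trainer: learn an upper bound of mean + k * stdev."""
    limit = mean(series) + k * stdev(series)
    return lambda new_series: max(new_series) > limit

TRAINERS = {"static_threshold": train_static_threshold}

def instantiate(steps, database, trainers=TRAINERS):
    """steps: list of (anomaly, algorithm_name, index_name).

    Binds each step's index data from `database`, trains the steps one by
    one, and returns the instantiated solution: anomaly -> (index, detector).
    """
    model = {}
    for anomaly, algorithm_name, index_name in steps:
        history = database[index_name]                # bind index data
        detector = trainers[algorithm_name](history)  # train this atomic model
        model[anomaly] = (index_name, detector)
    return model

def detect(model, new_data):
    """Apply the instantiated solution to a data set to be checked."""
    return {a: det(new_data[idx]) for a, (idx, det) in model.items()}
```

For example, a model trained on memory samples around 50 reports no anomaly for new values near 51 but flags a value of 90.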
S106: Store the instantiated solution.
The instantiated solution can be used directly for fault detection. The instantiated AI solution is stored in a fault scenario analysis library and can be used for subsequent detection of faults of that type.
Fig. 4 is a schematic diagram of generating a fault solution in a specific example, mainly showing how a new AI solution template is obtained by orchestration. Fault scenarios are input, and each can be decomposed into a logical combination of several atomic anomalies. For example, a memory leak (a resource fault) can be decomposed into a multi-index correlation anomaly (specifically, the correlation between memory occupancy and traffic volume is abnormal) and a single-index trend anomaly (the traffic index is unstable, or memory occupancy rises while traffic is stable). Each atomic anomaly is then mapped to an algorithm. Current atomic anomaly detection algorithms include time series anomaly detection (e.g., Holt-Winters, least squares), correlation analysis (e.g., Pearson coefficient, information entropy), and other common anomaly detection algorithms (e.g., iForest, PCA, LOF). In this example, the detection algorithm for the multi-index correlation anomaly may be Pearson coefficient analysis, and the detection algorithms for the single-index trend anomaly may be static threshold judgment and Holt-Winters exponential smoothing prediction. Through fault orchestration and algorithm mapping, an AI solution template is output and stored in the AI solution library. The template is then instantiated: index data is bound, the atomic algorithm for each atomic anomaly is executed on the bound index data, and a mature, usable AI solution model is output and stored in the fault scenario intelligent analysis library.
Continuing the memory leak detection example: related data such as resource occupancy and memory consumption is bound from the database, the AI solution template with the bound index data is input into the machine learning platform, and the platform's atomic-algorithm training capabilities are invoked one by one according to the set flow to complete model training of the whole AI solution.
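The two atomic checks in this memory leak example can be sketched as below. This is a minimal illustration, not the patent's implementation: `pearson`, `slope`, and `memory_leak_suspected` are hypothetical names, the thresholds `min_corr` and `max_slope` are arbitrary, and for the trend check we use a least-squares slope compared against a static threshold (least squares is among the time series methods listed above); the Holt-Winters part is left out here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant series; treat as uncorrelated
    return cov / (sx * sy)

def slope(y):
    """Least-squares slope of y against the sample index 0..n-1."""
    n = len(y)
    mx, my = (n - 1) / 2, sum(y) / n
    num = sum((i - mx) * (v - my) for i, v in enumerate(y))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den

def memory_leak_suspected(memory, traffic, min_corr=0.8, max_slope=0.5):
    """Both atomic anomalies must fire (hypothetical thresholds)."""
    correlation_broken = pearson(memory, traffic) < min_corr  # multi-index correlation anomaly
    memory_trending_up = slope(memory) > max_slope            # single-index trend anomaly
    return correlation_broken and memory_trending_up
```

Combining the two results with `and` mirrors the idea that the fault is a logical combination of atomic anomalies: memory that tracks traffic is not flagged, while memory climbing steadily under flat traffic is.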
With the fault solution generation method provided in this embodiment, complex problems can be handled without deep algorithm knowledge: a solution can be generated simply and flexibly by orchestrating atomic algorithms, the generated solution can be used for fault analysis, and a variety of complex faults can be resolved with an artificial intelligence approach. Fig. 5 shows how a fault is resolved using a solution generated by the above method.
Fig. 5 is a flowchart of a fault resolution method according to an embodiment of this application. The method includes the following steps.
S201: Receive a solution matching request for resolving the fault, where the request includes one or more of the following parameters: target system name, fault scenario type, and atomic anomaly type.
When an O&M fault occurs, a solution for resolving it is sought in the solution library; "solution" here generally means an artificial intelligence solution. A solution matching request is input, and existing solutions are searched by index parameters: target system name, fault scenario type, and atomic anomaly type. The request therefore contains one or more of these index parameters. For example, the target system may be a system network element, an audio/video conferencing system, or an Internet of Things (IoT) system; fault scenario types include success rate faults, delay faults, resource faults, and performance faults; and atomic anomaly types include level-component anomaly, trend-component anomaly, periodic anomaly, and multi-index correlation anomaly.
S202: Determine whether an existing solution is in the solution library. If so, go to step S203; otherwise, go to step S206.
After the solution matching request is received, the existing solutions in the solution library are searched according to the index parameters. If the system has resolved similar faults before, it may have stored the solutions used, so an existing solution (or AI solution template) can be found in the solution library by the index parameters. If one is found, it can simply be adopted directly.
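Matching on whichever index parameters the request supplies can be sketched as a filter over library entries. The entry shape and names here (`LibraryEntry`, `match`) are hypothetical, and the second library entry is an invented example used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LibraryEntry:
    name: str
    target_system: str
    scenario_type: str
    anomaly_types: frozenset

def match(library, target_system=None, scenario_type=None, anomaly_type=None):
    """Return entries that match every supplied parameter (None means any)."""
    hits = []
    for entry in library:
        if target_system is not None and entry.target_system != target_system:
            continue
        if scenario_type is not None and entry.scenario_type != scenario_type:
            continue
        if anomaly_type is not None and anomaly_type not in entry.anomaly_types:
            continue
        hits.append(entry)
    return hits

library = [
    LibraryEntry("memory leak detection", "IoT", "resource",
                 frozenset({"multi-index correlation", "single-index trend"})),
    LibraryEntry("latency spike detection", "video conference", "delay",  # invented example
                 frozenset({"single-index mutation"})),
]
```

An empty result corresponds to the "no existing solution" branch of S202.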
S203: When an existing solution is matched in the solution library, bind the index data.
S204: According to the index data and the atomic anomaly combination logic defined by the existing solution, execute the atomic algorithm corresponding to each atomic anomaly in that combination logic.
S205: Output the instantiated solution.
If an existing AI solution template is matched in S202, the matched template may be instantiated; the instantiation process consists of steps S203 to S205. Specifically, index data, meaning the data related to the AI solution template, is bound from the database. The atomic algorithms corresponding to the atomic anomalies included in the AI solution are obtained from the atomic anomaly algorithm repository, and the AI solution template with its index data bound is input into the machine learning platform. The machine learning platform provides training capabilities for a variety of atomic algorithms, and the data analysis process of an AI solution generally consists of one or more atomic algorithms. The instantiation module invokes these training capabilities one by one according to the set flow to complete model training of the whole AI solution.
Instantiating the AI solution template with the data of a specific application system yields an AI solution model, and applying the model to a data set to be checked performs anomaly detection for the fault scenario. Fig. 6 is a schematic flowchart of the fault resolution method in which an existing solution is matched and used for fault analysis. Taking memory leak detection as an example, the parameters can be used as keywords to search the AI solution library for an existing AI solution. When an existing AI solution template is matched, it is instantiated. Specifically, index data, meaning the data related to the AI solution template, is bound from the database; the atomic algorithms corresponding to the atomic anomalies included in the AI solution are obtained from the atomic anomaly algorithm repository; and the AI solution template with its index data bound is input into the machine learning platform. The machine learning platform provides training capabilities for a variety of atomic algorithms, and the data analysis process of an AI solution generally consists of one or more atomic algorithms. The instantiation module invokes these training capabilities one by one according to the set flow to complete model training of the whole AI solution. Instantiating the template with the data of the specific application system finally outputs the memory leak detection model.
Further, the instantiated existing solution can also be stored.
S206: When no existing solution is matched in the solution library, decompose the fault into a plurality of atomic anomalies according to the data rule of the fault.
If no existing AI solution template is matched in the solution library, a new AI solution can be created by orchestrating a combination of atomic anomalies.
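The branch at S202 (reuse an existing solution, or orchestrate a new one and keep it) can be sketched as a small dispatcher. This is a minimal sketch under stated assumptions: `resolve` is a hypothetical name, library entries are plain dicts, and `orchestrate` and `instantiate` are caller-supplied callables standing in for S206 to S208 and for the instantiation steps.

```python
def resolve(request, library, orchestrate, instantiate):
    """S202: use a matching existing solution (S203 to S205), otherwise
    orchestrate a new one (S206 to S208), then instantiate it.

    request:     dict of index parameters, e.g. {"scenario_type": "resource"}
    library:     list of dicts describing stored solutions (hypothetical shape)
    orchestrate: callable(request) -> new solution template
    instantiate: callable(template) -> instantiated solution
    """
    matches = [s for s in library
               if all(s.get(k) == v for k, v in request.items())]
    if matches:
        template = matches[0]            # reuse the existing solution
    else:
        template = orchestrate(request)  # build a new one from atomic anomalies
        library.append(template)         # keep it for later reuse
    return instantiate(template)
```

Appending the new template to the library is what lets the next request for the same fault take the cheaper reuse branch.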
As described above for the fault scenario types, O&M problems can be classified as resource problems (e.g., throughput), success rate problems, delay problems, and performance problems. Their data rules can likewise be classified as single-index mutation (also called a single-index level-component anomaly), single-index trend-component anomaly, single-index periodic-component anomaly, and multi-index correlation anomaly; these are referred to below as atomic anomalies. Table 1 lists the normal behavior and the abnormal behavior of each type of atomic anomaly.
From the algorithm perspective, one class of data rule can be handled with a similar class of algorithm model.
On this principle, O&M problems with similar data rules can be turned, through orchestration, into a solution for one type of fault scenario: taking fault scenarios as input, each fault scenario is decomposed into a logical combination of several data-rule atomic anomalies.
S207: Search for the atomic algorithm corresponding to each atomic anomaly.
Each atomic anomaly has a corresponding atomic algorithm.
Specifically, the atomic algorithm corresponding to each atomic anomaly is looked up in the atomic anomaly algorithm repository.
S208: Output a solution to the fault according to the atomic algorithm corresponding to each atomic anomaly, where the solution is a combination of the atomic algorithms corresponding to the atomic anomalies.
Once the atomic algorithm for each atomic anomaly is found, the atomic algorithms are invoked according to a workflow, and the final output is a fault analysis and detection scheme consisting of a logical combination of algorithm models, namely an AI solution template.
S209: Instantiate the generated solution to the fault.
The output AI solution template is a process description for a batch of data and can be further instantiated; instantiating the AI solution specifies which concrete data it applies to.
Step S209 specifically includes binding index data, executing the atomic algorithm corresponding to each atomic anomaly according to the index data, and then outputting the instantiated solution.
Specifically, index data, meaning the data related to the AI solution template, is bound from the database. The atomic algorithms corresponding to the atomic anomalies included in the AI solution are obtained from the atomic anomaly algorithm repository, and the AI solution template with its index data bound is input into the machine learning platform. The machine learning platform provides training capabilities for a variety of atomic algorithms, and the data analysis process of an AI solution generally consists of one or more atomic algorithms. The instantiation module invokes these training capabilities one by one according to the set flow to complete model training of the whole AI solution.
Instantiating the AI solution template with the data of a specific application system yields an AI solution model. Applying the AI solution model to a data set to be checked performs anomaly detection for the fault scenario.
S210: Store the instantiated solution.
The instantiated solution can be used directly for fault detection. The instantiated AI solution is stored in a fault scenario analysis library and can be used for subsequent detection of faults of that type.
Fig. 4 is a schematic diagram of generating a fault solution in a specific example, mainly showing how a new AI solution template is obtained by orchestration. Fault scenarios are input, and each can be decomposed into a logical combination of several atomic anomalies. For example, a memory leak (a resource fault) can be decomposed into a multi-index correlation anomaly (specifically, the correlation between memory occupancy and traffic volume is abnormal) and a single-index trend anomaly (the traffic index is unstable, or memory occupancy rises while traffic is stable). Each atomic anomaly is then mapped to an algorithm. Current atomic anomaly detection algorithms include time series anomaly detection (e.g., Holt-Winters, least squares), correlation analysis (e.g., Pearson coefficient, information entropy), and other common anomaly detection algorithms (e.g., iForest, PCA, LOF). In this example, the detection algorithm for the multi-index correlation anomaly may be Pearson coefficient analysis, and the detection algorithms for the single-index trend anomaly may be static threshold judgment and Holt-Winters exponential smoothing prediction. Through fault orchestration and algorithm mapping, an AI solution template is output and stored in the AI solution library. The template is then instantiated: index data is bound, the atomic algorithm for each atomic anomaly is executed on the bound index data, and a mature, usable AI solution model is output and stored in the fault scenario intelligent analysis library.
Continuing the memory leak detection example: related data such as resource occupancy and memory consumption is bound from the database, the AI solution template with the bound index data is input into the machine learning platform, and the platform's atomic-algorithm training capabilities are invoked one by one according to the set flow to complete model training of the whole AI solution.
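The single-index trend anomaly in this example is mapped to exponential smoothing prediction plus a static threshold. The sketch below is a simplification and an assumption on our part: it implements Holt's linear (double) exponential smoothing, which is Holt-Winters without the seasonal component, and flags a sample that exceeds the forecast by more than a fixed tolerance. `holt_forecast`, `exceeds_forecast`, and the `alpha`/`beta` defaults are hypothetical.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear exponential smoothing forecast (no seasonal term).

    alpha smooths the level and beta smooths the trend; both are
    illustrative values in (0, 1).
    """
    if len(series) < 2:
        raise ValueError("need at least two points")
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend

def exceeds_forecast(history, actual, tolerance):
    """Static-threshold check: is `actual` more than `tolerance` above the forecast?"""
    return actual - holt_forecast(history) > tolerance
```

On a steadily rising memory series such as 10, 12, 14, 16, 18 the forecast is 20, so a new reading of 21 stays within a tolerance of 5 while a reading of 30 is flagged.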
Fig. 7 is a schematic flowchart of the fault resolution method in which a newly orchestrated solution is used for fault analysis. When memory leaks need to be detected and no existing AI solution is matched in the AI solution library, existing atomic anomalies can be logically combined based on the fault scenario to form a new AI solution. Specifically, based on the data-rule descriptions of the atomic anomalies, the logical combination of atomic anomaly algorithms is orchestrated according to the fault scenario, and a newly orchestrated AI solution is output. The related data source is then invoked and index data is bound; the atomic algorithm for each atomic anomaly is obtained from the atomic anomaly algorithm repository; and the AI solution template with its index data bound is input into the machine learning platform, whose atomic-algorithm training capabilities are invoked one by one according to the set flow to complete model training of the whole AI solution and output a memory leak detection model.
It should be noted that AI solution orchestration may produce a completely new AI solution template or optimize an existing one. Fig. 8 is an example of an AI solution orchestration interface according to an embodiment of this application: after an existing AI solution template is loaded, the atomic algorithms for some of its atomic anomalies are recombined, and the modified template is saved. In Fig. 8, the existing memory leak template is loaded; the atomic algorithms for two atomic anomalies, multi-index correlation analysis and single-index trend anomaly detection, are modified or added; two indices are bound (index 1: memory occupancy; index 2: traffic volume); the corresponding atomic algorithms are executed; and a new AI solution template is output.
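Optimizing a loaded template, as in the interface example above, amounts to replacing or adding steps while leaving the original template untouched. A minimal sketch, assuming immutable dataclasses; `Step`, `Template`, and `with_step` are hypothetical names, and the algorithm names are placeholders.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Step:
    anomaly: str
    algorithm: str
    indices: tuple

@dataclass(frozen=True)
class Template:
    name: str
    steps: tuple

def with_step(template, new_step):
    """Return a copy of `template` in which the step for new_step.anomaly is
    replaced (or appended if absent). The new step is placed last."""
    kept = tuple(s for s in template.steps if s.anomaly != new_step.anomaly)
    return replace(template, steps=kept + (new_step,))
```

Returning a new template instead of mutating the loaded one keeps the original in the library available for other users.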
As described above, this application provides AI solutions for detecting basic fault scenarios, and O&M experts can simply pick the one that fits their case. When the fault scenario is complex and none of the existing AI solutions for basic fault scenario types match, this application provides data-rule atomic anomalies through which existing AI solutions can be orchestrated and optimized to output a new AI solution.
With the fault resolution method provided in this embodiment, complex problems can be handled without deep algorithm knowledge, the generated solution can be used for fault analysis, and a variety of complex faults can be resolved with an artificial intelligence approach.
Fig. 9 is a more detailed flowchart of the fault resolution method shown in Fig. 5. The process consists of a design-time flow and a runtime flow: the design-time flow generates an AI solution model, and the runtime flow uses it.
The design-time flow includes a data preparation stage, a solution and model creation stage, a model training and tuning stage, and a model storage stage; the runtime flow includes a model run stage.
The system involves three types of participants: algorithm engineers, O&M engineers, and the O&M platform.
1) Design-time flow (line #1 in Fig. 9):
First, a training task is created and an existing AI solution template is retrieved according to the fault scenario type. If one is retrieved, that AI solution is instantiated; otherwise, a new AI solution template is orchestrated and instantiated.
Orchestrating a new AI solution template means taking fault scenarios as input, decomposing each into a logical combination of several data-rule atomic anomalies, and finding the atomic algorithm for each atomic anomaly. The atomic algorithms are then invoked from the atomic anomaly algorithm library according to a workflow, and the output is a fault analysis and detection scheme consisting of a logical combination of algorithm models, namely an AI solution template.
After the AI solution template is instantiated, a training task may be started to train the model. Specifically, the validation set is used to check whether the model meets expected requirements such as accuracy and missed-report rate. Running the specified data analysis process (the AI solution in this document) produces a data profile, colloquially called a model; for example, it may be a simple mathematical formula expressing the relationship among several indices. The adjustable parameters in the AI solution are tuned to obtain the best model, a labeled data set is used to test whether the model is usable, and models usable for inference are stored in the fault scenario intelligent analysis library.
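The verification and tuning described above can be made concrete with two small functions: one computes accuracy and missed-report rate (the share of true anomalies the model failed to flag), the other tunes one adjustable parameter. This is a sketch under assumptions: `evaluate` and `tune_threshold` are hypothetical names, and the rule "best accuracy subject to a missed-report ceiling" is our illustrative choice, not a rule stated in the source.

```python
def evaluate(predicted, actual):
    """Accuracy and missed-report (false-negative) rate for boolean labels."""
    pairs = list(zip(predicted, actual))
    accuracy = sum(p == a for p, a in pairs) / len(pairs)
    positives = [p for p, a in pairs if a]  # predictions on truly anomalous samples
    missed = sum(not p for p in positives) / len(positives) if positives else 0.0
    return accuracy, missed

def tune_threshold(scores, labels, candidates, max_missed=0.1):
    """Pick the candidate threshold with the best accuracy whose missed-report
    rate stays within max_missed; returns (threshold, accuracy, missed) or None."""
    best = None
    for t in candidates:
        acc, missed = evaluate([s > t for s in scores], labels)
        if missed <= max_missed and (best is None or acc > best[1]):
            best = (t, acc, missed)
    return best
```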
2) Data collection and use flow (line #2 in Fig. 9):
Data collected from the target system is stored in both a real-time database and a historical database, and is preprocessed and feature-engineered before being used for training and inference. The real-time database supports inference calls to the model; the historical database provides the training, validation, and test sets for model training.
The training set is the data used to train the model;
the validation set is used to verify whether the model trained on the training set meets expected requirements, such as accuracy and missed-report rate;
the test set is the data set used to test the model;
preprocessing is the necessary auditing, screening, sorting, and similar processing applied to collected data before it is classified or grouped.
Data preprocessing methods include data cleaning, data integration, data transformation, and data reduction.
Feature engineering: the process of transforming raw data into features that describe the data well, so that a model built on those features performs as well as possible on unseen data.
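Two of the data steps above, cleaning and splitting historical data into training, validation, and test sets, can be sketched as follows. The functions `clean` and `chronological_split` and the 60/20/20 ratios are assumptions for illustration; splitting in time order is a common choice for monitoring data, not something the source specifies.

```python
def clean(series):
    """Minimal data cleaning: drop missing samples (None) and NaNs."""
    return [x for x in series if x is not None and x == x]  # NaN != NaN

def chronological_split(series, train=0.6, validation=0.2):
    """Split historical samples in time order into training, validation and
    test sets. Keeping time order means validation and test data always come
    after the training data."""
    n = len(series)
    a = round(n * train)
    b = a + round(n * validation)
    return series[:a], series[a:b], series[b:]
```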
3) Model evaluation flow (line #3 in Fig. 9):
This flow is a verification performed after the model has been deployed and applied for inference. It prevents a model that has degraded because of sample drift from going unnoticed and producing inference results that no longer meet online requirements.
Specifically, the capability of the model is evaluated based on a set capability baseline (a series of model capability indicators).
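A capability baseline check of this kind can be sketched as comparing live metrics with fixed limits. The metric names and limits in `BASELINE` are invented for illustration, and `meets_baseline` is a hypothetical name; a metric that is missing from the live measurements counts as a failure.

```python
# Hypothetical capability baseline: metric -> (comparison, limit).
BASELINE = {"accuracy": (">=", 0.9), "missed_report_rate": ("<=", 0.05)}

def meets_baseline(metrics, baseline=BASELINE):
    """Return (ok, failures) after comparing live metrics with the baseline."""
    failures = []
    for name, (op, limit) in baseline.items():
        value = metrics.get(name)
        ok = value is not None and (value >= limit if op == ">=" else value <= limit)
        if not ok:
            failures.append(name)
    return not failures, failures
```

Listing the failing metrics, not just a pass/fail flag, tells the O&M engineer which capability has drifted.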
4) Runtime flow (line #4 in Fig. 9):
This is the inference process after model deployment. Specifically, an inference task is created and an AI solution model is selected; the AI solution model is then read and an analysis result is output. In this way, the generated model can be used to run inference on new data.
In addition, when an inference result is questionable, the O&M result can be traced back to the feature sample library to examine in detail how the result was produced, so that the problem can be analyzed.
Based on the same concept as the fault solution generation method in the above embodiment, and as shown in Fig. 10, an embodiment of this application further provides an apparatus 100 for generating a solution to a fault, which may be used to implement the method shown in Fig. 3. The apparatus 100 includes a solution orchestration module 11 and a solution instantiation module 12.
For example, the solution orchestration module 11 includes: an obtaining unit 111, configured to obtain the data rule of a fault; a decomposition unit 112, configured to decompose the fault into a plurality of atomic anomalies; a searching unit 113, configured to search for the atomic algorithm corresponding to each atomic anomaly; and a generating unit 114, configured to generate a solution to the fault according to the atomic algorithm corresponding to each atomic anomaly, where the solution is a combination of the atomic algorithms corresponding to the atomic anomalies.
For example, in another implementation, the solution instantiation module 12 includes: an instantiation unit 121, configured to instantiate the solution; and a storage unit 122, configured to store the instantiated solution.
For example, in yet another implementation, the instantiation unit 121 is specifically configured to: bind the index data; execute, according to the index data, the atomic algorithm corresponding to each atomic anomaly; and output the instantiated solution.
For example, in one implementation, the atomic anomalies include one or more of the following types: single-index mutation, single-index trend anomaly, single-index periodic anomaly, and multi-index correlation anomaly.
For more detailed descriptions of the solution orchestration module 11 and the solution instantiation module 12, refer to the related descriptions in the method embodiment shown in Fig. 3; details are not repeated here.
With the apparatus for generating a fault solution provided here, complex problems can be handled without deep algorithm knowledge: a solution can be output simply and flexibly by orchestrating atomic algorithms, and a variety of complex problems can be resolved with an artificial intelligence approach.
Fig. 11 is a schematic diagram of the hardware structure of an apparatus for generating a solution to a fault according to an embodiment of this application. The fault analysis apparatus 200 includes an input device 21, an output device 22, a memory 23, and a processor 24 (the apparatus may have one or more processors 24; one processor is used as an example in Fig. 11). In some embodiments of the present invention, the input device 21, the output device 22, the memory 23, and the processor 24 may be connected by a bus or in another manner; a bus connection is used as an example in Fig. 11.
The processor 24 is configured to perform the following operations:
decomposing the fault into a plurality of atomic anomalies; searching for the atomic algorithm corresponding to each atomic anomaly; and generating a solution to the fault according to the atomic algorithm corresponding to each atomic anomaly, where the solution is a combination of the atomic algorithms corresponding to the atomic anomalies.
In one implementation, the atomic anomalies include one or more of the following types: single-index mutation, single-index trend anomaly, single-index periodic anomaly, and multi-index correlation anomaly.
In another implementation, the processor is further configured to: instantiate the solution; and store the instantiated solution.
In yet another implementation, the step in which the processor instantiates the solution includes: binding the index data; executing, according to the index data, the atomic algorithm corresponding to each atomic anomaly; and outputting the instantiated solution.
The processor may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The processor may further include a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The memory may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or it may include a combination of these types of memory.
With the apparatus for generating a fault solution, in-depth algorithm knowledge is not needed when a complex problem is encountered: a solution to the problem can be output simply and flexibly by orchestrating atomic algorithms, so that various complex problems can be solved with an artificial intelligence scheme.
Based on the same concept as the fault resolution method in the foregoing embodiment, and as shown in Fig. 12, an embodiment of the present application further provides a fault analysis apparatus 300, which can be used to implement the method shown in Fig. 5. The fault analysis apparatus 300 includes a receiving unit 31, a first binding unit 32, a first execution unit 33, and a first output unit 34, and may further include a first storage unit 35, a decomposition unit 36, a searching unit 37, a generating unit 38, an instantiation unit 39, and a second storage unit 30.
For example, the apparatus 300 includes: a receiving unit 31, configured to receive a solution matching request for resolving a fault, where the solution matching request includes one or more of the following parameters: the name of a target system, the type of fault scene, and the type of atomic anomaly; a first binding unit 32, configured to bind index data when an existing solution is matched in the solution library; a first execution unit 33, configured to execute, according to the index data and the atomic anomaly combinational logic defined by the existing solution, the atomic algorithm corresponding to each atomic anomaly in the combinational logic; and a first output unit 34, configured to output the instantiated solution.
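The matching step performed by the receiving unit can be sketched as a lookup against a solution library. The application states only that the request carries one or more of the three parameters; the rule used here (a stored solution matches when it agrees with every parameter that is present, and the first match wins) and the `MatchRequest` and `StoredSolution` types are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class MatchRequest:
    """Solution matching request; any subset of the three parameters may be given."""
    target_system: Optional[str] = None
    fault_scene: Optional[str] = None
    atomic_anomaly: Optional[str] = None


@dataclass(frozen=True)
class StoredSolution:
    target_system: str
    fault_scene: str
    anomalies: Tuple[str, ...]  # atomic anomalies covered by the solution's combinational logic


def match(request: MatchRequest,
          library: Sequence[StoredSolution]) -> Optional[StoredSolution]:
    """Return the first stored solution consistent with every parameter present, else None."""
    if request == MatchRequest():
        raise ValueError("a matching request needs at least one parameter")
    for sol in library:
        if request.target_system is not None and request.target_system != sol.target_system:
            continue
        if request.fault_scene is not None and request.fault_scene != sol.fault_scene:
            continue
        if request.atomic_anomaly is not None and request.atomic_anomaly not in sol.anomalies:
            continue
        return sol
    return None
```

A `None` result corresponds to the case in which no existing solution is matched, where the apparatus falls back to decomposition and generation.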
In yet another implementation, the apparatus further comprises: a first storage unit 35 for storing the instantiated solutions.
In yet another implementation, the apparatus 300 further includes: a decomposition unit 36, configured to decompose the fault into a plurality of atomic anomalies according to the data rule of the fault when no existing solution is matched in the solution library; a searching unit 37, configured to search for the atomic algorithm corresponding to each atomic anomaly; and a generating unit 38, configured to generate a solution to the fault according to the atomic algorithm corresponding to each atomic anomaly, where the solution is a combination of the atomic algorithms corresponding to the atomic anomalies.
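The application does not define the "data rule of the fault". One possible reading, sketched below purely as an assumption, is a table of predicates over a summary of the fault's data, each implying one atomic anomaly type. `FaultProfile`, its flags, `DATA_RULES`, and the algorithm names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FaultProfile:
    """Hypothetical summary of the data characteristics of a fault."""
    indices: List[str] = field(default_factory=list)
    sudden_change: bool = False
    gradual_drift: bool = False
    breaks_period: bool = False


# Hypothetical data rules: (predicate on the fault's data, atomic anomaly it implies).
DATA_RULES: List[Tuple[Callable[[FaultProfile], bool], str]] = [
    (lambda f: f.sudden_change, "single index mutation"),
    (lambda f: f.gradual_drift, "single index trend abnormality"),
    (lambda f: f.breaks_period, "single index periodic abnormality"),
    (lambda f: len(f.indices) > 1, "multi-index correlation abnormality"),
]

# Hypothetical atomic algorithm names for each anomaly type.
ATOMIC_ALGORITHMS: Dict[str, str] = {
    "single index mutation": "zscore_jump",
    "single index trend abnormality": "linear_slope",
    "single index periodic abnormality": "period_over_period_diff",
    "multi-index correlation abnormality": "pearson_break",
}


def decompose(fault: FaultProfile) -> List[str]:
    """Apply every data rule; each rule that fires contributes one atomic anomaly."""
    return [anomaly for rule, anomaly in DATA_RULES if rule(fault)]


def generate_solution(fault: FaultProfile) -> List[Tuple[str, str]]:
    """Search the algorithm for each atomic anomaly and combine them into a solution."""
    anomalies = decompose(fault)
    if not anomalies:
        raise ValueError("no data rule matched; the fault cannot be decomposed")
    return [(a, ATOMIC_ALGORITHMS[a]) for a in anomalies]
```

Keeping the rules as data makes it possible to add a rule for a new fault pattern without changing the decomposition code.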
In another implementation, the apparatus 300 further comprises: an instantiation unit 39 for instantiating the generated solution for the fault; and a second storage unit 30 for storing the instantiated solution.
In another implementation, the instantiation unit 39 includes: a second binding unit 391 for binding the index data; a second executing unit 392, configured to execute an atomic algorithm corresponding to each atomic anomaly according to the index data; and a second output unit 393 for outputting the instantiated solution.
For more detailed descriptions of the receiving unit 31, the first binding unit 32, the first execution unit 33, the first output unit 34, the first storage unit 35, the decomposition unit 36, the searching unit 37, the generating unit 38, the instantiation unit 39, and the second storage unit 30, refer to the related descriptions in the method embodiment shown in Fig. 5; they are not repeated here.
With the fault resolution apparatus provided in this embodiment of the application, complex problems can be handled without in-depth algorithm knowledge: the generated solution can be used for fault analysis, and various complex faults can be resolved with an artificial intelligence scheme.
Fig. 13 is a schematic diagram of the hardware structure of a fault resolution apparatus according to an embodiment of the present application. The fault resolution apparatus 400 includes an input device 41, an output device 42, a memory 43, and a processor 44 (the apparatus may contain one or more processors 44; one processor is used as an example in Fig. 13). In some embodiments of the present application, the input device 41, the output device 42, the memory 43, and the processor 44 may be connected by a bus or in another manner; a bus connection is used as an example in Fig. 13.
The processor 44 is configured to perform the following operations:
receiving a solution matching request for resolving a fault, the solution matching request including one or more of the following parameters: the name of a target system, the type of fault scene, and the type of atomic anomaly; binding index data when an existing solution is matched in the solution library; executing, according to the index data and the atomic anomaly combinational logic defined by the existing solution, the atomic algorithm corresponding to each atomic anomaly in the combinational logic; and outputting the instantiated solution.
In one implementation, processor 44 is further configured to perform the following operations: storing the instantiated solution.
In yet another implementation, processor 44 is further configured to perform the following: when the existing solution is not matched in the solution library, decomposing the fault into a plurality of atomic anomalies according to the data rule of the fault; searching an atomic algorithm corresponding to each atomic anomaly; and generating a solution to the fault according to the atomic algorithm corresponding to each atomic anomaly, wherein the solution is a combination of the atomic algorithms corresponding to the atomic anomalies.
In yet another implementation, processor 44 is further configured to perform the following: instantiating the generated solution for the fault; and storing the instantiated solution.
In yet another implementation, instantiating the generated solution to the fault by the processor 44 includes: binding the index data; executing, according to the index data, the atomic algorithm corresponding to each atomic anomaly; and outputting the instantiated solution.
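Taken together, the operations of the processor 44 form one flow: match, then either reuse the existing solution or generate a new one, then instantiate and store. The sketch below is a simplified, assumed version of that flow. Matching uses only the fault scene type, the solution library is a plain list of instantiated solutions, decomposition is assumed to have already produced (atomic anomaly, index name) pairs, and `abrupt_jump` and `rising_trend` are hypothetical placeholder atomic algorithms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


def abrupt_jump(values: Sequence[float]) -> bool:
    """Placeholder atomic algorithm for a single index mutation."""
    return abs(values[-1] - values[-2]) > 10.0


def rising_trend(values: Sequence[float]) -> bool:
    """Placeholder atomic algorithm for a single index trend abnormality."""
    return all(b > a for a, b in zip(values, values[1:]))


ATOMIC_ALGORITHMS: Dict[str, Callable[[Sequence[float]], bool]] = {
    "single index mutation": abrupt_jump,
    "single index trend abnormality": rising_trend,
}


@dataclass
class Solution:
    fault_scene: str
    steps: List[Tuple[str, str]]  # (atomic anomaly type, index name)


@dataclass
class InstantiatedSolution:
    solution: Solution
    results: Dict[str, bool]


def instantiate(solution: Solution, index_data: Dict[str, Sequence[float]]) -> InstantiatedSolution:
    """Bind each step to its index series, run its atomic algorithm, and output the results."""
    results = {
        f"{anomaly}@{index}": ATOMIC_ALGORITHMS[anomaly](index_data[index])
        for anomaly, index in solution.steps
    }
    return InstantiatedSolution(solution, results)


def resolve(fault_scene: str,
            anomalies: Sequence[Tuple[str, str]],
            index_data: Dict[str, Sequence[float]],
            library: List[InstantiatedSolution]) -> InstantiatedSolution:
    """Reuse a stored solution for the same fault scene, else build one from `anomalies`."""
    matched = next((s.solution for s in library if s.solution.fault_scene == fault_scene), None)
    if matched is not None:
        solution = matched
    elif anomalies:
        solution = Solution(fault_scene, list(anomalies))
    else:
        raise ValueError("no stored solution matched and no atomic anomalies were given")
    instance = instantiate(solution, index_data)
    library.append(instance)  # store the instantiated solution in the solution library
    return instance
```

In this sketch the first request for a "time delay" fault generates and stores a solution; a later request for the same scene reuses it with fresh index data and skips decomposition.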
In yet another implementation, the fault scenario includes one or more of the following types: resource type faults, success rate type faults, time delay type faults and performance type faults.
In yet another implementation, the atomic anomalies include one or more of the following types: single index mutation, single index trend abnormality, single index periodic abnormality, and multi-index correlation abnormality.
The processor may be a CPU, an NP, or a combination of a CPU and an NP.
The processor may further include a hardware chip. The hardware chip may be an ASIC, PLD, or a combination thereof. The PLD may be a CPLD, an FPGA, a GAL, or any combination thereof.
The memory may include a volatile memory, such as a RAM; it may also include a non-volatile memory, such as a flash memory, an HDD, or an SSD; or it may include a combination of the foregoing types of memory.
With the fault resolution apparatus provided in this embodiment of the application, complex problems can be handled without in-depth algorithm knowledge: the generated solution can be used for fault analysis, and various complex faults can be resolved with an artificial intelligence scheme.
Embodiments of the present application also provide a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the above method.
Embodiments of the present application also provide a computer program product containing instructions which, when run on a computer, cause the computer to perform the above-mentioned method.
In the embodiments of the present application, "a plurality of" means two or more; accordingly, "a plurality of" may also be understood as "at least two". The term "and/or" describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: only A exists, both A and B exist, or only B exists. In addition, unless otherwise specified, the character "/" generally indicates an "or" relationship between the associated objects before and after it.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the division into units is merely a logical functional division, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are wholly or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted by means of a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape, or a magnetic disk), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid-state drive (SSD)).

Claims (17)

1. A method of fault resolution, the method comprising:
receiving a solution matching request for resolving a fault, the solution matching request including one or more of the following parameters: the name of a target system, the type of fault scene, and the type of atomic anomaly;
binding index data when an existing solution is matched in the solution library;
according to the index data and the atomic anomaly combinational logic defined by the existing solution, executing an atomic algorithm corresponding to each atomic anomaly in the atomic anomaly combinational logic;
and outputting the instantiated solution.
2. The method of claim 1, further comprising: storing the instantiated solution.
3. The method of claim 1, further comprising:
when the existing solution is not matched in the solution library, decomposing the fault into a plurality of atomic anomalies according to the data rule of the fault;
searching an atomic algorithm corresponding to each atomic anomaly;
and generating a solution of the fault according to the atomic algorithm corresponding to each atomic anomaly, wherein the solution is a combination of the atomic algorithms corresponding to the atomic anomalies.
4. The method of claim 3, further comprising:
instantiating the generated solution for the fault;
storing the instantiated solution.
5. The method of claim 4, wherein instantiating the generated solution to the fault comprises:
binding the index data;
according to the index data, executing an atomic algorithm corresponding to each atomic anomaly;
and outputting the instantiated solution.
6. The method according to any one of claims 1 to 5, wherein the fault scenario comprises one or more of the following types: resource type faults, success rate type faults, time delay type faults and performance type faults.
7. The method of any one of claims 1 to 5, wherein the atomic anomalies include one or more of the following types: single index mutation, single index trend abnormality, single index periodic abnormality and multi-index correlation abnormality.
8. A fault analysis device, characterized in that the device comprises:
a receiving unit, configured to receive a solution matching request for resolving a fault, where the solution matching request includes one or more of the following parameters: the name of a target system, the type of fault scene, and the type of atomic anomaly;
a first binding unit, configured to bind index data when an existing solution is matched in the solution library;
a first execution unit, configured to execute, according to the index data and the atomic anomaly combinational logic defined by the existing solution, the atomic algorithm corresponding to each atomic anomaly in the combinational logic;
and the first output unit is used for outputting the instantiated solution.
9. The apparatus of claim 8, further comprising: a first storage unit for storing the instantiated solution.
10. The apparatus of claim 8, further comprising:
the decomposition unit is used for decomposing the fault into a plurality of atomic anomalies according to the data rule of the fault when the existing solution is not matched in the solution library;
the searching unit is used for searching the atomic algorithm corresponding to each atomic anomaly;
and the generating unit is used for generating a solution of the fault according to the atomic algorithm corresponding to each atomic anomaly, and the solution is a combination of the atomic algorithms corresponding to the atomic anomalies.
11. The apparatus of claim 10, further comprising:
an instantiation unit to instantiate the generated solution for the fault;
and the second storage unit is used for storing the instantiated solution.
12. The apparatus of claim 11, wherein the instantiation unit comprises:
a second binding unit for binding the index data;
the second execution unit is used for executing the atomic algorithm corresponding to each atomic anomaly according to the index data;
and the second output unit is used for outputting the instantiated solution.
13. The apparatus of any one of claims 8 to 12, wherein the fault scenario comprises one or more of the following types: resource type faults, success rate type faults, time delay type faults and performance type faults.
14. The apparatus of any one of claims 8 to 12, wherein the atomic anomalies include one or more of the following types: single index mutation, single index trend abnormality, single index periodic abnormality and multi-index correlation abnormality.
15. A fault analysis device, characterized in that the device comprises: an input device, an output device, a memory, and a processor; wherein the memory stores a set of program codes and the processor is configured to call the program codes stored in the memory to execute the method according to any one of claims 1 to 7.
16. A computer readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7.
17. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7.
CN201910329934.2A 2019-04-23 2019-04-23 Fault solving method and device Pending CN111859047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910329934.2A CN111859047A (en) 2019-04-23 2019-04-23 Fault solving method and device

Publications (1)

Publication Number Publication Date
CN111859047A true CN111859047A (en) 2020-10-30

Family

ID=72951954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910329934.2A Pending CN111859047A (en) 2019-04-23 2019-04-23 Fault solving method and device

Country Status (1)

Country Link
CN (1) CN111859047A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103138960A (en) * 2011-11-24 2013-06-05 百度在线网络技术(北京)有限公司 Method and device for processing network failures
CN106341248A (en) * 2015-07-09 2017-01-18 阿里巴巴集团控股有限公司 Fault processing method and device based on cloud platform
US20170102982A1 (en) * 2015-10-13 2017-04-13 Honeywell International Inc. Methods and apparatus for the creation and use of reusable fault model components in fault modeling and complex system prognostics
CN107291565A (en) * 2017-06-09 2017-10-24 千寻位置网络有限公司 O&M visualizes automated job platform and implementation method
CN107888397A (en) * 2016-09-30 2018-04-06 华为技术有限公司 The method and apparatus for determining fault type

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN112330156A (en) * 2020-11-06 2021-02-05 联通(浙江)产业互联网有限公司 KPI management method, device, equipment and storage medium
CN112330156B (en) * 2020-11-06 2024-04-09 联通(浙江)产业互联网有限公司 KPI management method, apparatus, device and storage medium
CN112446511A (en) * 2020-11-20 2021-03-05 中国建设银行股份有限公司 Fault handling method, device, medium and equipment
CN113009896A (en) * 2021-03-09 2021-06-22 国能大渡河猴子岩发电有限公司 Production control method and system based on edge computing and cloud computing
CN113009896B (en) * 2021-03-09 2022-05-24 国能大渡河猴子岩发电有限公司 Production control method and system based on edge computing and cloud computing
WO2022253054A1 (en) * 2021-05-31 2022-12-08 中兴通讯股份有限公司 Fault handling method and apparatus, and server and storage medium
CN114285721A (en) * 2021-11-02 2022-04-05 北京思特奇信息技术股份有限公司 Fault automatic diagnosis method and system
CN114285721B (en) * 2021-11-02 2024-04-19 北京思特奇信息技术股份有限公司 Automatic fault diagnosis method and system
CN114693186A (en) * 2022-05-31 2022-07-01 广东电网有限责任公司佛山供电局 Method and system for analyzing and processing multiple fault events of differentiated combined type transformer substation
CN114693186B (en) * 2022-05-31 2022-08-23 广东电网有限责任公司佛山供电局 Method and system for analyzing and processing multiple fault events of differentiated combined type transformer substation

Similar Documents

Publication Publication Date Title
CN111859047A (en) Fault solving method and device
US11126493B2 (en) Methods and systems for autonomous cloud application operations
US8098585B2 (en) Ranking the importance of alerts for problem determination in large systems
CN110928772B (en) Test method and device
US20150121136A1 (en) System and method for automatically managing fault events of data center
JP2018185808A (en) Apparatus for and method of testing smart agreement based on block chain
US11847130B2 (en) Extract, transform, load monitoring platform
CN1425234A (en) System and method for assessing security vulnerability of network using fuzzy logic rules
US20180026848A9 (en) Isolation of problems in a virtual environment
US20130132778A1 (en) Isolation of problems in a virtual environment
CN107168995B (en) Data processing method and server
CN110457175B (en) Service data processing method and device, electronic equipment and medium
US10942801B2 (en) Application performance management system with collective learning
CN106708738B (en) Software test defect prediction method and system
US20230033680A1 (en) Communication Network Performance and Fault Analysis Using Learning Models with Model Interpretation
US11704186B2 (en) Analysis of deep-level cause of fault of storage management
US9706005B2 (en) Providing automatable units for infrastructure support
Devine et al. Assessment and cross-product prediction of software product line quality: accounting for reuse across products, over multiple releases
CN111108481B (en) Fault analysis method and related equipment
CN114528175A (en) Micro-service application system root cause positioning method, device, medium and equipment
US20190354991A1 (en) System and method for managing service requests
CN117041029A (en) Network equipment fault processing method and device, electronic equipment and storage medium
Sapna et al. Clustering test cases to achieve effective test selection
Dhanalaxmi et al. A review on software fault detection and prevention mechanism in software development activities
CN114416573A (en) Defect analysis method, device, equipment and medium for application program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination