WO2022253054A1

WO2022253054A1 - Fault handling method and apparatus, and server and storage medium

Info

Publication number: WO2022253054A1
Application number: PCT/CN2022/094796
Authority: WO
Inventors: 蔡金玲
Original assignee: 中兴通讯股份有限公司
Priority date: 2021-05-31
Filing date: 2022-05-24
Publication date: 2022-12-08
Also published as: CN115934451A

Abstract

The embodiments of the present application relate to the field of fault handling, and particularly to a fault handling method and apparatus, and a server and a storage medium. The fault handling method comprises: selecting a preset atomic service model or creating an atomic service model according to the handling status of the current fault, wherein the atomic service model is used for providing at least one type of atomic service process, and the atomic service process is used for performing one or any combination of the following handlings on a fault: monitoring, analysis, correction, and evaluation; creating a workflow according to the atomic service process in the atomic service model, wherein the workflow comprises a handling chain of the atomic service process; generating a fault model for the current fault, wherein the fault model comprises indication information of the workflow; and according to the workflow determined by the indication information, calling the atomic service process in the atomic service model, so as to handle the current fault.

Description

A fault handling method, device, server and storage medium

Cross-reference

This application is based on, and claims priority to, Chinese patent application No. 202110600323.4, filed on May 31, 2021, the entire content of which is hereby incorporated into this application by reference.

Technical field

The embodiments of the present application relate to the field of fault handling, and in particular, to a fault handling method, device, server, and storage medium.

Background

With the development of artificial intelligence, data mining, machine learning and other methods can automatically identify faults and give solutions, and can optimize and evaluate the network, greatly reducing the dependence on the experience of fault handlers and saving labor costs.

However, most current systems support only preset fault scenarios and cannot help with new fault scenarios that arise in the field. Conditions differ from one field site to another, as do the requirements on network performance. Moreover, systems that do support modification only allow technicians to add new scenarios at the code level. Customizing troubleshooting scenarios and methods therefore has a technical threshold, and users' day-to-day troubleshooting efficiency is low.

Summary of the invention

The purpose of the embodiments of the present application is to provide a fault handling method, device, server, and storage medium.

An embodiment of the present application provides a fault handling method, including the following steps: selecting a preset atomic service model or creating an atomic service model according to the processing status of the current fault, where the atomic service model is used to provide at least one type of atomic service process, and the atomic service process is used to perform one or any combination of the following kinds of processing on a fault: monitoring, analysis, correction, and evaluation; creating a workflow according to the atomic service process in the atomic service model, where the workflow includes a processing chain of the atomic service process; generating a fault model to which the current fault belongs, where the fault model includes indication information of the workflow; and calling the atomic service process in the atomic service model according to the workflow determined by the indication information, so as to handle the current fault.

Embodiments of the present application also provide a fault handling device, including:

an atomic service module, used to select a preset atomic service model or create an atomic service model according to the processing status of the current fault, where the atomic service model is used to provide at least one type of atomic service process, and the atomic service process is used to perform one or any combination of the following kinds of processing on a fault: monitoring, analysis, correction, and evaluation; a workflow module, used to create a workflow according to the atomic service process in the atomic service model, where the workflow includes the processing chain of the atomic service process; a fault scenario module, used to generate the fault model to which the current fault belongs, where the fault model includes the indication information of the workflow; and a processing module, used to call the atomic service process in the atomic service model according to the workflow determined by the indication information, so as to handle the current fault.

An embodiment of the present application also provides a server, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above fault handling method.

Embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the above fault handling method.

Description of drawings

One or more embodiments are illustrated by the figures in the corresponding drawings, and these illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings represent similar elements. Unless otherwise stated, the figures in the drawings do not constitute a limitation of scale.

FIG. 1 is a flowchart of a fault handling method provided according to an embodiment of the present application;

FIG. 2 shows a fault handling device provided according to an embodiment of the present application;

FIG. 3 shows a server provided according to an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, various implementations of the present application will be described in detail below in conjunction with the accompanying drawings. However, those of ordinary skill in the art can understand that, in each implementation manner of the present application, many technical details are provided for readers to better understand the present application. However, even without these technical details and various changes and modifications based on the following implementation modes, the technical solution claimed in this application can also be realized. The division of the following embodiments is for the convenience of description, and should not constitute any limitation to the specific implementation of the present application, and the embodiments can be combined and referred to each other on the premise of no contradiction.

The terms "first" and "second" in the embodiments of the present application are used for description purposes only, and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Thus, features defined as "first" and "second" may explicitly or implicitly include at least one of these features. In the description of the present application, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a system, product or equipment comprising a series of components or units is not limited to the listed components or units, but optionally also includes components or units not listed, or optionally also includes other components or units inherent to the system, product or equipment. In the description of the present application, "plurality" means at least two, such as two or three, unless otherwise specifically defined.

An embodiment of the present application relates to a fault handling method. The specific process is shown in FIG. 1.

Step 101, according to the processing status of the current fault, select a preset atomic service model or create an atomic service model; the atomic service model is used to provide at least one type of atomic service process, wherein the atomic service process is used to perform one or any combination of the following kinds of processing on a fault: monitoring, analysis, correction, evaluation;

Step 102, create a workflow according to the atomic service process in the atomic service model; the workflow includes a processing chain of the atomic service process;

Step 103, generating a fault model to which the current fault belongs, where the fault model includes indication information of the workflow;

Step 104, calling the atomic service process in the atomic service model according to the workflow determined by the indication information to handle the current fault.

In this embodiment, the closed-loop fault resolution process is first analyzed and split into process types such as fault identification and fault analysis; the workflow model is used to assemble and call the atomic service models according to the actual situation; and the fault types in the fault model, together with the corresponding workflows, make it possible to customize, add, and resolve fault scenarios. This lowers the professional technical requirements for customizing fault scenarios, lets users conveniently and flexibly add or remove fault handling processes to suit their needs, and improves the efficiency of fault handling.

The implementation details of the fault handling method in this embodiment are described below. The following content is provided only to aid understanding and is not required for implementing this solution.

In step 101, according to the processing status of the current fault, a preset atomic service model is selected or an atomic service model is created; the atomic service model is used to provide at least one type of atomic service process, and the atomic service process is used to perform one or any combination of the following kinds of processing on a fault: monitoring, analysis, correction, and evaluation. In this step, the preset atomic service models are first searched, according to the fault parameters of the current fault, for an atomic service model that can process those parameters. If no preset atomic service model handles the current fault parameters, an atomic service model needs to be created; if one exists, it is simply called.
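The select-or-create decision in step 101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the field `handled_parameters`, the subset test used for matching, and the naming of newly created models are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class AtomicServiceModel:
    name: str
    # Hypothetical field: the fault parameters this model is able to process.
    handled_parameters: FrozenSet[str]


def find_preset(presets: List[AtomicServiceModel],
                fault_parameters: FrozenSet[str]) -> Optional[AtomicServiceModel]:
    """Return the first preset that covers every current fault parameter."""
    for model in presets:
        # Assumed matching rule: the preset must handle all current parameters.
        if fault_parameters <= model.handled_parameters:
            return model
    return None


def select_or_create(presets: List[AtomicServiceModel],
                     fault_parameters: FrozenSet[str]) -> Tuple[AtomicServiceModel, bool]:
    """Return (model, created); create a model only when no preset fits."""
    existing = find_preset(presets, fault_parameters)
    if existing is not None:
        return existing, False
    created = AtomicServiceModel(
        name="custom_" + "_".join(sorted(fault_parameters)),
        handled_parameters=fault_parameters,
    )
    presets.append(created)  # the new model becomes available for later faults
    return created, True
```

Returning a `created` flag alongside the model lets the caller decide whether the remaining configuration (workflow, fault model) also has to be set up from scratch.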

In one example, each atomic service step in the atomic service model can be classified into a type of atomic service process, such as fault monitoring, fault analysis, fault correction, or fault evaluation. Specifically: the fault monitoring process, for example, monitors operation in the field according to fault definition information, where the fault definition information can be obtained from the fault model or preset in the fault monitoring process; the fault analysis process, for example, analyzes the monitored fault to obtain its cause and gives automatic or manual solution suggestions; the fault correction process, for example, corrects the fault according to the solution suggestions obtained from fault analysis; and the fault evaluation process, for example, uses the various indicator data related to the fault to evaluate the effect of the correction and confirms, closing the loop, whether the fault has been recovered.

In a specific implementation, the main data characteristics of the atomic service model (atomicService) may include: the name of the atomic service (name), which briefly describes the function of the atomic service; the fault handling process (faultStep), such as fault monitoring, fault analysis, fault correction, or fault evaluation; and the invocation method (invokeMethod), which is used to identify and call the program segment that realizes the function of the current atomic service model, and which may include information, such as a uniform resource locator (URL) prefix, describing how to invoke the functionality that implements this atomic service.

In this step, the atomic service model is first confirmed, either by selecting a preset model or by adding a new one, as the basis for handling the current fault.

In step 102, a workflow is created according to the atomic service process in the atomic service model; the workflow includes the processing chain of the atomic service process.

After the atomic service processes for handling the current fault have been determined, the order in which they are executed must also be determined in order to complete the processing flow for the current fault; this makes the framework planning for fault handling more flexible. For example, the workflow may require that the fault evaluation process be executed after the fault correction process, so that the validity of the processing result is fed back in a closed loop.
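The three fields named above (name, faultStep, invokeMethod) could be represented as in the sketch below. The enum values, the `endpoint` helper, and the example URL are hypothetical and serve only to show how a URL prefix stored in invokeMethod might be used.

```python
from dataclasses import dataclass
from enum import Enum


class FaultStep(Enum):
    """The four process types named in the text."""
    MONITORING = "monitoring"
    ANALYSIS = "analysis"
    CORRECTION = "correction"
    EVALUATION = "evaluation"


@dataclass(frozen=True)
class AtomicService:
    name: str             # short description of what the service does
    faultStep: FaultStep  # which fault handling process it belongs to
    invokeMethod: str     # here assumed to be a URL prefix

    def endpoint(self, operation: str) -> str:
        """Hypothetical helper: join the URL prefix with an operation path."""
        return self.invokeMethod.rstrip("/") + "/" + operation.lstrip("/")


# Illustrative instance; the host name is a placeholder, not a real service.
erab_analysis = AtomicService(
    name="E-RAB setup failure analysis",
    faultStep=FaultStep.ANALYSIS,
    invokeMethod="http://example.invalid/atomic/erab-analysis/",
)
```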

In a specific embodiment, the main data characteristics of a workflow (workflow) may include: a workflow name (workflowName), which is unique; and a processing chain of atomic service processes (atomicServiceList), which describes the execution link information of the workflow, that is, the data needed to invoke the corresponding atomic service functions.

This step establishes a workflow that defines the execution sequence and execution logic of the atomic service processes, so that a personalized solution can be applied to the current fault.

In step 103, a fault model to which the current fault belongs is generated. The fault model includes indication information of the workflow, and the indication information may be the name of the workflow. In a specific embodiment, the main data characteristics of the fault model (faultScene) include: the name of the current fault; the fault type (faultSceneType), such as network fault class, alarm class, or transmission class; fault indicator information (sceneDefinition), for example the observed indicators and threshold conditions; and the name of the workflow to be called (workflowName).
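A workflow with a unique workflowName and an ordered atomicServiceList might be modeled as below. The sketch assumes the processing chain is a list of process-type strings and enforces, as one possible ordering rule, the example given in the text that evaluation follows correction; the registry class is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Workflow:
    workflowName: str
    atomicServiceList: List[str]  # ordered processing chain (assumed to be strings)


class WorkflowRegistry:
    """Hypothetical store that keeps workflow names unique."""

    def __init__(self) -> None:
        self._workflows: Dict[str, Workflow] = {}

    def register(self, workflow: Workflow) -> None:
        if workflow.workflowName in self._workflows:
            raise ValueError("workflowName must be unique: " + workflow.workflowName)
        steps = workflow.atomicServiceList
        if "correction" in steps:
            # Index of the last correction step in the chain.
            last_correction = len(steps) - 1 - steps[::-1].index("correction")
            # Example rule from the text: evaluation runs after correction.
            if "evaluation" not in steps[last_correction + 1:]:
                raise ValueError("a correction step must be followed by evaluation")
        self._workflows[workflow.workflowName] = workflow

    def get(self, name: str) -> Workflow:
        return self._workflows[name]
```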
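The fault model fields listed above can be sketched as a small data structure. The shape of sceneDefinition (a single indicator with a lower-bound threshold) and the sample values are assumptions for illustration; the text only says it holds observed indicators and threshold conditions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SceneDefinition:
    indicator: str    # name of the observed indicator
    threshold: float  # assumed meaning: a fault exists below this value

    def is_triggered(self, observed: Dict[str, float]) -> bool:
        """Return True when the observed indicator violates the threshold."""
        return observed[self.indicator] < self.threshold


@dataclass
class FaultScene:
    name: str                         # name of the current fault
    faultSceneType: str               # e.g. network fault, alarm, transmission
    sceneDefinition: SceneDefinition  # indicators and threshold conditions
    workflowName: str                 # indication information: workflow to call


# Illustrative values only; the 0.98 threshold is not taken from the text.
scene = FaultScene(
    name="E-RAB establishment success rate decline",
    faultSceneType="network",
    sceneDefinition=SceneDefinition("erab_setup_success_rate", 0.98),
    workflowName="erab_flow",
)
```

Storing only the workflow name in the fault model keeps the scenario definition decoupled from the workflow itself, so either can be edited independently.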

In one example, generating the fault model to which the current fault belongs includes: if the fault parameters of the current fault are of the same type as the fault parameters of a first fault model among the preset fault models, copying the first fault model and modifying the copy according to the fault parameters of the current fault to generate the fault model to which the current fault belongs; otherwise, newly creating the fault model to which the current fault belongs. That is, whether a fault model needs to be added or modified is judged from the current fault parameters and the types of the fault parameters in each preset fault model. For example, the fault parameters of the first fault model among the preset fault models are a, b, and c, and the fault parameters of the current fault are b, c, and d, where a, b, c, and d are all network congestion parameters. Although the current fault cannot be handled entirely according to the first fault model, the fault parameters are of the same type, so a fault model for the current fault can be obtained with a slight modification. No entirely new model has to be built, which reduces data processing and computation. The system holds preset categories of fault parameters to support this step.
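The copy-and-modify path for the a, b, c versus b, c, d example can be sketched as follows. The category table, the dictionary layout of a fault model, and the rule that two parameter sets are "of the same type" when they map to the same set of categories are assumptions made for this illustration.

```python
import copy
from typing import Dict, List, Optional, Set

# Assumed preset categories of fault parameters (the text says such categories exist).
PARAMETER_CATEGORY: Dict[str, str] = {
    "a": "network_congestion",
    "b": "network_congestion",
    "c": "network_congestion",
    "d": "network_congestion",
    "x": "alarm",
}


def categories(parameters: List[str]) -> Set[str]:
    return {PARAMETER_CATEGORY[p] for p in parameters}


def derive_fault_model(presets: List[dict], name: str,
                       parameters: List[str]) -> Optional[dict]:
    """Copy and adapt a preset of the same type, or return None if none exists."""
    for preset in presets:
        if categories(preset["parameters"]) == categories(parameters):
            model = copy.deepcopy(preset)  # the preset itself stays untouched
            model["name"] = name
            model["parameters"] = list(parameters)
            return model
    return None  # caller has to build a new fault model instead
```

A deep copy is used so that later edits to the derived model cannot leak back into the preset.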

In another example, the fault model also includes an execution mode (executeMode), such as immediate execution, timed execution, or periodic execution, so that the time at which the fault is processed can be personalized, making fault handling more effective or better suited to the user's troubleshooting needs.
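One way the three execution modes could drive scheduling is sketched below. The function `next_run` and its parameters (`start_at`, `period`, `last_run`) are hypothetical; the text names the modes but does not specify how they are scheduled.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class ExecuteMode(Enum):
    IMMEDIATE = "immediate"
    TIMED = "timed"
    PERIODIC = "periodic"


def next_run(mode: ExecuteMode, now: datetime,
             start_at: Optional[datetime] = None,
             period: Optional[timedelta] = None,
             last_run: Optional[datetime] = None) -> datetime:
    """Return when the fault scenario should next be executed."""
    if mode is ExecuteMode.IMMEDIATE:
        return now
    if mode is ExecuteMode.TIMED:
        if start_at is None:
            raise ValueError("timed execution needs start_at")
        return start_at
    # Periodic execution: first run now, then one period after the last run.
    if period is None:
        raise ValueError("periodic execution needs period")
    if last_run is None:
        return now
    return last_run + period
```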

In one example, creating an atomic service model includes: selecting a template of the atomic service model; and creating the atomic service model according to the fault parameters of the current fault and the selected template. That is, atomic service model templates are preset. When an atomic service model needs to be created, a preset template is selected as the starting point, which ensures that the data required for execution is captured and that the newly created atomic service model can be put into use. Specifically, the atomic service model is created from the selected template in combination with the fault parameters of the current fault that need to be processed. For example, user instructions are received, and the data required by the template is adapted according to the fault parameters of the current fault.
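Template-based creation can be illustrated with the sketch below, in which a template pre-fills the fields it knows and the user's values fill in the rest. The template contents, the `REQUIRED_FIELDS` tuple, and the placeholder URL are assumptions; the check for missing fields reflects the stated goal that a newly created model be usable.

```python
from typing import Dict

# Hypothetical preset templates keyed by process type.
ATOMIC_SERVICE_TEMPLATES: Dict[str, Dict[str, str]] = {
    "analysis": {"name": "", "faultStep": "analysis", "invokeMethod": ""},
    "evaluation": {"name": "", "faultStep": "evaluation", "invokeMethod": ""},
}

REQUIRED_FIELDS = ("name", "faultStep", "invokeMethod")


def create_from_template(template_key: str, overrides: Dict[str, str]) -> Dict[str, str]:
    """Fill a copy of the chosen template with values for the current fault."""
    model = dict(ATOMIC_SERVICE_TEMPLATES[template_key])  # never edit the template
    model.update(overrides)
    missing = [field for field in REQUIRED_FIELDS if not model.get(field)]
    if missing:
        raise ValueError("template fields still empty: " + ", ".join(missing))
    return model
```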

In step 104, the atomic service process in the atomic service model is called according to the workflow determined by the indication information, so as to handle the current fault. Specifically, because the fault model is generated for the current fault, it can be called to handle the current fault in a targeted manner: the indication information of the workflow in the fault model points to the corresponding workflow, and the atomic service processes listed in that workflow are called to handle the current fault.
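The resolution chain described here (fault model, then workflow name, then atomic service processes) can be sketched end to end. The four toy services, the shared context dictionary, and the threshold logic are invented for the example and stand in for real monitoring, analysis, correction, and evaluation functions.

```python
from typing import Callable, Dict, List

AtomicService = Callable[[dict], dict]


def monitor(ctx: dict) -> dict:
    # Toy monitoring: a fault exists when the value is below the threshold.
    ctx["fault_detected"] = ctx["value"] < ctx["threshold"]
    return ctx


def analyze(ctx: dict) -> dict:
    ctx["cause"] = "low_value" if ctx["fault_detected"] else None
    return ctx


def correct(ctx: dict) -> dict:
    if ctx["cause"] is not None:
        ctx["value"] = ctx["threshold"]  # toy correction
    return ctx


def evaluate(ctx: dict) -> dict:
    ctx["recovered"] = ctx["value"] >= ctx["threshold"]
    return ctx


# Atomic services, workflows, and a fault model with its indication information.
atomic_services: Dict[str, AtomicService] = {
    "monitor": monitor, "analyze": analyze, "correct": correct, "evaluate": evaluate,
}
workflows: Dict[str, List[str]] = {
    "closed_loop": ["monitor", "analyze", "correct", "evaluate"],
}
fault_model = {"name": "demo_fault", "workflowName": "closed_loop"}


def handle_fault(model: dict, ctx: dict) -> dict:
    """Resolve the workflow named by the fault model and run its chain in order."""
    chain = workflows[model["workflowName"]]
    for service_name in chain:
        ctx = atomic_services[service_name](ctx)
    return ctx


result = handle_fault(fault_model, {"value": 0.90, "threshold": 0.95})
```

Because each stage is looked up by name, swapping or removing a step only requires editing the workflow list, not the dispatch code.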

In one example, a data model (data) defines the input and output data types of each execution process when atomic service models and workflow information are created, so as to ensure that output results are valid for users. A valid output is, for example, one that is visible to the user or usable for analysis, rather than a series of useless codewords that cannot be processed or recognized. In other words, the range of data types is restricted during execution of the method, and only the corresponding data types are responded to, which ensures that the method runs smoothly. Specifically, the main data characteristics of the data model (data) may include: the input data form (inputData), for example the input of the fault overview step can be the entire network, a group, or a list of designated network elements/cells; and the output data form (outputData), for example the output data of fault analysis includes a list of poor-quality cells, the reasons for poor quality, and suggested operations.
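Restricting each process to declared input and output types might look like the following sketch. Representing inputData and outputData as field-name-to-type mappings, and the field names in the example, are assumptions; the text only states that the data model fixes the data types of each process.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class DataModel:
    inputData: Dict[str, type]   # expected input fields and their types
    outputData: Dict[str, type]  # expected output fields and their types


def check(payload: Dict[str, Any], schema: Dict[str, type], label: str) -> None:
    """Raise TypeError when a field is missing or has an unexpected type."""
    for field_name, expected in schema.items():
        if field_name not in payload:
            raise TypeError(label + " is missing field " + field_name)
        if not isinstance(payload[field_name], expected):
            raise TypeError(label + " field " + field_name + " has the wrong type")


def run_checked(data_model: DataModel,
                service: Callable[[Dict[str, Any]], Dict[str, Any]],
                payload: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the input, run the atomic service, then validate its output."""
    check(payload, data_model.inputData, "input")
    result = service(payload)
    check(result, data_model.outputData, "output")
    return result


# Illustrative data model for a fault analysis process.
analysis_model = DataModel(
    inputData={"cells": list},
    outputData={"poor_cells": list, "causes": dict, "suggestions": list},
)
```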

In an example, after the atomic service process in the atomic service model is called according to the workflow determined by the indication information to handle the current fault, the method further includes: determining the format of the corresponding output data according to the atomic service process; and outputting the result of handling the current fault in that format. The output data formats of different atomic service processes are different. Specifically, the main data characteristics of the data model (data) may also include: the atomic service process (faultStep), where the data types differ between atomic service processes so that the results of each process can be distinguished, which makes the execution process clearer and more stable and reduces execution sequence errors caused by data format; and the fault type (faultSceneType), because in some cases the output data format may differ for the same atomic service process under different fault types. For example, the fault evaluation output for the network fault class is the network indicator monitoring result, whereas the fault evaluation output for the alarm class is the alarm recovery status.
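Selecting the output format by both faultStep and faultSceneType can be sketched with a lookup table, as below. The specific field names in each format are invented; only the distinction between a network indicator result and an alarm recovery result comes from the text.

```python
from typing import Any, Dict, Tuple

# Hypothetical output formats keyed by (faultStep, faultSceneType).
OUTPUT_FORMATS: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("evaluation", "network"): ("indicator", "value", "recovered"),
    ("evaluation", "alarm"): ("alarm_id", "cleared"),
}


def format_output(fault_step: str, fault_scene_type: str,
                  raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields defined for this process and fault type."""
    fields = OUTPUT_FORMATS[(fault_step, fault_scene_type)]
    return {field: raw[field] for field in fields}
```

For instance, a raw evaluation result containing extra debugging fields would be reduced to just the indicator fields for a network fault, or just the alarm fields for an alarm fault.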

In one embodiment, for the convenience of users, the above steps are divided among units. For example: an interface management unit provides users with a visual entry point for operation and viewing; a model management unit adds, deletes, modifies, and queries the above models according to user operations and stores them accordingly; a data storage unit uses a relational database or disk files to store the execution result files of each fault scenario; and an application running unit, after execution of a scenario is triggered by the user through the interface or periodically, first reads the corresponding workflow data from the model management unit and then executes the atomic service processes configured in the workflow data in sequence, so as to identify, analyze, correct, and/or evaluate faults. In this way, users can conveniently modify, through the interface, the fault scenario solutions automatically generated by the system; for example, the execution mode of a newly added fault scenario solution or the threshold definition of the fault scenario can be modified.
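The cooperation between the model management, data storage, and application running units can be sketched as follows. All class and method names are hypothetical, the interface management unit is omitted, and an in-memory SQLite database stands in for the relational database mentioned in the text.

```python
import json
import sqlite3
from typing import Callable, Dict, List, Tuple


class ModelManagementUnit:
    """Add, delete, modify, and query workflow data (kept in memory here)."""

    def __init__(self) -> None:
        self._workflows: Dict[str, List[str]] = {}

    def add(self, name: str, steps: List[str]) -> None:
        self._workflows[name] = list(steps)

    def modify(self, name: str, steps: List[str]) -> None:
        if name not in self._workflows:
            raise KeyError(name)
        self._workflows[name] = list(steps)

    def delete(self, name: str) -> None:
        del self._workflows[name]

    def query(self, name: str) -> List[str]:
        return list(self._workflows[name])


class DataStorageUnit:
    """Store execution results per fault scenario in a relational table."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE results (scene TEXT, step TEXT, output TEXT)")

    def save(self, scene: str, step: str, output: dict) -> None:
        self._db.execute("INSERT INTO results VALUES (?, ?, ?)",
                         (scene, step, json.dumps(output)))

    def load(self, scene: str) -> List[Tuple[str, dict]]:
        rows = self._db.execute(
            "SELECT step, output FROM results WHERE scene = ? ORDER BY rowid", (scene,))
        return [(step, json.loads(output)) for step, output in rows]


class ApplicationRunningUnit:
    """Read workflow data, run the configured steps in order, store each result."""

    def __init__(self, models: ModelManagementUnit, storage: DataStorageUnit,
                 services: Dict[str, Callable[[], dict]]) -> None:
        self.models, self.storage, self.services = models, storage, services

    def run(self, scene: str, workflow_name: str) -> None:
        for step in self.models.query(workflow_name):
            self.storage.save(scene, step, self.services[step]())
```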

Take an actual execution process as an example. When the E-UTRAN Radio Access Bearer (E-RAB) establishment success rate in the field declines, and it is detected that the scenarios preset by the system cannot resolve the decline, a troubleshooting process for the decline in the E-RAB establishment success rate is newly added. Specifically: the user adds atomic service information according to the preset atomic service model framework, including the inputs and outputs of the atomic services and their function implementation; adds workflow information according to the workflow model framework, including the calling logic and rules of the atomic service for each step of analyzing the fault; and creates a new scenario model corresponding to the workflow. The scenario model information includes the scenario name, the indicator formula required by the scenario, and fault type information. At the same time, the start time of the new fault scenario is set, which can be immediate or scheduled. After the scenario has been added successfully, the fault can be handled in a closed loop according to the new fault handling process.

In another actual execution process, a fault model for "wireless disconnection rate" in the dropped-call category has been preset, and the "Flow disconnection rate" indicator in the field, which also belongs to the dropped-call category, suddenly deteriorates continuously. Through analysis, the system finds that the existing fault model, workflow model, atomic service model, and data model for "wireless disconnection rate" can satisfy closed-loop handling of the newly added "Flow disconnection rate" fault. That is, apart from information such as the name and other definitions, which differs and cannot be carried over, processing the adjustable parameters according to the process for "wireless disconnection rate" can repair the abnormal data in "Flow disconnection rate". The system therefore automatically reuses the "wireless disconnection rate" solution and, after modifying the information that must differ, such as the name, generates a "Flow disconnection rate" solution to handle the new fault in a closed loop.

In addition, after the fault model has been obtained, the monitoring information for the fault can be viewed through the visual interface, such as the TOP poor cells (that is, the cells with the most serious faults) and the indicator trend graph of a single cell. If fault analysis is triggered, the root cause of the fault and the recommended solution can also be viewed through the fault analysis process. The fault monitoring or fault evaluation process can likewise be triggered to automatically monitor fault indicators and field service conditions, so as to monitor and evaluate the resolution of the fault.

In this embodiment, the steps of fault handling are organized, planned, and managed in a modular way. The closed-loop fault resolution process is analyzed and split into process types such as fault identification and fault analysis, and different solutions are provided for each process; the data model specifies the input and output data types of each process, so that execution steps can be added or replaced in a customized way; the workflow assembles and calls the atomic service models according to the actual situation; and the user-defined fault information in the fault model, together with the corresponding workflow, makes it possible to customize, add, and resolve fault scenarios. This lowers the professional technical requirements for customizing fault scenarios, lets users conveniently and flexibly add or remove fault handling processes to suit their needs, and improves the efficiency of fault handling.

The division of the above methods into steps is only for clarity of description. In implementation, steps can be combined into one step, or a step can be split into multiple steps; as long as the same logical relationship is included, they are within the scope of protection of this patent. Adding insignificant modifications to, or introducing insignificant designs into, the algorithm or process without changing its core design is also within the scope of protection of this patent.

An embodiment of the present application relates to a fault handling device, as shown in FIG. 2 , including:

The atomic service module 201 is configured to select a preset atomic service model or create an atomic service model according to the processing status of the current fault; the atomic service model is used to provide at least one type of atomic service process, wherein the atomic service process is used to perform one or any combination of the following kinds of processing on a fault: monitoring, analysis, correction, evaluation;

The workflow module 202 is configured to create a workflow according to the atomic service process in the atomic service model; the workflow includes a processing chain of the atomic service process;

A fault scenario module 203, configured to generate a fault model to which the current fault belongs, where the fault model includes indication information of the workflow;

The processing module 204 is configured to call the atomic service process in the atomic service model according to the workflow determined by the indication information to process the current fault.

In the atomic service module 201, the creation of the atomic service model includes: selecting the template of the atomic service model; and creating the atomic service model according to the fault parameters of the current fault and the selected template of the atomic service model.

In the fault scenario module 203, generating the fault model to which the current fault belongs includes: if the fault parameters of the current fault are of the same type as the fault parameters of a first fault model among the preset fault models, copying the first fault model and modifying the copy according to the fault parameters of the current fault to generate the fault model to which the current fault belongs; otherwise, newly creating the fault model to which the current fault belongs.

In an example, the fault model to which the current fault belongs includes one of the following or any combination thereof: a name of the current fault, an index of the current fault, and type information of the current fault.

In another example, the fault model to which the current fault belongs further includes: an execution mode for processing the current fault, wherein the execution mode includes: immediate execution, timing execution or periodic execution.

In the processing module 204, after the atomic service process in the atomic service model is called according to the workflow determined by the indication information to handle the current fault, the processing further includes: determining the format of the corresponding output data according to the atomic service process; and outputting the result of handling the current fault according to the format of the output data.

In addition, the output data formats of different atomic service processes are different.

In this embodiment, the closed-loop fault resolution process is analyzed and split into process types such as fault identification and fault analysis; the workflow assembles and calls the atomic service models according to the actual situation; and the fault types in the fault model, together with the corresponding workflows, make it possible to customize, add, and resolve fault scenarios. This lowers the professional technical requirements for customizing fault scenarios, lets users conveniently and flexibly add or remove fault handling processes to suit their needs, and improves the efficiency of fault handling.

It is not difficult to find that this embodiment is a system embodiment corresponding to the above embodiment, and this embodiment can be implemented in cooperation with the above embodiment. The relevant technical details mentioned in the foregoing implementation manners are still valid in this implementation manner, and will not be repeated here in order to reduce repetition. Correspondingly, the relevant technical details mentioned in this implementation manner may also be applied in the foregoing implementation manners.

It is worth mentioning that all the modules involved in this embodiment are logical modules. In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units that are not closely related to solving the technical problems addressed by the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.

An embodiment of the present application relates to a server, as shown in FIG. 3 , including: at least one processor 301; and,

A memory 302 communicatively connected to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the above fault handling method.

Wherein, the memory and the processor are connected by a bus, and the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors and various circuits of the memory together. The bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein. The bus interface provides an interface between the bus and the transceivers. A transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium. The data processed by the processor is transmitted on the wireless medium through the antenna, further, the antenna also receives the data and transmits the data to the processor.

The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory, in turn, can be used to store data that the processor uses when performing operations.

An embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the above method embodiments.

That is, those skilled in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the related hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Those of ordinary skill in the art can understand that the above implementations are specific examples for realizing the present application, and that in practical applications various changes can be made to them in form and detail without departing from the spirit and scope of the present application.

Claims

A fault handling method, comprising:

selecting a preset atomic service model or creating an atomic service model according to the processing status of a current fault; wherein the atomic service model is used to provide at least one type of atomic service process, and the atomic service process is used to perform one or any combination of the following processing on the fault: monitoring, analysis, correction, and evaluation;

creating a workflow according to the atomic service process in the atomic service model; wherein the workflow includes a processing chain of the atomic service process;

generating a fault model to which the current fault belongs, wherein the fault model includes indication information of the workflow;

invoking, according to the workflow determined by the indication information, the atomic service process in the atomic service model to handle the current fault.
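
As a non-limiting illustration, the steps above can be sketched in Python as follows. All class, function, and field names (`AtomicServiceModel`, `Workflow`, `FaultModel`, `handle_fault`, `workflow_id`, and the four toy processes) are hypothetical and do not appear in the application; the sketch only assumes that the indication information is an identifier used to look up the workflow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: an atomic service process maps a context dict to a new one.
AtomicProcess = Callable[[dict], dict]


@dataclass
class AtomicServiceModel:
    # Maps a processing type (monitoring, analysis, correction, evaluation)
    # to the atomic service process that performs it.
    processes: Dict[str, AtomicProcess]


@dataclass
class Workflow:
    # Processing chain: ordered names of atomic service processes.
    chain: List[str]


@dataclass
class FaultModel:
    name: str
    workflow_id: str  # assumed form of the "indication information"


def handle_fault(fault_model: FaultModel,
                 workflows: Dict[str, Workflow],
                 service_model: AtomicServiceModel,
                 context: dict) -> dict:
    """Resolve the workflow from the fault model and run its chain in order."""
    workflow = workflows[fault_model.workflow_id]
    for step in workflow.chain:
        context = service_model.processes[step](context)
    return context


# Toy atomic service processes (illustrative values only).
def monitor(ctx: dict) -> dict:
    return {**ctx, "cpu_load": 0.97}


def analyze(ctx: dict) -> dict:
    return {**ctx, "cause": "overload" if ctx["cpu_load"] > 0.9 else "none"}


def correct(ctx: dict) -> dict:
    return {**ctx, "action": "scale_out" if ctx["cause"] == "overload" else "noop"}


def evaluate(ctx: dict) -> dict:
    return {**ctx, "resolved": ctx["action"] != "noop"}


service_model = AtomicServiceModel({
    "monitoring": monitor,
    "analysis": analyze,
    "correction": correct,
    "evaluation": evaluate,
})
workflows = {"wf-1": Workflow(["monitoring", "analysis", "correction", "evaluation"])}
fault = FaultModel(name="cpu-overload", workflow_id="wf-1")
result = handle_fault(fault, workflows, service_model, {"fault": fault.name})
```

In this sketch the fault model carries only a reference to the workflow, so the same workflow and the same atomic service processes can be reused by several fault models.
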
The fault handling method according to claim 1, wherein said generating the fault model to which the current fault belongs comprises:

if the fault parameters of the current fault are of the same type as the fault parameters of a first fault model among preset fault models, copying the first fault model, and modifying the copied first fault model according to the fault parameters of the current fault, to generate the fault model to which the current fault belongs;

if the fault parameters of the current fault differ in type from the fault parameters of each of the preset fault models, creating the fault model to which the current fault belongs.
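
As a non-limiting illustration, the copy-and-modify branch and the create branch can be sketched as follows. The names `FaultModel`, `parameter_types`, and `generate_fault_model` are hypothetical, and the sketch assumes that "same type" means the same parameter names with the same value types, which the application does not specify.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FaultModel:
    name: str
    parameters: Dict[str, object]
    workflow_id: str


def parameter_types(params: Dict[str, object]) -> Dict[str, type]:
    # Assumed notion of "type": parameter names paired with value types.
    return {key: type(value) for key, value in params.items()}


def generate_fault_model(name: str,
                         fault_params: Dict[str, object],
                         preset_models: List[FaultModel],
                         default_workflow_id: str) -> FaultModel:
    for preset in preset_models:
        if parameter_types(preset.parameters) == parameter_types(fault_params):
            # Same type: copy the preset model, then modify only the copy.
            model = copy.deepcopy(preset)
            model.name = name
            model.parameters.update(fault_params)
            return model
    # No preset model of the same type: create a new fault model.
    return FaultModel(name, dict(fault_params), default_workflow_id)


presets = [FaultModel("link-down", {"port": 1, "threshold": 0.5}, "wf-link")]
copied = generate_fault_model("link-down-2", {"port": 7, "threshold": 0.8},
                              presets, "wf-default")
created = generate_fault_model("disk-full", {"mount": "/var"},
                               presets, "wf-default")
```

The deep copy keeps the preset model unchanged, so it remains available for later faults of the same type.
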
The fault handling method according to claim 1 or 2, wherein the fault model to which the current fault belongs includes one or any combination of the following: a name of the current fault, an index of the current fault, and type information of the current fault.
The fault handling method according to claim 3, wherein the fault model to which the current fault belongs further comprises:

An execution mode for processing the current fault, wherein the execution mode includes: immediate execution, timing execution or periodic execution.
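
As a non-limiting illustration, the three execution modes can be represented as an enumeration from which planned run times are derived. The names `ExecutionMode` and `planned_runs`, and the parameters `start_at`, `period`, and `count`, are hypothetical; times are plain seconds so that the sketch stays deterministic.

```python
from enum import Enum
from typing import List, Optional


class ExecutionMode(Enum):
    IMMEDIATE = "immediate"
    TIMED = "timed"        # "timing execution": run once at a set time
    PERIODIC = "periodic"


def planned_runs(mode: ExecutionMode,
                 now: float,
                 start_at: Optional[float] = None,
                 period: Optional[float] = None,
                 count: int = 3) -> List[float]:
    """Return the times (in seconds) at which the workflow would be run."""
    if mode is ExecutionMode.IMMEDIATE:
        return [now]
    if mode is ExecutionMode.TIMED:
        if start_at is None:
            raise ValueError("timed execution needs start_at")
        return [start_at]
    if mode is ExecutionMode.PERIODIC:
        if period is None or period <= 0:
            raise ValueError("periodic execution needs a positive period")
        # Only the first `count` runs are listed, for illustration.
        return [now + i * period for i in range(count)]
    raise ValueError(f"unknown execution mode: {mode}")


immediate = planned_runs(ExecutionMode.IMMEDIATE, now=100.0)
timed = planned_runs(ExecutionMode.TIMED, now=100.0, start_at=500.0)
periodic = planned_runs(ExecutionMode.PERIODIC, now=100.0, period=60.0)
```
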
The fault handling method according to any one of claims 1 to 4, wherein after the invoking, according to the workflow determined by the indication information, of the atomic service process in the atomic service model to handle the current fault, the method further comprises:

According to the atomic service process, determine the format of the corresponding output data;

Outputting the result of processing the current fault according to the format of the output data.
The fault handling method according to any one of claims 1 to 5, wherein the formats of the output data of the different atomic service processes are different.
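
As a non-limiting illustration, a per-process output format can be sketched as a lookup from the atomic service process to a formatter. The concrete formats (JSON for monitoring, `key=value` text for analysis) and the names `OUTPUT_FORMATS` and `output_result` are assumptions made for this sketch only.

```python
import json
from typing import Callable, Dict


def format_monitoring(result: dict) -> str:
    # Assumed format: monitoring results are emitted as JSON.
    return json.dumps(result, sort_keys=True)


def format_analysis(result: dict) -> str:
    # Assumed format: analysis results are emitted as key=value text.
    return "; ".join(f"{key}={value}" for key, value in sorted(result.items()))


# Each atomic service process determines its own output format.
OUTPUT_FORMATS: Dict[str, Callable[[dict], str]] = {
    "monitoring": format_monitoring,
    "analysis": format_analysis,
}


def output_result(process_name: str, result: dict) -> str:
    """Determine the format from the process, then render the result."""
    return OUTPUT_FORMATS[process_name](result)


monitoring_output = output_result("monitoring", {"cpu_load": 0.97})
analysis_output = output_result("analysis", {"cause": "overload"})
```
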
The fault handling method according to any one of claims 1 to 6, wherein said creating an atomic service model includes:

selecting a template of the atomic service model;

An atomic service model is created according to the fault parameters of the current fault and the template of the selected atomic service model.
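
As a non-limiting illustration, creating an atomic service model from a selected template and the fault parameters of the current fault can be sketched as follows. The names `AtomicServiceTemplate`, `TEMPLATES`, `threshold_monitor`, and `create_atomic_service_model`, and the parameter keys `metric` and `threshold`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

AtomicProcess = Callable[[dict], dict]


@dataclass
class AtomicServiceTemplate:
    kind: str  # processing type the template provides, e.g. "monitoring"
    build: Callable[[dict], AtomicProcess]  # binds fault parameters


def threshold_monitor(params: dict) -> AtomicProcess:
    metric, limit = params["metric"], params["threshold"]

    def process(ctx: dict) -> dict:
        # Raise an alarm when the monitored metric exceeds the threshold.
        return {**ctx, "alarm": ctx[metric] > limit}

    return process


TEMPLATES: Dict[str, AtomicServiceTemplate] = {
    "threshold-monitor": AtomicServiceTemplate("monitoring", threshold_monitor),
}


def create_atomic_service_model(template_name: str,
                                fault_params: dict) -> Dict[str, AtomicProcess]:
    """Select a template, then instantiate it with the fault parameters."""
    template = TEMPLATES[template_name]
    return {template.kind: template.build(fault_params)}


atomic_model = create_atomic_service_model(
    "threshold-monitor", {"metric": "cpu_load", "threshold": 0.9})
high = atomic_model["monitoring"]({"cpu_load": 0.95})
low = atomic_model["monitoring"]({"cpu_load": 0.40})
```
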
A fault handling device, comprising:

an atomic service module, configured to select a preset atomic service model or create an atomic service model according to the processing status of a current fault; wherein the atomic service model is used to provide at least one type of atomic service process, and the atomic service process is used to perform one or any combination of the following processing on the fault: monitoring, analysis, correction, and evaluation;

A workflow module, configured to create a workflow according to the atomic service process in the atomic service model; the workflow includes a processing chain of the atomic service process;

A fault scenario module, configured to generate a fault model to which the current fault belongs, where the fault model includes indication information of the workflow;

A processing module, configured to call the atomic service process in the atomic service model according to the workflow determined by the indication information to process the current fault.
A server comprising:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform the fault handling method according to any one of claims 1 to 7.
A computer-readable storage medium storing a computer program, which implements the fault handling method according to any one of claims 1 to 7 when the computer program is executed by a processor.