WO2023109437A1 - Fault handling method and device, and computer-readable storage medium - Google Patents


Info

Publication number
WO2023109437A1
WO2023109437A1 PCT/CN2022/133149 CN2022133149W WO2023109437A1 WO 2023109437 A1 WO2023109437 A1 WO 2023109437A1 CN 2022133149 W CN2022133149 W CN 2022133149W WO 2023109437 A1 WO2023109437 A1 WO 2023109437A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
information
notification information
processing
derivation
Prior art date
Application number
PCT/CN2022/133149
Other languages
English (en)
French (fr)
Inventor
何威
谢洁意
闫兴安
柳圆圆
曹彬
魏志芯
Original Assignee
中移(苏州)软件技术有限公司
中国移动通信集团有限公司
Priority date
Filing date
Publication date
Application filed by 中移(苏州)软件技术有限公司, 中国移动通信集团有限公司
Publication of WO2023109437A1 (zh)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/02: Knowledge representation; Symbolic representation
    • G06N5/022: Knowledge engineering; Knowledge acquisition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0677: Localisation of faults

Definitions

  • This application is based on, and claims priority to, Chinese patent application No. 202111547268.3 filed on December 16, 2021, the entire content of which is hereby incorporated by reference.
  • the present invention relates to the technical field of communications, and in particular to a fault handling method and device, and a computer-readable storage medium.
  • SDWAN: Software-Defined Wide Area Network
  • SDN: Software Defined Network
  • The entire workflow includes user ordering, device configuration, billing and completion. When a node in the workflow fails, the faulty node has to be located manually and the fault resolved manually. How to locate the faulty node and resolve the fault without manual intervention is therefore a problem to be solved.
  • Embodiments of the present invention provide a fault handling method and device, and a computer-readable storage medium, capable of automatically locating fault nodes and processing faults, achieving the purpose of fault self-healing, and improving fault processing efficiency.
  • An embodiment of the present invention provides a fault handling method, the method includes:
  • the topology represents the workflow of the service provided by the software-defined wide area network
  • the fault corresponding to the fault information is processed according to the processing method, and a processing result is obtained.
  • the corresponding fault information is deduced based on the fault notification information, topology structure and derivation rule base, including:
  • the fault information is derived based on the fault node and the derivation rule base.
  • the determination of the corresponding fault node according to the fault notification information and the topology structure includes:
  • the derivation of the fault information based on the fault node and the derivation rule base includes:
  • the fault information is derived according to the target rule and the fault notification information.
  • the acquisition of fault notification information includes:
  • the notification information includes the fault notification information and warning information
  • the method also includes:
  • the derivation rule base is updated according to the optimization rules.
  • the method also includes:
  • the derivation process of the fault information and the processing process of the fault information are displayed.
  • the method also includes:
  • An embodiment of the present invention provides a fault processing device, including an acquisition part, a derivation part, a calling part and a processing part; wherein,
  • the acquisition part is configured to obtain fault notification information
  • the derivation part is configured to determine corresponding fault information based on the fault notification information, topology structure and derivation rule base; wherein the topology structure represents the workflow of the service provided by the software-defined wide area network;
  • the calling part is configured to call the corresponding processing method from the fault library based on the fault information
  • the processing part is configured to process the fault corresponding to the fault information according to the processing method, and obtain a processing result.
  • the derivation part is further configured to determine the corresponding fault node according to the fault notification information and the topology structure, and to derive the fault information based on the fault node and the derivation rule base.
  • the derivation part is further configured to graphically process the fault notification information to obtain a corresponding graph structure, and to associate the graph structure with the topology structure to determine the fault node.
  • the derivation part is further configured to match the corresponding target rule in the derivation rule base according to the fault node, and to derive the fault information according to the target rule and the fault notification information.
  • the acquisition part is further configured to collect notification information, wherein the notification information includes the fault notification information and warning information, and to format the warning information to obtain the fault notification information.
  • the device also includes an iterative optimization part and an update part, wherein:
  • the iterative optimization part is configured to perform iterative optimization processing based on the fault notification information, the target rule, the fault information, the processing method and the processing result to obtain an optimization rule;
  • the update part is configured to update the derivation rule base according to the optimization rules.
  • the device further includes a display part, wherein:
  • the display part is configured to display the derivation process of the fault information and the processing process of the fault information.
  • the device also includes a feedback part, wherein:
  • the feedback part is configured to feed back the processing result.
  • An embodiment of the present invention provides a fault handling device, including:
  • the processor is configured to implement the fault handling method provided by the embodiment of the present invention when executing the executable instructions stored in the memory.
  • An embodiment of the present invention provides a computer-readable storage medium, which stores executable instructions for causing a processor to implement the fault handling method provided by the embodiment of the present invention.
  • Embodiments of the present invention provide a fault handling method and device, and a computer-readable storage medium.
  • The method includes: acquiring fault notification information; deriving corresponding fault information based on the fault notification information, a topology structure and a derivation rule base, wherein the topology structure represents the workflow of the service provided by the software-defined wide area network; calling the corresponding processing method from the fault library based on the fault information; and processing the fault corresponding to the fault information according to the processing method to obtain a processing result.
  • The fault notification information is obtained and then processed, and the fault node is obtained in combination with the topology structure corresponding to the workflow nodes; the cause of the failure, that is, the fault information, is derived from the fault node and the pre-stored derivation rules.
  • The corresponding processing method is called from the fault library to process the fault corresponding to the fault information and obtain the processing result.
  • The embodiment of the present invention can automatically obtain fault notification information when the workflow is interrupted, and automatically complete fault node location, fault information derivation and fault processing according to the fault notification information, thereby realizing automatic fault location and processing, achieving fault self-healing, improving the efficiency of fault handling, and reducing the impact of faults on the workflow.
  • FIG. 1 is a first schematic flow diagram of an optional method provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a fault handling device applied to SDWAN provided by an embodiment of the present invention.
  • FIG. 3 is a second schematic flow diagram of an optional method provided by an embodiment of the present invention.
  • FIG. 4 is a first structural schematic diagram of a fault handling device provided by an embodiment of the present invention.
  • FIG. 5 is a third schematic flow diagram of an optional method provided by an embodiment of the present invention.
  • FIG. 6 is a fourth schematic flow diagram of an optional method provided by an embodiment of the present invention.
  • FIG. 7a is a fifth schematic flow diagram of an optional method provided by an embodiment of the present invention.
  • FIG. 7b is a second structural schematic diagram of a fault handling device provided by an embodiment of the present invention.
  • FIG. 8 is a third structural schematic diagram of a fault handling device provided by an embodiment of the present invention.
  • FIG. 9 is a fourth structural schematic diagram of a fault handling device provided by an embodiment of the present invention.
  • Software-Defined Wide Area Network (Software-Defined WAN, SDWAN): a service formed by applying software-defined network technology to wide area network scenarios.
  • Software Defined Network (SDN): an innovative network architecture used to realize network virtualization; by separating the control plane of network equipment from the data plane, it enables flexible control of network traffic and makes the network, as a conduit, more intelligent.
  • Fig. 1 is a schematic flow diagram of an optional method provided by the embodiment of the present invention. As shown in Fig. 1, the embodiment of the present invention provides a fault handling method, including:
  • the device acquires fault notification information when a workflow node fails.
  • When a workflow node fails, the device collects notification information.
  • the notification information may include fault notification information and warning information.
  • the fault information of the workflow node is recorded in the fault notification information.
  • By processing the fault notification information, the device can obtain information such as the fault content of the workflow node; the warning information reminds the user that the current workflow node has failed, and acts as a warning.
  • After the device collects the notification information, it needs to perform data processing on the notification information to obtain the fault notification information.
  • The device can transmit the notification information through a message channel, such as a message queue (Message Queue, MQ), to the data module, which performs data processing on the notification information.
  • MQ: Message Queue
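  • The hand-off of notification information over a message channel can be sketched as follows. This is only an illustration under stated assumptions: the standard-library `queue.Queue` stands in for a real MQ, and the names `channel`, `data_module_worker`, `send_notifications` and the dictionary fields are hypothetical, not taken from the patent.

```python
import queue
import threading

# Hypothetical in-process stand-in for the message channel (MQ) between
# the fault collection module and the data module.
channel = queue.Queue()
received = []


def data_module_worker():
    """Consume notification information until a None sentinel arrives."""
    while True:
        notification = channel.get()
        if notification is None:
            break
        received.append(notification)


def send_notifications(notifications):
    """Start the consumer, publish every notification, then wait for it to finish."""
    worker = threading.Thread(target=data_module_worker)
    worker.start()
    for item in notifications:
        channel.put(item)
    channel.put(None)  # sentinel: no more notifications
    worker.join()


send_notifications([{"node": "CPE", "text": "configuration failed"}])
```

  • Decoupling the collector from the data module through a queue lets the two run at different speeds; a production deployment would presumably use a networked broker instead of an in-process queue.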
  • the data module can format the notification information, and format the warning information in the notification information.
  • the format of the data during formatting is as follows:
  • FIG. 2 is a schematic structural diagram of a fault handling device applied to SDWAN provided by an embodiment of the present invention. As shown in FIG. 2, SDWAN divides the entire process of SDWAN services into: user order placement, device configuration, billing and delivery. After the user places an order, the configuration device provides the user with the required services. The configuration device can be composed of multiple parts, such as Customer Premise Equipment (CPE), Point-of-Presence (PoP), VC/PE and a controller. Each of these parts constituting the configuration device is a workflow node.
  • CPE: Customer Premise Equipment
  • PoP: Point-of-Presence
  • The device may include a fault collection module 1, a data module 2, a graph module 3, an inference engine 4, a rule module 5, an action module 6, a machine learning module 7 and a UI module 8.
  • When a workflow node fails, the device can handle the fault through S1-S8, as follows:
  • S1: The fault collection module 1 collects faults to obtain notification information, and sends the notification information to the data module 2.
  • S2: The data module 2 performs data processing on the notification information, obtains the fault notification information, and sends the fault notification information to the graph module 3.
  • S3: The graph module 3 performs graph algorithm processing on the fault notification information to obtain graph data (fault node information), and sends the graph data to the inference engine 4.
  • S4: The inference engine 4 invokes the derivation rules stored in the rule module 5, and derives the fault information by combining the graph data and the fault notification information.
  • S5: The inference engine 4 calls an action (processing method) from the action module 6 according to the fault information and graph data, and processes the fault corresponding to the fault information through the action to obtain a corresponding processing result.
  • S6: The inference engine 4 sends the processing result to the machine learning module 7; the processing result carries the fault notification information, graph data, rules, fault information and action.
  • S7: The machine learning module 7 iteratively optimizes the processing result to obtain an optimization result, and configures rules for the rule module 5 according to the optimization result.
  • S8: The UI module 8 displays the processing procedures of the rule module 5, the action module 6 and the inference engine 4 at the front end.
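  • The chain of modules in S1-S8 can be pictured as a sequence of stages that each transform the data and leave a trace. The sketch below is a simplified illustration, not the patent's implementation: `run_pipeline`, `STAGES`, the lambda bodies and all dictionary fields are hypothetical placeholders, and the recorded trace merely stands in for what a UI or machine learning module might consume.

```python
def run_pipeline(raw, stages):
    """Pass data through named stages in order and record each stage's output."""
    trace = []
    data = raw
    for name, stage in stages:
        data = stage(data)
        trace.append((name, data))
    return data, trace


# Placeholder stages; each lambda is an assumption standing in for a module.
STAGES = [
    # data module: keep only the fields needed downstream
    ("data module", lambda raw: {"node": raw["node"], "code": raw["code"]}),
    # graph module: here the reported node is trivially taken as the fault node
    ("graph module", lambda n: {**n, "fault_node": n["node"]}),
    # inference engine: derive fault information from node and code
    ("inference engine", lambda g: {**g, "fault_info": f'{g["fault_node"]}:{g["code"]}'}),
    # action module: pretend the selected action resolved the fault
    ("action module", lambda f: {**f, "resolved": True}),
]

result, trace = run_pipeline(
    {"node": "CPE", "code": "TIMEOUT", "extra": "ignored"}, STAGES
)
```

  • Keeping the trace separate from the result means the display and the iterative optimization can both read the same record without affecting the fault handling path.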
  • the device performs data processing on the collected notification information to obtain fault notification information, which improves the convenience of subsequent processing of the information collected by the device to obtain fault information, and improves work efficiency.
  • S101 may also include S1011-S1012, as follows:
  • S1011: Collect notification information, wherein the notification information includes fault notification information and warning information.
  • When a workflow node fails, the device collects fault notification information and warning information, so as to obtain the notification information.
  • the failure situation of the currently failed workflow node can be obtained to provide data support for subsequent handling of the failure.
  • the device formats the warning information in the notification information, so as to obtain the failure notification information.
  • Formatting the warning information yields the fault notification information, which makes it more convenient to subsequently process the information collected by the device to obtain fault information, and improves work efficiency.
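  • As a rough illustration of the formatting step, the sketch below parses a raw warning string into a structured record. The `node | code | message` layout, the `FaultNotification` class and its fields are purely hypothetical; the data format actually used by the device is not reproduced in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultNotification:
    """Hypothetical structured form of fault notification information."""
    node: str
    code: str
    message: str


def format_warning(raw: str) -> FaultNotification:
    """Parse an assumed 'node | code | message' warning string.

    Raises ValueError when fewer than three fields are present.
    """
    node, code, message = (part.strip() for part in raw.split("|", 2))
    return FaultNotification(node=node, code=code, message=message)


example = format_warning("PoP | E42 | tunnel setup timed out")
```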
  • the device determines the fault node based on the fault notification information and the topology structure, and then derives the corresponding fault information based on the derivation rule base.
  • The device constructs and stores the topology structure according to the workflow nodes, using point-line relationships; it then performs data graphical processing on the fault notification information to obtain the corresponding graph structure, searches the topology structure using the graph structure, and obtains a search result.
  • the workflow node corresponding to the search result in the topology structure is the failure node.
  • the device may perform data graphical processing on the fault notification information through a graphical algorithm, so as to obtain the above graph structure.
  • After the device completes the data graphical processing of the fault notification information and obtains the faulty node, it can call derivation rules from the derivation rule base and match them against the faulty node; the derivation rule that successfully matches the faulty node is used as the target rule, and the cause of the failure of the faulty node, that is, the corresponding fault information, is derived according to the target rule.
  • The device can perform data graphical processing on the fault notification information through the graph module; after obtaining the faulty node, the graph module can send a derivation request to the inference engine in the device.
  • After the inference engine receives the derivation request, it responds to the request by calling the faulty node from the graph module and, at the same time, searching the rule module in the device, so as to obtain the target rule matching the faulty node from the derivation rule base of the rule module.
  • The derivation request can carry the faulty node, and can also carry the graph structure, so that the inference engine does not need to call the faulty node from the graph module after receiving the derivation request.
  • the device derives corresponding fault information through target rules and fault notification information.
  • the device may substitute the failure notification information into the target rule, so as to deduce the cause of the failure of the failure node, and finally obtain the corresponding failure information.
  • the derivation rules are stored in the derivation rule base, and the data format in the derivation rule base is as follows:
  • The faulty node in the workflow is determined by processing the fault notification information; the fault information of the faulty node is then derived according to the target rule that matches the faulty node. Processing the fault corresponding to this fault information achieves fault self-healing, and the use of graph algorithms speeds up fault derivation.
  • S102 may also include S1021-S1022, as follows:
  • the device determines the fault node in the workflow according to the fault notification information and the topology structure.
  • the topology can be pre-stored in the device, or can be constructed by the device according to the workflow nodes through the point-line relationship when needed.
  • the faulty workflow node in the workflow is determined, thereby obtaining the faulty node, which is convenient for determining the fault information.
  • S1021 may also include S10211-S10212, as follows:
  • the device performs data graphical processing on the fault notification information, so as to obtain a corresponding graph structure.
  • The device converts the fault notification information into a graph structure through data graphical processing; the graph structure may be composed of points and lines, that is, the graph structure can represent the point-line relationship corresponding to the fault notification information.
  • the faulty workflow node in the workflow is determined according to the association between the graph structure and the topology structure, so as to obtain the faulty node, which is convenient for determining the fault information.
  • the device determines the faulty node by associating the graph structure with the topology structure.
  • the device associates the graph structure with the topology structure, and according to the association, searches for the part of the topology structure with the highest degree of correlation with the graph structure, and the corresponding workflow node of the above part is the failure node.
  • the faulty workflow node in the workflow is determined, thereby obtaining the faulty node, which is convenient for determining the fault information.
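  • One way to picture the association between the graph structure and the topology structure is as an overlap score on shared edges. The sketch below assumes a correlation measure (the number of shared edges touching a node) that the text does not specify; `TOPOLOGY`, its edges, `edges_of` and `locate_fault_node` are all hypothetical.

```python
# Hypothetical topology: workflow nodes (points) and their connections (lines).
TOPOLOGY = {
    "order": {"CPE"},
    "CPE": {"PoP"},
    "PoP": {"VC/PE"},
    "VC/PE": {"controller"},
    "controller": {"billing"},
    "billing": set(),
}


def edges_of(topology):
    """Flatten an adjacency mapping into a set of (source, target) edges."""
    return {(src, dst) for src, dsts in topology.items() for dst in dsts}


def locate_fault_node(graph_edges, topology):
    """Return the topology node touched by the most edges shared with the
    graph structure, or None when nothing overlaps."""
    shared = set(graph_edges) & edges_of(topology)
    scores = {node: sum(node in edge for edge in shared) for node in topology}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Graph structure assumed to have been derived from fault notification information.
fault_node = locate_fault_node({("CPE", "PoP"), ("PoP", "VC/PE")}, TOPOLOGY)
```

  • In this example both shared edges touch `PoP`, so it receives the highest score; a real system would likely need a tie-breaking rule and a richer similarity measure.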
  • the device determines a target rule matching the faulty node from the derivation rule base according to the faulty node, and derives the fault information based on the target rule.
  • the purpose of fault self-healing can be achieved by processing the fault corresponding to the above fault information.
  • S1022 may also include S10221-S10222, as follows:
  • the device matches the derivation rules in the derivation rule base with the faulty node, so as to obtain the target rule.
  • When the device matches the derivation rules in the derivation rule base against the faulty node, if a first derivation rule matches the faulty node, the first derivation rule is the target rule.
  • the first derivation rule is any derivation rule in the derivation rule base.
  • each fault node may correspond to one or more derivation rules, and each derivation rule in the derivation rule base corresponds to a kind of fault information.
  • The fault information can be obtained according to the target rule, and the fault corresponding to the fault information can then be processed, which serves the purpose of handling the fault.
  • The device derives the fault information according to the target rule obtained in S10221, in combination with the fault notification information.
  • The cause of the fault can be known from the fault information, and the fault corresponding to the fault information can then be processed, which serves the purpose of handling the fault.
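  • Rule matching followed by derivation can be sketched as a two-step lookup: select the rules registered for the faulty node, then substitute the fault notification information into them. The rule representation below (`DerivationRule` with a `condition` callable), the rule contents and the fault information strings are assumptions made for illustration; the rule base format is not given in this text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DerivationRule:
    """Hypothetical derivation rule: one rule yields one kind of fault information."""
    node: str                          # fault node the rule applies to
    condition: Callable[[dict], bool]  # test on the fault notification information
    fault_info: str                    # fault information derived when it holds


RULE_BASE = [
    DerivationRule("PoP", lambda n: n.get("code") == "TIMEOUT", "pop_tunnel_timeout"),
    DerivationRule("PoP", lambda n: n.get("code") == "AUTH", "pop_auth_rejected"),
    DerivationRule("CPE", lambda n: n.get("code") == "TIMEOUT", "cpe_unreachable"),
]


def match_target_rules(fault_node: str, rule_base: List[DerivationRule]) -> List[DerivationRule]:
    """Select the rules that match the faulty node (a node may have several)."""
    return [rule for rule in rule_base if rule.node == fault_node]


def derive_fault_info(fault_node: str, notification: dict,
                      rule_base: List[DerivationRule]) -> Optional[str]:
    """Substitute the notification into the matched rules; None if nothing fires."""
    for rule in match_target_rules(fault_node, rule_base):
        if rule.condition(notification):
            return rule.fault_info
    return None


derived = derive_fault_info("PoP", {"code": "AUTH"}, RULE_BASE)
```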
  • the device invokes the corresponding processing method from the fault database based on the fault information obtained in S102.
  • the device can call the processing method from the fault library of the action module in the device.
  • The processing methods in the fault library, and/or the fault library in the action module, can be customized and added to the device through a plug-in, can be pre-stored in the device, or can be added by the user.
  • a new processing method or fault library is generated and added to the device.
  • the device can call the processing method from the fault library, not only based on the fault information, but also combined with the fault node or fault notification information, so as to improve the accuracy of the calling processing method.
  • invoking the corresponding processing method in the fault library provides a basis for subsequent processing of faults to realize fault self-healing.
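  • A fault library that accepts plug-in processing methods can be sketched as a registry keyed by fault information. The `FaultLibrary` class, its `register` decorator, the `restart_tunnel` handler and the key `pop_tunnel_timeout` are hypothetical names; the patent text does not describe the plug-in interface.

```python
from typing import Callable, Dict, Optional

# A processing method takes the fault node and reports whether the fault was resolved.
FaultHandler = Callable[[str], bool]


class FaultLibrary:
    """Hypothetical registry mapping fault information to processing methods."""

    def __init__(self) -> None:
        self._handlers: Dict[str, FaultHandler] = {}

    def register(self, fault_info: str) -> Callable[[FaultHandler], FaultHandler]:
        """Decorator that adds a processing method, e.g. from a plug-in."""
        def decorator(handler: FaultHandler) -> FaultHandler:
            self._handlers[fault_info] = handler
            return handler
        return decorator

    def lookup(self, fault_info: str) -> Optional[FaultHandler]:
        """Return the processing method for the fault information, if any."""
        return self._handlers.get(fault_info)


library = FaultLibrary()


@library.register("pop_tunnel_timeout")
def restart_tunnel(fault_node: str) -> bool:
    # Placeholder action: a real handler would reconfigure the node here.
    return True


handler = library.lookup("pop_tunnel_timeout")
```

  • Registration through a decorator keeps pre-stored, user-added and plug-in handlers on the same path, so the caller only ever performs a lookup.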
  • the device processes the fault corresponding to the fault information according to the processing method called in S103, and obtains a corresponding processing result.
  • When the processing result is that the fault has been resolved, the device can feed back the processing result so that the business can continue.
  • The inference engine may call the processing method to process the fault corresponding to the fault information and obtain the processing result. Afterwards, the inference engine sends the processing result to the controller in the device, and the controller sends an instruction to make the workflow node continue the process; the inference engine can also send the processing result to the faulty workflow node to notify the workflow node to continue the process.
  • the device may repeat S102-S104 until the processing result is that the fault is resolved.
  • the device can also send a prompt message to notify the user to manually handle the fault corresponding to the fault information.
  • When the processing result is that the fault is not resolved, the device can send the fault information, fault notification information, fault node and processing method to the machine learning module in the device, iteratively optimize the processing method through the machine learning module to obtain a new processing method, update the fault library with the new processing method, and use the new processing method to process the fault corresponding to the fault information.
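  • The repetition of S102-S104 until the fault is resolved, with a fallback to manual handling, can be sketched as a bounded retry loop. The attempt limit `max_attempts`, the function `self_heal` and the shape of the returned dictionary are assumptions; the text does not say how many times the device retries.

```python
def self_heal(derive, call, max_attempts=3):
    """Repeat derive -> call -> process until resolved or attempts run out.

    derive() returns fault information; call(fault_info) returns a
    processing method or None. Both are supplied by the caller.
    """
    fault_info = None
    for attempt in range(1, max_attempts + 1):
        fault_info = derive()
        handler = call(fault_info)
        if handler is not None and handler(fault_info):
            return {"resolved": True, "attempts": attempt,
                    "fault_info": fault_info, "notify_user": False}
    # Not resolved automatically: prompt the user to handle the fault manually.
    return {"resolved": False, "attempts": max_attempts,
            "fault_info": fault_info, "notify_user": True}


calls = []


def flaky_handler(fault_info):
    """Hypothetical processing method that only succeeds on its second call."""
    calls.append(fault_info)
    return len(calls) >= 2


result = self_heal(lambda: "pop_tunnel_timeout", lambda info: flaky_handler)
unresolved = self_heal(lambda: "unknown_fault", lambda info: None, max_attempts=2)
```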
  • the embodiment of the present invention can automatically locate the fault node, and process the fault, realize the self-healing of the fault, and improve the fault processing efficiency.
  • Fig. 3 is a second schematic flow diagram of an optional method provided by the embodiment of the present invention.
  • The embodiment of the present invention provides a fault handling method, which further includes:
  • the device displays the derivation process of the fault information and the processing process of the fault information.
  • the device displays the derivation process of the fault and the processing process of the fault information through the UI module in the device.
  • The derivation process of the fault information is S101-S103, and the processing process of the fault information is S104; since S101-S104 are described above, they are not repeated here.
  • FIG. 4 is a first structural schematic diagram of a fault handling device provided in an embodiment of the present invention.
  • The UI module 41 establishes communication connections with the action module 42, the derivation engine 43 and the rule module 44, so that the derivation rules in the derivation rule base of the rule module 44 can be configured through the UI module 41, and the processing methods called in the action module 42 can also be configured.
  • the device can graphically display the derivation process of the fault information by the derivation engine 43 and the processing process of the fault information, that is, display the fault self-healing process.
  • the content displayed by the device may also include processing results
  • the UI module in the device may be a display screen that supports interaction, such as a touch screen.
  • the device displays the fault self-healing process, which is convenient for the user to understand the working condition of the current workflow.
  • Fig. 5 is a third schematic flow diagram of an optional method provided by the embodiment of the present invention.
  • The embodiment of the present invention provides a fault handling method, which further includes:
  • After the device obtains the processing result, it feeds back the processing result so that the workflow can continue.
  • the derivation engine may send the processing result to the controller of the device or the faulty workflow node, so as to complete the feedback of the processing result.
  • Fig. 6 is a fourth schematic flow diagram of an optional method provided by the embodiment of the present invention.
  • The embodiment of the present invention provides a fault handling method, which further includes:
  • iterative optimization processing is performed on the faults corresponding to the fault information to obtain optimization rules.
  • the device may perform iterative optimization processing through a machine learning module in the device.
  • the iterative optimization processing refers to processing the fault corresponding to the fault information to obtain a more accurate derivation rule, that is, an optimization rule.
  • the machine learning module can also iteratively optimize the processing of the faults corresponding to the fault information, obtain a more optimized processing method, and update the fault database.
  • the machine learning module iteratively optimizes the fault notification information, target rules, fault information, processing methods and processing results according to the preset algorithm.
  • the preset algorithm can be added to the machine learning module through custom configuration or plug-in.
  • the device can optimize and update the derivation rule base, thereby improving the success rate of fault self-healing.
  • the device updates the derivation rules in the derivation rule base according to the optimization rules.
  • the device can use the optimization rule to overwrite the corresponding derivation rule in the derivation rule base to complete the update; it can also save the optimization rule to the corresponding derivation rule base to complete the update.
  • the device can optimize and update the derivation rule base, thereby improving the success rate of fault self-healing.
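  • The two update options described above, overwriting the corresponding derivation rule or saving the optimization rule alongside it, can be sketched as follows. The rule base layout (a mapping from fault node to a list of rules), the string rules and the function `update_rule_base` are hypothetical.

```python
def update_rule_base(rule_base, fault_node, old_rule, optimized_rule, overwrite=True):
    """Return a new rule base with the optimization rule applied.

    overwrite=True replaces old_rule in place when it is present;
    otherwise the optimization rule is saved alongside the existing rules.
    """
    rules = list(rule_base.get(fault_node, []))
    if overwrite and old_rule in rules:
        rules[rules.index(old_rule)] = optimized_rule
    else:
        rules.append(optimized_rule)
    return {**rule_base, fault_node: rules}


base = {"PoP": ["timeout > 30s"]}
overwritten = update_rule_base(base, "PoP", "timeout > 30s", "timeout > 10s")
appended = update_rule_base(base, "PoP", "timeout > 30s", "timeout > 10s", overwrite=False)
```

  • Returning a new mapping instead of mutating the old one makes it easy to compare rule bases before and after an optimization round.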
  • Fig. 7a is a fifth schematic flow diagram of an optional method provided by an embodiment of the present invention. As shown in Fig. 7a, a fault handling method provided by an embodiment of the present invention includes:
  • S301: Collect faults and obtain notification information.
  • Fig. 7b is a second structural schematic diagram of a fault handling device provided by the embodiment of the present invention.
  • The fault handling device provided by the embodiment of the present invention includes a fault collection module 71, a data module 72, a graph module 73, a derivation engine 74, an action module 75, a rule module 76, a workflow node 77 and a machine learning module 78.
  • S301-S308 can be executed by the fault handling device provided by the embodiment of the present invention.
  • The fault collection module 71 establishes a communication connection with the data module 72; when the fault collection module 71 collects the notification information, it sends the notification information to the data module 72.
  • The data module 72 establishes a communication connection with the graph module 73; after receiving the notification information, the data module 72 processes it to obtain the fault notification information, and sends the fault notification information to the graph module 73.
  • The graph module 73 establishes a communication connection with the derivation engine 74; after receiving the fault notification information, the graph module 73 graphically processes it to determine the faulty node in the workflow, and then notifies the derivation engine 74 that the faulty node has been determined.
  • The derivation engine 74 establishes communication connections with the action module 75, the rule module 76, the workflow node 77 and the machine learning module 78.
  • After receiving the notification from the graph module 73, the derivation engine 74 calls the faulty node from the graph module 73, searches the rule module 76 for the derivation rule corresponding to the faulty node as the target rule, derives the fault information based on the target rule, and calls the corresponding processing method from the action module 75 to process the fault corresponding to the fault information. After the processing result is obtained, it is sent to the workflow node 77; at the same time, the derivation engine 74 sends the fault notification information, faulty node, fault information, target rule and processing result to the machine learning module 78.
  • The machine learning module 78 establishes a communication connection with the rule module 76. After receiving the fault notification information, faulty node, fault information, target rule and processing result, the machine learning module 78 performs iterative optimization on them to obtain optimized derivation rules, and sends the optimized derivation rules to the rule module 76 to update the derivation rules.
  • the embodiments of the present invention can automatically obtain fault notification information when the workflow is interrupted, and automatically complete fault node location, fault information derivation, and fault processing based on the fault notification information, thereby realizing automatic fault location and processing to achieve the purpose of fault self-healing, and can also optimize and update the derivation rule base, thereby improving the success rate of fault self-healing.
  • Fig. 8 is schematic structural diagram III of a fault handling device provided by an embodiment of the present invention. As shown in Fig. 8, an embodiment of the present invention provides a fault handling device suited to the fault handling method.
  • The device 8 includes an acquisition part 81, a derivation part 82, a calling part 83, and a processing part 84, wherein:
  • the acquisition part 81 is configured to acquire fault notification information;
  • the derivation part 82 is configured to determine corresponding fault information based on the fault notification information, the topology, and the derivation rule base; wherein the topology represents the workflow of the service provided by the software-defined wide area network;
  • the calling part 83 is configured to call the corresponding processing method from the fault library based on the fault information;
  • the processing part 84 is configured to process the fault corresponding to the fault information according to the processing method, and obtain a processing result.
  • In some embodiments, the derivation part 82 is further configured to determine the corresponding fault node according to the fault notification information and the topology, and to derive the fault information based on the fault node and the derivation rule base.
  • In some embodiments, the derivation part 82 is further configured to perform data graphing on the fault notification information to obtain a corresponding graph structure, and to associate the graph structure with the topology to determine the fault node.
  • In some embodiments, the acquisition part 81 is further configured to collect notification information, wherein the notification information includes the fault notification information and warning information, and to format the warning information to obtain the fault notification information.
  • In some embodiments, the device further includes an iterative optimization part 85 and an update part 86, wherein:
  • the iterative optimization part 85 is configured to perform iterative optimization based on the fault notification information, the target rule, the fault information, the processing method, and the processing result, to obtain an optimized rule;
  • the update part 86 is configured to update the derivation rule base according to the optimized rule.
  • In some embodiments, the device further includes a display part 87, wherein:
  • the display part 87 is configured to display the derivation process of the fault information and the processing process of the fault information.
  • In some embodiments, the device further includes a feedback part 88, wherein:
  • the feedback part 88 is configured to feed back the processing result.
  • It can be understood that, when the workflow is interrupted, the embodiments of the present invention can automatically obtain fault notification information and, based on it, automatically locate the fault node, derive the fault information, and process the fault. This realizes automatic fault location and processing and achieves fault self-healing; the derivation rule base can also be optimized and updated, improving the success rate of fault self-healing.
  • Fig. 9 is schematic structural diagram IV of a fault handling device provided by an embodiment of the present invention.
  • As shown in Fig. 9, an embodiment of the present invention provides a fault handling device corresponding to the fault handling method. The fault handling device 9 includes a processor 91, a memory 92, and a communication bus 94. The memory 92 communicates with the processor 91 through the communication bus 94 and stores one or more programs executable by the processor 91; when the one or more programs are executed, the processor 91 performs the fault handling method according to the embodiment of the present invention.
  • Specifically, the fault handling device 9 further includes a communication component 93 for data transmission, and there is at least one processor 91.
  • In this embodiment, the components of the fault handling device 9 are coupled together through the communication bus 94, which is used to realize connection and communication between these components.
  • In addition to a data bus, the communication bus 94 also includes a power bus, a control bus, and a status signal bus.
  • For clarity of description, however, the various buses are all labeled as communication bus 94 in Fig. 9.
  • An embodiment of the present invention provides a computer-readable storage medium, which stores executable instructions for causing a processor to implement the fault handling method provided by the embodiment of the present invention.
  • Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The embodiments of the present invention disclose a fault handling method and device, and a computer-readable storage medium.
  • When a fault occurs at a workflow node, the method acquires fault notification information, processes it, and combines it with the topology corresponding to the workflow nodes to obtain the fault node. From the fault node and the pre-stored derivation rules it derives the cause of the failure at the fault node, that is, the fault information. Finally, based on the fault information, it calls the corresponding processing method from the fault library to process the fault corresponding to the fault information and obtain the processing result.
  • When the workflow is interrupted, the method can automatically obtain fault notification information and, based on it, automatically locate the fault node, derive the fault information, and process the fault. This realizes automatic fault location and processing and achieves fault self-healing, improving fault-handling efficiency and reducing the impact of faults on the workflow.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

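Embodiments of the present invention disclose a fault handling method and device, and a computer-readable storage medium. The method includes: acquiring fault notification information; deriving corresponding fault information based on the fault notification information, a topology, and a derivation rule base, wherein the topology represents the workflow of a service provided by a software-defined wide area network; calling a corresponding processing method from a fault library based on the fault information; and processing the fault corresponding to the fault information according to the processing method to obtain a processing result. When the workflow is interrupted, the method can automatically acquire fault notification information and, based on it, automatically locate the fault node, derive the fault information, and process the fault. This realizes automatic fault location and processing and achieves fault self-healing, improving fault-handling efficiency and reducing the impact of faults on the workflow.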

Description

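Fault handling method and device, and computer-readable storage medium
Cross-reference to related applications
The present invention is based on, and claims priority to, Chinese patent application No. 202111547268.3, filed on December 16, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of communication technology, and in particular to a fault handling method and device, and a computer-readable storage medium.
Background
With the rapid development of software and hardware, traditional networks are gradually being replaced by software-based equipment. A software-defined wide area network (Software-Defined WAN, SDWAN) is a service formed by applying software-defined networking (Software Defined Network, SDN) technology to wide area network scenarios. The service connects enterprise networks, data centers, Internet applications, and programs across a wide geographic area.
To make SDWAN services clearer and more controllable, a workflow mechanism has been introduced. The whole workflow includes user order placement, device configuration, billing, and completion. At present, when a node in the workflow fails, a person has to locate the fault node and resolve the fault manually. How to locate the fault node and resolve the fault without this manual effort is therefore a problem to be solved.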
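Summary of the invention
Embodiments of the present invention provide a fault handling method and device, and a computer-readable storage medium, which can automatically locate a fault node and process the fault, achieving fault self-healing and improving fault-handling efficiency.
The technical solution of the present invention is implemented as follows:
An embodiment of the present invention provides a fault handling method, the method including:
acquiring fault notification information;
deriving corresponding fault information based on the fault notification information, a topology, and a derivation rule base; wherein the topology represents the workflow of a service provided by a software-defined wide area network;
calling a corresponding processing method from a fault library based on the fault information;
processing the fault corresponding to the fault information according to the processing method, and obtaining a processing result.
In the above solution, deriving the corresponding fault information based on the fault notification information, the topology, and the derivation rule base includes:
determining a corresponding fault node according to the fault notification information and the topology;
deriving the fault information based on the fault node and the derivation rule base.
In the above solution, determining the corresponding fault node according to the fault notification information and the topology includes:
performing data graphing on the fault notification information to obtain a corresponding graph structure;
associating the graph structure with the topology to determine the fault node.
In the above solution, deriving the fault information based on the fault node and the derivation rule base includes:
matching a corresponding target rule in the derivation rule base according to the fault node;
deriving the fault information according to the target rule and the fault notification information.
In the above solution, acquiring the fault notification information includes:
collecting notification information; wherein the notification information includes the fault notification information and warning information;
formatting the warning information to obtain the fault notification information.
In the above solution, the method further includes:
performing iterative optimization based on the fault notification information, the target rule, the fault information, the processing method, and the processing result, to obtain an optimized rule;
updating the derivation rule base according to the optimized rule.
In the above solution, the method further includes:
displaying the derivation process of the fault information and the processing process of the fault information.
In the above solution, the method further includes:
feeding back the processing result.
An embodiment of the present invention provides a fault handling device, including an acquisition part, a derivation part, a calling part, and a processing part; wherein
the acquisition part is configured to acquire fault notification information;
the derivation part is configured to determine corresponding fault information based on the fault notification information, a topology, and a derivation rule base; wherein the topology represents the workflow of a service provided by a software-defined wide area network;
the calling part is configured to call a corresponding processing method from a fault library based on the fault information;
the processing part is configured to process the fault corresponding to the fault information according to the processing method, and obtain a processing result.
In the above solution, the derivation part is further configured to determine a corresponding fault node according to the fault notification information and the topology, and to derive the fault information based on the fault node and the derivation rule base.
In the above solution, the derivation part is further configured to perform data graphing on the fault notification information to obtain a corresponding graph structure, and to associate the graph structure with the topology to determine the fault node.
In the above solution, the derivation part is further configured to match a corresponding target rule in the derivation rule base according to the fault node, and to derive the fault information according to the target rule and the fault notification information.
In the above solution, the acquisition part is further configured to collect notification information, wherein the notification information includes the fault notification information and warning information, and to format the warning information to obtain the fault notification information.
In the above solution, the device further includes an iterative optimization part and an update part, wherein:
the iterative optimization part is configured to perform iterative optimization based on the fault notification information, the target rule, the fault information, the processing method, and the processing result, to obtain an optimized rule;
the update part is configured to update the derivation rule base according to the optimized rule.
In the above solution, the device further includes a display part, wherein:
the display part is configured to display the derivation process of the fault information and the processing process of the fault information.
In the above solution, the device further includes a feedback part, wherein:
the feedback part is configured to feed back the processing result.
An embodiment of the present invention provides a fault handling device, including:
a memory, configured to store executable instructions;
a processor, configured to implement the fault handling method provided by the embodiments of the present invention when executing the executable instructions stored in the memory.
An embodiment of the present invention provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the fault handling method provided by the embodiments of the present invention.
Embodiments of the present invention provide a fault handling method and device, and a computer-readable storage medium. The method includes: acquiring fault notification information; deriving corresponding fault information based on the fault notification information, a topology, and a derivation rule base, wherein the topology represents the workflow of a service provided by a software-defined wide area network; calling a corresponding processing method from a fault library based on the fault information; and processing the fault corresponding to the fault information according to the processing method to obtain a processing result. In this method, when a workflow node fails, the fault notification information is acquired and processed, and combined with the topology corresponding to the workflow nodes to obtain the fault node. The cause of the failure at the fault node, that is, the fault information, is derived from the fault node and the pre-stored derivation rules. Finally, based on the fault information, the corresponding processing method is called from the fault library to process the fault and obtain the processing result.
When the workflow is interrupted, the embodiments of the present invention can automatically acquire fault notification information and, based on it, automatically locate the fault node, derive the fault information, and process the fault. This realizes automatic fault location and processing and achieves fault self-healing, improving fault-handling efficiency and reducing the impact of faults on the workflow.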
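Brief description of the drawings
Fig. 1 is schematic flowchart I of an optional method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a fault handling device applied to SDWAN provided by an embodiment of the present invention;
Fig. 3 is schematic flowchart II of an optional method provided by an embodiment of the present invention;
Fig. 4 is schematic structural diagram I of a fault handling device provided by an embodiment of the present invention;
Fig. 5 is schematic flowchart III of an optional method provided by an embodiment of the present invention;
Fig. 6 is schematic flowchart IV of an optional method provided by an embodiment of the present invention;
Fig. 7a is schematic flowchart V of an optional method provided by an embodiment of the present invention;
Fig. 7b is schematic structural diagram II of a fault handling device provided by an embodiment of the present invention;
Fig. 8 is schematic structural diagram III of a fault handling device provided by an embodiment of the present invention;
Fig. 9 is schematic structural diagram IV of a fault handling device provided by an embodiment of the present invention.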
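Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
Before the solutions of the embodiments are introduced, the technical terms that may be used in them are briefly explained:
Software-defined wide area network (Software-Defined WAN, SDWAN): a service formed by applying software-defined networking technology to wide area network scenarios.
Software-defined networking (Software Defined Network, SDN): an innovative network architecture used to implement network virtualization. By separating the control plane of network devices from the data plane, it enables flexible control of network traffic and makes the network, acting as a pipe, more intelligent.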
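Fig. 1 is schematic flowchart I of an optional method provided by an embodiment of the present invention. As shown in Fig. 1, an embodiment of the present invention provides a fault handling method, including:
S101: Acquire fault notification information.
This embodiment applies to the scenario of acquiring fault notification information when a workflow node fails.
In this embodiment, the device acquires fault notification information when a workflow node fails.
In this embodiment, when a workflow node fails, the device collects notification information. The notification information may include fault notification information and warning information. The fault notification information records the fault information of the workflow node; by processing it, the device can obtain details such as the content of the fault at that node. The warning information reminds the user that the current workflow node has failed and serves as an alert.
In this embodiment, after collecting the notification information, the device needs to perform data processing on it to obtain the fault notification information. The device may transmit the notification information over a message channel such as a message queue (Message Queue, MQ) to the data module, which performs the data processing.
Exemplarily, the data module may format the notification information, formatting the warning information it contains. The data format used is as follows: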
Relateship(node1,node2)
Node(node1)
Node(node2)
Content:{alarmCode:10010……}
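As a rough illustration of the format above, the following Python sketch turns a raw warning into a record with the same three pieces: the relationship between two nodes, the nodes themselves, and a content payload. The field names mirror the listing; the function name `format_warning`, the input keys `src`, `dst` and `alarmCode`, and the sample node names are assumptions made for this example and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class FaultNotification:
    """Formatted record mirroring Relateship / Node / Content above."""
    relateship: tuple          # (node1, node2): the edge the alarm refers to
    nodes: tuple               # the two workflow nodes involved
    content: dict = field(default_factory=dict)  # e.g. {"alarmCode": 10010}


def format_warning(raw: dict) -> FaultNotification:
    """Format a raw warning (hypothetical keys 'src', 'dst', 'alarmCode')."""
    node1, node2 = raw["src"], raw["dst"]
    return FaultNotification(
        relateship=(node1, node2),
        nodes=(node1, node2),
        content={"alarmCode": raw["alarmCode"]},
    )


notification = format_warning({"src": "CPE", "dst": "PoP", "alarmCode": 10010})
```

Keeping the node pair and the alarm payload in one record is what lets a later step treat the notification as a small graph.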
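In this embodiment, the fault handling method can be used for SDWAN services. Exemplarily, Fig. 2 is a schematic structural diagram of a fault handling device applied to SDWAN provided by an embodiment of the present invention. As shown in Fig. 2, by introducing a workflow mechanism, SDWAN divides the whole service process into user order placement, device configuration, billing, and delivery. After the user places an order, the configured devices provide the service the user needs. The configured devices may consist of several parts, such as customer premise equipment (Customer Premise Equipment, CPE), points of presence (Point-of-Presence, PoP), VC/PE, and a controller. Each of these parts is a workflow node.
In this embodiment, the device may include a fault collection module 1, a data module 2, a graph module 3, an inference engine 4, a rule module 5, an action module 6, a machine learning module 7, and a UI module 8. When a workflow node fails, the device can handle the fault through S1-S8, as follows:
S1: The fault collection module 1 collects faults to obtain notification information, and sends the notification information to the data module 2.
S2: The data module 2 performs data processing on the notification information to obtain fault notification information, and sends the fault notification information to the graph module 3.
S3: The graph module 3 applies a graph algorithm to the fault notification information to obtain graph data (fault node information), and sends the graph data to the inference engine 4.
S4: The inference engine 4 calls the derivation rules stored in the rule module 5 and, combining the graph data and the fault notification information, derives the fault information.
S5: The inference engine 4 calls an action (processing method) from the action module 6 according to the fault information and the graph data, and processes the fault corresponding to the fault information through the action to obtain the corresponding processing result.
S6: The inference engine 4 sends the processing result to the machine learning module 7; the processing result carries the fault notification information, graph data, rule, fault information, and action.
S7: The machine learning module 7 iteratively optimizes the processing result to obtain an optimization result, and uses the optimization result to configure rules in the rule module 5.
S8: The UI module 8 presents the processing of the rule module 5, the action module 6, and the inference engine 4 on the front end.
It can be understood that having the device perform data processing on the collected notification information to obtain the fault notification information makes the later derivation of fault information from the collected information more convenient and improves working efficiency.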
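The hand-off from the fault collection module to the data module (S1 to S2) over a message channel might look like the sketch below. It uses Python's standard `queue.Queue` as an in-process stand-in for the MQ; the function names and the raw keys `node` and `code` are hypothetical, and a real deployment would use an actual message broker.

```python
import queue

# An in-process queue stands in for the MQ message channel.
channel = queue.Queue()


def collect_fault(raw_notification):
    """Fault collection module (S1): publish the collected notification."""
    channel.put(raw_notification)


def process_pending():
    """Data module (S2): drain the channel and format each notification."""
    formatted = []
    while not channel.empty():
        raw = channel.get()
        formatted.append({"node": raw["node"], "alarmCode": int(raw["code"])})
    return formatted


collect_fault({"node": "CPE", "code": "10010"})
collect_fault({"node": "PoP", "code": "10020"})
fault_notifications = process_pending()
```

Decoupling collection from processing through a queue means the collector never blocks on formatting work.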
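In this embodiment, S101 may further include S1011-S1012, as follows:
S1011: Collect notification information; the notification information includes fault notification information and warning information.
Some embodiments of the present invention apply to the scenario of collecting notification information.
In some embodiments of the present invention, when a workflow node fails, the device collects the fault notification information and the warning information, thereby obtaining the notification information.
It can be understood that collecting the notification information reveals the fault condition of the workflow node that has currently failed, providing data support for the subsequent handling of the fault.
S1012: Format the warning information to obtain the fault notification information.
Some embodiments of the present invention apply to the scenario of performing data processing on the notification information.
In some embodiments of the present invention, the device formats the warning information in the notification information, thereby obtaining the fault notification information.
It can be understood that formatting the warning information in the collected notification information yields the fault notification information, which makes the later derivation of fault information from the collected information more convenient and improves working efficiency.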
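S102: Derive corresponding fault information based on the fault notification information, the topology, and the derivation rule base; the topology represents the workflow of a service provided by the software-defined wide area network.
This embodiment applies to the scenario of obtaining fault information from point-line (node and edge) relationships combined with the rules in the derivation rule base.
In this embodiment, the device determines the fault node based on the fault notification information and the topology, and then derives the corresponding fault information based on the derivation rule base.
In this embodiment, the device builds and stores the topology from the workflow nodes using point-line relationships. It then performs data graphing on the fault notification information to obtain a corresponding graph structure, and uses the graph structure to search the topology and obtain a search result. The workflow node in the topology that corresponds to the search result is the fault node.
Exemplarily, the device may use a graph algorithm to perform data graphing on the fault notification information, thereby obtaining the graph structure.
In this embodiment, after graphing the fault notification information and obtaining the fault node, the device may call derivation rules from the derivation rule base and match them against the fault node until a rule that matches the fault node is found; that rule is taken as the target rule. From the target rule the device derives the cause of the failure at the fault node, that is, the corresponding fault information.
In this embodiment, the device may use the graph module to perform data graphing on the fault notification information. After the fault node is obtained, the graph module may send a derivation request to the inference engine in the device. On receiving the derivation request, the inference engine responds by retrieving the fault node from the graph module and searching the rule module of the device, so as to obtain from the derivation rule base of the rule module the target rule that matches the fault node. When the graph module sends the derivation request to the inference engine, the request may carry the fault node, and may also carry the graph structure, so that the inference engine does not need to retrieve the fault node from the graph module after receiving the request.
In this embodiment, the device derives the corresponding fault information from the target rule and the fault notification information. Exemplarily, the device may substitute the fault notification information into the target rule to derive the cause of the failure at the fault node, finally obtaining the corresponding fault information. The derivation rules are stored in the derivation rule base, and the data format in the derivation rule base is as follows: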
Figure PCTCN2022133149-appb-000001
Figure PCTCN2022133149-appb-000002
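It can be understood that processing the fault notification information identifies the workflow node that has failed, that is, the fault node. The target rule that matches the fault node then yields the fault information for the failure at that node, and processing the corresponding fault achieves fault self-healing. Using a graph algorithm can also speed up fault derivation.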
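In this embodiment, S102 may further include S1021-S1022, as follows:
S1021: Determine the corresponding fault node according to the fault notification information and the topology.
Some embodiments of the present invention apply to the scenario of determining the fault node.
In some embodiments of the present invention, the device determines the fault node that has failed in the workflow according to the fault notification information and the topology.
In some embodiments of the present invention, the topology may be pre-stored in the device, or may be built by the device when needed from the workflow nodes using point-line relationships.
It can be understood that processing the fault notification information identifies the workflow node that has failed, that is, the fault node, which makes it easier to determine the fault information.
In some embodiments of the present invention, S1021 may further include S10211-S10212, as follows:
S10211: Perform data graphing on the fault notification information to obtain a corresponding graph structure.
Some embodiments of the present invention apply to the scenario of processing the fault notification information.
In some embodiments of the present invention, the device performs data graphing on the fault notification information, thereby obtaining the corresponding graph structure.
In some embodiments of the present invention, the device converts the fault notification information into a graph structure through data graphing. The graph structure may consist of points and lines, so it can represent the point-line relationships corresponding to the fault notification information.
It can be understood that, once the fault notification information has been converted into a graph structure, the association between the graph structure and the topology identifies the workflow node that has failed, that is, the fault node, which makes it easier to determine the fault information.
S10212: Associate the graph structure with the topology to determine the fault node.
Some embodiments of the present invention apply to the scenario of determining the fault node.
In some embodiments of the present invention, the device determines the fault node by associating the graph structure with the topology.
In some embodiments of the present invention, the device associates the graph structure with the topology and, based on that association, searches for the part of the topology with the highest degree of association with the graph structure. The workflow node corresponding to that part is the fault node.
It can be understood that processing the fault notification information identifies the workflow node that has failed, that is, the fault node, which makes it easier to determine the fault information.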
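One way to picture S10211-S10212 is the Python sketch below. The patent does not define how the degree of association is computed, so the scoring used here (the number of notification edges that exist in the topology and touch a node) is purely an assumption of this example, as are the node names and the function name `locate_fault_node`.

```python
# Topology of the workflow as undirected edges between workflow nodes.
# The node names are illustrative only.
TOPOLOGY = {
    frozenset({"CPE", "PoP"}),
    frozenset({"PoP", "VC/PE"}),
    frozenset({"VC/PE", "controller"}),
}


def locate_fault_node(graph_edges, topology=TOPOLOGY):
    """Return the topology node most associated with the notification graph.

    Association is scored (an assumption of this sketch) as the number of
    notification edges that exist in the topology and touch the node.
    Returns None when no notification edge matches the topology.
    """
    scores = {}
    for edge in graph_edges:
        edge = frozenset(edge)
        if edge in topology:
            for node in edge:
                scores[node] = scores.get(node, 0) + 1
    if not scores:
        return None
    # Iterate names in sorted order so ties are broken deterministically.
    return max(sorted(scores), key=lambda n: scores[n])


fault_node = locate_fault_node([("CPE", "PoP"), ("PoP", "VC/PE")])
```

Here both notification edges touch `PoP`, so it scores highest and is reported as the fault node.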
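S1022: Derive the fault information based on the fault node and the derivation rule base.
Some embodiments of the present invention apply to the scenario of obtaining the cause of a fault through the rules in the derivation rule base.
In some embodiments of the present invention, the device determines, from the derivation rule base, the target rule that matches the fault node, and derives the fault information based on the target rule.
It can be understood that, once the fault information is obtained, processing the corresponding fault achieves fault self-healing.
In some embodiments of the present invention, S1022 may further include S10221-S10222, as follows:
S10221: Match the corresponding target rule in the derivation rule base according to the fault node.
Some embodiments of the present invention apply to the scenario of obtaining the target rule from the derivation rule base.
In some embodiments of the present invention, the device matches the derivation rules in the derivation rule base against the fault node, thereby obtaining the target rule.
In some embodiments of the present invention, when the device matches the derivation rules in the derivation rule base against the fault node, if the result is that a first derivation rule matches the fault node, the first derivation rule is the target rule. The first derivation rule is any derivation rule in the derivation rule base. In practice, each fault node may correspond to one or more derivation rules, and each derivation rule in the derivation rule base corresponds to one kind of fault information.
It can be understood that, once the target rule is obtained, the fault information follows from it, and only the fault corresponding to that fault information then needs to be processed, which makes fault handling more targeted.
S10222: Derive the fault information according to the target rule and the fault notification information.
Some embodiments of the present invention apply to the scenario of obtaining the fault information.
In some embodiments of the present invention, the device derives the fault information from the target rule obtained in S10221 combined with the fault notification information.
It can be understood that the fault information reveals the cause of the fault, and only the fault corresponding to that fault information then needs to be processed, which makes fault handling more targeted.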
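S10221 and S10222 can be sketched as a two-stage lookup. The actual rule format is given only in the figures referenced earlier, which are not reproduced here, so the rule fields (`node`, `alarm_code`, `fault_info`), the sample values, and both function names below are hypothetical.

```python
# Hypothetical rule base: each rule names the node it applies to, a
# condition on the notification content, and the fault information it yields.
RULE_BASE = [
    {"node": "PoP", "alarm_code": 10010, "fault_info": "tunnel_down"},
    {"node": "PoP", "alarm_code": 10020, "fault_info": "config_mismatch"},
    {"node": "CPE", "alarm_code": 10010, "fault_info": "device_offline"},
]


def match_target_rules(fault_node, rule_base=RULE_BASE):
    """S10221: select the rules registered for the fault node."""
    return [rule for rule in rule_base if rule["node"] == fault_node]


def derive_fault_info(target_rules, notification_content):
    """S10222: apply the target rules to the notification content."""
    for rule in target_rules:
        if rule["alarm_code"] == notification_content.get("alarmCode"):
            return rule["fault_info"]
    return None  # no rule explains this notification


rules = match_target_rules("PoP")
fault_info = derive_fault_info(rules, {"alarmCode": 10010})
```

Filtering by node first keeps the second stage small, which matches the text's point that a fault node may map to one or more rules while each rule maps to one kind of fault information.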
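S103: Call the corresponding processing method from the fault library based on the fault information.
This embodiment applies to the scenario of obtaining a processing method in order to process a fault.
In this embodiment, the device calls the corresponding processing method from the fault library based on the fault information obtained in S102.
In this embodiment, the device may call the processing method from the fault library of the action module in the device. The processing methods in the fault library, and/or the fault library in the action module, may be added to the device in a customized way as plugins, may be pre-stored in the device, or may be generated when a user modifies and saves a processing method or fault library already added to the device, with the new processing method or fault library then added to the device.
In this embodiment, when calling a processing method from the fault library, the device may rely not only on the fault information but also on the fault node or the fault notification information, to improve the accuracy of the processing method called.
It can be understood that calling the corresponding processing method from the fault library lays the foundation for the subsequent fault processing that achieves fault self-healing.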
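A plugin-style fault library could be approximated with a registry of callables, as in the sketch below. The decorator `register_action`, the fault name `tunnel_down`, and the placeholder action are all assumptions made for illustration; the patent does not specify how plugins are registered.

```python
FAULT_LIBRARY = {}  # fault information -> processing method (a callable)


def register_action(fault_info):
    """Decorator that adds a processing method to the fault library."""
    def decorator(func):
        FAULT_LIBRARY[fault_info] = func
        return func
    return decorator


@register_action("tunnel_down")
def rebuild_tunnel(fault_node):
    # Placeholder: a real action would reconfigure the device.
    return f"tunnel rebuilt on {fault_node}"


def call_processing_method(fault_info, fault_node):
    """Look up the processing method for the fault information and run it."""
    action = FAULT_LIBRARY.get(fault_info)
    if action is None:
        raise LookupError(f"no processing method for {fault_info!r}")
    return action(fault_node)


result = call_processing_method("tunnel_down", "PoP")
```

Passing the fault node alongside the fault information reflects the text's remark that the call may draw on more than the fault information alone.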
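S104: Process the fault corresponding to the fault information according to the processing method, and obtain a processing result.
This embodiment applies to the scenario of processing a fault.
In this embodiment, the device processes the fault corresponding to the fault information according to the processing method called in S103, and obtains the corresponding processing result.
In this embodiment, when the processing result is that the fault has been resolved, the device may feed back the processing result so that the service continues. For this feedback, the inference engine calls the processing method to process the fault corresponding to the fault information and obtains the processing result. The inference engine may then send the processing result to the controller in the device, which issues an instruction for the workflow node to continue the flow; alternatively, the processing result may be sent to the workflow node that failed, notifying it to continue the flow.
In this embodiment, when the processing result is that the fault is unresolved, the device may repeat S102-S104 until the processing result is that the fault has been resolved. While repeating S102-S104, the device may also issue a prompt notifying the user to handle the fault corresponding to the fault information manually.
In this embodiment, when the processing result is that the fault is unresolved, the device may send the fault information, fault notification information, fault node, and processing method to the machine learning module in the device. The machine learning module iteratively optimizes the processing method to obtain a new processing method, which is updated into the fault library and used to process the fault corresponding to the fault information.
It can be understood that the embodiments of the present invention can automatically locate the fault node and process the fault, realizing fault self-healing and improving fault-handling efficiency.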
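The repeat-until-resolved behaviour described for S102-S104 might be sketched as follows. The text repeats until the fault is resolved; the `max_attempts` bound is a safety limit added in this sketch, and the callables `derive`, `handle` and `notify_user` are hypothetical stand-ins for the device's modules.

```python
def self_heal(notification, derive, handle, notify_user, max_attempts=3):
    """Repeat derivation and handling until the fault is resolved.

    `derive`, `handle` and `notify_user` are caller-supplied callables;
    `max_attempts` is a safety bound added in this sketch.
    Returns (resolved, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        fault_info = derive(notification)        # S102
        resolved = handle(fault_info)            # S103 + S104
        if resolved:
            return True, attempt
        notify_user(fault_info, attempt)         # prompt manual handling
    return False, max_attempts


# Toy run: handling succeeds on the second attempt.
outcomes = iter([False, True])
prompts = []
status = self_heal(
    {"alarmCode": 10010},
    derive=lambda n: "tunnel_down",
    handle=lambda info: next(outcomes),
    notify_user=lambda info, attempt: prompts.append((info, attempt)),
)
```

In the toy run the user is prompted once, after the first failed attempt, and the loop stops as soon as handling reports success.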
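Based on Fig. 1, Fig. 3 is schematic flowchart II of an optional method provided by an embodiment of the present invention. As shown in Fig. 3, in some embodiments of the present invention, the fault handling method further includes:
S105: Display the derivation process of the fault information and the processing process of the fault information.
Some embodiments of the present invention apply to the scenario of displaying the fault handling flow.
In some embodiments of the present invention, the device displays the derivation process of the fault information and the processing process of the fault information.
In some embodiments of the present invention, the device displays the derivation process of the fault and the processing process of the fault information through the UI module in the device. The derivation process of the fault is S101-S103, and the processing process of the fault information is S104; since S101-S104 are described above, they are not repeated here.
In some embodiments of the present invention, Fig. 4 is schematic structural diagram I of a fault handling device provided by an embodiment of the present invention. As shown in Fig. 4, the UI module 41 in the device establishes communication connections with the action module 42, the derivation engine 43, and the rule module 44, so that the derivation rules in the derivation rule base of the rule module 44, as well as the processing methods called from the action module 42, can be configured through the UI module 41. The device may present, in graphical form, the derivation of the fault information by the derivation engine 43 and the processing of the fault information, that is, the fault self-healing flow.
In some embodiments of the present invention, the content presented by the device may also include the processing result, and the UI module in the device may be an interactive display such as a touch screen.
It can be understood that presenting the fault self-healing process helps the user understand the current working state of the workflow.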
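Based on Fig. 1, Fig. 5 is schematic flowchart III of an optional method provided by an embodiment of the present invention. As shown in Fig. 5, in some embodiments of the present invention, the fault handling method further includes:
S106: Feed back the processing result.
Some embodiments of the present invention apply to the scenario of feeding back the processing result.
In some embodiments of the present invention, after obtaining the processing result, the device feeds it back so that the workflow can continue.
In some embodiments of the present invention, the derivation engine may send the processing result to the controller of the device or to the workflow node that failed, thereby completing the feedback of the processing result.
It can be understood that feeding back the processing result lets the workflow resume as soon as possible, improving working efficiency.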
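Based on Fig. 1, Fig. 6 is schematic flowchart IV of an optional method provided by an embodiment of the present invention. As shown in Fig. 6, in some embodiments of the present invention, the fault handling method further includes:
S107: Perform iterative optimization based on the fault notification information, the target rule, the fault information, the processing method, and the processing result, to obtain an optimized rule.
Some embodiments of the present invention apply to the scenario of optimizing and updating the derivation rule base.
In some embodiments of the present invention, iterative optimization is performed on the fault corresponding to the fault information, based on the fault notification information, target rule, fault information, processing method, and processing result, to obtain the optimized rule.
In some embodiments of the present invention, the device may perform the iterative optimization through its machine learning module. Here, iterative optimization means processing the fault corresponding to the fault information so as to obtain a more accurate derivation rule, that is, the optimized rule. In practice, the machine learning module may also iteratively optimize the fault corresponding to the fault information to obtain a better processing method and update the fault library. The machine learning module performs the iterative optimization on the fault notification information, target rule, fault information, processing method, and processing result according to a preset algorithm, which can be added to the machine learning module through custom configuration or a plugin.
It can be understood that the device can optimize and update the derivation rule base, thereby improving the success rate of fault self-healing.
S108: Update the derivation rule base according to the optimized rule.
Some embodiments of the present invention apply to the scenario of updating the derivation rule base.
In some embodiments of the present invention, the device updates the derivation rules in the derivation rule base according to the optimized rule.
In some embodiments of the present invention, the device may use the optimized rule to overwrite the corresponding derivation rule in the derivation rule base, thereby completing the update; it may also save the optimized rule into the corresponding derivation rule base to complete the update.
It can be understood that the device can optimize and update the derivation rule base, thereby improving the success rate of fault self-healing.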
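The two update modes described for S108, overwriting the corresponding rule or saving the optimized rule as a new entry, can be sketched as a single upsert. Identifying the "corresponding" rule by the pair (`node`, `alarm_code`) is an assumption of this example, as are the rule fields themselves.

```python
def update_rule_base(rule_base, optimized_rule, key=("node", "alarm_code")):
    """Overwrite the rule with the same key, or append if none exists.

    Returns "overwritten" or "appended" so the caller can log the outcome.
    """
    ident = tuple(optimized_rule[k] for k in key)
    for index, rule in enumerate(rule_base):
        if tuple(rule[k] for k in key) == ident:
            rule_base[index] = optimized_rule
            return "overwritten"
    rule_base.append(optimized_rule)
    return "appended"


rule_base = [{"node": "PoP", "alarm_code": 10010, "fault_info": "tunnel_down"}]
first = update_rule_base(
    rule_base, {"node": "PoP", "alarm_code": 10010, "fault_info": "link_flap"})
second = update_rule_base(
    rule_base, {"node": "CPE", "alarm_code": 10030, "fault_info": "power_loss"})
```

The first call replaces the existing `PoP` rule in place; the second finds no matching key and adds a new rule.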
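Fig. 7a is schematic flowchart V of an optional method provided by an embodiment of the present invention. As shown in Fig. 7a, a fault handling method provided by an embodiment of the present invention includes:
S301: Collect faults to obtain notification information.
S302: Perform data processing on the notification information to obtain fault notification information.
S303: Determine the fault node according to the fault notification information.
S304: Call the target rule according to the fault node, and derive the fault information based on the target rule.
S305: Call a processing method according to the fault information, and process the fault with it to obtain a processing result.
S306: Feed the processing result back to the workflow node.
S307: Iteratively optimize the processing result according to the fault notification information, fault node, fault information, and target rule, to obtain an optimized derivation rule.
S308: Update the derivation rule base with the optimized derivation rule.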
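Steps S302-S308 can be read as a straight-line pipeline applied to a notification collected in S301. The sketch below wires the steps together with caller-supplied callables; the parameter names and the toy lambdas are hypothetical and only show the order of the data flow, not any real module implementation.

```python
def handle_fault(raw, *, process, locate, match, derive, act,
                 feedback, learn, update):
    """Run S302-S308 on a raw notification collected in S301."""
    notification = process(raw)                      # S302
    fault_node = locate(notification)                # S303
    target_rule = match(fault_node)                  # S304: rule lookup
    fault_info = derive(target_rule, notification)   # S304: derivation
    result = act(fault_info)                         # S305
    feedback(fault_node, result)                     # S306
    optimized = learn(notification, fault_node,
                      fault_info, target_rule, result)  # S307
    update(optimized)                                # S308
    return result


log = []
result = handle_fault(
    {"alarmCode": 10010, "node": "PoP"},
    process=lambda raw: dict(raw),
    locate=lambda n: n["node"],
    match=lambda node: {"node": node, "fault_info": "tunnel_down"},
    derive=lambda rule, n: rule["fault_info"],
    act=lambda info: "resolved",
    feedback=lambda node, res: log.append(("feedback", node, res)),
    learn=lambda *facts: {"node": "PoP", "fault_info": "tunnel_down"},
    update=lambda rule: log.append(("update", rule["fault_info"])),
)
```

Injecting each step as a callable keeps the ordering explicit while leaving every module replaceable.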
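In some embodiments of the present invention, Fig. 7b is schematic structural diagram II of a fault handling device provided by an embodiment of the present invention. As shown in Fig. 7b, the fault handling device includes a fault collection module 71, a data module 72, a graph module 73, a derivation engine 74, an action module 75, a rule module 76, a workflow node 77, and a machine learning module 78. S301-S308 can be executed by this fault handling device. Exemplarily, the fault collection module 71 establishes a communication connection with the data module 72; after collecting notification information, the fault collection module 71 sends it to the data module 72. The data module 72 establishes a communication connection with the graph module 73; after receiving the notification information, the data module 72 performs data processing on it to obtain the fault notification information and sends the fault notification information to the graph module 73. The graph module 73 establishes a communication connection with the derivation engine 74; after receiving the fault notification information, the graph module 73 performs data graphing on it to determine the fault node that has failed in the workflow, and then notifies the derivation engine 74 that the fault node has been determined. The derivation engine 74 establishes communication connections with the action module 75, the rule module 76, the workflow node 77, and the machine learning module 78. After receiving the notification from the graph module 73, the derivation engine 74 retrieves the fault node from the graph module 73, searches the rule module 76 for the derivation rule corresponding to the fault node and takes it as the target rule, derives the fault information based on the target rule, and calls the corresponding processing method from the action module 75 to process the fault corresponding to the fault information. Once the processing result is obtained, it is sent to the workflow node 77; at the same time, the derivation engine 74 sends the fault notification information, fault node, fault information, target rule, and processing result to the machine learning module 78. The machine learning module 78 establishes a communication connection with the rule module 76; after receiving the fault notification information, fault node, fault information, target rule, and processing result, it iteratively optimizes the processing result of the fault notification information, obtains an optimized derivation rule, and sends it to the rule module 76 to update the derivation rules.
It can be understood that, when the workflow is interrupted, the embodiments of the present invention can automatically acquire fault notification information and, based on it, automatically locate the fault node, derive the fault information, and process the fault. This realizes automatic fault location and processing and achieves fault self-healing; the derivation rule base can also be optimized and updated, improving the success rate of fault self-healing.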
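Fig. 8 is schematic structural diagram III of a fault handling device provided by an embodiment of the present invention. As shown in Fig. 8, an embodiment of the present invention provides a fault handling device suited to the fault handling method. The device 8 includes an acquisition part 81, a derivation part 82, a calling part 83, and a processing part 84; wherein
the acquisition part 81 is configured to acquire fault notification information;
the derivation part 82 is configured to determine corresponding fault information based on the fault notification information, the topology, and the derivation rule base; wherein the topology represents the workflow of a service provided by the software-defined wide area network;
the calling part 83 is configured to call the corresponding processing method from the fault library based on the fault information;
the processing part 84 is configured to process the fault corresponding to the fault information according to the processing method, and obtain a processing result.
In some embodiments of the present invention, the derivation part 82 is further configured to determine the corresponding fault node according to the fault notification information and the topology, and to derive the fault information based on the fault node and the derivation rule base.
In some embodiments of the present invention, the derivation part 82 is further configured to perform data graphing on the fault notification information to obtain a corresponding graph structure, and to associate the graph structure with the topology to determine the fault node.
In some embodiments of the present invention, the acquisition part 81 is further configured to collect notification information, wherein the notification information includes the fault notification information and warning information, and to format the warning information to obtain the fault notification information.
In some embodiments of the present invention, the device further includes an iterative optimization part 85 and an update part 86, wherein:
the iterative optimization part 85 is configured to perform iterative optimization based on the fault notification information, the target rule, the fault information, the processing method, and the processing result, to obtain an optimized rule;
the update part 86 is configured to update the derivation rule base according to the optimized rule.
In some embodiments of the present invention, the device further includes a display part 87, wherein:
the display part 87 is configured to display the derivation process of the fault information and the processing process of the fault information.
In some embodiments of the present invention, the device further includes a feedback part 88, wherein:
the feedback part 88 is configured to feed back the processing result.
It can be understood that, when the workflow is interrupted, the embodiments of the present invention can automatically acquire fault notification information and, based on it, automatically locate the fault node, derive the fault information, and process the fault. This realizes automatic fault location and processing and achieves fault self-healing; the derivation rule base can also be optimized and updated, improving the success rate of fault self-healing.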
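Fig. 9 is schematic structural diagram IV of a fault handling device provided by an embodiment of the present invention. As shown in Fig. 9, an embodiment of the present invention provides a fault handling device corresponding to the fault handling method. The fault handling device 9 includes a processor 91, a memory 92, and a communication bus 94. The memory 92 communicates with the processor 91 through the communication bus 94 and stores one or more programs executable by the processor 91; when the one or more programs are executed, the processor 91 performs the fault handling method according to the embodiment of the present invention. Specifically, the fault handling device 9 further includes a communication component 93 for data transmission, and there is at least one processor 91.
In this embodiment, the components of the fault handling device 9 are coupled together through the communication bus 94. It can be understood that the communication bus 94 is used to realize connection and communication between these components. In addition to a data bus, the communication bus 94 also includes a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are all labeled as communication bus 94 in Fig. 9.
An embodiment of the present invention provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the fault handling method provided by the embodiments of the present invention.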
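Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The above are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.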
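Industrial applicability
Embodiments of the present invention disclose a fault handling method and device, and a computer-readable storage medium. When a workflow node fails, the method acquires fault notification information, processes it, and combines it with the topology corresponding to the workflow nodes to obtain the fault node. From the fault node and the pre-stored derivation rules it derives the cause of the failure at the fault node, that is, the fault information. Finally, based on the fault information, it calls the corresponding processing method from the fault library to process the fault corresponding to the fault information and obtain the processing result. When the workflow is interrupted, the method can automatically acquire fault notification information and, based on it, automatically locate the fault node, derive the fault information, and process the fault. This realizes automatic fault location and processing and achieves fault self-healing, improving fault-handling efficiency and reducing the impact of faults on the workflow.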

Claims (18)

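  1. A fault handling method, the method comprising:
    acquiring fault notification information;
    deriving corresponding fault information based on the fault notification information, a topology, and a derivation rule base; wherein the topology represents the workflow of a service provided by a software-defined wide area network;
    calling a corresponding processing method from a fault library based on the fault information;
    processing the fault corresponding to the fault information according to the processing method, and obtaining a processing result.
  2. The method according to claim 1, wherein deriving the corresponding fault information based on the fault notification information, the topology, and the derivation rule base comprises:
    determining a corresponding fault node according to the fault notification information and the topology;
    deriving the fault information based on the fault node and the derivation rule base.
  3. The method according to claim 2, wherein determining the corresponding fault node according to the fault notification information and the topology comprises:
    performing data graphing on the fault notification information to obtain a corresponding graph structure;
    associating the graph structure with the topology to determine the fault node.
  4. The method according to claim 2, wherein deriving the fault information based on the fault node and the derivation rule base comprises:
    matching a corresponding target rule in the derivation rule base according to the fault node;
    deriving the fault information according to the target rule and the fault notification information.
  5. The method according to claim 1, wherein acquiring the fault notification information comprises:
    collecting notification information; wherein the notification information includes the fault notification information and warning information;
    formatting the warning information to obtain the fault notification information.
  6. The method according to claim 4, wherein the method further comprises:
    performing iterative optimization based on the fault notification information, the target rule, the fault information, the processing method, and the processing result, to obtain an optimized rule;
    updating the derivation rule base according to the optimized rule.
  7. The method according to claim 1, wherein the method further comprises:
    displaying the derivation process of the fault information and the processing process of the fault information.
  8. The method according to claim 1, wherein the method further comprises:
    feeding the processing result back to the current node.
  9. A fault handling device, comprising an acquisition part, a derivation part, a calling part, and a processing part; wherein
    the acquisition part is configured to acquire fault notification information;
    the derivation part is configured to determine corresponding fault information based on the fault notification information, a topology, and a derivation rule base; wherein the topology represents the workflow of a service provided by a software-defined wide area network;
    the calling part is configured to call a corresponding processing method from a fault library based on the fault information;
    the processing part is configured to process the fault corresponding to the fault information according to the processing method, and obtain a processing result.
  10. The device according to claim 9, wherein the derivation part is further configured to determine a corresponding fault node according to the fault notification information and the topology, and to derive the fault information based on the fault node and the derivation rule base.
  11. The device according to claim 10, wherein the derivation part is further configured to perform data graphing on the fault notification information to obtain a corresponding graph structure, and to associate the graph structure with the topology to determine the fault node.
  12. The device according to claim 10, wherein the derivation part is further configured to match a corresponding target rule in the derivation rule base according to the fault node, and to derive the fault information according to the target rule and the fault notification information.
  13. The device according to claim 9, wherein the acquisition part is further configured to collect notification information, wherein the notification information includes the fault notification information and warning information, and to format the warning information to obtain the fault notification information.
  14. The device according to claim 12, wherein the fault handling device further comprises an iterative optimization part and an update part; wherein
    the iterative optimization part is configured to perform iterative optimization based on the fault notification information, the target rule, the fault information, the processing method, and the processing result, to obtain an optimized rule;
    the update part is configured to update the derivation rule base according to the optimized rule.
  15. The device according to claim 9, wherein the fault handling device further comprises a display part; wherein
    the display part is configured to display the derivation process of the fault information and the processing process of the fault information.
  16. The device according to claim 9, wherein the fault handling device further comprises a feedback part; wherein
    the feedback part is configured to feed the processing result back to the current node.
  17. A fault handling device, comprising:
    a memory, configured to store executable instructions;
    a processor, configured to implement the method according to any one of claims 1 to 8 when executing the executable instructions stored in the memory.
  18. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 8.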
PCT/CN2022/133149 2021-12-16 2022-11-21 一种故障处理方法及装置、计算机可读存储介质 WO2023109437A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111547268.3 2021-12-16
CN202111547268.3A CN116266808A (zh) 2021-12-16 2021-12-16 一种故障处理方法及装置、计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2023109437A1 true WO2023109437A1 (zh) 2023-06-22

Family

ID=86742820

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/133149 WO2023109437A1 (zh) 2021-12-16 2022-11-21 一种故障处理方法及装置、计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN116266808A (zh)
WO (1) WO2023109437A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200314022A1 (en) * 2019-03-25 2020-10-01 Cisco Technology, Inc. PREDICTIVE ROUTING USING MACHINE LEARNING IN SD-WANs
US20200342346A1 (en) * 2019-04-24 2020-10-29 Cisco Technology, Inc. Adaptive threshold selection for sd-wan tunnel failure prediction
CN111934936A (zh) * 2020-09-10 2020-11-13 广州虎牙科技有限公司 网络状态检测方法、装置、电子设备及存储介质
CN113656252A (zh) * 2021-08-24 2021-11-16 北京百度网讯科技有限公司 故障定位方法、装置、电子设备以及存储介质

Also Published As

Publication number Publication date
CN116266808A (zh) 2023-06-20

Similar Documents

Publication Publication Date Title
JP2020514881A5 (zh)
US20160239288A1 (en) Device driver aggregation in operating system deployment
CN104125087B (zh) 一种告警信息处理方法及装置
US10097397B2 (en) System and method for managing CWSN with GUI
JP6050812B2 (ja) デバイス管理方法、装置、およびシステム
CN107426023A (zh) 云平台日志收集和转发方法、系统、设备及存储介质
US8407158B2 (en) System and method for providing interactive troubleshooting
CN104023213B (zh) 一种基于二维码的交互式服务方法和系统
JP2014158290A (ja) 情報システム、制御装置、通信方法およびプログラム
CN110932918A (zh) 日志数据采集方法、装置及存储介质
CN112437072A (zh) 一种云平台中虚拟机流量牵引系统、方法、设备及介质
WO2020220891A1 (zh) 用于生成物联网系统中的站点的配置文件的方法及装置
WO2023109437A1 (zh) 一种故障处理方法及装置、计算机可读存储介质
CN109348434A (zh) 一种场景信息的发送方法、发送装置及终端设备
WO2022062661A1 (zh) 操作通知方法和装置、存储介质和电子装置
CN112118600B (zh) 一种5g独立组网sa架构下的流量牵引系统
CN109842524A (zh) 自动升级方法、装置、电子设备及计算机可读存储介质
CN112118179B (zh) 创建端到端分段路由sr路径的方法及装置
US6883169B1 (en) Apparatus for managing the installation of software across a network
CN106933932B (zh) 数据处理方法、装置及应用服务器
US7509642B1 (en) Method and system for automatically providing network-transaction-status updates
WO2016202098A1 (zh) 一种以太网业务配置方法、装置及网管
US20150304200A1 (en) Traffic information collection system and collection control node
CN114327563A (zh) 数据同步方法及装置、系统、存储介质、计算机系统
CN109495178B (zh) 一种FTTx网络拓扑链路的构建方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22906189

Country of ref document: EP

Kind code of ref document: A1