WO2021103800A1 - Method, device, and storage medium for recommending fault repair operations - Google Patents

Method, device, and storage medium for recommending fault repair operations

Info

Publication number
WO2021103800A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
candidate
influence
information
repair
Prior art date
Application number
PCT/CN2020/118233
Other languages
English (en)
French (fr)
Inventor
廖文奇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP20892606.3A (EP4047481A4)
Publication of WO2021103800A1
Priority to US17/825,246 (US11743113B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0712 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a virtual computing platform, e.g. logically partitioned systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2252 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using fault dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/40 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using virtualisation of network functions or resources, e.g. SDN or NFV entities

Definitions

  • This application relates to the field of communications technology, and in particular to a method, device and storage medium for recommending fault repair operations.
  • An SDN controller can centrally control multiple nodes. When any of these nodes fails, the SDN controller may recommend a repair operation to that node to repair the fault.
  • In the related art, a plan library can be configured in the SDN controller based on manual experience; the plan library stores a mapping between fault information and repair plans.
  • After the SDN controller receives fault information from a faulty node, it searches the plan library for the repair plan corresponding to that fault information. If such a repair plan is found, the SDN controller displays the repair operations it contains, and one of them is manually selected as the fault repair operation and sent to the faulty node, so that the faulty node repairs the fault according to it. However, if the plan library contains no repair plan corresponding to the fault information, the SDN controller cannot provide a repair operation for the faulty node, and the fault cannot be repaired.
  • The present application provides a method, device, and storage medium for recommending fault repair operations, which can solve the problem in the related art that no repair operation can be provided for a faulty node when the plan library contains no repair plan corresponding to the fault information.
  • The technical solution is as follows:
  • A method for recommending fault repair operations includes: obtaining fault information of a faulty node; if the plan library contains no repair plan corresponding to the fault information, determining a recommended plan through a recommendation model based on the fault information, where the recommended plan includes one or more candidate operations; and determining a fault repair operation from the one or more candidate operations included in the recommended plan and recommending the fault repair operation to the faulty node, so that the faulty node repairs the fault according to the fault repair operation.
  • When the plan library contains no repair plan corresponding to the fault information, the fault information can be processed through the recommendation model to obtain a recommended plan, and an operation can then be selected from the one or more candidate operations in the recommended plan as the fault repair operation. This solves the problem that no repair operation can be provided for the faulty node when the plan library contains no corresponding repair plan.
  • The fault information includes multiple fault parameters, the parameters of the recommendation model include multiple fault feature factors, and each of the fault feature factors corresponds to one of the fault parameters. The implementation process of determining the recommended plan through the recommendation model may be: taking the multiple fault parameters as the input of the recommendation model, and determining the recommended plan through the recommendation model according to the multiple fault feature factors.
  • The parameters of the recommendation model further include a configuration impact factor, which includes the original configuration information of the faulty node and the physical topology information of the network where the faulty node is located.
  • The implementation process of determining the fault repair operation from the one or more candidate operations included in the recommended plan may be: taking a first candidate operation as the input of the recommendation model and, according to the configuration impact factor, determining through the recommendation model the configuration influence degree corresponding to the first candidate operation, where the configuration influence degree is the predicted degree of influence of the first candidate operation on the configuration of the faulty node and on the physical topology of the network where it is located, and the first candidate operation is any one of the one or more candidate operations; and using the candidate operation with the smallest configuration influence degree among the one or more candidate operations as the fault repair operation.
  • In this way, the embodiments of the present application can determine the configuration influence degree of each candidate operation and recommend a fault repair operation accordingly, so that the recommended fault repair operation has the least impact on the configuration of the faulty node and on the physical topology of the network where it is located.
  • The method further includes: receiving a repair result fed back by the faulty node after it performs fault repair according to the fault repair operation, where the repair result includes the fault ...; the model determines the data plane influence degree corresponding to the fault repair operation, where the data plane influence degree is the actual degree of influence of the fault repair operation on the configuration of the faulty node and on the physical topology of the network where it is located; generating a fault sample according to the data plane influence degree corresponding to the fault repair operation, the fault repair operation, and the fault information; and adjusting the parameters of the recommendation model according to the fault sample.
  • The DPV model can also be used to verify the impact of the fault repair operation on the data plane. Fault samples are then generated from the data plane influence degree, the fault repair operation, and the fault information to further adjust the parameters of the recommendation model, which improves the accuracy of the plans that the recommendation model recommends.
  • A method for recommending a fault repair operation includes: obtaining fault information of a faulty node; if the plan library contains a repair plan corresponding to the fault information, using that repair plan as the recommended plan, where the recommended plan includes one or more candidate operations; predicting the comprehensive influence degree of each of the one or more candidate operations, where the comprehensive influence degree indicates the magnitude of the corresponding candidate operation's comprehensive influence on the network where the faulty node is located; and determining the candidate operation with the smallest comprehensive influence degree among the one or more candidate operations as the fault repair operation and recommending it to the faulty node, so that the faulty node repairs the fault according to the fault repair operation.
  • When the plan library contains a repair plan corresponding to the fault information, the comprehensive influence degree of each candidate operation included in the repair plan can be determined, and the fault repair operation can then be chosen according to it, so that the comprehensive impact of the fault repair operation on the network where the faulty node is located is minimized.
  • Predicting the comprehensive influence degree of each of the one or more candidate operations included in the recommended plan includes: estimating the business influence degree of each candidate operation on the services in the network where the faulty node is located; estimating the configuration influence degree of each candidate operation on the configuration of the network where the faulty node is located; and determining the comprehensive influence degree corresponding to each candidate operation according to its business influence degree and configuration influence degree.
  • the implementation process of estimating the degree of influence of each candidate operation of the one or more candidate operations on the configuration of the network where the faulty node is located may be: obtaining the original configuration information and all the information of the faulty node.
  • CPV configuration plane verification
  • The implementation process of determining the comprehensive influence degree corresponding to each candidate operation according to its business influence degree and configuration influence degree may be: obtaining a business influence weight and a configuration influence weight; and determining the comprehensive influence degree corresponding to a first candidate operation according to the business influence degree corresponding to the first candidate operation, the business influence weight, the configuration influence degree corresponding to the first candidate operation, and the configuration influence weight.
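  • The weighting step can be illustrated with a minimal Python sketch. The text does not state how the two degrees and weights are combined, so the weighted sum below is an assumption, as are the class name `CandidateOperation`, the operation names, the weights 0.6 and 0.4, and the influence values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateOperation:
    name: str                  # hypothetical label for the operation
    business_influence: float  # estimated impact on services (illustrative scale 0..1)
    config_influence: float    # estimated impact on configuration (illustrative scale 0..1)


def comprehensive_influence(op, business_weight, config_weight):
    """Combine the two influence degrees with a weighted sum (assumed rule)."""
    return (business_weight * op.business_influence
            + config_weight * op.config_influence)


def pick_repair_operation(candidates, business_weight=0.6, config_weight=0.4):
    """Return the candidate with the smallest comprehensive influence degree."""
    if not candidates:
        raise ValueError("recommended plan contains no candidate operations")
    return min(
        candidates,
        key=lambda op: comprehensive_influence(op, business_weight, config_weight),
    )


# Illustrative values only; none of these come from the source text.
candidates = [
    CandidateOperation("shut_down_port", 0.8, 0.2),
    CandidateOperation("rate_limit_arp", 0.3, 0.4),
    CandidateOperation("isolate_node", 0.9, 0.7),
]
best = pick_repair_operation(candidates)
print(best.name)
```

  • With these illustrative numbers, the second candidate has the smallest weighted sum and would be the one recommended; changing the two weights shifts the balance between protecting services and protecting configuration.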
  • A device for recommending fault repair operations has the function of implementing the behavior of the method for recommending fault repair operations in the first or second aspect.
  • The device for recommending fault repair operations includes at least one module for implementing the method for recommending fault repair operations provided in the first aspect.
  • A device for recommending fault repair operations includes, in its structure, a processor and a memory. The memory stores a program that supports the device in executing the method for recommending fault repair operations provided in the first or second aspect, and stores the data used to implement that method.
  • The processor is configured to execute the program stored in the memory.
  • The device may further include a communication bus, which is used to establish a connection between the processor and the memory.
  • A computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method for recommending fault repair operations described in the first or second aspect.
  • A computer program product contains instructions that, when run on a computer, cause the computer to perform the method for recommending fault repair operations described in the first or second aspect.
  • FIG. 1 is a system architecture diagram involved in a method for recommending a fault repair operation provided by an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a method for recommending a fault repair operation provided by an embodiment of the present application
  • FIG. 4 is a flowchart of another method for recommending a fault repair operation provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a device for recommending fault repair operations according to an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of another device for recommending fault repair operations according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another device for recommending fault repair operations according to an embodiment of the present application.
  • FIG. 1 is a system architecture diagram involved in a method for recommending a fault repair operation provided by an embodiment of the present application.
  • the system includes a controller 101, an analyzer 102, and a plurality of physical nodes 103.
  • the controller 101 may communicate with the analyzer 102 and multiple physical nodes 103 respectively, and in addition, the analyzer 102 may also communicate with multiple physical nodes 103.
  • the controller 101 is used to centrally control the allocation of network resources.
  • the controller 101 may control data forwarding on multiple physical nodes 103.
  • The controller 101 may also receive the fault information reported by the analyzer 102 and use the method for recommending fault repair operations provided in the embodiments of the present application to recommend fault repair operations to the faulty device among the multiple physical nodes 103.
  • The analyzer 102 monitors in real time whether any of the multiple physical nodes 103 fails. After detecting that a physical node 103 has failed, it collects the fault information of the faulty node and reports it to the controller 101, so that the controller 101 recommends a repair operation to the faulty node according to the method provided in the embodiments of the present application.
  • the multiple physical nodes 103 may be multiple devices in the physical network.
  • the multiple physical nodes 103 are used to receive and/or send service data.
  • The controller 101 and the analyzer 102 may be deployed on two independent devices, or they may be integrated into one device; this is not limited in the embodiments of the present application. Alternatively, the controller 101 may itself provide the function of the analyzer 102, in which case the system need not include the analyzer 102. In addition, the multiple physical nodes 103 may be network devices such as switches and routers, which is not limited in the embodiments of the present application.
  • FIG. 2 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • the controller 101 in FIG. 1 may be implemented by the network device shown in FIG. 2.
  • the network device includes at least one processor 201, a communication bus 202, a memory 203, and at least one communication interface 204.
  • The processor 201 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control execution of the programs of this application.
  • the communication bus 202 may include a path for transferring information between the above-mentioned components.
  • The memory 203 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. It can also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited to these.
  • the memory 203 may exist independently and is connected to the processor 201 through the communication bus 202.
  • the memory 203 may also be integrated with the processor 201.
  • The communication interface 204 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • the processor 201 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 2.
  • The network device may include multiple processors, such as the processor 201 and the processor 205 shown in FIG. 2.
  • Each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • The network device may further include an output device 206 and an input device 207.
  • the output device 206 communicates with the processor 201 and can display information in a variety of ways.
  • The output device 206 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
  • the input device 207 communicates with the processor 201, and can receive user input in a variety of ways.
  • the input device 207 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like.
  • The memory 203 stores the program code for executing the solution of the present application, and the processor 201 controls its execution.
  • the processor 201 is configured to execute the program code 208 stored in the memory 203.
  • the program code 208 may include one or more software modules.
  • The controller shown in FIG. 1 can recommend fault repair operations for the faulty device through the processor 201 and one or more software modules in the program code 208 stored in the memory 203.
  • the processor in FIG. 2 reads the code in the memory, so that the network device shown in FIG. 2 can execute part or all of the operations performed by the controller in each embodiment of the present application.
  • FIG. 3 is a flowchart of a method for recommending a fault repair operation provided by an embodiment of the present application.
  • The method for recommending fault repair operations can be applied to the controller 101 shown in FIG. 1. Referring to FIG. 3, the method includes:
  • Step 301: Obtain fault information of the faulty node.
  • the controller may receive the fault information of the faulty node reported by the analyzer.
  • the analyzer can monitor multiple physical nodes in real time. When a failure of a physical node is detected, information about the failure of the physical node can be collected, and the information can be reported to the controller as failure information.
  • The controller can also monitor multiple physical nodes itself; when it detects that a physical node is faulty, it collects the relevant information about that node's fault and uses the collected information as the fault information.
  • the physical node that has failed is the failed node, and the failure of the failed node may be referred to as the first failure.
  • the fault information may include multiple fault parameters.
  • The multiple fault parameters may be two or more of the following: fault identifier, fault location, fault occurrence time, fault log, and fault type.
  • For example, the fault identifier included in the fault information is an Address Resolution Protocol (ARP) attack, indicating that the fault that occurred is an ARP attack.
  • The fault location is node 1 port 2, indicating that the faulty node is node 1 and that the fault occurred at port 2 of node 1.
  • The fault occurrence time is 2019-07-14-08:30, and the fault type is 01. Note that different fault identifiers may correspond to the same fault type or to different fault types; that is, faults with different names may be of the same type.
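  • As an illustration, the example fault information above could be held in a small record like the following. This is a sketch only: the class name `FaultInfo`, its field names, and the dictionary keys are hypothetical and do not come from the application, and the fault log is omitted because the text gives no example of one.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FaultInfo:
    fault_id: str          # fault identifier, e.g. "ARP attack"
    location: str          # fault location, e.g. "node 1 port 2"
    occurred_at: datetime  # fault occurrence time
    fault_type: str        # fault type code, e.g. "01"

    def parameters(self):
        """Return the fault parameters as a plain dict (hypothetical key names)."""
        return {
            "fault_id": self.fault_id,
            "fault_location": self.location,
            "fault_time": self.occurred_at.isoformat(),
            "fault_type": self.fault_type,
        }


# The values below are the ones given in the example in the text.
info = FaultInfo(
    fault_id="ARP attack",
    location="node 1 port 2",
    occurred_at=datetime(2019, 7, 14, 8, 30),
    fault_type="01",
)
print(info.parameters())
```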
  • After a fault occurs, the physical node may continue to report fault information to the controller.
  • In this case, the controller may receive multiple identical pieces of fault information sent by the same faulty node. Alternatively, multiple physical nodes may have the same fault in the same time period, in which case the controller receives multiple similar pieces of fault information in that period.
  • The controller may therefore recommend repair operations at a preset time interval. Each time it does so, the controller obtains the pieces of fault information received within the most recent preset time interval and performs cluster analysis on them, dividing them into multiple categories. The controller can then recommend a repair operation for each category through the methods described below.
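  • The windowing and grouping described above might look like the following sketch. The text does not say which clustering algorithm is used, so grouping by the exact pair (fault identifier, fault type) is a deliberately simple stand-in rather than the application's method. The function name, the record keys, the 10-minute interval, and the "link down" / "02" records are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_recent_faults(faults, now, interval):
    """Group fault records received within `interval` before `now`.

    Each record is a dict with 'fault_id', 'fault_type' and 'received_at'.
    Exact-key grouping stands in for the unspecified cluster analysis.
    """
    groups = defaultdict(list)
    for fault in faults:
        if now - interval <= fault["received_at"] <= now:
            groups[(fault["fault_id"], fault["fault_type"])].append(fault)
    return dict(groups)


now = datetime(2019, 7, 14, 8, 35)
faults = [
    {"fault_id": "ARP attack", "fault_type": "01",
     "location": "node 1 port 2", "received_at": datetime(2019, 7, 14, 8, 30)},
    {"fault_id": "ARP attack", "fault_type": "01",
     "location": "node 3 port 1", "received_at": datetime(2019, 7, 14, 8, 32)},
    {"fault_id": "link down", "fault_type": "02",
     "location": "node 2 port 4", "received_at": datetime(2019, 7, 14, 8, 33)},
    # Received before the window opens, so it is ignored.
    {"fault_id": "ARP attack", "fault_type": "01",
     "location": "node 1 port 2", "received_at": datetime(2019, 7, 14, 8, 0)},
]
groups = group_recent_faults(faults, now, timedelta(minutes=10))
for key, members in groups.items():
    print(key, len(members))
```

  • A repair operation would then be recommended once per group rather than once per received message.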
  • Step 302: Search the plan library for a repair plan corresponding to the fault information.
  • The controller may generate a first fault event from the fault information; the first fault event includes an event identifier, event features, and an event status.
  • The event identifier uniquely identifies the first fault event, the event features include the multiple fault parameters in the fault information, and the event status is the current state of the first fault event.
  • The event status is updated in real time. For example, when the recommended plan is obtained in the subsequent step 303, the event status of the first fault event may be updated to indicate that the recommended plan corresponding to the first fault event has been obtained.
  • The controller can search the plan library, according to the event features of the first fault event, for a second fault event similar to the first fault event, and then use the repair plan corresponding to the found second fault event as the repair plan corresponding to the fault information.
  • each failure event can be stored in the plan library.
  • Each repair plan may include one or more repair operations, and the repair operations include instructions for instructing the faulty node to perform fault repair.
  • the event feature includes multiple fault parameters.
  • The controller can calculate the similarity between the first fault event and each fault event in the plan library according to the fault parameters included in the event features of the first fault event and the fault parameters included in the event features of each fault event in the plan library.
  • the controller may determine the similarity between the first failure event and any one of the failure events in the plan library through the following algorithm (1).
  • Here, Q denotes the first fault event; C denotes a fault event in the plan library; F denotes the event features of the first fault event; f denotes the identifier of one fault parameter in those event features; w_f denotes the weight corresponding to that fault parameter; q_f is that fault parameter in the first fault event; c_f is the fault parameter of the same type as q_f in the fault event from the plan library; and σ_f() is the similarity calculation function.
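  • The formula for algorithm (1) is not reproduced in this text. One form consistent with the symbol definitions is a weighted sum of per-parameter similarities. The expression below is an assumed reconstruction, not the application's formula: the name Sim and the normalization by the total weight are assumptions, and σ_f stands for the per-parameter similarity calculation function.

```latex
\mathrm{Sim}(Q, C) \;=\; \frac{\sum_{f \in F} w_f \,\sigma_f(q_f, c_f)}{\sum_{f \in F} w_f}
```

  • Under this assumed form, if every σ_f returns a value in [0, 1] and the weights are non-negative with a positive sum, the result also lies in [0, 1], which makes it straightforward to compare against a fixed threshold.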
  • the controller can calculate the similarity between the first failure event and each failure event in the plan library. After that, the controller can search from a plurality of similarities whether there is a similarity greater than the first threshold. If there is no similarity greater than the first threshold among the multiple calculated similarities, the controller may determine that there is no repair plan corresponding to the fault information in the plan library, and then the controller may execute steps 303 and 304.
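  • The lookup described above can be sketched as follows. Several points are assumptions: the weighted, normalized similarity is one plausible reading of the symbol definitions for algorithm (1), the exact-match per-parameter similarity is a stand-in, and returning the plan of the most similar stored event above the threshold is a choice the text does not spell out. All function names, weights, and library entries are hypothetical.

```python
def parameter_similarity(q, c):
    """Per-parameter similarity: 1.0 on exact match, else 0.0 (assumed stand-in)."""
    return 1.0 if q == c else 0.0


def event_similarity(query, case, weights):
    """Weighted, normalized similarity between two fault-parameter dicts."""
    total_weight = sum(weights[f] for f in query)
    if total_weight == 0:
        return 0.0
    score = sum(
        weights[f] * parameter_similarity(query[f], case.get(f))
        for f in query
    )
    return score / total_weight


def find_repair_plan(query, plan_library, weights, threshold):
    """Return the plan of the most similar stored event whose similarity
    exceeds `threshold`, or None if no stored event qualifies."""
    best_plan, best_score = None, threshold
    for case, plan in plan_library:
        score = event_similarity(query, case, weights)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan


# Hypothetical weights and library contents, for illustration only.
weights = {"fault_id": 0.5, "fault_type": 0.3, "location": 0.2}
plan_library = [
    ({"fault_id": "ARP attack", "fault_type": "01", "location": "node 1 port 2"},
     ["hypothetical operation A", "hypothetical operation B"]),
    ({"fault_id": "link down", "fault_type": "02", "location": "node 2 port 4"},
     ["hypothetical operation C"]),
]
query = {"fault_id": "ARP attack", "fault_type": "01", "location": "node 4 port 1"}
plan = find_repair_plan(query, plan_library, weights, threshold=0.7)
print(plan)
```

  • When `find_repair_plan` returns None, the controller would fall through to steps 303 and 304 and ask the recommendation model for a plan instead.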
  • the main purpose of the failure event is to manage the received failure information.
  • the step of generating the failure event based on the failure information is an optional step.
  • the plan library can store the mapping relationship between different fault parameter sets and the repair plan.
  • the controller can directly use the above algorithm (1) to match the fault parameters contained in the fault information against the fault parameters in the different fault parameter sets stored in the plan library, so as to obtain the similarity between the fault information and each fault parameter set.
  • Step 303 If there is no repair plan corresponding to the fault information in the plan library, the recommended plan is determined through the recommendation model according to the fault information.
  • the controller cannot obtain the repair plan corresponding to the fault information from the plan library. In this case, the controller can determine the recommended plan through the recommended model according to the fault information.
  • the parameters of the recommended model may include multiple fault characteristic factors, and each fault characteristic factor may correspond to one of the multiple fault parameters included in the fault information.
  • multiple fault feature factors may include topological feature factors corresponding to the fault location, time feature factors corresponding to the time of occurrence of the fault, and source feature factors corresponding to the source of the fault.
  • the topological feature factor can be a networking association matrix. Each element in the networking association matrix represents a node in the current physical network, and the value of the element corresponding to each node is determined according to the distance between that node and the faulty node.
  • the more hops between a node and the faulty node the farther the distance from the faulty node, and accordingly, the smaller the value of the element corresponding to the node.
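  • As an illustration of the topological feature factor, the sketch below computes hop counts from the faulty node with a breadth-first search and maps them to element values that shrink as the hop count grows. The decay rule 1 / (1 + hops), the value 0.0 for unreachable nodes, and the function names are assumptions made for the example; the application only states that more hops means a smaller value.

```python
from collections import deque
from typing import Dict, Iterable


def hop_counts(adjacency: Dict[str, Iterable[str]], faulty_node: str) -> Dict[str, int]:
    """Breadth-first search: number of hops from the faulty node to each reachable node."""
    hops = {faulty_node: 0}
    queue = deque([faulty_node])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


def topology_feature(adjacency: Dict[str, Iterable[str]], faulty_node: str) -> Dict[str, float]:
    """One element per node; farther nodes get smaller values (assumed decay rule)."""
    hops = hop_counts(adjacency, faulty_node)
    return {node: (1.0 / (1 + hops[node]) if node in hops else 0.0)
            for node in adjacency}
```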
  • the value of the time characteristic factor is determined by the time when the fault occurs. Since the same failure occurs at the same time, the time characteristic factor of the recommended model when processing the failure information can be determined according to the time when the failure occurs.
  • the value of the source characteristic factor can be determined according to the source of the fault.
  • the fault source refers to the source address of the fault.
  • the source of the failure can be the attack source IP address of the ARP attack.
  • parameters of the recommendation model may also include text correlation feature factors, correlation application feature factors, and so on.
  • the value of the text correlation feature factor can be determined according to the fault type, fault location, etc.
  • the value of the correlation application feature factor can be determined according to the application to which the fault location is associated.
  • the controller can take multiple fault parameters as the input of the recommended model, and determine the recommended plan through the recommended model according to the multiple fault feature factors, where each feature factor included in the recommended model can be used as the weight of the corresponding fault parameter.
  • the recommended plan may include one or more candidate operations.
  • Step 304 Determine a fault repair operation from one or more candidate operations included in the recommended plan, and send the fault repair operation to the faulty node.
  • after the recommendation model processes the fault information to obtain the recommended plan, it can further perform configuration impact analysis on each of the one or more candidate operations included in the recommended plan, and then obtain the fault repair operation from the one or more candidate operations.
  • the parameters of the recommended model may also include a configuration impact factor, and the configuration impact factor includes the original configuration information of the faulty node and the physical topology information of the network where the faulty node is located.
  • the original configuration information of the failed node may be the original configuration baseline of the failed node.
  • the physical topology information of the network where the faulty node is located may be used to indicate the topology relationship of each node in the physical network.
  • the recommendation model can process each candidate operation together with the configuration influence factor to obtain the configuration influence degree corresponding to that candidate operation, and then regard the candidate operation with the least configuration influence among the one or more candidate operations as the fault repair operation. Take any candidate operation among the one or more candidate operations as an example, and call it the first candidate operation.
  • the first candidate operation can be used as the input of the recommendation model, and the recommendation model determines the configuration influence degree corresponding to the first candidate operation according to the configuration influence factor.
  • for each other candidate operation, the processing of the first candidate operation can be referred to in order to obtain the configuration influence degree corresponding to that candidate operation. After that, the recommendation model can output the candidate operation with the least configuration influence as the fault repair operation.
  • the configuration influence degree is the predicted influence degree of the first candidate operation on the configuration of the faulty node and the physical topology of the network where it is located.
  • the configuration influence degree may represent the degree of connectivity between the faulty node and other nodes in the physical network after the first fault occurred on the faulty node is repaired by using the corresponding candidate operation.
  • the recommendation model may also perform business impact analysis on each candidate operation to obtain the business impact degree corresponding to each candidate operation. After that, the recommendation model can determine the comprehensive influence degree of each candidate operation according to the configuration influence degree and business influence degree corresponding to each candidate operation, and then use the candidate operation with the least comprehensive influence degree as a fault repair operation.
  • the recommendation model may also output all the candidate operations and the corresponding configuration influence degree.
  • the controller may search for a corresponding candidate operation whose configuration influence degree is less than the second threshold, and if it finds it, use one of the found candidate operations as a fault repair operation. For example, the candidate operation with the least configuration influence among the found candidate operations is regarded as the fault repair operation. If it is not found, the controller can reprocess the fault information through the recommended model, and in the process of reprocessing, the parameters of the recommended model can be adjusted.
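  • The threshold-based selection described above can be sketched as follows. This is only an illustration: the function name `pick_repair_operation`, the dictionary representation of candidate operations, and the use of `None` to signal that the fault information needs to be reprocessed are assumptions.

```python
from typing import Dict, Optional


def pick_repair_operation(config_influence: Dict[str, float],
                          second_threshold: float) -> Optional[str]:
    """Return the candidate with the least configuration influence below the threshold.

    Returns None when no candidate is below the second threshold; the caller
    would then reprocess the fault information with adjusted model parameters.
    """
    eligible = {op: degree for op, degree in config_influence.items()
                if degree < second_threshold}
    if not eligible:
        return None
    return min(eligible, key=eligible.get)
```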
  • the recommendation model may not perform configuration impact analysis, but directly output the candidate operation as a fault repair operation.
  • the controller may send the fault repair operation to the faulty node, so that the faulty node performs the fault repair according to the fault repair operation.
  • the faulty node may also feed back the repair result to the controller.
  • the repair result includes the repaired routing information of the faulty node and the physical topology information of the current network.
  • the routing information after the faulty node is repaired may include the forwarding table or the routing table after the faulty node is repaired.
  • after the controller receives the repair result fed back by the faulty node, it can use the repaired routing information included in the repair result and the physical topology information of the network where the faulty node is located as the input of the DPV model, and the DPV model determines the actual degree of influence of the fault repair operation on the configuration of the faulty node and the physical topology of the network where it is located, that is, the data plane influence degree. After that, the controller can generate a fault sample according to the data plane influence degree, the fault repair operation and the fault information, and then adjust the parameters of the recommended model according to the fault sample. That is, the fault sample can be used as sample data to train the recommendation model.
  • the actual impact of the fault repair operation on the network where the faulty node is located can be obtained through the DPV model, and the predicted impact degree of the candidate operation on the network where the faulty node is located can be obtained through the CPV model.
  • the recommendation model is further adjusted according to the degree of influence of the data plane determined by the DPV model, which can improve the recommendation accuracy of the recommendation model.
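  • A fault sample combines the fault information, the fault repair operation that was issued, and the data plane influence degree reported by the DPV model. The sketch below shows one possible way to record such samples for later training; the class name, field names, and the plain list used as a sample store are assumptions, and how the samples adjust the model parameters is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FaultSample:
    """One training sample built from a completed repair (field names are assumptions)."""
    fault_info: Dict[str, str]
    repair_operation: str
    data_plane_influence: float


def record_repair_outcome(samples: List[FaultSample],
                          fault_info: Dict[str, str],
                          repair_operation: str,
                          data_plane_influence: float) -> FaultSample:
    """Build a fault sample from the DPV result and append it to the sample store."""
    sample = FaultSample(dict(fault_info), repair_operation, data_plane_influence)
    samples.append(sample)
    return sample
```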
  • the controller can not only determine the data plane influence degree according to the repair result, but also analyze the impact of the fault repair operation on the services in the physical network based on the real-time service traffic in the network where the faulty node is located. Afterwards, the controller can determine the comprehensive influence degree after repair according to the service traffic influence degree and the data plane influence degree, and then generate fault samples based on the comprehensive influence degree after repair to adjust the parameters of the recommended model.
  • the fault information can be processed through the recommendation model to obtain the recommended plan, and then one operation is selected from the one or more candidate operations included in the recommended plan as the fault repair operation. This solves the problem that a repair operation cannot be provided for the faulty node when the repair plan corresponding to the fault information does not exist in the plan library.
  • the configuration influence degree of each candidate operation included in the recommended plan can be estimated in advance, and then the candidate operation with the least configuration influence degree is used as the fault repair operation. In this way, the impact of the faulty node on each node in the network when the faulty node is repaired through the fault repair operation can be minimized.
  • the foregoing embodiment mainly introduces the implementation process of how to recommend a fault repair operation to the faulty node based on the fault information of the faulty node when there is no repair plan corresponding to the fault information in the plan library.
  • the following describes the implementation process of the controller recommending fault repair operations to the faulty node based on the fault information when a repair plan corresponding to the fault information exists in the plan library.
  • Step 401 Obtain fault information of the faulty node.
  • for the implementation of this step, reference may be made to step 301 in the foregoing embodiment, and details are not described herein again in the embodiment of the present application.
  • Step 402 Search from the plan library whether there is a repair plan corresponding to the fault information.
  • the controller may refer to the method introduced in step 302 in the foregoing embodiment to determine the similarity between the first fault event and each fault event in the plan library, and obtain multiple similarities. If there is a similarity greater than the first threshold among the multiple similarities, it can be determined that there is a repair plan corresponding to the fault information in the plan library, and then the controller can execute steps 403-405.
  • Step 403 If there is a repair plan corresponding to the fault information in the plan library, the repair plan corresponding to the fault information is taken as the recommended plan.
  • the controller may obtain the first similarity from the similarity greater than the first threshold, and use the fault event whose similarity with the first fault event is the first similarity as the second fault event. After that, the controller may obtain the repair plan corresponding to the second fault event, and use the repair plan as a recommended plan.
  • the first degree of similarity may be the maximum degree of similarity among similarities greater than the first threshold.
  • the controller may regard all the fault events whose similarity with the first fault event is greater than the first threshold value as the second fault event. That is, there may be multiple second failure events. In this case, the controller may obtain multiple recommended plans from the plan library according to the second failure event.
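  • The two ways of choosing the second fault event described above (only the most similar event, or every event above the first threshold) can be sketched as follows. The function name `select_second_events` and the `all_matches` flag are hypothetical.

```python
from typing import Dict, List


def select_second_events(similarities: Dict[str, float],
                         first_threshold: float,
                         all_matches: bool = False) -> List[str]:
    """Return the stored fault events whose repair plans become recommended plans.

    With all_matches=False only the event with the maximum similarity is returned;
    with all_matches=True every event above the first threshold is returned,
    ordered from most to least similar.
    """
    above = {event: s for event, s in similarities.items() if s > first_threshold}
    if not above:
        return []
    if all_matches:
        return sorted(above, key=above.get, reverse=True)
    return [max(above, key=above.get)]
```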
  • Step 404 Predict the comprehensive influence degree of each candidate operation in one or more candidate operations included in the recommended plan.
  • after obtaining the recommended plan, for the one or more candidate operations included in the recommended plan, the controller can estimate the degree of comprehensive influence of each candidate operation on the network where the faulty node is located.
  • the controller may estimate the degree of impact of each of the one or more candidate operations included in the recommended plan on the business in the network where the faulty node is located; estimate the degree of influence of each of the one or more candidate operations on the configuration of the network where the faulty node is located; and determine the degree of comprehensive influence corresponding to each candidate operation according to the degree of business influence and the degree of configuration influence corresponding to that candidate operation.
  • the controller can estimate the impact on the services currently carried by the faulty node after the first candidate operation is used to repair the fault that occurs on the faulty node.
  • the smaller the impact the smaller the degree of business impact. For example, assuming that the first candidate operation is to isolate a certain port on the faulty node, the controller can estimate whether the existing virtual machine services will be affected after the port is isolated. If it is affected, the degree of business influence can be determined to be the first value, otherwise, the degree of business influence can be determined to be the second value.
  • the controller may use the foregoing method to estimate the degree of business impact corresponding to each candidate operation.
  • the controller may also estimate the degree of influence of each candidate operation on the configuration of the network where the faulty node is located.
  • the configuration influence degree may represent the degree of connectivity between the faulty node and other nodes in the physical network after the first fault occurred on the faulty node is repaired by using the corresponding candidate operation. Still taking any one of the one or more candidate operations as an example, for convenience of description, it is referred to as the first candidate operation.
  • the controller may obtain the original configuration information of the faulty node and the physical topology information of the network where the faulty node is located; generate, according to the first candidate operation and the original configuration information of the faulty node, predicted configuration information corresponding to the first candidate operation, where the first candidate operation is any one of the one or more candidate operations; and use the predicted configuration information corresponding to the first candidate operation, the original configuration information of the faulty node, and the physical topology information as the input of the configuration plane verification (CPV) model, through which the configuration influence degree corresponding to the first candidate operation is determined.
  • the degree of configuration influence corresponding to the first candidate operation can be the third value or the fourth value.
  • when the configuration influence degree is the third value, it indicates that the first candidate operation has no influence on the configuration of the faulty node, that is, the first candidate operation has no influence on the connection between the faulty node and other nodes.
  • when the configuration influence degree is the fourth value, it indicates that the first candidate operation has an influence on the configuration of the faulty node, that is, the first candidate operation has an influence on the connection between the faulty node and other nodes.
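  • The application does not describe the internals of the CPV model. Purely as an illustration of a two-valued configuration influence degree, the sketch below compares which nodes the faulty node can reach before and after a candidate operation, treating each configuration as a set of usable links. The constants `NO_INFLUENCE` and `HAS_INFLUENCE` stand in for the third and fourth values, and the link-set representation and function names are assumptions.

```python
from collections import deque
from typing import Iterable, Set, Tuple

NO_INFLUENCE = 0   # stands in for the "third value" (assumed encoding)
HAS_INFLUENCE = 1  # stands in for the "fourth value" (assumed encoding)

Link = Tuple[str, str]


def reachable(links: Iterable[Link], start: str) -> Set[str]:
    """Nodes reachable from start over undirected links (breadth-first search)."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def config_influence(original_links: Iterable[Link],
                     predicted_links: Iterable[Link],
                     faulty_node: str) -> int:
    """NO_INFLUENCE if every node reachable before is still reachable afterwards."""
    before = reachable(original_links, faulty_node)
    after = reachable(predicted_links, faulty_node)
    return NO_INFLUENCE if before <= after else HAS_INFLUENCE
```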
  • the controller can store the physical topology information of the entire physical network and the configuration baseline of each physical node in the physical network.
  • the controller can obtain the physical topology information, obtain the configuration baseline of the faulty node, and use the configuration baseline as the original configuration information of the faulty node.
  • the controller may generate predicted configuration information corresponding to the first candidate operation according to the first candidate operation and the original configuration information.
  • the first candidate operation may include instructions for instructing to repair the fault. Based on this, the instructions included in the first candidate operation may be added on the basis of the original configuration information to obtain predicted configuration information, or the instructions included in the original configuration information may be modified according to the instructions included in the first candidate operation to obtain predicted configuration information.
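  • Generating predicted configuration information by adding or modifying instructions can be sketched as a merge of two key-value mappings. Representing a configuration baseline as a dictionary of settings, and the key names used in the usage note, are assumptions made for the example.

```python
from typing import Dict


def predict_configuration(original_config: Dict[str, str],
                          operation_instructions: Dict[str, str]) -> Dict[str, str]:
    """Apply a candidate operation to a copy of the original configuration.

    Keys absent from the baseline are added; keys already present are
    overwritten. The original configuration is left untouched.
    """
    predicted = dict(original_config)
    predicted.update(operation_instructions)
    return predicted
```

  • For instance, applying `{"port1.state": "isolated"}` to a baseline in which `port1.state` is `"up"` yields a predicted configuration with that port isolated, while the stored baseline itself is unchanged.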
  • after obtaining the original configuration information of the faulty node, the physical topology information, and the predicted configuration information corresponding to the first candidate operation, the controller can process them through the CPV model, so as to obtain the configuration influence degree corresponding to the first candidate operation.
  • the controller may use the foregoing method to estimate the degree of configuration influence corresponding to each candidate operation.
  • the controller may obtain the business influence weight and the configuration influence weight. Then, according to the degree of business influence corresponding to the first candidate operation, the weight of business influence, the degree of configuration influence corresponding to the first candidate operation, and the weight of configuration influence, the comprehensive degree of influence corresponding to the first candidate operation is determined.
  • the business impact weight and the configuration impact weight may be pre-configured. Based on this, the controller can determine the product of the business influence weight and the business influence degree of the first candidate operation, determine the product of the configuration influence weight and the configuration influence degree corresponding to the first candidate operation, and use the sum of the two products as the comprehensive influence degree corresponding to the first candidate operation.
  • the controller can refer to the above method to determine the comprehensive influence degree corresponding to the corresponding candidate operation.
  • the controller may not perform business impact analysis, that is, the degree of business impact may not be estimated.
  • the determined configuration impact degree can be directly used as the comprehensive impact degree.
  • Step 405 Determine the corresponding candidate operation with the least comprehensive impact among the one or more candidate operations as the fault repair operation, and send the fault repair operation to the faulty node, so that the faulty node performs the fault repair according to the fault repair operation.
  • the controller can determine the candidate operation with the smallest comprehensive influence degree as the fault repair operation, and issue the fault repair operation to the faulty node so that the faulty node can repair the first fault that occurred according to the fault repair operation.
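  • Steps 404 and 405 can be sketched together: the comprehensive influence degree is the sum of the two weighted products described above, and the candidate with the smallest value becomes the fault repair operation. The function names, the tuple layout of the candidates, and the example weights are assumptions.

```python
from typing import Dict, Tuple


def comprehensive_influence(business_degree: float, config_degree: float,
                            business_weight: float, config_weight: float) -> float:
    """Sum of the two weighted products described in step 404."""
    return business_weight * business_degree + config_weight * config_degree


def choose_fault_repair_operation(candidates: Dict[str, Tuple[float, float]],
                                  business_weight: float = 0.5,
                                  config_weight: float = 0.5):
    """candidates maps an operation name to (business degree, configuration degree).

    Returns the operation with the least comprehensive influence and all scores.
    """
    scores = {op: comprehensive_influence(b, c, business_weight, config_weight)
              for op, (b, c) in candidates.items()}
    return min(scores, key=scores.get), scores
```

  • If business impact analysis is skipped, setting the business weight to 0 and the configuration weight to 1 makes the comprehensive influence degree equal to the configuration influence degree.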
  • when a repair plan corresponding to the fault information is found in the plan library, the repair plan corresponding to the fault information may be used as the recommended plan.
  • the comprehensive influence degree of each candidate operation included in the recommended plan can be estimated in advance.
  • the comprehensive impact degree can be calculated from the estimated business impact degree and configuration impact degree, and the candidate operation with the least comprehensive impact degree is taken as the fault repair operation. In this way, when the faulty node performs fault repair through the fault repair operation, the impact on the services carried by the faulty node and on the connected nodes is minimized.
  • an embodiment of the present application provides a device 500 for recommending fault repair operations, and the device 500 includes:
  • the obtaining module 501 is configured to execute step 301 in the foregoing embodiment
  • the first determining module 502 is configured to execute step 303 in the foregoing embodiment
  • the recommendation module 503 is configured to execute step 304 in the foregoing embodiment.
  • the fault information includes multiple fault parameters, and the parameters of the recommended model include multiple fault characteristic factors, and each of the multiple fault characteristic factors corresponds to one of the multiple fault parameters;
  • the first determining module 502 is specifically configured to:
  • Multiple fault parameters are used as the input of the recommended model, and the recommended plan is determined through the recommended model according to multiple fault feature factors.
  • the parameters of the recommended model further include a configuration impact factor, and the configuration impact factor includes the original configuration information of the faulty node and the physical topology information of the network where the faulty node is located;
  • the recommendation module 503 is specifically configured to:
  • the first candidate operation is used as the input of the recommendation model, and the configuration influence degree corresponding to the first candidate operation is determined through the recommendation model according to the configuration influence factor.
  • the configuration influence degree is the configuration influence degree of the first candidate operation on the configuration of the faulty node and the physical topology of the network.
  • the first candidate operation is any one of one or more candidate operations;
  • the candidate operation with the least configuration impact is regarded as the fault repair operation.
  • the device 500 further includes:
  • the receiving module 504 is configured to receive the repair result fed back by the faulty node after the fault is repaired according to the fault repair operation, the repair result includes the routing information of the faulty node after the fault repair and the physical topology information of the network where the faulty node is located;
  • the second determining module 505 is used to use the routing information and physical topology information after the fault repair as the input of the data plane verification (DPV) model, and determine the data plane influence degree corresponding to the fault repair operation through the DPV model, where the data plane influence degree is the true degree of impact of the fault repair operation on the configuration of the faulty node and the physical topology of the network where it is located;
  • the generating module 506 is used to generate fault samples according to the data plane influence degree, the fault repair operation and the fault information corresponding to the fault repair operation;
  • the adjustment module 507 is configured to adjust the parameters of the recommended model according to the fault samples.
  • the fault information can be processed through the recommended model to obtain the recommended plan, and then one operation is selected from the one or more candidate operations included in the recommended plan as the fault repair operation. This solves the problem that a repair operation cannot be provided for the faulty node when there is no repair plan corresponding to the fault information in the plan library.
  • an embodiment of the present application provides a device 700 for recommending fault repair operations, and the device 700 includes:
  • the obtaining module 701 is configured to execute step 401 in the foregoing embodiment
  • the determining module 702 is configured to execute step 403 in the foregoing embodiment
  • the prediction module 703 is configured to perform step 404 in the foregoing embodiment
  • the recommendation module 704 is configured to execute step 405 in the foregoing embodiment.
  • the prediction module 703 is specifically configured to:
  • the comprehensive influence degree corresponding to each candidate operation is determined.
  • the prediction module 703 is specifically configured to:
  • the predicted configuration information corresponding to the first candidate operation, the original configuration information of the faulty node, and the physical topology information are used as the input of the configuration plane verification (CPV) model, and the degree of configuration influence corresponding to the first candidate operation is determined through the CPV model.
  • the recommendation module 704 is specifically configured to:
  • the comprehensive degree of influence corresponding to the first candidate operation is determined.
  • when a repair plan corresponding to the fault information is found in the plan library, the repair plan corresponding to the fault information can be used as a recommended plan.
  • the comprehensive influence degree of each candidate operation included in the recommended plan can be estimated in advance.
  • the comprehensive impact degree can be calculated from the estimated business impact degree and configuration impact degree, and the candidate operation with the least comprehensive impact degree is taken as the fault repair operation. In this way, when the faulty node performs fault repair through the fault repair operation, the impact on the services carried by the faulty node and on the connected nodes is minimized.
  • when the fault repair operation recommendation device provided in the above embodiment recommends fault repair operations, the division of the above-mentioned functional modules is used only as an example for illustration. The above-mentioned functions can be assigned to different functional modules for completion as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the fault repair operation recommendation device provided by the above-mentioned embodiment and the fault repair operation recommendation method embodiment belong to the same concept. For the specific implementation process, please refer to the method embodiment, which will not be repeated here.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example: floppy disk, hard disk, tape), an optical medium (for example: Digital Versatile Disc (DVD)), or a semiconductor medium (for example: Solid State Disk (SSD)), and so on.

Abstract

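This application discloses a fault repair operation recommendation method, device, and storage medium, and belongs to the field of communication technologies. In the embodiments of this application, after fault information is obtained, a repair plan corresponding to the fault information may be searched for in a plan library. When no repair plan corresponding to the fault information can be found in the plan library, the fault information may be processed through a recommendation model to obtain a recommended plan, and one operation is then selected from the one or more candidate operations included in the recommended plan as the fault repair operation. This solves the problem that a repair operation cannot be provided for the faulty node when no repair plan corresponding to the fault information exists in the plan library.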

Description

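Fault repair operation recommendation method, device, and storage medium
This application claims priority to the Chinese patent application No. CN 201911180239.0, filed with the Chinese Patent Office on November 27, 2019 and entitled "Fault repair operation recommendation method, device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of communication technologies, and in particular to a fault repair operation recommendation method, device, and storage medium.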
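Background
In a software defined network (SDN) architecture, an SDN controller can centrally control multiple nodes. When any one of the multiple nodes fails, the SDN controller can recommend to that node a repair operation for repairing the fault.
In the related art, a plan library can be configured in the SDN controller based on manual experience, and the plan library stores a mapping relationship between fault information and repair plans. After the SDN controller receives fault information from a faulty node, it can search the plan library for a repair plan corresponding to the fault information. If such a repair plan is found, the SDN controller can display the repair operations included in the found repair plan, and one repair operation is manually selected as the fault repair operation and issued to the faulty node, so that the faulty node performs fault repair according to the fault repair operation. However, if no repair plan corresponding to the fault information exists in the plan library, the SDN controller cannot provide a repair operation for the faulty node, and the fault therefore cannot be repaired.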
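Summary
This application provides a fault repair operation recommendation method, device, and storage medium, which can be used to solve the problem in the related art that a repair operation cannot be provided for a faulty node when no repair plan corresponding to the fault information exists in the plan library. The technical solutions are as follows:
In a first aspect, a fault repair operation recommendation method is provided. The method includes: obtaining fault information of a faulty node; if no repair plan corresponding to the fault information exists in a plan library, determining a recommended plan through a recommendation model according to the fault information, where the recommended plan includes one or more candidate operations; and determining a fault repair operation from the one or more candidate operations included in the recommended plan, and recommending the fault repair operation to the faulty node, so that the faulty node performs fault repair according to the fault repair operation.
In the embodiments of this application, when no repair plan corresponding to the fault information can be found in the plan library, the fault information can be processed through the recommendation model to obtain a recommended plan, and one operation is then selected from the one or more candidate operations included in the recommended plan as the fault repair operation. This solves the problem that a repair operation cannot be provided for the faulty node when no repair plan corresponding to the fault information exists in the plan library.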
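Optionally, the fault information includes multiple fault parameters, the parameters of the recommendation model include multiple fault feature factors, and each of the multiple fault feature factors corresponds to one of the multiple fault parameters. On this basis, the implementation process of determining the recommended plan through the recommendation model according to the fault information may be: using the multiple fault parameters as the input of the recommendation model, and determining the recommended plan through the recommendation model according to the multiple fault feature factors.
Optionally, the parameters of the recommendation model further include a configuration influence factor, and the configuration influence factor includes the original configuration information of the faulty node and the physical topology information of the network where the faulty node is located. On this basis, the implementation process of determining the fault repair operation from the one or more candidate operations included in the recommended plan may be: using a first candidate operation as the input of the recommendation model, and determining, through the recommendation model and according to the configuration influence factor, the configuration influence degree corresponding to the first candidate operation, where the configuration influence degree is the predicted degree of influence of the first candidate operation on the configuration of the faulty node and the physical topology of the network where it is located, and the first candidate operation is any one of the one or more candidate operations; and using the candidate operation with the smallest corresponding configuration influence degree among the one or more candidate operations as the fault repair operation.
That is, the embodiments of this application can determine the configuration influence degree of the candidate operations and recommend the fault repair operation according to the configuration influence degree, so that the recommended fault repair operation has the least influence on the configuration of the faulty node and the physical topology of the network where it is located.
Optionally, after the fault repair operation is sent to the faulty node, the method further includes: receiving a repair result fed back by the faulty node after it performs fault repair according to the fault repair operation, where the repair result includes the routing information of the faulty node after the fault repair and the physical topology information of the network where it is located; using the routing information after the fault repair and the physical topology information as the input of a data plane verification (DPV) model, and determining, through the DPV model, the data plane influence degree corresponding to the fault repair operation, where the data plane influence degree is the actual degree of influence of the fault repair operation on the configuration of the faulty node and the physical topology of the network where it is located; generating a fault sample according to the data plane influence degree corresponding to the fault repair operation, the fault repair operation, and the fault information; and adjusting the parameters of the recommendation model according to the fault sample.
After the fault repair operation is recommended, the degree of influence of the fault repair operation on the data plane can also be verified through the DPV model, and a fault sample is then generated according to the data plane influence degree, the fault repair operation, and the fault information, so as to further adjust the parameters of the recommendation model, which improves the accuracy of the plans recommended by the recommendation model.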
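In a second aspect, a fault repair operation recommendation method is provided. The method includes: obtaining fault information of a faulty node; if a repair plan corresponding to the fault information exists in a plan library, using the repair plan corresponding to the fault information as a recommended plan, where the recommended plan includes one or more candidate operations; predicting the comprehensive influence degree of each of the one or more candidate operations, where the comprehensive influence degree indicates the magnitude of the comprehensive influence of the corresponding candidate operation on the network where the faulty node is located; and determining the candidate operation with the smallest corresponding comprehensive influence degree among the one or more candidate operations as the fault repair operation, and recommending the fault repair operation to the faulty node, so that the faulty node performs fault repair according to the fault repair operation.
If a repair plan corresponding to the fault information exists in the plan library, the comprehensive influence degree of each candidate operation included in the repair plan can be determined, and the fault repair operation is then determined according to the comprehensive influence degree, so that the comprehensive influence of the fault repair operation on the network where the faulty node is located is minimized.
Optionally, predicting the comprehensive influence degree of each of the one or more candidate operations included in the recommended plan includes: estimating the business influence degree of each of the one or more candidate operations on the business in the network where the faulty node is located; estimating the configuration influence degree of each of the one or more candidate operations on the network where the faulty node is located; and determining the comprehensive influence degree corresponding to each candidate operation according to the business influence degree and the configuration influence degree corresponding to that candidate operation.
Optionally, the implementation process of estimating the configuration influence degree of each of the one or more candidate operations on the network where the faulty node is located may be: obtaining the original configuration information of the faulty node and the physical topology information of the network where the faulty node is located; generating, according to a first candidate operation and the original configuration information of the faulty node, predicted configuration information corresponding to the first candidate operation, where the first candidate operation is any one of the one or more candidate operations; and using the predicted configuration information corresponding to the first candidate operation, the original configuration information of the faulty node, and the physical topology information as the input of a configuration plane verification (CPV) model, and determining the configuration influence degree corresponding to the first candidate operation through the CPV model.
Optionally, the implementation process of determining the comprehensive influence degree corresponding to each candidate operation according to the business influence degree and the configuration influence degree corresponding to each candidate operation may be: obtaining a business influence weight and a configuration influence weight; and determining the comprehensive influence degree corresponding to the first candidate operation according to the business influence degree corresponding to the first candidate operation, the business influence weight, the configuration influence degree corresponding to the first candidate operation, and the configuration influence weight.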
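In a third aspect, a fault repair operation recommendation device is provided. The fault repair operation recommendation device has the function of implementing the behavior of the fault repair operation recommendation method in the first aspect or the second aspect. The fault repair operation recommendation device includes at least one module, and the at least one module is used to implement the fault repair operation recommendation method provided in the first aspect.
In a fourth aspect, a fault repair operation recommendation device is provided. The structure of the fault repair operation recommendation device includes a processor and a memory. The memory is used to store a program that supports the fault repair operation recommendation device in executing the fault repair operation recommendation method provided in the first aspect or the second aspect, and to store the data involved in implementing the fault repair operation recommendation method provided in the first aspect or the second aspect. The processor is configured to execute the program stored in the memory. The operating device of the storage device may further include a communication bus, and the communication bus is used to establish a connection between the processor and the memory.
In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the fault repair operation recommendation method described in the first aspect or the second aspect.
In a sixth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, it causes the computer to execute the fault repair operation recommendation method described in the first aspect or the second aspect.
The technical effects obtained by the third, fourth, fifth, and sixth aspects are similar to those obtained by the corresponding technical means in the first and second aspects, and are not repeated here.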
本申请提供的技术方案带来的有益效果至少包括:
在本申请实施例中,当在预案库中无法查找到与故障信息对应的修复预案时,可以通过推荐模型来对故障信息进行处理,得到推荐预案,进而从推荐预案包括的一种或多种候选操作中选择一个操作作为故障修复操作,解决了在该预案库中不存在该故障信息对应的修复预案,无法为故障节点提供修复操作的问题。
附图说明
图1是本申请实施例提供的故障修复操作推荐方法所涉及的系统架构图;
图2是本申请实施例提供的一种网络设备的结构示意图;
图3是本申请实施例提供的一种故障修复操作推荐方法流程图;
图4是本申请实施例提供的另一种故障修复操作推荐方法流程图;
图5是本申请实施例提供的一种故障修复操作推荐装置的结构示意图;
图6是本申请实施例提供的另一种故障修复操作推荐装置的结构示意图;
图7是本申请实施例提供的又一种故障修复操作推荐装置的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
在对本申请实施例进行详细的解释说明之前,先对本申请实施例涉及的系统架构进行介绍。
图1是本申请实施例提供的故障修复操作推荐方法所涉及的系统架构图。如图1中所示,该系统中包括控制器101、分析器102和多个物理节点103。其中,控制器101可以分别与分析器102以及多个物理节点103进行通信,另外,分析器102也可以与多个物理节点103通信。
需要说明的是,控制器101用于集中控制网络资源的分配。示例性地,控制器101可以控制多个物理节点103上的数据转发。除此之外,在本申请实施例中,控制器101还可以接收分析器102上报的故障信息,并采用本申请实施例提供的推荐故障修复操作的方法向多个物理节点103中的故障设备推荐故障修复操作。
分析器102用于实时监测多个物理节点103中的每个物理节点103是否发生故障,并在监测到某个物理节点103发生故障之后,收集该故障节点的故障信息,将该故障信息上报至控制器101,以便控制器101根据本申请实施例提供的方法来向该故障节点推荐修复操作。
多个物理节点103可以为物理网络中的多个设备。该多个物理节点103用于接收和/或发送业务数据。
需要说明的是,控制器101和分析器102可以分布在两个不同的独立的设备中。或者,控制器101和分析器102可以集成于一台设备中,本申请实施例对此不做限定。再或者,控制器101可以具有分析器102的功能,在这种情况下,上述系统中可以不包括分析器102。另外,该多个物理节点103可以为交换机、路由器等网络设备,本申请实施例对此不做限定。
图2是本申请实施例提供的一种网络设备的结构示意图。图1中的控制器101可以通过图2所示的网络设备来实现。参见图2,该网络设备包括至少一个处理器201,通信总线202,存储器203以及至少一个通信接口204。
处理器201可以是一个通用中央处理器(Central Processing Unit,CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本申请方案程序执行的集成电路。
通信总线202可包括一通路,在上述组件之间传送信息。
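The memory 203 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. It may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other compact disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 203 may exist independently and be connected to the processor 201 through the communication bus 202. The memory 203 may alternatively be integrated with the processor 201.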
通信接口204,使用任何收发器一类的装置,用于与其它设备或通信网络通信,如以太网,无线接入网(RAN),无线局域网(Wireless Local Area Networks,WLAN)等。
在具体实现中,作为一种实施例,处理器201可以包括一个或多个CPU,例如图2中所示的CPU0和CPU1。
在具体实现中,作为一种实施例,计算机设备可以包括多个处理器,例如图2中所示的处理器201和处理器205。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,计算机设备还可以包括输出设备206和输入设备207。输出设备206和处理器201通信,可以以多种方式来显示信息。例如,输出设备206可以是液晶显示器(liquid crystal display,LCD),发光二级管(light emitting diode,LED)显示设备,阴极射线管(cathode ray tube,CRT)显示设备,或投影仪(projector)等。输入设备207和处理器201通信,可以以多种方式接收用户的输入。例如,输入设备207可以是鼠标、键盘、触摸屏设备或传感设备等。
其中,存储器203用于存储执行本申请方案的程序代码,并由处理器201来控制执行。处理器201用于执行存储器203中存储的程序代码208。程序代码208中可以包括一个或多个软件模块。图1中所示的控制器可以通过处理器201以及存储器203中的程序代码208中的一个或多个软件模块,来为故障设备推荐故障修复操作。
图2中的处理器读取存储器中的代码,可以使如图2所示的网络设备执行本申请中各实施例中的控制器所执行的部分或全部操作。
接下来对本申请实施例提供的故障修复操作推荐方法进行详细的解释说明。
图3是本申请实施例提供的一种故障修复操作推荐方法的流程图。该故障修复操作推荐方法可以应用于图1所示的控制器101,参见图3,该方法包括:
步骤301:获取故障节点的故障信息。
在本申请实施例中,控制器可以接收分析器上报的故障节点的故障信息。在这种情况中,分析器可以实时监测多个物理节点。当监测到某个物理节点发生故障时,可以收集该物理节点发生的故障的相关信息,并将这些信息作为故障信息上报至控制器。
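Optionally, when no analyzer is present, the controller itself may monitor the plurality of physical nodes and, upon detecting that a physical node has failed, collect information about the fault that occurred on that node and use the collected information as the fault information. The failed physical node is the faulty node, and the fault that occurred on the faulty node may be referred to as the first fault.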
另外,故障信息可以包括多个故障参数。该多个故障参数可以为故障标识、故障位置、故障发生时间、故障日志、故障类型等信息中的两种或两种以上。示例性地,故障信息包括的故障标识为地址解析协议(address resolution protocol,ARP)攻击,用于指示发生的故障为ARP攻击。故障位置为节点1端口2,用于指示故障节点为节点1,且故障发生在节点1的端口2处。故障发生时间为2019-07-14-08:30,故障类型为01。需要说明的是,不同的故障标识可能对应相同的故障类型,也可能对应不同的故障类型。也即,某些名称不同的故障可能是同一种类型的故障。
可选地,在有些场景下,当物理节点发生故障之后,物理节点可能会持续向控制器上报故障信息,这样,在一定时长内,控制器接收到的故障信息中可能存在某些故障节点重复发送的多个相同的故障信息。或者,多个物理节点可能会在相同的时间段内发生相同的故障,在这种情况下,控制器将会在该时间段内接收到多个相似的故障信息。基于此,在本申请实施例中,控制器还可以按照预设时间间隔来进行修复操作的推荐。在这种情况下,每当控制器进行修复操作的推荐时,该控制器可以获取距离当前时刻最近的一个预设时间间隔内收到的多个故障信息,并对这多个故障信息进行聚类分析,从而将该多个故障信息分为多个类。之后,控制器可以通过下述介绍的方法来针对每一类推荐修复操作。
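The interval-based grouping described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which clustering method is used, so the sketch simply groups reports that share the same fault identifier and fault type, and the field names (`fault_id`, `fault_type`, `location`, `received_at`) and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def cluster_recent_faults(reports, now, interval):
    """Group the reports received within the last `interval` into classes.

    Assumption: two reports belong to the same class when their fault
    identifier and fault type match. A real system may use a richer
    clustering method; the source text does not specify one.
    """
    clusters = defaultdict(list)
    for report in reports:
        # Keep only reports that fall inside the most recent interval.
        if now - interval <= report["received_at"] <= now:
            key = (report["fault_id"], report["fault_type"])
            clusters[key].append(report)
    return dict(clusters)


# Hypothetical reports: three ARP-attack reports inside the window and one
# older report that falls outside it.
now = datetime(2019, 7, 14, 8, 35)
reports = [
    {"fault_id": "ARP attack", "fault_type": "01",
     "location": "node1/port2", "received_at": datetime(2019, 7, 14, 8, 30)},
    {"fault_id": "ARP attack", "fault_type": "01",
     "location": "node1/port2", "received_at": datetime(2019, 7, 14, 8, 31)},
    {"fault_id": "ARP attack", "fault_type": "01",
     "location": "node3/port1", "received_at": datetime(2019, 7, 14, 8, 32)},
    {"fault_id": "link down", "fault_type": "02",
     "location": "node2/port4", "received_at": datetime(2019, 7, 14, 8, 0)},
]
clusters = cluster_recent_faults(reports, now, timedelta(minutes=10))
```

The controller would then run the recommendation procedure once per class rather than once per report.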
步骤302:从预案库中查找是否存在与该故障信息对应的修复预案。
在获取到故障信息之后,控制器可以根据故障信息生成第一故障事件,该第一故障事件包含有事件标识、事件特征以及事件状态。其中,事件标识用于唯一标识该第一故障事件,事件特征包含有故障信息中的多个故障参数,事件状态则是指该第一故障事件当前的状态,随着后续对该故障事件的处理进度,该事件状态将会进行实时的更新。例如,当通过后续步骤303得到推荐预案时,第一故障事件的事件状态即可以进行更新,以指示已经得到第一故障事件对应的推荐预案。
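A fault event with an identifier, features and a status could be represented as in the sketch below. The class names, the status values and the use of a random UUID as the event identifier are assumptions made for illustration; the text only requires that the identifier be unique and that the status be updated as processing advances.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class EventStatus(Enum):
    # Hypothetical status values; the source text does not enumerate them.
    NEW = "new"
    PLAN_RECOMMENDED = "plan_recommended"
    OPERATION_SENT = "operation_sent"


@dataclass
class FaultEvent:
    # Event features: the fault parameters taken from the fault information.
    features: dict
    # Event identifier: unique per event (assumed here to be a random UUID).
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Event status: updated as the event is processed.
    status: EventStatus = EventStatus.NEW


first_event = FaultEvent(features={
    "fault_id": "ARP attack",
    "location": "node1/port2",
    "time": "2019-07-14-08:30",
    "fault_type": "01",
})
other_event = FaultEvent(features={"fault_id": "ARP attack"})

# Once a recommended plan has been obtained, the status is updated.
first_event.status = EventStatus.PLAN_RECOMMENDED
```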
在生成第一故障事件之后,控制器可以根据该第一故障事件包含的事件特征,从预案库中查找是否存在与该第一故障事件相似的第二故障事件,并将查找到的第二故障事件对应的修复预案作为故障信息对应的修复预案。
其中,预案库中可以存储有多种故障事件以及每种故障事件对应的修复预案。并且,每种故障事件对应的修复预案可以为一个,也可以为多个。每个修复预案中可以包括一个或多个修复操作,该修复操作包含有用于指示故障节点进行故障修复的指令。
在本申请实施例中,事件特征包含有多个故障参数。基于此,控制器可以根据第一故障事件的事件特征中包含的多个故障参数,以及预案库中每种故障事件的事件特征包括的故障参数,来计算第一故障事件与预案库中的各种故障事件的相似度。
示例性地,控制器可以通过下述算法(1)来确定第一故障事件与预案库中任一种故障事件之间的相似度。
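[Algorithm (1): formula image PCTCN2020118233-appb-000001]
Here, Q denotes the first fault event, C denotes a fault event in the plan library, F denotes the event features of the first fault event, f denotes the identifier of a fault parameter in those event features, w_f denotes the weight corresponding to that fault parameter, q_f is that fault parameter in the first fault event, c_f is the fault parameter of the same type as q_f in the fault event from the plan library, and σ_f() is the similarity calculation function.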
通过上述方法,控制器可以计算得到第一故障事件与预案库中的每种故障事件之间的相似度。之后,控制器可以从多个相似度中查找是否存在大于第一阈值的相似度。如果计算得到的多个相似度中不存在大于第一阈值的相似度,则控制器可以确定该预案库中不存在与故障信息对应的修复预案,接下来,控制器可以执行步骤303和304。
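The similarity check against the first threshold can be sketched as below. The exact form of algorithm (1) is not recoverable from the text, so the sketch assumes a weight-normalized sum of per-parameter similarities, which is consistent with the symbols w_f, q_f, c_f and σ_f() defined above. The function names, the exact-match σ_f, the weights and the plan names are hypothetical.

```python
def similarity(query, case, weights, sigma):
    """Weighted similarity between a query event and a stored event.

    Assumed form: sum_f w_f * sigma_f(q_f, c_f) / sum_f w_f, with f ranging
    over the features F of the query event. This is an assumption, not the
    formula from the source.
    """
    total_weight = sum(weights[f] for f in query)
    if total_weight == 0:
        return 0.0
    score = sum(weights[f] * sigma[f](query[f], case.get(f)) for f in query)
    return score / total_weight


def exact(a, b):
    # Simplest possible sigma_f: 1 for identical values, 0 otherwise.
    return 1.0 if a == b else 0.0


def find_similar_events(query, plan_library, weights, sigma, first_threshold):
    """Return (similarity, event) pairs whose similarity exceeds the threshold."""
    scored = [(similarity(query, e["features"], weights, sigma), e)
              for e in plan_library]
    return [(s, e) for s, e in scored if s > first_threshold]


weights = {"fault_id": 0.5, "fault_type": 0.3, "location": 0.2}
sigma = {"fault_id": exact, "fault_type": exact, "location": exact}
query = {"fault_id": "ARP attack", "fault_type": "01", "location": "node1/port2"}
plan_library = [
    {"features": {"fault_id": "ARP attack", "fault_type": "01",
                  "location": "node4/port1"}, "plan": "plan-A"},
    {"features": {"fault_id": "link down", "fault_type": "02",
                  "location": "node1/port2"}, "plan": "plan-B"},
]
matches = find_similar_events(query, plan_library, weights, sigma, 0.7)
no_matches = find_similar_events(query, plan_library, weights, sigma, 0.9)
```

An empty result corresponds to the case in which the plan library holds no matching repair plan and the controller falls back to the recommendation model in steps 303 and 304.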
需要说明的是,故障事件的主要目的在于对接收到的故障信息进行管理,在本申请实施例中,根据故障信息生成故障事件的步骤为可选步骤。在这种情况下,预案库中可以存储不同故障参数集合与修复预案的映射关系,控制器可以直接通过上述算法(1),根据故障信息内包含的各个故障参数与预案库中存储的不同故障参数集合中的故障参数进行匹配,从而得到该故障信息与每个故障参数集合之间的相似度。
步骤303:如果预案库中不存在故障信息对应的修复预案,则根据故障信息,通过推荐模型确定推荐预案。
由前述介绍可知,如果通过计算相似度发现预案库中不存在与第一故障事件的相似度大于第一阈值的故障事件,则说明目前预案库中不存在与第一故障事件相似的第二故障事件。控制器也就无法从预案库中获取到故障信息对应的修复预案。在这种情况下,控制器可以根据故障信息,通过推荐模型确定推荐预案。
其中,推荐模型的参数可以包括多个故障特征因子,且每个故障特征因子可以与故障信息包括的多个故障参数中的一个故障参数对应。例如,多个故障参数包括故障位置、故障发生时间和故障来源,则多个故障特征因子可以包括与故障位置对应的拓扑特征因子,与故障发生时间对应的时间特征因子以及与故障来源对应的来源特征因子。
其中,拓扑特征因子可以为一个组网关联矩阵,该组网关联矩阵中的每个元素用于代表当前物理网络中的一个节点,根据每个节点与故障节点之间的距离来确定每个节点对应的元素的取值。其中,一个节点与故障节点相隔的跳数越多,则与故障节点的距离越远,相应地,该节点对应的元素的取值就越小。
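One way to derive such distance-dependent element values is a breadth-first search from the faulty node, as sketched below. The text only states that the value decreases as the hop count grows; the specific mapping 1 / (1 + hops), the dictionary representation in place of a matrix, and the function names are assumptions made for illustration.

```python
from collections import deque


def hop_counts(adjacency, fault_node):
    """Breadth-first search: number of hops from the faulty node to each node."""
    hops = {fault_node: 0}
    queue = deque([fault_node])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def topology_factor(adjacency, fault_node):
    """Per-node element value that shrinks as the hop count grows.

    Assumed mapping: 1 / (1 + hops). Nodes unreachable from the faulty node
    are omitted from the result.
    """
    return {node: 1.0 / (1 + h)
            for node, h in hop_counts(adjacency, fault_node).items()}


# Hypothetical three-node chain: node1 - node2 - node3.
adjacency = {
    "node1": ["node2"],
    "node2": ["node1", "node3"],
    "node3": ["node2"],
}
factor = topology_factor(adjacency, "node1")
```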
时间特征因子的取值由故障发生时间来确定。由于同一故障,基本在同一时间内发生,因此,可以根据故障发生时间来确定推荐模型在处理故障信息时的时间特征因子。
来源特征因子的取值可以根据故障来源确定。其中,故障来源是指故障的来源地址。例如,当故障节点发生的第一故障为ARP攻击时,故障来源即可以为ARP攻击的攻击源IP地址。
除此之外,该推荐模型的参数还可以包括文本关联特征因子、关联应用特征因子等。其中,文本关联特征因子的取值可以根据故障类型、故障位置等来确定,关联应用特征因子的取值可以根据故障位置所关联到的应用来确定。
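In this step, the controller may use the plurality of fault parameters as the input of the recommendation model and determine the recommended plan through the recommendation model according to the plurality of fault feature factors, where each feature factor included in the recommendation model may serve as the weight of the corresponding fault parameter. The recommended plan may include one or more candidate operations.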
步骤304:从推荐预案包括的一个或多个候选操作中确定故障修复操作,向故障节点发送故障修复操作。
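After processing the fault information to obtain the recommended plan, the recommendation model may further perform configuration impact analysis on each of the one or more candidate operations included in the recommended plan, and thereby derive the fault repair operation from the one or more candidate operations.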
其中,推荐模型的参数中还可以包括配置影响因子,该配置影响因子包括故障节点的原始配置信息和故障节点所在网络的物理拓扑信息。该故障节点的原始配置信息可以为故障节点的原始配置基线。该故障节点所在网络的物理拓扑信息可以用于指示该物理网络中的各个节点的拓扑关系。
在此基础上,推荐模型可以对每个候选操作和配置影响因子进行处理,得到相应候选操作对应的配置影响程度,进而将一个或多个候选操作中对应的配置影响程度最小的候选操作作为故障修复操作。以一个或多个候选操作中的任一候选操作为例,称其为第一候选操作,可以将第一候选操作作为推荐模型的输入,按照配置影响因子,通过推荐模型确定第一候选操作对应的配置影响程度,对于其他每个候选操作,均可以参照对第一候选操作的处理,来得到相应候选操作对应的配置影响程度,之后,推荐模型可以将对应的配置影响程度最小的候选操作作为故障修复操作输出。其中,该配置影响程度是第一候选操作对故障节点的配置和所在网络的物理拓扑的预测影响程度。并且,该配置影响程度可以表征在采用相应候选操作对故障节点上发生的第一故障进行修复之后,该故障节点与物理网络中的其他节点之间的连通程度。
可选地,在一些可能的实现方式中,推荐模型还可以对每个候选操作进行业务影响分析,得到每个候选操作对应的业务影响程度。之后,推荐模型可以根据每个候选操作对应的配置影响程度和业务影响程度来确定每个候选操作的综合影响程度,进而将综合影响程度最小的候选操作作为故障修复操作。
可选地,在一些可能的实现方式中,在得到每个候选操作对应的配置影响程度之后,推荐模型也可以将所有的候选操作和对应的配置影响程度输出。在这种情况下,控制器可以从中查找对应的配置影响程度小于第二阈值的候选操作,如果查找到,则将查找到的候选操作中的一个候选操作作为故障修复操作。例如,将查找到的候选操作中对应的配置影响程度最小的候选操作作为故障修复操作。如果未查到,则控制器可以通过推荐模型重新对故障信息进行处理,在再次处理的过程中,可以调整推荐模型的参数。
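The threshold-based selection in this implementation can be sketched as follows. The operation names, the impact values and the function name are hypothetical; returning `None` stands for the case in which the controller has to rerun the recommendation model with adjusted parameters.

```python
def choose_repair_operation(config_impact, second_threshold):
    """Pick the candidate with the smallest configuration impact degree.

    Only candidates whose impact degree is below the second threshold are
    eligible. Returns None when no candidate qualifies.
    """
    eligible = {op: impact for op, impact in config_impact.items()
                if impact < second_threshold}
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


# Hypothetical candidate operations and predicted impact degrees.
candidates = {"isolate port 2": 0.4, "restart node": 0.9, "rate-limit ARP": 0.1}
chosen = choose_repair_operation(candidates, second_threshold=0.5)
fallback = choose_repair_operation({"restart node": 0.9}, second_threshold=0.5)
```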
可选地,在一些可能的实现方式中,如果推荐预案中仅包括一个候选操作,推荐模型也可以不进行配置影响分析,而是直接将该候选操作作为故障修复操作进行输出。
在得到故障修复操作之后,控制器可以向故障节点发送该故障修复操作,以便故障节点根据该故障修复操作进行故障修复。
可选地,在故障节点根据该故障修复操作对故障进行修复之后,该故障节点还可以向控制器反馈修复结果。该修复结果包括该故障节点修复后的路由信息以及当前网络的物理拓扑信息。其中,该故障节点修复后的路由信息可以包括该故障节点修复后的转发表或路由表。
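After receiving the repair result fed back by the faulty node, the controller may use the post-repair routing information and the physical topology information of the network included in the repair result as the input of the DPV model, and determine through the DPV model the data plane impact degree of the fault repair operation, that is, the actual degree of impact that the fault repair operation has had on the configuration of the faulty node and on the physical topology of the network in which it is located. The controller may then generate a fault sample from the data plane impact degree, the fault repair operation, and the fault information, and adjust the parameters of the recommendation model according to the fault sample. In other words, the fault sample can be used as sample data for training the recommendation model.
It can be seen that the DPV model yields the actual impact of the fault repair operation on the network in which the faulty node is located, whereas the CPV model yields the predicted degree of impact of a candidate operation on that network. Therefore, further adjusting the recommendation model according to the data plane impact degree determined by the DPV model can improve the recommendation accuracy of the recommendation model.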
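The feedback loop from repair result to fault sample can be sketched as below. The internals of the DPV model are not described in the text, so `toy_dpv_model` is a deliberately simple stand-in that reports the fraction of nodes missing from the post-repair routing table; it, the `FaultSample` class and the field names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultSample:
    fault_info: dict
    repair_operation: str
    data_plane_impact: float


def toy_dpv_model(routes, topology_nodes):
    """Stand-in for the DPV model (assumption, not the patented model).

    Returns the fraction of nodes in the topology for which the repaired
    node no longer has a route.
    """
    nodes = set(topology_nodes)
    if not nodes:
        return 0.0
    return len(nodes - set(routes)) / len(nodes)


def build_fault_sample(fault_info, repair_operation, repair_result, dpv_model):
    """Combine the measured impact, the operation and the fault information."""
    impact = dpv_model(repair_result["routes"], repair_result["topology"])
    return FaultSample(fault_info, repair_operation, impact)


# Hypothetical repair result: routes to three of the four other nodes remain.
repair_result = {
    "routes": {"node2": "port1", "node3": "port1", "node4": "port3"},
    "topology": ["node2", "node3", "node4", "node5"],
}
sample = build_fault_sample(
    {"fault_id": "ARP attack", "location": "node1/port2"},
    "isolate port 2",
    repair_result,
    toy_dpv_model,
)
```

Samples built this way would be accumulated and used as training data when the parameters of the recommendation model are adjusted.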
可选地,在一些可能的情况中,控制器不仅可以根据修复结果确定数据面影响程度,还可以根据该故障节点所在网络中的实时业务流来分析该故障修复操作对该物理网络中的业务流量的影响程度,之后,控制器可以根据业务流量影响程度和数据面影响程度,确定修复后的综合影响程度,进而根据该修复后的综合影响程度生成故障样本来对推荐模型的参数进行调整。
在本申请实施例中,当在预案库中无法查找到与故障信息对应的修复预案时,可以通过推荐模型来对故障信息进行处理,得到推荐预案,进而从推荐预案包括的一种或多种候选操作中选择一个操作作为故障修复操作,解决了在该预案库中不存在该故障信息对应的修复预案,无法为故障节点提供修复操作的问题。
另外,在本申请实施例中,在得到推荐预案之后,可以提前预估推荐预案包括的每种候选操作的配置影响程度,之后再将配置影响程度最小的候选操作作为故障修复操作。这样,可以使得故障节点通过该故障修复操作进行故障修复时对网络中的各个节点的影响达到最小。
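The foregoing embodiment mainly describes how a fault repair operation is recommended to a faulty node according to its fault information when no repair plan corresponding to the fault information exists in the plan library. The following describes, with reference to FIG. 4, how the controller recommends a fault repair operation to the faulty node according to the fault information when a repair plan corresponding to the fault information does exist in the plan library.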
步骤401:获取故障节点的故障信息。
本步骤的实现方式可以参考前述实施例中的步骤301,本申请实施例在此不再赘述。
步骤402:从预案库中查找是否存在与该故障信息对应的修复预案。
在本步骤中,控制器可以参考前述实施例中步骤302介绍的方式来确定第一故障事件与预案库中各个故障事件之间的相似度,得到多个相似度。如果该多个相似度中存在大于第一阈值的相似度,则可以确定该预案库中存在故障信息对应的修复预案,接下来,控制器可以执行步骤403-405。
步骤403:如果预案库中存在故障信息对应的修复预案,则将故障信息对应的修复预案作为推荐预案。
由步骤402中的介绍可知,如果通过计算相似度发现预案库中存在与第一故障事件的相似度大于第一阈值的故障事件,则说明目前预案库中存在与第一故障事件相似的故障事件。在这种情况下,控制器可以从大于第一阈值的相似度中获取第一相似度,将与第一故障事件的相似度为第一相似度的故障事件作为第二故障事件。之后,控制器可以获取第二故障事件对应的修复预案,并将该修复预案作为推荐预案。其中,第一相似度可以为大于第一阈值的相似度中的最大相似度。
可选地,在另一种可能的实现方式中,控制器可以将与第一故障事件的相似度大于第一阈值的故障事件均作为第二故障事件。也即,第二故障事件可以为多个。在这种情况下,控制器可以根据第二故障事件从预案库中获取到多个推荐预案。
步骤404:预测推荐预案包括的一个或多个候选操作中每个候选操作的综合影响程度。
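After obtaining the recommended plan, the controller may estimate, for the one or more candidate operations included in the recommended plan, the comprehensive impact degree of each candidate operation on the network in which the faulty node is located.
For example, the controller may estimate the service impact degree of each of the one or more candidate operations included in the recommended plan on the services in the network in which the faulty node is located; estimate the configuration impact degree of each of the one or more candidate operations on the network in which the faulty node is located; and determine the comprehensive impact degree of each candidate operation according to the service impact degree and the configuration impact degree of that candidate operation.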
以一个或多个候选操作中的任一候选操作为例,将其称为第一候选操作。控制器可以预估采用第一候选操作对故障节点上发生的故障进行修复之后,对故障节点当前承载的业务的影响,影响越小,业务影响程度越小。例如,假设第一候选操作为隔离故障节点上的某个端口,则控制器可以预估隔离该端口之后是否影响已有的虚拟机业务。如果影响,则可以确定业务影响程度为第一数值,否则,确定业务影响程度为第二数值。
对于一个或多个候选操作中的每个候选操作,控制器均可以采用上述方法来预估每个候选操作对应的业务影响程度。
在本申请实施例中,控制器还可以预估每个候选操作对故障节点所在网络的配置影响程度。该配置影响程度可以表征在采用相应候选操作对故障节点上发生的第一故障进行修复之后,该故障节点与物理网络中的其他节点之间的连通程度。仍以一个或多个候选操作中的任一候选操作为例,为了方便描述,将其称为第一候选操作。
示例性地,控制器可以获取故障节点的原始配置信息和故障节点所在网络的物理拓扑信息;根据第一候选操作和故障节点的原始配置信息,生成第一候选操作对应的预测配置信息,第一候选操作为一个或多个候选操作中的任一候选操作;将第一候选操作对应的预测配置信息、故障节点的原始配置信息和物理拓扑信息作为配置面验证CPV模型的输入,通过CPV模型确定第一候选操作对应的配置影响程度。其中,该配置影响程度可以为第三数值或第四数值,当配置影响程度为第三数值时,用于指示第一候选操作对故障节点的配置无影响,也即,第一候选操作对故障节点与其他节点的连通无影响。当配置影响程度为第四数值时,用于指示第一候选操作对故障节点的配置有影响,也即,第一候选操作对故障节点与其他节点的连通有影响。
其中,控制器中可以存储有整个物理网络的物理拓扑信息以及该物理网络中各个物理节点的配置基线,控制器可以获取物理拓扑信息,并获取故障节点的配置基线,将该配置基线作为故障节点的原始配置信息。
在获取到该故障节点的原始配置信息之后,控制器可以根据第一候选操作和原始配置信息,生成第一候选操作对应的预测配置信息。由前述介绍可知,第一候选操作可以包括用于指示修复故障的指令,基于此,可以在原始配置信息的基础上添加第一候选操作包括的指令,从而得到预测配置信息,或者,可以根据第一候选操作包括的指令,对原始配置信息中包括的指令进行修改,从而得到预测配置信息。
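The two ways of deriving the predicted configuration, appending instructions to the baseline or modifying instructions already in it, can be sketched as follows. The representation of a configuration as a list of text lines, the `modify` and `append` keys, and the configuration lines themselves are hypothetical.

```python
def predict_configuration(baseline, operation):
    """Apply a candidate operation to a copy of the configuration baseline.

    `operation["modify"]` maps an existing line to its replacement, and
    `operation["append"]` lists new lines to add. Both keys are optional.
    The baseline itself is left untouched.
    """
    predicted = list(baseline)
    for old, new in operation.get("modify", {}).items():
        predicted = [new if line == old else line for line in predicted]
    predicted.extend(operation.get("append", []))
    return predicted


# Hypothetical baseline and candidate operation (isolating port 2).
baseline = ["interface port1 enabled", "interface port2 enabled"]
isolate_port2 = {
    "modify": {"interface port2 enabled": "interface port2 shutdown"},
    "append": ["arp anti-attack enable"],
}
predicted = predict_configuration(baseline, isolate_port2)
```

The predicted configuration, the baseline and the physical topology information would then be passed to the CPV model.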
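After obtaining the original configuration information of the faulty node, the physical topology information, and the predicted configuration information corresponding to the first candidate operation, the controller may process these three pieces of information through the CPV model to obtain the configuration impact degree corresponding to the first candidate operation.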
对于一个或多个候选操作中的每个候选操作,控制器均可以采用上述方法来预估每个候选操作对应的配置影响程度。
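After determining the service impact degree and the configuration impact degree of each candidate operation, the controller may obtain a service impact weight and a configuration impact weight. It then determines the comprehensive impact degree corresponding to the first candidate operation according to the service impact degree of the first candidate operation, the service impact weight, the configuration impact degree of the first candidate operation, and the configuration impact weight.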
其中,业务影响权重和配置影响权重可以是预先配置的。基于此,控制器可以确定业务影响权重与第一候选操作的业务影响程度的乘积,确定配置影响权重和第一候选操作对应的配置影响程度的乘积,将两个乘积的和作为第一候选操作对应的综合影响程度。
对于每个候选操作,控制器均可以参照上述方法确定得到相应候选操作对应的综合影响程度。
需要说明的是,在一种可能的情况中,控制器可以不进行业务影响分析,也即,可以不预估业务影响程度,在这种情况下,可以直接将确定的配置影响程度作为综合影响程度。
步骤405:将一个或多个候选操作中对应的综合影响程度最小的候选操作确定为故障修复操作,向故障节点发送故障修复操作,以便故障节点根据故障修复操作进行故障修复。
在得到每个候选操作对应的综合影响程度之后,控制器可以将对应的综合影响程度最小的候选操作确定为故障修复操作,并向故障节点下发该故障修复操作,以便故障节点可以根据该故障修复操作对发生的第一故障进行修复。
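The weighted combination and the final selection can be sketched as below. The weighted sum of the two products follows the description above; the concrete weights, the operation names, and the use of 1 and 0 for "impact" and "no impact" are assumptions, since the text leaves the first to fourth values unspecified.

```python
def comprehensive_impact(service_impact, config_impact,
                         service_weight, config_weight):
    """Sum of (service weight x service impact) and (config weight x config impact)."""
    return service_weight * service_impact + config_weight * config_impact


def select_repair_operation(candidates, service_weight, config_weight):
    """Return the candidate operation with the smallest comprehensive impact.

    `candidates` maps an operation name to a pair
    (service impact degree, configuration impact degree).
    """
    return min(
        candidates,
        key=lambda name: comprehensive_impact(
            *candidates[name], service_weight, config_weight),
    )


# Hypothetical candidates: 1 means "has an impact", 0 means "no impact".
candidates = {
    "isolate port 2": (1, 0),
    "restart node": (1, 1),
    "rate-limit ARP": (0, 1),
}
repair_operation = select_repair_operation(
    candidates, service_weight=0.6, config_weight=0.4)
```

When service impact analysis is skipped, the same selection applies with the service weight set to 0, so that the configuration impact degree alone decides.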
在本申请实施例中,当在预案库中查找到与故障信息对应的修复预案时,可以将故障信息对应的修复预案作为推荐预案。在得到推荐预案之后,可以提前预估推荐预案包括的每种候选操作的综合影响程度。其中,综合影响程度可以由预估的业务影响程度和配置影响程度综合计算得到,将综合影响程度最小的候选操作作为故障修复操作,这样,可以使得故障节点通过该故障修复操作进行故障修复时对该故障节点承载的业务和连通的节点的影响达到最小。
参见图5,本申请实施例提供了一种故障修复操作推荐装置500,该装置500包括:
获取模块501,用于执行前述实施例中的步骤301;
第一确定模块502,用于执行前述实施例中的步骤303;
推荐模块503,用于执行前述实施例中的步骤304。
可选地,故障信息包括多个故障参数,推荐模型的参数包括多个故障特征因子,多个故障特征因子中的每个故障特征因子对应多个故障参数中的一个故障参数;
第一确定模块502具体用于:
将多个故障参数作为推荐模型的输入,按照多个故障特征因子,通过推荐模型确定推荐预案。
可选地,推荐模型的参数还包括配置影响因子,配置影响因子包括故障节点的原始配置信息和故障节点所在网络的物理拓扑信息;
推荐模块503具体用于:
将第一候选操作作为推荐模型的输入,按照配置影响因子,通过推荐模型确定第一候选操作对应的配置影响程度,配置影响程度是第一候选操作对故障节点的配置和所在网络的物理拓扑的预测影响程度,第一候选操作为一个或多个候选操作中的任一候选操作;
将一个或多个候选操作中对应的配置影响程度最小的候选操作作为故障修复操作。
可选地,参见图6,该装置500还包括:
接收模块504,用于接收故障节点根据故障修复操作进行故障修复后反馈的修复结果,修复结果包括故障节点进行故障修复后的路由信息和所在网络的物理拓扑信息;
第二确定模块505,用于将故障修复后的路由信息和物理拓扑信息作为数据面验证DPV模型的输入,通过DPV模型确定故障修复操作对应的数据面影响程度,数据面影响程度是故障修复操作对故障节点的配置和所在网络的物理拓扑造成的真实影响程度;
生成模块506,用于根据故障修复操作对应的数据面影响程度、故障修复操作和故障信息,生成故障样本;
调整模块507,用于根据故障样本对推荐模型的参数进行调整。
综上所述,在本申请实施例中,当在预案库中无法查找到与故障信息对应的修复预案时,可以通过推荐模型来对故障信息进行处理,得到推荐预案,进而从推荐预案包括的一种或多种候选操作中选择一个操作作为故障修复操作,解决了在该预案库中不存在该故障信息对应的修复预案,无法为故障节点提供修复操作的问题。
参见图7,本申请实施例提供了一种故障修复操作推荐装置700,该装置700包括:
获取模块701,用于执行前述实施例中的步骤401;
确定模块702,用于执行前述实施例中的步骤403;
预测模块703,用于执行前述实施例中的步骤404;
推荐模块704,用于执行前述实施例中的步骤405。
可选地,预测模块703具体用于:
预估一个或多个候选操作中的每个候选操作对故障节点所在的网络内的业务的业务影响程度;
预估一个或多个候选操作中的每个候选操作对故障节点所在网络的配置影响程度;
根据每个候选操作对应的业务影响程度和配置影响程度,确定每个候选操作对应的综合影响程度。
可选地,预测模块703具体用于:
获取故障节点的原始配置信息和故障节点所在网络的物理拓扑信息;
根据第一候选操作和故障节点的原始配置信息,生成第一候选操作对应的预测配置信息,第一候选操作为一个或多个候选操作中的任一候选操作;
将第一候选操作对应的预测配置信息、故障节点的原始配置信息和物理拓扑信息作为配置面验证CPV模型的输入,通过CPV模型确定第一候选操作对应的配置影响程度。
可选地,推荐模块704具体用于:
获取业务影响权重和配置影响权重;
根据第一候选操作对应的业务影响程度、业务影响权重、第一候选操作对应的配置影响程度和配置影响权重,确定第一候选操作对应的综合影响程度。
综上所述,在本申请实施例中,当在预案库中查找到与故障信息对应的修复预案时,可以将故障信息对应的修复预案作为推荐预案。在得到推荐预案之后,可以提前预估推荐预案包括的每种候选操作的综合影响程度。其中,综合影响程度可以由预估的业务影响程度和配置影响程度综合计算得到,将综合影响程度最小的候选操作作为故障修复操作,这样,可以使得故障节点通过该故障修复操作进行故障修复时对该故障节点承载的业务和连通的节点的影响达到最小。
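It should be noted that, when the fault repair operation recommendation apparatus provided in the foregoing embodiments recommends a fault repair operation, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. In addition, the apparatus embodiments and the method embodiments above belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.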
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意结合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如:同轴电缆、光纤、数据用户线(Digital Subscriber Line,DSL))或无线(例如:红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如:软盘、硬盘、磁带)、光介质(例如:数字通用光盘(Digital Versatile Disc,DVD))、或者半导体介质(例如:固态硬盘(Solid State Disk,SSD))等。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述为本申请提供的实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (17)

  1. 一种故障修复操作推荐方法,其特征在于,所述方法包括:
    获取故障节点的故障信息;
    如果预案库中不存在所述故障信息对应的修复预案,则根据所述故障信息,通过推荐模型确定推荐预案,所述推荐预案包括一个或多个候选操作;
    从所述推荐预案包括的一个或多个候选操作中确定故障修复操作,向所述故障节点推荐所述故障修复操作,以便所述故障节点根据所述故障修复操作进行故障修复。
  2. 根据权利要求1所述的方法,其特征在于,所述故障信息包括多个故障参数,所述推荐模型的参数包括多个故障特征因子,所述多个故障特征因子中的每个故障特征因子对应所述多个故障参数中的一个故障参数;
    所述根据所述故障信息,通过推荐模型确定推荐预案,包括:
    将所述多个故障参数作为所述推荐模型的输入,按照所述多个故障特征因子,通过所述推荐模型确定所述推荐预案。
  3. 根据权利要求2所述的方法,其特征在于,所述推荐模型的参数还包括配置影响因子,所述配置影响因子包括所述故障节点的原始配置信息和所述故障节点所在网络的物理拓扑信息;
    所述从所述推荐预案包括的一个或多个候选操作中确定故障修复操作,包括:
    将第一候选操作作为所述推荐模型的输入,按照所述配置影响因子,通过所述推荐模型确定所述第一候选操作对应的配置影响程度,所述配置影响程度是所述第一候选操作对所述故障节点的配置和所在网络的物理拓扑的预测影响程度,所述第一候选操作为所述一个或多个候选操作中的任一候选操作;
    将所述一个或多个候选操作中对应的配置影响程度最小的候选操作作为所述故障修复操作。
  4. 根据权利要求1-3任一所述的方法,其特征在于,所述向所述故障节点发送所述故障修复操作之后,还包括:
    接收所述故障节点根据所述故障修复操作进行故障修复后反馈的修复结果,所述修复结果包括所述故障节点进行故障修复后的路由信息和所在网络的物理拓扑信息;
    将故障修复后的所述路由信息和所述物理拓扑信息作为数据面验证DPV模型的输入,通过所述DPV模型确定所述故障修复操作对应的数据面影响程度,所述数据面影响程度是所述故障修复操作对所述故障节点的配置和所在网络的物理拓扑造成的真实影响程度;
    根据所述故障修复操作对应的数据面影响程度、所述故障修复操作和所述故障信息,生成故障样本;
    根据所述故障样本对所述推荐模型的参数进行调整。
  5. 一种故障修复操作推荐方法,其特征在于,所述方法包括:
    获取故障节点的故障信息;
    如果预案库中存在所述故障信息对应的修复预案,则将所述故障信息对应的修复预案作为推荐预案,所述推荐预案包括一个或多个候选操作;
    预测所述一个或多个候选操作中每个候选操作的综合影响程度,所述综合影响程度用于指示相应候选操作对所述故障节点所在网络的综合影响的大小;
    将所述一个或多个候选操作中对应的综合影响程度最小的候选操作确定为故障修复操作,向所述故障节点推荐所述故障修复操作,以便所述故障节点根据所述故障修复操作进行故障修复。
  6. 根据权利要求5所述的方法,其特征在于,所述预测所述推荐预案包括的一个或多个候选操作中每个候选操作的综合影响程度,包括:
    预估所述一个或多个候选操作中的每个候选操作对所述故障节点所在的网络内的业务的业务影响程度;
    预估所述一个或多个候选操作中的每个候选操作对所述故障节点所在网络的配置影响程度;
    根据每个候选操作对应的业务影响程度和配置影响程度,确定每个候选操作对应的综合影响程度。
  7. 根据权利要求6所述的方法,其特征在于,所述预估所述一个或多个候选操作中的每个候选操作对所述故障节点所在网络的配置影响程度,包括:
    获取所述故障节点的原始配置信息和所述故障节点所在网络的物理拓扑信息;
    根据第一候选操作和所述故障节点的原始配置信息,生成所述第一候选操作对应的预测配置信息,所述第一候选操作为所述一个或多个候选操作中的任一候选操作;
    将所述第一候选操作对应的预测配置信息、所述故障节点的原始配置信息和所述物理拓扑信息作为配置面验证CPV模型的输入,通过所述CPV模型确定所述第一候选操作对应的配置影响程度。
  8. 根据权利要求6或7所述的方法,其特征在于,所述根据每个候选操作对应的业务影响程度和配置影响程度,确定每个候选操作对应的综合影响程度,包括:
    获取业务影响权重和配置影响权重;
    根据第一候选操作对应的业务影响程度、所述业务影响权重、所述第一候选操作对应的配置影响程度和所述配置影响权重,确定所述第一候选操作对应的综合影响程度。
  9. 一种故障修复操作推荐装置,其特征在于,所述装置包括:
    获取模块,用于获取故障节点的故障信息;
    第一确定模块,用于如果预案库中不存在所述故障信息对应的修复预案,则根据所述故障信息,通过推荐模型确定推荐预案,所述推荐预案包括一个或多个候选操作;
    推荐模块,用于从所述推荐预案包括的一个或多个候选操作中确定故障修复操作,向所述故障节点推荐所述故障修复操作,以便所述故障节点根据所述故障修复操作进行故障修复。
  10. 根据权利要求9所述的装置,其特征在于,所述故障信息包括多个故障参数,所述推荐模型的参数包括多个故障特征因子,所述多个故障特征因子中的每个故障特征因子对应所述多个故障参数中的一个故障参数;
    所述第一确定模块具体用于:
    将所述多个故障参数作为所述推荐模型的输入,按照所述多个故障特征因子,通过所述推荐模型确定所述推荐预案。
  11. 根据权利要求10所述的装置,其特征在于,所述推荐模型的参数还包括配置影响因子,所述配置影响因子包括所述故障节点的原始配置信息和所述故障节点所在网络的物理拓扑信息;
    所述推荐模块具体用于:
    将第一候选操作作为所述推荐模型的输入,按照所述配置影响因子,通过所述推荐模型确定所述第一候选操作对应的配置影响程度,所述配置影响程度是所述第一候选操作对所述故障节点的配置和所在网络的物理拓扑的预测影响程度,所述第一候选操作为所述一个或多个候选操作中的任一候选操作;
    将所述一个或多个候选操作中对应的配置影响程度最小的候选操作作为所述故障修复操作。
  12. 根据权利要求9-11任一所述的装置,其特征在于,所述装置还包括:
    接收模块,用于接收所述故障节点根据所述故障修复操作进行故障修复后反馈的修复结果,所述修复结果包括所述故障节点进行故障修复后的路由信息和所在网络的物理拓扑信息;
    第二确定模块,用于将故障修复后的所述路由信息和所述物理拓扑信息作为数据面验证DPV模型的输入,通过所述DPV模型确定所述故障修复操作对应的数据面影响程度,所述数据面影响程度是所述故障修复操作对所述故障节点的配置和所在网络的物理拓扑造成的真实影响程度;
    生成模块,用于根据所述故障修复操作对应的数据面影响程度、所述故障修复操作和所述故障信息,生成故障样本;
    调整模块,用于根据所述故障样本对所述推荐模型的参数进行调整。
  13. 一种故障修复操作推荐装置,其特征在于,所述装置包括:
    获取模块,用于获取故障节点的故障信息;
    确定模块,用于如果预案库中存在所述故障信息对应的修复预案,则将所述故障信息对应的修复预案作为推荐预案,所述推荐预案包括一个或多个候选操作;
    预测模块,用于预测所述一个或多个候选操作中每个候选操作的综合影响程度,所述综合影响程度用于指示相应候选操作对所述故障节点所在网络的综合影响的大小;
    推荐模块,用于将所述一个或多个候选操作中对应的综合影响程度最小的候选操作确定为故障修复操作,向所述故障节点推荐所述故障修复操作,以便所述故障节点根据所述故障修复操作进行故障修复。
  14. 根据权利要求13所述的装置,其特征在于,所述预测模块具体用于:
    预估所述一个或多个候选操作中的每个候选操作对所述故障节点所在的网络内的业务的业务影响程度;
    预估所述一个或多个候选操作中的每个候选操作对所述故障节点所在网络的配置影响程度;
    根据每个候选操作对应的业务影响程度和配置影响程度,确定每个候选操作对应的综合影响程度。
  15. 根据权利要求14所述的装置,其特征在于,所述预测模块具体用于:
    获取所述故障节点的原始配置信息和所述故障节点所在网络的物理拓扑信息;
    根据第一候选操作和所述故障节点的原始配置信息,生成所述第一候选操作对应的预测配置信息,所述第一候选操作为所述一个或多个候选操作中的任一候选操作;
    将所述第一候选操作对应的预测配置信息、所述故障节点的原始配置信息和所述物理拓扑信息作为配置面验证CPV模型的输入,通过所述CPV模型确定所述第一候选操作对应的配置影响程度。
  16. 根据权利要求14或15所述的装置,其特征在于,所述推荐模块具体用于:
    获取业务影响权重和配置影响权重;
    根据第一候选操作对应的业务影响程度、所述业务影响权重、所述第一候选操作对应的配置影响程度和所述配置影响权重,确定所述第一候选操作对应的综合影响程度。
  17. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当所述指令在计算机上运行时,使得计算机执行上述权利要求1-8任一项所述的故障修复操作推荐方法。
PCT/CN2020/118233 2019-11-27 2020-09-28 故障修复操作推荐方法、装置及存储介质 WO2021103800A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20892606.3A EP4047481A4 (en) 2019-11-27 2020-09-28 METHOD AND DEVICE FOR RECOMMENDED TROUBLESHOOTING ACTIONS AND STORAGE MEDIA
US17/825,246 US11743113B2 (en) 2019-11-27 2022-05-26 Fault rectification operation recommendation method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911180239.0 2019-11-27
CN201911180239.0A CN112860496A (zh) 2019-11-27 2019-11-27 故障修复操作推荐方法、装置及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/825,246 Continuation US11743113B2 (en) 2019-11-27 2022-05-26 Fault rectification operation recommendation method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2021103800A1 true WO2021103800A1 (zh) 2021-06-03

Family

ID=75985400

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118233 WO2021103800A1 (zh) 2019-11-27 2020-09-28 故障修复操作推荐方法、装置及存储介质

Country Status (4)

Country Link
US (1) US11743113B2 (zh)
EP (1) EP4047481A4 (zh)
CN (1) CN112860496A (zh)
WO (1) WO2021103800A1 (zh)

Also Published As

Publication number Publication date
US11743113B2 (en) 2023-08-29
US20220286351A1 (en) 2022-09-08
CN112860496A (zh) 2021-05-28
EP4047481A1 (en) 2022-08-24
EP4047481A4 (en) 2023-01-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20892606

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020892606

Country of ref document: EP

Effective date: 20220516

NENP Non-entry into the national phase

Ref country code: DE