WO2024037345A1 - Fault processing method and apparatus, and storage medium - Google Patents

Info

Publication number
WO2024037345A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
information
fault diagnosis
business
processing
Prior art date
Application number
PCT/CN2023/110786
Other languages
French (fr)
Chinese (zh)
Inventor
张均
宋燕
付光荣
姜磊
孟照星
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2024037345A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0677: Localisation of faults

Definitions

  • The embodiments of the present application relate to, but are not limited to, the field of communication technology, and in particular to a fault handling method, a fault handling device, and a storage medium.
  • The business operation and maintenance system and the professional operation and maintenance system are usually deployed separately.
  • The professional operation and maintenance system can analyze, diagnose and repair network faults.
  • The professional operation and maintenance system handles the fault according to the business intent, and then confirms with the business operation and maintenance system whether the business problem is solved. Consequently, if the business operation and maintenance personnel cannot accurately describe the business problem, the fault has to be located multiple times, which reduces fault handling efficiency.
  • Embodiments of the present application provide a fault handling method, device, and storage medium.
  • Embodiments of the present application provide a fault processing method, which includes: obtaining business intent information; performing standardized processing on the business intent information to obtain a fault diagnosis scenario and the processing priority of the fault diagnosis scenario; and performing fault processing on the fault diagnosis scenario according to the processing priority.
  • Embodiments of the present application also provide a fault handling device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the fault handling method described above is implemented.
  • Embodiments of the present application also provide a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to execute the fault handling method as described above.
  • Embodiments of the present application further provide a computer program product, which includes a computer program or computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer program or the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the fault handling method as described above.
  • Figure 1 is a schematic diagram of a fault processing device for performing a fault processing method provided by an embodiment of the present application;
  • Figure 2 is a flow chart of a fault handling method provided by an embodiment of the present application;
  • Figure 3 is a flow chart of a method in step S120 in Figure 2;
  • Figure 4 is a flow chart of a method in step S220 in Figure 3;
  • Figure 5 is a flow chart of a fault handling method provided by another embodiment of the present application;
  • Figure 6 is a flow chart of a fault handling method provided by another embodiment of the present application;
  • Figure 7 is a schematic diagram of an application scenario in which a fault handling device provided by an embodiment of the present application is deployed in a business operation and maintenance system;
  • Figure 8 is a flow chart of a fault handling method provided by another embodiment of the present application;
  • Figure 9 is a schematic diagram of an application scenario in which a fault handling device for executing a fault handling method provided by an embodiment of the present application is deployed in a business operation and maintenance system;
  • Figure 10 is a schematic diagram of the movement scenario of an AGV (Automated Guided Vehicle) provided by an embodiment of the present application;
  • Figure 11 is a schematic diagram of a fault processing device provided by another embodiment of the present application.
  • Embodiments of this application include: obtaining business intent information; performing standardized processing on the business intent information to obtain the fault diagnosis scenario of the target network device and the processing priority of the fault diagnosis scenario; and performing fault processing on the fault diagnosis scenario according to the processing priority. That is to say, through standardized processing of the business intent information, the target network device and its fault diagnosis scenario can be accurately located, and the fault diagnosis scenario can be processed according to the processing priority, which improves fault processing efficiency. Therefore, the embodiments of the present application can accurately locate faults and improve fault handling efficiency.
  • Figure 1 is a schematic diagram of a fault processing device for executing a fault processing method provided by an embodiment of the present application.
  • The fault handling device includes at least an intent sensor 110, an intent translator 120, a knowledge manager 130, a fault forwarder 140, a delimitation locator 150 and an intent verifier 160. The intent sensor 110, the intent translator 120, the knowledge manager 130, the intent verifier 160, the delimitation locator 150 and the fault forwarder 140 are communicatively connected in sequence, and the intent translator 120 is also communicatively connected with the delimitation locator 150.
  • The intent sensor 110 can obtain the user's service perception indicator requirements (such as delay guarantee requirements or bandwidth guarantee requirements) through business intent input in voice, text or other forms, and then convert these requirements and other business intent information into processing statements. Taking the terminal automatic assembly business at a certain workstation in a certain workshop as an example, the intent sensor 110 can convert the business intent information into processing statements such as the service delay of that workstation, the network jitter of that workstation, and a terminal failure at that workstation.
  • The intent sensor 110 can also obtain service monitoring faults in real time and automatically sense the business intent information of those service monitoring faults. It can be understood that in a campus scenario, the possible business intents expressed by users in business statements can be enumerated, and there are no specific restrictions here.
  • The intent translator 120 is one of the core components of the fault processing device.
  • The intent translator 120 can receive the business intent information from the intent sensor 110 and standardize it to obtain the fault diagnosis scenario and the processing priority of the fault diagnosis scenario. For example, assume that the target terminal is a campus terminal that has experienced a service failure and that its business intent information is input into the intent translator 120.
  • The intent translator 120 performs standardized statement processing on the business intent information to obtain the network attribute information of the campus terminal, and obtains the first target network device according to the network attribute information. The intent translator 120 then obtains the current network topology information and the static network topology information of the campus terminal, and performs correlation analysis on the two to obtain multiple second target network devices associated with the first target network device.
  • The intent translator 120 can also dynamically query the current operation and maintenance data information (such as network characteristic data information, alarm data information, and performance data information of related equipment) in real time, thereby directly obtaining the fault diagnosis scenario corresponding to the business fault (such as equipment failure).
  • Alternatively, the current operation and maintenance data information can be input into the similarity model from the knowledge manager 130 and translated to obtain multiple fault diagnosis scenarios and their processing priorities, where the fault causes of the multiple fault diagnosis scenarios may differ. It can be understood that when the fault diagnosis scenario is a professional operation and maintenance fault diagnosis scenario, the business intent information and the fault diagnosis scenario can be sent to the professional operation and maintenance system.
  • The knowledge manager 130 can be used to save and provide the knowledge and experience required by the fault handling device, such as: unified processing statements expressing business intent (that is, processing statements obtained by performing standardized statement processing on business intent information); historical operation and maintenance data information; historical business data information; result information after business intent information conversion (such as network attribute information); the processing priority of fault diagnosis scenarios; and information related to business intent information, such as typical faults and indicator degradation. One example is the bandwidth fault information, corresponding network equipment and fault diagnosis scenario information related to the business intent of video monitoring service guarantee in a certain area. Another example is the delay fault information that is strongly related to the business intent of automatic assembly service guarantee at a certain workstation in a certain workshop.
  • the knowledge manager 130 can also be used to store different fault diagnosis scenarios under the professional operation and maintenance system (such as fault diagnosis scenarios of possible causes of equipment failure or indicator degradation), as well as fault diagnosis instructions, etc.
  • The business operation and maintenance system (or professional operation and maintenance system) can inject its historical business data information and historical operation and maintenance data information into the knowledge manager 130, and the knowledge manager 130 trains on the historical business data and historical operation and maintenance data to obtain a similarity model.
  • the delimitation locator 150 may be used to provide automated diagnosis capabilities for one or more fault diagnosis scenarios output by the intent translator 120 (such as network device failure or network performance index degradation and other fault diagnosis scenarios).
  • the delimitation locator 150 obtains fault diagnosis instructions for fault diagnosis scenarios such as network equipment failure or network performance index degradation from the knowledge manager 130, and performs fault processing on the fault diagnosis scenarios from the intent translator 120 according to the fault diagnosis instructions.
  • When the fault diagnosis scenario is a business operation and maintenance fault diagnosis scenario, that is, when the fault diagnosis scenario can be delimited and located in the business operation and maintenance system, the delimitation locator 150 obtains fault diagnosis instructions from the knowledge manager 130 and performs fault processing on the fault diagnosis scenario according to the fault diagnosis instructions.
  • When the fault diagnosis scenario is a professional operation and maintenance fault diagnosis scenario, that is, when the fault diagnosis scenario can be delimited and located in the professional operation and maintenance system, the fault forwarder 140 needs to be called. The business intent information and the fault diagnosis scenario are forwarded to the professional operation and maintenance system through the fault forwarder 140, and the professional operation and maintenance system performs fault processing on the fault diagnosis scenario.
  • Fault location may involve complex diagnostic processes such as fault diagnosis trees, repeated diagnosis, multi-step diagnosis and delayed diagnosis.
  • The fault forwarder 140 is used to forward the business intent information and fault diagnosis scenarios to the professional operation and maintenance system or the business operation and maintenance system, so that the receiving system can perform fault processing on the fault diagnosis scenario (i.e., fault analysis, fault diagnosis and fault self-healing), and to receive from that system the fault processing result of the fault diagnosis scenario.
  • When the fault forwarder 140 is set in the business operation and maintenance system and the fault diagnosis scenario is a professional operation and maintenance fault diagnosis scenario, the fault forwarder 140 needs to forward the business intent information and the fault diagnosis scenario to the professional operation and maintenance system, and then receive the fault processing result that the professional operation and maintenance system obtains based on them. Conversely, when the fault forwarder 140 is set in the professional operation and maintenance system and the fault diagnosis scenario is a business operation and maintenance fault diagnosis scenario, the fault forwarder 140 needs to forward the business intent information and the fault diagnosis scenario to the business operation and maintenance system, and then receive the fault processing result that the business operation and maintenance system obtains based on them. There are no specific restrictions here.
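The routing decision described for the delimitation locator 150 and the fault forwarder 140 can be illustrated with a minimal sketch. The domain labels, the `Scenario` class and the `route` function are hypothetical names introduced only for illustration; the application does not specify any data model.

```python
from dataclasses import dataclass

# Hypothetical domain labels; the text only distinguishes "business" and
# "professional" operation and maintenance fault diagnosis scenarios.
BUSINESS = "business"
PROFESSIONAL = "professional"


@dataclass
class Scenario:
    name: str
    domain: str  # the O&M system able to delimit and locate this scenario


def route(scenario: Scenario, deployed_in: str) -> str:
    """Return "local" when the hosting system can process the scenario itself,
    otherwise "forward" (hand it to the other system via the forwarder)."""
    return "local" if scenario.domain == deployed_in else "forward"


misconfig = Scenario("connection service parameter misconfiguration", BUSINESS)
outage = Scenario("base station out of service", PROFESSIONAL)
print(route(misconfig, BUSINESS))  # handled by the delimitation locator
print(route(outage, BUSINESS))     # handed to the professional O&M system
```

In this sketch the device is assumed to be deployed in the business operation and maintenance system; swapping `deployed_in` models the converse deployment.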
  • the intent verifier 160 is used to evaluate the fault processing result of the delimitation locator 150, that is, to evaluate the fault processing result of the fault diagnosis scenario to obtain the evaluation result. Finally, the intent verifier 160 also updates the evaluation results to the knowledge manager 130 to continuously optimize and improve the accuracy of the intent translation.
  • When the evaluation result is that the fault processing is effective, the fault processing flow can be closed. When the network status or the service status cannot be restored, the evaluation result indicates that the fault diagnosis scenario has not been repaired; the fault then needs to be transferred to manual processing, or the fault processing device can re-obtain the business intent information and standardize it again.
  • The fault handling device can be deployed in the business operation and maintenance system, in the professional operation and maintenance system, or in both. There is no specific restriction here.
  • The fault processing device shown in Figure 1 does not limit the embodiments of the present application; it may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
  • Figure 2 is a flow chart of a fault processing method provided by an embodiment of the present application.
  • The fault processing method can be applied to a fault processing device, such as the fault processing device shown in Figure 1, and the fault processing device can be provided in the business operation and maintenance system.
  • the fault handling method may include but is not limited to step S110, step S120 and step S130.
  • Step S110 Obtain business intent information.
  • the business intention information may be delay intention information or other business intention information, which will not be listed here.
  • Step S120 Standardize the business intent information to obtain the fault diagnosis scenario and the processing priority of the fault diagnosis scenario.
  • There may be multiple fault diagnosis scenarios, and they may differ from one another. The fault diagnosis scenarios may include alarm diagnosis scenarios and performance diagnosis scenarios.
  • Step S130 Perform fault processing on the fault diagnosis scenario according to the processing priority.
  • With the fault handling method including steps S110 to S130 above, the business intent information is first obtained and then standardized to obtain the fault diagnosis scenario and the processing priority of the fault diagnosis scenario.
  • Fault processing is then performed on the fault diagnosis scenario according to the processing priority. That is, through standardized processing of the business intent information, the target network device and its fault diagnosis scenario are accurately located, and the fault diagnosis scenario can be processed according to the processing priority. Therefore, the embodiments of the present application can accurately locate faults and at the same time improve the efficiency of fault handling.
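Steps S110 to S130 can be sketched as a small pipeline. This is only an illustration under stated assumptions: `handle_fault`, `demo_standardize`, the scenario names and the convention that a lower number means earlier processing are all hypothetical and not taken from the application.

```python
from typing import Callable, List, Tuple

# (scenario name, priority); assumption: a lower number is processed first.
Scenario = Tuple[str, int]


def handle_fault(
    intent: str,
    standardize: Callable[[str], List[Scenario]],
    process: Callable[[str], bool],
) -> List[Tuple[str, bool]]:
    """S110: the caller supplies the business intent information.
    S120: standardize() maps it to prioritized fault diagnosis scenarios.
    S130: the scenarios are processed in priority order."""
    scenarios = standardize(intent)
    results = []
    for name, _priority in sorted(scenarios, key=lambda s: s[1]):
        results.append((name, process(name)))
    return results


def demo_standardize(intent: str) -> List[Scenario]:
    # Hypothetical fixed mapping for a delay-related intent.
    return [
        ("device parameter configuration check", 2),
        ("base station out-of-service alarm", 1),
    ]


results = handle_fault("workstation service delay", demo_standardize, lambda name: True)
print(results)
```

Passing `standardize` and `process` as callables keeps the sketch independent of how the intent translator or the delimitation locator are actually realized.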
  • ToB is a business model in which the enterprise is the service subject and platforms, products or services are provided to enterprise customers; ToB is therefore also called enterprise service.
  • Step S120 is further described.
  • Step S120 may include but is not limited to step S210, step S220, and step S230.
  • Step S210 Perform standardized sentence processing on the service intent information to obtain network attribute information of the target terminal.
  • the network attribute information may include physical location information, card number information, etc., which are not specifically limited here.
  • The target terminal (such as a campus terminal) may be a workstation, a gantry crane, an industrial camera, an AGV (Automated Guided Vehicle) or similar equipment in the campus that has a built-in number card.
  • All target terminals share a common feature: they are registered to the 5G private network through the number card, and the network automatically controls them to perform automated operations or automated monitoring. There are no specific restrictions here.
  • Step S220 Obtain the static network topology information of the target terminal and the current network topology information of the target terminal, and obtain multiple candidate network devices based on the network attribute information, static network topology information, and current network topology information.
  • The static network topology information includes a static physical network topology and a static logical network topology.
  • The static physical network topology includes the longitude and the latitude of the static network topology.
  • The static logical network topology includes the logical links between the network devices related to the target terminal.
  • The current network topology information includes the current physical network topology and the current logical network topology.
  • The current physical network topology includes the physical location topology information, the longitude of the current network topology, the latitude of the current network topology, and so on.
  • The current logical network topology includes the logical links between the network devices related to the target terminal; no specific limitation is imposed here.
  • The candidate network equipment may be the CPE (Customer Premise Equipment) required to form a 5G (5th Generation, fifth generation mobile communication system) private network, or it may be communication equipment such as a base station. There are no specific restrictions on this.
  • Step S230 Determine the current operation and maintenance data information of each candidate network device, and obtain the fault diagnosis scenario and the processing priority of the fault diagnosis scenario based on the current operation and maintenance data information.
  • The current operation and maintenance data information includes alarm data information, performance data information and other operation and maintenance data information.
  • The alarm data information can be delay alarm information, out-of-service alarm information and other alarm information.
  • The performance data information can be delay performance data and so on, which are not listed here. For example, assuming that the candidate network device is a base station, when the base station has base station out-of-service alarm information, the alarm diagnosis scenario of the out-of-service alarm is obtained based on that information, and the processing priority of this alarm diagnosis scenario is the highest.
  • For another example, when the end-to-end user perception indicator data of the internal network of the candidate network device has deteriorated, that is, when the performance data information is the end-to-end user perception indicator data of the internal network, a performance diagnosis scenario, namely a device parameter configuration check, is obtained based on the performance data information. This embodiment imposes no specific limitation here.
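The two examples above amount to a mapping from current operation and maintenance data to diagnosis scenarios. A minimal rule-based sketch follows; the key names (`out_of_service`, `e2e_user_perception_degraded`) and the numeric priority given to the performance scenario are assumptions made for illustration. The text states only that the out-of-service alarm scenario has the highest priority.

```python
from typing import Dict, List, Set, Tuple


def diagnose(alarms: Set[str], perf: Dict[str, bool]) -> List[Tuple[str, int]]:
    """Map alarm and performance data of one candidate device to
    (scenario, priority) pairs; priority 1 is processed first."""
    scenarios = []
    if "out_of_service" in alarms:
        # Stated in the text: this alarm diagnosis scenario ranks highest.
        scenarios.append(("out-of-service alarm diagnosis", 1))
    if perf.get("e2e_user_perception_degraded", False):
        # Assumed priority; the text names only the scenario itself.
        scenarios.append(("device parameter configuration check", 2))
    return sorted(scenarios, key=lambda s: s[1])


print(diagnose({"out_of_service"}, {"e2e_user_perception_degraded": True}))
```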
  • the current operation and maintenance data information and business intent information can be input into the similarity model to obtain the fault diagnosis scenario and the processing priority of the fault diagnosis scenario.
  • the similarity model can be obtained by obtaining historical operation and maintenance data information and historical business data information, and then training the historical operation and maintenance data information and historical business data information.
  • Similarity calculation can be performed on the current operation and maintenance data information and the business intent information to obtain the fault diagnosis scenario and the processing priority of the fault diagnosis scenario. Similarity algorithms such as Euclidean distance, cosine similarity, Manhattan distance or Chebyshev distance can be used for this calculation, and there are no specific restrictions here.
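For reference, the four measures named above can be written in a few lines of plain Python. The feature vectors are hypothetical: the application does not say how business intent information or operation and maintenance data are encoded as numbers.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # define similarity with a zero vector as 0


# Hypothetical encodings of an intent and of one device's O&M data.
intent_vec = [1.0, 0.0, 1.0]
oam_vec = [1.0, 1.0, 0.0]
print(euclidean(intent_vec, oam_vec), manhattan(intent_vec, oam_vec),
      chebyshev(intent_vec, oam_vec), cosine_similarity(intent_vec, oam_vec))
```

Note that the three distances grow as the vectors diverge, whereas cosine similarity grows as they align, so a ranking built on a distance must sort in the opposite direction.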
  • That is to say, the similarity model performs similarity calculation on the current operation and maintenance data information and the business intent information. Therefore, the fault diagnosis scenario is the diagnosis scenario corresponding to the current operation and maintenance data information that is relevant to the business intent information, and the processing priority of the fault diagnosis scenario follows the ranking of that relevance.
  • All the alarm data information and the business intent information can be input into the similarity model, that is, similarity calculation is performed on the alarm data information and the business intent information through the similarity model to obtain multiple alarm diagnosis scenarios and the processing priority of each alarm diagnosis scenario. In other words, the result is a set of alarm causes together with a ranking of each alarm cause by its correlation with the business intent information. Fault processing can then be performed on each alarm cause in sequence according to the correlation ranking (i.e., the processing priority), or a subset of the alarm causes can be selected for fault processing according to the correlation ranking, which is not specifically limited here.
  • For example, assume that the business intent information is a delay intent and that the current operation and maintenance data information includes alarm data information and performance data information, where the alarm data information is end-to-end device fault information and the performance data information is delay performance data.
  • In this case, the performance diagnosis scenario corresponding to the delay performance data has the highest processing priority, and the diagnosis scenario corresponding to the end-to-end device fault information has the second highest processing priority. Therefore, fault processing can be performed first on the scenario corresponding to the delay performance data and then on the scenario corresponding to the end-to-end device fault information, according to the processing priority, and there are no specific restrictions here.
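The ordering in this example can be reproduced with a small ranking sketch that scores each scenario by cosine similarity to the intent. The two-dimensional feature encoding (delay-related, device-fault-related) and all numeric values are invented for illustration, and cosine similarity is only one of the measures the text allows.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_scenarios(
    intent_vec: Sequence[float], scenario_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, int]]:
    """Return (scenario, priority) pairs; priority 1 is the scenario most
    similar to the business intent."""
    scored = [(name, cosine(intent_vec, vec)) for name, vec in scenario_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, rank) for rank, (name, _score) in enumerate(scored, start=1)]


# Hypothetical features: [delay-related, device-fault-related].
delay_intent = [1.0, 0.0]
scenarios = {
    "end-to-end device fault": [0.3, 1.0],
    "delay performance": [1.0, 0.1],
}
ranking = rank_scenarios(delay_intent, scenarios)
print(ranking)
```

With these assumed vectors the delay performance scenario ranks first and the end-to-end device fault scenario second, matching the order described in the text.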
  • The historical business data information of each candidate network device can also be obtained, and correlation analysis can be performed on the historical business data information and the current operation and maintenance data information.
  • The correlation analysis can be performed by calculating the similarity between the historical business data information and the current operation and maintenance data information. For example, assuming that the alarm data and performance data of each candidate network device are normal, the intent translator can obtain the historical business data of each candidate network device, perform correlation analysis on the historical business data and the current operation and maintenance data, and determine whether growth in other business volumes that preempt network resources has caused the network performance indicators of this business to fall below standard.
  • With the fault handling method including steps S210 to S230 above, standardized statement processing is first performed on the business intent information to obtain the network attribute information of the target terminal. The static network topology information and the current network topology information of the target terminal are then obtained, and multiple candidate network devices are obtained based on the network attribute information, the static network topology information and the current network topology information. Finally, the current operation and maintenance data information of each candidate network device is determined, and the fault diagnosis scenario and its processing priority are obtained based on the current operation and maintenance data information.
  • In this way, the embodiments of the present application can, from a business perspective, quickly and automatically handle enterprise business faults when the network topology changes.
  • step S220 is further described.
  • This step S220 may include but is not limited to step S310, step S320 and step S330.
  • Step S310 Obtain the first target network device according to the network attribute information.
  • the first target network device is a network device registered by the target terminal.
  • the number of the first target network devices may be multiple, and there is no specific limitation here.
  • Step S320 Perform correlation analysis on the static network topology information and the current network topology information to obtain a plurality of second target network devices associated with the first target network device.
  • The second target network device may be a network device registered with the first target network device, a network device accessed by the first target network device, or a network device that has another association relationship with the first target network device; no specific restrictions are made here.
  • the static network topology information may be physical location topology information, etc., which will not be listed here.
  • Step S330 Use the first target network device and the plurality of second target network devices as candidate network devices.
  • The first target network equipment, the second target network equipment and the candidate network equipment may all be CPE (Customer Premise Equipment) required to form a 5G (5th Generation, fifth generation mobile communication system) private network, or communication equipment such as base stations, which are not specifically limited here.
  • With steps S310 to S330, the first target network device can be obtained according to the network attribute information, and correlation analysis is then performed on the static network topology information and the current network topology information to obtain a plurality of second target network devices associated with the first target network device. The first target network device and the second target network devices are used as candidate network devices, so that the service fault can be accurately delimited in subsequent steps.
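  • The candidate selection in steps S310 to S330 can be sketched as follows. This is only an illustrative sketch, not the applicant's implementation: the `Topology` class, the adjacency-map representation and the device names are assumptions, and "correlation analysis" is reduced here to taking the union of the associations recorded in the static and the current topology.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Hypothetical topology: device id -> set of associated device ids."""
    links: dict[str, set[str]] = field(default_factory=dict)

    def neighbors(self, device: str) -> set[str]:
        return self.links.get(device, set())


def candidate_devices(registered: list[str],
                      static_topo: Topology,
                      current_topo: Topology) -> list[str]:
    """Return first target devices plus the devices associated with them."""
    candidates: list[str] = []
    for first in registered:
        # S310: devices on which the target terminal is registered.
        if first not in candidates:
            candidates.append(first)
        # S320: associations found in either the static or the current topology.
        related = static_topo.neighbors(first) | current_topo.neighbors(first)
        for second in sorted(related):
            # S330: merge first and second target devices into one candidate list.
            if second not in candidates:
                candidates.append(second)
    return candidates


static = Topology({"gNB-2": {"gNB-3"}})
current = Topology({"gNB-2": {"gNB-4"}})
print(candidate_devices(["gNB-2"], static, current))
```

  • Keeping the static and the current topology as separate inputs reflects the point made above: a device that is only visible in one of the two views still becomes a candidate.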
  • the fault diagnosis scenario when the fault diagnosis scenario is a business operation and maintenance fault diagnosis scenario, fault diagnosis instructions are obtained, and fault processing is performed on the fault diagnosis scenario according to the fault diagnosis instructions and processing priority.
  • When the fault diagnosis scenario is a business operation and maintenance fault diagnosis scenario, that is, when the fault diagnosis scenario can be delimited and located within the business operation and maintenance system, the delimitation locator can obtain fault diagnosis instructions from the knowledge manager; the business operation and maintenance system receives the fault diagnosis instructions and performs fault processing on the fault diagnosis scenario according to the fault diagnosis instructions and the processing priority.
  • the business operation and maintenance fault diagnosis scenario is the fault diagnosis scenario that the business operation and maintenance system can diagnose.
  • The end-to-end delay performance data corresponds to a performance diagnosis scenario in which the delay is caused by misconfigured connection service parameters. The business operation and maintenance system therefore only needs to modify the parameters to complete the fault processing for this fault diagnosis scenario, without calling the professional operation and maintenance system.
  • the fault handling method may also include but is not limited to step S410 and step S420.
  • Step S410 When the fault diagnosis scenario is a professional operation and maintenance fault diagnosis scenario, send the business intention information and the fault diagnosis scenario to the professional operation and maintenance system.
  • professional operation and maintenance fault diagnosis scenarios are fault diagnosis scenarios that need to be transferred to the professional operation and maintenance system for delimitation and positioning, such as faults that the business operation and maintenance system fails to locate or that require professional operation and maintenance system processing.
  • The fault diagnosis scenario corresponding to the base station out-of-service alarm information is a professional operation and maintenance fault diagnosis scenario. Therefore, the fault forwarder is needed to transfer the business intention information together with the fault diagnosis scenario to the professional operation and maintenance system for processing, using the mature delimitation and locating capabilities of the professional operation and maintenance system to perform fault analysis, fault diagnosis and fault self-healing.
  • Step S420 Receive the fault processing results from the professional operation and maintenance system.
  • the fault processing results are obtained by the professional operation and maintenance system based on the business intent information and fault diagnosis scenarios.
  • The fault forwarder in the business operation and maintenance system can automatically send the business intention information together with the fault diagnosis scenario to the professional operation and maintenance system, and the business operation and maintenance system can then receive the fault processing results obtained by the professional operation and maintenance system based on the business intention information and the fault diagnosis scenario. Therefore, this embodiment can associate the business operation and maintenance system with the professional operation and maintenance system, and use automation technology together with the professional technology of the professional operation and maintenance system to delimit and locate the fault, thereby improving the automated processing capability for business faults.
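  • The split between local handling and forwarding (steps S410 and S420) can be illustrated with the following sketch. It is a simplified illustration under stated assumptions: `ScenarioKind`, `handle_scenario` and the two callables are hypothetical names, and the professional operation and maintenance system is stood in for by a plain function that returns a result string.

```python
from enum import Enum
from typing import Callable


class ScenarioKind(Enum):
    BUSINESS_OM = "business"          # can be delimited and located locally
    PROFESSIONAL_OM = "professional"  # must be transferred for processing


def handle_scenario(kind: ScenarioKind,
                    intent: str,
                    scenario: str,
                    local_handler: Callable[[str], str],
                    forward_to_professional: Callable[[str, str], str]) -> str:
    """Handle a scenario locally or forward it, returning the processing result."""
    if kind is ScenarioKind.BUSINESS_OM:
        # Handled inside the business O&M system, e.g. by modifying parameters.
        return local_handler(scenario)
    # S410: send the business intent and the scenario to the professional system.
    # S420: the returned value is the fault processing result received from it.
    return forward_to_professional(intent, scenario)


result = handle_scenario(
    ScenarioKind.PROFESSIONAL_OM,
    "delay below 10 ms",
    "base station out of service",
    local_handler=lambda s: f"fixed {s} locally",
    forward_to_professional=lambda i, s: f"professional system handled {s}",
)
print(result)
```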
  • the fault handling method may also include but is not limited to step S510 and step S520.
  • Step S510 Determine the fault processing results of the fault diagnosis scenario, perform evaluation processing on the fault processing results, and obtain the evaluation results.
  • The fault processing results produced for the fault diagnosis scenario by the business operation and maintenance system or by the professional operation and maintenance system can be input into the intent verifier, which evaluates the fault processing results based on the business fault recovery status and manual feedback to obtain the evaluation result. Therefore, this embodiment can determine the validity and accuracy of the fault processing result through the evaluation result.
  • Recovery of the network status does not necessarily mean recovery from the business fault. Therefore, manual checking or business monitoring is required to determine whether the current business fault has been recovered. If the business fault is recovered, that is, both the network status and the user perception are restored, the intent verifier evaluates the network fault handling based on the business intent information as effective; if the fault diagnosis scenario is not repaired, that is, the intent verifier evaluates that the fault processing did not solve the problem, the next round of business fault diagnosis or further manual intervention is required.
  • Step S520 When the evaluation result indicates that the fault diagnosis scenario has not been repaired, re-obtain the business intent information.
  • With the fault handling method including the above steps S510 to S520, the fault processing results of the fault diagnosis scenario can be determined and evaluated to obtain the evaluation result. When the evaluation result indicates that the fault diagnosis scenario has not been repaired, the business intent information is re-obtained, that is, the business intent information is standardized again to obtain the fault diagnosis scenario and its processing priority, and fault processing is performed on the fault diagnosis scenario according to the processing priority.
  • the embodiment does not specifically limit this.
  • the service fault corresponding to the business intent information can be processed through manual intervention. This embodiment does not impose specific restrictions on this.
  • The evaluation results, business intent information and fault diagnosis scenarios may be stored, and can be updated together to the knowledge manager.
  • For network fault handling based on business intent information that the intent verifier evaluates as effective, the processing priority of its fault diagnosis scenario can be updated in the knowledge manager; for network fault handling based on business intent information that the intent verifier evaluates as ineffective, the business intent information and fault diagnosis scenarios can be optimized, and the fault diagnosis scenarios for this service in the knowledge manager can be optimized, so as to continuously improve the automated and intelligent processing capabilities of this fault handling device.
  • manual confirmation of the information in the automatically updated knowledge manager is performed, and closed-loop feedback of the fault handling results is performed to iteratively optimize the accuracy of the information in the knowledge manager.
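  • The closed loop formed by steps S510 and S520 and the knowledge manager update can be sketched as below. This is a hedged illustration only: the function names, the list used as a stand-in for the knowledge manager, the convention that a smaller number means a higher processing priority, and the `max_rounds` limit before falling back to manual intervention are all assumptions that do not come from the application.

```python
from typing import Callable


def closed_loop(intent: str,
                diagnose: Callable[[str], list[tuple[str, int]]],
                handle: Callable[[str], str],
                verify: Callable[[str], bool],
                knowledge: list[dict],
                max_rounds: int = 3) -> bool:
    """Diagnose, handle and verify until the fault is repaired or rounds run out."""
    for _ in range(max_rounds):
        # S520: (re-)standardize the intent into scenarios with priorities.
        scenarios = diagnose(intent)
        # Assumption: smaller number = higher processing priority.
        for scenario, _priority in sorted(scenarios, key=lambda s: s[1]):
            result = handle(scenario)
            # S510: evaluate the fault processing result.
            repaired = verify(result)
            # Store intent, scenario and evaluation result together.
            knowledge.append(
                {"intent": intent, "scenario": scenario, "repaired": repaired}
            )
            if repaired:
                return True
    return False  # not repaired: manual intervention would be needed


knowledge_manager: list[dict] = []
ok = closed_loop(
    "delay below 10 ms",
    diagnose=lambda i: [("parameter misconfiguration", 2), ("base station outage", 1)],
    handle=lambda s: s,
    verify=lambda r: r == "parameter misconfiguration",
    knowledge=knowledge_manager,
)
print(ok, [k["scenario"] for k in knowledge_manager])
```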
  • The intent sensor obtains the business intent information and, according to it, identifies the corresponding network quality expectations, such as network quality expectations in terms of delay guarantee requirements and bandwidth guarantee requirements. Then, the intent translator receives the business intent information from the intent sensor and translates it into the corresponding business issues, such as network equipment.
  • The intent translator obtains one or more fault diagnosis scenarios and their processing priorities through the similarity model of the knowledge manager, and sends the fault diagnosis scenarios and their processing priorities to the delimitation locator, which delimits and locates the fault diagnosis scenario and then determines whether the fault diagnosis scenario needs to be processed by the professional operation and maintenance system 400.
  • the delimitation locator sends the business intention information and fault diagnosis scenarios to the professional operation and maintenance system 400 through the fault forwarder, and uses the mature delimitation and locating capabilities of the professional operation and maintenance system 400 to perform fault analysis, fault diagnosis and fault self-healing.
  • the fault processing results of the professional operation and maintenance system 400 are evaluated through the intent verifier to evaluate whether the business fault has been repaired.
  • The evaluation result indicates that the network fault handling based on the business intent information is effective.
  • the evaluation results are updated in the knowledge manager.
  • a scenario of automatic assembly of terminals at a certain station in a workshop is taken as an example, and this scenario has requirements for network assurance.
  • the intent sensor will identify the network quality expectations corresponding to the business intent information based on the business intent information, such as network quality expectations in terms of delay guarantee requirements and bandwidth guarantee requirements.
  • The intent translator receives the delay intention information from the intent sensor and translates it into the corresponding business problems.
  • Step 1 Obtain the card number information of the faulty workstation from the business operation and maintenance system 300 through the physical location information of a certain workshop and the physical location information of a certain workstation, that is, obtain the first target network device according to the network attribute information;
  • Step 2 Query the dynamic registration information of the faulty station based on the card number information, obtain from the professional operation and maintenance system 400 the network device to which the faulty station's card is currently connected (i.e., the second target network device), and then obtain the longitude and latitude of the current network topology and the physical location topology information of the faulty station (i.e., static network topology information). By combining the longitude and latitude of the current network topology with the physical location topology information of the faulty station, candidate network device A and candidate network device B can be obtained, both of which are network devices that may have failed;
  • Step 3 Query the delay alarm information and delay performance data (that is, the current operation and maintenance data information) of candidate network device A related to the delay intention information, and input the delay alarm information, delay performance data and delay intention information into the similarity model for correlation analysis to obtain the possible network faults, assumed here to be equipment failure 1 and equipment failure 2; similarly, perform the same query and analysis on candidate network device B to obtain its possible network fault, namely equipment failure 3;
  • Step 4 If, in step 3, neither candidate network device A nor candidate network device B has delay alarm information and delay performance data corresponding to equipment failure 1, equipment failure 2 and equipment failure 3, that is, the current operation and maintenance data information of candidate network device A and of candidate network device B is normal, then the historical business data information and historical operation and maintenance data information (i.e., historical alarm data information and performance data information other than delay performance data) of candidate network device A and candidate network device B registered on the business operation and maintenance system 300, the current operation and maintenance data information (i.e., current alarm data information and performance data information) and the business intent information are input into the similarity model. Device fault C corresponding to candidate network device A and device fault D corresponding to candidate network device B are thereby determined, and monitoring indicator E, monitoring indicator F and monitoring indicator G are determined to be degraded performance indicators.
  • Cause 1, cause 2, cause 3, cause 4 and cause 5 are different fault diagnosis scenarios, and cause 1, cause 2 and cause 4 are the fault causes with the highest possibility (that is, the highest processing priority). At this point, the second stage of the intention translation process is completed.
  • Step 5 Call the fault analysis, fault diagnosis and fault self-healing capabilities of the professional operation and maintenance system 400 for each fault diagnosis scenario, that is, perform fault processing on equipment failure 1, equipment failure 2 and equipment failure 3 from step 3 in sequence according to the processing priority to obtain the fault processing result; or perform fault processing on cause 1, cause 2, cause 3, cause 4 and cause 5 from step 4 in sequence according to the processing priority to obtain the fault processing result.
  • Step 6 If the fault processing result of the professional operation and maintenance system 400 is that equipment failure 1 caused the business fault "the automatic assembly operation of a certain station terminal in a certain workshop is lagging", and equipment failure 1 has been repaired, the intent verifier is required to verify the business fault recovery status and manual feedback to confirm that the business has returned to normal. The evaluation results, business intent information and fault diagnosis scenarios corresponding to this business fault are then updated to the knowledge manager; if the fault diagnosis scenario has not been repaired or the business has not returned to normal, the next round of business fault diagnosis or further manual intervention is required.
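  • The ranking of possible faults in steps 3 to 5 can be illustrated with a small sketch. The application does not disclose how the similarity model of the knowledge manager works, so the Jaccard set similarity used here is a swapped-in stand-in chosen only for illustration; the symptom tags and the `rank_scenarios` helper are likewise hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_scenarios(observed: set[str],
                   known: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Order known fault scenarios by similarity to the observed symptoms."""
    scored = [(name, jaccard(observed, tags)) for name, tags in known.items()]
    # Scenarios that share no symptom with the observation are dropped.
    scored = [item for item in scored if item[1] > 0]
    # Higher similarity first; names break ties so the order is deterministic.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


observed = {"delay_alarm", "high_latency"}
known = {
    "equipment failure 1": {"delay_alarm", "high_latency"},
    "equipment failure 2": {"delay_alarm", "packet_loss"},
    "equipment failure 3": {"power_alarm"},
}
for name, score in rank_scenarios(observed, known):
    print(name, round(score, 2))
```

  • In this sketch the position in the returned list plays the role of the processing priority: the scenario handled first is the one whose known symptoms best match the current alarm and performance data.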
  • the fault is delimited and located within the business operation and maintenance system 300 first. If the fault cannot be recovered, it is transferred to the professional operation and maintenance system 400 for processing.
  • the service intention information is the delay intention information of the factory park, and the delay guarantee service failure of the AGV car caused by the failure of the third base station is taken as an example.
  • The target terminal, an AGV car, moves from the first moving position of the positioning coordinates to the second moving position of the positioning coordinates through the connected business network, i.e., the I5GC (Industry 5th-Generation Core, industry 5G core network).
  • The communication delay over the entire moving process of the AGV car is subject to relatively strict network guarantee requirements, for example, a delay of less than 10 milliseconds for 99.999% of the time within one month. During this process, the network coverage changes due to distance and other factors.
  • The AGV car performs at least one network handover to access a new network.
  • the AGV car when the AGV car moves to the first moving position of the positioning coordinates, it accesses through the first target network device, that is, the second base station.
  • When it moves to the second moving position of the positioning coordinates, it needs to switch networks.
  • the network can only be accessed through the weak coverage network equipment, that is, the fourth base station.
  • this scenario will not be able to guarantee the delay requirements. Therefore, the fault diagnosis scenario of the third base station needs to be handled through the following intent translation process (i.e., standardized processing of business intent information) to ensure delay requirements, namely:
  • Step 1 From the detected business monitoring fault (that is, the key perception indicator for delay in connection service monitoring has deteriorated), the intent sensor learns the business intent information of the business monitoring fault, namely that the delay of the AGV car moving from the first moving position of the positioning coordinates to the second moving position of the positioning coordinates exceeds 10 milliseconds;
  • Step 2 The intent sensor transmits the business intent information of the business monitoring failure to the intent translator;
  • Step 3 The intent translator learns the card number information of the target terminal AGV car from the business intent information, and learns from the card number information and the current network topology information (i.e., real-time network data) that the AGV car is currently registered on the second base station and the fourth base station;
  • Step 4 The intent translator obtains, through the knowledge manager, the current operation and maintenance data information of the second base station and of the fourth base station, which may be operation and maintenance data such as alarm data information and performance data information, to initially determine whether the second base station and the fourth base station are faulty. It is finally determined that the current operation and maintenance data information (such as alarm data information and performance data information) of each candidate network device is normal, that is, neither the second base station nor the fourth base station has failed, so there is no fault diagnosis scenario;
  • Step 5 Apply a similarity algorithm to the coordinate information of the AGV car, the coordinate information of the second base station and the coordinate information of the fourth base station to obtain the second target network device, that is, the third base station, which is the network device most likely to have caused the service fault;
  • Step 6 Determine the current operation and maintenance data information (such as alarm data information) of the third base station, and obtain the alarm diagnosis scenario of the out-of-service alarm of the third base station based on the current operation and maintenance data information;
  • Step 7 Send the alarm diagnosis scenario of the out-of-service alarm of the third base station to the professional operation and maintenance system, which performs delimitation and locating to ultimately ensure the fault recovery of the third base station, that is, to ensure that the AGV car at the second moving position can access the third base station with greater probability, ultimately ensuring the delay requirements.
  • The failed candidate network device is obtained, the business intent information is translated into the alarm fault diagnosis scenario of the base station out-of-service alarm, and the scenario is transferred to the professional operation and maintenance system for fault processing.
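  • Step 5 of this example does not specify the similarity algorithm applied to the coordinates. The sketch below assumes, purely for illustration, that it amounts to choosing the base station closest to the AGV car among those not yet examined, using Euclidean distance on planar coordinates; the station names, the coordinates and the `nearest_unexamined` helper are hypothetical.

```python
import math
from typing import Optional


def nearest_unexamined(agv_xy: tuple[float, float],
                       stations: dict[str, tuple[float, float]],
                       examined: set[str]) -> Optional[str]:
    """Return the closest base station that has not been examined yet."""
    remaining = {name: xy for name, xy in stations.items() if name not in examined}
    if not remaining:
        return None
    # Assumption: planar coordinates and plain Euclidean distance.
    return min(remaining, key=lambda name: math.dist(agv_xy, remaining[name]))


stations = {
    "second base station": (0.0, 0.0),
    "third base station": (10.0, 0.0),
    "fourth base station": (30.0, 0.0),
    "fifth base station": (80.0, 0.0),
}
# The second and fourth base stations were already checked and found normal.
suspect = nearest_unexamined(
    (12.0, 0.0), stations, {"second base station", "fourth base station"}
)
print(suspect)
```

  • With real longitude and latitude values a great-circle distance would be more appropriate than the planar distance assumed here.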
  • Example 2: In one embodiment, as shown in Figure 10, taking the AGV car movement scenario as an example, consider the case where the network device, the third base station, has not failed, but when the AGV car applies to the third base station for air interface resources, the shortage of air interface resources means that resources are not allocated in time, resulting in increased latency; or the AGV car registers with the fourth base station, which has weak coverage, resulting in increased latency.
  • The candidate network devices associated with the connection service are the second base station, the third base station and the fourth base station. None of the second base station, the third base station and the fourth base station is faulty, but the fifth base station may be faulty.
  • correlation analysis shows that the physical location of the fifth base station is not within the connection service, that is, the failure of the fifth base station will not affect the connection service.
  • The second stage of intent translation combines the historical business data information and current operation and maintenance data information in the knowledge manager to analyze each candidate network device in turn. For example, analysis of the probability that the AGV car registers to the third base station and to the fourth base station shows that the number of times the AGV car registers to the fourth base station has increased; for another example, the number of other terminals registered on the third base station has increased significantly, resulting in resource constraints. From this, it can be judged that the candidate network device, the third base station, is the most likely cause of the delay. Further, by combining the current performance data information of the third base station, the performance diagnosis scenario of the third base station is obtained.
  • the diagnosis result of this scenario is the delay caused by incorrect configuration parameters and insufficient air interface resources.
  • The solution is to dynamically adjust the configuration parameters or to expand the capacity of the candidate network device, the third base station, to ensure the delay requirements of this service.
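  • The registration analysis in Example 2 can be sketched as a comparison between historical and current registration shares. This is an assumption-laden illustration rather than the disclosed method: the counts, the 0.2 threshold and the helper names are invented for the example.

```python
def registration_share(counts: dict[str, int]) -> dict[str, float]:
    """Convert registration counts per base station into shares of the total."""
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


def shifted_stations(historical: dict[str, int],
                     current: dict[str, int],
                     threshold: float = 0.2) -> list[str]:
    """List stations whose registration share grew by more than the threshold."""
    before = registration_share(historical)
    now = registration_share(current)
    return sorted(
        name for name in now if now[name] - before.get(name, 0.0) > threshold
    )


# Hypothetical counts of AGV registrations per base station.
historical = {"third base station": 90, "fourth base station": 10}
current = {"third base station": 50, "fourth base station": 50}
print(shifted_stations(historical, current))
```

  • A growing share of registrations on the weak-coverage fourth base station is one signal that the third base station is short of air interface resources; the same comparison could be applied to the number of other terminals registered on the third base station.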
  • the above three examples are network fault location and fault rectification from a business perspective.
  • the two-stage intent translation design can be applied to network fault delineation and location in different industries and various campus scenarios.
  • This fault handling device can realize automated and intelligent processing of overall network faults, thereby reducing reliance on manual processing and improving both fault handling efficiency and system robustness.
  • the fault handling method includes the entire diagnosis process of ToB network faults, that is, from business fault intent identification, intent translation, delimitation and positioning (including transferring to a professional operation and maintenance system for fault delimitation and positioning) to business fault diagnosis remediation, and the entire process of intent verification, which improves user network perception and overall satisfaction.
  • This fault handling method can be applied to enterprise campuses, that is, "digital campuses" or "smart mines" supported by communications, computing power and the like, which is conducive to rapid recovery from business and operation and maintenance faults in enterprise campuses; in addition, this fault handling method can also be applied to operator networks to facilitate rapid delimitation and location of operation and maintenance faults, which is not specifically limited here.
  • The fault handling device 200 includes a memory 202, a processor 201, and a computer program stored in the memory 202 and executable on the processor 201.
  • the processor 201 and the memory 202 may be connected through a bus or other means.
  • the memory 202 can be used to store non-transitory software programs and non-transitory computer executable programs.
  • the memory 202 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
  • the memory 202 may include memory located remotely relative to the processor 201, and these remote memories may be connected to the processor 201 through a network. Examples of the above-mentioned networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • The fault handling device 200 in this embodiment can be, for example, the fault handling device in the embodiment shown in Figure 1. These embodiments all belong to the same inventive concept and therefore have the same implementation principles and technical effects, which will not be detailed here.
  • the non-transitory software programs and instructions required to implement the fault handling method of the above embodiment are stored in the memory 202.
  • When they are executed by the processor 201, the fault handling method in the above embodiment is executed, for example, method steps S110 to S130 in Figure 2, method steps S210 to S230 in Figure 3, method steps S310 to S330 in Figure 4, method steps S410 to S420 in Figure 5 and method steps S510 to S520 in Figure 6 described above.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separate, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • One embodiment of the present application also provides a computer-readable storage medium that stores computer-executable instructions. The computer-executable instructions are executed by a processor or controller, for example, to perform method steps S110 to S130 in Figure 2, method steps S210 to S230 in Figure 3, method steps S310 to S330 in Figure 4, method steps S410 to S420 in Figure 5 and method steps S510 to S520 in Figure 6 described above.
  • an embodiment of the present application also provides a computer program product, including a computer program or computer instructions.
  • the computer program or computer instructions are stored in a computer-readable storage medium.
  • The processor of the computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the fault handling method in the above embodiment, for example, method steps S110 to S130 in Figure 2, method steps S210 to S230 in Figure 3, method steps S310 to S330 in Figure 4, method steps S410 to S420 in Figure 5 and method steps S510 to S520 in Figure 6 described above.
  • Embodiments of this application include: obtaining business intent information; performing standardized processing on the business intent information to obtain the fault diagnosis scenario of the target network device and the processing priority of the fault diagnosis scenario; and performing fault processing on the fault diagnosis scenario according to the processing priority. That is to say, through the standardized processing of business intent information, the target network device and its fault diagnosis scenario can be accurately located, and fault processing is performed on the fault diagnosis scenario according to the processing priority. Therefore, the embodiments of the present application can accurately locate faults and improve fault handling efficiency.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)
  • Facsimiles In General (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application provides a fault processing method and apparatus, and a storage medium. The method comprises: acquiring service intention information (S110); performing standardization processing on the service intention information to obtain a fault diagnosis scenario of a target network device and a processing priority of the fault diagnosis scenario (S120); and performing fault processing on the fault diagnosis scenario according to the processing priority (S130).

Description

Fault processing method and apparatus, and storage medium
Cross-reference to related applications
This application is filed based on Chinese patent application No. 202210994855.5, filed on August 18, 2022, and claims priority to that Chinese patent application, the entire content of which is hereby incorporated into this application by reference.
Technical field
The embodiments of the present application relate to, but are not limited to, the field of communication technology, and in particular to a fault handling method and apparatus, and a storage medium.
Background
In some cases, the business operation and maintenance system and the professional operation and maintenance system are deployed separately, where the professional operation and maintenance system can analyze, diagnose and repair network faults. When business operation and maintenance personnel discover a business problem, they need to feed back the business intent of the business problem to the professional operation and maintenance system, which handles the fault according to the business intent and then confirms with the business operation and maintenance system whether the business problem has been solved. Consequently, if the business operation and maintenance personnel in the business operation and maintenance system cannot accurately describe the business problem, fault location may have to be performed multiple times, which reduces fault handling efficiency.
Summary
Embodiments of the present application provide a fault processing method and apparatus, and a storage medium.
In a first aspect, an embodiment of the present application provides a fault processing method, including: acquiring business intent information; performing standardization processing on the business intent information to obtain a fault diagnosis scenario and a processing priority of the fault diagnosis scenario; and performing fault processing on the fault diagnosis scenario according to the processing priority.
In a second aspect, an embodiment of the present application further provides a fault processing apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the fault processing method described above when executing the computer program.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to perform the fault processing method described above.
In a fourth aspect, an embodiment of the present application further provides a computer program product, including a computer program or computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer program or the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the fault processing method described above.
Brief description of the drawings
Figure 1 is a schematic diagram of a fault processing apparatus for performing a fault processing method according to an embodiment of the present application;
Figure 2 is a flowchart of a fault processing method according to an embodiment of the present application;
Figure 3 is a flowchart of one implementation of step S120 in Figure 2;
Figure 4 is a flowchart of one implementation of step S220 in Figure 3;
Figure 5 is a flowchart of a fault processing method according to another embodiment of the present application;
Figure 6 is a flowchart of a fault processing method according to another embodiment of the present application;
Figure 7 is a schematic diagram of an application scenario in which a fault processing apparatus according to an embodiment of the present application is deployed in a business O&M system;
Figure 8 is a flowchart of a fault processing method according to another embodiment of the present application;
Figure 9 is a schematic diagram of an application scenario in which a fault processing apparatus for performing a fault processing method according to an embodiment of the present application is deployed in a business O&M system;
Figure 10 is a schematic diagram of a movement scenario of an automated guided vehicle (AGV) according to an embodiment of the present application;
Figure 11 is a schematic diagram of a fault processing apparatus according to another embodiment of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described here are only intended to explain the present application and not to limit it.
It should be noted that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order. In the description, the claims, and the drawings, "a plurality of" (or "multiple") means two or more; "greater than", "less than", "exceeding", and the like are understood to exclude the stated number; and "above", "below", "within", and the like are understood to include it. Terms such as "first" and "second" are used only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their order.
The embodiments of the present application include: acquiring business intent information; performing standardization processing on the business intent information to obtain a fault diagnosis scenario of a target network device and a processing priority of the fault diagnosis scenario; and performing fault processing on the fault diagnosis scenario according to the processing priority. In other words, standardization of the business intent information allows the target network device and its fault diagnosis scenario to be located precisely, and handling the fault diagnosis scenarios according to their processing priorities improves fault processing efficiency. The embodiments of the present application can therefore locate faults precisely while improving fault processing efficiency.
As shown in Figure 1, Figure 1 is a schematic diagram of a fault processing apparatus for performing a fault processing method according to an embodiment of the present application. In the embodiment of Figure 1, the fault processing apparatus includes at least an intent sensor 110, an intent translator 120, a knowledge manager 130, a delimitation locator 150, a fault forwarder 140, and an intent verifier 160. The intent sensor 110, the intent translator 120, the knowledge manager 130, the intent verifier 160, the delimitation locator 150, and the fault forwarder 140 are communicatively connected in sequence, and the intent translator 120 is also communicatively connected to the delimitation locator 150.
The intent sensor 110 can obtain a user's requirements on service perception indicators and, from business intent input provided by voice, text, or similar means, convert business intent information such as those requirements into processing statements, for example a delay guarantee requirement or a bandwidth guarantee requirement. Taking the automatic assembly service of a terminal at a workstation in a workshop as an example, the intent sensor 110 can convert the business intent information of that service into processing statements such as the service delay of the workstation, the network jitter at the workstation, and a terminal fault at the workstation.
In addition, the intent sensor 110 can obtain service monitoring faults in real time and automatically perceive the business intent information of those faults. It can be understood that, in a campus scenario, the business intents that users may express in business statements can be enumerated; no specific limitation is imposed here.
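Because the campus business intents are described as enumerable, the conversion from an intent to processing statements can be pictured as a table lookup. The following is a minimal sketch under that assumption only; the table `INTENT_STATEMENTS`, the function `to_processing_statements`, and the location string are hypothetical names introduced for illustration and do not appear in the application.

```python
from typing import Dict, List

# Hypothetical enumeration of campus business intents. The service name and
# the statement templates mirror the automatic-assembly example in the text;
# the table itself is an illustrative assumption.
INTENT_STATEMENTS: Dict[str, List[str]] = {
    "automatic assembly": ["service delay", "network jitter", "terminal fault"],
}


def to_processing_statements(location: str, service: str) -> List[str]:
    """Expand one (location, service) intent into processing statements."""
    templates = INTENT_STATEMENTS.get(service, [])
    return [f"{location} {template}" for template in templates]


statements = to_processing_statements("workshop A workstation 3", "automatic assembly")
```

An intent that is not in the table yields an empty list, which a real system would presumably have to route to manual handling.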
The intent translator 120 is one of the core components of the fault processing apparatus. It can receive business intent information from the intent sensor 110 and perform standardization processing on it to obtain fault diagnosis scenarios and their processing priorities. For example, assume that the target terminal is a campus terminal on which a service fault has occurred. Its business intent information is input into the intent translator 120, which performs standardized statement processing on it to obtain the network attribute information of the campus terminal and derives a first target network device from that information. The intent translator 120 then obtains the current network topology information and the static network topology information of the campus terminal and performs association analysis on the two to obtain multiple second target network devices associated with the first target network device. In addition, the intent translator 120 can dynamically query the current O&M data information in real time (for example, the network characteristic data, alarm data, and performance data of the related devices), from which the fault diagnosis scenario corresponding to the service fault (such as a device fault) can be obtained directly. Alternatively, the current O&M data information can be input into the similarity model provided by the knowledge manager 130 and translated into multiple fault diagnosis scenarios and their processing priorities, where the fault causes of the scenarios may differ. It can be understood that when a fault diagnosis scenario is a professional O&M fault diagnosis scenario, the business intent information and the fault diagnosis scenario can be sent to the professional O&M system.
The knowledge manager 130 can be used to store and provide the knowledge and experience required by the fault processing apparatus. Examples include: unified processing statements for expressing business intent (that is, the processing statements used for standardized statement processing of business intent information); historical O&M data information; historical business data information; result information obtained by converting business intent information (such as network attribute information); processing priorities of fault diagnosis scenarios; and information related to business intent information such as typical faults and indicator degradation. For instance, it may hold the bandwidth fault information, the corresponding network devices, and the fault diagnosis scenario information related to the business intent information of video surveillance service assurance in a certain area, or the delay fault information strongly associated with the business intent information of automatic assembly service assurance at a workstation in a workshop. The knowledge manager 130 can also store the different fault diagnosis scenarios of the professional O&M system (for example, scenarios for the possible causes of a device fault or indicator degradation) as well as fault diagnosis instructions. In addition, the business O&M system (or the professional O&M system) can inject its historical business data information and historical O&M data information into the knowledge manager 130, which trains on that data to obtain a similarity model.
It is worth noting that the construction and interfaces of the database, knowledge base, or O&M knowledge graph in the knowledge manager 130 are very important in an intelligent fault diagnosis system.
The delimitation locator 150 can be used to provide automated diagnosis for one or more fault diagnosis scenarios output by the intent translator 120 (for example, a network device fault or the degradation of a network performance indicator). The delimitation locator 150 obtains, from the knowledge manager 130, the fault diagnosis instructions for such scenarios and performs fault processing on the fault diagnosis scenarios from the intent translator 120 according to those instructions. Take the case where the fault processing apparatus is deployed in the business O&M system as an example. When the fault diagnosis scenario is a business O&M fault diagnosis scenario, that is, the fault can be delimited and located within the business O&M system, the delimitation locator 150 obtains the fault diagnosis instruction from the knowledge manager 130 and performs fault processing on the scenario accordingly. When the fault diagnosis scenario is a professional O&M fault diagnosis scenario, that is, the fault can be delimited and located within the professional O&M system, the fault forwarder 140 is invoked to forward the business intent information and the fault diagnosis scenario to the professional O&M system, which then performs fault processing on the scenario.
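The local-versus-forwarded decision described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: `Domain`, `Scenario`, `dispatch`, and the two callbacks are hypothetical names, and the application does not specify how a scenario is tagged with the system that can delimit it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Domain(Enum):
    BUSINESS = "business O&M"
    PROFESSIONAL = "professional O&M"


@dataclass(frozen=True)
class Scenario:
    name: str
    domain: Domain  # assumed tag: the system in which the fault can be delimited


def dispatch(
    scenario: Scenario,
    local_domain: Domain,
    diagnose: Callable[[Scenario], str],
    forward: Callable[[Scenario], str],
) -> str:
    """Diagnose locally if the scenario belongs to this system, else forward it."""
    if scenario.domain is local_domain:
        return diagnose(scenario)  # role of the delimitation locator
    return forward(scenario)       # role of the fault forwarder


local = dispatch(
    Scenario("terminal configuration check", Domain.BUSINESS),
    Domain.BUSINESS,
    diagnose=lambda s: f"diagnosed locally: {s.name}",
    forward=lambda s: f"forwarded: {s.name}",
)
remote = dispatch(
    Scenario("base station out of service", Domain.PROFESSIONAL),
    Domain.BUSINESS,
    diagnose=lambda s: f"diagnosed locally: {s.name}",
    forward=lambda s: f"forwarded: {s.name}",
)
```

Passing the two handlers as callbacks keeps the sketch independent of how either O&M system is actually reached.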
It is worth noting that, whether it belongs to the business O&M system or to the professional O&M system, the delimitation locator 150 locates faults through complex diagnostic processes such as fault diagnosis trees, repeated diagnosis, multi-step diagnosis, and deferred diagnosis.
The fault forwarder 140 is used to hand over the business intent information and the fault diagnosis scenario to the professional O&M system or the business O&M system, so that the receiving system can perform fault processing on the fault diagnosis scenario (that is, analysis, fault diagnosis, and fault self-healing), and to receive the fault processing result for the scenario from that system. For example, when the fault forwarder 140 is deployed in the business O&M system and the fault diagnosis scenario is a professional O&M fault diagnosis scenario, the fault forwarder 140 hands over the business intent information and the fault diagnosis scenario to the professional O&M system and then receives the fault processing result that the professional O&M system obtains from them. Conversely, when the fault forwarder 140 is deployed in the professional O&M system and the fault diagnosis scenario is a professional O&M fault diagnosis scenario, the fault forwarder 140 can hand over the business intent information and the fault diagnosis scenario to the business O&M system and then receives the fault processing result that the business O&M system obtains from them. No specific limitation is imposed here.
The intent verifier 160 is used to evaluate the fault processing result of the delimitation locator 150, that is, to evaluate the fault processing result of the fault diagnosis scenario and obtain an evaluation result. The intent verifier 160 then writes the evaluation result back to the knowledge manager 130 so that the correctness of intent translation is continuously optimized and improved.
In an embodiment, when the network state, the service, and the user perception have all recovered, the evaluation result is that the fault processing was effective, and the current fault processing flow can be closed. When the network state or the service state cannot be restored, the evaluation result indicates that the fault diagnosis scenario has not been repaired; the fault then needs to be handed over for manual handling, or the fault processing apparatus re-acquires the business intent information and performs standardization processing on it again.
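The closed-loop check above can be sketched as two small functions. This is a hedged illustration, not the application's implementation: the function names, the returned strings, and in particular the `max_attempts` retry limit that chooses between re-acquiring the intent and manual handling are assumptions; the application only says that either follow-up is possible. Treating unrecovered user perception as "not repaired" is likewise an assumption.

```python
def evaluate(network_recovered: bool, service_recovered: bool,
             perception_recovered: bool) -> str:
    """Return 'effective' only when all three conditions have recovered."""
    if network_recovered and service_recovered and perception_recovered:
        return "effective"
    return "not_repaired"


def next_action(evaluation: str, attempts: int, max_attempts: int = 2) -> str:
    """Choose the follow-up step; the retry limit is an illustrative assumption."""
    if evaluation == "effective":
        return "close_loop"
    if attempts < max_attempts:
        return "reacquire_intent"   # re-run standardization processing
    return "manual_handling"


first = next_action(evaluate(True, True, True), attempts=1)
second = next_action(evaluate(True, False, True), attempts=1)
third = next_action(evaluate(False, False, False), attempts=2)
```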
It can be understood that the fault processing apparatus can be deployed in the business O&M system, in the professional O&M system, or in both; no specific limitation is imposed here.
The fault processing apparatus and the application scenarios described in the embodiments of the present application are intended to explain the technical solutions of the embodiments more clearly and do not limit those technical solutions. Those skilled in the art will appreciate that, as fault processing apparatuses, business O&M systems, and professional O&M systems evolve and new application scenarios emerge, the technical solutions provided by the embodiments of the present application apply equally to similar technical problems.
Those skilled in the art will understand that the fault processing apparatus shown in Figure 1 does not limit the embodiments of the present application; it may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
Based on the fault processing apparatus described above, the embodiments of the present application are further described below with reference to the drawings.
Referring to Figure 2, Figure 2 is a flowchart of a fault processing method according to an embodiment of the present application. The method can be applied to a fault processing apparatus, for example the one shown in Figure 1, and the apparatus can be deployed in the business O&M system. The method may include, but is not limited to, steps S110, S120, and S130.
Step S110: acquire business intent information.
In a feasible implementation, the business intent information may be delay intent information or other business intent information, which is not enumerated here.
Step S120: perform standardization processing on the business intent information to obtain a fault diagnosis scenario and a processing priority of the fault diagnosis scenario.
In a feasible implementation, there may be multiple fault diagnosis scenarios, and they may differ from one another; the fault diagnosis scenarios may include alarm diagnosis scenarios and performance diagnosis scenarios.
Step S130: perform fault processing on the fault diagnosis scenario according to the processing priority.
In this embodiment, with the fault processing method comprising steps S110 to S130, business intent information is first acquired, standardization processing is then performed on it to obtain fault diagnosis scenarios and their processing priorities, and fault processing is finally performed on the scenarios according to those priorities. In other words, standardization of the business intent information allows the target network device and its fault diagnosis scenarios to be located precisely, and handling the scenarios in priority order improves fault processing efficiency. The embodiments of the present application can therefore locate faults precisely while improving fault processing efficiency.
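The three steps can be read as a short pipeline. The sketch below is illustrative only: `process_fault`, the three callbacks, and the convention that a smaller priority number is handled first are assumptions introduced here, since the application does not fix a data representation.

```python
from typing import Callable, List, Tuple

# (scenario name, processing priority); smaller number = handled first (assumed)
PrioritizedScenario = Tuple[str, int]


def process_fault(
    acquire_intent: Callable[[], str],
    standardize: Callable[[str], List[PrioritizedScenario]],
    handle: Callable[[str], str],
) -> List[str]:
    """Run S110 -> S120 -> S130 and return the results in handling order."""
    intent = acquire_intent()                      # S110: acquire business intent
    scenarios = standardize(intent)                # S120: scenarios + priorities
    ordered = sorted(scenarios, key=lambda s: s[1])
    return [handle(name) for name, _ in ordered]   # S130: handle by priority


results = process_fault(
    acquire_intent=lambda: "workstation 3 service delay",
    standardize=lambda intent: [
        ("device parameter configuration check", 1),
        ("base station out-of-service alarm", 0),
    ],
    handle=lambda name: f"handled: {name}",
)
```

Here the alarm scenario is handled before the configuration check because it carries the smaller priority number.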
It is worth noting that, in the related art, O&M operators or users can express the desired business intent to an intent recognition model, which then performs fault analysis, fault diagnosis, and fault self-healing for network faults. However, an intent recognition model usually depends on network data such as the topology of the entire network and network configuration tasks or policies, so executing and verifying the business intent typically incurs additional network and resource overhead. Moreover, because of resource limits and network characteristics, the business O&M system does not hold complete network data, and in actual operation the enterprise network topology changes frequently. As a result, the business O&M system cannot handle an enterprise's service faults quickly and automatically.
In addition, because the 5G communication network provides the core capability of a basic information channel in ToB applications, enterprises are more concerned with how to carry out their production activities on top of the 5G network, while enterprise O&M personnel cannot handle specific network faults professionally. How to manage and maintain network faults from a business perspective is therefore a problem that urgently needs to be solved. Here, ToB refers to a business model in which the enterprise is the service subject and platforms, products, or services are provided to enterprise customers; ToB is therefore also called enterprise service.
Based on the above analysis, in an embodiment, as shown in Figure 3, step S120 is described further. Step S120 may include, but is not limited to, steps S210, S220, and S230.
Step S210: perform standardized statement processing on the business intent information to obtain network attribute information of the target terminal.
In a feasible implementation, the network attribute information may include physical location information, number card (SIM card) information, and the like; no specific limitation is imposed here.
In a feasible implementation, each target terminal (for example, a campus terminal) is a device in the campus with a built-in number card, such as a workstation, a gantry crane, an industrial camera, or an automated guided vehicle (AGV). All target terminals share one feature: they register with the 5G private network through the number card, and the network automatically controls them to perform automated operations or automated monitoring. No specific limitation is imposed here.
Step S220: obtain the static network topology information and the current network topology information of the target terminal, and obtain multiple candidate network devices according to the network attribute information, the static network topology information, and the current network topology information.
In a feasible implementation, the static network topology information includes a static physical network topology and a static logical network topology. The static physical network topology includes the longitude and latitude of the static network topology, and the static logical network topology includes the logical links between the network devices related to the target terminal. Similarly, the current network topology information includes a current physical network topology and a current logical network topology. The current physical network topology includes physical location topology information and the longitude and latitude of the current network topology, and the current logical network topology includes the logical links between the network devices related to the target terminal. No specific limitation is imposed here.
In a feasible implementation, a candidate network device may be customer premises equipment (CPE) required to form a 5G (fifth-generation mobile communication system) private network, or a communication device such as a base station; no specific limitation is imposed here.
Step S230: determine the current O&M data information of each candidate network device, and obtain the fault diagnosis scenarios and their processing priorities according to the current O&M data information.
In a feasible implementation, the current O&M data information includes alarm data information, performance data information, and other O&M data information. The alarm data information may be delay alarm information, out-of-service alarm information, or other alarm information, and the performance data information may be delay performance data and the like, which are not enumerated here. For example, assume that the candidate network device is a base station. When the base station has out-of-service alarm information, the alarm diagnosis scenario for that out-of-service alarm is obtained from the alarm information, and this alarm diagnosis scenario has the highest processing priority. As another example, when the end-to-end user perception indicator data of the internal network of the candidate network device has degraded, that is, the performance data information indicates such degradation, a performance diagnosis scenario, namely a device parameter configuration check, is obtained from the performance data information. This embodiment imposes no specific limitation here.
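The two examples in step S230 amount to a rule-based mapping from current O&M data to prioritized scenarios. A minimal sketch follows, assuming hypothetical names (`scenarios_from_om_data`, the `"out_of_service"` alarm key) and assuming that priority 0 denotes the highest priority; the application states only that the out-of-service alarm scenario ranks highest and does not assign a value to the performance scenario.

```python
from typing import List, Set, Tuple


def scenarios_from_om_data(
    device: str, alarms: Set[str], kpi_degraded: bool
) -> List[Tuple[str, int]]:
    """Map a device's current O&M data to (scenario, priority) pairs.

    Priority 0 is treated as the highest; the value 1 for the performance
    scenario is an illustrative assumption.
    """
    scenarios: List[Tuple[str, int]] = []
    if "out_of_service" in alarms:
        # Alarm diagnosis scenario: highest processing priority per the text.
        scenarios.append((f"{device}: out-of-service alarm diagnosis", 0))
    if kpi_degraded:
        # Performance diagnosis scenario: device parameter configuration check.
        scenarios.append((f"{device}: device parameter configuration check", 1))
    return scenarios


found = scenarios_from_om_data("base station 1", {"out_of_service"}, kpi_degraded=True)
```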
In one embodiment, the current operation and maintenance data information and the business intent information may be input into a similarity model to obtain the fault diagnosis scenarios and the processing priorities of the fault diagnosis scenarios. The similarity model may be obtained by acquiring historical operation and maintenance data information and historical business data information and then training on them.
In one embodiment, similarity calculation may be performed on the current operation and maintenance data information and the business intent information to obtain the fault diagnosis scenarios and the processing priorities of the fault diagnosis scenarios. Similarity algorithms such as Euclidean distance, cosine similarity, Manhattan distance, or Chebyshev distance may be used for this calculation, and no specific limitation is imposed here.
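The four measures named above can be illustrated with a minimal sketch. It assumes that the operation and maintenance data and the business intent have already been encoded as numeric feature vectors of equal length; the description does not specify such an encoding, so the vector form and the function names are illustrative assumptions only.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Chebyshev (L-infinity) distance: the largest per-feature gap."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an all-zero vector as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)
```

Note that the three distances shrink as similarity grows, while cosine similarity grows with it, so a ranking built on a distance would sort in ascending order.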
It can be understood that inputting the current operation and maintenance data information and the business intent information into the similarity model to obtain the fault diagnosis scenarios and their processing priorities means that the similarity model performs similarity calculation on the current operation and maintenance data information and the business intent information. A fault diagnosis scenario is therefore the diagnosis scenario corresponding to the current operation and maintenance data information that is correlated with the business intent information, and the processing priority of a fault diagnosis scenario is the correlation ranking between the current operation and maintenance data information corresponding to that scenario and the business intent information. The correlation ranking orders correlation from high to low and may be determined according to the actual fault diagnosis scenario; no specific limitation is imposed here.
As an example, assume that a large amount of alarm data information of many kinds exists in the candidate network device. All of this alarm data information and the business intent information may be input into the similarity model, that is, the similarity model performs similarity calculation on the alarm data information and the business intent information, to obtain multiple alarm diagnosis scenarios and the processing priority of each alarm diagnosis scenario, namely multiple alarm causes and the ranking of each alarm cause by its correlation with the business intent information. Fault processing may then be performed on the alarm causes one by one according to the correlation ranking (that is, the processing priority), or only some of the alarm causes may be selected for fault processing according to the correlation ranking; no specific limitation is imposed here.
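The ranking described in this example can be sketched as follows. The sketch assumes that each alarm cause and the business intent are represented as numeric vectors and that cosine similarity stands in for the similarity model; the names `rank_scenarios` and `top_k` and the sample vectors are hypothetical and are not taken from the description.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_scenarios(intent_vec, scenario_vecs, top_k=None):
    """Order diagnosis scenarios by correlation with the intent, high to low.

    scenario_vecs maps a scenario name to its (assumed) feature vector.
    top_k optionally keeps only the most relevant scenarios, mirroring the
    option of processing just a subset of the alarm causes.
    """
    scored = [
        (name, cosine_similarity(intent_vec, vec))
        for name, vec in scenario_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Hypothetical vectors: first feature = "delay related", second = "power related".
intent = [1.0, 0.0]
scenarios = {
    "power_alarm": [0.0, 1.0],
    "latency_config": [0.9, 0.1],
    "device_fault": [0.5, 0.5],
}
ranked = rank_scenarios(intent, scenarios)
ordered_names = [name for name, _ in ranked]
```

The position of a scenario in `ranked` plays the role of its processing priority: the first entry is handled first.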
As an example, assume that the business intent information is a delay intent and that the current operation and maintenance data information includes alarm data information and performance data information, where the alarm data information is end-to-end device fault information and the performance data information is delay performance data. From the correlation ranking between the current operation and maintenance data information and the business intent information, the performance diagnosis scenario corresponding to the delay performance data has the highest processing priority, and the performance diagnosis scenario corresponding to the end-to-end device fault information has the second highest. Fault processing may therefore be performed, in order of processing priority, first on the performance diagnosis scenario corresponding to the delay performance data and then on the performance diagnosis scenario corresponding to the end-to-end device fault information; no specific limitation is imposed here.
In another embodiment, when the current operation and maintenance data information (such as alarm data information and performance data information) of each candidate network device is normal, the historical business data information of each candidate network device may be acquired, and correlation analysis may be performed on the historical business data information and the current operation and maintenance data information to obtain the fault diagnosis scenarios and the processing priorities of the fault diagnosis scenarios. The correlation analysis may be carried out by calculating the similarity between the historical business data information and the current operation and maintenance data information. For example, assuming that the alarm data information and the performance data information of every candidate network device are normal, the intent translator may acquire the historical business data information of each candidate network device, perform correlation analysis on it and the current operation and maintenance data information, and determine whether an increase in other business traffic has preempted network resources and caused the network performance indicators of this business to fall short of the target. In this way, the possible causes of the degradation of the key indicators of the candidate network devices (that is, the performance diagnosis scenarios) and the ranking of those possible causes by correlation with the business data information (that is, the processing priorities of the performance diagnosis scenarios) can be obtained, and fault processing is then performed on the possible causes in the order of that ranking. No specific limitation is imposed here.
In this embodiment, with the fault processing method including steps S210 to S230 above, standardized statement processing is first performed on the business intent information to obtain the network attribute information of the target terminal. The static network topology information and the current network topology information of the target terminal are then acquired, and multiple candidate network devices are obtained from the network attribute information, the static network topology information, and the current network topology information. Finally, the current operation and maintenance data information of each candidate network device is determined, and the fault diagnosis scenarios and their processing priorities are obtained from the current operation and maintenance data information. In other words, in an enterprise's business operation and maintenance system, even if the enterprise network topology changes, the current network topology information can be acquired in real time, multiple candidate network devices can be determined from the network attribute information, the static network topology information, and the current network topology information, and fault diagnosis can be performed on each candidate network device so that fault processing can be performed on the fault diagnosis scenarios in subsequent steps. The embodiments of the present application can therefore, from a business perspective, process an enterprise's business faults quickly and automatically even when the network topology changes.
In one embodiment, as shown in FIG. 4, step S220 is further described. Step S220 may include, but is not limited to, steps S310, S320, and S330.
Step S310: Obtain a first target network device according to the network attribute information.
In a feasible implementation, the first target network device is the network device with which the target terminal is registered. There may be more than one first target network device; no specific limitation is imposed here.
Step S320: Perform correlation analysis on the static network topology information and the current network topology information to obtain multiple second target network devices associated with the first target network device.
In a feasible implementation, a second target network device may be a network device with which the first target network device is registered, a network device to which the first target network device is connected, or a network device that has some other association with the first target network device; no specific limitation is imposed here.
In a feasible implementation, the static network topology information may be physical location topology information and the like, which are not listed one by one here.
Step S330: Use the first target network device and the multiple second target network devices as the candidate network devices.
In a feasible implementation, the first target network device, the second target network devices, and the candidate network devices may each be a CPE (Customer Premise Equipment) required to form a 5G (5th Generation mobile communication system) private network, or a communication device such as a base station; no specific limitation is imposed here.
In this embodiment, with the fault processing method including steps S310 to S330 above, the first target network device is obtained according to the network attribute information, correlation analysis is then performed on the static network topology information and the current network topology information to obtain multiple second target network devices associated with the first target network device, and the first target network device and the second target network devices are used as the candidate network devices, so that the business fault can be accurately delimited in subsequent steps.
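Steps S310 to S330 can be illustrated with a short sketch. The description does not define how the correlation analysis of the two topologies is performed, so the sketch makes a simplifying assumption: each topology is an adjacency mapping, and any device linked to a first target network device in either the static or the current topology counts as associated. All names and sample identifiers are hypothetical.

```python
def candidate_devices(terminal, registrations, static_links, current_links):
    """Return the sorted candidate network devices for a terminal.

    registrations: terminal id -> devices it is registered with (S310).
    static_links / current_links: device id -> linked devices in the
    static and the current topology (S320, simplified to a union).
    """
    # S310: first target network devices, taken from the registration data.
    first = set(registrations.get(terminal, ()))

    # S320: second target network devices, i.e. neighbours of a first
    # target device in either topology view.
    second = set()
    for links in (static_links, current_links):
        for device in first:
            second.update(links.get(device, ()))
    second -= first

    # S330: both groups together form the candidate network devices.
    return sorted(first | second)


registrations = {"agv-1": ["cpe-1"]}
static_links = {"cpe-1": ["gnb-2"]}    # e.g. planned, location-based attachment
current_links = {"cpe-1": ["gnb-4"]}   # e.g. attachment observed right now
candidates = candidate_devices("agv-1", registrations, static_links, current_links)
```

Consulting both views is what keeps the candidate set meaningful after a topology change: a device that appears only in the current topology is still picked up.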
In one embodiment, when the fault diagnosis scenario is a business operation and maintenance fault diagnosis scenario, a fault diagnosis instruction is acquired, and fault processing is performed on the fault diagnosis scenario according to the fault diagnosis instruction and the processing priority. That is, when the fault diagnosis scenario is a business operation and maintenance fault diagnosis scenario, meaning that it can be delimited and located within the business operation and maintenance system, the delimitation locator may obtain the fault diagnosis instruction from the knowledge manager, and the business operation and maintenance system receives the fault diagnosis instruction and performs fault processing on the fault diagnosis scenario according to the fault diagnosis instruction and the processing priority. The embodiments of the present application do not specifically limit this.
It can be understood that a business operation and maintenance fault diagnosis scenario is a fault diagnosis scenario that the business operation and maintenance system is able to diagnose. For example, for end-to-end delay performance data, the corresponding performance diagnosis scenario is delay caused by misconfigured connection service parameters, so the business operation and maintenance system only needs to modify the parameters to complete fault processing for this fault diagnosis scenario, without calling the professional operation and maintenance system.
In another embodiment, as shown in FIG. 5, the fault processing method may further include, but is not limited to, steps S410 and S420.
Step S410: When the fault diagnosis scenario is a professional operation and maintenance fault diagnosis scenario, send the business intent information and the fault diagnosis scenario to the professional operation and maintenance system.
It can be understood that a professional operation and maintenance fault diagnosis scenario is a fault diagnosis scenario that must be handed over to the professional operation and maintenance system for delimitation and location, such as a fault that the business operation and maintenance system could not locate or a fault that requires processing by the professional operation and maintenance system. For example, when the current operation and maintenance data information is base station out-of-service alarm information, the fault diagnosis scenario corresponding to that alarm information is a professional operation and maintenance fault diagnosis scenario. The fault forwarder is therefore needed to hand the business intent information and the fault diagnosis scenario over to the professional operation and maintenance system, whose mature delimitation and location capabilities are used for fault analysis, fault diagnosis, and fault self-healing.
Step S420: Receive the fault processing result from the professional operation and maintenance system, where the fault processing result is obtained by the professional operation and maintenance system according to the business intent information and the fault diagnosis scenario.
In this embodiment, with the fault processing method including steps S410 to S420 above, when the fault diagnosis scenario is a professional operation and maintenance fault diagnosis scenario, the fault forwarder in the business operation and maintenance system can automatically send the business intent information and the fault diagnosis scenario to the professional operation and maintenance system, and the business operation and maintenance system can then receive the fault processing result that the professional operation and maintenance system obtains according to the business intent information and the fault diagnosis scenario. This embodiment thus links the business operation and maintenance system with the professional operation and maintenance system and uses automation together with the specialized techniques of the professional operation and maintenance system to delimit and locate faults, which improves the automated processing of business faults.
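The split between the two kinds of fault diagnosis scenario can be sketched as a simple dispatcher. This is a minimal illustration rather than the claimed apparatus: the dictionary keys `name`, `priority`, and `domain`, the convention that a smaller `priority` value is handled first, and the two callback names are all assumptions introduced here.

```python
def handle_by_priority(scenarios, handle_in_business_oam, forward_to_professional_oam):
    """Process diagnosis scenarios in priority order.

    Scenarios marked "business" are handled inside the business operation
    and maintenance system; all others are forwarded together with the
    intent to the professional system (S410), whose result is collected
    (S420).
    """
    results = []
    # Assumption: a smaller priority number means "process earlier".
    for scenario in sorted(scenarios, key=lambda s: s["priority"]):
        if scenario["domain"] == "business":
            outcome = handle_in_business_oam(scenario)
        else:
            outcome = forward_to_professional_oam(scenario)
        results.append((scenario["name"], outcome))
    return results


scenarios = [
    {"name": "gnb_out_of_service", "priority": 2, "domain": "professional"},
    {"name": "latency_param_misconfig", "priority": 1, "domain": "business"},
]
outcomes = handle_by_priority(
    scenarios,
    handle_in_business_oam=lambda s: "fixed_locally",
    forward_to_professional_oam=lambda s: "forwarded",
)
```

Passing the two handlers as callables keeps the sketch independent of how either system is actually reached, which the description leaves open.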
In one embodiment, as shown in FIG. 6, the fault processing method may further include, but is not limited to, steps S510 and S520.
Step S510: Determine the fault processing result of the fault diagnosis scenario, and evaluate the fault processing result to obtain an evaluation result.
In one embodiment, the fault processing result produced for the fault diagnosis scenario by the business operation and maintenance system or by the professional operation and maintenance system may be input into the intent verifier. The intent verifier evaluates the fault processing result on the basis of the recovery of the business fault and of manual feedback, and obtains the evaluation result. This embodiment can therefore use the evaluation result to determine the validity and accuracy of the fault processing result.
It can be understood that recovery of the network state does not necessarily mean recovery from the business fault, so manual judgment or business monitoring is needed to determine whether the current business fault has been resolved. If the business fault has recovered, that is, both the network state and the user perception have recovered, the intent verifier evaluates this round of network fault processing based on the business intent information as effective. If the fault diagnosis scenario has not been repaired, that is, the intent verifier evaluates that this round of fault processing did not solve the problem, another round of business fault diagnosis or further manual intervention is required.
Step S520: When the evaluation result indicates that the fault diagnosis scenario has not been repaired, re-acquire the business intent information.
In this embodiment, with the fault processing method including steps S510 to S520 above, the fault processing result of the fault diagnosis scenario is determined and evaluated to obtain the evaluation result, and when the evaluation result indicates that the fault diagnosis scenario has not been repaired, the business intent information is re-acquired. In other words, the business intent information is standardized again to obtain the fault diagnosis scenarios and their processing priorities, and fault processing is performed on the fault diagnosis scenarios according to the processing priorities. This embodiment does not specifically limit this.
In another embodiment, when the evaluation result indicates that the fault diagnosis scenario has not been repaired, the business fault corresponding to the business intent information may be handled through manual intervention. This embodiment does not specifically limit this.
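The evaluate-and-retry behaviour of steps S510 and S520, with manual intervention as the fallback, can be sketched as a bounded loop. The four callables and the `max_rounds` limit are hypothetical; the description does not state how many automated rounds are attempted before a person steps in.

```python
def run_closed_loop(get_intent, diagnose, handle, is_service_restored, max_rounds=3):
    """Repeat intent acquisition, diagnosis, and handling until restored.

    get_intent: returns the (re-acquired) business intent information.
    diagnose: maps the intent to prioritised fault diagnosis scenarios.
    handle: performs fault processing and returns its result.
    is_service_restored: stands in for the intent verifier (S510).
    """
    for round_no in range(1, max_rounds + 1):
        intent = get_intent()               # S520 on every retry
        scenarios = diagnose(intent)
        result = handle(scenarios)
        if is_service_restored(result):     # network state AND user perception
            return {"status": "restored", "rounds": round_no}
    # Assumption: give up after max_rounds and escalate to a person.
    return {"status": "needs_manual_intervention", "rounds": max_rounds}


attempts = []


def fake_handle(scenarios):
    """Test double: counts calls and returns the call number as the result."""
    attempts.append(scenarios)
    return len(attempts)


outcome = run_closed_loop(
    get_intent=lambda: "delay below 10 ms",
    diagnose=lambda intent: ["latency_param_misconfig"],
    handle=fake_handle,
    is_service_restored=lambda result: result >= 2,
)
```

In this usage the first round fails verification and the second succeeds, so the loop reports recovery after two rounds.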
In one embodiment, the evaluation result, the business intent information, and the fault diagnosis scenario may be stored; they may be updated together into the knowledge manager. In addition, for network fault processing based on business intent information that the intent verifier evaluates as effective, the processing priority of its fault diagnosis scenario may be updated in the knowledge manager. For network fault processing based on business intent information that the intent verifier evaluates as ineffective, the business intent information and the fault diagnosis scenario may be optimized. For example, when the evaluation result indicates that the fault diagnosis scenario has not been repaired, the fault diagnosis scenario for that business in the knowledge manager may be optimized, so as to continuously improve the automated and intelligent processing capabilities of the fault processing apparatus. Finally, the automatically updated information in the knowledge manager is confirmed manually, and the fault processing result is fed back in a closed loop, so that the accuracy of the information in the knowledge manager is improved iteratively.
The fault processing method provided by the above embodiments is described in detail below by way of examples:
Example 1:
Referring to FIG. 7 and FIG. 8, in one embodiment, assume that the fault processing apparatus 100 is deployed in the business operation and maintenance system 300. The intent sensor acquires the business intent information and identifies from it the corresponding network quality expectations, for example expectations regarding delay guarantee requirements and bandwidth guarantee requirements. The intent translator then receives the business intent information from the intent sensor and translates it into the corresponding business problems, such as network device faults and network quality degradation. Next, the intent translator uses the similarity model of the knowledge manager to obtain one or more fault diagnosis scenarios and their processing priorities, and sends them to the delimitation locator, which delimits and locates the fault diagnosis scenarios and then determines whether a fault diagnosis scenario needs to be processed by the professional operation and maintenance system 400. When the fault diagnosis scenario is a professional operation and maintenance fault diagnosis scenario, the delimitation locator sends the business intent information and the fault diagnosis scenario to the professional operation and maintenance system 400 through the fault forwarder, and the mature delimitation and location capabilities of the professional operation and maintenance system 400 are used for fault analysis, fault diagnosis, and fault self-healing. Finally, the intent verifier evaluates the fault processing result of the professional operation and maintenance system 400 to assess whether the business fault has been repaired. When the business returns to normal, that is, both the user perception and the network state have recovered, the evaluation result indicates that this round of network fault processing based on the business intent information was effective. At the same time, the evaluation result is updated in the knowledge manager.
In some embodiments, referring to FIG. 9, take as an example the automatic assembly business of a terminal at a certain workstation in a certain workshop, a scenario that has network assurance requirements. The intent sensor identifies from the business intent information the corresponding network quality expectations, for example expectations regarding delay guarantee requirements and bandwidth guarantee requirements. The intent translator then receives the delay intent information from the intent sensor and translates it into the corresponding business problem. When the current operation and maintenance data information is performance data information indicating network quality degradation, and the business fault caused by this performance data information is a lag in the automatic assembly operation of the terminal at that workstation, that is, the business problem is network quality degradation, the possible causes of the network quality degradation (that is, the fault diagnosis scenarios) can be analyzed through the following intent translation process (that is, standardized processing of the business intent information):
Step 1: Using the physical location information of the workshop and of the workstation, obtain the number card information of the faulty workstation from the business operation and maintenance system 300, that is, obtain the first target network device according to the network attribute information.
Step 2: Query the dynamic registration information of the faulty workstation according to the number card information, and obtain from the professional operation and maintenance system 400 the network device to which the number card of the faulty workstation is currently connected (that is, the second target network device). Then acquire the longitude and latitude of the current network topology and the physical location topology information of the faulty workstation (that is, the static network topology information), and input the longitude and latitude of the current network topology and the physical location topology information of the faulty workstation into the similarity model of the knowledge manager to obtain candidate network device A and candidate network device B, both of which are network devices that may have failed.
Step 3: Query the delay alarm information and delay performance data of candidate network device A that are related to the delay intent information (that is, the current operation and maintenance data information), and input the delay alarm information, the delay performance data, and the delay intent information into the similarity model, that is, perform correlation analysis on the delay alarm information, the delay performance data, and the delay intent information, to obtain the possible network faults, namely device fault 1 and device fault 2. Similarly, perform the same query and analysis on candidate network device B to obtain its possible network fault, namely device fault 3.
At this point, the delay intent information has been translated into the network faults on candidate network device A and on candidate network device B, which completes the first stage of the intent translation process.
Step 4: If, in step 3, neither candidate network device A nor candidate network device B has delay alarm information or delay performance data corresponding to device fault 1, device fault 2, or device fault 3, that is, the current operation and maintenance data information of both candidate network devices is normal, this step trains on the historical business data information and the historical operation and maintenance data information (that is, historical alarm data information and performance data information other than delay performance data) that candidate network device A and candidate network device B have registered on the business operation and maintenance system 300, to obtain the similarity model. The current operation and maintenance data information (that is, the current alarm data information and performance data information) and the business intent information are then input into the similarity model to determine device fault C for candidate network device A and device fault D for candidate network device B, and to determine that monitoring indicator E, monitoring indicator F, and monitoring indicator G are performance indicators that have already degraded. The knowledge manager then shows that cause 1, cause 2, cause 3, cause 4, and cause 5 are different fault diagnosis scenarios, and that cause 1, cause 2, and cause 4 are the most likely fault causes (that is, they have the highest processing priority). This completes the second stage of the intent translation process.
步骤5,调用专业运维系统400对各个故障诊断场景进行故障分析、故障诊断和故障自愈能力,即根据处理优先级,依次对步骤3中的设备故障1、设备故障2和设备故障3进行故障处理,得到故障处理结果;或者,根据处理优先级,依次对步骤4中的产生原因1、产生原因2、产生原因3、产生原因4和产生原因5进行故障处理,得到故障处理结果。Step 5: Call the professional operation and maintenance system 400 to perform fault analysis, fault diagnosis and fault self-healing capabilities for each fault diagnosis scenario, that is, according to the processing priority, equipment fault 1, device fault 2 and device fault 3 in step 3 are sequentially performed. Perform fault processing to obtain the fault processing result; or, according to the processing priority, perform fault processing on cause 1, cause 2, cause 3, cause 4, and cause 5 in step 4 in sequence, and obtain the fault processing result.
步骤6,假如专业运维系统400故障处理结果为设备故障1导致了“某车间某工位终端自动装配操作发生滞后现象”的业务故障,且设备故障1已经进行了修复,那么则需要意图验证器通过业务故障恢复的情况以及人工反馈的情况进行验证处理,确认该业务恢复正常。另外,如果业务恢复正常,即网络状态恢复且用户感知恢复,则将本次业务故障对应的将评估结果、业务意图信息和故障诊断场景更新到知识管理器中;如果故障诊断场景或者业务未恢复正常,则需要进行下一次业务故障诊断,或需要进一步人工介入。Step 6. If the professional operation and maintenance system 400 fault processing result is that equipment failure 1 caused a business failure of "the automatic assembly operation of a certain station terminal in a certain workshop is lagging behind", and equipment failure 1 has been repaired, then intent verification is required. The server verifies the business failure recovery situation and manual feedback to confirm that the business has returned to normal. In addition, if the business returns to normal, that is, the network status is restored and user perception is restored, the evaluation results, business intent information and fault diagnosis scenarios corresponding to this business failure will be updated to the knowledge manager; if the fault diagnosis scenario or the business has not been restored, If it is normal, the next business fault diagnosis needs to be performed, or further manual intervention is required.
It can be understood that, in the above embodiment, fault delimitation and localization are performed first inside the business operation and maintenance system 300; if the fault cannot be recovered there, it is handed over to the professional operation and maintenance system 400 for processing.
Example 2:
In one embodiment, as shown in Figure 10, the business intent information is the delay intent information of a factory campus, and the example considered is a failure of the third base station that causes a fault in the delay guarantee service of an AGV car. The target terminal, the AGV car, is controlled through the connected service network (i.e., I5GC, Industry 5th-Generation Core) and moves from the first moving position to the second moving position of the positioning coordinates. For safe production, the communication delay over the AGV car's entire movement is subject to fairly strict network guarantee requirements, for example that 99.999% of delays within one month be less than 10 milliseconds. During the movement, network coverage changes because of distance and other factors, so the AGV car performs at least one network handover to access a new network. In some embodiments, the AGV car at the first moving position accesses the network through the first target network device, i.e., the second base station. When it moves to the second moving position it needs to switch networks. During the switch, because the third base station is faulty, the AGV car can only access the network through a network device with weak coverage, i.e., the fourth base station. If this persists, the delay requirement cannot be guaranteed in this scenario. Therefore, the fault diagnosis scenario of the third base station needs to be handled through the following intent translation process (i.e., standardized processing of the business intent information) to guarantee the delay requirement, namely:
Step 1: From the detected business monitoring fault (i.e., the key delay perception indicator of the connection service has degraded), the intent sensor obtains the business intent information of the fault, namely that the delay of the AGV car from the first moving position to the second moving position of the positioning coordinates exceeds 10 milliseconds;
Step 2: The intent sensor passes the business intent information of the business monitoring fault to the intent translator;
Step 3: The intent translator obtains the card number information of the target terminal, the AGV car, from the business intent information and, based on that card number information and the current network topology information (i.e., real-time network data), learns that the AGV car is currently registered with the second base station and the fourth base station;
Step 4: Through the knowledge manager, the intent translator obtains the current operation and maintenance data information of the second base station and of the fourth base station, either of which may be alarm data information, performance data information, or other operation and maintenance data information, and makes a preliminary judgment on whether the second base station and the fourth base station are faulty. The judgment finds that the current operation and maintenance data information (such as alarm data information and performance data information) of each candidate network device is normal, i.e., neither the second base station nor the fourth base station is faulty, so there is no fault diagnosis scenario;
Step 5: A similarity algorithm is applied to the coordinate information of the AGV car, of the second base station, and of the fourth base station to obtain the second target network device, i.e., the third base station, which is the network device most likely to have caused the business fault;
Step 6: The current operation and maintenance data information (such as alarm data information) of the third base station is determined, and the alarm diagnosis scenario of the third base station's out-of-service alarm is obtained from it;
Step 7: The alarm diagnosis scenario of the third base station's out-of-service alarm is sent to the professional operation and maintenance system, which performs delimitation and localization so that the fault of the third base station is ultimately recovered. This makes it more likely that the AGV car can access the third base station at the second moving position, and so guarantees the delay requirement.
It can be understood that the above steps, i.e., the first-stage intent translation process, identify the faulty candidate network device, translate the business intent information into the alarm fault diagnosis scenario of a base station out-of-service alarm, and hand that scenario over to the professional operation and maintenance system for fault processing.
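The source does not specify the similarity algorithm used in step 5. As one hedged illustration, the sketch below substitutes a plain nearest-neighbor search by Euclidean distance: among the base stations the terminal is not currently registered with, it picks the one closest to the terminal's coordinates. The function name, the 2-D coordinates, and the station labels are assumptions made for the example.

```python
import math
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]


def nearest_unregistered_station(
    terminal: Point,
    stations: Dict[str, Point],
    registered: Set[str],
) -> Optional[str]:
    """Return the station closest to the terminal that it is NOT registered
    with, as a candidate for the device the terminal should have used.

    Euclidean distance is an assumed stand-in for the unspecified
    similarity algorithm. Returns None if every station is registered.
    """
    candidates = {name: pos for name, pos in stations.items() if name not in registered}
    if not candidates:
        return None
    return min(candidates, key=lambda name: math.dist(terminal, candidates[name]))
```

For instance, with stations at `bs2=(0, 0)`, `bs3=(10, 0)`, `bs4=(25, 0)`, `bs5=(100, 0)`, a terminal at `(11, 0)` registered with `bs2` and `bs4` would yield `bs3` as the candidate whose alarm data should be checked next.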
Example 3:
Building on Example 2, in one embodiment, as shown in Figure 10, take the AGV car movement scenario again. Suppose the network device, the third base station, has not failed. Instead, when the AGV car requests air interface resources from the third base station, the resources are scarce and are not allocated in time, which increases the delay; or the AGV car registers with the weakly covered fourth base station, which also increases the delay.
In some embodiments, suppose the first-stage intent translation process of Example 2 cannot pinpoint a specific network device that caused the business fault: the candidate network devices associated with the connection service, namely the second, third, and fourth base stations, have not failed, while the fifth base station may be faulty, but correlation analysis shows that the physical location of the fifth base station lies outside the connection service, so a fault there does not affect the connection service. In this embodiment, the second-stage intent translation therefore combines the historical business data information and the current operation and maintenance data information in the knowledge manager to analyze each candidate network device in turn. For example, a probability analysis of the AGV car's registrations with the third and fourth base stations shows that the number of registrations with the fourth base station has increased. As another example, the number of registrations by other terminals on the third base station has increased significantly, causing a resource shortage. From this it can be judged that a fault arising at the candidate network device, the third base station, is the most likely cause of the delay. Further, the current performance data information of the third base station is combined to obtain a performance diagnosis scenario for the third base station. After delimitation and localization by the business operation and maintenance system and by the professional operation and maintenance system, the diagnosis result for this scenario (i.e., the performance diagnosis scenario) is delay caused by incorrect configuration parameters and insufficient air interface resources. The recommended handling is to adjust the configuration parameters dynamically or to expand the capacity of the candidate network device, the third base station, so as to guarantee the delay requirement of the service.
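The second-stage comparison of historical and current registration counts can be illustrated with a simple ratio against a historical baseline. This is only a sketch under stated assumptions: the source speaks of probability and correlation analysis without naming a formula, so the growth ratio, the function name, and the input shapes here are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_by_registration_growth(
    history: Dict[str, List[int]],
    current: Dict[str, int],
) -> List[Tuple[str, float]]:
    """Rank candidate devices by how far the current registration count
    exceeds the historical mean (ratio > 1 means more registrations
    than usual). Devices without a usable baseline are skipped."""
    ranked: List[Tuple[str, float]] = []
    for device, past_counts in history.items():
        if not past_counts:
            continue  # no history to compare against
        baseline = mean(past_counts)
        if baseline <= 0:
            continue  # avoid dividing by a zero baseline
        ranked.append((device, current.get(device, 0) / baseline))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

A device at the top of this ranking is only a candidate for the performance diagnosis scenario; the text still relies on the operation and maintenance systems to confirm the cause.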
It can be understood that all three examples above locate and repair network faults from a business perspective. The two-stage intent translation design can also be applied to network fault delimitation and localization in different industries and in many kinds of campus scenarios. In particular, in the digital transformation of enterprise campuses, which use 5G private networks as new infrastructure, the fault processing apparatus can handle faults across the whole network automatically and intelligently, which reduces reliance on manual handling, improves fault processing efficiency, and makes the system more robust.
It is also worth noting that the fault processing method covers the entire diagnosis flow for ToB network faults: business fault intent identification, intent translation, delimitation and localization (including handover to the professional operation and maintenance system for fault delimitation and localization), business fault diagnosis and repair, and intent verification. This flow can improve users' network perception and overall satisfaction. The fault processing method can be applied to enterprise campuses, i.e., "digital campuses" or "smart mines" supported by communications, computing power, and the like, which helps to recover quickly from business and operation and maintenance faults in such campuses. The method can also be applied to operator networks to delimit and locate operation and maintenance faults quickly; no specific limitation is imposed here.
In addition, referring to Figure 11, an embodiment of the present application further provides a fault processing apparatus. The fault processing apparatus 200 includes a memory 202, a processor 201, and a computer program stored in the memory 202 and executable on the processor 201.
The processor 201 and the memory 202 may be connected by a bus or by other means.
As a non-transitory computer-readable storage medium, the memory 202 can be used to store non-transitory software programs and non-transitory computer-executable programs. The memory 202 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory 202 may include memory located remotely from the processor 201, and such remote memory may be connected to the processor 201 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It should be noted that the fault processing apparatus 200 in this embodiment may be, for example, the fault processing apparatus in the embodiment shown in Figure 1. These embodiments belong to the same inventive concept and therefore share the same implementation principles and technical effects, which are not described again here.
The non-transitory software programs and instructions required to implement the fault processing method of the above embodiments are stored in the memory 202. When executed by the processor 201, they perform the fault processing method of the above embodiments, for example, method steps S110 to S130 in Figure 2, method steps S210 to S230 in Figure 3, method steps S310 to S330 in Figure 4, method steps S410 to S420 in Figure 5, and method steps S510 to S520 in Figure 6, all described above.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. The computer-executable instructions are executed by a processor or controller to perform, for example, method steps S110 to S130 in Figure 2, method steps S210 to S230 in Figure 3, method steps S310 to S330 in Figure 4, method steps S410 to S420 in Figure 5, and method steps S510 to S520 in Figure 6, all described above.
In addition, an embodiment of the present application further provides a computer program product including a computer program or computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the fault processing method of the above embodiments, for example, method steps S110 to S130 in Figure 2, method steps S210 to S230 in Figure 3, method steps S310 to S330 in Figure 4, method steps S410 to S420 in Figure 5, and method steps S510 to S520 in Figure 6, all described above.
The embodiments of the present application include: obtaining business intent information; performing standardized processing on the business intent information to obtain a fault diagnosis scenario of a target network device and a processing priority of the fault diagnosis scenario; and performing fault processing on the fault diagnosis scenario according to the processing priority. In other words, standardized processing of the business intent information pinpoints the target network device and its fault diagnosis scenario, and handling the fault diagnosis scenarios in order of processing priority improves fault processing efficiency. The embodiments of the present application can therefore locate faults precisely while improving the efficiency with which they are handled.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (12)

  1. A fault processing method, comprising:
    obtaining business intent information;
    performing standardized processing on the business intent information to obtain a fault diagnosis scenario and a processing priority of the fault diagnosis scenario; and
    performing fault processing on the fault diagnosis scenario according to the processing priority.
  2. The fault processing method according to claim 1, wherein performing standardized processing on the business intent information to obtain the fault diagnosis scenario and the processing priority of the fault diagnosis scenario comprises:
    performing standardized statement processing on the business intent information to obtain network attribute information of a target terminal;
    obtaining static network topology information of the target terminal and current network topology information of the target terminal, and obtaining a plurality of candidate network devices according to the network attribute information, the static network topology information, and the current network topology information; and
    determining current operation and maintenance data information of each of the candidate network devices, and obtaining the fault diagnosis scenario and the processing priority of the fault diagnosis scenario according to the current operation and maintenance data information.
  3. The fault processing method according to claim 2, wherein obtaining the fault diagnosis scenario and the processing priority of the fault diagnosis scenario according to the current operation and maintenance data information comprises:
    inputting the current operation and maintenance data information and the business intent information into a similarity model to obtain the fault diagnosis scenario and the processing priority of the fault diagnosis scenario.
  4. The fault processing method according to claim 3, wherein the similarity model is obtained through the following steps:
    obtaining historical operation and maintenance data information and historical business data information; and
    training on the historical operation and maintenance data information and the historical business data information to obtain the similarity model.
  5. The fault processing method according to claim 2, wherein obtaining the fault diagnosis scenario and the processing priority of the fault diagnosis scenario according to the current operation and maintenance data information comprises:
    when the current operation and maintenance data information of each of the candidate network devices is normal, obtaining historical business data information of each of the candidate network devices, and performing correlation analysis on the historical business data information and the current operation and maintenance data information to obtain the fault diagnosis scenario and the processing priority of the fault diagnosis scenario.
  6. The fault processing method according to claim 2, wherein obtaining the plurality of candidate network devices according to the network attribute information, the static network topology information, and the current network topology information comprises:
    obtaining a first target network device according to the network attribute information;
    performing correlation analysis on the static network topology information and the current network topology information to obtain a plurality of second target network devices associated with the first target network device; and
    using the first target network device and the plurality of second target network devices as the candidate network devices.
  7. The fault processing method according to claim 1, wherein performing fault processing on the fault diagnosis scenario according to the processing priority comprises:
    when the fault diagnosis scenario is a business operation and maintenance fault diagnosis scenario, obtaining a fault diagnosis instruction, and performing fault processing on the fault diagnosis scenario according to the fault diagnosis instruction and the processing priority.
  8. The fault processing method according to claim 1, wherein the fault processing method further comprises:
    when the fault diagnosis scenario is a professional operation and maintenance fault diagnosis scenario, sending the business intent information and the fault diagnosis scenario to a professional operation and maintenance system; and
    receiving a fault processing result from the professional operation and maintenance system, wherein the fault processing result is obtained by the professional operation and maintenance system according to the business intent information and the fault diagnosis scenario.
  9. The fault processing method according to claim 1, wherein the fault processing method further comprises:
    determining a fault processing result of the fault diagnosis scenario, and evaluating the fault processing result to obtain an evaluation result; and
    when the evaluation result indicates that the fault diagnosis scenario has not been repaired, obtaining the business intent information again.
  10. The fault processing method according to claim 9, wherein the fault processing method further comprises:
    storing the evaluation result, the business intent information, and the fault diagnosis scenario.
  11. A fault processing apparatus, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the fault processing method according to any one of claims 1 to 10.
  12. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to perform the fault processing method according to any one of claims 1 to 10.
PCT/CN2023/110786 2022-08-18 2023-08-02 Fault processing method and apparatus, and storage medium WO2024037345A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210994855.5 2022-08-18
CN202210994855.5A CN117640338A (en) 2022-08-18 2022-08-18 Fault processing method and device and storage medium

Publications (1)

Publication Number Publication Date
WO2024037345A1 true WO2024037345A1 (en) 2024-02-22

Family

ID=89940684

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110786 WO2024037345A1 (en) 2022-08-18 2023-08-02 Fault processing method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN117640338A (en)
WO (1) WO2024037345A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109787817A (en) * 2018-12-28 2019-05-21 武汉船舶通信研究所(中国船舶重工集团公司第七二二研究所) Network fault diagnosis method, device and computer readable storage medium
EP3823215A1 (en) * 2019-11-18 2021-05-19 Juniper Networks, Inc. Network model aware diagnosis of a network
CN113779247A (en) * 2021-08-27 2021-12-10 北京邮电大学 Network fault diagnosis method and system based on intention driving
CN114500244A (en) * 2020-11-13 2022-05-13 中兴通讯股份有限公司 Network fault diagnosis method and device, computer equipment and readable medium

Also Published As

Publication number Publication date
CN117640338A (en) 2024-03-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23854249

Country of ref document: EP

Kind code of ref document: A1