WO2024066346A1

WO2024066346A1 - Alarm processing method and apparatus, and storage medium and electronic apparatus

Info

Publication number: WO2024066346A1
Application number: PCT/CN2023/091861
Authority: WO
Inventors: 王超; 彭浩宇
Original assignee: 中兴通讯股份有限公司
Priority date: 2022-09-27
Filing date: 2023-04-28
Publication date: 2024-04-04
Also published as: CN117792864A

Abstract

Provided in the embodiments of the present disclosure are an alarm processing method and apparatus, and a storage medium and an electronic apparatus. The method comprises: determining a fault root cause of alarm information, and determining a user intention according to the fault root cause; when the user intention is fault processing, screening the alarm information, and removing alarm information to be manually processed, so as to obtain an alarm to be processed; determining an alarm solution for said alarm; and processing said alarm according to the alarm solution. In this way, the problems, in the relevant art, of the cost of operation and maintenance personnel being relatively high, and it being relatively difficult to perform manual processing due to the operation and maintenance personnel using experience thereof to process alarms on the basis of an alarm root cause report and there being a relatively large number of alarms can be solved, the alarm troubleshooting processing efficiency of key facilities of a mobile communication network can be improved, and a fault time can be shortened.

Description

Alarm processing method, device, storage medium and electronic device

CROSS-REFERENCE TO RELATED APPLICATIONS

This disclosure is based on Chinese patent application CN202211183805.5, filed on September 27, 2022, with the invention name “An alarm processing method, device, storage medium and electronic device”, and claims the priority of the patent application, and all the contents disclosed therein are incorporated into this disclosure by reference.

Technical Field

The embodiments of the present disclosure relate to the field of communications, and in particular, to an alarm processing method, device, storage medium, and electronic device.

Background

In the field of mobile communication networks, alarms are mainly used to improve system reliability, limit the time needed to resolve faults, and reduce the scope of their impact. In a large mobile communication network, code changes, environmental changes, and human operation changes occur every day. Because of this constant change, it is necessary to discover potential anomalies quickly and to respond to them promptly and correctly.

In the current network management system, when operation and maintenance personnel receive an alarm message, they open a page to view it. This page provides the asset information, configuration information, personnel information, and monitoring indicator data of the alarm source, together with the status of alarm processing on that day. They can then perform related operations, such as creating a work order, silencing the alarm, escalating the alarm, and confirming alarms in batches. Intelligent network management provides alarm merging and one-click diagnosis functions to help reduce the number of alarms that need to be processed, and it provides alarm root cause reports. However, what operation and maintenance personnel need is not the cause of the alarm as such, but to resolve the fault behind the alarm so that the alarm is cleared. The related art offers no effective solution for this, and operation and maintenance personnel have to resolve faults based on experience. In addition, there are many conventional alarms, so personnel costs are high and manual handling is difficult.

In the related art, operation and maintenance personnel handle alarms by relying on their experience and on alarm root cause reports. Because the number of alarms is large, personnel costs are high and manual processing is difficult. No solution to this problem has yet been proposed.

Summary of the invention

The embodiments of the present disclosure provide an alarm processing method, device, storage medium and electronic device, to at least solve the problem in the related art that operation and maintenance personnel handle alarms by relying on experience and alarm root cause reports, that the number of alarms is large, that personnel costs are high, and that manual processing is difficult.

According to an embodiment of the present disclosure, a method for processing an alarm is provided, the method comprising:

determining a fault root cause of alarm information, and determining a user intention according to the fault root cause;

when the user intention is fault processing, screening the alarm information and removing the alarm information to be manually processed, so as to obtain the alarms to be processed;

determining an alarm solution for the alarms to be processed; and

processing the alarms to be processed according to the alarm solution.

According to another embodiment of the present disclosure, there is also provided an alarm processing device, the device comprising:

A first determination module is configured to determine a root cause of a fault in the alarm information, and determine a user intention based on the root cause of the fault;

A screening module is configured to screen the alarm information when the user intention is fault processing, and to remove the alarm information to be manually processed, so as to obtain the alarms to be processed;

A second determination module is configured to determine an alarm solution for the alarm to be processed;

The processing module is configured to process the to-be-processed alarm according to the alarm solution.

According to another embodiment of the present disclosure, a computer-readable storage medium is provided, in which a computer program is stored, wherein the computer program is configured to execute the steps of any of the above method embodiments when running.

According to another embodiment of the present disclosure, an electronic device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG1 is a hardware structure block diagram of a base station device of an alarm processing method according to an embodiment of the present disclosure;

FIG2 is a flow chart of an alarm processing method according to an embodiment of the present disclosure;

FIG3 is a schematic diagram of automated alarm processing based on machine learning in the intent network domain according to an embodiment of the present disclosure;

FIG4 is a schematic diagram of intent capture according to this embodiment;

FIG5 is a schematic diagram of alarm screening according to this embodiment;

FIG6 is a schematic diagram of a script engine according to this embodiment;

FIG7 is a flow chart of determining an alarm solution according to the present embodiment;

FIG8 is a block diagram of an alarm processing apparatus according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings and in combination with the embodiments.

It should be noted that the terms "first", "second", etc. in the specification and claims of the present disclosure and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.

The method embodiments provided in the embodiments of the present disclosure can be executed in a base station device or a similar computing device. Taking the operation on the base station device as an example, FIG1 is a hardware structure block diagram of the base station device of the alarm processing method of the embodiment of the present disclosure. As shown in FIG1, the base station device may include one or more (only one is shown in FIG1) processors 102 (the processor 102 may include but is not limited to a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, wherein the above-mentioned base station device may also include a transmission device 106 and an input and output device 108 for communication functions. It can be understood by those skilled in the art that the structure shown in FIG1 is only for illustration, and it does not limit the structure of the above-mentioned base station device. For example, the base station device may also include more or fewer components than those shown in FIG1, or have a configuration different from that shown in FIG1.

The memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the alarm processing method in the embodiment of the present disclosure. The processor 102 executes various functional applications and service chain address pool slice processing by running the computer program stored in the memory 104, that is, to implement the above method. The memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include a memory remotely arranged relative to the processor 102, and these remote memories may be connected to the base station device via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The transmission device 106 is used to receive or send data via a network. A specific example of the above network may include a wireless network provided by the communication provider of the base station device. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 can be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.

In this embodiment, an alarm processing method running on the above-mentioned base station device is provided. FIG. 2 is a flow chart of the alarm processing method according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes the following steps:

Step S202, determining the root cause of the fault in the alarm information, and determining the user intention according to the root cause of the fault;

In step S202, capturing the user intention specifically involves obtaining diagnostic information, providing one-click diagnosis, and analyzing the root cause of the alarm. The analysis relies on the single-board diagnostic data of actual network elements and on previously processed alarm data, which provide sufficient samples and keep growing over time. Combined with the alarm types and processing schemes reported by actual network elements, result-driven model training adjusts the weight that each diagnostic factor carries in selecting a processing scheme for a network element alarm. A comprehensive analysis is then performed across multiple dimensions that affect base station operation, such as diagnostic data and environmental data, to determine the fault root cause and accurately capture the user intention.

Step S204, when the user intends to handle the fault, the alarm information is screened, and the alarm information to be manually handled is eliminated to obtain the alarms to be handled;

Step S206, determining an alarm solution for the alarm to be processed;

Step S208, processing the alarm to be processed according to the alarm solution.

In this embodiment, the above-mentioned step S208 may specifically include: adjusting the script execution order of multiple solutions based on the priorities of multiple solutions included in the alarm solution; calling the script framework corresponding to the network management system to generate alarm processing use cases corresponding to the multiple solutions; and executing the alarm processing use cases corresponding to the multiple solutions in sequence according to the script execution order until one of the multiple solutions is successfully executed.
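The ordering and stop-on-success behaviour described for step S208 can be illustrated with a minimal sketch. The `Solution` class, the script names, and the convention that a higher priority value runs earlier are assumptions made for illustration; they are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Solution:
    name: str                 # hypothetical label for a remediation script
    priority: float           # assumed convention: higher value runs earlier
    run: Callable[[], bool]   # returns True when the alarm was cleared


def process_alarm(solutions: List[Solution]) -> Optional[str]:
    """Run solutions in descending priority and stop at the first success."""
    for solution in sorted(solutions, key=lambda s: s.priority, reverse=True):
        if solution.run():
            return solution.name
    return None  # no script cleared the alarm: fall back to manual handling


attempts: List[str] = []


def make_script(name: str, succeeds: bool) -> Callable[[], bool]:
    """Build a stand-in script that records its execution and reports a fixed result."""
    def script() -> bool:
        attempts.append(name)
        return succeeds
    return script


solutions = [
    Solution("reset_board", 0.2, make_script("reset_board", True)),
    Solution("restart_link", 0.9, make_script("restart_link", False)),
    Solution("reload_config", 0.5, make_script("reload_config", True)),
]
cleared_by = process_alarm(solutions)
```

In this sketch the lowest-priority script is never executed, because an earlier one already cleared the alarm.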

On the alarm details page, online documents are provided to deconstruct the alarm design principles. Alarm processing suggestions are programmable commands and scripts. The scripts are developed, submitted, and shared by operation and maintenance personnel. Based on the success rate of past alarm processing, weight features of relevant solutions (including execution order) are assigned, and machine learning models are used to generate alarm solutions. Based on diagnostic data and weights, the priority of the solutions is evaluated, and the execution order of relevant scripts is adjusted to improve execution efficiency and the success rate of alarm processing. The corresponding script framework of the network management system is called to dynamically generate automated alarm processing use cases, and they are executed in sequence.

Steps S202 to S208 solve the problem in the related art that operation and maintenance personnel handle alarms by relying on experience and alarm root cause reports, that the number of alarms is large, that personnel costs are high, and that manual processing is difficult. This improves the efficiency of alarm troubleshooting for key facilities in mobile communication networks and shortens the fault time.

In this embodiment, an Isolation Forest (iForest) is used in combination with a one-class support vector machine (OC-SVM). After abnormal noise points are removed, the model is trained on the normal data and then used to find anomalies in new data. This is an unsupervised approach. The resulting model performs an initial screening of alarms: currently abnormal alarms are removed and transferred to manual processing, and alarms that can be processed automatically are retained.

The above step S204 may specifically include: using the isolation forest algorithm in combination with the OC-SVM model to screen the alarm information, removing the alarm information to be manually processed, and obtaining the alarms to be processed. Furthermore, diagnostic data and environmental data are collected from the alarm information; specifically, the diagnostic data and environmental data of each board in the corresponding baseband unit (BBU) and remote radio unit (RRU) are collected. The diagnostic data and environmental data are used to form an N-dimensional scatter plot; the degree of alienation between the scatter points is calculated using the isolation forest algorithm, and abnormal scatter points are removed according to the degree of alienation to obtain an N-dimensional preliminary screening scatter plot; based on the OC-SVM model, the points to be manually processed are then removed from the preliminary screening scatter plot.

Specifically, in the preliminary screening scatter plot, the environmental data is subjected to dimensionality reduction, in which features of preset types are eliminated according to the obtained actual status of the current network element, yielding processed environmental data; the diagnostic data is divided into hardware analysis data and software analysis data; the hardware analysis data, software analysis data and processed environmental data are used as feature values for dimensionality reduction to form a target scatter plot; and the target scatter plot is screened a second time based on the OC-SVM model to obtain the alarms to be processed.

In an optional embodiment, the above-mentioned secondary screening of the target scatter plot based on the OC-SVM model to obtain the alarm to be processed may specifically include: determining the position of the sphere where the cluster is located in the N-dimensional space of the above-mentioned target scatter plot and calculating the radius of the sphere; if the scatter points corresponding to the alarm information exceed the radius position, the alarm information is judged to be alarm information to be manually processed; and the alarm information to be manually processed is eliminated from the target scatter plot to obtain the alarm to be processed.
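The radius test described above can be illustrated with a deliberately simplified sketch. It is not a trained OC-SVM: as an assumption, the sphere center is taken as the plain centroid of the points and the radius as a distance quantile (`keep_fraction` is a hypothetical parameter), which is enough to show how points beyond the radius are routed to manual processing.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fit_sphere(points: List[Point], keep_fraction: float = 0.95) -> Tuple[List[float], float]:
    """Return (center, radius): centroid of the points and a distance quantile."""
    dims = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    distances = sorted(math.dist(p, center) for p in points)
    index = min(len(distances) - 1, int(keep_fraction * len(distances)))
    return center, distances[index]


def needs_manual_handling(point: Point, center: Point, radius: float) -> bool:
    """A scatter point beyond the sphere radius is handed to manual processing."""
    return math.dist(point, center) > radius


# Toy 2-dimensional scatter; real data would have one dimension per feature.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
center, radius = fit_sphere(cluster)

inside = needs_manual_handling((1.5, 1.5), center, radius)
outside = needs_manual_handling((10.0, 10.0), center, radius)
```

A kernel-based OC-SVM draws the boundary in feature space rather than in the raw N-dimensional space, so this sketch corresponds only to the linear case.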

In this embodiment, the above-mentioned step S206 may specifically include: inputting the alarm to be processed into a pre-trained target integrated alarm decision tree, and obtaining multiple solutions and corresponding priorities output by the target integrated alarm decision tree, wherein the target integrated alarm decision tree is based on the processing success rate of the processed alarms, assigns weights of the solutions corresponding to the processed alarms, and is trained based on the training data generated by the processed alarms and the corresponding weights, and the above-mentioned alarm solutions include multiple solutions.

In one embodiment, the method also includes: dividing the training data to form multiple alarm decision trees, and using decision tree pruning to trim some edge results of the multiple alarm decision trees to obtain multiple target alarm decision trees; using a random forest algorithm to combine the multiple target alarm decision trees to obtain an integrated alarm decision tree; and processing the integrated alarm decision tree to counter overfitting, so as to obtain a target integrated alarm decision tree.
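How several trees can be combined into ranked solutions may be sketched as a simple vote. The three hand-written functions below stand in for trained and pruned alarm decision trees, and the feature names and solution names are hypothetical; a real random forest would learn the trees from the training data.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Alarm = Dict[str, float]
Tree = Callable[[Alarm], str]


def tree_a(alarm: Alarm) -> str:
    # Stand-in for a trained tree that splits on gateway reachability.
    return "check_ne_side" if alarm["ping_gateway_ok"] else "check_transmission"


def tree_b(alarm: Alarm) -> str:
    # Stand-in for a trained tree that splits on board voltage.
    return "check_power" if alarm["board_voltage"] > 60.0 else "check_transmission"


def tree_c(alarm: Alarm) -> str:
    # A second reachability-based stand-in, so that votes can agree.
    return "check_transmission" if not alarm["ping_gateway_ok"] else "check_ne_side"


def rank_solutions(trees: List[Tree], alarm: Alarm) -> List[Tuple[str, float]]:
    """Return (solution, vote share) pairs, highest share first."""
    votes = Counter(tree(alarm) for tree in trees)
    total = sum(votes.values())
    return [(name, count / total) for name, count in votes.most_common()]


alarm = {"ping_gateway_ok": 0.0, "board_voltage": 75.0}
ranking = rank_solutions([tree_a, tree_b, tree_c], alarm)
```

The vote share plays the role of the priority that the integrated tree outputs for each solution.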

In another embodiment, the method further includes: counting the processing success rate of the alarms to be processed; adjusting the weights of the solutions corresponding to the processed alarms in the above training set according to the processing success rate; and updating the above target integrated alarm decision tree according to the adjusted training set. After the command is executed, the success rate of clearing the alarm is counted, the newly added script execution order is recorded, and the training set weight of the machine learning algorithm is updated according to the success rate. After the validity of the newly added execution order is verified, it will be released as a preset solution with the next version. When the automated script cannot clear the fault, a text message or email will be forwarded to the user, allowing manual intervention to optimize the strategy or replace the hardware.
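The feedback loop described here, in which success rates are counted and weights updated, can be sketched as follows. The class name, the solution names, and the use of Laplace smoothing for the weight are assumptions made for illustration; the disclosure does not specify how the weight is derived from the success rate.

```python
from typing import Dict, List


class SolutionStats:
    """Tracks how often each solution cleared an alarm."""

    def __init__(self) -> None:
        self._attempts: Dict[str, int] = {}
        self._successes: Dict[str, int] = {}

    def record(self, solution: str, cleared: bool) -> None:
        self._attempts[solution] = self._attempts.get(solution, 0) + 1
        if cleared:
            self._successes[solution] = self._successes.get(solution, 0) + 1

    def weight(self, solution: str) -> float:
        # Laplace smoothing (an assumption) keeps untested solutions at 0.5.
        successes = self._successes.get(solution, 0)
        attempts = self._attempts.get(solution, 0)
        return (successes + 1) / (attempts + 2)

    def ranked(self) -> List[str]:
        """Solutions seen so far, highest weight first."""
        return sorted(self._attempts, key=self.weight, reverse=True)


stats = SolutionStats()
for cleared in (True, True, True, False):
    stats.record("restart_link", cleared)
for cleared in (False, False):
    stats.record("reset_board", cleared)

order = stats.ranked()
```

The resulting weights would then be written back into the training set before the decision tree is retrained.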

This embodiment can be integrated into a network management system, which is a telecommunication-class operation and maintenance management (Operation and Maintenance Management, referred to as OMM) system based on B/S communication agent components. The OMM system manages no less than 15,000 base stations, and the system has no less than 2,000 preset alarms. When a base station fails during the daily operation and maintenance process, the alarm is reported to the OMM system after conversion by the middleware. After the operation and maintenance personnel monitor the alarm reported by the corresponding facility, they process the alarm.

The application of this embodiment is to perform self-healing repairs on the failures of key facilities in the mobile communication network. Through relevant resource configuration, the alarm is associated with executable processing suggestions, and the relevant scripts or commands are automatically executed, the faulty base station is marked, and the corresponding down-station operation process or hardware maintenance process is reported to achieve automated fault processing, reduce and release manpower input. Figure 3 is a schematic diagram of automated alarm processing based on machine learning in the intent network domain according to an embodiment of the present disclosure, as shown in Figure 3, including:

Step 1, intent capture, according to the alarm design principle, use one-click diagnosis to analyze and capture the user's specific intention for this alarm from the alarm information of the OMM system. There may be multiple reasons for the generation of an alarm. This step diagnoses the network or hardware to identify the root cause of the alarm, narrow the scope of fault location in the problem domain, and quickly determine the user's intention. Figure 4 is a schematic diagram of intent capture according to this embodiment. As shown in Figure 4, taking the "Link between OMM and NE broken" alarm as an example, the transmission between the evolved Node B (eNB) and the network management (OMM) will go through multiple routes, and the first hop route (Gateway) from the network element to the network management is referred to as the network element first hop gateway.

A diagnosis is initiated from the OMM by running a ping test against this node (the first-hop gateway). If the ping fails, the root cause of the fault is considered to be situation ② in Figure 4: the gateway configured at the IP layer referenced by the OMC channel from the network management to the network element is abnormal. This is judged to be a transmission problem, meaning that the link from the Operations & Maintenance Center (OMC) to this device is disconnected.

If the ping succeeds, the case is considered to be situation ① in the figure: the gateway configured at the IP layer referenced by the OMC channel from the network management to the network element is normal, so at least the link from the OMC to this device is normal. It is then necessary to check for a transmission problem between the first hop and the network element, or for a hardware problem in the network element itself.
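The two-way split on the ping result can be written as a small function. This is a sketch: the `ping` callable is injected so that no real network call is made, and the enum names and the documentation-range IP addresses are hypothetical.

```python
from enum import Enum
from typing import Callable


class RootCause(Enum):
    # Situation 2: the first-hop gateway does not answer.
    TRANSMISSION_TO_GATEWAY = "link from the OMC to the first-hop gateway is broken"
    # Situation 1: the gateway answers, so the fault lies further along.
    BEYOND_GATEWAY = "fault between the first-hop gateway and the NE, or in the NE hardware"


def diagnose_link_broken(gateway_ip: str, ping: Callable[[str], bool]) -> RootCause:
    """Classify a 'Link between OMM and NE broken' alarm from one ping test."""
    if not ping(gateway_ip):
        return RootCause.TRANSMISSION_TO_GATEWAY
    return RootCause.BEYOND_GATEWAY


reachable = {"192.0.2.1"}  # stand-in for the set of gateways that answer a ping


def fake_ping(ip: str) -> bool:
    return ip in reachable


result_up = diagnose_link_broken("192.0.2.1", fake_ping)
result_down = diagnose_link_broken("192.0.2.254", fake_ping)
```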

The operation and maintenance of base station equipment in wireless systems is mainly carried out through alarms. Once an alarm occurs, field operation and maintenance personnel troubleshoot according to the alarm handling suggestions and their own experience. In practice, the skill levels of front-line operation and maintenance personnel vary greatly, so accurate operation and maintenance prompts are particularly important for troubleshooting efficiency.

In the intent capture phase, one-click diagnosis is used to obtain the root cause of the fault, which can accurately identify the fault behind the alarm and prepare for the next step of troubleshooting.

Intent capture means recording in the system the state that the user wants the network to reach. When alarms are designed, the OMM system takes into account the various fault causes that can induce them. Conversely, when the system reports an alarm, the mobile communication network is considered to have a corresponding fault. If the root cause of the alarm can be accurately identified, the user's intention is considered to have been captured.

Step 2, alarm screening, converts the intent into a set of configuration changes or network configurations that need to be executed, applies the algorithm model to preliminarily screen the reported alarms, and selects the alarms that meet the requirements of automated processing. Before intent analysis is carried out, it is necessary to analyze whether the acquired alarms meet the requirements of automated processing. In this step, the isolation forest algorithm is combined with the OC-SVM model to preliminarily screen the alarms, so that a fault solution can be identified in the solution domain.

Isolation Forest (iForest) is an unsupervised learning algorithm for anomaly detection, and is often used for outlier detection and novelty detection. iForest is a method for removing outliers from training data. Unlike other anomaly detection algorithms, which use quantitative indicators such as distance and density to characterize the degree of alienation between samples, iForest detects outliers through the isolation of sample points.

First, from the alarm information obtained, we collect the diagnostic data (voltage, link status, bit error, power, CPU occupancy, board temperature, etc.) of each board in the BBU and RRU frames of the corresponding network element, as well as environmental data (inlet and outlet temperature, fan speed, input voltage, etc.). Currently, there are more than 200 types of diagnostic data. We construct an N-dimensional scatter plot from this data and use the isolation forest algorithm to calculate the degree of alienation between the sample scatter points. For example, if the voltage is too high because of a short circuit, the corresponding scatter point will deviate severely from the positions of the other points. An alarm that is eliminated in this way most likely cannot be resolved by automated processing, so it is transferred to manual processing.
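The isolation idea can be shown with a small pure-Python sketch. It is a simplification made for illustration: every tree is built on all points rather than on a random subsample, and the two features (a voltage and a temperature) and their values are invented. Points that are separated after very few random splits receive a score near 1.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]
EULER_GAMMA = 0.5772156649


def expected_path_length(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    low = min(p[dim] for p in points)
    high = max(p[dim] for p in points)
    if low == high:
        return ("leaf", len(points))
    split = rng.uniform(low, high)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, point: Point, depth: int = 0) -> float:
    """Number of splits needed to reach the leaf that holds the point."""
    if tree[0] == "leaf":
        return depth + expected_path_length(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if point[dim] < split else right, point, depth + 1)


def anomaly_scores(points: List[Point], n_trees: int = 100, seed: int = 0) -> List[float]:
    """Scores lie in (0, 1]; a short average path gives a score close to 1."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(points)))
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    norm = expected_path_length(len(points))
    return [2.0 ** (-sum(path_length(t, p) for t in trees) / (n_trees * norm))
            for p in points]


# 36 normal samples (voltage, temperature) plus one short-circuit-like sample.
points = [(220.0 + 0.1 * i, 45.0 + 0.1 * j) for i in range(6) for j in range(6)]
points.append((400.0, 90.0))

scores = anomaly_scores(points)
most_isolated = points[scores.index(max(scores))]
```

In practice a score threshold, chosen on historical data, would decide which alarms leave the automated path.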

FIG5 is a schematic diagram of alarm screening according to the present embodiment. As shown in FIG5, eliminating the abnormal alarm reports with the isolation forest reduces the noise that the OC-SVM alarm model encounters during operation. Since the OC-SVM is sensitive to the number of dimensions, the environmental data must be subjected to dimensionality reduction: irrelevant features such as the clock state and the read/write speed are eliminated according to the actual state of the current network element. The diagnostic data is then divided into hardware analysis data and software analysis data. These data are used as feature values, reduced in dimension, and input into the OC-SVM for secondary screening. A scatter plot in N-dimensional space is made according to the existing type dimensions, and the following formula is used:

z is a new data point, K(z, z) is the inner product of z with itself, α_i is the vector data in the training set, K(z, x_i) is the inner product of the point and the corresponding α_i, α_j is the transpose of α_i, that is, α_j = α_i^T, and R is the sphere radius of the OC-SVM.
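The quantities listed above match the standard support vector data description (SVDD) form of the one-class test. As an assumption, with a denoting the sphere center, x_i the training vectors, and α_i the coefficients attached to them (symbols introduced here for illustration), the check that a new point z lies inside the sphere can be written as:

```latex
\lVert z - a \rVert^{2}
  = K(z, z)
  - 2 \sum_{i} \alpha_i \, K(z, x_i)
  + \sum_{i} \sum_{j} \alpha_i \alpha_j \, K(x_i, x_j)
  \le R^{2}
```

A point that violates the inequality lies outside the sphere of radius R.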

The sphere position of the main cluster in N-dimensional space is determined and the radius of the sphere is calculated. If a point lies beyond the radius, it is determined that the alarm does not fall within the scope of automated processing, so it is removed from the automated processing list and transferred to manual processing.

After the initial screening of the alarms, and once the fault root cause analysis component has obtained the user's potential intention, the OMM system provides a series of predefined processing suggestions, which are weighted according to the historical success rate of fault resolution. If there is no manual intervention, the processing suggestion with the highest weight is selected and converted into specific steps, and these steps are associated with the script arrangement and execution described below. Take the "network element link broken" alarm as an example:

1. Check whether the network element type is "Management network element (MO SDR)". Yes -> 2 No -> 3.

2. Check the additional text of the alarm.

a. If "voltage abnormality" or "main control board power failure" is prompted, check the power supply of the network element, check and eliminate related faults, and check whether the alarm is restored. Yes -> End No -> b;

b. If the message "Transmission from the network management to the network element OMC channel gateway is abnormal" appears, check the transmission line from the network management to the network element OMC channel gateway, check and eliminate related faults, and check whether the alarm is restored. Yes -> End No -> c;

c. If the message "Transmission from the network management to the NE OMC channel gateway is normal" appears, check the transmission line from the NE to the NE OMC channel gateway, check and eliminate related faults, and check whether the alarm is restored. Yes -> End No -> 3.

3. Check the transmission status from the network management to the network element.

a. Execute "ping IP" on the network management server to check whether the connection between the network management server and the network element is normal. Yes -> 4 No -> b

b. Check whether the "Management NE IP Address" of the corresponding NE is correct. Yes -> 4 No -> c;

c. Change the IP address, wait for 3 minutes, and check whether the alarm is restored. Yes -> End No -> 4;

4. Contact the NE and network maintenance engineers to check and eliminate related faults and check whether the alarm is restored. Yes->End No->5.

5. Please seek higher level equipment maintenance support.
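The numbered flow above can be sketched as one function. The `ops` dictionary of callables is hypothetical: each entry stands for one check or repair action and returns a boolean (for repair actions, whether the alarm was restored). The function returns the label of the step at which the flow ended.

```python
from typing import Callable, Dict, List

Ops = Dict[str, Callable[[], bool]]


def handle_link_broken(ne_type: str, extra_text: str, ops: Ops) -> str:
    """Walk steps 1 to 5 and return the step at which the flow ended."""
    if ne_type == "Management network element (MO SDR)":               # step 1
        # Step 2: sub-steps a, b, c; fall through while the alarm persists.
        if ("voltage abnormality" in extra_text
                or "main control board power failure" in extra_text):
            if ops["fix_power_supply"]():
                return "2a"
        if "gateway is abnormal" in extra_text:
            if ops["fix_nm_to_gateway_line"]():
                return "2b"
        if "gateway is normal" in extra_text:
            if ops["fix_ne_to_gateway_line"]():
                return "2c"
    # Step 3: transmission status from the network management to the NE.
    if not ops["ping_ne"]():
        if not ops["ne_ip_is_correct"]():
            if ops["change_ne_ip_and_wait"]():   # includes the 3-minute wait
                return "3c"
    if ops["contact_engineers"]():                                      # step 4
        return "4"
    return "5"                                                          # step 5: escalate


calls: List[str] = []


def stub(name: str, result: bool) -> Callable[[], bool]:
    """Build a stand-in action that records its call and returns a fixed result."""
    def action() -> bool:
        calls.append(name)
        return result
    return action


ops = {
    "fix_power_supply": stub("fix_power_supply", False),
    "fix_nm_to_gateway_line": stub("fix_nm_to_gateway_line", False),
    "fix_ne_to_gateway_line": stub("fix_ne_to_gateway_line", False),
    "ping_ne": stub("ping_ne", False),
    "ne_ip_is_correct": stub("ne_ip_is_correct", False),
    "change_ne_ip_and_wait": stub("change_ne_ip_and_wait", True),
    "contact_engineers": stub("contact_engineers", False),
}
text = ("Transmission from the network management to the network element "
        "OMC channel gateway is abnormal")
ended_at = handle_link_broken("Management network element (MO SDR)", text, ops)
```

Each return label corresponds to one automated script in the later script arrangement, while "5" marks the hand-over to higher-level support.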

Step 3, policy execution, generates a predictive priority model through a machine learning algorithm based on the alarm handling solutions previously handled by the user and related diagnostic data, and automatically scripts and processes the predicted alarm handling solutions, outputs automated scripts, and executes network configuration changes.

Due to the complexity of network devices, including base station devices, network management servers, router switch devices, etc., all of these devices may report alarms. To make the intent network operate autonomously, it is necessary to adapt all network devices in the intent network, which is a huge workload. Automation and orchestration help achieve this agility by simplifying network operations and management. The simplest way to automate a network is to use standard, low-level APIs through programmability to provide fine-grained control of mobile base station devices and even chip levels.

The OMM system should have a REST interface based on XML or JSON encoding to support CLI (command line interface) and OPEN API (open application programming interface). Programmability is critical to realizing network-aware applications and application-aware networks. Network programmability does not lie in various interfaces and specifications, but in the abstraction of the network, which can truly reflect user intent, reduce network complexity and improve the level of automation by eliminating manual configuration. After receiving the configuration policies issued, the network devices in the intent network will execute the corresponding policies in sequence.

FIG6 is a schematic diagram of the script engine according to the present embodiment. As shown in FIG6, the script designer (Open Script Designer, referred to as OSD) is an online script development tool provided to engineering technicians and developers. Through this tool, script projects are developed, compiled and published to meet the need for customized scripts. In addition to script compilation and publishing, the tool provides auxiliary functions such as syntax checking, code blocks, automatic completion and online help. The open script engine provides intelligent syntax prompts, a business Python SDK library, and a script layout designer, allowing developers to develop scripts more conveniently and lowering the development threshold. It is possible to set whether a script contains important operations, together with prompt information about the impact of those operations. When a script containing important operations is executed, a verification code and the prompt information are displayed.

Script arrangement and execution: After finding the script in the Open Script Execution Engine (OSE) application list, you can arrange the script. For example, you can associate the "export network element parameter file script" and the "export alarm script". After the execution is completed, you can download the corresponding attachments to your local computer.

Script management: scripts can be categorized and managed by tags so that they can be found quickly. The scripts come with help files, output samples and other information, which provide more detailed guidance on script use. Any operation and maintenance personnel or customized development experts can develop automated scripts based on alarm processing suggestions and push the scripts to the server. Once a script has been verified as valid, it is released as a built-in script with future product versions.

The corresponding process can be divided into 16 processing solutions and 9 unitized processing cases. The 16 judgment categories can serve as decision tree categories (that is, they determine the priority of the corresponding processing solutions), and the unitized processing cases can be designed with the script designer so that each script corresponds one-to-one to a processing step. The scripts are then arranged according to the tree structure to build the automatic alarm processing solution for subsequent execution.

According to the alarm code, the corresponding alarm type is confirmed, and a decision tree is generated from the specific diagnostic data collected from the network element to identify the network element alarm solution. A decision tree is built top-down: the data division rules are obtained first, and construction starts from the root node of the tree. Therefore, once the specific alarm type has been obtained, the network element diagnostic data is randomly divided into several subsets, and the data attributes are evaluated with reference to the Gini impurity. The lower the value, the purer the subset; a value of 0 indicates that all samples in the subset belong to the same category. On this basis, the Gini coefficient of the data set is calculated. The formula is as follows:

$$\mathrm{Gini}(D) = 1 - \sum_{i} \left( \frac{c_i}{D} \right)^2$$

where D is the total number of samples and c_i is the number of samples in the i-th category.
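As an illustration only, the following minimal Python sketch computes this measure, assuming the standard Gini impurity definition 1 - sum((c_i / D)^2). The function names and the solution labels (`restart_board`, `reload_config`) are hypothetical and are not taken from the disclosure.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum((c_i / D) ** 2)."""
    total = len(labels)               # D: total number of samples
    if total == 0:
        return 0.0
    counts = Counter(labels)          # c_i: number of samples in category i
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate two-way split."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n

# Hypothetical solution labels attached to diagnostic samples.
pure = ["restart_board"] * 4
mixed = ["restart_board", "restart_board", "reload_config", "reload_config"]

print(gini_impurity(pure))   # 0.0 (all samples share one category)
print(gini_impurity(mixed))  # 0.5 (two equally frequent categories)
```

A split that sends each category to its own branch has a weighted impurity of 0, which is why a tree builder prefers it over a split that leaves the branches mixed.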

The training data is divided to form multiple decision trees. Decision tree pruning is used to trim some marginal results of each tree, and the random forest algorithm combines the multiple decision tree classifiers into an integrated decision tree classifier with better predictive performance. When choosing a feature to split on at a node, the algorithm does not search all features for the one that maximizes the indicator (such as information gain); instead, it randomly draws a subset of the features, finds the best split within that subset, and applies it to the node. The random forest method builds on Bagging (bootstrap aggregating), that is, the ensemble idea, which amounts to sampling both the samples and the features, so overfitting can be reduced.
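The sampling of both samples and features can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the disclosure: each "tree" is reduced to a one-level decision stump, pruning is omitted, and the feature layout (board temperature, CPU load, link error count) and solution labels are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def majority_vote(labels):
    """Most frequent label; used for leaf labels and for combining trees."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, feature_ids):
    """Best single-threshold split, searching only the given feature subset."""
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority_vote(left), majority_vote(right))
    if best is None:  # no valid split: always predict the majority label
        label = majority_vote([y for _, y in rows])
        return (0, float("inf"), label, label)
    return best[1:]

def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label

def train_forest(rows, n_trees=15, k=2, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]      # bootstrap: sample rows with replacement
        features = rng.sample(range(n_features), k)    # random subset of the features
        forest.append(train_stump(sample, features))
    return forest

def predict_forest(forest, x):
    return majority_vote([predict_stump(s, x) for s in forest])

# Hypothetical rows: ((board temperature, CPU load, link errors), solution label)
rows = [
    ((40, 0.20, 1), "reload_config"), ((42, 0.30, 0), "reload_config"),
    ((45, 0.25, 2), "reload_config"), ((47, 0.35, 1), "reload_config"),
    ((80, 0.90, 30), "restart_board"), ((82, 0.95, 35), "restart_board"),
    ((85, 0.85, 28), "restart_board"), ((88, 0.92, 40), "restart_board"),
]
forest = train_forest(rows)
print(predict_forest(forest, (38, 0.10, 0)))
print(predict_forest(forest, (90, 0.99, 50)))
```

Because every stump sees a different bootstrap sample and a different feature subset, no single noisy feature or sample dominates the combined vote.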

Figure 7 is a flow chart of determining the alarm solution according to the present embodiment. As shown in Figure 7, the decision tree, after the overfitting processing, predicts the alarm processing solution from the acquired network element diagnostic data input into the tree and pre-generates an automated alarm processing use case to ensure timely alarm processing. The priority of the alarm processing solutions is confirmed according to the post-order traversal sequence of the tree, and a backup processing solution for the current alarm is kept in the buffer to improve the success rate of automated alarm processing. A MAX value is also set for the processing solutions executed in parallel with alarm processing, that is, the maximum number of cycles an alarm may stay in the processing area of the processing loop, so that an alarm that cannot be resolved does not block the processing of subsequent alarms for a long time.

Step 4, network feedback: the base station or server provides network status feedback information to confirm whether the alarm was processed successfully. If not, the next solution predicted in the strategy execution step is tried, in a loop, until the maximum number of loops is reached.
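The retry behaviour described here (try the solutions in priority order, check the network feedback, stop at the first success or at the maximum number of loops) can be sketched as below. The names `process_alarm` and `max_cycles` and the solution names are hypothetical; each script is modelled as a callable that returns True when the feedback confirms that the alarm has cleared.

```python
from typing import Callable, List, Optional, Tuple

# (solution name, script returning True when the alarm is confirmed cleared)
Solution = Tuple[str, Callable[[], bool]]

def process_alarm(solutions: List[Solution], max_cycles: int) -> Optional[str]:
    """Try solutions in priority order; stop at the first success or after max_cycles attempts."""
    for cycle, (name, run_script) in enumerate(solutions):
        if cycle >= max_cycles:
            break                # give up so that later alarms are not blocked
        if run_script():         # network feedback: was the alarm processed successfully?
            return name
    return None                  # no automated solution worked within the limit

# Solutions already sorted by predicted priority; the later entries act as backups.
solutions = [
    ("reset_link", lambda: False),
    ("restart_board", lambda: True),
    ("reload_config", lambda: True),
]

print(process_alarm(solutions, max_cycles=3))  # restart_board
print(process_alarm(solutions, max_cycles=1))  # None
```

Returning None leaves the caller free to hand the alarm over to manual processing instead of looping indefinitely.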

After the network configuration is sent to the network equipment with the help of OSE scripts and successfully executed, the intent network needs to monitor the operation status of the network in real time. On the one hand, it collects network performance data and alarm data to observe whether the alarm has been restored, whether the network performance has returned to normal, and whether the configuration data has been synchronized to the base station equipment normally. On the other hand, it continuously predicts network equipment failures and abnormal conditions, for example, whether an associated alarm B is affected once alarm A is restored.

The system will continue to verify in real time whether the original business intent has been met, and can perform corrective actions if the preset intent is not achieved, forming a continuous closed-loop system, which improves the availability and agility of the network. Only a continuous closed-loop system can guarantee the effectiveness of the intent and ensure that the intent will not be disturbed by sudden network conditions.

Step 5, strategy optimization: the analysis component checks the received network feedback information against the requested intent to verify whether the intent is running as requested and as designed. It collects the successfully processed solutions and assigns them corresponding weights as a supplementary data set to further improve the AI prediction models in steps 1, 2 and 3. A characteristic of this step is that, in a commercial mobile communication network, alarm reports are very frequent, reaching 100,000 per day. After each alarm report triggers strategy execution, the effect of the strategy (fault resolution speed and fault resolution outcome) can be verified and fed back to adjust the prediction model.

Step 6, intent feedback: the status and operation of the requested intent are reported through value-based business outcomes.

Once a network anomaly is detected, the intent network needs to provide timely feedback to the intent capture link to re-convert, verify and execute the user intent.

Decision tree prediction for alarm processing is subject to large random factors and to reliance on experience, so the fitting of the decision tree needs further optimization. The following two solutions can further optimize the fitting:

Since the training data of the alarm processing decision tree comes from the user's own successfully processed use cases, one can start with the training data: optimize it and screen it in a targeted manner with the cooperation of the operation and maintenance personnel, so that decisions reflect more of their experience and intent. This is closer to the manual processing flow and improves processing accuracy. In addition, the weights of the new model are compared with those of the old model, the selection weight of a solution is increased according to its processing success rate, and the result is fed back to the user.

It is also possible to assign weights to the processing solutions according to their alarm processing success ratios and add these weights to the training data as new feature values. According to the assigned weights, the proportion of successful solutions among the predicted solutions can be increased, continuously improving the accuracy of predictions and the efficiency of automated alarm processing.
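One way to picture this weighting is the following minimal sketch, which derives a per-solution success rate from a processing history and attaches it to each successful case as a training weight. The record layout, the alarm code `ALM-001`, the solution names and the `floor` parameter are all hypothetical illustrations.

```python
from collections import defaultdict

def success_rates(history):
    """history: iterable of (alarm_code, solution, succeeded) -> {(code, solution): rate}."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for code, solution, ok in history:
        attempts[(code, solution)] += 1
        successes[(code, solution)] += int(ok)
    return {key: successes[key] / attempts[key] for key in attempts}

def weighted_training_rows(history, floor=0.1):
    """Keep successful cases and attach the solution's success rate as a weight.

    The floor (an assumption of this sketch) keeps rarely successful
    solutions from disappearing from the training set entirely.
    """
    rates = success_rates(history)
    return [
        {"alarm_code": code, "solution": solution,
         "weight": max(rates[(code, solution)], floor)}
        for code, solution, ok in history
        if ok
    ]

history = [
    ("ALM-001", "restart_board", True),
    ("ALM-001", "restart_board", True),
    ("ALM-001", "restart_board", False),
    ("ALM-001", "reload_config", True),
]
rates = success_rates(history)
rows = weighted_training_rows(history)
print(rates)
print(rows)
```

Recomputing the rates after each batch of processed alarms gives the model updated weights each time it is retrained.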

This embodiment collects single-board diagnostic data, applies machine learning algorithms, converts the intention of alarm operation and maintenance personnel to resolve network faults into strategies, and then implements them. When an alarm is reported, the corresponding automated processing suggestion is triggered, and the relevant scripts and command lines are executed for automatic repair. Based on the diagnostic data of the network element single board, a large number of repetitive alarms are processed automatically, so that limited resources are allocated reasonably. The automated solution for network element alarm processing can be applied to scenarios with high operation and maintenance personnel costs, difficult manual processing, and many routine alarms. For alarm information whose diagnostic data deviates abnormally and which is unlikely to be processed automatically, multi-dimensional processing analysis based on the degree of deviation of the corresponding data can additionally be provided to the operation and maintenance personnel. Each alarm processing run can assign weights to the processing plans and pre-process them. A large amount of data also helps in applying this automated processing method to larger regions.

According to another aspect of an embodiment of the present disclosure, an alarm processing device is further provided. FIG8 is a block diagram of the alarm processing device according to an embodiment of the present disclosure. As shown in FIG8 , the device includes:

A first determination module 82 is configured to determine a root cause of a fault in the alarm information, and determine a user intention based on the root cause of the fault;

A screening module 84 is configured to screen the alarm information and remove the alarm information to be manually processed to obtain the alarms to be processed when the user intends to process the fault;

A second determination module 86 is configured to determine an alarm solution for the alarm to be processed;

The processing module 88 is configured to process the to-be-processed alarm according to the alarm solution.

In one embodiment, the screening module 84 is further used to screen the alarm information by using an isolation forest algorithm combined with an OC-SVM model, to eliminate the alarm information to be manually processed, and to screen out the alarms to be processed.

In one embodiment, the screening module 84 includes:

A collection submodule, configured to collect diagnostic data and environmental data from the alarm information;

A formation submodule, configured to form an N-dimensional scatter plot of the diagnostic data and the environmental data;

A first elimination submodule is configured to calculate the degree of alienation between the scattered points in the scatter plot by using the isolation forest algorithm, and eliminate abnormal scattered points according to the alienation degree to obtain an N-dimensional preliminary screening scatter plot;

The second elimination submodule is configured to eliminate the alarm information to be manually processed from the preliminary screening scatter plot based on the OC-SVM model to obtain the alarm to be processed.
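The preliminary screening step can be illustrated with a small, self-contained isolation forest. This is a sketch under stated assumptions rather than the implementation of the disclosure: every tree is built on all points (no subsampling), the two features are hypothetical, and the "degree of alienation" is taken to be the usual isolation forest anomaly score, where points that are isolated after fewer random splits score higher.

```python
import math
import random

def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649   # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))          # random feature
    low = min(p[dim] for p in points)
    high = max(p[dim] for p in points)
    if low == high:
        return ("leaf", len(points))
    split = rng.uniform(low, high)               # random split value
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, point, depth=0):
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if point[dim] < split else right, point, depth + 1)

def anomaly_scores(points, n_trees=50, seed=0):
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(max(len(points), 2)))
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(len(points))
    return [2.0 ** (-sum(path_length(t, p) for t in trees) / (n_trees * norm))
            for p in points]

def preliminary_screen(points, n_drop=1):
    """Drop the n_drop points with the highest anomaly score."""
    scores = anomaly_scores(points)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [p for i, p in enumerate(points) if i not in dropped]

# Hypothetical 2-dimensional scatter: a tight cluster plus one far-away point.
cluster = [(1.0 + 0.1 * i, 2.0 + 0.1 * j) for i in range(4) for j in range(4)]
outlier = (50.0, 50.0)
screened = preliminary_screen(cluster + [outlier])
print(outlier in screened)   # False
```

The screened points would then be passed to the second elimination stage; a production system would more likely rely on an existing library implementation than on this hand-written version.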

In one embodiment, the second elimination submodule includes:

A dimension reduction unit is configured to perform dimension reduction processing on the environmental data in the preliminary screening scatter plot, and remove preset type features according to the acquired actual state of the current network element to obtain processed environmental data;

A composition unit, configured to divide the diagnostic data into hardware analysis data and software analysis data, and use the hardware analysis data, the software analysis data and the processed environment data as feature values for dimension reduction to form a target scatter plot;

The secondary screening unit is configured to perform secondary screening on the target scatter plot based on the OC-SVM model to obtain the alarm to be processed.

In one embodiment, the secondary screening unit is further configured to determine the position of the sphere where the cluster is located in the N-dimensional space of the target scatter plot and calculate the radius of the sphere; if the scatter points corresponding to the alarm information exceed the radius position, the alarm information is judged to be the alarm information to be manually processed; the alarm information to be manually processed is eliminated from the target scatter plot to obtain the alarm to be processed.
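The sphere test can be pictured with the simplified sketch below. It is not an OC-SVM: the sphere centre is taken as the mean of known-good points and the radius as a chosen quantile of their distances to the centre, which only approximates the boundary a trained one-class model would learn. The function names and the `quantile` parameter are hypothetical.

```python
import math

def fit_sphere(points, quantile=1.0):
    """Centre = mean of the points; radius = the given quantile of distances to the centre."""
    dims = len(points[0])
    centre = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    distances = sorted(math.dist(p, centre) for p in points)
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centre, distances[index]

def split_by_sphere(points, centre, radius):
    """Points inside the sphere stay automated; points outside go to manual processing."""
    inside = [p for p in points if math.dist(p, centre) <= radius]
    outside = [p for p in points if math.dist(p, centre) > radius]
    return inside, outside

# Known-good scatter points used to place the sphere.
normal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centre, radius = fit_sphere(normal)

# New alarms: one near the cluster, one far outside the radius.
to_process, manual = split_by_sphere([(0.5, 0.5), (5.0, 5.0)], centre, radius)
print(to_process)  # [(0.5, 0.5)]
print(manual)      # [(5.0, 5.0)]
```

Lowering `quantile` shrinks the sphere, so more alarms are routed to manual processing; this is the trade-off between automation coverage and the risk of mishandling unusual alarms.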

In one embodiment, the second determination module 86 is further configured to input the alarm to be processed into a pre-trained target integrated alarm decision tree to obtain multiple solutions and corresponding priorities output by the target integrated alarm decision tree, wherein the target integrated alarm decision tree is trained on training data generated from processed alarms and corresponding weights, the weights being assigned to the solutions corresponding to the processed alarms based on the processing success rate of the processed alarms, and the alarm solution includes the multiple solutions.

In one embodiment, the device further comprises:

A data partitioning module is configured to perform data partitioning on the training data to form a plurality of alarm decision trees, and to use decision tree pruning to prune some edge results of the plurality of alarm decision trees to obtain a plurality of target alarm decision trees;

A combination module, configured to use a random forest algorithm to combine the multiple target alarm decision trees to obtain an integrated alarm decision tree;

The overfitting module is configured to perform overfitting processing on the integrated alarm decision tree to obtain a target integrated alarm decision tree.

In one embodiment, the device further comprises:

A statistics module, configured to collect statistics on the success rate of processing the pending alarms;

An adjustment module, configured to adjust the weight of the solution corresponding to the processed alarm in the training set according to the processing success rate;

The updating module is configured to update the target integrated alarm decision tree according to the adjusted training set.

In one embodiment, the processing module 88 is further configured to adjust the script execution order of the multiple solutions based on the priorities of the multiple solutions; call the script framework corresponding to the network management system to generate the alarm processing use cases corresponding to the multiple solutions; and execute the alarm processing use cases corresponding to the multiple solutions in sequence according to the script execution order until one of the multiple solutions is successfully executed.

An embodiment of the present disclosure further provides a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the steps of any of the above method embodiments when running.

In an exemplary embodiment, the above-mentioned computer-readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk or an optical disk, and other media that can store computer programs.

An embodiment of the present disclosure further provides an electronic device, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.

In an exemplary embodiment, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary implementation modes, and this embodiment will not be described in detail herein.

Obviously, those skilled in the art should understand that the above modules or steps of the present disclosure can be implemented by a general computing device, they can be concentrated on a single computing device, or distributed on a network composed of multiple computing devices, they can be implemented by a program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases, the steps shown or described can be executed in a different order than here, or they can be made into individual integrated circuit modules, or multiple modules or steps therein can be made into a single integrated circuit module for implementation. Thus, the present disclosure is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the principles of the present disclosure shall be included in the protection scope of the present disclosure.

Claims

An alarm processing method, the method comprising:

determining the root cause of the fault in the alarm information, and determining the user's intention based on the root cause of the fault;

in the case where the user intends to handle the fault, screening the alarm information and eliminating the alarm information to be manually processed to obtain the alarms to be processed;

determining an alarm solution for the pending alarm;

processing the pending alarm according to the alarm solution.
The method according to claim 1, wherein screening the alarm information, eliminating the alarm information to be manually processed, and obtaining the alarms to be processed comprises:

The alarm information is screened by using the isolation forest algorithm in combination with the OC-SVM model, the alarm information to be manually processed is eliminated, and the alarms to be processed are screened out.
The method according to claim 2, wherein screening the alarm information using an isolation forest algorithm combined with an OC-SVM model, eliminating the alarm information to be manually processed, and screening out the alarms to be processed comprises:

collecting diagnostic data and environmental data from the alarm information;

Constructing an N-dimensional scatter plot of the diagnostic data and the environmental data;

The isolation forest algorithm is used to calculate the degree of alienation between the scattered points in the scatter plot, and abnormal scattered points are eliminated according to the degree of alienation to obtain an N-dimensional preliminary screening scatter plot;

Based on the OC-SVM model, the alarm information to be manually processed is removed from the preliminary screening scatter plot to obtain the alarm to be processed.
The method according to claim 3, wherein, based on the OC-SVM model, removing the alarm information to be manually processed from the preliminary screening scatter plot, and obtaining the alarm to be processed comprises:

Performing dimensionality reduction processing on the environmental data in the preliminary screening scatter plot, and eliminating preset type features according to the acquired actual state of the current network element to obtain processed environmental data;

dividing the diagnostic data into hardware analysis data and software analysis data;

The hardware analysis data, the software analysis data and the processed environment data are used as feature values for dimension reduction to form a target scatter plot;

The target scatter plot is screened a second time based on the OC-SVM model to obtain the alarm to be processed.
The method according to claim 4, wherein performing secondary screening on the target scatter plot based on the OC-SVM model to obtain the alarm to be processed comprises:

Determine the sphere position where the cluster is located in the N-dimensional space of the target scatter plot and calculate the radius of the sphere;

If the scattered points corresponding to the alarm information exceed the radius position, the alarm information is judged to be the alarm information to be manually processed;

The alarm information to be manually processed is eliminated from the target scatter plot to obtain the alarm to be processed.
The method according to claim 1, wherein determining an alarm solution for the pending alarm comprises:

The alarm to be processed is input into a pre-trained target integrated alarm decision tree to obtain multiple solutions and corresponding priorities output by the target integrated alarm decision tree, wherein the target integrated alarm decision tree is trained on training data generated from processed alarms and corresponding weights, the weights being assigned to the solutions corresponding to the processed alarms based on the processing success rate of the processed alarms, and the alarm solution includes the multiple solutions.
The method according to claim 6, wherein the method further comprises:

The training data is divided into multiple alarm decision trees, and some edge results of the multiple alarm decision trees are trimmed by decision tree pruning to obtain multiple target alarm decision trees;

Using a random forest algorithm, the multiple target alarm decision trees are combined to obtain an integrated alarm decision tree;

An overfitting process is performed on the integrated alarm decision tree to obtain a target integrated alarm decision tree.
The method according to claim 6 or 7, wherein the method further comprises:

Collecting statistics on the success rate of processing the pending alarms;

adjusting the weight of the solution corresponding to the processed alarm in the training set according to the processing success rate;

The target integrated alarm decision tree is updated according to the adjusted training set.
The method according to claim 6, wherein processing the pending alarm according to the alarm solution comprises:

adjusting the script execution order of the multiple solutions based on the priorities of the multiple solutions;

Calling a script framework corresponding to the network management system to generate alarm processing use cases corresponding to the multiple solutions;

The alarm processing use cases corresponding to the multiple solutions are executed in sequence according to the script execution order until one of the multiple solutions is successfully executed.
An alarm processing device, the device comprising:

A first determination module is configured to determine a root cause of a fault in the alarm information, and determine a user intention based on the root cause of the fault;

A screening module, configured to screen the alarm information and remove the alarm information to be manually processed to obtain the alarms to be processed when the user intends to process the fault;

A second determination module is configured to determine an alarm solution for the alarm to be processed;

The processing module is configured to process the to-be-processed alarm according to the alarm solution.
A computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the method described in any one of claims 1 to 9 when run.
An electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to execute the method described in any one of claims 1 to 9.