WO2023093527A1 - Alarm association rule generation method, apparatus, electronic device, and storage medium - Google Patents

Alarm association rule generation method, apparatus, electronic device, and storage medium

Info

Publication number
WO2023093527A1
WO2023093527A1 PCT/CN2022/130711 CN2022130711W WO2023093527A1 WO 2023093527 A1 WO2023093527 A1 WO 2023093527A1 CN 2022130711 W CN2022130711 W CN 2022130711W WO 2023093527 A1 WO2023093527 A1 WO 2023093527A1
Authority
WO
WIPO (PCT)
Prior art keywords
alarm
association
correlation
association rule
alarms
Prior art date
Application number
PCT/CN2022/130711
Other languages
English (en)
French (fr)
Inventor
唐英
杜贤俊
宋军
郑聂聪
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023093527A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Definitions

  • the embodiments of the present disclosure relate to the technical field of mobile communication, and in particular to a method, device, electronic device and storage medium for generating an alarm association rule.
  • the 5G network realizes the customization, openness and service of the network through the service-based network architecture, combined with virtualization and cloudization technologies.
  • the virtualized network implements the functions of traditional telecommunication equipment through software, runs on general hardware equipment, and uses virtualization technology to realize the sharing of hardware resources.
  • the virtualized network is divided into hardware layer, virtual layer, and network element layer from bottom to top.
  • the lower layer resources are the basis for the operation of the upper layer resources.
  • A failure of hardware-layer (physical) resources often causes virtual-layer resources to fail, and a failure of virtual-layer resources in turn causes network elements to fail, eventually affecting normal service processing. Therefore, when an alarm occurs on one resource in a virtualized network, multiple virtual resources often fail as a result. As the scale of resources grows, the number of alarms also increases rapidly. To resolve resource alarms efficiently, it is particularly important in 5G networks to analyze the correlation between alarms on different resources in order to find and handle root alarms.
  • One of the commonly used alarm correlation analysis methods today is to pre-define alarm correlation rules based on the accumulation of expert knowledge bases.
  • a rule engine is used to calculate the correlation of alarms within a certain time slice based on existing rules.
  • Another method is that the system uses machine learning and big data analysis to mine alarm association rules.
  • However, the alarm association rules generated by machine learning are less practical, and a large number of rules are mined, occupying a large amount of system resources; alarm association rules predefined from expert knowledge are more effective, but the process of generating them is more complicated. Therefore, how to concisely and efficiently establish alarm association rules that suit the current network structure is still an urgent problem to be solved.
  • The main purpose of the embodiments of the present disclosure is to propose a method, device, electronic device, and storage medium for generating alarm association rules, aiming to generate, concisely and efficiently, alarm association rules that accurately identify the association relationships between alarms in the current network architecture, and to improve the efficiency and accuracy of alarm handling.
  • An embodiment of the present disclosure provides a method for generating an alarm association rule, which includes: performing associated-alarm analysis on each resource in a resource type relationship tree according to historical alarm slices, to obtain associated alarms and the number of associations of each associated alarm; obtaining the correlation coefficient corresponding to the associated alarm according to the preset association strength between resources; obtaining the degree of association of the associated alarm according to the number of associations and the correlation coefficient; and, if the degree of association is greater than a first preset threshold, generating an alarm association rule according to the associated alarm.
  • An embodiment of the present disclosure also proposes an alarm association rule generation device, including: an acquisition module, configured to perform associated-alarm analysis on each resource in the resource type relationship tree according to the historical alarm slices and obtain the associated alarms and the number of associations of each associated alarm; a calculation module, configured to obtain the correlation coefficient corresponding to the associated alarm according to the preset association strength between resources; a determination module, configured to obtain the degree of association of the associated alarm according to the number of associations and the correlation coefficient; and a generation module, configured to generate an alarm association rule according to the associated alarm when the degree of association is greater than a first preset threshold.
  • An embodiment of the present disclosure also proposes an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method for generating an alarm association rule as described above.
  • the embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the method for generating an alarm association rule as described above is implemented.
  • FIG. 1 is a flow chart of a method for generating an alarm association rule in an embodiment of the present disclosure.
  • FIG. 2 is a schematic structural diagram of a 5G virtualized network resource model in an embodiment of the present disclosure.
  • FIG. 3 is a schematic structural diagram of a resource type relationship tree in an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a resource instance relationship tree in an embodiment of the present disclosure.
  • FIG. 5 is a flow chart of a method for mining and maintaining alarm association rules in an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an alarm association rule mining and maintenance device in an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an apparatus for generating an alarm association rule in another embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device in another embodiment of the present disclosure.
  • An embodiment of the present disclosure provides a method for generating an alarm association rule, which includes: performing associated-alarm analysis on each resource in the resource type relationship tree according to the historical alarm slices, to obtain the associated alarms and the number of associations of each associated alarm; obtaining the correlation coefficient corresponding to the associated alarm according to the preset association strength between resources; obtaining the degree of association of the associated alarm according to the number of associations and the correlation coefficient; and, if the degree of association is greater than a first preset threshold, generating an alarm association rule according to the associated alarm.
  • In this method, historical alarm slices are generated from the alarm information of each resource within a period of time, and associated-alarm analysis is then performed on each resource according to the historical alarm slices to obtain the associated alarms and their number of associations. Combined with the preset association strength between resources, the correlation coefficient of each associated alarm is obtained, and the degree of association is obtained from the number of associations and the correlation coefficient. If the degree of association is greater than the first preset threshold, an alarm association rule is generated according to the associated alarm.
  • In this way the degree of association of associated alarms is measured accurately, and alarm association rules are generated only from associated alarms with a sufficiently high degree of association. This makes rule generation more targeted and less complex, produces concise and efficient alarm association rules for the current network architecture, and thereby improves the accuracy and efficiency of alarm correlation analysis that uses these rules.
  • The first aspect of the embodiments of the present disclosure provides a method for generating an alarm association rule. The method is applied to a terminal with communication and analysis capabilities, such as a computer, tablet, mobile phone, or other electronic device. This embodiment is described taking a computer as an example, and includes, without limitation, at least the following steps:
  • Step 101: according to the historical alarm slices, perform associated-alarm analysis on each resource in the resource type relationship tree, and obtain the associated alarms and the number of associations of each associated alarm.
  • Before generating alarm association rules, the computer first defines, according to the virtualized network architecture, the resource types whose alarms need correlation analysis, sorts out the relationships between these resource types, and establishes a resource type relationship tree. It then obtains the historical alarm information of each resource within a certain period, generates multiple historical alarm slices in chronological order, establishes the relationships between resource instances according to the resource type relationship tree, and generates a resource instance relationship tree. Using the resource instance relationship tree and the historical alarm slices, it performs associated-alarm analysis on each resource in the resource type relationship tree and obtains the alarms that are associated with each other and their number of associations.
  • A schematic diagram of a 5G virtualized network resource model is shown in FIG. 2.
  • The uppermost network element layer includes service network elements and virtualized network functions.
  • The virtual layer in the middle includes hosts, virtual interactive machines, and virtual databases; the virtual network, virtual host, virtual network card, and virtual disk are virtualized network resources.
  • The lowest hardware layer includes facilities or ports such as servers, system servers, interactive servers, interactive system servers, switch ports, router ports, switches, and routers. The resources communicate with and connect to each other according to their service interactions or association relationships.
  • the computer determines the range of resources to be analyzed, and after sorting out the hierarchical relationship of resources, the resource type relationship tree established is shown in Figure 3.
  • the generated resource type relationship tree can be stored in a graph database, resource types are stored as type tree nodes in the library, and relationships between types are stored as relationship edges between tree nodes.
  • the resource instance relationship tree generated by the computer according to the resource type relationship tree is shown in Figure 4.
  • The resource instance relationship tree can also be stored in the graph database: each resource instance is stored as an instance tree node, and each relationship between instances is stored as a relationship edge between nodes.
  • the relationship between the instance and the type can be established, and the resource id can be used to create an index.
  • the overall structure of the resource instance relationship tree is that different routers are connected to at least one server through different switches, and the corresponding hosts are connected through the server, and then each host is connected through its own connected virtual host and virtual network function.
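  • As a rough illustration of the storage layout described above, the sketch below keeps the resource instance relationship tree in memory instead of a graph database. The class name `ResourceGraph`, its methods, and the mapping of the user plane data function to the type "service network element" are assumptions made for this example; only the instance names come from the FIG. 4 example in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGraph:
    """In-memory stand-in for the graph database (hypothetical helper)."""
    types: dict[str, str] = field(default_factory=dict)       # resource id -> resource type
    edges: dict[str, set[str]] = field(default_factory=dict)  # resource id -> neighbour ids

    def add_resource(self, rid: str, rtype: str) -> None:
        # The resource id acts as the index; the type links instance to type tree.
        self.types[rid] = rtype
        self.edges.setdefault(rid, set())

    def connect(self, a: str, b: str) -> None:
        # Store one undirected relationship edge between two existing instances.
        self.edges[a].add(b)
        self.edges[b].add(a)


graph = ResourceGraph()
for rid, rtype in [
    ("switch 4", "switch"),
    ("server 7", "server"),
    ("host 7", "host"),
    ("virtual host 6", "virtual host"),
    ("user plane data function", "service network element"),  # assumed type
]:
    graph.add_resource(rid, rtype)

chain = ["switch 4", "server 7", "host 7", "virtual host 6", "user plane data function"]
for a, b in zip(chain, chain[1:]):
    graph.connect(a, b)
```

  • A real deployment would keep the same two pieces of information (typed nodes and relationship edges) in the graph database the text mentions; the dictionary form is only convenient for experimenting.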
  • EMS: Element Management System
  • VIM: Virtualized Infrastructure Manager
  • PIM: Physical Infrastructure Manager
  • The computer reads from the historical alarm database the historical alarm information, within a certain period, of each resource contained in the resource type relationship tree, and divides this information into multiple historical alarm slices according to the time sequence in which the alarms occurred and a preset time granularity.
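  • The slicing step can be sketched as follows. This is a minimal sketch, assuming each alarm record is a `(timestamp in seconds, resource id, alarm code, state)` tuple and that slices are fixed-width windows; the function name, the record layout, the 60-second granularity, and the "power fault" alarm are all hypothetical.

```python
from collections import defaultdict


def slice_alarms(alarms, granularity_s):
    """Group alarm records into chronological slices of `granularity_s` seconds.

    Records are sorted by timestamp first; windows without alarms are skipped.
    """
    buckets = defaultdict(list)
    for alarm in sorted(alarms):
        buckets[alarm[0] // granularity_s].append(alarm)
    return [buckets[key] for key in sorted(buckets)]


history = [
    (125, "server 7", "power fault", "reported"),  # hypothetical alarm
    (10, "switch 4", "Ethernet protocol down", "reported"),
    (12, "user plane data function", "routing group unavailable", "reported"),
]
slices = slice_alarms(history, granularity_s=60)
```

  • With these inputs the two alarms at 10 s and 12 s land in the first slice and the alarm at 125 s in a later one, so only the first pair is a candidate for time correlation.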
  • The computer scans the historical alarm slices against the resource instance relationship tree to perform associated-alarm analysis on each resource in the resource type relationship tree. A group of alarms that are correlated in both resource and time is identified as associated alarms, and the number of occurrences of each such alarm set is counted, yielding the associated alarms and their number of associations.
  • The specific method by which the computer uses historical alarm slices to analyze associated alarms is as follows. In the generated resource instance relationship tree, starting from the earliest of the obtained historical alarm slices, the computer plays back the slices one by one at a certain playback speed and updates the alarm state of each resource in the resource instance relationship tree according to the alarm information in each slice. When multiple resources have alarm changes at the same time, it extracts those resources and queries whether they have resource dependencies in the resource instance relationship tree.
  • For example, suppose resource A and resource B have alarm changes at the same time. If resource A and resource B share a relationship edge in the resource instance relationship tree, or if there is a reachable path from resource A to resource B in the tree and no other resource on this path has an alarm change in this slice, it is determined that the two alarms are resource-related.
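  • One way to read the resource-correlation test is sketched below: two resources are related if they share an edge, or if some path joins them whose intermediate resources have no alarm change in the current slice. The function name, the adjacency-dictionary layout, and the placeholder resource names are assumptions for illustration, and the breadth-first search is just one possible way to look for such a path.

```python
from collections import deque


def resource_related(edges, a, b, changed):
    """Return True if resources a and b count as resource-related.

    edges:   dict mapping a resource id to the set of its neighbours
    changed: ids of all resources with an alarm change in the current slice
    """
    if b in edges.get(a, ()):
        return True  # direct relationship edge
    blocked = set(changed) - {a, b}  # other alarming resources break the path
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt == b:
                return True
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Placeholder topology: A - X - B, with C isolated.
edges = {"A": {"X"}, "X": {"A", "B"}, "B": {"X"}, "C": set()}
```

  • Here `resource_related(edges, "A", "B", {"A", "B"})` holds because X is quiet, whereas adding X to the changed set makes the check fail, since X would then be paired with A and B directly instead.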
  • When the resources whose alarms change at the same time have resource dependencies, the computer uses the resource types and their levels in the resource type relationship tree to set the alarm of the lower-level resource as the root alarm and the alarm of the upper-level resource as the sub-alarm, sets the two as associated alarms, and extracts the key features of the associated alarms to generate an association tag.
  • The main content of the tag is the resource type and alarm code of each of the two alarms and the number of associations, recorded for example as [(resource type 1, alarm code 1), (resource type 2, alarm code 2), number of associations n]. The number of associations is initially 1 and is increased by 1 each time the features of the associated alarm are scanned again. Within a single slice, there may be several different moments at which multiple resources have simultaneous alarm changes.
  • these alarms are time- and resource-dependent.
  • the network element with the service discovery function and router 1 also generate alarm changes in this time slice, but this network element does not have a reachable path to router 1, so the two alarms are not related in this time slice.
  • After the scan of the current historical alarm slice is completed, the computer automatically scans the next slice and updates the associated alarms, until all historical alarm slices have been scanned. The result is one or more associated alarms between related resources in the resource type relationship tree, together with the number of associations of each. Analyzing the associated alarms of each resource according to the historical alarm slices, along the two dimensions of time correlation and resource correlation, yields the associated alarms and their number of associations accurately, which facilitates the subsequent generation of alarm association rules.
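  • The association tag and its counter from Step 101 can be sketched with a `Counter` keyed by the pair of key features. The level numbers in `LEVEL` are hypothetical stand-ins for the levels of the resource type relationship tree (a smaller number means a lower level), and `record_association` is an illustrative name, not one used by the patent.

```python
from collections import Counter

# Hypothetical levels: a smaller number means a lower level in the type tree.
LEVEL = {"switch": 0, "server": 1, "host": 2, "virtual host": 3,
         "service network element": 4}

association_counts = Counter()


def record_association(alarm_a, alarm_b):
    """Count one co-occurrence of two time- and resource-related alarms.

    Each alarm is a (resource type, alarm code) key feature. The alarm on the
    lower-level resource is treated as the root alarm, the other as sub-alarm.
    """
    root, sub = sorted((alarm_a, alarm_b), key=lambda alarm: LEVEL[alarm[0]])
    association_counts[(root, sub)] += 1
    return root, sub, association_counts[(root, sub)]


upf = ("service network element", "routing group unavailable")
sw = ("switch", "Ethernet protocol down")
first = record_association(upf, sw)   # first scan of this pair
second = record_association(sw, upf)  # same pair seen again in a later slice
```

  • Because the pair is normalized to (root, sub) before counting, the order in which the two alarms are seen in a slice does not split the count across two tags.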
  • Step 102: according to the preset association strength between resources, obtain the correlation coefficient corresponding to the associated alarm.
  • After obtaining the associated alarms and their number of associations from the historical alarm slices, the computer obtains the correlation coefficient to be used for each associated alarm in the association degree analysis, according to the types of the resources where the associated alarms are located and the preset association strength between resources.
  • The computer obtains the correlation coefficient corresponding to the associated alarm according to the preset association strength between resources, including: obtaining the resource connection path between the resources where the associated alarms are located; obtaining, according to the preset alarm association strength between resources, the preset alarm association strength between the resources of each sub-path in the resource connection path; and obtaining the correlation coefficient of the associated alarm from the preset strengths of the sub-paths.
  • After obtaining the associated alarm, the computer searches the resource instance relationship tree, generated according to the resource type relationship tree, for the resource connection path between the resources where the associated alarms are located, and divides the path into multiple sub-paths at the resource nodes it passes through. It obtains the preset alarm association strength for each sub-path and then combines the strengths of all sub-paths on the resource connection path to obtain the correlation coefficient of the associated alarm.
  • the correlation strength of each stage in the resource connection path is obtained according to the preset correlation strength between resources, and the correlation coefficient is accurately obtained by combining the correlation strength of each stage, so as to ensure the accuracy of the evaluation of the subsequent correlation degree.
  • The alarm association strengths between resources in the resource type relationship tree are pre-assessed as follows: service network element to virtual host, 0.4; virtual host to host, 0.8; virtual host to virtual switch, 0.5; virtual host to disk array, 0.6; host to server, 1; server to switch, 0.5; disk array to switch, 0.5; switch to switch, 0.5; switch to router, 0.5.
  • The alarm association strength between resources can be set manually based on expert experience, or calculated by the computer based on expert experience. It can be stored as an attribute on the relationship edge between resource types, in the graph database together with the resource type relationship tree, so that it is easy to look up later.
  • When obtaining the correlation coefficient of an associated alarm, the computer first identifies the specific resources where the associated alarm is located. For example, in the resource instance relationship tree shown in FIG. 4, for the associated alarm [(switch 4, Ethernet protocol down), (user plane data function, routing group unavailable)], the resources are switch 4 and the user plane data function. The computer then queries the resource instance relationship tree and obtains the connection path between them: switch 4 - server 7 - host 7 - virtual host 6 - user plane data function. Given the preset association strengths between resources, the strengths of the four sub-paths are 0.5, 1, 0.8, and 0.4 respectively.
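  • The text says the sub-path strengths are combined but does not spell out how. The sketch below assumes they are multiplied, which matches the figures in this example (0.5 × 1 × 0.8 × 0.4 = 0.16, and 20 associations × 0.16 gives the association degree of 3.2 quoted for this alarm pair). The table `STRENGTH` holds only the four strengths needed here, and treating the user plane data function as a "service network element" is likewise an assumption.

```python
import math

# Preset strengths keyed by an unordered pair of resource types (subset only).
STRENGTH = {
    frozenset({"service network element", "virtual host"}): 0.4,
    frozenset({"virtual host", "host"}): 0.8,
    frozenset({"host", "server"}): 1.0,
    frozenset({"server", "switch"}): 0.5,
}


def correlation_coefficient(type_path):
    """Multiply the preset strengths of consecutive type pairs along a path.

    Multiplication is an assumed combination rule, not one stated in the text.
    """
    pairs = zip(type_path, type_path[1:])
    return math.prod(STRENGTH[frozenset(pair)] for pair in pairs)


# Resource types along: switch 4 - server 7 - host 7 - virtual host 6 - UPF
path_types = ["switch", "server", "host", "virtual host", "service network element"]
coefficient = correlation_coefficient(path_types)  # approximately 0.16
```

  • Under this rule a longer path or a weak link lowers the coefficient, so alarms on distant or loosely coupled resources need more co-occurrences before they qualify for a rule.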
  • obtaining the resource connection path between the resources where the associated alarms are located by the computer includes: obtaining the shortest resource connection path between the resources where the associated alarms are located.
  • the computer acquires the resource connection path, if it detects that there are multiple resource connection paths between the resources where the associated alarm is located, the shortest resource connection path is selected as the resource connection path for calculating the correlation coefficient corresponding to the associated alarm.
  • the computer can also calculate the correlation coefficient corresponding to each resource connection path, and select a resource connection path according to the magnitude of the correlation coefficient, for example, use the connection path with the largest or smallest corresponding correlation coefficient as the selected resource connection path.
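  • Selecting the shortest resource connection path can be sketched with a breadth-first search over the instance relationships. The function name and the adjacency-dictionary layout are assumptions; switch 5 and server 8 are hypothetical instances added only to create a second, longer route.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return one shortest path (fewest hops) from start to goal, or None."""
    prev = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk predecessors back to the start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(edges.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


links = [
    ("switch 4", "server 7"), ("server 7", "host 7"),
    ("host 7", "virtual host 6"), ("virtual host 6", "user plane data function"),
    # Hypothetical detour: switch 4 - switch 5 - server 8 - host 7
    ("switch 4", "switch 5"), ("switch 5", "server 8"), ("server 8", "host 7"),
]
edges = {}
for a, b in links:
    edges.setdefault(a, set()).add(b)
    edges.setdefault(b, set()).add(a)

path = shortest_path(edges, "switch 4", "user plane data function")
```

  • If the alternative policy from the text is wanted (picking the path with the largest or smallest coefficient), the same graph could be searched with strengths as edge weights instead of hop counts.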
  • Step 103: according to the number of associations and the correlation coefficient, obtain the degree of association of the associated alarm.
  • After obtaining the associated alarm, its number of associations, and its correlation coefficient, the computer evaluates how strongly the alarms in the associated alarm are related, combining the number of associations with the correlation coefficient to obtain the degree of association. Determining the degree of association from both values analyzes it at two levels, the probability that the alarms occur and the probability that they are related, so the degree of association is obtained accurately.
  • the computer can calculate the correlation value of the correlation alarm according to the product of the correlation times of the correlation alarm and the correlation coefficient corresponding to the correlation alarm, and measure the correlation degree of the correlation alarm through the correlation value.
  • For example, if the obtained associated alarm is [(switch 4, Ethernet protocol down), (user plane data function, routing group unavailable)] and its number of associations is 20, then its degree of association is 3.2. This corresponds to a correlation coefficient of 0.16, which equals the product of the sub-path strengths 0.5, 1, 0.8, and 0.4 listed for this path.
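  • The arithmetic behind this example can be checked with a short sketch. The degree is computed as the product of the number of associations and the correlation coefficient, as the text describes; taking the coefficient as the product of the sub-path strengths 0.5, 1, 0.8 and 0.4 is an inference from the numbers, and the threshold values 3.0 and 5.0 are purely hypothetical.

```python
import math


def association_degree(count, coefficient):
    """Degree of association = number of associations x correlation coefficient."""
    return count * coefficient


def should_generate_rule(count, coefficient, first_threshold):
    """Step 104 check: generate a rule only above the first preset threshold."""
    return association_degree(count, coefficient) > first_threshold


coefficient = math.prod([0.5, 1, 0.8, 0.4])   # approximately 0.16
degree = association_degree(20, coefficient)  # approximately 3.2
```

  • With a hypothetical first preset threshold of 3.0 this pair would produce a rule; with 5.0 it would not, which shows how the threshold trades rule coverage against rule quality.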
  • Step 104: if the degree of association is greater than a first preset threshold, generate an alarm association rule according to the associated alarm.
  • After obtaining the degree of association of the associated alarm, the computer checks it against the first preset threshold. If the degree of association is greater than the first preset threshold, the computer determines that it is high enough and that the correlation between the alarms is significant, and then generates an alarm association rule from the associated alarm.
  • The computer generates an alarm association rule according to the associated alarm, including: obtaining the first alarm key feature, corresponding to the root alarm in the associated alarm, and the second alarm key feature, corresponding to the sub-alarm, where an alarm key feature includes a resource type and an alarm code; and generating the alarm association rule according to the first and second alarm key features. The alarm association rule is: when multiple alarms are detected whose key features are respectively the first alarm key feature and the second alarm key feature, the alarm with the first key feature is marked as the root alarm, the alarm with the second is marked as the sub-alarm, and the two are associated alarms.
  • When the alarm association rule generation process is started, the alarm key features are extracted from the associated alarm, giving resource type 1 and alarm code 1 for the root alarm and resource type 2 and alarm code 2 for the sub-alarm.
  • From the obtained key features, the computer generates the alarm association rule: when multiple alarms are detected whose key features are respectively (resource type 1, alarm code 1) and (resource type 2, alarm code 2), the former is marked as the root alarm, the latter as the sub-alarm, and the two as associated alarms.
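  • A generated rule can be pictured as a small record holding the two key features, plus a check that labels matching alarms. The class `AlarmAssociationRule` and its `apply` method are hypothetical names for this sketch, which matches on key features only and leaves out any further checks a real system might add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmAssociationRule:
    """A rule built from the key features of one associated alarm pair."""
    root_feature: tuple[str, str]  # (resource type, alarm code) of the root alarm
    sub_feature: tuple[str, str]   # (resource type, alarm code) of the sub-alarm

    def apply(self, alarms):
        """Label alarms as root/sub when both key features are present.

        `alarms` is an iterable of (resource type, alarm code) pairs.
        Returns an empty dict when the rule does not match.
        """
        features = set(alarms)
        if self.root_feature in features and self.sub_feature in features:
            return {self.root_feature: "root", self.sub_feature: "sub"}
        return {}


rule = AlarmAssociationRule(
    root_feature=("switch", "Ethernet protocol down"),
    sub_feature=("service network element", "routing group unavailable"),
)
labels = rule.apply([
    ("service network element", "routing group unavailable"),
    ("switch", "Ethernet protocol down"),
    ("server", "power fault"),  # hypothetical unrelated alarm
])
```

  • Alarms that do not match either key feature are left unlabeled, so an operator would see the switch alarm as the root cause and the network element alarm as its consequence.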
  • After generating the alarm association rule according to the associated alarm, the method further includes: updating the effectiveness of the alarm association rule according to real-time alarm messages or historical alarm slices; and, when the effectiveness of the alarm association rule is greater than a second preset threshold, putting the alarm association rule into effect.
  • After generating alarm association rules, the computer can first store the initial rules in a set of seed alarm association rules, or mark them as seed alarm association rules, and then update their effectiveness according to real-time alarm messages or the alarm information in historical alarm slices.
  • The computer updates the effectiveness of the alarm association rule according to the real-time alarm message or the historical alarm slice, including: determining, from the real-time alarm message or the alarm information in the historical alarm slice, the alarm whose alarm state has changed, where the alarm state includes alarm reported and alarm recovered; when the alarm belongs to the associated alarm of an alarm association rule, detecting whether the state change of its associated alarm is reported within a preset time interval; if the state change of the associated alarm is not reported, lowering the effectiveness of the rule; and if it is reported, adjusting the effectiveness of the rule according to the order in which the state changes of the associated alarms were reported.
  • When adjusting the effectiveness of alarm association rules, the computer obtains the real-time alarm message or the alarm information in the historical alarm slice and, by analyzing it, obtains the resource and alarm code of each alarm that has been reported and/or recovered. It checks whether the alarm belongs to the associated alarm of an alarm association rule and, if so, further checks whether the state change of the associated alarm is reported within the preset time interval.
  • For example, when a reported-alarm message is obtained, the computer checks whether the report of the associated alarm occurred within a certain period before or after that message; when an alarm-recovery message is obtained, it checks whether the recovery of the associated alarm occurred within a certain period before or after that message. If the state change of the associated alarm was not reported, the effectiveness of the alarm association rule is lowered; if it was reported, the effectiveness is adjusted according to the reporting order of the state changes.
  • In this way, the applicability of each alarm association rule to the current network architecture is measured accurately, so that once the rules take effect, the system can use them to analyze associated alarms in the current network architecture in a timely and efficient manner.
  • An effectiveness score can be preset for each alarm association rule; one point is added each time the effectiveness is raised and one point is subtracted each time it is lowered. The preset score can be a fixed value or a value set according to the degree of association of the associated alarm, and the adjustment step can also be configured.
  • The computer adjusts the effectiveness of the alarm association rule according to the reporting order of the state changes of the associated alarms, including: if the root alarm's state change is reported first and the sub-alarm's state change afterwards, raising the effectiveness of the rule; if the sub-alarm's state change is reported first and the root alarm's afterwards, lowering the effectiveness of the rule.
  • When the computer detects that the state change of the associated alarm is reported within the preset time window, it checks the reporting order of the state changes. If the root alarm's state change was reported before the sub-alarm's, the effectiveness of the rule is raised; if the sub-alarm's state change was reported first, the effectiveness is lowered.
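  • The effectiveness update can be sketched as a single scoring function. The function name, the use of timestamps in seconds, and the default step of one point are assumptions; leaving the score unchanged when both state changes carry the same timestamp is also an assumption, since the text does not cover that case.

```python
def adjust_effectiveness(score, root_time, sub_time, window_s, step=1):
    """Return the updated effectiveness score of one alarm association rule.

    root_time / sub_time: report times of the root and sub-alarm state
    changes in seconds, or None if that change was never reported.
    """
    if root_time is None or sub_time is None:
        return score - step  # the associated alarm's change is missing
    if abs(sub_time - root_time) > window_s:
        return score - step  # reported, but outside the preset interval
    if root_time < sub_time:
        return score + step  # root first, then sub: the rule is confirmed
    if sub_time < root_time:
        return score - step  # sub first, then root: the rule is contradicted
    return score             # same timestamp: left unchanged (assumption)
```

  • For instance, starting from a score of 10 with a 60-second window, a root change at 100 s followed by a sub-alarm change at 105 s raises the score to 11, while the reverse order lowers it to 9.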
  • After the alarm association rule takes effect, the method further includes: updating the effectiveness of the active alarm association rule according to real-time alarm messages; and, when the effectiveness of the rule is not greater than the second preset threshold, invalidating the rule.
  • Using a verification method similar to the one described above, the computer updates the effectiveness of active alarm association rules, that is, rules set as formal rules or stored in the formal alarm association rule set, in real time according to the real-time alarm messages in the current network architecture. After each update, or at a preset interval, it compares the effectiveness of each active rule with the second preset threshold. When the effectiveness is not greater than the second preset threshold, the rule is invalidated: it is marked as an invalid rule, reset as a seed alarm association rule, or removed from the formal alarm association rule set.
  • the alarm association rule can be removed from the effective rule set or set to an invalid state, which is not limited in this embodiment.
  • the effectiveness and effective status of the alarm correlation rules are updated in real time, avoiding a large number of rules that are not applicable to the current network architecture in the effective rules, and ensuring that the active alarm correlation rules can be better applicable to the current network Architecture to ensure the accuracy and efficiency of alarm correlation analysis.
  • the invalidated alarm correlation rule may also be added to the storage set of other alarm analysis rules, for example, impact analysis of tidal service correlation alarm, etc., which is not limited in this embodiment.
  • the computer after the computer updates the effectiveness of the alarm association rules according to the real-time alarm messages or historical alarm slices, it further includes: deleting the alarm when the effectiveness of the alarm association rules is less than the third preset threshold An association rule; wherein, the third preset threshold is smaller than the second preset threshold.
  • the computer After the computer updates the effectiveness of the alarm association rules, it will also detect whether the effectiveness of the alarm association rules meets the third preset threshold lower than the second preset threshold, and delete the third preset threshold whose effectiveness is lower than the third preset threshold.
  • Alarm correlation rules for thresholds By setting a third preset threshold, a lower limit is set for the effectiveness of the alarm correlation rules. When the effectiveness of the alarm correlation rules is too low, it is determined that the alarm correlation rules are not applicable to the current network architecture and deleted to avoid the need for Store and maintain too many useless alarm correlation rules to reduce processing pressure.
  • Step 101: according to historical alarm slices, perform associated-alarm analysis on each resource in the resource type relationship tree, and obtain the associated alarms and their association counts.
  • Step 102: according to the preset association strength between resources, obtain the association coefficient corresponding to the associated alarms.
  • Step 103: according to the association count and the association coefficient, obtain the association degree of the associated alarms.
  • Step 104: if the association degree is greater than a first preset threshold, generate an alarm association rule from the associated alarms.
  • The specific implementation of alarm association rule mining is similar to the alarm association rule generation method described above and is not repeated here.
  • Step 105: evaluate and update the effectiveness of the generated alarm association rules, and dynamically maintain them according to their effectiveness.
  • Specifically, after the computer mines a series of alarm association rules, it stores them in a pre-created alarm association rule library; for example, newly mined rules are stored in a seed rule library as seed alarm association rules. The effectiveness of the seed rules is then evaluated and updated according to historical alarm slices or real-time alarm messages. When the effectiveness of a seed rule exceeds the second preset threshold, it is set as a formal alarm association rule and added to the formal rule library that stores active rules; when the effectiveness of a seed rule is less than the third preset threshold, it is deleted or removed from the seed rule library.
  • After a seed rule takes effect, the effectiveness of active formal rules continues to be evaluated and updated. When the effectiveness of an active formal rule is not greater than the second preset threshold, it is set to an invalid state, removed from the formal rule library, or deleted.
  • Dynamically maintaining generated and active rules according to real-time alarm messages or historical alarm information keeps the active rules effective. Clearing rules that no longer suit the current network architecture in time reduces the storage occupied by rules and the number of rules traversed during associated-alarm analysis, improving analysis efficiency.
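The seed/formal/deleted lifecycle in Step 105 can be read as a small state machine. The sketch below is one possible reading under stated assumptions: the `State` enum and `next_state` function are hypothetical names, "invalid" is only one of the options the text allows for a demoted formal rule (it may equally be reset to a seed rule or removed), and letting an invalid rule be promoted again once its score recovers is an assumption the text does not confirm.

```python
from enum import Enum


class State(Enum):
    SEED = "seed"
    FORMAL = "formal"
    INVALID = "invalid"
    DELETED = "deleted"


def next_state(state: State, effectiveness: float,
               second_threshold: float, third_threshold: float) -> State:
    """Return the rule's state after an effectiveness update."""
    if third_threshold >= second_threshold:
        raise ValueError("third threshold must be smaller than the second")
    if state is State.DELETED:
        return state
    if effectiveness < third_threshold:
        return State.DELETED      # below the lower bound: drop the rule
    if effectiveness > second_threshold:
        return State.FORMAL       # promote a seed rule or keep a formal rule
    if state is State.FORMAL:
        return State.INVALID      # formal rule no longer above the threshold
    return state                  # seed or invalid rule stays where it is
```

Keeping the two thresholds apart gives a band in which a seed rule is neither promoted nor discarded, so it can keep collecting evidence.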
  • In addition, FIG. 6 is a schematic structural diagram of an alarm association rule mining and maintenance apparatus that implements the mining and maintenance described above, including:
  • A relationship tree construction module 601, configured to generate the resource type relationship tree from predefined resource association relationships during system initialization, and to set association coefficients on the relationship edges of the resource type relationship tree. By querying the resource library and listening for its change messages, the module simplifies the complex resource relationships in the library, extracts only instances of the resource types to be analyzed, and builds the resource instance relationship tree used during rule mining.
  • An alarm slicing module 602, configured to obtain, from the alarm information of each resource in the resource instance relationship tree within a preset time, all historical alarm information that conforms to the resource association relationships, use the obtained set as the analysis sample, and split the sample by alarm occurrence time into consecutive time slices containing alarm information.
  • A rule generation module 603, configured to scan and analyze the alarm slices using an association analysis algorithm, analyze the correlation of alarms in time and in resources together with their association degree, extract features from each set of associated alarms whose association degree meets the requirement, and generate alarm association rules in the initial state.
  • A rule maintenance module 604, configured to dynamically maintain the generated rules. By listening to real-time alarm messages or retrieving historical alarm information, it evaluates and updates the effectiveness of initial-state seed rules; when the effectiveness of a seed rule is greater than the second preset threshold, the rule is set as an active formal alarm association rule; when it is less than the third preset threshold, the rule is deleted. The module also updates the effectiveness of active formal rules from real-time alarm messages or historical alarm information and maintains them accordingly; when the effectiveness of an active formal rule is not greater than the second preset threshold, the rule is set to an invalid state, removed from the formal rule library, or deleted.
  • Another aspect of the embodiments of the present disclosure relates to an apparatus for generating alarm association rules which, referring to FIG. 7, includes:
  • An acquisition module 701, configured to perform associated-alarm analysis on each resource in the resource type relationship tree according to historical alarm slices, and to obtain the associated alarms and their association counts.
  • A calculation module 702, configured to obtain the association coefficient corresponding to the associated alarms according to the preset association strength between resources.
  • A determination module 703, configured to obtain the association degree of the associated alarms according to the association count and the association coefficient.
  • A generation module 704, configured to generate an alarm association rule from the associated alarms if the association degree is greater than a first preset threshold.
  • this embodiment is an apparatus embodiment corresponding to the method embodiment, and this embodiment can be implemented in cooperation with the method embodiment.
  • the relevant technical details mentioned in the method embodiments are still valid in this embodiment, and will not be repeated here in order to reduce repetition.
  • the related technical details mentioned in this embodiment can also be applied in the method embodiment.
  • It is worth mentioning that the modules involved in this embodiment are logical modules. In practice, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • units that are not closely related to solving the technical problems raised by the present disclosure are not introduced in this embodiment, but this does not mean that there are no other units in this embodiment.
  • Another aspect of the embodiments of the present disclosure further provides an electronic device which, referring to FIG. 8, includes: at least one processor 801; and a memory 802 communicatively connected to the at least one processor 801, where the memory 802 stores instructions executable by the at least one processor 801, and the instructions are executed by the at least one processor 801 so that the at least one processor 801 can perform the alarm association rule generation method described in any of the foregoing method embodiments.
  • the memory 802 and the processor 801 are connected by a bus, and the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors 801 and various circuits of the memory 802 together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the data processed by the processor 801 is transmitted over the wireless medium through the antenna; the antenna also receives data and passes it to the processor 801.
  • Processor 801 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interface, voltage regulation, power management and other control functions. And the memory 802 may be used to store data used by the processor 801 when performing operations.
  • Another aspect of the embodiments of the present disclosure also provides a computer-readable storage medium storing a computer program.
  • the above method embodiments are implemented when the computer program is executed by the processor.
  • That is, those skilled in the art will understand that all or part of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present disclosure.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

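The present disclosure relates to the field of wireless communication technology and discloses a method, apparatus, electronic device, and storage medium for generating alarm association rules. The method includes: performing associated-alarm analysis on each resource in a resource type relationship tree according to historical alarm slices, to obtain associated alarms and the association count of the associated alarms; obtaining an association coefficient corresponding to the associated alarms according to preset association strengths between the resources; obtaining an association degree of the associated alarms according to the association count and the association coefficient; and generating an alarm association rule from the associated alarms when the association degree is greater than a first preset threshold. This makes rule generation more targeted at the current network architecture while reducing its complexity, so that alarm association rules for the current network architecture can be generated simply and efficiently, which in turn improves the accuracy and efficiency of alarm association analysis using those rules.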

Description

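Method, Apparatus, Electronic Device, and Storage Medium for Generating Alarm Association Rules
Cross-Reference to Related Applications
The present disclosure is based on and claims priority to Chinese patent application CN202111413538.1, filed on November 25, 2021 and entitled "Method, Apparatus, Electronic Device, and Storage Medium for Generating Alarm Association Rules", the entire disclosure of which is incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the field of mobile communication technology, and in particular to a method, apparatus, electronic device, and storage medium for generating alarm association rules.
Background
Relying on the core idea of Cloud Native, the 5G network achieves customization, openness, and service orientation through a service-based network architecture combined with virtualization and cloud technologies. A virtualized network implements the functions of traditional telecommunication equipment in software, runs on general-purpose hardware, and uses virtualization technology to share hardware resources. From bottom to top, a virtualized network is divided into a hardware layer, a virtual layer, and a network element layer. Lower-layer resources are the foundation on which upper-layer resources run: a fault in a physical-layer resource often causes faults in virtual-layer resources, and a fault in a virtual-layer resource in turn causes network element faults, ultimately affecting normal service processing. Therefore, when one resource in a virtualized network raises an alarm, multiple virtual resources often fail as well, and as the resource scale grows, the number of alarms increases rapidly. To resolve resource alarms efficiently, analyzing the correlation between alarms of different resources in order to find and handle the root alarm is particularly important in 5G networks.
Two alarm association analysis methods are in common use. The first predefines alarm association rules from an accumulated expert knowledge base; when alarms are reported, a rule engine uses the existing rules to compute the correlation of alarms within a given time slice. However, expert experience is difficult to accumulate, and the effectiveness of the resulting rules is unstable. The second has the system mine alarm association rules using machine learning and big-data analysis. Rules produced by machine learning have low practicality, and the large number of mined rules consumes substantial system resources; rules produced by big-data analysis are more effective, but the generation process is complex. How to build alarm association rules suited to the current network structure simply and efficiently therefore remains an urgent problem.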
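Summary
The main purpose of the embodiments of the present disclosure is to provide a method, apparatus, electronic device, and storage medium for generating alarm association rules, aiming to generate, simply and efficiently for the current network architecture, alarm association rules that accurately identify the association between alarms, thereby improving the efficiency and accuracy of alarm handling.
To achieve the above purpose, an embodiment of the present disclosure provides a method for generating alarm association rules, including: performing associated-alarm analysis on each resource in a resource type relationship tree according to historical alarm slices, to obtain associated alarms and the association count of the associated alarms; obtaining an association coefficient corresponding to the associated alarms according to preset association strengths between the resources; obtaining an association degree of the associated alarms according to the association count and the association coefficient; and generating an alarm association rule from the associated alarms when the association degree is greater than a first preset threshold.
To achieve the above purpose, an embodiment of the present disclosure further provides an apparatus for generating alarm association rules, including: an acquisition module configured to perform associated-alarm analysis on each resource in a resource type relationship tree according to historical alarm slices, to obtain associated alarms and the association count of the associated alarms; a calculation module configured to obtain an association coefficient corresponding to the associated alarms according to preset association strengths between the resources; a determination module configured to obtain an association degree of the associated alarms according to the association count and the association coefficient; and a generation module configured to generate an alarm association rule from the associated alarms when the association degree is greater than a first preset threshold.
To achieve the above purpose, an embodiment of the present disclosure further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the alarm association rule generation method described above.
To achieve the above purpose, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the alarm association rule generation method described above.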
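Brief Description of the Drawings
One or more embodiments are illustrated by the figures in the corresponding drawings; these illustrations do not limit the embodiments.
FIG. 1 is a flowchart of the alarm association rule generation method in an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of the 5G virtualized network resource model in an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of the resource type relationship tree in an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of the resource instance relationship tree in an embodiment of the present disclosure;
FIG. 5 is a flowchart of the alarm association rule mining and maintenance method in an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an alarm association rule mining and maintenance apparatus in an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of the alarm association rule generation apparatus in another embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of the electronic device in another embodiment of the present disclosure.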
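Detailed Description
As the background shows, among the commonly used methods for generating alarm association rules: when rules are generated from accumulated expert experience, that experience is difficult to accumulate and the effectiveness of the resulting rules is unstable; rules generated by machine learning are of poor effectiveness and consume substantial system resources; and generating rules by big-data analysis is a complex process. How to build alarm association rules suited to the current network structure simply and efficiently, while ensuring the efficiency and accuracy of alarm association analysis, is therefore a problem in urgent need of a solution.
To solve the above problem, an embodiment of the present disclosure provides a method for generating alarm association rules, including: performing associated-alarm analysis on each resource in a resource type relationship tree according to historical alarm slices, to obtain associated alarms and the association count of the associated alarms; obtaining an association coefficient corresponding to the associated alarms according to preset association strengths between the resources; obtaining an association degree of the associated alarms according to the association count and the association coefficient; and generating an alarm association rule from the associated alarms when the association degree is greater than a first preset threshold.
In the method provided by the embodiments of the present disclosure, after a resource type relationship tree is built from the resources that require associated-alarm analysis, historical alarm slices are generated from the alarm information of each resource over a period of time. Associated-alarm analysis is then performed on each resource according to the historical alarm slices to obtain the associated alarms and the association counts between them; the association coefficient of the associated alarms is obtained from the preset association strength between resources; the association degree is obtained from the association count and the association coefficient; and when the association degree is greater than the first preset threshold, an alarm association rule is generated from the associated alarms. Combining the results of slice-based associated-alarm analysis with the preset association strength between resources measures the association degree accurately, and rules are generated only from associated alarms whose association degree is high enough. This makes rule generation more targeted while reducing its complexity, so that alarm association rules for the current network architecture can be generated simply and efficiently, which in turn improves the accuracy and efficiency of alarm association analysis using those rules.
To make the purposes, technical solutions, and advantages of the embodiments of the present disclosure clearer, the embodiments are described in detail below with reference to the drawings. Those of ordinary skill in the art will understand that many technical details are given in the embodiments to help the reader better understand the present disclosure; however, the technical solutions claimed in the present disclosure can be realized even without these technical details and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description and should not limit the specific implementation of the present disclosure; the embodiments may be combined with and refer to one another where they do not conflict.
The implementation details of the alarm association rule generation method described in the present disclosure are explained below with reference to specific embodiments. The following content is provided only to aid understanding and is not required for implementing the solution.
A first aspect of the embodiments of the present disclosure provides a method for generating alarm association rules; its flow is shown in FIG. 1. In some embodiments, the method is applied to an analysis and control terminal with communication and analysis capabilities, such as a computer, tablet, mobile phone, or other electronic device. This embodiment is described using a computer as an example, and the method includes at least, but is not limited to, the following steps: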
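Step 101: according to historical alarm slices, perform associated-alarm analysis on each resource in the resource type relationship tree, and obtain the associated alarms and their association counts.
Specifically, before generating alarm association rules, the computer first determines, from the virtualized network's networking architecture, the resource types that require alarm association analysis, sorts out the relationships among these resource types, and builds the resource type relationship tree. It then obtains the historical alarm information of each resource over a certain period and generates multiple historical alarm slices in chronological order, establishes relationships between resource instances according to the resource type relationship tree to produce a resource instance relationship tree, and, using the resource instance relationship tree and the historical alarm slices, performs associated-alarm analysis on each resource in the resource type relationship tree to obtain the associated alarms and their association counts.
For example, a 5G virtualized network resource model is shown in FIG. 2. The top network element layer contains service network elements and virtualized network functions. The middle virtual layer contains virtualized network resources: hosts, virtual switches, virtual databases, virtual networks, virtual hosts, virtual network cards, and virtual disks. The bottom hardware layer contains hardware facilities or ports: servers, system servers, interaction servers, interaction system servers, switch ports, router ports, switches, and routers. Resources are communicatively connected according to service interaction or association relationships. Based on the network resource model, the computer determines the range of resources to analyze and sorts out the layered resource relationships; the resulting resource type relationship tree is shown in FIG. 3 and comprises, from top to bottom, service network element, virtual host, host, server, and, alongside the server, disk array, switch, and router. The resource type relationship tree may be stored in a graph database, with resource types stored as type tree nodes and relationships between types stored as relationship edges between tree nodes.
The resource instance relationship tree that the computer generates from the resource type relationship tree is shown in FIG. 4. It may also be stored in a graph database, with each resource instance stored as an instance tree node and relationships between instances stored as relationship edges between nodes. To allow fast relationship lookup later, relationship edges between instances and types may be created and an index built on resource id. Overall, in the resource instance relationship tree, different routers connect through different switches to at least one server; each server connects to its corresponding hosts; and each host connects to virtual network functions through its attached virtual hosts. In addition, once built, the resource instance relationship tree must be kept synchronized with the resource library in real time so that resource instance relationships stay up to date.
Before generating alarm association rules, the computer uses resource probes to collect resource data from the Element Management System (EMS), the Virtualized Infrastructure Manager (VIM), and the Physical Infrastructure Manager (PIM), and stores it in the system resource library. Meanwhile, the system uses alarm probes to collect alarms from the EMS, VIM, and PIM; after formatting, it generates alarm information query snapshots and stores them in the current alarm library and the historical alarm library. After the resource instance relationship tree is generated, the computer reads from the historical alarm database the historical alarm information, over a certain period, of each resource contained in the resource type relationship tree, and splits it into multiple historical alarm slices according to the chronological order of alarm occurrence and a preset time granularity. Using slice scanning, it scans the historical alarm slices against the resource instance relationship tree and performs associated-alarm analysis on each resource: a group of alarms that has both resource correlation and time correlation is marked as associated alarms, and the number of occurrences of each alarm set with time and resource correlation is counted, yielding the associated alarms and their association counts.
The specific method by which the computer performs associated-alarm analysis using historical alarm slices is as follows. In the generated resource instance relationship tree, starting from the earliest of the obtained historical alarm slices, the slices are replayed one by one at a certain playback speed, and the alarm state of each resource in the tree is updated from the alarm information in each slice. When multiple resources undergo alarm changes at the same time, those resources are extracted and the resource instance relationship tree is queried to determine whether they are resource-correlated. For example, if resources A and B undergo alarm changes simultaneously, and A and B are joined by a relationship edge in the tree, or there is a reachable path from A to B in the tree and no other resource on that path undergoes an alarm change in this slice, the two alarms are judged to be resource-correlated. When resources with simultaneous alarm changes are resource-correlated, then according to the resource types and their levels in the resource type relationship tree, the alarm of the lower-layer resource is set as the root alarm and the alarm of the upper-layer resource as the sub-alarm, and the two are set as associated alarms. Key alarm features are then extracted from the associated alarms to generate an association mark. The mark mainly consists of the resource types and alarm codes of the two alarms and the association count, recorded for example as [(resource type 1, alarm code 1), (resource type 2, alarm code 2), association count n]. The association count starts at 1 and is incremented by 1 each time the features of the associated alarms are found again during scanning. Within one slice, several groups of resources may undergo simultaneous alarm changes at different times. For example, in some slice, the following pairs in the resource instance relationship tree change together: policy control function network element and virtual host 2, virtual host 3 and host 3, user plane data function network element and switch 3, server 7 and switch 3, and switch 3 and switch 4. Each pair either has a directly connecting relationship edge or has a reachable path in the tree on which no other resource undergoes an alarm change at the same time in this slice, so the alarms in each pair are considered correlated in both time and resources. In addition, the service discovery function network element and router 1 also undergo alarm changes at the same time within this slice, but there is no reachable path from that network element to router 1, so these two alarms are not associated in this slice.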
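The slice scan described above can be sketched as a counting loop. This is a simplified illustration under several assumptions that are not in the source: every alarm change inside one slice is treated as simultaneous, `paths` is a hypothetical precomputed map from a resource pair to the single path joining them in the instance tree, `level` gives each resource's layer (smaller means lower), and `rtype` maps a resource instance to its resource type. When two resources share a level, the text does not say which alarm is the root; the sketch keeps the order in which they appear.

```python
from collections import Counter
from itertools import combinations


def count_associations(slices, paths, level, rtype):
    """Count associated-alarm marks over historical alarm slices.

    slices: list of slices, each a list of (resource, alarm_code) changes.
    Returns a Counter keyed by ((root_type, root_code), (sub_type, sub_code)).
    """
    counts = Counter()
    for changed in slices:
        alarmed = {resource for resource, _ in changed}
        for (ra, ca), (rb, cb) in combinations(changed, 2):
            path = paths.get(frozenset((ra, rb)))
            if ra == rb or path is None:
                continue  # same resource, or no reachable path
            if any(node in alarmed for node in path[1:-1]):
                continue  # another resource on the path also changed
            if level[ra] > level[rb]:
                # make (ra, ca) the lower-layer alarm, i.e. the root alarm
                (ra, ca), (rb, cb) = (rb, cb), (ra, ca)
            counts[((rtype[ra], ca), (rtype[rb], cb))] += 1
    return counts
```

Each key of the returned counter corresponds to one association mark, and its value is the association count n.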
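After scanning the current historical alarm slice, the computer automatically scans the next slice and updates the associated alarms, until all historical alarm slices have been scanned, obtaining one or more associated alarms among the resources in the resource type relationship tree and the association count of each. Analyzing associated alarms from historical alarm slices along the two dimensions of time correlation and resource correlation yields the associated alarms and their association counts accurately, which facilitates subsequent rule generation.
Step 102: according to the preset association strength between resources, obtain the association coefficient corresponding to the associated alarms.
Specifically, after obtaining the associated alarms and their association counts from the historical alarm slices, the computer obtains the association coefficient to be used in association degree analysis according to the types of the resources on which the associated alarms occur and the preset association strength between resources.
In one example, the computer obtaining the association coefficient corresponding to the associated alarms according to the preset association strength between resources includes: obtaining the resource connection path between the resources on which the associated alarms occur; obtaining, according to the preset alarm association strength between resources, the preset alarm association strength between the resources of each sub-path in the resource connection path; and obtaining the association coefficient from the preset alarm association strengths of the sub-paths. After obtaining the associated alarms, the computer looks up, in the resource instance relationship tree generated from the resource type relationship tree, the resource connection path between the resources on which the associated alarms occur, divides it into sub-paths according to the resource nodes it passes through, obtains the preset alarm association strength of each sub-path, and combines these strengths to obtain the association coefficient. Deriving the strength of each stage of the path from the preset strengths between resources and combining them yields an accurate association coefficient, ensuring the accuracy of the subsequent association degree evaluation.
For example, after evaluating the alarm association strength of different resource types based on the layered service support relationships of the resources and accumulated expert experience, the computer presets the alarm association strengths between resources in the resource type relationship tree as follows: service network element-virtual host: 0.4; virtual host-host: 0.8; virtual host-virtual switch: 0.5; virtual host-disk array: 0.6; host-server: 1; server-switch: 0.5; disk array-switch: 0.5; switch-switch: 0.5; switch-router: 0.5. The strengths may be set manually from expert experience or computed by the computer from accumulated expert experience. An association strength may be stored as an attribute of a resource type relationship edge, together with the resource type relationship tree in the graph database, for later lookup.
When obtaining the association coefficient, the computer first identifies the specific resources on which the associated alarms occur. For example, in the resource instance relationship tree shown in FIG. 4, the associated alarms are [(switch 4, Ethernet protocol down), (user plane data function, routing group unavailable)], and the resources are switch 4 and the user plane data function. By querying resource connection paths in the tree, the computer finds the path between them: switch 4 → server 7 → host 7 → virtual host 6 → user plane data function. From the preset association strengths, the sub-paths have strengths 0.5, 1, 0.8, and 0.4, which are combined to obtain the association coefficient of the path. For example, taking the product of the sub-path strengths as the association coefficient, the coefficient for [(switch 4, Ethernet protocol down), (user plane data function, routing group unavailable)] is 0.5 × 1 × 0.8 × 0.4 = 0.16.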
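The product rule in this example can be reproduced with a few lines of code. The sketch below is illustrative only: the `STRENGTH` table copies the preset values listed in the description, while the type names, the `association_coefficient` function, and the mapping of the user plane data function to the `service_ne` type are assumptions made for the example.

```python
import math

# Preset alarm association strengths between resource types (values from the text).
STRENGTH = {
    frozenset(("service_ne", "virtual_host")): 0.4,
    frozenset(("virtual_host", "host")): 0.8,
    frozenset(("virtual_host", "virtual_switch")): 0.5,
    frozenset(("virtual_host", "disk_array")): 0.6,
    frozenset(("host", "server")): 1.0,
    frozenset(("server", "switch")): 0.5,
    frozenset(("disk_array", "switch")): 0.5,
    frozenset(("switch",)): 0.5,  # switch-switch
    frozenset(("switch", "router")): 0.5,
}


def association_coefficient(type_path):
    """Multiply the preset strengths of consecutive hops along the path."""
    hops = zip(type_path, type_path[1:])
    return math.prod(STRENGTH[frozenset(hop)] for hop in hops)


# switch 4 -> server 7 -> host 7 -> virtual host 6 -> user plane data function
path = ["switch", "server", "host", "virtual_host", "service_ne"]
coefficient = association_coefficient(path)  # approximately 0.16
```

Because the table is keyed by unordered type pairs, the same lookup works whichever direction the path is walked.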
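Further, the computer obtaining the resource connection path between the resources on which the associated alarms occur includes: obtaining the shortest resource connection path between those resources. When the computer detects multiple resource connection paths between the resources on which the associated alarms occur, it selects the shortest one as the path for computing the association coefficient. Alternatively, the computer may compute the association coefficient of each candidate path and select a path by coefficient value, for example the path with the largest or smallest coefficient. Selecting the shortest path among the candidates avoids underestimating the association degree of the associated alarms, which would cause alarm rules to be missed.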
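One common way to find a path with the fewest hops is breadth-first search. The source does not name an algorithm, so the sketch below is only an assumption about how the shortest resource connection path might be found; the `shortest_path` function and the adjacency-dictionary representation of the resource instance relationships are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a path with the fewest hops from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```

Since every preset strength in the example is at most 1, each extra hop can only keep or lower the product, which is consistent with the stated aim of not underestimating the association degree.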
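Step 103: according to the association count and the association coefficient, obtain the association degree of the associated alarms.
Specifically, after obtaining the associated alarms, their association counts, and their association coefficients, the computer evaluates the association degree of the alarms contained in each group of associated alarms, obtaining it from the association count and the association coefficient. Determining the association degree from both values analyzes it at two levels, the probability that the alarms occur and the probability that they are associated, and so yields an accurate association degree.
For example, the computer may compute an association value as the product of the association count and the association coefficient and use it to measure the association degree. In the resource instance relationship tree shown in FIG. 4, if the associated alarms are [(switch 4, Ethernet protocol down), (user plane data function, routing group unavailable)] and the association count is 20, the association value is 20 × 0.16 = 3.2, so the association degree of [(switch 4, Ethernet protocol down), (user plane data function, routing group unavailable)] is 3.2.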
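Written out as a single expression, the worked example reads as follows. The symbols are notation introduced here for illustration and do not appear in the source: $D$ is the association degree, $n$ the association count, $s_i$ the preset strength of the $i$-th sub-path, and $k$ the number of sub-paths.

```latex
D = n \cdot \prod_{i=1}^{k} s_i
  = 20 \times (0.5 \times 1 \times 0.8 \times 0.4)
  = 20 \times 0.16
  = 3.2
```

Under this product form, a frequently observed pair on a weakly coupled path and a rarely observed pair on a strongly coupled path can reach the same association degree.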
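Step 104: when the association degree is greater than the first preset threshold, generate an alarm association rule from the associated alarms.
Specifically, after obtaining the association degree, the computer checks whether it satisfies the first preset threshold. When the association degree is greater than the first preset threshold, the association is judged to be sufficiently strong and the correlation between the alarms significant, and an alarm association rule is generated from the associated alarms.
In one example, the computer generating an alarm association rule from the associated alarms includes: obtaining a first key alarm feature corresponding to the root alarm and a second key alarm feature corresponding to the sub-alarm, where a key alarm feature includes a resource type and an alarm code; and generating the alarm association rule from the first and second key alarm features. The alarm association rule states: when multiple alarms are detected whose key alarm features are respectively the first and the second key alarm feature, the alarm with the first key alarm feature is marked as the root alarm, the alarm with the second key alarm feature is marked as the sub-alarm, and the root alarm and sub-alarm are associated alarms. When the computer detects that the association degree is greater than the first preset threshold, it starts the rule generation process, extracts key alarm features from the associated alarms, and obtains resource type 1 and alarm code 1 of the root alarm and resource type 2 and alarm code 2 of the sub-alarm, then generates the rule described above from these features. Extracting features in this way accurately produces the alarm association rule corresponding to sufficiently strongly associated alarms, so that associated alarms can later be identified accurately with the rule, improving the efficiency and accuracy of identification.
In one example, after generating the alarm association rule from the associated alarms, the method further includes: updating the effectiveness of the rule according to real-time alarm messages or historical alarm slices; and putting the rule into effect when its effectiveness is greater than a second preset threshold. After generating a rule, the computer may first store the initial rule in the seed alarm association rule set or mark it as a seed alarm association rule, then evaluate its effectiveness in actual use from real-time alarm messages or the alarm information in historical alarm slices and update the effectiveness accordingly. At a preset interval or after each update, it checks whether the effectiveness is greater than the second preset threshold; if so, the rule marked as a seed rule or stored in the seed rule set is moved into the formal alarm association rule set or marked as a formal alarm association rule. That is, the rule takes effect and is used for associated-alarm analysis of alarms occurring in the network architecture.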
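Rule generation and its later use can be sketched as a pair of small functions. This is a minimal sketch under stated assumptions: `KeyFeature`, `AssociationRule`, `generate_rule`, and `apply_rule` are hypothetical names, alarms are represented as a dictionary from alarm id to key feature, and the sketch matches on key features only, without re-checking that the alarmed resources are connected.

```python
from typing import Dict, NamedTuple, Optional


class KeyFeature(NamedTuple):
    resource_type: str
    alarm_code: str


class AssociationRule(NamedTuple):
    root: KeyFeature  # first key alarm feature (root alarm)
    sub: KeyFeature   # second key alarm feature (sub-alarm)


def generate_rule(root: KeyFeature, sub: KeyFeature, degree: float,
                  first_threshold: float) -> Optional[AssociationRule]:
    """Create a rule only when the association degree exceeds the threshold."""
    if degree > first_threshold:
        return AssociationRule(root, sub)
    return None


def apply_rule(rule: AssociationRule,
               alarms: Dict[str, KeyFeature]) -> Dict[str, str]:
    """Label alarms as 'root' or 'sub' when both key features are present."""
    roots = [alarm_id for alarm_id, feature in alarms.items() if feature == rule.root]
    subs = [alarm_id for alarm_id, feature in alarms.items() if feature == rule.sub]
    if not roots or not subs:
        return {}  # the rule does not fire on a lone alarm
    labels = {alarm_id: "root" for alarm_id in roots}
    labels.update({alarm_id: "sub" for alarm_id in subs})
    return labels
```

With the worked example's association degree of 3.2, a rule would be produced for any first threshold below 3.2; the source does not give a threshold value.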
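In another example, the computer updating the effectiveness of the alarm association rule according to real-time alarm messages or historical alarm slices includes: determining, from the real-time alarm messages or the alarm information in the historical alarm slices, an alarm whose alarm state has changed, where alarm states include alarm reported and alarm recovered; when the alarm belongs to the associated alarms of the alarm association rule, checking whether an alarm state change of the associated alarm was reported within a preset time interval; when no such change was reported, decreasing the effectiveness of the rule; and when such a change was reported, adjusting the effectiveness according to the reporting order of the state changes. When adjusting the effectiveness, the computer obtains real-time alarm messages or the alarm information in historical alarm slices and analyzes it to obtain the resource and alarm code of each alarm that was reported and/or recovered. It checks whether the alarm belongs to the associated alarms of the rule and, if so, further checks whether an alarm state change of the associated alarm was reported within the preset time interval. That is, for an alarm-report message, it checks whether the associated alarm was reported within a certain period before or after the message; for an alarm-recovery message, it checks whether the associated alarm recovered within a certain period before or after the message. If no state change of the associated alarm was reported within the preset time interval, the effectiveness of the rule is decreased; if one was reported, the effectiveness is adjusted according to the reporting order. Verifying effectiveness further with historical alarm slices or real-time alarm messages measures accurately how well a rule fits the current network architecture, so that once the rule takes effect the system can identify associated alarms in the current network architecture promptly and efficiently.
When measuring effectiveness, the computer may preset an effectiveness score for each rule, adding one point for each increase and subtracting one point for each decrease. The preset score may be a fixed value or a score set according to the association degree of the associated alarms, and the adjustment step may also be configured.
Further, the computer adjusting the effectiveness of the alarm association rule according to the reporting order of the alarm state changes of the associated alarms includes: when the root alarm's state change is reported first and the sub-alarm's state change is reported afterwards, increasing the effectiveness of the rule; when the sub-alarm's state change is reported first and the root alarm's state change is reported afterwards, decreasing the effectiveness of the rule. When the computer detects that alarm state changes of the associated alarms were reported within the preset time window, it checks the order in which they were reported. If the root alarm's state change was reported before the sub-alarm's, the rule is judged valid and its effectiveness is increased; if the sub-alarm's state change was reported before the root alarm's, the rule is judged invalid and its effectiveness is decreased. Checking the order of root and sub-alarm state changes in this way allows the effectiveness of the rule to be verified and adjusted accurately.
In another example, after the alarm association rule takes effect, the method further includes: updating the effectiveness of the active rule according to real-time alarm messages; and invalidating the rule when its effectiveness is not greater than the second preset threshold. After a rule takes effect, the computer uses a verification method similar to the one described above, updating the effectiveness of active rules (that is, rules marked as formal alarm association rules or stored in the formal alarm association rule set) in real time from the real-time alarm messages in the current network architecture. After each update, or at a preset interval, it compares the effectiveness of each active rule with the second preset threshold. When the effectiveness is not greater than the second preset threshold, the rule is invalidated: the formal rule is marked invalid, reset to a seed alarm association rule, or removed from the formal alarm association rule set. When invalidating a rule, the computer may remove it from the active rule set or set it to an invalid state; this embodiment does not limit this. Updating the effectiveness and active status of rules in real time after they are applied prevents the active rule set from accumulating rules unsuited to the current network architecture, so that active rules remain applicable and alarm association analysis stays accurate and efficient.
In addition, after an alarm association rule is invalidated, it may be added to the storage set of other alarm analysis rules, for example impact analysis of tidal-service associated alarms; this embodiment does not limit this.
In another example, after the computer updates the effectiveness of the alarm association rule according to real-time alarm messages or historical alarm slices, the method further includes: deleting the rule when its effectiveness is less than a third preset threshold, where the third preset threshold is smaller than the second preset threshold. After updating the effectiveness, the computer also checks whether it falls below the third preset threshold and deletes any rule whose effectiveness is lower than that threshold. The third preset threshold sets a lower bound on effectiveness: a rule whose effectiveness is too low is judged completely unsuited to the current network architecture and is deleted, which avoids storing and maintaining too many useless rules and reduces processing load.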
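In summary, the flow by which the computer mines and maintains alarm association rules from alarm information is shown in FIG. 5 and includes at least, but is not limited to, the following steps:
Step 101: according to historical alarm slices, perform associated-alarm analysis on each resource in the resource type relationship tree, and obtain the associated alarms and their association counts.
Step 102: according to the preset association strength between resources, obtain the association coefficient corresponding to the associated alarms.
Step 103: according to the association count and the association coefficient, obtain the association degree of the associated alarms.
Step 104: when the association degree is greater than the first preset threshold, generate an alarm association rule from the associated alarms.
The specific implementation of alarm association rule mining is similar to the alarm association rule generation method described above and is not repeated here.
Step 105: evaluate and update the effectiveness of the generated alarm association rules, and dynamically maintain them according to their effectiveness.
Specifically, after the computer mines a series of alarm association rules, it stores them in a pre-created alarm association rule library; for example, newly mined rules are stored in a seed rule library as seed alarm association rules. The effectiveness of the seed rules is then evaluated and updated according to historical alarm slices or real-time alarm messages. When the effectiveness of a seed rule exceeds the second preset threshold, it is set as a formal alarm association rule and added to the formal rule library that stores active rules; when the effectiveness of a seed rule is less than the third preset threshold, it is deleted or removed from the seed rule library.
After a seed rule takes effect, the effectiveness of active formal rules continues to be evaluated and updated. When the effectiveness of an active formal rule is not greater than the second preset threshold, it is set to an invalid state, removed from the formal rule library, or deleted. Dynamically maintaining generated and active rules according to real-time alarm messages or historical alarm information keeps the active rules effective. Clearing rules that no longer suit the current network architecture in time reduces the storage occupied by rules and the number of rules traversed during associated-alarm analysis, improving analysis efficiency.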
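In addition, FIG. 6 is a schematic structural diagram of an alarm association rule mining and maintenance apparatus that implements the mining and maintenance described above, including: a relationship tree construction module 601, configured to generate the resource type relationship tree from predefined resource association relationships during system initialization, and to set association coefficients on the relationship edges of the resource type relationship tree. By querying the resource library and listening for its change messages, the module simplifies the complex resource relationships in the library, extracts only instances of the resource types to be analyzed, and builds the resource instance relationship tree used during rule mining.
An alarm slicing module 602, configured to obtain, from the alarm information of each resource in the resource instance relationship tree within a preset time, all historical alarm information that conforms to the resource association relationships, use the obtained set as the analysis sample, and split the sample by alarm occurrence time into consecutive time slices containing alarm information.
A rule generation module 603, configured to scan and analyze the alarm slices using an association analysis algorithm, analyze the correlation of alarms in time and in resources together with their association degree, extract features from each set of associated alarms whose association degree meets the requirement, and generate alarm association rules in the initial state.
A rule maintenance module 604, configured to dynamically maintain the generated rules. By listening to real-time alarm messages or retrieving historical alarm information, it evaluates and updates the effectiveness of initial-state seed rules; when the effectiveness of a seed rule is greater than the second preset threshold, the rule is set as an active formal alarm association rule; when it is less than the third preset threshold, the rule is deleted. The module also updates the effectiveness of active formal rules from real-time alarm messages or historical alarm information and maintains them accordingly; when the effectiveness of an active formal rule is not greater than the second preset threshold, the rule is set to an invalid state, removed from the formal rule library, or deleted.
Another aspect of the embodiments of the present disclosure relates to an apparatus for generating alarm association rules which, referring to FIG. 7, includes:
An acquisition module 701, configured to perform associated-alarm analysis on each resource in the resource type relationship tree according to historical alarm slices, and to obtain the associated alarms and their association counts.
A calculation module 702, configured to obtain the association coefficient corresponding to the associated alarms according to the preset association strength between resources.
A determination module 703, configured to obtain the association degree of the associated alarms according to the association count and the association coefficient.
A generation module 704, configured to generate an alarm association rule from the associated alarms when the association degree is greater than the first preset threshold.
It is not difficult to see that this embodiment is an apparatus embodiment corresponding to the method embodiment, and the two can be implemented in cooperation. The technical details mentioned in the method embodiment remain valid in this embodiment and are not repeated here. Correspondingly, the technical details mentioned in this embodiment can also be applied in the method embodiment.
It is worth mentioning that the modules involved in this embodiment are logical modules. In practice, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present disclosure, units not closely related to solving the technical problem raised by the present disclosure are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
Another aspect of the embodiments of the present disclosure further provides an electronic device which, referring to FIG. 8, includes: at least one processor 801; and a memory 802 communicatively connected to the at least one processor 801, where the memory 802 stores instructions executable by the at least one processor 801, and the instructions are executed by the at least one processor 801 so that the at least one processor 801 can perform the alarm association rule generation method described in any of the foregoing method embodiments.
The memory 802 and the processor 801 are connected by a bus. The bus may include any number of interconnected buses and bridges and connects the various circuits of the one or more processors 801 and the memory 802. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. Data processed by the processor 801 is transmitted over the wireless medium through an antenna; the antenna also receives data and passes it to the processor 801.
The processor 801 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 802 may be used to store data used by the processor 801 when performing operations.
Another aspect of the embodiments of the present disclosure further provides a computer-readable storage medium storing a computer program. The above method embodiments are implemented when the computer program is executed by a processor.
That is, those skilled in the art will understand that all or part of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that the above embodiments are specific embodiments for implementing the present disclosure, and that in practice various changes in form and detail may be made to them without departing from the spirit and scope of the present disclosure.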

Claims (12)

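  1. A method for generating alarm association rules, comprising:
    performing associated-alarm analysis on each resource in a resource type relationship tree according to historical alarm slices, to obtain associated alarms and an association count of the associated alarms;
    obtaining an association coefficient corresponding to the associated alarms according to preset association strengths between the resources;
    obtaining an association degree of the associated alarms according to the association count and the association coefficient;
    generating an alarm association rule from the associated alarms when the association degree is greater than a first preset threshold.
  2. The method for generating alarm association rules according to claim 1, wherein obtaining the association coefficient corresponding to the associated alarms according to the preset association strengths between the resources comprises:
    obtaining a resource connection path between the resources on which the associated alarms occur;
    obtaining, according to the preset alarm association strength between resources, the preset alarm association strength between the resources corresponding to each sub-path in the resource connection path;
    obtaining the association coefficient corresponding to the associated alarms according to the preset alarm association strength between the resources corresponding to each sub-path.
  3. The method for generating alarm association rules according to claim 2, wherein obtaining the resource connection path between the resources on which the associated alarms occur comprises:
    obtaining the shortest resource connection path between the resources on which the associated alarms occur.
  4. The method for generating alarm association rules according to claim 1, further comprising, after generating the alarm association rule from the associated alarms:
    updating an effectiveness of the alarm association rule according to real-time alarm messages or the historical alarm slices;
    putting the alarm association rule into effect when the effectiveness of the alarm association rule is greater than a second preset threshold.
  5. The method for generating alarm association rules according to claim 4, wherein updating the effectiveness of the alarm association rule according to real-time alarm messages or the historical alarm slices comprises:
    determining, according to the real-time alarm messages or alarm information in the historical alarm slices, an alarm whose alarm state has changed, wherein the alarm state includes alarm reported and alarm recovered;
    when the alarm belongs to the associated alarms corresponding to the alarm association rule, checking whether an alarm state change of the associated alarm was reported within a preset time interval;
    when no alarm state change of the associated alarm was reported, decreasing the effectiveness of the alarm association rule;
    when an alarm state change of the associated alarm was reported, adjusting the effectiveness of the alarm association rule according to the reporting order of the alarm state changes of the associated alarms.
  6. The method for generating alarm association rules according to claim 5, wherein adjusting the effectiveness of the alarm association rule according to the reporting order of the alarm state changes of the associated alarms comprises:
    when the reporting order of the alarm state changes in the associated alarms is that the root alarm state change is reported first and the sub-alarm state change is reported afterwards, increasing the effectiveness of the alarm association rule;
    when the reporting order of the alarm state changes in the associated alarms is that the sub-alarm state change is reported first and the root alarm state change is reported afterwards, decreasing the effectiveness of the alarm association rule.
  7. The method for generating alarm association rules according to claim 4, further comprising, after putting the alarm association rule into effect:
    updating the effectiveness of the active alarm association rule according to the real-time alarm messages;
    invalidating the alarm association rule when the effectiveness of the alarm association rule is not greater than the second preset threshold.
  8. The method for generating alarm association rules according to any one of claims 4 to 7, further comprising, after updating the effectiveness of the alarm association rule according to real-time alarm messages or the historical alarm slices:
    deleting the alarm association rule when the effectiveness of the alarm association rule is less than the third preset threshold, wherein the third preset threshold is smaller than the second preset threshold.
  9. The method for generating alarm association rules according to claim 1, wherein generating the alarm association rule from the associated alarms comprises:
    obtaining a first key alarm feature corresponding to the root alarm and a second key alarm feature corresponding to the sub-alarm in the associated alarms, wherein a key alarm feature includes a resource type and an alarm code;
    generating the alarm association rule according to the first key alarm feature and the second key alarm feature;
    the alarm association rule comprising: when multiple alarms are detected whose key alarm features are respectively the first key alarm feature and the second key alarm feature, marking the alarm whose key alarm feature is the first key alarm feature as the root alarm and the alarm whose key alarm feature is the second key alarm feature as the sub-alarm, the root alarm and the sub-alarm being associated alarms.
  10. An apparatus for generating alarm association rules, comprising:
    an acquisition module, configured to perform associated-alarm analysis on each resource in a resource type relationship tree according to historical alarm slices, to obtain associated alarms and an association count of the associated alarms;
    a calculation module, configured to obtain an association coefficient corresponding to the associated alarms according to preset association strengths between the resources;
    a determination module, configured to obtain an association degree of the associated alarms according to the association count and the association coefficient;
    a generation module, configured to generate an alarm association rule from the associated alarms when the association degree is greater than a first preset threshold.
  11. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method for generating alarm association rules according to any one of claims 1 to 9.
  12. A computer-readable storage medium storing a computer program that, when executed by a processor, implements the method for generating alarm association rules according to any one of claims 1 to 9.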
PCT/CN2022/130711 2021-11-25 2022-11-08 告警关联规则生成方法、装置、电子设备和存储介质 WO2023093527A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111413538.1 2021-11-25
CN202111413538.1A CN116170281A (zh) 2021-11-25 2021-11-25 告警关联规则生成方法、装置、电子设备和存储介质

Publications (1)

Publication Number Publication Date
WO2023093527A1 true WO2023093527A1 (zh) 2023-06-01

Family

ID=86416919

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130711 WO2023093527A1 (zh) 2021-11-25 2022-11-08 告警关联规则生成方法、装置、电子设备和存储介质

Country Status (2)

Country Link
CN (1) CN116170281A (zh)
WO (1) WO2023093527A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118430202B (zh) * 2024-07-03 2024-09-17 易联云计算(杭州)有限责任公司 基于历史快照归集的告警阈值迭代系统及方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101414933A (zh) * 2007-10-15 2009-04-22 中兴通讯股份有限公司 一种告警相关性信息的处理方法及装置
US20100332918A1 (en) * 2009-06-30 2010-12-30 Alcatel-Lucent Canada Inc. Alarm correlation system
CN106250288A (zh) * 2016-07-29 2016-12-21 浪潮软件集团有限公司 一种基于数据挖掘的根告警分析识别方法
CN112787860A (zh) * 2020-12-30 2021-05-11 广东电网有限责任公司电力调度控制中心 一种根告警分析识别方法及装置
CN113486192A (zh) * 2021-07-06 2021-10-08 中国建设银行股份有限公司 一种告警聚合方法及相关设备

Also Published As

Publication number Publication date
CN116170281A (zh) 2023-05-26

Similar Documents

Publication Publication Date Title
US11706079B2 (en) Fault recovery method and apparatus, and storage medium
EP3968243A1 (en) Method and apparatus for realizing model training, and computer storage medium
US8583779B2 (en) Root cause analysis approach with candidate elimination using network virtualization
US20150095338A1 (en) Systems and methods for categorizing exceptions and logs
CN106815125A (zh) 一种日志审计方法及平台
US10884805B2 (en) Dynamically configurable operation information collection
CN112769605A (zh) 一种异构多云的运维管理方法及混合云平台
US11706114B2 (en) Network flow measurement method, network measurement device, and control plane device
CN110489317B (zh) 基于工作流的云系统任务运行故障诊断方法与系统
WO2023040259A1 (zh) 资源告警分析方法、装置、电子设备和存储介质
WO2023093527A1 (zh) 告警关联规则生成方法、装置、电子设备和存储介质
CN114465874A (zh) 故障预测方法、装置、电子设备与存储介质
CN115333966B (zh) 一种基于拓扑的Nginx日志分析方法、系统及设备
WO2019109961A1 (zh) 故障诊断方法及装置
JPWO2015182629A1 (ja) 監視システム、監視装置及び監視プログラム
US20230106935A1 (en) Network probe placement optimization
CN115686381B (zh) 存储集群运行状态的预测方法及装置
CN117914511A (zh) 一种基于数据交换、日志分析的安全审计系统
CN112039907A (zh) 一种基于物联网终端评测平台的自动测试方法及系统
CN116662127A (zh) 一种设备告警信息分类并预警的方法、系统、设备和介质
CN116643937A (zh) 数据日志的图像分析
WO2022149149A1 (en) Artificial intelligence with dynamic causal model for failure analysis in mobile communication network
CN118509336B (zh) 一种考虑电力消耗的通信网络优化方法、装置及设备
CN116471066B (zh) 一种基于流量探针的流量分析方法
CN112291804B (zh) 一种5g网络切片下噪声网络的服务故障诊断方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897610

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE