CN114615130B - Method and device for managing life cycle of infrastructure alarm rule - Google Patents

Publication number
CN114615130B
Authority
CN
China
Prior art keywords
infrastructure
alarm rule
alarm
state change
rule template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210511659.8A
Other languages
Chinese (zh)
Other versions
CN114615130A (en)
Inventor
刘汉文
艾辉
李大海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhizhe Sihai Beijing Technology Co ltd
Original Assignee
Zhizhe Sihai Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhizhe Sihai Beijing Technology Co ltd
Priority to CN202210511659.8A
Publication of CN114615130A
Application granted
Publication of CN114615130B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0681 Configuration of triggering conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0813 Configuration setting characterised by the conditions triggering a change of settings
    • H04L41/082 Configuration setting characterised by the conditions triggering a change of settings the condition being updates or upgrades of network functionality

Abstract

The invention provides a life cycle management method and device for infrastructure alarm rules. The method comprises: monitoring, based on the watch mechanism of the etcd, the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type; and, if a state change of an infrastructure and/or of the alarm rule template corresponding to an infrastructure type is detected, determining the infrastructure and/or alarm rule template whose state changed and updating the alarm rules accordingly, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure and/or alarm rule template. Because the alarm rules are updated adaptively, the life cycle of each alarm rule follows that of its infrastructure or alarm rule template, and alarm rule life cycle management is efficient and transparent.

Description

Method and device for managing life cycle of infrastructure alarm rule
Technical Field
The invention relates to the technical field of data monitoring, in particular to a life cycle management method and device for an infrastructure alarm rule.
Background
Current community platforms provide a large number of services to the outside, such as question-and-answer services and video services. Existing services are usually observed through monitored indexes, such as the order completion failure rate. Because the data system of such a platform is huge, the number of indexes constructed for monitoring the various services can reach hundreds of millions. With so many indexes to monitor and with infrastructure changing frequently, monitoring the index data of all infrastructures in the platform and maintaining the life cycle of the alarm services becomes a complicated and important problem. The life cycle of an alarm service needs to be consistent with that of the corresponding infrastructure: when an infrastructure goes online, its alarm service should be added immediately, and when the infrastructure goes offline, its alarm service should be stopped.
However, if the infrastructure side actively reports its alarm rules so that the corresponding alarm service is added in time when the infrastructure comes online, the alarm services of offline infrastructures may not be reclaimed, resources may not be properly covered, and alarm messages may be inaccurate. In addition, the industry mostly adopts a strategy of periodically polling the database to discover infrastructures going online and offline, and manages the life cycle of the corresponding alarm services on that basis. However, this strategy suits only scenarios with a small amount of data: once the amount of data is large, it occupies a large amount of memory and can cause a serious out-of-memory error if the container memory reaches its limit. Even if a distributed method or a master-worker mode is adopted to overcome the excessive memory occupation, more unavailability risks are introduced, which contradicts the design principle that an alarm system should avoid additional components as much as possible and reduce risk points. Furthermore, introducing other components, such as Kafka messaging middleware, makes risk control and troubleshooting relatively difficult, so such approaches are hard to apply in industrial production environments.
Therefore, a method is needed that manages the life cycle of the corresponding alarm services efficiently and transparently when infrastructures go online and offline.
Disclosure of Invention
The invention provides a life cycle management method and device for infrastructure alarm rules, to address the inaccuracy and inefficiency of alarm service life cycle management in the prior art.
The invention provides a life cycle management method of an infrastructure alarm rule, which comprises the following steps:
monitoring the state change of each infrastructure and the state change of an alarm rule template corresponding to each infrastructure type based on the watch mechanism of the etcd;
if the state change of the infrastructure and/or the state change of the alarm rule template corresponding to the type of the infrastructure are monitored, determining the infrastructure and/or the alarm rule template with the state change, and updating the alarm rules based on the infrastructure and/or the alarm rule template with the state change so that the life cycle of each alarm rule is adapted to the life cycle of the corresponding infrastructure and/or alarm rule template;
if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rules is an addition; if the state change of any infrastructure or any alarm rule template is a deletion, the corresponding update operation on the alarm rules is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation on the alarm rules is a modification.
According to the life cycle management method of the infrastructure alarm rule provided by the invention, after the alarm rules are updated based on the infrastructure and/or alarm rule template whose state changed, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure and/or alarm rule template, the method further comprises the following steps:
monitoring the update events of each alarm rule in the etcd in real time based on a hook function provided by Kubernetes;
if an update event of the alarm rule in the etcd is monitored, updating the alarm rule stored in the memory cache based on the update event of the alarm rule; in an initial state, pulling all alarm rules stored in the etcd and storing the alarm rules in the memory cache;
and acquiring the alarm rule stored in the memory cache to check the index data of the corresponding infrastructure based on the alarm rule stored in the memory cache.
According to the life cycle management method of the infrastructure alarm rule provided by the invention, the updating of the alarm rule stored in the memory cache based on the updating event of the alarm rule specifically comprises the following steps:
determining the alarm rule that is updated based on the update event of the alarm rule;
updating the consumption queue established in the memory cache by utilizing a callback function based on the updated alarm rule; and the consumption queue comprises alarm rules corresponding to current infrastructure.
According to the method for managing the life cycle of the infrastructure alarm rule provided by the invention, the alarm rule stored in the memory cache is acquired, and the index data of the corresponding infrastructure is checked based on the alarm rule stored in the memory cache, and the method specifically comprises the following steps:
and sequentially reading the alarm rules in the consumption queue, and checking the index data of the corresponding infrastructure based on the read alarm rules.
According to the method for managing the life cycle of the infrastructure alarm rule provided by the invention, if the state change of the infrastructure and/or the state change of the alarm rule template corresponding to the type of the infrastructure are monitored, the infrastructure and/or the alarm rule template with the state change is determined, and the method specifically comprises the following steps:
when the state change of the infrastructure and/or the state change of the alarm rule template are monitored, acquiring label information of each infrastructure and/or each alarm rule template before and after the state change; each infrastructure and each alarm rule template are preset with corresponding label information;
determining an infrastructure in which the state change occurs based on a difference between tag information of each infrastructure before and after the state change;
and/or determining the alarm rule template with the changed state based on the difference between the label information of each alarm rule template before and after the state change.
According to the method for managing the life cycle of the infrastructure alarm rule provided by the invention, if the monitored state change of the infrastructure is the addition of an infrastructure, the updating of the alarm rules based on the infrastructure and/or alarm rule template whose state changed specifically comprises the following steps:
determining an alarm rule template corresponding to the newly added infrastructure based on the infrastructure type of the newly added infrastructure and the alarm rule templates corresponding to the infrastructure types;
and generating an alarm rule corresponding to the newly added infrastructure based on the alarm rule template corresponding to the newly added infrastructure, and adding the alarm rule corresponding to the newly added infrastructure into the existing alarm rule set.
According to the method for managing the life cycle of the infrastructure alarm rule provided by the invention, if the monitored state change of the infrastructure is the deletion of an infrastructure, the updating of the alarm rules based on the infrastructure and/or alarm rule template whose state changed specifically comprises the following steps:
determining an alarm rule corresponding to the deleted infrastructure in an existing alarm rule set based on the type of the deleted infrastructure;
and deleting the alarm rule corresponding to the deleted infrastructure from the existing alarm rule set.
The invention also provides a life cycle management device of the infrastructure alarm rule, which comprises the following components:
the state change monitoring unit is used for monitoring the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type based on the watch mechanism of the etcd;
the alarm rule updating unit is used for determining the infrastructure and/or the alarm rule template with the changed state if the state change of the infrastructure and/or the state change of the alarm rule template corresponding to the type of the infrastructure is monitored, and updating the alarm rules based on the infrastructure and/or the alarm rule template with the changed state so that the life cycle of each alarm rule is adapted to the life cycle of the corresponding infrastructure and/or alarm rule template;
if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rules is an addition; if the state change of any infrastructure or any alarm rule template is a deletion, the corresponding update operation on the alarm rules is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation on the alarm rules is a modification.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for lifecycle management of infrastructure alarm rules as described in any of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for lifecycle management of infrastructure alarm rules as described in any one of the above.
The present invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method for lifecycle management of infrastructure alarm rules as described in any one of the above.
The life cycle management method and device for infrastructure alarm rules provided by the invention monitor, based on the watch mechanism of the etcd, the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type. When a state change of an infrastructure and/or of the alarm rule template corresponding to an infrastructure type is detected, the infrastructure and/or alarm rule template whose state changed is determined, and the alarm rules are updated accordingly. Changes to an infrastructure or an alarm rule template are thus sensed automatically and the alarm rules are updated adaptively, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure or alarm rule template, achieving efficient and transparent alarm rule life cycle management.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow diagram of a method for lifecycle management of infrastructure alarm rules provided by the present invention;
FIG. 2 is a schematic diagram illustrating the operation of the watch mechanism provided by the present invention;
FIG. 3 is a schematic diagram of an alarm rule adaptive update process provided by the present invention;
FIG. 4 is a schematic diagram of a life cycle management device for an infrastructure alarm rule provided by the present invention;
FIG. 5 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
FIG. 1 is a schematic flowchart of a life cycle management method for an infrastructure alarm rule according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
step 110, monitoring the state change of each infrastructure and the state change of an alarm rule template corresponding to each infrastructure type based on the watch mechanism of the etcd;
step 120, if the state change of the infrastructure and/or the state change of the alarm rule template corresponding to the type of the infrastructure are monitored, determining the infrastructure and/or the alarm rule template with the state change, and updating the alarm rules based on the infrastructure and/or the alarm rule template with the state change, so that the life cycle of each alarm rule is adapted to the life cycle of the infrastructure and/or the alarm rule template corresponding to the alarm rule;
if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rules is an addition; if the state change of any infrastructure or any alarm rule template is a deletion, the corresponding update operation on the alarm rules is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation on the alarm rules is a modification.
Specifically, etcd may be used as the persistent database of the platform. etcd is a highly available key-value storage system, mainly used for shared key-value storage and service discovery. It handles log replication through the Raft consensus algorithm to ensure strong consistency, and also provides functions such as TTL-based data expiration, data change monitoring, multi-value support, directory monitoring, and atomic operations for distributed locks, which make it convenient to track and manage the state of cluster nodes. The watch mechanism of the etcd can monitor changes to the key-value pairs of a specified key or a specified key prefix, so the embodiment of the invention uses this mechanism to monitor state changes of the infrastructure. The operation of the watch mechanism is shown in FIG. 2.
Because the platform contains many types of infrastructure, including but not limited to TiDB, Redis, Pulsar, and Kafka, the life cycle of the corresponding alarm services can be managed globally in an "infrastructure + template" mode. Specifically, as shown in FIG. 3, the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type can be monitored in real time through the watch mechanism of the etcd. The state changes of an infrastructure comprise addition and deletion, and the state changes of an alarm rule template comprise addition, modification, and deletion.
If the state change of the infrastructure and/or the state change of the alarm rule template corresponding to the type of the infrastructure are monitored, the infrastructure and/or the alarm rule template with the state change can be determined by using a watch mechanism of the etcd. Here, the prefixes stored in the etcd by the infrastructures of the same type are the same, and the prefixes of the alarm rule templates belonging to the same infrastructure type are also the same in the etcd. Thus, based on the watch mechanism, a change in the status of an infrastructure of a certain infrastructure type or a change in the status of an alarm rule template of a certain infrastructure type can be monitored instantaneously. Based on the difference between the infrastructure and/or alarm rule templates of the infrastructure type before and after the state change, the infrastructure and/or alarm rule template with the state change can be located.
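The prefix-based classification described above can be sketched as follows. This is a minimal, self-contained illustration rather than the patented implementation: the key layout (`/infra/<type>/<name>` and `/templates/<type>/<name>`), the `WatchEvent` class, and the `classify` function are all hypothetical names chosen for the example. The event shape loosely mirrors the PUT and DELETE event types that an etcd watch delivers, with the previous value assumed to be available.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical key layout (not specified in the source):
#   /infra/<type>/<name>       one key per infrastructure instance
#   /templates/<type>/<name>   one key per alarm rule template
INFRA_PREFIX = "/infra/"
TEMPLATE_PREFIX = "/templates/"


@dataclass(frozen=True)
class WatchEvent:
    kind: str                  # "PUT" or "DELETE"
    key: str
    value: Optional[str]       # new value; None for DELETE
    prev_value: Optional[str]  # previous value; None if the key did not exist


def classify(event: WatchEvent) -> Tuple[str, str, str, str]:
    """Return (object_kind, infra_type, name, change) for one watch event."""
    if event.key.startswith(INFRA_PREFIX):
        object_kind, rest = "infrastructure", event.key[len(INFRA_PREFIX):]
    elif event.key.startswith(TEMPLATE_PREFIX):
        object_kind, rest = "template", event.key[len(TEMPLATE_PREFIX):]
    else:
        raise ValueError(f"key outside watched prefixes: {event.key}")

    # The shared prefix segment identifies the infrastructure type.
    infra_type, _, name = rest.partition("/")

    if event.kind == "DELETE":
        change = "delete"
    elif event.prev_value is None:
        change = "add"       # PUT on a key that did not exist before
    else:
        change = "modify"    # PUT on an existing key
    return object_kind, infra_type, name, change
```

For example, a PUT on `/infra/redis/cache-1` with no previous value would be classified as the addition of a Redis infrastructure named `cache-1`.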
Based on the infrastructure and/or the alarm rule template with the state change, the alarm rule can be updated, so that the life cycle of each alarm rule is adapted to the life cycle of the corresponding infrastructure and/or alarm rule template. The alarm rules corresponding to the infrastructure are used for the inspection engine layer to inspect the index data of the infrastructure so as to alarm in time.
Here, if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rules is an addition; if the state change is a deletion, the corresponding update operation is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation is a modification. That is, when an infrastructure is added, the alarm rules corresponding to that infrastructure are added; when an infrastructure is deleted, all alarm rules corresponding to that infrastructure are deleted. When an alarm rule template corresponding to a certain infrastructure type is added, alarm rules corresponding to that template are added for all infrastructures of that type in the platform; when such a template is deleted, all alarm rules corresponding to it are deleted for all infrastructures of that type; and when such a template is modified, the alarm rules corresponding to it are modified one by one for all infrastructures of that type.
With this way of updating the alarm rules, changes to an infrastructure or an alarm rule template are sensed automatically and the alarm rules are updated adaptively, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure or alarm rule template. This ensures that alarm coverage of site-wide services is 100% accurate and achieves efficient and transparent alarm rule life cycle management.
The method provided by the embodiment of the invention monitors, based on the watch mechanism of the etcd, the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type. When a state change of an infrastructure and/or of the alarm rule template corresponding to an infrastructure type is detected, the infrastructure and/or alarm rule template whose state changed is determined, and the alarm rules are updated accordingly. Changes to an infrastructure or an alarm rule template are thus sensed automatically and the alarm rules are updated adaptively, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure or alarm rule template, achieving efficient and transparent alarm rule life cycle management.
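The mapping from state changes to rule updates can be illustrated with a small in-memory model. The sketch below is an assumption-laden stand-in: `RuleStore`, its method names, and the `$instance` placeholder syntax are hypothetical, and a Python dictionary replaces the rule set that the source keeps in etcd.

```python
from typing import Dict, Tuple

RuleKey = Tuple[str, str]  # (infrastructure name, template name)


class RuleStore:
    """In-memory stand-in for the alarm rule set (hypothetical model)."""

    def __init__(self) -> None:
        self.infra: Dict[str, str] = {}                 # infra name -> infra type
        self.templates: Dict[str, Dict[str, str]] = {}  # type -> {template name: expression}
        self.rules: Dict[RuleKey, str] = {}             # generated alarm rules

    @staticmethod
    def _render(infra_name: str, expr: str) -> str:
        # "$instance" is an assumed placeholder syntax for this sketch.
        return expr.replace("$instance", infra_name)

    def add_infra(self, name: str, infra_type: str) -> None:
        # Infrastructure added: generate one rule per template of its type.
        self.infra[name] = infra_type
        for tname, expr in self.templates.get(infra_type, {}).items():
            self.rules[(name, tname)] = self._render(name, expr)

    def delete_infra(self, name: str) -> None:
        # Infrastructure deleted: drop every rule that belongs to it.
        self.infra.pop(name, None)
        for key in [k for k in self.rules if k[0] == name]:
            del self.rules[key]

    def put_template(self, infra_type: str, tname: str, expr: str) -> None:
        # Template added or modified: (re)generate the rule for every
        # infrastructure of that type.
        self.templates.setdefault(infra_type, {})[tname] = expr
        for name, itype in self.infra.items():
            if itype == infra_type:
                self.rules[(name, tname)] = self._render(name, expr)

    def delete_template(self, infra_type: str, tname: str) -> None:
        # Template deleted: drop its rules for every infrastructure of that type.
        self.templates.get(infra_type, {}).pop(tname, None)
        doomed = [k for k in self.rules
                  if k[1] == tname and self.infra.get(k[0]) == infra_type]
        for key in doomed:
            del self.rules[key]
```

Treating template addition and modification as the same "put" operation is a design choice of this sketch; both regenerate the affected rules, which keeps the handler idempotent if an event is delivered twice.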
Based on the above embodiment, after the alarm rules are updated based on the infrastructure and/or alarm rule template whose state changed, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure and/or alarm rule template, the method further includes:
monitoring the update events of each alarm rule in the etcd in real time based on a hook function provided by Kubernetes;
if an update event of the alarm rule in the etcd is monitored, updating the alarm rule stored in the memory cache based on the update event of the alarm rule; in an initial state, pulling all alarm rules stored in the etcd and storing the alarm rules in the memory cache;
and acquiring the alarm rule stored in the memory cache to check the index data of the corresponding infrastructure based on the alarm rule stored in the memory cache.
Specifically, after the alarm rules are updated, the index data of the corresponding infrastructure may be checked based on the updated alarm rules so that alarms are raised in time. However, infrastructure within the platform changes very frequently, so the alarm rules are also updated very frequently. All of the above updates to the alarm rules are executed in the persistence layer of the etcd, so the latest alarm rules are stored there after the alarm rules of the infrastructure whose state changed have been added or deleted. Before the inspection engine layer checks the data, it needs to fetch the latest alarm rules, but disk reads and writes are slow, and when the alarm rules are updated frequently, the working efficiency of the inspection engine layer drops considerably.
To address this, embodiments of the present invention use the informer mechanism of Kubernetes to overcome the problems caused by overly frequent infrastructure changes. The informer mechanism can listen for various events (such as creation and deletion) on a resource object and then trigger a callback function, so that the corresponding logic can be executed when each kind of event occurs.
Specifically, the update events of each alarm rule in the etcd can be monitored in real time based on the hook function provided by Kubernetes. When an alarm rule update event in the etcd is detected, the updated alarm rule and the corresponding update mode (addition, deletion, or modification) can be obtained from the event, and the alarm rules stored in the memory cache are updated accordingly, so that the alarm rules in the memory cache stay consistent with those in the etcd, that is, they are the latest alarm rules. In the initial state, that is, when the program has just started running, all alarm rules stored in the etcd may first be pulled and stored in the memory cache.
Subsequently, the alarm rule stored in the memory cache can be acquired, so that the index data of the corresponding infrastructure is checked based on the latest alarm rule stored in the memory cache, and the accuracy of the alarm check is ensured.
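The "pull everything once, then apply events" pattern described above can be sketched without any Kubernetes dependency. In this minimal illustration, `RuleCache`, `list_all`, and the event tuple layout are hypothetical; a real deployment would receive these events from the informer callbacks rather than from direct calls.

```python
from typing import Callable, Dict, Optional, Tuple

# (update mode, rule key, rule body); the body is None for deletions.
Event = Tuple[str, str, Optional[str]]


class RuleCache:
    """Memory cache kept consistent with the persistent rule store."""

    def __init__(self, list_all: Callable[[], Dict[str, str]]) -> None:
        # Initial state: pull every alarm rule once from the persistent store.
        self._rules: Dict[str, str] = dict(list_all())

    def apply(self, event: Event) -> None:
        """Apply one update event so the cache matches the store."""
        mode, key, body = event
        if mode == "delete":
            self._rules.pop(key, None)
        elif mode in ("add", "modify"):
            if body is None:
                raise ValueError(f"{mode} event for {key} carries no rule body")
            self._rules[key] = body
        else:
            raise ValueError(f"unknown update mode: {mode}")

    def snapshot(self) -> Dict[str, str]:
        """Return a copy of the current rules for the inspection engine."""
        return dict(self._rules)
```

After the initial pull, the inspection engine only reads from `snapshot()`, so the persistent store is not queried before every check.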
Based on any of the embodiments, the updating the alarm rule stored in the memory cache based on the update event of the alarm rule specifically includes:
determining an updated alarm rule based on an update event of the alarm rule;
updating the consumption queue established in the memory cache by utilizing a callback function based on the updated alarm rule; and the consumption queue comprises alarm rules corresponding to current infrastructure.
Specifically, the updated alarm rule and the corresponding update mode are determined based on the update event of the alarm rule. The consumption queue established in the memory cache is then updated by a callback function based on the updated alarm rule; the consumption queue contains the alarm rules corresponding to the current infrastructures. For example, when the update mode is addition, the newly added alarm rule is inserted into the consumption queue; when the update mode is deletion, the deleted alarm rule is removed from the consumption queue; and when the update mode is modification, the old version of the alarm rule in the consumption queue is replaced with the modified alarm rule.
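The three callback behaviours can be illustrated with an ordered mapping that acts as the consumption queue. This is only a sketch under assumptions: `ConsumptionQueue` and the `on_add`, `on_delete`, and `on_modify` names are hypothetical, and the source does not state how the queue is implemented.

```python
from collections import OrderedDict
from typing import List


class ConsumptionQueue:
    """Ordered queue of alarm rules, keyed by a hypothetical rule id."""

    def __init__(self) -> None:
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def on_add(self, rule_id: str, rule: str) -> None:
        # Addition: a new rule id is appended at the back of the queue.
        self._items[rule_id] = rule

    def on_delete(self, rule_id: str) -> None:
        # Deletion: remove the rule; ignore ids that are already gone.
        self._items.pop(rule_id, None)

    def on_modify(self, rule_id: str, rule: str) -> None:
        # Modification: replace the rule in place, keeping its position.
        # Assigning to an existing OrderedDict key does not reorder it.
        self._items[rule_id] = rule

    def handle(self, mode: str, rule_id: str, rule: str = "") -> None:
        """Dispatch an update event to the matching callback."""
        if mode == "add":
            self.on_add(rule_id, rule)
        elif mode == "delete":
            self.on_delete(rule_id)
        elif mode == "modify":
            self.on_modify(rule_id, rule)
        else:
            raise ValueError(f"unknown update mode: {mode}")

    def ids(self) -> List[str]:
        return list(self._items)

    def get(self, rule_id: str) -> str:
        return self._items[rule_id]
```

Keying the queue by rule id makes deletion and replacement constant-time lookups instead of linear scans, which matters when rules change frequently.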
Based on any of the embodiments, the acquiring the alarm rule stored in the memory cache to check the index data of the corresponding infrastructure based on the alarm rule stored in the memory cache specifically includes:
and sequentially reading the alarm rules in the consumption queue, and checking the index data of the corresponding infrastructure based on the read alarm rules.
Specifically, the alarm rule at the front of the consumption queue is read, the index data of the infrastructure corresponding to that alarm rule is checked based on it, and the alarm rule is then removed from the front of the queue. Next, the alarm rule that is now at the front of the consumption queue is read, the index data of its corresponding infrastructure is checked, and that alarm rule is removed in turn. This process is repeated until the inspection engine layer has read all the alarm rules.
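The read-check-remove loop can be sketched as follows. The `AlarmRule` fields, the `read_metric` callback, and the simple "value above threshold" check are assumptions made for illustration; the source does not specify the rule format or the comparison used by the inspection engine. A plain `collections.deque` stands in for the consumption queue.

```python
from collections import deque
from typing import Callable, Deque, List, NamedTuple


class AlarmRule(NamedTuple):
    infra: str        # infrastructure the rule applies to
    metric: str       # index (metric) name to check
    threshold: float  # alarm fires when the value exceeds this


def drain_and_check(
    queue: Deque[AlarmRule],
    read_metric: Callable[[str, str], float],
) -> List[AlarmRule]:
    """Read rules from the front of the queue until it is empty.

    Returns the rules whose metric value exceeded the threshold.
    """
    fired: List[AlarmRule] = []
    while queue:
        rule = queue.popleft()  # read the front rule and remove it
        value = read_metric(rule.infra, rule.metric)
        if value > rule.threshold:
            fired.append(rule)
    return fired
```

A caller would supply `read_metric` as whatever function fetches the current index value for an infrastructure, for example a query against the platform's monitoring store.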
Based on any of the above embodiments, if the state change of the infrastructure and/or the state change of the alarm rule template corresponding to the type of the infrastructure is monitored, determining the infrastructure and/or the alarm rule template with the state change, specifically including:
when a state change of an infrastructure and/or a state change of an alarm rule template is monitored, acquiring tag information of each infrastructure and/or each alarm rule template before and after the state change; wherein each infrastructure and each alarm rule template is preset with corresponding tag information;
determining the infrastructure in which the state change occurs based on the difference between the tag information of each infrastructure before and after the state change;
and/or determining the alarm rule template in which the state change occurs based on the difference between the tag information of each alarm rule template before and after the state change.
Specifically, etcd sets corresponding tag information for each infrastructure and each alarm rule template. The tag information may include type information (e.g., a type field) for the infrastructure or alarm rule template, and identification information (e.g., a name field) for the infrastructure or alarm rule template itself. When a state change of an infrastructure or an alarm rule template is monitored, the tag information of each infrastructure or alarm rule template before and after the state change can be obtained. Comparing the tag information before and after the state change yields the difference between the tag information of each infrastructure and/or of each alarm rule template, from which the infrastructure and/or alarm rule template with the state change is determined.
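The before/after comparison of tag information can be sketched as a set difference. In this hypothetical Python example each tag is a (type, name) pair, following the two kinds of tag information mentioned above; the function name diff_tags and the sample values are assumptions.

```python
def diff_tags(before: set, after: set) -> dict:
    """Compare two snapshots of (type, name) tags and report what changed."""
    return {
        "added": sorted(after - before),    # present only after the change
        "deleted": sorted(before - after),  # present only before the change
    }


# Example snapshots taken before and after a state change.
before = {("redis", "cache-a"), ("mysql", "db-a")}
after = {("redis", "cache-a"), ("kafka", "bus-a")}
changes = diff_tags(before, after)
```

A modified alarm rule template keeps the same tag, so a pure tag difference would not reveal it; detecting a modification would need an additional signal, such as a content hash, which the text does not specify and this sketch leaves out.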
Based on any of the above embodiments, if the monitored state change of the infrastructure is the addition of an infrastructure, the updating of the alarm rules based on the infrastructure and/or alarm rule template with the state change specifically includes:
determining an alarm rule template corresponding to the newly added infrastructure based on the infrastructure type of the newly added infrastructure and the alarm rule templates corresponding to the infrastructure types;
and generating an alarm rule corresponding to the newly added infrastructure based on the alarm rule template corresponding to the newly added infrastructure, and adding the alarm rule corresponding to the newly added infrastructure into the existing alarm rule set.
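The two steps above (select the templates by infrastructure type, then instantiate and store the rules) can be sketched in Python as follows. The template fields, the rule naming scheme and the function names are hypothetical and serve only to make the example concrete.

```python
def rules_for_new_infra(infra: dict, templates_by_type: dict) -> list:
    """Instantiate every template registered for the infrastructure's type."""
    rules = []
    for tpl in templates_by_type.get(infra["type"], []):
        rules.append({
            "name": f'{infra["name"]}/{tpl["name"]}',  # hypothetical naming scheme
            "infra": infra["name"],
            "metric": tpl["metric"],
            "threshold": tpl["threshold"],
        })
    return rules


def add_infra(rule_set: dict, infra: dict, templates_by_type: dict) -> None:
    """Add the generated rules to the existing alarm rule set, keyed by rule name."""
    for rule in rules_for_new_infra(infra, templates_by_type):
        rule_set[rule["name"]] = rule
```

An infrastructure whose type has no registered template simply produces no rules in this sketch.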
Based on any of the above embodiments, if the monitored state change of the infrastructure is the deletion of an infrastructure, the updating of the alarm rules based on the infrastructure and/or alarm rule template with the state change specifically includes:
determining an alarm rule corresponding to the deleted infrastructure in an existing alarm rule set based on the type of the deleted infrastructure;
and deleting the alarm rule corresponding to the deleted infrastructure from the existing alarm rule set.
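The deletion case can be sketched in the same style. The text locates the affected rules by the type of the deleted infrastructure; this Python example additionally matches on the infrastructure name so that other instances of the same type keep their rules, which is an assumption of the sketch. All field and function names are hypothetical.

```python
def remove_infra(rule_set: list, infra: dict) -> list:
    """Return the rule set without the rules bound to the deleted infrastructure."""

    def belongs(rule: dict) -> bool:
        # Narrow by type first, then by the (assumed) instance name.
        return rule["infra_type"] == infra["type"] and rule["infra"] == infra["name"]

    return [rule for rule in rule_set if not belongs(rule)]
```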
The following describes the life cycle management device of the infrastructure alarm rule provided by the present invention, and the life cycle management device of the infrastructure alarm rule described below and the life cycle management method of the infrastructure alarm rule described above may be referred to in correspondence with each other.
Based on any of the above embodiments, Fig. 4 is a schematic structural diagram of a life cycle management apparatus for infrastructure alarm rules according to an embodiment of the present invention. As shown in Fig. 4, the apparatus includes: a state change monitoring unit 410 and an alarm rule updating unit 420.
The state change monitoring unit 410 is configured to monitor the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type based on the watch mechanism of etcd;
the alarm rule updating unit 420 is configured to, if a state change of an infrastructure and/or a state change of the alarm rule template corresponding to an infrastructure type is monitored, determine the infrastructure and/or alarm rule template with the state change, and update the alarm rules based on the infrastructure and/or alarm rule template with the state change, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure and/or alarm rule template;
wherein, if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rule is an addition; if the state change of any infrastructure or any alarm rule template is a deletion, the corresponding update operation on the alarm rule is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation on the alarm rule is a modification.
The apparatus provided by the embodiment of the present invention monitors the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type based on the watch mechanism of etcd. When a state change of an infrastructure and/or of the alarm rule template corresponding to an infrastructure type is monitored, the apparatus determines the infrastructure and/or alarm rule template with the state change and updates the alarm rules accordingly. It can therefore automatically sense a change in an infrastructure or an alarm rule template and adaptively update the alarm rules, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure or alarm rule template, achieving efficient alarm rule life cycle management that is imperceptible to the user.
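The mapping from a monitored state change to the update operation performed by the alarm rule updating unit 420 can be written as a small dispatch function. This Python sketch uses hypothetical string labels for the source of the change and the kind of change; it returns None for a combination the text does not define (modification of an infrastructure).

```python
from typing import Optional


def update_operation(source: str, change: str) -> Optional[str]:
    """Map a state change to the update operation applied to the alarm rules.

    source: "infrastructure" or "template" (hypothetical labels)
    change: "add", "delete" or "modify"
    """
    if source not in ("infrastructure", "template"):
        raise ValueError(f"unknown source: {source}")
    if change in ("add", "delete"):
        # Additions and deletions propagate for both infrastructures and templates.
        return change
    if change == "modify" and source == "template":
        # Modification is described only for alarm rule templates.
        return "modify"
    return None
```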
Based on any of the above embodiments, after the alarm rules are updated based on the infrastructure and/or alarm rule template with the state change so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure and/or alarm rule template, the following is further included:
monitoring the update event of each alarm rule in etcd in real time based on a hook function provided by Kubernetes;
if an update event of the alarm rule in the etcd is monitored, updating the alarm rule stored in the memory cache based on the update event of the alarm rule; in the initial state, pulling all the alarm rules stored in the etcd and storing the alarm rules in the memory cache;
and acquiring the alarm rules stored in the memory cache to check the index data of the corresponding infrastructure based on the alarm rules stored in the memory cache.
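The cache life cycle described above (a full pull in the initial state, followed by incremental updates from events) can be sketched without any real etcd or Kubernetes client. In this Python example a plain dict stands in for the rules stored in etcd, and the event kinds "put" and "delete" as well as the class and method names are assumptions of the sketch.

```python
class RuleCache:
    """In-memory copy of the alarm rules, kept current by update events."""

    def __init__(self) -> None:
        self.rules = {}

    def bootstrap(self, store: dict) -> None:
        """Initial state: pull every stored alarm rule into the cache."""
        self.rules = dict(store)

    def on_event(self, kind: str, key: str, value=None) -> None:
        """Apply one update event to the cache."""
        if kind == "put":
            # A put covers both a newly added and a modified rule.
            self.rules[key] = value
        elif kind == "delete":
            self.rules.pop(key, None)
        else:
            raise ValueError(f"unknown event kind: {kind}")
```

Reading rules from this cache rather than from the store on every check keeps the inspection path independent of store latency, at the cost of relying on the event stream to stay current.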
Based on any of the embodiments, the updating the alarm rule stored in the memory cache based on the update event of the alarm rule specifically includes:
determining an updated alarm rule based on an update event of the alarm rule;
updating the consumption queue established in the memory cache by utilizing a callback function based on the updated alarm rule; and the consumption queue comprises alarm rules corresponding to current infrastructure.
Based on any of the embodiments, the acquiring the alarm rule stored in the memory cache to check the index data of the corresponding infrastructure based on the alarm rule stored in the memory cache specifically includes:
and sequentially reading the alarm rules in the consumption queue, and checking the index data of the corresponding infrastructure based on the read alarm rules.
Based on any of the above embodiments, if the state change of the infrastructure and/or the state change of the alarm rule template corresponding to the type of the infrastructure is monitored, determining the infrastructure and/or the alarm rule template with the state change, specifically including:
when a state change of an infrastructure and/or a state change of an alarm rule template is monitored, acquiring tag information of each infrastructure and/or each alarm rule template before and after the state change; wherein each infrastructure and each alarm rule template is preset with corresponding tag information;
determining the infrastructure in which the state change occurs based on the difference between the tag information of each infrastructure before and after the state change;
and/or determining the alarm rule template in which the state change occurs based on the difference between the tag information of each alarm rule template before and after the state change.
Based on any of the above embodiments, if the monitored state change of the infrastructure is the addition of an infrastructure, the updating of the alarm rules based on the infrastructure and/or alarm rule template with the state change specifically includes:
determining an alarm rule template corresponding to the newly added infrastructure based on the infrastructure type of the newly added infrastructure and the alarm rule templates corresponding to the infrastructure types;
and generating an alarm rule corresponding to the newly added infrastructure based on the alarm rule template corresponding to the newly added infrastructure, and adding the alarm rule corresponding to the newly added infrastructure into the existing alarm rule set.
Based on any of the above embodiments, if the monitored state change of the infrastructure is the deletion of an infrastructure, the updating of the alarm rules based on the infrastructure and/or alarm rule template with the state change specifically includes:
determining an alarm rule corresponding to the deleted infrastructure in an existing alarm rule set based on the type of the deleted infrastructure;
and deleting the alarm rule corresponding to the deleted infrastructure from the existing alarm rule set.
Fig. 5 illustrates a schematic physical structure diagram of an electronic device. As shown in Fig. 5, the electronic device may include: a processor 510, a communications interface 520, a memory 530 and a communication bus 540, wherein the processor 510, the communications interface 520 and the memory 530 communicate with each other via the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a method for lifecycle management of infrastructure alarm rules, the method comprising: monitoring the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type based on the watch mechanism of etcd; if a state change of an infrastructure and/or a state change of the alarm rule template corresponding to an infrastructure type is monitored, determining the infrastructure and/or alarm rule template with the state change, and updating the alarm rules based on the infrastructure and/or alarm rule template with the state change, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure and/or alarm rule template; wherein, if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rule is an addition; if the state change of any infrastructure or any alarm rule template is a deletion, the corresponding update operation on the alarm rule is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation on the alarm rule is a modification.
In addition, the logic instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing the method for lifecycle management of infrastructure alarm rules provided by the above methods, the method comprising: monitoring the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type based on the watch mechanism of etcd; if a state change of an infrastructure and/or a state change of the alarm rule template corresponding to an infrastructure type is monitored, determining the infrastructure and/or alarm rule template with the state change, and updating the alarm rules based on the infrastructure and/or alarm rule template with the state change, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure and/or alarm rule template; wherein, if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rule is an addition; if the state change of any infrastructure or any alarm rule template is a deletion, the corresponding update operation on the alarm rule is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation on the alarm rule is a modification.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for lifecycle management of infrastructure alarm rules provided by the above methods, the method comprising: monitoring the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type based on the watch mechanism of etcd; if a state change of an infrastructure and/or a state change of the alarm rule template corresponding to an infrastructure type is monitored, determining the infrastructure and/or alarm rule template with the state change, and updating the alarm rules based on the infrastructure and/or alarm rule template with the state change, so that the life cycle of each alarm rule is adapted to the life cycle of its corresponding infrastructure and/or alarm rule template; wherein, if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rule is an addition; if the state change of any infrastructure or any alarm rule template is a deletion, the corresponding update operation on the alarm rule is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation on the alarm rule is a modification.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for lifecycle management of infrastructure alarm rules, comprising:
monitoring the state change of each infrastructure and the state change of an alarm rule template corresponding to each infrastructure type based on the watch mechanism of the etcd;
if the state change of the infrastructure and/or the state change of the alarm rule template corresponding to the type of the infrastructure are monitored, determining the infrastructure and/or the alarm rule template with the state change, and updating the alarm rules based on the infrastructure and/or the alarm rule template with the state change so that the life cycle of each alarm rule is adapted to the life cycle of the corresponding infrastructure and/or alarm rule template;
wherein, if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rule is an addition; if the state change of any infrastructure or any alarm rule template is a deletion, the corresponding update operation on the alarm rule is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation on the alarm rule is a modification.
2. The method for managing the life cycle of the infrastructure alarm rules according to claim 1, wherein the updating of the alarm rules based on the infrastructure and/or alarm rule templates with the changed states is performed so that the life cycle of each alarm rule is adapted to the life cycle of the corresponding infrastructure and/or alarm rule template, and thereafter further comprising:
monitoring the update event of each alarm rule in etcd in real time based on a hook function provided by Kubernetes;
if an update event of the alarm rule in the etcd is monitored, updating the alarm rule stored in the memory cache based on the update event of the alarm rule; in an initial state, pulling all alarm rules stored in the etcd and storing the alarm rules in the memory cache;
and acquiring the alarm rules stored in the memory cache to check the index data of the corresponding infrastructure based on the alarm rules stored in the memory cache.
3. The method for lifecycle management for infrastructure alarm rules according to claim 2, wherein the updating the alarm rules stored in the memory cache based on the update event of the alarm rules specifically comprises:
determining an updated alarm rule based on an update event of the alarm rule;
updating the consumption queue established in the memory cache by utilizing a callback function based on the updated alarm rule; and the consumption queue comprises alarm rules corresponding to current infrastructure.
4. The method for lifecycle management of infrastructure alarm rules according to claim 3, wherein the obtaining the alarm rules stored in the memory cache to check the index data of the corresponding infrastructure based on the alarm rules stored in the memory cache specifically comprises:
and sequentially reading the alarm rules in the consumption queue, and checking the index data of the corresponding infrastructure based on the read alarm rules.
5. The method for lifecycle management of infrastructure alarm rules according to claim 1, wherein determining the infrastructure and/or alarm rule template with a state change if a state change of the infrastructure and/or a state change of the alarm rule template corresponding to the infrastructure type is monitored specifically comprises:
when a state change of an infrastructure and/or a state change of an alarm rule template is monitored, acquiring tag information of each infrastructure and/or each alarm rule template before and after the state change; wherein each infrastructure and each alarm rule template is preset with corresponding tag information;
determining the infrastructure in which the state change occurs based on the difference between the tag information of each infrastructure before and after the state change;
and/or determining the alarm rule template in which the state change occurs based on the difference between the tag information of each alarm rule template before and after the state change.
6. The method according to claim 5, wherein if the monitored state change of the infrastructure is the addition of an infrastructure, the updating of the alarm rules based on the infrastructure and/or alarm rule template with the state change comprises:
determining an alarm rule template corresponding to the newly added infrastructure based on the infrastructure type of the newly added infrastructure and the alarm rule templates corresponding to the infrastructure types;
and generating an alarm rule corresponding to the newly added infrastructure based on the alarm rule template corresponding to the newly added infrastructure, and adding the alarm rule corresponding to the newly added infrastructure into the existing alarm rule set.
7. The method for lifecycle management for infrastructure alarm rules according to claim 6, wherein if it is monitored that the change of state of the infrastructure is a deletion of the infrastructure, the updating of the alarm rule based on the infrastructure and/or the alarm rule template in which the change of state occurs specifically comprises:
determining an alarm rule corresponding to the deleted infrastructure in an existing alarm rule set based on the type of the deleted infrastructure;
and deleting the alarm rule corresponding to the deleted infrastructure from the existing alarm rule set.
8. An infrastructure alarm rules lifecycle management apparatus, comprising:
the state change monitoring unit is used for monitoring the state change of each infrastructure and the state change of the alarm rule template corresponding to each infrastructure type based on the watch mechanism of the etcd;
the alarm rule updating unit is used for determining the infrastructure and/or the alarm rule template with the changed state if the state change of the infrastructure and/or the state change of the alarm rule template corresponding to the type of the infrastructure is monitored, and updating the alarm rules based on the infrastructure and/or the alarm rule template with the changed state so that the life cycle of each alarm rule is adapted to the life cycle of the corresponding infrastructure and/or alarm rule template;
wherein, if the state change of any infrastructure or any alarm rule template is an addition, the corresponding update operation on the alarm rule is an addition; if the state change of any infrastructure or any alarm rule template is a deletion, the corresponding update operation on the alarm rule is a deletion; and if the state change of any alarm rule template is a modification, the corresponding update operation on the alarm rule is a modification.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for lifecycle management of infrastructure alarm rules according to any of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method for lifecycle management of infrastructure alarm rules according to any of claims 1 to 7.
CN202210511659.8A 2022-05-12 2022-05-12 Method and device for managing life cycle of infrastructure alarm rule Active CN114615130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210511659.8A CN114615130B (en) 2022-05-12 2022-05-12 Method and device for managing life cycle of infrastructure alarm rule

Publications (2)

Publication Number Publication Date
CN114615130A CN114615130A (en) 2022-06-10
CN114615130B true CN114615130B (en) 2022-07-12

Family

ID=81870400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210511659.8A Active CN114615130B (en) 2022-05-12 2022-05-12 Method and device for managing life cycle of infrastructure alarm rule

Country Status (1)

Country Link
CN (1) CN114615130B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104125286A (en) * 2014-08-03 2014-10-29 浙江网新恒天软件有限公司 Smart cloud management system based on cloud computing for enterprise infrastructure
CN113128908A (en) * 2021-05-12 2021-07-16 中国建设银行股份有限公司 Management method and system for life cycle of infrastructure
CN113159475A (en) * 2020-12-04 2021-07-23 中国国家铁路集团有限公司 Infrastructure full life cycle monitoring platform and method
CN113504969A (en) * 2021-07-07 2021-10-15 北京汇钧科技有限公司 Container event alarm method and device and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8504504B2 (en) * 2008-09-26 2013-08-06 Oracle America, Inc. System and method for distributed denial of service identification and prevention
US10623262B2 (en) * 2017-06-20 2020-04-14 Vmware, Inc. Methods and systems to adjust a monitoring tool and auxiliary servers of a distributed computing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
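Construction and application of an information network operation and maintenance management system in power supply enterprises (信息网络运维管理系统在供电企业中的建设及应用); Hu Haiyan (胡海燕) et al.; 《中国西部科技》; 2011-05-25 (No. 15); full text *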

Also Published As

Publication number Publication date
CN114615130A (en) 2022-06-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant