CN113381884A - Full link monitoring method and device for monitoring alarm system - Google Patents

Full link monitoring method and device for monitoring alarm system

Info

Publication number
CN113381884A
Authority
CN
China
Prior art keywords
alarm
monitoring
self
index
trigger
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110612028.0A
Other languages
Chinese (zh)
Other versions
CN113381884B (en)
Inventor
黄鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shuhe Information Technology Co Ltd
Original Assignee
Shanghai Shuhe Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shuhe Information Technology Co Ltd filed Critical Shanghai Shuhe Information Technology Co Ltd
Priority to CN202110612028.0A priority Critical patent/CN113381884B/en
Publication of CN113381884A publication Critical patent/CN113381884A/en
Application granted granted Critical
Publication of CN113381884B publication Critical patent/CN113381884B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H04L41/0622 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time based on time
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)
  • Telephonic Communication Services (AREA)
  • Alarm Systems (AREA)

Abstract

The invention discloses a full-link monitoring method and device for a monitoring alarm system, relating to the technical field of internet operation and maintenance, which achieve continuous monitoring of the full link while improving the reliability of both the monitoring alarm system and the self-monitoring system. The method comprises the following steps: deploying the monitoring alarm system and the self-monitoring system in different environments; configuring the alarm indexes of the monitoring alarm system and the trigger frequency of the self-monitoring system; the self-monitoring system periodically acquiring the alarm indexes from the monitoring alarm system according to the trigger frequency, and judging whether the alarm trigger time of the monitoring alarm system has timed out by comparing the timestamp at which the service alarm index among the alarm indexes was generated with the current timestamp; the monitoring alarm system periodically pulling monitoring indexes from the self-monitoring system according to the trigger frequency, and judging whether the alarm trigger state of the self-monitoring system is normal; and, based on the judgment results for the alarm trigger time and the alarm trigger state, periodically sending an alarm trigger result to a set receiver according to the trigger frequency.

Description

Full link monitoring method and device for monitoring alarm system
Technical Field
The invention relates to the technical field of internet operation and maintenance, in particular to a full link monitoring method and device for a monitoring alarm system.
Background
Monitoring and alarming is the most important link in operation and maintenance: problems are discovered through continuous information acquisition, aggregation and analysis, so that faults can be warned of and discovered early, and detailed data is available for tracing and locating problems afterwards. Monitoring of the monitoring alarm system itself, also called self-monitoring of the monitoring alarm system, aims to discover a fault state of the monitoring alarm system in time and to notify the relevant responsible persons so that they can handle and maintain it promptly. The prior-art schemes for monitoring a monitoring alarm system are as follows:
1. A third-party monitoring component is additionally deployed, such as the self-monitoring service (dead eye) used by open-falcon. After deployment, a process-liveness monitoring alarm is added for dead eye, so that open-falcon and the monitoring alarm system monitor each other and either party can detect a problem in time;
2. Multiple monitoring systems are deployed so that they monitor each other. For example, at least two independent Prometheus instances cross-monitor one another: each Prometheus pulls the indexes of all the other Prometheus instances, and once one Prometheus goes down, the others can discover it and raise an alarm.
It can be seen that in most existing self-monitoring schemes the self-monitoring component or self-monitoring system is deployed on the same infrastructure as the monitoring alarm system; once that infrastructure fails, the whole monitoring scheme fails and the purpose of self-monitoring cannot be achieved. In addition, the monitoring alarm link is long, running from the acquisition of index information to the notification of the responsible person, and a problem at any step renders the monitoring alarm ineffective; yet the existing self-monitoring schemes only monitor the state of the system itself and lack monitoring of the full link.
Disclosure of Invention
The invention aims to provide a full link monitoring method and a full link monitoring device for a monitoring alarm system, which can realize continuous monitoring of a full link while improving the reliability of the monitoring alarm system and a self-monitoring system.
A first aspect of the present invention provides a full link monitoring method for monitoring an alarm system, including:
respectively deploying a monitoring alarm system and a self-monitoring system in different environments to enable the monitoring alarm system and the self-monitoring system to monitor each other;
configuring an alarm index of a monitoring alarm system and a trigger frequency of a self-monitoring system;
the self-monitoring system periodically acquires the alarm indexes from the monitoring alarm system according to the trigger frequency, and judges whether the alarm trigger time of the monitoring alarm system has timed out by comparing the timestamp at which the service alarm index among the alarm indexes was generated with the current timestamp;
the monitoring alarm system periodically pulls monitoring indexes from the self-monitoring system according to the trigger frequency, and compares them with preset thresholds to judge whether the alarm trigger state of the self-monitoring system is normal;
and, based on the judgment results for the alarm trigger time and the alarm trigger state, periodically sending an alarm trigger result to a set receiver according to the trigger frequency.
Preferably, the method for deploying the monitoring alarm system and the self-monitoring system in different environments comprises the following steps:
the monitoring alarm system is deployed on a target machine, and the self-monitoring system is deployed on a serverless architecture of a cloud service.
Preferably, the method for configuring the alarm index of the monitoring alarm system and the trigger frequency of the self-monitoring system includes:
the configured alarm indexes of the monitoring alarm system comprise a hardware alarm index, a service alarm index and an application program alarm index;
the configured trigger frequency of the self-monitoring system is the frequency at which the hardware alarm index, the service alarm index and the application program alarm index are periodically acquired from the monitoring alarm system.
Further, the method for the self-monitoring system to periodically obtain the alarm index from the monitoring alarm system according to the trigger frequency and judge whether the alarm trigger time of the monitoring alarm system is overtime according to the timestamp and the current timestamp when the service alarm index in the alarm index is generated comprises the following steps:
the self-monitoring system checks whether the acquired alarm indexes include a hardware alarm index, a service alarm index and an application program alarm index at the same time and, when all three are present, calculates the time difference between the timestamp at which the service alarm index was generated and the current timestamp at which the self-monitoring system acquired the alarm indexes;
when the time difference does not exceed a first threshold, the alarm trigger time of the monitoring alarm system is judged not to have timed out; otherwise, it is judged to have timed out.
Further, the method for the self-monitoring system to periodically obtain the alarm index from the monitoring alarm system according to the trigger frequency and judge whether the alarm trigger time of the monitoring alarm system is overtime according to the timestamp and the current timestamp when the service alarm index in the alarm index is generated further comprises the following steps:
the self-monitoring system checks whether the database of the monitoring alarm system contains alarm trigger records for the hardware alarm index, the service alarm index and the application program alarm index, and calculates the time difference between the timestamp at which the service alarm index in the database was generated and the timestamp at which the alarm trigger record was written into the database;
when the time difference does not exceed a second threshold, the alarm trigger time of the monitoring alarm system is judged not to have timed out; otherwise, it is judged to have timed out.
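A minimal sketch of this database check, in Python, might look as follows. The row layout (`category`, `generated_at`, `written_at`, in epoch seconds) and the function name `check_trigger_records` are hypothetical; the source does not specify a schema.

```python
from typing import Iterable, Mapping

# The three alarm categories named in the text.
REQUIRED = {"hardware", "service", "application"}

def check_trigger_records(records: Iterable[Mapping], second_threshold_s: float) -> str:
    """Classify the alarm trigger records as 'ok', 'timeout' or 'abnormal'."""
    rows = list(records)
    # A trigger record must exist for every alarm category.
    if not REQUIRED <= {r["category"] for r in rows}:
        return "abnormal"
    # Compare generation time of the service alarm index with its write time.
    for r in rows:
        if r["category"] == "service":
            if r["written_at"] - r["generated_at"] > second_threshold_s:
                return "timeout"
    return "ok"
```

A missing record is reported separately from a timeout so that the caller can tell which kind of fault occurred.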
Further, the method for the monitoring alarm system to periodically pull the monitoring index from the self-monitoring system according to the trigger frequency and compare the monitoring index with the preset threshold value to judge whether the alarm trigger state of the self-monitoring system is normal or not includes the following steps:
the monitoring alarm system periodically pulls monitoring indexes from the self-monitoring system according to the trigger frequency, wherein the monitoring indexes comprise one or more of the total number of calls, the number of failed calls, the call processing time and the call memory consumption of the self-monitoring system within one trigger-frequency period;
the monitoring alarm system compares each pulled monitoring index with its corresponding preset threshold; when any index exceeds its preset threshold, the alarm trigger state of the self-monitoring system is considered abnormal, and otherwise it is considered normal.
Optionally, after the alarm triggering result is periodically sent to the set receiver according to the triggering frequency based on the alarm triggering time and the judgment result of the alarm triggering state, the method further includes:
the self-monitoring system periodically checks the receipt record table of the notification system and judges whether the alarm trigger result was successfully sent to the set receiver;
if the alarm trigger result was not successfully sent to the set receiver, the alarm trigger result is sent again, this time to a responsible person configured in advance in the notification system.
Illustratively, the monitoring alarm system comprises a Prometheus system, and the serverless architecture of the cloud service is a Serverless service.
Compared with the prior art, the full link monitoring method for monitoring the alarm system provided by the invention has the following beneficial effects:
In the full link monitoring method for monitoring an alarm system provided by the invention, the monitoring alarm system and the self-monitoring system are deployed in different environments, so that the monitoring alarm system not only monitors the target machine but also monitors, and is monitored by, the self-monitoring system. Specifically, the alarm indexes of the monitoring alarm system and the trigger frequency of the self-monitoring system are configured in advance. The self-monitoring system then periodically acquires the alarm indexes from the monitoring alarm system according to the trigger frequency and judges whether the alarm trigger time of the monitoring alarm system has timed out by comparing the timestamp at which the service alarm index was generated with the current timestamp. At the same time, the monitoring alarm system periodically pulls monitoring indexes from the self-monitoring system according to the trigger frequency and compares them with preset thresholds to judge whether the alarm trigger state of the self-monitoring system is normal. Finally, based on the judgment results for the alarm trigger time and the alarm trigger state, an alarm trigger result is periodically sent to a set receiver according to the trigger frequency.
Therefore, because the monitoring alarm system and the self-monitoring system are not deployed on the same infrastructure, the risk of both failing at the same time is effectively reduced; in addition, whether the receiver successfully receives the alarm trigger result can be tracked and monitored, thereby realizing full-link monitoring.
A second aspect of the present invention provides a full link monitoring apparatus for monitoring an alarm system, which is applied to the full link monitoring method for monitoring an alarm system in the foregoing technical solution, and the apparatus includes:
the deployment unit is used for respectively deploying the monitoring alarm system and the self-monitoring system in different environments so as to enable the monitoring alarm system and the self-monitoring system to monitor each other;
the configuration unit is used for configuring the alarm indexes of the monitoring alarm system and the trigger frequency of the self-monitoring system;
the first monitoring unit is used for periodically acquiring, through the self-monitoring system, the alarm indexes from the monitoring alarm system according to the trigger frequency, and judging whether the alarm trigger time of the monitoring alarm system has timed out by comparing the timestamp at which the service alarm index among the alarm indexes was generated with the current timestamp;
the second monitoring unit is used for periodically pulling, through the monitoring alarm system, monitoring indexes from the self-monitoring system according to the trigger frequency, and comparing them with preset thresholds to judge whether the alarm trigger state of the self-monitoring system is normal;
and the sending detection unit is used for periodically sending an alarm trigger result to a set receiver according to the trigger frequency, based on the judgment results for the alarm trigger time and the alarm trigger state.
Compared with the prior art, the beneficial effects of the full link monitoring device for monitoring the alarm system provided by the invention are the same as the beneficial effects of the full link monitoring method for monitoring the alarm system provided by the technical scheme, and the detailed description is omitted here.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, performs the steps of the above-mentioned full link monitoring method for monitoring an alarm system.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the invention are the same as the beneficial effects of the full-link monitoring method for monitoring the alarm system provided by the technical scheme, and the detailed description is omitted here.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow chart of a full link monitoring method for monitoring an alarm system according to an embodiment of the present invention;
FIG. 2 is a task flow diagram of the self-monitoring system monitoring the monitoring alarm system in an embodiment of the present invention;
FIG. 3 is a task flow diagram of the monitoring alarm system monitoring the self-monitoring system in an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, the present embodiment provides a full link monitoring method for monitoring an alarm system, including:
deploying a monitoring alarm system and a self-monitoring system in different environments so that the two systems monitor each other; configuring the alarm indexes of the monitoring alarm system and the trigger frequency of the self-monitoring system; the self-monitoring system periodically acquiring the alarm indexes from the monitoring alarm system according to the trigger frequency, and judging whether the alarm trigger time of the monitoring alarm system has timed out by comparing the timestamp at which the service alarm index among the alarm indexes was generated with the current timestamp; the monitoring alarm system periodically pulling monitoring indexes from the self-monitoring system according to the trigger frequency, and comparing them with preset thresholds to judge whether the alarm trigger state of the self-monitoring system is normal; and, based on the judgment results for the alarm trigger time and the alarm trigger state, periodically sending an alarm trigger result to a set receiver according to the trigger frequency.
In the full-link monitoring method provided in this embodiment, the monitoring alarm system and the self-monitoring system are deployed in different environments, so that the monitoring alarm system not only monitors the target machine but also monitors, and is monitored by, the self-monitoring system. Specifically, the alarm indexes of the monitoring alarm system and the trigger frequency of the self-monitoring system are configured in advance. The self-monitoring system then periodically acquires the alarm indexes from the monitoring alarm system according to the trigger frequency and judges whether the alarm trigger time has timed out by comparing the timestamp at which the service alarm index was generated with the current timestamp. At the same time, the monitoring alarm system periodically pulls monitoring indexes from the self-monitoring system according to the trigger frequency and compares them with preset thresholds to judge whether the alarm trigger state of the self-monitoring system is normal. Finally, based on the two judgment results, an alarm trigger result is periodically sent to a set receiver according to the trigger frequency.
Therefore, in this embodiment, because the monitoring alarm system and the self-monitoring system are not deployed on the same infrastructure, the risk of both failing at the same time is effectively reduced; in addition, whether the receiver successfully receives the alarm trigger result can be tracked and monitored, so that full-link monitoring is realized.
In the above embodiment, the method for deploying the monitoring alarm system and the self-monitoring system in different environments includes:
the monitoring alarm system is deployed on a target machine, and the self-monitoring system is deployed on a serverless architecture of a cloud service.
Illustratively, the monitoring alarm system comprises a Prometheus system, and the serverless architecture of the cloud service is a Serverless service.
In a specific implementation, the monitoring alarm system comprises Prometheus and Alertmanager, which are deployed on the target machine, are in signal connection with it and are built on the same infrastructure as the target machine, so that the monitoring alarm system can continuously monitor the target machine, raise alarms and acquire alarm indexes. The self-monitoring system is deployed on the serverless architecture of the cloud service, namely a Serverless service, and is in signal connection with Prometheus, so that it can periodically acquire the alarm indexes through Prometheus and thereby monitor the alarm trigger time of the monitoring alarm system. Preferably, the self-monitoring system is a self-monitoring program.
It can be seen that, in this embodiment, the monitoring alarm system and the self-monitoring system are deployed in different environments, with the self-monitoring system on the Serverless service, which effectively reduces the risk of both systems failing at the same time.
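As an illustration of how a self-monitoring program could read alarm indexes through Prometheus, the sketch below builds a request URL for Prometheus's instant-query HTTP endpoint (`/api/v1/query`) and parses its JSON response. No network call is made here; the base URL, the metric name `service_alarm_index` and both helper names are assumptions made for this example, not details from the source.

```python
import json
from typing import Dict
from urllib.parse import urlencode

def build_query_url(base_url: str, promql: str) -> str:
    # /api/v1/query is the Prometheus instant-query endpoint.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"

def parse_instant_vector(body: str) -> Dict[str, float]:
    """Map each returned series' `__name__` label to its sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed")
    values = {}
    for series in payload["data"]["result"]:
        # Each sample is a pair: [unix_time, "value as string"].
        _sample_time, raw_value = series["value"]
        values[series["metric"].get("__name__", "")] = float(raw_value)
    return values
```

A serverless function triggered every n minutes could fetch the URL with any HTTP client and pass the response body to `parse_instant_vector`.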
In the above embodiment, the method for configuring the alarm indicator of the monitoring alarm system and the trigger frequency of the self-monitoring system includes:
the configured alarm indexes of the monitoring alarm system comprise a hardware alarm index, a service alarm index and an application program alarm index; the configured trigger frequency of the self-monitoring system is the frequency at which the hardware alarm index, the service alarm index and the application program alarm index are periodically acquired from the monitoring alarm system.
In a specific implementation, three alarm categories are configured in the monitoring alarm system: a hardware alarm, a Tomcat application alarm and a service alarm. The hardware alarm monitors the resource consumption of the target machine's server, such as CPU, memory and network traffic, and alarms when consumption is greater than a threshold. The application program alarm monitors the JVM state of the target machine and whether the application health-check interface is normal; a normal state triggers the alarm. The service alarm is actively reported by the Tomcat application with the current timestamp as its value; an index value greater than 0 triggers the alarm.
Generally, when the target machine works normally, the hardware alarm index, the service alarm index and the application program alarm index are all greater than the threshold 0, so all three alarms remain triggered at all times. The self-monitoring system can therefore periodically check whether the three alarms are normally delivered to the receiver's terminal. Delivery modes include short message (SMS) alarm reminders, mail alarm reminders, communication software alarm reminders and the like.
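The always-triggered configuration described above can be sketched as follows. This is an illustrative Python model only; the `AlarmIndex` structure, the function names and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class AlarmIndex:
    """One alarm index sample (hypothetical structure for illustration)."""
    category: str           # "hardware", "application" or "service"
    value: float            # for the service index: the reporting timestamp
    threshold: float = 0.0  # the text uses 0 so that a healthy target always alarms

def is_firing(index: AlarmIndex) -> bool:
    # An alarm is triggered whenever its value is above the threshold.
    return index.value > index.threshold

def all_alarms_firing(indexes: Iterable[AlarmIndex]) -> bool:
    # A healthy link keeps all three alarm categories triggered at once.
    required = {"hardware", "application", "service"}
    firing = {i.category for i in indexes if is_firing(i)}
    return required <= firing
```

Because the alarms fire continuously on a healthy target, the absence of any one of them is itself the signal that some stage of the link has failed.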
In the above embodiment, the method for periodically obtaining the alarm indicator from the monitoring alarm system by the self-monitoring system according to the trigger frequency, and determining whether the alarm trigger time of the monitoring alarm system is overtime according to the timestamp and the current timestamp when the service alarm indicator in the alarm indicator is generated includes:
the self-monitoring system checks whether the acquired alarm indexes include a hardware alarm index, a service alarm index and an application program alarm index at the same time and, when all three are present, calculates the time difference between the timestamp at which the service alarm index was generated and the current timestamp at which the self-monitoring system acquired the alarm indexes; when the time difference does not exceed a first threshold, the alarm trigger time of the monitoring alarm system is judged not to have timed out; otherwise, it is judged to have timed out.
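A minimal sketch of this check, assuming the acquired alarm indexes arrive as a mapping from category name to value and that the service alarm index carries its generation timestamp (epoch seconds) as its value. The function name and the returned labels are hypothetical.

```python
import time
from typing import Mapping, Optional

REQUIRED = ("hardware", "service", "application")

def check_trigger_time(indexes: Mapping[str, float],
                       first_threshold_s: float,
                       now: Optional[float] = None) -> str:
    """Return 'abnormal', 'timeout' or 'ok' for one acquisition."""
    now = time.time() if now is None else now
    # All three alarm indexes must be present at the same time.
    if any(name not in indexes for name in REQUIRED):
        return "abnormal"
    # Difference between acquisition time and service index generation time.
    delay = now - indexes["service"]
    return "ok" if delay <= first_threshold_s else "timeout"
```

Passing `now` explicitly keeps the function deterministic for testing; in a deployed program it would default to the current time.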
Referring to fig. 2, in a specific implementation, the self-monitoring system is deployed on a Serverless service and configured to be triggered once every n minutes. The task flow by which the self-monitoring system monitors the monitoring alarm system is as follows:
1. Check whether the three alarm indexes exist on Prometheus at the same time. If not, the alarm is abnormal. If so, compare the timestamp of the service alarm index with the current time and calculate the difference; if the difference does not exceed a first threshold, the alarm trigger has not timed out, otherwise it has timed out;
2. Check whether the database of the monitoring alarm system contains trigger records for the three alarms, and compare the timestamp at which the record was inserted into the database with the timestamp of the service alarm index to calculate the difference; if the difference does not exceed a second threshold, the alarm trigger has not timed out, otherwise it has timed out;
3. Check whether the notification system has successfully sent the three alarms through each channel;
4. Check the receipt record table of the notification system: for notification modes whose channels provide receipts (short messages and telephone calls), check the receipt records to judge whether the notification was successfully delivered to the receiver's terminal;
5. If any of the above steps is abnormal, notify the relevant responsible person through the notification mode of the cloud monitoring service, thereby realizing full-link monitoring of the alarm process.
The detection process uses a pipeline mode: the whole monitoring alarm link is broken down into several stages and the state of each stage is polled, so that the whole link is checked. This not only confirms whether the alarm notification reaches the receiver, but also locates the specific step that failed.
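The pipeline idea can be sketched as a list of named checks that are polled in order, with the failing stage names collected so the faulty step can be located. This is an illustrative sketch; the stage names and the `failed_stages` helper are hypothetical, and each check is represented by a zero-argument callable that returns `True` when its stage is healthy.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]

def failed_stages(stages: List[Stage]) -> List[str]:
    """Poll every stage and return the names of those that are abnormal."""
    failures = []
    for name, check in stages:
        try:
            healthy = check()
        except Exception:
            # A check that cannot run at all is treated as an abnormal stage.
            healthy = False
        if not healthy:
            failures.append(name)
    return failures
```

A caller would notify the responsible person whenever the returned list is non-empty, and the list itself identifies which part of the link broke.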
In the above embodiment, the method for the monitoring alarm system to periodically pull the monitoring index from the self-monitoring system according to the trigger frequency, and compare the monitoring index with the preset threshold to determine whether the alarm trigger state of the self-monitoring system is normal includes:
the monitoring alarm system periodically pulls monitoring indexes from the self-monitoring system according to the trigger frequency, wherein the monitoring indexes comprise one or more of the total number of calls, the number of failed calls, the call processing time and the call memory consumption of the self-monitoring system within one trigger-frequency period; the monitoring alarm system compares each pulled monitoring index with its corresponding preset threshold; when any index exceeds its preset threshold, the alarm trigger state of the self-monitoring system is considered abnormal, and otherwise it is considered normal.
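The one-to-one comparison can be sketched as a rule table. The limits used here (0 calls, more than 0 failed calls, 30 s, 500M) are the example values given in this embodiment; the metric names are hypothetical, and reading "500M" as megabytes is an assumption.

```python
from typing import Callable, Dict, List

# Metric name -> predicate that returns True when the value is abnormal.
RULES: Dict[str, Callable[[float], bool]] = {
    "total_calls": lambda v: v == 0,       # the program was never invoked
    "error_calls": lambda v: v > 0,        # at least one failed call
    "duration_seconds": lambda v: v > 30,  # function execution too slow
    "memory_mb": lambda v: v > 500,        # memory consumption too high
}

def abnormal_metrics(metrics: Dict[str, float]) -> List[str]:
    """Return the names of pulled metrics that violate their rule."""
    return [name for name, is_abnormal in RULES.items()
            if name in metrics and is_abnormal(metrics[name])]

def self_monitor_state(metrics: Dict[str, float]) -> str:
    # Any single violated rule makes the alarm trigger state abnormal.
    return "abnormal" if abnormal_metrics(metrics) else "normal"
```

Note that the zero-calls rule is a lower bound rather than an exceeded threshold, which is why each rule is a predicate instead of a single numeric limit.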
Referring to fig. 3, in specific implementation, the task flow of the monitoring and warning system monitoring self-monitoring system is as follows:
1. prometheus draws metrics from a monitored system, including: the total number of calls in n minutes, the number of call errors in n minutes, the call processing time (function execution time) and the memory consumption;
2. prometheus configures an alarm of a self-monitoring system, and alarms when any one or more conditions of the total calling times within n minutes being 0, the calling error times within n minutes being more than 0, the function execution time being more than 30s and the memory consumption being more than 500M occur;
3. prometheus triggers an alarm, informs an alert manager to perform de-duplication inhibition on the silent operation of an alarm group, reduces the disturbance rate of a user for receiving the alarm notification, and then calls a monitoring alarm system; illustratively, grouping is to assemble the alarms of multiple database instances into one alarm, inhibiting means that if a certain machine hangs, all applications deployed in the machine do not alarm and only alarm the machine to hang, and silencing means that no alarm is given on a non-working day and only an alarm is given on a working day.
4. The monitoring alarm system searches for a corresponding receiver and selects different notification modes according to the alarm level;
5. and calling a notification system to transmit the alarm notification to the terminal of the receiver in a selected notification mode.
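Illustratively, the alarm conditions of step 2 can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the names `SelfMonitorMetrics`, `violated_conditions`, and `trigger_state` are hypothetical and do not appear in the original, the memory figure is assumed to be in MB, and in the described system the conditions would be configured as Prometheus alarms rather than evaluated in application code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelfMonitorMetrics:
    """Metrics pulled from the self-monitoring system for one n-minute window."""
    total_calls: int          # total number of calls in the window
    error_calls: int          # number of failed calls in the window
    execution_seconds: float  # function execution time
    memory_mb: float          # memory consumption (assumed unit: MB)


def violated_conditions(m: SelfMonitorMetrics) -> List[str]:
    """Return the names of the alarm conditions that hold for these metrics."""
    violations = []
    if m.total_calls == 0:          # the self-monitoring system did not run at all
        violations.append("no_calls")
    if m.error_calls > 0:           # at least one run failed
        violations.append("call_errors")
    if m.execution_seconds > 30:    # a run took longer than 30 s
        violations.append("slow_execution")
    if m.memory_mb > 500:           # a run used more than 500 MB
        violations.append("high_memory")
    return violations


def trigger_state(m: SelfMonitorMetrics) -> str:
    """Any violated condition makes the alarm trigger state abnormal."""
    return "abnormal" if violated_conditions(m) else "normal"
```

For example, a window with 5 calls, no errors, 1.2 s execution time, and 128 MB of memory yields "normal", while a window with 0 calls yields "abnormal".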
In the above embodiment, after the alarm trigger result is periodically sent to the set receiver according to the trigger frequency, based on the judgment results for the alarm trigger time and the alarm trigger state, the method further includes:
The self-monitoring system periodically checks the receipt record table of the notification system and judges whether the alarm trigger result was successfully sent to the specified receiver; if the alarm trigger result was not successfully sent to the preset receiver, it is sent again, this time to a responsible person configured in advance in the notification system.
In a specific implementation, the receiver is a user, and the responsible person is the person in charge of the related system. When the alarm trigger result cannot be delivered normally to the preset receiver, the anomaly is promptly reported to the responsible person so that they can troubleshoot and maintain the related system.
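Illustratively, the receipt check and resend can be sketched in Python as follows. The `Receipt` record, the `escalate_undelivered` function, and the `send` callback are hypothetical names introduced only for illustration; the original does not specify the layout of the receipt record table or the interface of the notification system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Receipt:
    """One row of a (hypothetical) receipt record table."""
    alarm_id: str
    receiver: str
    delivered: bool  # True if the channel confirmed delivery


def escalate_undelivered(
    receipts: Iterable[Receipt],
    responsible_person: str,
    send: Callable[[str, str], None],
) -> List[str]:
    """Resend every undelivered alarm to the responsible person.

    `send(recipient, alarm_id)` stands in for the notification system.
    Returns the ids of the alarms that were resent.
    """
    resent = []
    for receipt in receipts:
        if not receipt.delivered:
            send(responsible_person, receipt.alarm_id)
            resent.append(receipt.alarm_id)
    return resent
```

In use, the self-monitoring system would call such a function once per trigger period with the rows written since the previous check.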
Example two
The embodiment provides a full link monitoring device for monitoring an alarm system, which includes:
the deployment unit is used for respectively deploying the monitoring alarm system and the self-monitoring system in different environments so as to enable the monitoring alarm system and the self-monitoring system to monitor each other;
the configuration unit is used for configuring the alarm indexes of the monitoring alarm system and the trigger frequency of the self-monitoring system;
the first monitoring unit is used for periodically acquiring alarm indexes from the monitoring alarm system through the self-monitoring system according to the trigger frequency, and judging whether the alarm trigger time of the monitoring alarm system has timed out according to the timestamp at which the service alarm index among the alarm indexes was generated and the current timestamp;
the second monitoring unit is used for pulling a monitoring index from the self-monitoring system periodically according to the trigger frequency through the monitoring alarm system, and comparing the monitoring index with a preset threshold value to judge whether the alarm trigger state of the self-monitoring system is normal or not;
and the sending detection unit is used for sending the alarm triggering result to a set receiver periodically according to the triggering frequency based on the alarm triggering time and the judgment result of the alarm triggering state.
Compared with the prior art, the beneficial effects of the full link monitoring device for monitoring the alarm system provided by the invention are the same as the beneficial effects of the full link monitoring method for monitoring the alarm system provided by the technical scheme, and the detailed description is omitted here.
In the above embodiment, the method for respectively deploying the monitoring alarm system and the self-monitoring system in different environments includes:
the monitoring alarm system is deployed on a target machine, and the self-monitoring system is deployed on a serverless architecture of a cloud service.
Illustratively, the monitoring alarm system comprises a Prometheus system, and the serverless architecture of the cloud service is a Serverless service.
In a specific implementation, the monitoring alarm system comprises Prometheus and Alertmanager, which are both deployed on the target machine, are communicatively connected to it, and are built on the same infrastructure as the target machine, so that the monitoring alarm system can continuously monitor the target machine, raise alarms, and collect alarm indexes. The self-monitoring system is deployed on the serverless architecture of the cloud service, namely the Serverless service, and is communicatively connected to Prometheus, so that it can periodically acquire the alarm indexes through Prometheus and thereby monitor the alarm trigger time of the monitoring alarm system. Preferably, the self-monitoring system is a self-monitoring program.
It can be seen that, in this embodiment, the monitoring alarm system and the self-monitoring system are deployed in different environments, with the self-monitoring system on the Serverless service, which effectively reduces the risk that both systems fail at the same time.
In the above embodiment, the method for configuring the alarm indexes of the monitoring alarm system and the trigger frequency of the self-monitoring system includes:
The configured alarm indexes of the monitoring alarm system comprise hardware alarm indexes, service alarm indexes, and application alarm indexes. The configured trigger frequency of the self-monitoring system is the frequency at which the hardware alarm indexes, service alarm indexes, and application alarm indexes are periodically obtained from the monitoring alarm system.
In a specific implementation, three alarm categories are configured in the monitoring alarm system: a hardware alarm, a Tomcat application alarm, and a service alarm. The hardware alarm monitors the resource consumption of the target machine's server, such as CPU, memory, and network traffic, and fires when the consumption is greater than a threshold. The application alarm monitors the JVM state of the target machine and checks whether the application health check interface is normal; it fires when the state is normal. The service alarm is based on an index that the Tomcat application actively reports with the current timestamp as its value; it fires when the index is greater than 0.
Generally speaking, when the target machine is working normally, the hardware alarm index, the service alarm index, and the application alarm index are all greater than the threshold of 0, so all three alarms are triggered continuously. The self-monitoring system can therefore periodically check whether the three alarms are delivered normally to the receiver's terminal. Delivery modes include SMS alerts, e-mail alerts, messaging-software alerts, and the like.
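Illustratively, the idea that the three alarms fire continuously while the target machine is healthy can be sketched in Python as follows. The function names and parameters are hypothetical, and CPU usage is used here as the only hardware index for brevity; the original also mentions memory and network traffic.

```python
from typing import Dict, List


def heartbeat_alarms(
    cpu_percent: float, health_check_ok: bool, service_timestamp: float
) -> Dict[str, bool]:
    """Evaluate the three always-firing alarms for one scrape.

    Each alarm is expected to fire while the target machine is healthy,
    so a silent alarm signals a problem in the alarm link.
    """
    return {
        "hardware": cpu_percent > 0,         # resource consumption above the threshold 0
        "application": health_check_ok,      # fires when the health check state is normal
        "service": service_timestamp > 0,    # reported value is the current timestamp
    }


def missing_alarms(firing: Dict[str, bool]) -> List[str]:
    """Return the alarms that should be firing but are not."""
    return sorted(name for name, is_firing in firing.items() if not is_firing)
```

Because a healthy machine always produces all three alarms, an empty result from `missing_alarms` is the expected steady state.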
In the above embodiment, the method by which the self-monitoring system periodically obtains the alarm indexes from the monitoring alarm system according to the trigger frequency, and judges whether the alarm trigger time of the monitoring alarm system has timed out according to the timestamp at which the service alarm index among the alarm indexes was generated and the current timestamp, includes:
The self-monitoring system checks whether the acquired alarm indexes include the hardware alarm index, the service alarm index, and the application alarm index at the same time. When all three exist, the self-monitoring system determines the time difference between the timestamp at which the service alarm index was generated and the current timestamp obtained by the self-monitoring system. When the time difference does not exceed the first threshold, the alarm trigger time of the monitoring alarm system is judged not to have timed out; otherwise, it is judged to have timed out.
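Illustratively, this presence check and timeout judgment can be sketched in Python as follows. The function name `check_trigger_time`, the index names, and the mapping layout are assumptions made for illustration; the value of the service alarm index is taken to be the timestamp at which it was generated, in seconds.

```python
from typing import Mapping

REQUIRED_INDEXES = ("hardware", "service", "application")


def check_trigger_time(
    indexes: Mapping[str, float], now: float, first_threshold: float
) -> str:
    """Judge the alarm trigger time from one set of acquired alarm indexes.

    `indexes` maps an index name to its value; the value of "service" is the
    timestamp (in seconds) at which the service alarm index was generated.
    Returns "abnormal" if any of the three indexes is missing, "timeout" if
    the service index is older than `first_threshold` seconds, else "ok".
    """
    if any(name not in indexes for name in REQUIRED_INDEXES):
        return "abnormal"
    time_difference = now - indexes["service"]
    return "ok" if time_difference <= first_threshold else "timeout"
```

With a first threshold of 60 seconds, a service index generated 30 seconds ago is judged not to have timed out, and one generated 100 seconds ago is judged to have timed out.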
In a specific implementation, the self-monitoring system is deployed on a Serverless service, and its trigger is configured to execute once every n minutes. The task flow by which the self-monitoring system monitors the monitoring alarm system is as follows:
1. Check whether the three alarm indexes exist on Prometheus at the same time. If not, the alarm is abnormal. If so, compare the timestamp of the service alarm index with the current time and calculate the difference; if the difference does not exceed a first threshold, the alarm trigger has not timed out, and otherwise it has timed out;
2. Check whether the database of the monitoring alarm system holds trigger records for the three alarms, and compare the timestamp at which the record was inserted into the database with the timestamp of the service alarm index to calculate the difference; if the difference does not exceed a second threshold, the alarm trigger has not timed out, and otherwise it has timed out;
3. Check whether the notification system successfully sent the three alarms through each channel;
4. Check the receipt record table of the notification system: for notification modes whose channels return receipts (SMS and telephone), check the receipt records to judge whether the notification was successfully delivered to the receiver's terminal;
5. If any of the above steps is abnormal, notify the relevant responsible person through the notification mode of the cloud monitoring service, thereby achieving full link monitoring of the alarm process.
The detection process uses a pipeline approach: it breaks the whole monitoring and alarm link into several stages and polls the state of each stage. This checks the full link, confirming not only whether the alarm notification reached the receiver but also locating the specific stage that failed.
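Illustratively, the pipeline-style detection can be sketched in Python as follows. The stage names and the `first_failed_stage` function are hypothetical; each check is represented as a callable returning True when its stage is healthy, and the real checks (querying Prometheus, the database, and the notification system) are outside the scope of this sketch.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a check that returns True when the stage is healthy.
Stage = Tuple[str, Callable[[], bool]]


def first_failed_stage(stages: List[Stage]) -> Optional[str]:
    """Poll the stages in link order and return the first one that fails.

    Returns None when every stage passes, i.e. the full link is healthy.
    """
    for name, check in stages:
        if not check():
            return name
    return None


# Example link, in the order an alarm travels through it.
example_stages: List[Stage] = [
    ("prometheus_indexes", lambda: True),   # three alarm indexes present, not timed out
    ("database_records", lambda: True),     # trigger records written in time
    ("notification_sent", lambda: False),   # a channel failed to send
    ("receipt_confirmed", lambda: True),    # receipts show delivery
]
```

Polling in link order means the returned name identifies the earliest failing stage, which is the one the responsible person would be notified about.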
In the above embodiment, the method by which the monitoring alarm system periodically pulls monitoring indexes from the self-monitoring system according to the trigger frequency and compares them with preset thresholds to judge whether the alarm trigger state of the self-monitoring system is normal includes:
The monitoring alarm system periodically pulls monitoring indexes from the self-monitoring system according to the trigger frequency. The monitoring indexes comprise one or more of the total number of calls, the number of failed calls, the call processing time, and the call memory consumption of the self-monitoring system within one trigger period. The monitoring alarm system compares each pulled monitoring index with its corresponding preset threshold; when any index exceeds its preset threshold, the alarm trigger state of the self-monitoring system is considered abnormal, and otherwise it is considered normal.
In a specific implementation, the task flow by which the monitoring alarm system monitors the self-monitoring system is as follows:
1. Prometheus pulls metrics from the self-monitoring system, including: the total number of calls in n minutes, the number of call errors in n minutes, the call processing time (function execution time), and the memory consumption;
2. Prometheus is configured with alarms for the self-monitoring system and raises an alarm when any one or more of the following conditions occur: the total number of calls within n minutes is 0, the number of call errors within n minutes is greater than 0, the function execution time is greater than 30 s, or the memory consumption is greater than 500 MB;
3. Prometheus triggers an alarm and notifies Alertmanager, which performs grouping, deduplication, inhibition, and silencing on the alarms to reduce how often users are disturbed by alarm notifications, and then calls the monitoring alarm system. Illustratively, grouping means merging the alarms of multiple database instances into one alarm; inhibition means that if a machine goes down, the applications deployed on that machine do not raise alarms and only the machine-down alarm is raised; and silencing means that alarms are sent only on working days and not on non-working days;
4. The monitoring alarm system looks up the corresponding receiver and selects a notification mode according to the alarm level;
5. The notification system is called to deliver the alarm notification to the receiver's terminal in the selected notification mode.
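Illustratively, the three noise-reduction operations of step 3 can be sketched in plain Python as follows. This is not how Alertmanager is implemented or configured; the alarm dictionaries, the field names, and the choice of Saturday and Sunday as non-working days are assumptions made only to illustrate the behaviour described above.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List

Alarm = Dict[str, str]  # hypothetical fields: "name", "machine", "instance"


def silence(alarms: List[Alarm], today: date) -> List[Alarm]:
    """Drop all alarms on non-working days (assumed: Saturday and Sunday)."""
    return [] if today.weekday() >= 5 else list(alarms)


def inhibit(alarms: List[Alarm]) -> List[Alarm]:
    """If a machine is down, keep only its machine-down alarm."""
    down_machines = {a["machine"] for a in alarms if a["name"] == "machine_down"}
    return [
        a for a in alarms
        if a["name"] == "machine_down" or a["machine"] not in down_machines
    ]


def group(alarms: List[Alarm]) -> Dict[str, List[str]]:
    """Merge alarms with the same name into one entry listing the instances."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for a in alarms:
        groups[a["name"]].append(a["instance"])
    return dict(groups)
```

Applied in the order silence, inhibit, group, two slow-database alarms on different instances become a single grouped alarm, and an application alarm on a machine that is down is suppressed in favour of the machine-down alarm.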
In the above embodiment, after the alarm trigger result is periodically sent to the set receiver according to the trigger frequency, based on the judgment results for the alarm trigger time and the alarm trigger state, the method further includes:
The self-monitoring system periodically checks the receipt record table of the notification system and judges whether the alarm trigger result was successfully sent to the specified receiver; if the alarm trigger result was not successfully sent to the preset receiver, it is sent again, this time to a responsible person configured in advance in the notification system.
In a specific implementation, the receiver is a user, and the responsible person is the person in charge of the related system. When the alarm trigger result cannot be delivered normally to the preset receiver, the anomaly is promptly reported to the responsible person so that they can troubleshoot and maintain the related system.
Example three
The present embodiment provides a computer-readable storage medium having stored thereon a computer program, which, when being executed by a processor, performs the steps of the above-mentioned full link monitoring method for monitoring an alarm system.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment are the same as the beneficial effects of the full-link monitoring method for monitoring an alarm system provided by the above technical solution, and are not described herein again.
It will be understood by those skilled in the art that all or part of the steps of the method of the invention may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the above embodiment. The storage medium may be ROM/RAM, a magnetic disk, an optical disc, a memory card, or the like.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A full link monitoring method for monitoring an alarm system, comprising:
respectively deploying a monitoring alarm system and a self-monitoring system in different environments to enable the monitoring alarm system and the self-monitoring system to monitor each other;
configuring an alarm index of a monitoring alarm system and a trigger frequency of a self-monitoring system;
the self-monitoring system periodically acquires alarm indexes from the monitoring alarm system according to the trigger frequency, and judges whether the alarm trigger time of the monitoring alarm system has timed out according to the timestamp at which the service alarm index among the alarm indexes was generated and the current timestamp;
the monitoring alarm system periodically pulls a monitoring index from the self-monitoring system according to the trigger frequency, compares the monitoring index with a preset threshold value and judges whether the alarm trigger state of the self-monitoring system is normal or not;
and sending alarm triggering results to a set receiver periodically according to the triggering frequency based on the alarm triggering time and the judgment result of the alarm triggering state.
2. The method of claim 1, wherein respectively deploying the monitoring alarm system and the self-monitoring system in different environments comprises:
the monitoring alarm system is deployed on a target machine, and the self-monitoring system is deployed on a serverless architecture of a cloud service.
3. The method of claim 1, wherein configuring the alarm indicators of the monitoring alarm system and the trigger frequency of the self-monitoring system comprises:
the configured alarm indexes of the monitoring alarm system comprise hardware alarm indexes, service alarm indexes and application program alarm indexes;
the configured trigger frequency of the self-monitoring system refers to that a hardware alarm index, a service alarm index and an application alarm index are obtained from the monitoring alarm system regularly according to the trigger frequency.
4. The method of claim 3, wherein the self-monitoring system periodically acquiring the alarm indexes from the monitoring alarm system according to the trigger frequency, and judging whether the alarm trigger time of the monitoring alarm system has timed out according to the timestamp at which the service alarm index among the alarm indexes was generated and the current timestamp, comprises:
the self-monitoring system checks whether the acquired alarm indexes simultaneously have hardware alarm indexes, business alarm indexes and application program alarm indexes, and the self-monitoring system determines the time difference between the timestamp generated by the business alarm indexes and the current timestamp acquired by the self-monitoring system when the alarm indexes simultaneously exist;
and when the time difference value does not exceed the first threshold value, judging that the alarm triggering time of the monitoring alarm system is not overtime, otherwise, judging that the alarm triggering time of the monitoring alarm system is overtime.
5. The method of claim 4, wherein the self-monitoring system periodically acquiring the alarm indexes from the monitoring alarm system according to the trigger frequency, and judging whether the alarm trigger time of the monitoring alarm system has timed out according to the timestamp at which the service alarm index among the alarm indexes was generated and the current timestamp, further comprises:
the self-monitoring system checks whether alarm trigger records of the hardware alarm index, the service alarm index and the application program alarm index exist in a database of the monitoring alarm system, and determines the time difference between the timestamp at which the service alarm index in the database was generated and the timestamp at which the alarm trigger record was written into the database;
and when the time difference does not exceed the second threshold, judging that the alarm triggering time of the monitoring alarm system is not overtime, otherwise, judging that the alarm triggering time of the monitoring alarm system is overtime.
6. The method of claim 5, wherein the monitoring alarm system periodically pulls a monitoring index from the self-monitoring system according to the trigger frequency, and the method for comparing the monitoring index with the preset threshold to determine whether the alarm trigger state of the self-monitoring system is normal comprises:
the monitoring alarm system periodically pulls monitoring indexes from the self-monitoring system according to the trigger frequency, wherein the monitoring indexes comprise one or more of the total calling times, wrong calling times, calling processing time and calling memory consumption of the self-monitoring system in a trigger frequency period;
the monitoring alarm system compares and judges the pulled monitoring indexes with preset thresholds in a one-to-one correspondence mode, when any index exceeds the preset threshold, the alarm triggering state of the self-monitoring system is considered to be abnormal, and otherwise, the alarm triggering state of the self-monitoring system is considered to be normal.
7. The method according to any one of claims 1-6, wherein after periodically sending the alarm trigger result to the set receiver according to the trigger frequency based on the alarm trigger time and the judgment result of the alarm trigger state, the method further comprises:
the self-monitoring system periodically checks a receipt record table of the notification system and judges whether an alarm triggering result is successfully sent to a specified receiver;
and if the judgment result is that the alarm trigger result is not successfully sent to the preset receiver, the alarm trigger result is sent to a responsible person configured in advance in the notification system again.
8. The method according to any one of claims 1-6, wherein the monitoring alarm system comprises a Prometheus system, and the serverless architecture of the cloud service is a Serverless service.
9. A full link monitoring system for monitoring an alarm system, comprising:
the deployment unit is used for respectively deploying the monitoring alarm system and the self-monitoring system in different environments so as to enable the monitoring alarm system and the self-monitoring system to monitor each other;
the configuration unit is used for configuring the alarm indexes of the monitoring alarm system and the trigger frequency of the self-monitoring system;
the first monitoring unit is used for periodically acquiring alarm indexes from the monitoring alarm system through the self-monitoring system according to the trigger frequency, and judging whether the alarm trigger time of the monitoring alarm system has timed out according to the timestamp at which the service alarm index among the alarm indexes was generated and the current timestamp;
the second monitoring unit is used for pulling a monitoring index from the self-monitoring system periodically according to the trigger frequency through the monitoring alarm system, and comparing the monitoring index with a preset threshold value to judge whether the alarm trigger state of the self-monitoring system is normal or not;
and the sending detection unit is used for sending the alarm triggering result to a set receiver periodically according to the triggering frequency based on the alarm triggering time and the judgment result of the alarm triggering state.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of the claims 1 to 8.
CN202110612028.0A 2021-06-02 2021-06-02 Full link monitoring method and device for monitoring alarm system Active CN113381884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110612028.0A CN113381884B (en) 2021-06-02 2021-06-02 Full link monitoring method and device for monitoring alarm system

Publications (2)

Publication Number Publication Date
CN113381884A true CN113381884A (en) 2021-09-10
CN113381884B CN113381884B (en) 2023-01-31

Family

ID=77575325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110612028.0A Active CN113381884B (en) 2021-06-02 2021-06-02 Full link monitoring method and device for monitoring alarm system

Country Status (1)

Country Link
CN (1) CN113381884B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN203763556U (en) * 2014-02-24 2014-08-13 广东宝莱特医用科技股份有限公司 System with double monitoring functions
CN105974906A (en) * 2016-05-12 2016-09-28 深圳市中工巨能科技有限公司 Double monitoring-activating measurement and control device
WO2018028573A1 (en) * 2016-08-12 2018-02-15 中兴通讯股份有限公司 Method and device for fault handling, and controller
CN106776243A (en) * 2016-12-30 2017-05-31 中国银联股份有限公司 A kind of monitoring method and device for monitoring software
CN111083003A (en) * 2018-10-22 2020-04-28 中兴通讯股份有限公司 Monitoring system and method, storage medium and processor
WO2021073433A1 (en) * 2019-10-16 2021-04-22 平安科技(深圳)有限公司 Monitoring method and device, server, and storage medium
CN111581060A (en) * 2020-05-11 2020-08-25 金蝶软件(中国)有限公司 Prometheus-based log alarm system and method and related equipment
CN111949483A (en) * 2020-08-13 2020-11-17 星辰天合(北京)数据科技有限公司 Monitoring device and monitoring system
CN112732536A (en) * 2020-12-30 2021-04-30 平安科技(深圳)有限公司 Data monitoring and alarming method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
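Dong Lei et al.: "Design and Implementation of a Monitoring and Alarm Service for an Industrial Cloud PaaS Platform" (工业云PaaS平台监控告警服务的设计实现), Information Technology and Informatization (《信息技术与信息化》) *
Ma Yong et al.: "Design and Implementation of Prometheus-Based Full-Link Monitoring of Basic Software and Hardware" (基于Prometheus的基础软硬件全链路监控设计和实现), Electronic Technology & Software Engineering (《电子技术与软件工程》) *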

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115118581A (en) * 2022-06-27 2022-09-27 广东长天思源环保科技股份有限公司 Internet of things data full-link monitoring and intelligent security system based on 5G
CN115118581B (en) * 2022-06-27 2024-04-12 广东长天思源环保科技股份有限公司 Internet of things data all-link monitoring and intelligent guaranteeing system based on 5G

Also Published As

Publication number Publication date
CN113381884B (en) 2023-01-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant