CN117411801A - Aviation Internet all-link monitoring and alarming system based on grafana - Google Patents

Info

Publication number
CN117411801A
Authority
CN
China
Prior art keywords
module
alarm
grafana
data
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311346966.6A
Other languages
Chinese (zh)
Inventor
易斌
冯世清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Air Space Internet Technology Co ltd
Original Assignee
Air Space Internet Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Air Space Internet Technology Co ltd
Priority to CN202311346966.6A
Publication of CN117411801A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G5/00 Traffic control systems for aircraft, e.g. air-traffic control [ATC]
    • G08G5/0073 Surveillance aids
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0882 Utilisation of link capacity

Abstract

The invention relates to the technical field of aviation Internet all-link monitoring and alarming, and in particular to a Grafana-based aviation Internet all-link monitoring and alarming system. The system comprises an alarm collection module, an alarm notification module, an alarm processing module, a disaster recovery and fault recovery module, a data acquisition module, a data display module based on the Grafana visualization function, an alarm setting module, a database, a data analysis and visualization tool, a user authority management module, a report module, a fault exercise and simulation module, an automatic operation and maintenance support module, an SLA monitoring and reporting module, a fault tolerance and disaster backup support module, a multi-cluster management module and an expansion module.

Description

Aviation Internet all-link monitoring and alarming system based on grafana
Technical Field
The invention relates to the technical field of aviation Internet all-link monitoring and alarming, in particular to an aviation Internet all-link monitoring and alarming system based on grafana.
Background
In order to ensure the normal operation of an aviation system, real-time monitoring is usually performed by means of an aviation internet monitoring system.
A search of the prior art shows that existing aviation Internet monitoring systems have complex interfaces and lack an active alarm function: when such a system detects a fault, the user has to discover it on their own initiative, which increases the user's workload. An aviation Internet all-link monitoring and alarming system capable of active alarming is therefore needed.
Disclosure of Invention
The invention aims to provide an aviation Internet all-link monitoring alarm system based on grafana, which can actively alarm.
In order to solve the problems, the technical scheme provided by the invention is as follows: the aviation Internet all-link monitoring alarm system based on grafana comprises an alarm collection module, an alarm notification module, an alarm processing module, a disaster tolerance and fault recovery module, a data acquisition module, a data display module based on grafana visualization function, an alarm setting module, a database, a data analysis and visualization tool, a user authority management module, a report module, a fault exercise and simulation module, an automatic operation and maintenance support module, an SLA monitoring and reporting module, a fault tolerance and disaster backup support module, a multi-cluster management module and an expansion module;
the alarm collection module collects alarm information of servers, switches, routers and firewalls by means of the node_exporter, win_exporter, blackbox_exporter and redis_exporter tools;
the alarm notification module sends notifications by e-mail, short message (SMS) and DingTalk, and displays the alarm information centrally on a dashboard of the data display module;
the alarm processing module is used for identifying and confirming alarm information, determining specific equipment or nodes, analyzing the reason of alarm generation and recording the alarm information, the processing process and the solution;
the disaster recovery and fault recovery module comprises a backup system, redundant equipment and a disaster recovery plan;
the data acquisition module acquires key data indexes in real time through a data source, wherein the key data indexes comprise the state of aviation equipment, service response time and network bandwidth utilization rate;
the data display module graphically displays the acquired data indexes based on the grafana visual function;
the alarm setting module allows a user to configure alarm rules according to own requirements, wherein the alarm rules comprise thresholds and trend changes of key data indexes;
the data analysis and visualization tools allow users to analyze historical data, create trend charts, and statistical charts;
the user authority management module is used for user authority management and can set authority levels of different users or user groups;
the report module is used for generating a report and a custom report template;
the fault drilling and simulating module is used for evaluating the response and recovery capacity of the system under various fault scenes;
the automatic operation and maintenance support module is integrated with the Ansible and Puppet automated operation and maintenance tools to provide automated operation and maintenance support;
the SLA monitoring and reporting module sets monitoring indexes according to a Service Level Agreement (SLA) of the aviation Internet system and generates a periodic SLA monitoring report;
the fault-tolerant and disaster-tolerant backup support module automatically switches to the backup node under the condition of main node faults or other anomalies;
the expansion module is used for adding a data source plug-in, an alarm notification plug-in and a data processing plug-in;
the multi-cluster management module supports monitoring and managing multiple aviation Internet clusters.
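By way of illustration only, the alarm rules of the alarm setting module (a threshold plus a trend change on a key data index) can be sketched in Python. The `AlarmRule` type, its field names and the sample values are assumptions made for this sketch; the patent does not specify a rule format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical rule: fire on an absolute threshold or on a rapid rise."""
    metric: str
    threshold: float   # fire when the latest sample exceeds this value
    max_rise: float    # fire when the rise across the window exceeds this


def evaluate(rule: AlarmRule, samples: Sequence[float]) -> list[str]:
    """Return the reasons (possibly none) why the rule fires for these samples."""
    reasons = []
    if not samples:
        return reasons
    if samples[-1] > rule.threshold:
        reasons.append("threshold")
    if samples[-1] - samples[0] > rule.max_rise:
        reasons.append("trend")
    return reasons


# Illustrative values: network bandwidth utilization in percent over one window.
rule = AlarmRule(metric="bandwidth_utilization", threshold=90.0, max_rise=30.0)
print(evaluate(rule, [40.0, 55.0, 75.0]))   # rises by 35 points, stays below 90
print(evaluate(rule, [85.0, 88.0, 93.0]))   # exceeds 90, rises by only 8 points
```

Keeping the two conditions separate lets a rule report why it fired, which is useful when the same index is watched for both saturation and sudden change.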
As an improvement, the alarm information comprises hardware faults, network anomalies and service interruptions, and the node_exporter, win_exporter, blackbox_exporter and redis_exporter tools collect the alarm information from servers, switches, routers and firewalls and classify, filter and process it.
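The classification and filtering step can be pictured with a minimal Python sketch. The keyword table, the device names and the duplicate-suppression policy below are hypothetical; the patent names only the three categories and does not say how messages are matched.

```python
from collections import Counter

# Hypothetical keyword table; only the three category names come from the text.
CATEGORY_KEYWORDS = {
    "hardware_fault": ("disk", "fan", "power supply", "temperature"),
    "network_anomaly": ("packet loss", "latency", "link down"),
    "service_interruption": ("unreachable", "timeout", "http 5"),
}


def classify(message: str) -> str:
    """Map a raw alarm message to one of the three categories, else 'other'."""
    text = message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def filter_duplicates(alarms):
    """Drop repeated (source, message) pairs while keeping first-seen order."""
    seen = set()
    unique = []
    for source, message in alarms:
        key = (source, message)
        if key not in seen:
            seen.add(key)
            unique.append((source, message))
    return unique


raw = [
    ("router-1", "Link down on port 3"),
    ("server-2", "Disk failure predicted"),
    ("router-1", "Link down on port 3"),          # duplicate, filtered out
    ("firewall-1", "Probe timeout on HTTPS check"),
]
unique = filter_duplicates(raw)
counts = Counter(classify(message) for _, message in unique)
print(counts)
```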
As an improvement, the alarm levels of the alarm notification module comprise severe, high, medium and low, wherein the severe level indicates that the system has a serious fault or cannot work normally, and immediate action is required to repair it;
the high level indicates that the system has a significant problem or degraded performance that requires timely attention and handling;
the medium level indicates that the system has a general problem or warning that requires processing at a later time;
the low level indicates that the system has a minor problem or warning that does not significantly affect overall operation.
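As a sketch of how the four levels could drive notification, the following Python fragment orders the levels and maps each to channels. The routing table is an assumption: the patent lists the channels (e-mail, short message, DingTalk) and the levels, but not which level is sent over which channel.

```python
from enum import IntEnum


class AlarmLevel(IntEnum):
    """The four levels named in the text, ordered so that max() picks the worst."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    SEVERE = 4


# Hypothetical routing policy, chosen only for illustration.
ROUTING = {
    AlarmLevel.SEVERE: ("sms", "dingtalk", "email"),
    AlarmLevel.HIGH: ("dingtalk", "email"),
    AlarmLevel.MEDIUM: ("email",),
    AlarmLevel.LOW: (),   # shown on the dashboard only
}


def channels_for(level: AlarmLevel) -> tuple:
    """Return the notification channels configured for a level."""
    return ROUTING[level]


def most_urgent(levels):
    """Return the highest level in a non-empty collection of levels."""
    return max(levels)


active = [AlarmLevel.LOW, AlarmLevel.HIGH, AlarmLevel.MEDIUM]
print(most_urgent(active).name, channels_for(most_urgent(active)))
```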
As an improvement, the processing procedure of the alarm processing module comprises the following steps:
step one, alarm identification: identifying and confirming the alarm information, and understanding the nature and scope of impact of the problem;
step two, alarm localization: determining the specific equipment or node concerned;
step three, alarm analysis: analyzing the cause of the alarm and adopting appropriate measures to solve the problem;
step four, alarm handling: taking corresponding repair measures according to the actual situation and restoring normal operation of the system;
step five, alarm recording: recording the alarm information, the processing procedure and the solution for subsequent analysis and reference.
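The five steps above can be sketched as an ordered workflow in which each step must be completed before the next. The `AlarmTicket` class, the stage names and the notes are hypothetical and serve only to illustrate the ordering and the record kept for later reference.

```python
from dataclasses import dataclass, field

# Stage names chosen for this sketch, one per step in the text.
STAGES = ("identified", "located", "analysed", "handled", "recorded")


@dataclass
class AlarmTicket:
    summary: str
    completed: list = field(default_factory=list)   # (stage, note) pairs

    @property
    def closed(self) -> bool:
        """True once all five stages have been completed."""
        return len(self.completed) == len(STAGES)

    def advance(self, stage: str, note: str) -> None:
        """Complete the next stage; reject stages taken out of order."""
        if self.closed:
            raise ValueError("ticket is already closed")
        expected = STAGES[len(self.completed)]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        self.completed.append((stage, note))


ticket = AlarmTicket("probe to cabin gateway failing")
notes = [
    "confirmed alarm, affects one aircraft link",
    "narrowed down to switch sw-07",
    "cause: failed uplink port",
    "moved uplink to spare port, service restored",
    "written to incident log",
]
for stage, note in zip(STAGES, notes):
    ticket.advance(stage, note)
print(ticket.closed)
```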
As an improvement, the data sources include an ARINC 429 bus, server logs and network traffic.
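For readers unfamiliar with the ARINC 429 data source, a rough Python sketch of splitting one 32-bit word into its fields follows. It assumes the common layout (label in bits 1 to 8, SDI in bits 9 to 10, data in bits 11 to 29, SSM in bits 30 to 31, odd parity in bit 32) with bit 1 taken as the least significant bit of the integer. The bit order in which a given receiver presents the label is an assumption here, and the patent does not describe how the bus is read.

```python
def encode_arinc429(label: int, sdi: int, data: int, ssm: int) -> int:
    """Pack fields into a 32-bit word and set bit 32 so that parity is odd."""
    word = (
        (label & 0xFF)
        | ((sdi & 0x3) << 8)
        | ((data & 0x7FFFF) << 10)
        | ((ssm & 0x3) << 29)
    )
    if bin(word).count("1") % 2 == 0:
        word |= 1 << 31
    return word


def decode_arinc429(word: int) -> dict:
    """Split a 32-bit word into its fields and check odd parity."""
    if not 0 <= word < 2**32:
        raise ValueError("ARINC 429 words are 32 bits wide")
    return {
        "label": word & 0xFF,                          # bits 1-8
        "sdi": (word >> 8) & 0x3,                      # bits 9-10
        "data": (word >> 10) & 0x7FFFF,                # bits 11-29 (19 bits)
        "ssm": (word >> 29) & 0x3,                     # bits 30-31
        "parity_ok": bin(word).count("1") % 2 == 1,    # odd parity over all 32 bits
    }


# Illustrative values only; labels are conventionally written in octal.
word = encode_arinc429(label=0o203, sdi=1, data=0x12345, ssm=3)
fields = decode_arinc429(word)
print(oct(fields["label"]), fields["sdi"], hex(fields["data"]), fields["parity_ok"])
```

A single flipped bit makes `parity_ok` false, which is the kind of condition a collector could surface as a data-quality alarm.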
The beneficial effects of the invention are as follows: the information collected by the alarm collection module is displayed through the data display module, so that users can check the running condition of the aviation system at any time, and the alarm notifications of the alarm notification module can be displayed directly and centrally. The alarm notification module sends notifications by e-mail, short message (SMS) and DingTalk, which realizes active alarming and helps users notice alarms promptly and handle them in time. The automatic operation and maintenance support module enables automated configuration, deployment and monitoring of the aviation Internet system, improving operation and maintenance efficiency and system stability.
Detailed Description
Preferred embodiments of the present invention are described in detail below.
The aviation Internet all-link monitoring alarm system based on grafana comprises an alarm collection module, an alarm notification module, an alarm processing module, a disaster tolerance and fault recovery module, a data acquisition module, a data display module based on grafana visualization function, an alarm setting module, a database, a data analysis and visualization tool, a user authority management module, a report module, a fault exercise and simulation module, an automatic operation and maintenance support module, an SLA monitoring and reporting module, a fault tolerance and disaster backup support module, a multi-cluster management module and an expansion module;
the alarm collection module collects alarm information of servers, switches, routers and firewalls by means of the node_exporter, win_exporter, blackbox_exporter and redis_exporter tools, wherein the alarm information comprises hardware faults, network anomalies and service interruptions, and the tools collect the alarm information from the servers, switches, routers and firewalls and classify, filter and process it;
the alarm notification module sends notifications by e-mail, short message (SMS) and DingTalk and displays the alarm information centrally on the dashboard of the data display module, and the alarm levels of the alarm notification module comprise severe, high, medium and low, wherein
the severe level indicates that the system has a serious fault or cannot work normally, and immediate action is required to repair it;
the high level indicates that the system has a significant problem or degraded performance that requires timely attention and handling;
the medium level indicates that the system has a general problem or warning that requires processing at a later time;
the low level indicates that the system has a minor problem or warning that does not significantly affect overall operation;
the alarm processing module is used for identifying and confirming alarm information, determining specific equipment or nodes, analyzing the reason of alarm generation and recording the alarm information, the processing process and the solution, and the processing process of the alarm processing module comprises the following steps:
step one, alarm identification: identifying and confirming the alarm information, and understanding the nature and scope of impact of the problem;
step two, alarm localization: determining the specific equipment or node concerned, so that the problem can be solved in a targeted manner;
step three, alarm analysis: analyzing the cause of the alarm and adopting appropriate measures to solve the problem;
step four, alarm handling: taking corresponding repair measures according to the actual situation and restoring normal operation of the system;
step five, alarm recording: recording the alarm information, the processing procedure and the solution to facilitate subsequent analysis and reference;
the disaster recovery and fault recovery module comprises a backup system, redundant equipment and a disaster recovery plan so as to ensure that the system can recover normal operation as soon as possible even if a fault or interruption occurs;
the data acquisition module acquires key data indexes in real time through a data source, wherein the data source comprises an ARINC 429 bus, a server log and network traffic, and the key data indexes comprise the state of aviation equipment, service response time and network bandwidth utilization rate;
the data display module graphically displays the acquired data indexes based on the grafana visual function so that a user can intuitively know the running condition of the aviation Internet system, and the user can customize a dashboard to flexibly configure the required monitoring indexes and display modes;
the alarm setting module allows a user to configure alarm rules according to own requirements, wherein the alarm rules comprise threshold values and trend changes of key data indexes, and when abnormal conditions are monitored, the system triggers an alarm and sends a notification to related personnel or team;
the data analysis and visualization tools allow users to analyze historical data, create trend graphs and statistical graphs, and users can get in depth knowledge of the performance conditions of the system, find potential problems and bottlenecks, and make reasonable decisions and improvement measures;
the user authority management module is used for user authority management and can set authority levels for different users or user groups; through authority management, users' access to the monitoring data and to the settings can be restricted, ensuring the security and privacy of the data;
the report module is used for generating a report and a custom report template, and a user can summarize and present key performance indexes, alarm event records and trend analysis through the report function, so that decision support and service visualization are provided for a management layer;
the fault drilling and simulating module is used for evaluating the response and recovery capacity of the system under various fault scenes, and improving the emergency response plan and the elasticity of the system by simulating faults and analyzing drilling results;
the automatic operation and maintenance support module is integrated with the Ansible and Puppet automated operation and maintenance tools to provide automated operation and maintenance support, through which automated configuration, deployment and monitoring of the aviation system can be realized, improving operation and maintenance efficiency and system stability;
the SLA monitoring and reporting module sets monitoring indexes according to a Service Level Agreement (SLA) of the aviation Internet system and generates periodic SLA monitoring reports which can help evaluate the performance and availability of the system and share monitoring results with clients or partners to ensure that the system maintains a high service level;
the fault-tolerant and disaster-tolerant backup support module is automatically switched to the backup node under the condition of main node faults or other anomalies, so that high availability and reliability of the monitoring system can be ensured, and normal operation of the monitoring and alarming functions of the system can be maintained even under unpredictable conditions;
the expansion module is used for adding a data source plug-in, an alarm notification plug-in and a data processing plug-in so as to further enhance the monitoring and alarm capabilities of the system;
the multi-cluster management module supports monitoring and managing a plurality of aviation Internet clusters, and can comprehensively monitor the running state of the whole system and determine the performance difference and problems among different clusters by integrating data and indexes of the plurality of clusters.
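As one concrete reading of the SLA monitoring and reporting module, the following Python sketch computes availability over a reporting period from recorded outage windows and compares it with a target. The 99.9 % target, the dates and the assumption that outage windows do not overlap are all illustrative; the patent does not define how SLA indexes are calculated.

```python
from datetime import datetime, timedelta


def availability(period_start, period_end, outages) -> float:
    """Percentage of the period not covered by outages.

    Outages are (start, end) pairs, assumed non-overlapping; parts of an
    outage that fall outside the reporting period are ignored.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("reporting period must have positive length")
    down = 0.0
    for start, end in outages:
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (total - down) / total


period_start = datetime(2023, 9, 1)
period_end = period_start + timedelta(days=30)
outages = [
    (datetime(2023, 9, 10, 2, 0), datetime(2023, 9, 10, 2, 43, 12)),  # 43 min 12 s
]
value = availability(period_start, period_end, outages)
TARGET = 99.9   # hypothetical SLA target in percent
print(f"availability {value:.3f}% (target {TARGET}%)")
```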
Furthermore, the Grafana-based aviation Internet all-link monitoring and alarming system supports InfluxDB, Elasticsearch and Prometheus data source plug-ins.
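A possible way to declare these three data sources is Grafana's data source provisioning format, sketched here as a Python dictionary. The host names are placeholders, the choice of Prometheus as the default is an assumption, and the patent does not state that provisioning files are used at all. The structure is printed as JSON for inspection; in practice a provisioning file is normally written in YAML.

```python
import json


def datasource(name: str, ds_type: str, url: str, default: bool = False) -> dict:
    """Build one entry of the 'datasources' list in a provisioning file."""
    return {
        "name": name,
        "type": ds_type,        # Grafana plug-in identifier of the data source
        "access": "proxy",      # Grafana's backend performs the queries
        "url": url,
        "isDefault": default,
    }


provisioning = {
    "apiVersion": 1,
    "datasources": [
        # Placeholder URLs; ports are the usual defaults of each product.
        datasource("Prometheus", "prometheus", "http://prometheus.example:9090", default=True),
        datasource("InfluxDB", "influxdb", "http://influxdb.example:8086"),
        datasource("Elasticsearch", "elasticsearch", "http://elasticsearch.example:9200"),
    ],
}
print(json.dumps(provisioning, indent=2))
```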
Furthermore, the grafana-based aviation internet full-link monitoring and alarming system saves collected data into a database, and supports the inquiry and analysis of historical data, so that a user can check data trend in a specific time period, compare performance changes in different time periods and find potential problems and improvement spaces according to requirements.
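The historical query described here can be sketched with an in-memory SQLite table. The table layout, the metric name and the sample values are assumptions; the patent does not name the database or its schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ts INTEGER, metric TEXT, value REAL)")

# Illustrative samples: service response time in milliseconds, one per minute.
readings = [(0, 100), (60, 110), (120, 120), (180, 200), (240, 220), (300, 240)]
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(ts, "service_response_ms", value) for ts, value in readings],
)


def mean_between(conn, metric: str, start: int, end: int):
    """Average value of a metric over the half-open interval [start, end)."""
    row = conn.execute(
        "SELECT AVG(value) FROM samples WHERE metric = ? AND ts >= ? AND ts < ?",
        (metric, start, end),
    ).fetchone()
    return row[0]


earlier = mean_between(conn, "service_response_ms", 0, 180)
later = mean_between(conn, "service_response_ms", 180, 360)
print(f"earlier={earlier} ms, later={later} ms, ratio={later / earlier}")
```

Comparing the two averages is the simplest form of the period-over-period comparison mentioned in the text; a dashboard panel would typically plot both windows side by side.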
Further, Grafana provides rich dashboard customization: users can create multiple dashboards according to the specific requirements of the system, and each dashboard can contain different monitoring indexes, charts and data views, so as to meet the user's monitoring requirements for each component of the system. Thanks to the flexibility of Grafana, multiple dimensions of the aviation Internet system, including hardware equipment, application programs and network topology, can be monitored; by monitoring these dimensions together, users can gain a comprehensive understanding of the overall performance of the system and the relationships among its links.
Furthermore, the Grafana-based aviation Internet all-link monitoring and alarming system supports flexibly configurable alarm strategies and rules that can be optimized according to specific requirements; users can set different alarm levels, durations and alarm notification conditions, so as to adjust the sensitivity and importance of alarms for different types of problems.
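The role of the duration setting can be illustrated with a small Python sketch in which an alarm fires only after the value has stayed above its threshold for a minimum time. The function name, the sampling interval and the numbers are assumptions for illustration, not the patent's rule format.

```python
def firing_timestamps(samples, threshold: float, hold_seconds: float):
    """Return the timestamps at which the alarm is firing.

    samples is a list of (timestamp_seconds, value) pairs sorted by time.
    The alarm fires once the value has been above the threshold continuously
    for at least hold_seconds; any sample at or below the threshold resets it.
    """
    breach_start = None
    firing = []
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= hold_seconds:
                firing.append(ts)
        else:
            breach_start = None
    return firing


# Illustrative CPU usage samples taken every 30 seconds.
samples = [(0, 95), (30, 50), (60, 96), (90, 97), (120, 98), (150, 99)]
print(firing_timestamps(samples, threshold=90, hold_seconds=60))
```

With a 60-second hold, the isolated spike at time 0 is ignored and only the sustained breach starting at time 60 produces an alarm; a shorter hold makes the rule more sensitive.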
Furthermore, the Grafana-based aviation Internet all-link monitoring and alarming system supports running on cloud services, server clusters or containerized platforms, and a deployment mode suited to the user can be selected according to actual requirements, so that high availability and elastic scaling are achieved.
Furthermore, based on historical data and real-time data, the grafana-based aviation Internet all-link monitoring and alarming system can execute abnormal trend analysis, identify abnormal events and potential faults in the aviation Internet system, and prevent faults and optimize performance of the system in advance by analyzing abnormal trends.
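Two simple techniques consistent with this description are sketched below in Python: flagging values that deviate strongly from a historical baseline, and fitting a straight line to recent samples to estimate when a limit would be reached. Both are stand-ins chosen for illustration; the patent does not name the analysis method, and the factor k, the limit and the sample values are assumptions.

```python
from statistics import mean, pstdev


def anomalies(history, recent, k: float = 3.0):
    """Return recent values more than k standard deviations from the history mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return [v for v in recent if v != mu]
    return [v for v in recent if abs(v - mu) > k * sigma]


def linear_fit(ys):
    """Least-squares slope and intercept for equally spaced samples 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def steps_until(ys, limit: float):
    """Sampling intervals until the fitted line reaches limit, or None if not rising."""
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(ys) - 1))


print(anomalies([100, 102, 98, 101, 99], [101, 130, 97]))
print(steps_until([50, 55, 60, 65, 70], limit=90))
```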
Working principle: the system can run on cloud services, server clusters or containerized platforms, and a deployment mode suited to the user can be selected according to actual requirements, achieving high availability and elastic scaling. Alarm strategies can be optimized according to specific requirements: different alarm levels, durations and alarm notification conditions can be set, so that the sensitivity and importance of alarms are adjusted for different types of problems. The information collected by the alarm collection module is displayed through the data display module, so that users can check the running condition of the aviation system at any time, and the alarm notifications of the alarm notification module can be displayed directly and centrally. Active alarming is realized through e-mail, short message (SMS) and DingTalk, so that users notice alarms promptly and can handle them in time. Automated configuration, deployment and monitoring of the aviation Internet system can be realized, improving operation and maintenance efficiency and system stability. By performing abnormal trend analysis, abnormal events and potential faults in the aviation Internet system are identified, faults are prevented in advance and system performance is optimized, which greatly reduces the user's workload and time spent.
The invention and its embodiments have been described above; this description is not limiting, and the actual construction is not limited thereto. In summary, structures and embodiments similar to this technical solution that a person of ordinary skill in the art, informed by this disclosure, devises without creative effort and without departing from the gist of the invention shall fall within the scope of protection of the invention.

Claims (5)

1. The aviation Internet all-link monitoring and alarming system based on grafana is characterized in that: the system comprises an alarm collection module, an alarm notification module, an alarm processing module, a disaster recovery and fault recovery module, a data acquisition module, a data display module based on grafana visualization function, an alarm setting module, a database, a data analysis and visualization tool, a user authority management module, a report module, a fault drilling and simulation module, an automatic operation and maintenance support module, an SLA monitoring and reporting module, a fault tolerance and disaster backup support module, a multi-cluster management module and an expansion module;
the alarm collection module collects alarm information of servers, switches, routers and firewalls by means of the node_exporter, win_exporter, blackbox_exporter and redis_exporter tools;
the alarm notification module sends notifications by e-mail, short message (SMS) and DingTalk, and displays the alarm information centrally on a dashboard of the data display module;
the alarm processing module is used for identifying and confirming alarm information, determining specific equipment or nodes, analyzing the reason of alarm generation and recording the alarm information, the processing process and the solution;
the disaster recovery and fault recovery module comprises a backup system, redundant equipment and a disaster recovery plan;
the data acquisition module acquires key data indexes in real time through a data source, wherein the key data indexes comprise the state of aviation equipment, service response time and network bandwidth utilization rate;
the data display module graphically displays the acquired data indexes based on the grafana visual function;
the alarm setting module allows a user to configure alarm rules according to own requirements, wherein the alarm rules comprise thresholds and trend changes of key data indexes;
the data analysis and visualization tools allow users to analyze historical data, create trend charts, and statistical charts;
the user authority management module is used for user authority management and can set authority levels of different users or user groups;
the report module is used for generating a report and a custom report template;
the fault drilling and simulating module is used for evaluating the response and recovery capacity of the system under various fault scenes;
the automatic operation and maintenance support module is integrated with the Ansible and Puppet automated operation and maintenance tools to provide automated operation and maintenance support;
the SLA monitoring and reporting module sets monitoring indexes according to a Service Level Agreement (SLA) of the aviation Internet system and generates a periodic SLA monitoring report;
the fault-tolerant and disaster-tolerant backup support module automatically switches to the backup node under the condition of main node faults or other anomalies;
the expansion module is used for adding a data source plug-in, an alarm notification plug-in and a data processing plug-in;
the multi-cluster management module supports monitoring and managing multiple aviation Internet clusters.
2. The grafana-based aviation Internet all-link monitoring and alarming system according to claim 1, wherein: the alarm information comprises hardware faults, network anomalies and service interruptions, and the node_exporter, win_exporter, blackbox_exporter and redis_exporter tools collect the alarm information from servers, switches, routers and firewalls and classify, filter and process it.
3. The grafana-based aviation Internet all-link monitoring and alarming system according to claim 1, wherein: the alarm levels of the alarm notification module comprise severe, high, medium and low,
the severe level indicates that the system has a serious fault or cannot work normally, and immediate action is required to repair it;
the high level indicates that the system has a significant problem or degraded performance that requires timely attention and handling;
the medium level indicates that the system has a general problem or warning that requires processing at a later time;
the low level indicates that the system has a minor problem or warning that does not significantly affect overall operation.
4. The grafana-based aviation Internet all-link monitoring and alarming system according to claim 1, wherein: the processing procedure of the alarm processing module comprises the following steps:
step one, alarm identification: identifying and confirming the alarm information, and understanding the nature and scope of impact of the problem;
step two, alarm localization: determining the specific equipment or node concerned;
step three, alarm analysis: analyzing the cause of the alarm and adopting appropriate measures to solve the problem;
step four, alarm handling: taking corresponding repair measures according to the actual situation and restoring normal operation of the system;
step five, alarm recording: recording the alarm information, the processing procedure and the solution for subsequent analysis and reference.
5. The grafana-based aviation Internet all-link monitoring and alarming system according to claim 1, wherein: the data sources include an ARINC 429 bus, server logs and network traffic.
CN202311346966.6A 2023-10-18 2023-10-18 Aviation Internet all-link monitoring and alarming system based on grafana Pending CN117411801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311346966.6A CN117411801A (en) 2023-10-18 2023-10-18 Aviation Internet all-link monitoring and alarming system based on grafana

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311346966.6A CN117411801A (en) 2023-10-18 2023-10-18 Aviation Internet all-link monitoring and alarming system based on grafana

Publications (1)

Publication Number Publication Date
CN117411801A true CN117411801A (en) 2024-01-16

Family

ID=89497337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311346966.6A Pending CN117411801A (en) 2023-10-18 2023-10-18 Aviation Internet all-link monitoring and alarming system based on grafana

Country Status (1)

Country Link
CN (1) CN117411801A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination