CN113949615A

CN113949615A - Method for realizing dynamically-perceivable network topology of fault based on zabbix and grafana

Info

Publication number: CN113949615A
Application number: CN202111152810.5A
Authority: CN
Inventors: 王长海; 周开制; 周铮; 黄中章; 兰建华; 张昕; 罗海宇; 吴宇昊; 陈科先; 蒋发俊
Original assignee: Guangxi Communications Design Group Co Ltd
Current assignee: Guangxi Communications Design Group Co Ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2022-01-18

Abstract

The invention provides a method for realizing a network topology with dynamic fault perception based on zabbix and grafana, belonging to the field of network equipment and server monitoring. The running states of servers and network equipment are monitored in real time; a custom shell script obtains the current alarm count of each server and network device from the database; and equipment fault alarms are perceived dynamically through a network topology graph. Faults are discovered automatically and flagged on the topology, so that fault information can be located quickly from a global view and operation and maintenance efficiency is improved.

Description

Method for realizing dynamically-perceivable network topology of fault based on zabbix and grafana

Technical Field

The invention relates to the field of network equipment and server monitoring, in particular to a method for realizing a network topology with dynamic fault perception based on zabbix and grafana.

Background

At present, a data center network generally contains devices from different manufacturers; devices from the same manufacturer come in different models; and devices of the same model run different operating system versions as they are upgraded. The information provided by these operating systems differs, as does their degree of SNMP support. There is a lack of a monitoring system that can integrate network devices from different manufacturers and servers running different operating systems and can display device fault information on a network topology map, which makes operation and maintenance management considerably more difficult.

Disclosure of Invention

The invention aims to provide a method for realizing a network topology with dynamic fault perception based on zabbix and grafana, and to solve the technical problem that fault information of a server cannot be displayed through a network topology.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

The method for realizing a network topology with dynamic fault perception based on zabbix and grafana comprises the following steps:

step 1: an agent is installed on each server to acquire monitoring information of the server, and the SNMP protocol is enabled on the network equipment;

step 2: the zabbix server acquires monitoring data of the servers and the network equipment through the agent and the SNMP protocol, and stores the monitoring data in a database;

step 3: a custom shell script and a custom monitoring item are defined to obtain an alarm-count monitoring item for each monitored device;

step 4: a network topology graph is drawn according to the actual network architecture using the drawio tool;

step 5: grafana uses zabbix as a data source, the FlowCharting plug-in is configured with the mapping to the drawio device element IDs, and the running state of the equipment is shown on the network topology graph.

Further, the specific process of step 1 is to install a zabbix-agent on the linux servers and windows servers, modify the zabbix-agent.conf configuration file to set the IP address of the zabbix-server and the IP address of the server itself, configure an SNMP read community on the network equipment, and set the read community name and SNMP version of the switch.

Further, the specific process of step 2 is as follows: in the zabbix-server, servers running the Linux operating system are associated with the template Template OS Linux, servers running the Windows system are associated with the template Template OS Windows, and network devices are associated with the template Template Net Huawei VRP SNMPv2. After the template association is completed, monitoring information on the running state of each server's CPU, memory, disk, and network card traffic, and on the running state of each network device's CPU, memory, fan, and interface traffic, is obtained and transmitted to the server for storage.

Further, the specific process of step 3 is as follows: because the default monitoring templates of the zabbix-server contain no monitoring item for the current alarm count of servers and network devices, custom shell scripts need to be configured to obtain the current alarm count of the network devices and servers from the database. Custom shell script 1, get_zabbix_host.sh, is configured first to acquire the names of all monitored network devices and servers; custom shell script 2, zabbix_host_schemes.sh, is then configured to acquire the current alarm count of the monitored network devices and servers.

Further, the specific process of obtaining the current alarm count of the monitored network devices and servers is to add the UserParameter entries for the custom scripts to the configuration file zabbix_agentd.conf of the zabbix-server, configure a custom monitoring template in the zabbix-server, and associate the custom monitoring template with the zabbix-server, so that the current alarm data of each monitored device can be obtained.

Further, the specific process of step 4 is to install and use a drawio tool, draw the interconnection relationship between the network device and the server according to the actual network topology, and generate a unique element ID for each device on the drawio tool.

Further, the specific process of step 5 is,

step 5.1: using the FlowCharting plug-in of grafana, configuring the data source as zabbix, and setting the monitoring item to the alarm count of each monitored device;

step 5.2: importing the network topology element code drawn in drawio into the Source Content box of FlowCharting;

step 5.3: mapping the configured device element ID to the alarm-count monitoring item;

step 5.4: repeating step 5.3 to configure the mapping between every device element on the topology graph and the alarm-count monitoring item of the corresponding device in zabbix. When a device has an alarm, it is displayed in red to indicate an abnormality, with a flashing warning mark; when a device has no alarm, it is displayed in green to indicate normal operation. The device colors and warnings on the network topology graph thus realize global dynamic fault perception.

Further, the specific process of step 5.1 is,

step 5.1.1: selecting a data source as zabbix;

step 5.1.2: configuring a group name of the monitoring equipment;

step 5.1.3: configuring an application set of a monitoring device;

step 5.1.4: configuring a host name of the monitoring equipment;

step 5.1.5: and configuring a monitoring item, and expressing the monitoring item by using a regular expression.

Further, the specific process of step 5.2 is,

step 5.2.1: configuring a display graph as FlowCharting;

step 5.2.2: configuring an access address of the drawio;

step 5.2.3: importing the network topology element code from drawio into Source Content.

Further, the specific process of step 5.3 is,

step 5.3.1: self-defining a rule name;

step 5.3.2: configuring the rule as a monitoring item for mapping the alarm number of the equipment;

step 5.3.3: configuring an alarm-count threshold so that the device turns green when its alarm count is 0;

step 5.3.4: configuring an alarm-count threshold so that the device turns red when its alarm count is greater than 1;

step 5.3.5: configuring the warning mark to flash as a reminder when there is an alarm;

step 5.3.6: configuring the element ID mapping rule so that the color of the device element matches the color set by the alarm threshold: green when there is no alarm and red when there is an alarm.

The zabbix-server monitors the running states of the servers and the network equipment in real time through the zabbix-agent interface and the SNMP interface. A custom shell script is configured to obtain the current alarm count of each server and network device from the database, and a custom monitoring template is configured and associated with the zabbix-server to generate a monitoring item for the current alarm count of each device.

The drawio tool is used to draw a network topology depicting the interconnection between the network equipment and the servers, so that each server and network device has a unique element ID on the drawio topology graph. All element codes of the topology graph are imported into the FlowCharting plug-in of grafana. In FlowCharting, each device element ID is mapped to the alarm-count monitoring item of the corresponding device monitored by zabbix, and the alarm thresholds and alarm colors of the elements are configured. When the alarm count of a device exceeds the threshold, the device element turns red with a flashing alarm prompt, so that equipment fault alarms are perceived dynamically through the network topology graph.

Due to the adoption of the technical scheme, the invention has the following beneficial effects:

The invention monitors the running states of the servers and the network equipment in real time; a custom shell script obtains the current alarm count of each server and network device from the database; and equipment fault alarms are perceived dynamically through a network topology graph. Faults are discovered automatically and flagged on the topology, so that fault information can be located quickly from a global view and operation and maintenance efficiency is improved.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a device topology diagram of the present invention;

FIG. 3 is a network topology of the present invention drawn in drawio;

FIG. 4 is a graph of alarm counts for the monitoring device of the present invention;

FIG. 5 is a diagram of the network topology element code of the present invention imported into and displayed in FlowCharting;

FIG. 6 is a diagram of the device alarm display of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings by way of examples of preferred embodiments. It should be noted, however, that the numerous details set forth in the description are merely for the purpose of providing the reader with a thorough understanding of one or more aspects of the present invention, which may be practiced without these specific details.

The zabbix-agent is deployed locally on the linux servers and windows servers; it collects performance metrics such as server CPU, memory, disk, and network card data and sends them to the zabbix-server;

the SNMP protocol is enabled on the network equipment, and the zabbix-server acquires performance metrics such as CPU, memory, and interface traffic of the network equipment through SNMP;

the database stores the monitoring data of the network equipment and servers acquired by the zabbix-server;

grafana is an open-source data visualization tool that provides various plug-ins, supports access to various data sources, and displays monitoring data visually;

drawio is a drawing tool used to draw the network topology graph; each device icon on the graph generates a unique element ID, and the element code is exported and imported into FlowCharting in grafana, so that the same network topology as drawn in drawio is shown in grafana.

As shown in fig. 1-6, a method for implementing a dynamically fault-aware network topology based on zabbix and grafana includes the following steps:

1. Install zabbix-agent on the linux servers and windows servers, and modify the zabbix-agent.conf configuration file:

Server=172.20.101.9

Hostname=172.20.101.10

Note: Server is the IP address of the zabbix-server, and Hostname is the IP address of the local server.

2. Configure an SNMP read community on the network equipment, taking the configuration of a Huawei switch as an example:

snmp-agent community read cipher public

snmp-agent sys-info version v2c

Note: the SNMP read community is named public and the version is v2c.

3. In the zabbix-server, associate servers running the Linux operating system with the template Template OS Linux, servers running the Windows system with the template Template OS Windows, and network devices with the template Template Net Huawei VRP SNMPv2. After the template association is completed, monitoring information on the running state of each server's CPU, memory, disk, network card traffic and the like, and on the running state of each network device's CPU, memory, fan, interface traffic and the like, is obtained.

4. Because the default monitoring templates of the zabbix-server contain no monitoring item for the current alarm count of servers and network devices, custom shell scripts need to be configured to obtain the current alarm count of the network devices and servers from the database.

4.1 Configure custom shell script 1, get_zabbix_host.sh, which acquires the names of all monitored network devices and servers. The code is as follows:

Executing shell script 1 with sh get_zabbix_host.sh outputs data in JSON format as follows:

Note: {#HOST} is a variable whose value is the name of a monitored device.
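The listing for script 1 is not reproduced in this text. As a minimal sketch of the output shape only, the hypothetical helper below turns a list of host names into the JSON structure used by Zabbix low-level discovery, with one {#HOST} entry per device. The function name hosts_to_lld and the sample host names are assumptions for illustration; in practice the names would come from the zabbix database rather than from a literal list.

```shell
#!/bin/sh
# Hypothetical helper (not the patent's script): read host names, one per
# line, from stdin and print Zabbix low-level-discovery JSON using {#HOST}.
hosts_to_lld() {
    printf '{"data":['
    sep=''
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue
        # Escape backslashes and double quotes so the output stays valid JSON.
        esc=$(printf '%s' "$name" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
        printf '%s{"{#HOST}":"%s"}' "$sep" "$esc"
        sep=','
    done
    printf ']}\n'
}

# Example with two assumed device names:
printf 'linux server\nwindows server\n' | hosts_to_lld
# prints {"data":[{"{#HOST}":"linux server"},{"{#HOST}":"windows server"}]}
```

Each {#HOST} value produced this way can then be passed to script 2 as its parameter.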

4.2 Configure custom shell script 2, zabbix_host_schemes.sh, which obtains the current alarm count of a monitored network device or server. The code is as follows:

/usr/bin/mysql -h 127.0.0.1 -uzabbix -pzabbix zabbix -s -e 'select count(*) from items a left join hosts as b on a.hostid=b.hostid where a.itemid in (SELECT itemid FROM functions where triggerid in (select triggerid from triggers where value=1 and status<>1)) and b.name="'"$1"'"\G' -N 2>/dev/null | tail -1

Note: after script 1 is executed, the value of the variable {#HOST} is the name of the corresponding monitored device. This value is passed to script 2 as parameter $1, and executing script 2 returns the number of alarms currently present on that device. For example, executing sh zabbix_host_schemes.sh "linux server" obtains the current alarm count of the linux server.
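To make the structure of the query in script 2 easier to read, the sketch below builds the same SQL text for one host name without contacting a database. The table and column names are those used in script 2; the function name build_alarm_query and the quote-escaping step are assumptions added here, since the original interpolates $1 directly into the statement.

```shell
#!/bin/sh
# Sketch: print the alarm-count query for the host name given as $1.
# Tables and columns follow script 2; the escaping is an added assumption.
build_alarm_query() {
    # Escape backslashes and double single quotes so the name is safe
    # inside a single-quoted SQL string literal.
    host=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g")
    cat <<EOF
SELECT COUNT(*)
FROM items a
LEFT JOIN hosts b ON a.hostid = b.hostid
WHERE a.itemid IN (
    SELECT itemid FROM functions
    WHERE triggerid IN (
        SELECT triggerid FROM triggers
        WHERE value = 1 AND status <> 1))
  AND b.name = '$host';
EOF
}

# Possible use (requires a reachable zabbix database, so it is not run here):
#   mysql -h 127.0.0.1 -uzabbix -pzabbix zabbix -N -s -e "$(build_alarm_query "$1")"
```

The inner subquery selects triggers that are enabled (status <> 1) and currently in the problem state (value = 1); the outer query counts the items of the named host that those triggers reference.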

5. Add the UserParameter entries for the custom scripts to the configuration file zabbix_agentd.conf of the zabbix-server, configure a custom monitoring template in the zabbix-server, and associate the custom monitoring template with the zabbix-server, so that the current alarm data of each monitored device can be obtained. The code of the custom monitoring template is as follows:
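The configuration entries themselves are not reproduced in this text. As a hedged illustration, the fragment below uses the agent's standard UserParameter=<key>,<command> syntax, where [*] in the key and $1 in the command pass a key parameter through to the script. The key zabbix_host appears in the claims; the key zabbix_host_alarms and both script paths are assumptions chosen for illustration.

```
# zabbix_agentd.conf (illustrative fragment; keys and paths are assumptions)
# Discovery key: returns the JSON list of monitored device names.
UserParameter=zabbix_host,/usr/local/bin/get_zabbix_host.sh
# Per-device key: $1 receives the device name supplied as the key parameter.
UserParameter=zabbix_host_alarms[*],/usr/local/bin/zabbix_host_schemes.sh "$1"
```

With entries of this kind, a discovery rule in the custom template could use the first key to enumerate devices and an item prototype could use the second key with {#HOST} as its parameter.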

6. Install and use the drawio tool, draw the interconnection between the network devices and the servers according to the actual network topology, and generate a unique element ID for each device in drawio. For example, for the network topology drawn in drawio in FIG. 3, the element code is:

<mxGraphModel dx="782" dy="579" grid="0" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
<root>
<mxCell id="0"/>
<mxCell id="1" parent="0"/>
<mxCell id="YJd9XUj9eclvujsnmgel-13" value="network equipment" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;sketch=0;" parent="1" vertex="1">
<mxGeometry x="241" y="331.5" width="70" height="31" as="geometry"/>
</mxCell>
<mxCell id="YJd9XUj9eclvujsnmgel-15" value="windows server" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;sketch=0;" parent="1" vertex="1">
<mxGeometry x="155" y="496" width="95" height="25" as="geometry"/>
</mxCell>
<mxCell id="YJd9XUj9eclvujsnmgel-18" value="linux server" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;sketch=0;" parent="1" vertex="1">
<mxGeometry x="314" y="495.5" width="95" height="23" as="geometry"/>
</mxCell>
<mxCell id="YJd9XUj9eclvujsnmgel-19" value="" style="shape=partialRectangle;whiteSpace=wrap;html=1;bottom=1;right=1;left=1;top=0;fillColor=none;routingCenterX=0.5;sketch=0;strokeWidth=3;rotation=-180;" parent="1" vertex="1">
<mxGeometry x="291" y="443" width="130" height="27" as="geometry"/>
</mxCell>
<mxCell id="YJd9XUj9eclvujsnmgel-20" value="" style="line;strokeWidth=2;direction=south;html=1;sketch=0;" parent="1" vertex="1">
<mxGeometry x="354.5" y="390" width="10" height="54" as="geometry"/>
</mxCell>
<mxCell id="2" value="" style="outlineConnect=0;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;shape=mxgraph.aws3.traditional_server;fillColor=#7D7C7C;gradientColor=none;" parent="1" vertex="1">
<mxGeometry x="268" y="473" width="46.5" height="63" as="geometry"/>
</mxCell>
<mxCell id="3" value="" style="outlineConnect=0;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;shape=mxgraph.aws3.traditional_server;fillColor=#7D7C7C;gradientColor=none;" parent="1" vertex="1">
<mxGeometry x="406" y="473" width="46.5" height="63" as="geometry"/>
</mxCell>
<mxCell id="4" value="" style="shape=mxgraph.…;html=1;pointerEvents=1;dashed=0;fillColor=#036897;strokeColor=#ffffff;strokeWidth=2;verticalLabelPosition=bottom;verticalAlign=top;align=center;outlineConnect=0;" parent="1" vertex="1">
<mxGeometry x="330" y="323" width="58" height="67" as="geometry"/>
</mxCell>
</root>
</mxGraphModel>
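Because each FlowCharting rule refers to a drawio element by its ID, it can help to list the IDs contained in an exported diagram before configuring the mapping. The sketch below is one possible way to do that with grep and sed; the function name list_cell_ids and the sample XML are assumptions for illustration, and the approach presumes uncompressed drawio XML in which each mxCell tag carries its id before its value attribute.

```shell
#!/bin/sh
# Sketch: print "id value" for every mxCell in drawio XML read from stdin.
# Cells without a value attribute (such as the root cells 0 and 1) are skipped.
list_cell_ids() {
    grep -o '<mxCell [^>]*>' |
    sed -n 's/.*id="\([^"]*\)".*value="\([^"]*\)".*/\1 \2/p'
}

# Example with a small assumed diagram:
sample='<mxCell id="0"/><mxCell id="1" parent="0"/><mxCell id="2" value="linux server" style="text;html=1;" parent="1" vertex="1"><mxGeometry x="1" y="2" as="geometry"/></mxCell>'
printf '%s\n' "$sample" | list_cell_ids
# prints: 2 linux server
```

Cells whose value is empty, such as device icons with separate text labels, are printed with the ID only.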

7. Use the FlowCharting plug-in of grafana, configure the data source as zabbix, and set the monitoring item to the alarm count of each monitored device, as shown in FIG. 4:

step 1: selecting zabbix as the data source;

step 2: configuring the group name of the monitored devices;

step 3: configuring the application set of the monitored devices;

step 4: configuring the host name of the monitored devices;

step 5: configuring the monitoring item, expressed as a regular expression.

8. Import the network topology element code drawn in drawio into the Source Content box of FlowCharting, as shown in FIG. 5:

step 1: configuring the display graph as FlowCharting;

step 2: configuring the access address of drawio;

step 3: importing the network topology element code from drawio into Source Content.

9. Map the configured device element ID to the alarm-count monitoring item, as shown by the configuration items in FIG. 6:

step 1: defining a custom rule name;

step 2: configuring the rule to map to the alarm-count monitoring item of the device;

step 3: configuring an alarm-count threshold so that the device turns green when its alarm count is 0;

step 4: configuring an alarm-count threshold so that the device turns red when its alarm count is greater than 1;

step 5: configuring the warning mark to flash as a reminder when there is an alarm;

step 6: configuring the element ID mapping rule so that the color of the device element matches the color set by the alarm threshold (green when there is no alarm, red when there is an alarm).

10. Repeat step 9 to configure the mapping between every device element on the topology graph and the alarm-count monitoring item of the corresponding device in zabbix. The final effect is shown in FIG. 6: when a device has an alarm, it is displayed in red to indicate an abnormality, with a flashing warning mark as a reminder; when a device has no alarm, it is displayed in green to indicate normal operation. The device colors and warnings on the network topology graph thus realize global dynamic fault perception.
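The color rule applied to each element can be summarized as a small function. The sketch below is an illustration only, not FlowCharting's own configuration format: the function name alarm_state and the state labels are assumptions, and it treats any non-zero alarm count as an alarm, following the description that a device is green when it has no alarm and red when it has one.

```shell
#!/bin/sh
# Sketch of the threshold rule: map an alarm count to an element state.
# Assumption: any non-zero count is treated as "has an alarm".
alarm_state() {
    count=$1
    case $count in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;  # not a non-negative integer
    esac
    if [ "$count" -eq 0 ]; then
        echo "green"        # no alarm: normal
    else
        echo "red blink"    # alarm present: red with flashing warning mark
    fi
}

alarm_state 0    # prints: green
alarm_state 3    # prints: red blink
```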

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for realizing a network topology with dynamic fault perception based on zabbix and grafana, characterized by comprising the following steps:

2. The method for realizing a network topology with dynamic fault perception based on zabbix and grafana according to claim 1, wherein: the specific process of step 1 is to install a zabbix-agent on the linux servers and windows servers, modify the zabbix-agent.conf configuration file to set the IP address of the zabbix-server and the IP address of the local server, configure an SNMP read community on the network equipment, and set the read community name and SNMP version of the switch.

3. The method for realizing a network topology with dynamic fault perception based on zabbix and grafana according to claim 1, wherein: the specific process of step 2 is that, in the zabbix-server, servers running the Linux operating system are associated with the template Template OS Linux, servers running the Windows system are associated with the template Template OS Windows, and network devices are associated with the template Template Net Huawei VRP SNMPv2; after the template association is completed, monitoring information on the running state of each server's CPU, memory, disk, and network card traffic, and on the running state of each network device's CPU, memory, fan, and interface traffic, is obtained and transmitted to the server for storage.

4. The method for realizing a network topology with dynamic fault perception based on zabbix and grafana according to claim 1, wherein: the specific process of step 3 is that, because the default monitoring templates of the zabbix-server contain no monitoring item for the current alarm count of servers and network devices, custom shell scripts need to be configured to obtain the current alarm count of the network devices and servers from the database; custom shell script 1, get_zabbix_host.sh, is configured first to acquire the names of all monitored network devices and servers, and custom shell script 2, zabbix_host_schemes.sh, is then configured to acquire the current alarm count of the monitored network devices and servers.

5. The method for realizing a network topology with dynamic fault perception based on zabbix and grafana according to claim 4, wherein: the specific process of obtaining the current alarm count of the monitored network devices and servers is to add the following to the configuration file zabbix_agentd.conf of the zabbix-server: UserParameter=zabbix_host,/usr/local/bin/get_zabbix_host

and to configure a custom monitoring template in the zabbix-server and associate the custom monitoring template with the zabbix-server, so that the current alarm data of each monitored device can be obtained.

6. The method for realizing a network topology with dynamic fault perception based on zabbix and grafana according to claim 1, wherein: the specific process of step 4 is to install and use the drawio tool, draw the interconnection between the network devices and the servers according to the actual network topology, and generate a unique element ID for each device in drawio.

7. The method for realizing a network topology with dynamic fault perception based on zabbix and grafana according to claim 1, wherein: the specific process of step 5 is,

8. The method for realizing a network topology with dynamic fault perception based on zabbix and grafana according to claim 7, wherein: the specific process of step 5.1 is,