CN112084098A - Resource monitoring system and working method - Google Patents


Publication number
CN112084098A
Authority
CN
China
Prior art keywords
monitoring
data
cluster
node
index data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011132356.2A
Other languages
Chinese (zh)
Inventor
韩娜
李亦辰
丁艳丽
李鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202011132356.2A
Publication of CN112084098A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based

Abstract

The invention provides a resource monitoring system and a working method. The resource monitoring system comprises: a data acquisition module, used for acquiring monitoring index data of mixed resources and sending the acquired monitoring index data to a data processing module; the data processing module, used for receiving the monitoring index data sent by the data acquisition module, using Prometheus to form alarm information for the monitoring index data that meets a preset alarm rule, and triggering an alarm for the corresponding resource; and a data storage module, used for storing the monitoring index data received by the data processing module. Prometheus is an open-source monitoring and alerting system with a built-in time-series database; it is written in Go (Golang) and offers strong performance. Monitoring indexes are collected through various exporters, which supports monitoring of mixed resources such as physical/virtual nodes, network nodes, container clusters, middleware/databases, and applications.

Description

Resource monitoring system and working method
Technical Field
The invention relates to the technical field of electronic information monitoring, in particular to a resource monitoring system and a working method.
Background
A conventional IT monitoring system or product can monitor only one or a few kinds of resources, for example network monitoring, server monitoring, virtual machine monitoring, middleware monitoring, database monitoring, or application monitoring. Containers are a virtualization technology that has developed in recent years, and the monitoring schemes available for them on the market are limited: products that monitor traditional systems cannot monitor containers, and native container monitoring schemes such as cAdvisor, Heapster, and metrics-server cannot be used outside the container environment. Currently, most enterprise systems mix containers with cloud-platform virtual machines and open-source components with closed-source components, yet no unified product or method supports monitoring of such mixed resources at the same time.
Disclosure of Invention
The embodiment of the invention provides a resource monitoring system, which is used for supporting the monitoring of mixed resources and comprises:
a data acquisition module, used for acquiring monitoring index data of the mixed resources and sending the acquired monitoring index data to the data processing module; the data acquisition module comprises: multiple types of index collectors (exporters) and a Pushgateway cluster;
a data processing module, used for receiving the monitoring index data sent by the data acquisition module, using Prometheus to form alarm information for the monitoring index data that meets the preset alarm rule, and triggering an alarm for the corresponding resource;
a data storage module, used for storing the monitoring index data received by the data processing module; the data storage module comprises: a local TSDB database and a remote-storage Elasticsearch cluster.
The embodiment of the invention also provides a working method of the resource monitoring system, which is used for supporting the monitoring of mixed resources and comprises the following steps:
the data acquisition module acquires monitoring index data of the mixed resources and sends the acquired monitoring index data to the data processing module; wherein the data acquisition module comprises: multiple types of index collectors (exporters) and a Pushgateway cluster;
the data processing module receives the monitoring index data sent by the data acquisition module and uses Prometheus to form alarm information for the monitoring index data that meets a preset alarm rule, so as to trigger an alarm for the corresponding resource;
the data storage module stores the monitoring index data received by the data processing module; wherein the data storage module comprises: a local TSDB database and a remote-storage Elasticsearch cluster.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the computer program, the working method of the resource monitoring system is realized.
An embodiment of the present invention also provides a computer-readable storage medium, where a computer program for executing the operating method of the resource monitoring system is stored in the computer-readable storage medium.
In the embodiment of the invention, a data acquisition module is provided to acquire the monitoring index data of the mixed resources and send the acquired monitoring index data to a data processing module, the data acquisition module comprising multiple types of index collectors (exporters) and a Pushgateway cluster; a data processing module is provided to receive the monitoring index data sent by the data acquisition module and to use Prometheus to form alarm information for the monitoring index data that meets the preset alarm rule, triggering an alarm for the corresponding resource; and a data storage module is provided to store the monitoring index data received by the data processing module, the data storage module comprising a local TSDB database and a remote-storage Elasticsearch cluster. Prometheus is an open-source monitoring and alerting system with a built-in time-series database; it is written in Go (Golang) and offers strong performance. Monitoring indexes are collected through various types of exporters (index collectors), which support resources such as physical/virtual nodes, network nodes, container clusters, middleware/databases, and applications, so that the monitoring requirement of mixed resources is met.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a resource monitoring system in an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of the data processing module 102 according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a deployment framework of a resource monitoring system in an embodiment of the present invention.
Fig. 4 is a schematic flow chart of the acquisition, processing and storage of monitoring data in a specific application example of the present invention.
Fig. 5 is a schematic diagram of a working method of a resource monitoring system in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a resource monitoring system, configured to support monitoring of a hybrid resource, as shown in fig. 1, the resource monitoring system includes:
the data acquisition module 101, used for acquiring monitoring index data of the mixed resources and sending the acquired monitoring index data to the data processing module; wherein the data acquisition module comprises: multiple types of index collectors (exporters) and a Pushgateway cluster;
the data processing module 102, configured to receive the monitoring index data sent by the data acquisition module, use Prometheus to form alarm information for the monitoring index data that meets a predetermined alarm rule, and trigger an alarm for the corresponding resource;
the data storage module 103, used for storing the monitoring index data received by the data processing module; wherein the data storage module comprises: a local TSDB database and a remote-storage Elasticsearch cluster.
In the embodiment of the invention, the data acquisition module, comprising multiple types of index collectors (exporters) and a Pushgateway cluster, acquires the monitoring index data of the mixed resources and sends it to the data processing module; the data processing module receives the monitoring index data and uses Prometheus to form alarm information for the data that meets the preset alarm rule, triggering an alarm for the corresponding resource; and the data storage module, comprising a local TSDB database and a remote-storage Elasticsearch cluster, stores the monitoring index data received by the data processing module. In this way the monitoring requirement of mixed resources is met.
Prometheus is an open-source monitoring and alerting system with a built-in time-series database. It is written in Go (Golang), offers strong performance, and supports the collection of monitoring indexes from physical/virtual nodes, network nodes, container clusters, middleware/databases, and applications through various exporters (index collectors). For collection targets that the official exporters do not cover, custom exporters can be developed with the multi-language client libraries that Prometheus provides, so that the monitoring requirements of mixed resources are met.
The multiple types of exporters are responsible for the actual acquisition of monitoring indexes from the monitored objects, and the range of objects an exporter can collect from can be extended through secondary development.
Pushgateway is an important tool in the Prometheus ecosystem. When business data are monitored, data from different sources are gathered at the Pushgateway and then collected by Prometheus in Pull mode.
In specific implementation, the data processing module 102, as shown in fig. 2, includes:
the Prometheus Server cluster 201, used for receiving the monitoring index data, performing aggregation and filtering preprocessing on it, and then forming alarm information from the monitoring index data that meets the preset alarm rule and sending the alarm information to Alertmanager, the alarm module of Prometheus;
the Consul cluster 202, configured to perform service registration and discovery for the monitoring targets, so that the Prometheus Server cluster establishes an HTTP channel to each monitoring target registered in Consul and remotely receives monitoring index data in Pull mode;
the alarm module Alertmanager 203 of Prometheus, configured to receive the alarm information, perform alarm storage, deduplication, and suppression, and trigger hook function operations to raise alarms for the corresponding resources.
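The rule evaluation, deduplication, and suppression steps described above can be sketched in a few lines of Python. This is a simplified illustration and not the Alertmanager implementation: the `Alert` type, the rule name `HighCPU`, the thresholds, and the policy that a critical alarm suppresses warnings for the same resource are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alarm record: which rule fired, for which resource."""
    rule: str
    resource: str
    severity: str  # "warning" or "critical"


def evaluate(rule: str, severity: str, threshold: float,
             samples: dict[str, float]) -> list[Alert]:
    """Form an alarm for every resource whose sample exceeds the threshold."""
    return [Alert(rule, resource, severity)
            for resource, value in sorted(samples.items())
            if value > threshold]


def deduplicate(alerts: list[Alert]) -> list[Alert]:
    """Keep the first occurrence of each identical alarm."""
    seen: set[Alert] = set()
    unique = []
    for alert in alerts:
        if alert not in seen:
            seen.add(alert)
            unique.append(alert)
    return unique


def suppress(alerts: list[Alert]) -> list[Alert]:
    """Drop warnings for resources that already have a critical alarm."""
    critical = {a.resource for a in alerts if a.severity == "critical"}
    return [a for a in alerts
            if a.severity == "critical" or a.resource not in critical]


if __name__ == "__main__":
    cpu = {"node-1": 0.97, "node-2": 0.85, "node-3": 0.40}
    fired = (evaluate("HighCPU", "warning", 0.80, cpu)
             + evaluate("HighCPU", "critical", 0.95, cpu))
    fired += fired  # simulate the same alarms arriving twice
    for alert in suppress(deduplicate(fired)):
        print(alert)
```

In a real deployment the surviving alarms would then be handed to a hook that sends the notification; that step is omitted here.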
The Prometheus Server cluster comprises a plurality of Prometheus Servers (monitoring centers). Every three Prometheus Servers form one group of monitoring centers, and each group is denoted a Worker node; a Primary node is deployed on the upper layer, and the Worker nodes are attached below the Primary node, forming a Primary-Worker pyramid.
A single native Prometheus Server supports an index processing rate of roughly 10 million per second. When the rate exceeds this order of magnitude, multiple Servers are generally arranged in a Primary-Worker pyramid for horizontal scaling, but each Worker is still deployed as a single node and is therefore prone to single points of failure. The embodiment of the invention therefore adds Keepalived on top of the Primary-Worker mode to deploy multiple replicas of each Worker, addressing both Server-side scale and high availability.
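As a rough sizing aid, the arithmetic behind the Primary-Worker layout can be written down as follows. The sketch assumes that only one Server per Worker group is active at a time (the other two being standby replicas), so a group has the capacity of a single Server; the function name `plan_workers` is hypothetical, and the Primary node is not counted.

```python
import math

SERVER_CAPACITY = 10_000_000  # indexes per second for one Server (figure from the text)
REPLICAS_PER_WORKER = 3       # three Prometheus Servers per Worker group


def plan_workers(ingest_rate: int) -> tuple[int, int]:
    """Return (worker_groups, total_worker_servers) for a given ingest rate.

    Assumption: one active Server per group, so group capacity equals
    SERVER_CAPACITY; at least one group is always deployed.
    """
    groups = max(1, math.ceil(ingest_rate / SERVER_CAPACITY))
    return groups, groups * REPLICAS_PER_WORKER


if __name__ == "__main__":
    groups, servers = plan_workers(25_000_000)
    print(f"{groups} Worker groups, {servers} Servers below the Primary node")
```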
Keepalived is switching-mechanism software, similar to a layer 3, 4 and 5 switching mechanism, that is, what is commonly called layer 3, layer 4 and layer 5 switching. Keepalived operates automatically, without manual intervention. It mainly provides load-balancing and high-availability functions: load balancing relies on the Linux virtual server kernel module (IPVS), and high availability is achieved through failover among multiple machines using the VRRP protocol.
HAProxy is free, open-source software written in C that provides high availability, load balancing, and proxying for TCP- and HTTP-based applications, and supports virtual hosts.
A Consul cluster is a cluster of three Consul nodes. Consul is a distributed, highly available, and horizontally scalable tool that provides service discovery for Prometheus, that is, a way for Prometheus to discover monitoring targets automatically. The address, port, and Label information of each monitoring target (called a Target) is stored in Consul; the Prometheus Server automatically obtains this information from Consul and establishes an HTTP channel to each Target for Pull collection. Consul has a built-in clustering scheme, so no additional tools are required.
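To make the registration step concrete, the sketch below builds the kind of JSON body that could be sent to Consul's agent service registration endpoint (`PUT /v1/agent/service/register`) and derives the URL a Prometheus Server would pull from. The service name, address, port, labels, and the convention of carrying labels in the `Meta` field are assumptions for illustration; nothing is sent over the network.

```python
from __future__ import annotations

import json


def registration_payload(name: str, address: str, port: int,
                         labels: dict[str, str]) -> dict:
    """Build a service definition holding the Target's address, port and labels."""
    return {
        "ID": f"{name}-{address}-{port}",  # hypothetical ID convention
        "Name": name,
        "Address": address,
        "Port": port,
        "Meta": dict(labels),              # custom Labels carried as metadata
    }


def scrape_url(service: dict) -> str:
    """URL on which the Server would pull metrics from this Target."""
    return f"http://{service['Address']}:{service['Port']}/metrics"


if __name__ == "__main__":
    payload = registration_payload(
        "node_exporter", "10.0.0.12", 9100, {"env": "prod", "team": "core"})
    print(json.dumps(payload, indent=2))
    print(scrape_url(payload))
```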
Data from an exporter can only be collected and transmitted in Pull mode, so Pushgateway is used for indexes that cannot be collected directly by Pull. In a specific embodiment, the Pushgateway cluster is specifically configured to:
receive, in Push mode, monitoring index data pushed by mixed resources from which the exporters cannot acquire data;
and transmit the monitoring index data to the Prometheus Server cluster in Pull mode.
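The Push half of this exchange can be illustrated with a short Python sketch that renders one sample in the Prometheus text exposition format and builds the HTTP request a resource might use to push it. The gateway host, job name, instance name, and metric name are hypothetical; the request is constructed but deliberately not sent, so the example runs offline.

```python
from urllib import request


def format_metric(name: str, value: float, help_text: str,
                  metric_type: str = "gauge") -> str:
    """Render a single sample in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name} {value}\n"
    )


def build_push_request(gateway: str, job: str, instance: str,
                       body: str) -> request.Request:
    """Build (but do not send) the request that pushes `body` to a Pushgateway."""
    url = f"{gateway}/metrics/job/{job}/instance/{instance}"
    return request.Request(url, data=body.encode("utf-8"), method="PUT")


if __name__ == "__main__":
    body = format_metric("batch_records_processed", 1250,
                         "Records handled by the last batch run")
    req = build_push_request("http://pushgateway.example:9091",
                             "nightly_batch", "host-a", body)
    print(req.get_method(), req.full_url)
    print(body, end="")
    # request.urlopen(req) would perform the push; omitted on purpose.
```

The Prometheus Server then pulls the pushed samples from the Pushgateway like from any other Target.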
In specific implementation, the Pushgateway cluster is deployed on three nodes. Keepalived is deployed on each node and is responsible for starting the Pushgateway process, and HAProxy is responsible for load balancing and forwarding. The first node is the Master node, and the other two nodes are Backup nodes in cold standby mode.
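The Master/Backup arrangement can be modelled with a small simulation of priority-based election in the style of VRRP: among the nodes that are still alive, the one with the highest priority holds the Master role. This is a toy model under assumed priorities and node names; it does not reproduce Keepalived's actual protocol exchange.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster member with a VRRP-style priority."""
    name: str
    priority: int
    alive: bool = True


def elect_master(nodes: list[Node]) -> Node | None:
    """Return the live node with the highest priority, or None if all are down."""
    live = [n for n in nodes if n.alive]
    return max(live, key=lambda n: n.priority) if live else None


if __name__ == "__main__":
    cluster = [Node("node-1", 100), Node("node-2", 90), Node("node-3", 80)]
    print("master:", elect_master(cluster).name)

    cluster[0].alive = False  # the Master goes down
    print("master after failover:", elect_master(cluster).name)
```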
In a specific embodiment, the data storage module includes a local TSDB database and a remote-storage Elasticsearch cluster. The local TSDB database is the time-series database built into Prometheus and is used for storing data in real time. Monitoring index data is temporal in nature: each sampling point of an index has a unique timestamp, so monitoring index data is also called time-series data. Time-series data is written once and read many times, arrives as a stable data stream, and is queried along the time dimension. Therefore, unlike a relational database, which uses a B+ tree, a time-series database usually uses an LSM tree and is characterized by large storage capacity, a high data compression ratio (saving storage), high throughput, and high concurrency. Traditional monitoring software usually uses a relational database such as Oracle or MySQL, whereas Prometheus uses a time-series database, namely the local TSDB database, which gives Prometheus larger-scale monitoring capability and stronger data acquisition and processing performance.
In a specific embodiment, the three Prometheus Servers in a group of monitoring centers use GPFS to establish shared storage for the TSDB, so that a Backup node can take over storage of the monitoring data. GPFS (General Parallel File System) is IBM's first shared file system and originated from the virtual shared disk (VSD) technology used on IBM SP systems. At its core, GPFS is a parallel disk file system that ensures all nodes in a resource group can access the entire file system in parallel, and operations on the file system can be performed simultaneously and safely from multiple nodes that use it. GPFS allows clients to share files that may be distributed across different hard disks on different nodes. It provides many standard UNIX file system interfaces, so that applications can run on it without modification or recompilation.
In a specific embodiment, the remote-storage Elasticsearch cluster is specifically used as follows:
the monitoring index data is written into the Elasticsearch cluster through PrometheusBeats as cold data for backup storage.
PrometheusBeats is a storage adapter that implements the RemoteWrite specification and writes the monitoring data sent to it into Elasticsearch. PrometheusBeats is deployed on three nodes; Keepalived is deployed on each node and is responsible for starting the PrometheusBeats process, and HAProxy is responsible for load balancing and forwarding. The first node is the Master node, and the other two nodes are Backup nodes in cold standby mode.
Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch was developed in Java and released as open source under the Apache license terms, and is currently a popular enterprise-level search engine. It is designed for use in cloud computing, can achieve real-time search, and is stable, reliable, fast, and easy to install and use.
In a specific embodiment, in order to add data filtering and processing functions, a layer of Logstash is added between each PrometheusBeats instance and Elasticsearch. Logstash is an open-source data collection engine with real-time pipeline processing capability. Briefly, Logstash acts as a bridge between data sources and data storage and analysis tools; combined with Elasticsearch and Kibana, it greatly facilitates the processing and analysis of data. Through more than 200 plug-ins, Logstash can accept almost any kind of data, including logs, network requests, relational databases, sensors, and Internet of Things devices.
Native Prometheus stores data in a local TSDB time-series database, which is also deployed on a single node (usually the same node as the Server, although it can be mounted remotely using schemes such as NFS/NAS) and is therefore prone to single points of failure. The embodiment of the invention uses an adapter that conforms to the RemoteWrite specification and attaches an external Elasticsearch cluster as remote storage, forming a storage scheme in which short-term hot data is kept in the local TSDB and long-term cold data is backed up in the Elasticsearch cluster, which solves the Prometheus storage problem.
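The write side of such an adapter can be sketched as a conversion from metric samples to the newline-delimited JSON body accepted by the Elasticsearch bulk API (one action line followed by one document line per sample). This is an illustration only, not the PrometheusBeats implementation: real remote-write requests arrive as Snappy-compressed protocol buffers, and decoding them is omitted here. The sample layout, the `metrics` index prefix, and the daily index naming are assumptions.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone


def to_bulk_ndjson(samples: list[dict], index_prefix: str = "metrics") -> str:
    """Convert decoded samples into an Elasticsearch bulk request body.

    Each sample is a dict with the hypothetical keys
    "name", "labels", "value" and "timestamp_ms".
    """
    lines = []
    for sample in samples:
        day = datetime.fromtimestamp(
            sample["timestamp_ms"] / 1000, tz=timezone.utc
        ).strftime("%Y.%m.%d")
        # Action line: index the document into a per-day index.
        lines.append(json.dumps({"index": {"_index": f"{index_prefix}-{day}"}}))
        # Document line: the sample itself.
        lines.append(json.dumps({
            "name": sample["name"],
            "labels": sample["labels"],
            "value": sample["value"],
            "@timestamp": sample["timestamp_ms"],
        }))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


if __name__ == "__main__":
    body = to_bulk_ndjson([{
        "name": "node_load1",
        "labels": {"instance": "10.0.0.12:9100"},
        "value": 0.73,
        "timestamp_ms": 1_600_000_000_000,
    }])
    print(body, end="")
```

Writing to per-day indexes is one way to make the periodic cleanup of old cold data a matter of deleting whole indexes.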
Through the custom-developed exporters and the high-availability extension scheme of the embodiment of the invention, mixed resources in a large-scale distributed enterprise cluster can be monitored. Because the resource monitoring system provided by the embodiment of the invention is highly available, stable, and scalable, it can meet the unified and centralized monitoring requirements of various mixed resources, such as servers, network equipment, operating systems, middleware, applications, and container clusters, under an enterprise ultra-large-scale distributed cluster architecture. In addition, all modules of the system are open-source components and no commercial software is used, which can save enterprises a large amount of cost.
A specific example is given below to illustrate how the resource monitoring system is constructed and used to monitor the mixed resources.
Fig. 3 shows the deployment framework of this specific example, in which:
In the middle is the Prometheus Server, that is, the monitoring center, which is mainly responsible for initiating Pull requests to the monitored objects to collect indexes, and for aggregating, filtering, and evaluating whether the indexes meet the alarm rules; its built-in TSDB, that is, the time-series database, is responsible for local data storage. The Prometheus Server is deployed on three nodes, and Keepalived is deployed on each node to provide a VIP. The first node is the Master node; the other two are Backup nodes in cold standby mode (not started). When the Master goes down, the VIP drifts to the second node, a Backup node, which is promoted to Master, and Keepalived starts the Server process there, completing the Server's high-availability function. The three Servers use GPFS to establish shared storage for the TSDB, so that the Backup node can take over storage of the monitoring data. When the Server reaches a bottleneck and needs to be scaled out, monitoring centers are added three at a time as a group; each group is denoted a Worker node, a Primary node is deployed on the upper layer, and the Workers are attached below the Primary node, forming a Primary-Worker pyramid.
At the lower left are the index collectors XX_exporter, which are responsible for the actual collection of indexes from the monitored objects; the range of collected objects can be extended through secondary development of XX_exporter. The Pushgateway at the upper left handles indexes that cannot be collected directly in Pull mode: the collected object pushes its indexes to the Pushgateway in Push mode, and the Prometheus Server then fetches them by Pull. Pushgateway is also deployed on three nodes; besides Keepalived, which provides high availability, HAProxy is deployed on each node for load-balanced forwarding to even out traffic.
At the top, the service discovery function uses three Consul nodes to form a cluster. Service discovery is the way Prometheus automatically discovers monitoring targets: the address, port, and Label information of each monitoring target (called a Target) is stored in Consul, and the Prometheus Server automatically obtains this information from Consul and establishes an HTTP channel to each Target for Pull collection. Consul has a built-in clustering scheme, so no additional tools are required.
At the bottom is the Prometheus remote storage cluster, Elasticsearch; monitoring data must be written into the Elasticsearch cluster through PrometheusBeats. PrometheusBeats is a storage adapter that implements the RemoteWrite specification and writes the monitoring data sent to it into Elasticsearch, and a layer of Logstash is added between each PrometheusBeats instance and Elasticsearch to add data filtering and processing functions. PrometheusBeats is deployed on three nodes, with high availability and load balancing provided by HAProxy + Keepalived. Elasticsearch itself is a multi-node cluster that can, in theory, be scaled horizontally, and it serves as the backup storage scheme for cold data.
At the upper right is Alertmanager, the alarm module of Prometheus. The Prometheus Server sends alarm information that meets a predefined Rule to Alertmanager, which stores the alarm and can trigger hook operations to send notifications by mail, short message, or WeChat. Alertmanager supports three-node cluster deployment and provides alarm deduplication and suppression.
At the lower right are the various clients supported by Prometheus, including the built-in Web UI, the Grafana graphical client, and client SDKs in various languages, which make it convenient for developers to carry out secondary development and extension based on Prometheus.
After components such as the Prometheus Server, Pushgateway, Elasticsearch, and Alertmanager are deployed, each can be scaled horizontally on its own if capacity later becomes insufficient. During maintenance, the monitoring data and alarm data stored in the TSDB, Alertmanager, and Elasticsearch need to be cleaned up regularly.
Further, in addition to the Keepalived high-availability scheme, the Prometheus Server may also use Thanos to support a high-availability configuration. For remote storage, there are several schemes besides the Elasticsearch cluster, such as storing to TiKV by means of a TiPrometheus adapter.
The flow of collecting, processing and storing the monitoring data by using the deployment framework is shown in fig. 4, and includes:
S1, the monitoring target is registered in the service registration center Consul (that is, the XX_exporter address is registered), with its IP, port, and custom Labels recorded;
S2, the XX_exporter deployed on the monitoring target collects local monitoring indexes, starts an HTTP server, and waits for periodic collection by the Prometheus Server;
S3, the Prometheus Server obtains the monitoring target address (that is, the XX_exporter address) from Consul, periodically initiates an HTTP request to the XX_exporter, and aggregates, filters, and evaluates whether the collected indexes meet an alarm rule; it first writes the monitoring index data into the local TSDB and, at the same time, writes another copy to the remote storage cluster Elasticsearch according to the configured remote storage adapter address;
S4, alarm information in a fixed format is formed for the data that meets the alarm rule and is sent to Alertmanager, which completes alarm storage, deduplication, and suppression and, according to the configured hook rules, triggers notification of the user by short message, mail, WeChat, and the like;
S5, the user can use Grafana to view the monitoring performance curves and examine the monitoring details.
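Step S3 in particular lends itself to a compact sketch: parse a scraped payload, write every sample to both the local store and the remote queue, and report which alarm rules are met. The parser handles only unlabelled `name value` lines of the text exposition format, and the in-memory lists standing in for the local TSDB and the remote adapter, as well as the metric names and limits, are assumptions for illustration.

```python
from __future__ import annotations


def parse_exposition(text: str) -> dict[str, float]:
    """Parse unlabelled `name value` lines; comment and blank lines are skipped."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def process_scrape(text: str, rules: dict[str, float], now: int,
                   local_tsdb: list, remote_queue: list) -> list[tuple]:
    """Store every sample twice and return (name, value, limit) for met rules."""
    samples = parse_exposition(text)
    for name, value in samples.items():
        local_tsdb.append((now, name, value))    # hot data, local store
        remote_queue.append((now, name, value))  # copy for remote storage
    return [(name, samples[name], limit)
            for name, limit in rules.items()
            if name in samples and samples[name] > limit]


if __name__ == "__main__":
    scraped = (
        "# HELP node_load1 1m load average\n"
        "node_load1 3.5\n"
        "node_memory_used_ratio 0.42\n"
    )
    local, remote = [], []
    alarms = process_scrape(scraped, {"node_load1": 2.0}, 1_600_000_000,
                            local, remote)
    print("alarms:", alarms)
    print("stored locally:", len(local), "queued for remote:", len(remote))
```

The returned tuples correspond to the alarm information that step S4 would forward to Alertmanager.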
By combining the Primary-Worker pyramid mode with the Keepalived high-availability mode for the Prometheus Server, the requirements of scalability, stability, availability, and performance are all taken into account, and the needs of an enterprise ultra-large-scale distributed cluster are met. By combining Prometheus's local TSDB for hot data with an Elasticsearch cluster as remote storage for cold-data backup, long-term storage of monitoring data and high data availability are ensured. The index collection range is expanded through secondary development of XX_exporter, and indexes are collected in Pull mode while Pushgateway handles pushed indexes, so that unified and centralized monitoring of various mixed resources (such as servers, operating systems, applications, and containers) is achieved. This also accommodates various enterprise network environments (some systems are hidden behind firewalls where the Server cannot reach them and Pull collection is impossible; in that case a proxy node can be set up as a Pushgateway for centralized uploading) and application environments (indexes are actively pushed to the Pushgateway from within the application).
The implementation of the above specific application is only an example, and the rest of the embodiments are not described in detail.
Based on the same inventive concept, an embodiment of the present invention further provides a working method of the resource monitoring system. Because the principle by which the working method solves the problem is similar to that of the resource monitoring system, the implementation of the working method can refer to the implementation of the resource monitoring system, and repeated parts are not described again. The working method is shown in fig. 5 and includes:
step 501: the data acquisition module acquires monitoring index data of the mixed resources and sends the acquired monitoring index data to the data processing module; wherein the data acquisition module comprises: multiple types of index collectors (exporters) and a Pushgateway cluster;
step 502: the data processing module receives the monitoring index data sent by the data acquisition module and uses Prometheus to form alarm information for the monitoring index data that meets a preset alarm rule, so as to trigger an alarm for the corresponding resource;
step 503: the data storage module stores the monitoring index data received by the data processing module; wherein the data storage module comprises: a local TSDB database and a remote-storage Elasticsearch cluster.
In a specific embodiment, step 502 is specifically implemented by a process including:
the Consul cluster performs service registration and discovery for the monitoring targets, so that the Prometheus Server cluster establishes an HTTP channel to each monitoring target registered in the Consul cluster and remotely receives monitoring index data in Pull mode;
the Prometheus Server cluster receives the monitoring index data, performs aggregation and filtering preprocessing on the monitoring index data, and then forms alarm information and sends it to the Alertmanager alarm module of Prometheus;
and the Alertmanager alarm module of Prometheus receives the alarm information, performs alarm storage, deduplication and suppression, and triggers a hook operation to raise an alarm for the corresponding resource.
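The deduplication, suppression and hook steps of the alarm module can be modeled as follows. This is a simplified sketch and not the Alertmanager implementation: the `Alert` type, the two severity levels, and the suppression policy (drop a lower-severity alert when a higher-severity one is firing for the same resource) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

SEVERITY_RANK = {"warning": 1, "critical": 2}  # assumed severity levels


@dataclass(frozen=True)
class Alert:
    name: str
    resource: str
    severity: str

    def fingerprint(self) -> Tuple[str, str, str]:
        """Identity used to recognize repeated copies of the same alert."""
        return (self.name, self.resource, self.severity)


def deduplicate(alerts: List[Alert]) -> List[Alert]:
    """Keep the first occurrence of each fingerprint, preserving order."""
    seen: Set[Tuple[str, str, str]] = set()
    unique: List[Alert] = []
    for alert in alerts:
        fp = alert.fingerprint()
        if fp not in seen:
            seen.add(fp)
            unique.append(alert)
    return unique


def suppress(alerts: List[Alert]) -> List[Alert]:
    """Drop an alert when a higher-severity alert is firing for the same resource."""
    highest: Dict[str, int] = {}
    for alert in alerts:
        rank = SEVERITY_RANK[alert.severity]
        highest[alert.resource] = max(highest.get(alert.resource, 0), rank)
    return [a for a in alerts if SEVERITY_RANK[a.severity] == highest[a.resource]]


def notify(alerts: List[Alert], hook: Callable[[Alert], None]) -> None:
    """Run deduplication, then suppression, then call the hook once per remaining alert."""
    for alert in suppress(deduplicate(alerts)):
        hook(alert)
```

The hook is passed in as a plain callable so that the same flow can drive a webhook, a mail sender, or a test double.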
In a specific implementation, the PushGateway cluster receives, in Push mode, monitoring index data pushed by mixed resources from which an exporter cannot collect data, and transmits the monitoring index data to the Prometheus Server cluster in Pull mode.
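The push-in, pull-out behaviour described above can be illustrated with a small in-memory model. This is a sketch only, not the PushGateway code: the `MiniPushGateway` class is hypothetical, it keeps just the most recent push per (job, instance) group, and it renders sample lines in the style of the Prometheus text exposition format without handling label escaping, HELP/TYPE lines, or timestamps.

```python
from typing import Dict, List, Tuple


class MiniPushGateway:
    """Holds pushed metrics until a monitoring server pulls them."""

    def __init__(self) -> None:
        self._groups: Dict[Tuple[str, str], Dict[str, float]] = {}

    def push(self, job: str, instance: str, metrics: Dict[str, float]) -> None:
        """Push mode: replace the metrics stored for this (job, instance) group."""
        self._groups[(job, instance)] = dict(metrics)

    def scrape(self) -> str:
        """Pull mode: render all stored metrics, one sample per line."""
        lines: List[str] = []
        for (job, instance), metrics in sorted(self._groups.items()):
            for name, value in sorted(metrics.items()):
                lines.append(f'{name}{{instance="{instance}",job="{job}"}} {value}')
        return "\n".join(lines) + ("\n" if lines else "")
```

A resource behind a firewall would call `push` whenever it has new data, while the monitoring server calls `scrape` on its own schedule, so the two sides never need a direct connection initiated from the server.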
In a specific embodiment, when the remote-storage Elasticsearch cluster is used for storage, the monitoring index data is written into the Elasticsearch cluster through PrometheusBeats to serve as cold data for backup storage, where PrometheusBeats is a storage adapter implementing the RemoteWrite specification.
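One part of such a storage adapter can be sketched as the conversion of received time series into a request body for the Elasticsearch bulk API, which takes newline-delimited JSON with an action line followed by a document line. This is a minimal sketch, not the PrometheusBeats implementation: it assumes the RemoteWrite payload has already been decoded into (labels, samples) pairs, and the function name `to_bulk_body`, the index name, and the document fields (`@timestamp`, `value`, `labels`) are illustrative assumptions.

```python
import json
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], List[Tuple[int, float]]]  # (labels, [(timestamp_ms, value)])


def to_bulk_body(series: List[Series], index: str) -> str:
    """Build a newline-delimited JSON body: one action line plus one document per sample."""
    lines: List[str] = []
    for labels, samples in series:
        for timestamp_ms, value in samples:
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps(
                {"@timestamp": timestamp_ms, "value": value, "labels": labels},
                sort_keys=True,
            ))
    # The bulk body must end with a newline when it is not empty.
    return ("\n".join(lines) + "\n") if lines else ""
```

Batching many samples into one bulk body keeps the number of HTTP requests to the cold-storage cluster low, which matters when every scraped sample is also backed up remotely.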
An embodiment of the present invention further provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the working method of the resource monitoring system when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program for executing the working method of the resource monitoring system is stored.
In summary, the resource monitoring system and the working method provided by the embodiment of the invention have the following advantages:
By providing the data acquisition module, monitoring index data of mixed resources is collected and sent to the data processing module, the data acquisition module including multiple types of index collectors (exporters) and a PushGateway cluster. By providing the data processing module, the monitoring index data sent by the data acquisition module is received, and Prometheus is used to form alarm information for the monitoring index data that meets a preset alarm rule, triggering an alarm for the corresponding resource. By providing the data storage module, the monitoring index data received by the data processing module is stored, the data storage module including a local TSDB database and a remote-storage Elasticsearch cluster. Prometheus is an open-source monitoring and alarm system with a built-in time-series database; it is written in Golang and offers strong performance. Its various exporters (index collectors) support monitoring index collection for physical/virtual nodes, network nodes, container clusters, middleware/databases, applications and other resources, so that the monitoring requirements of mixed resources are met.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A resource monitoring system, comprising:
a data acquisition module, used for acquiring monitoring index data of mixed resources and sending the acquired monitoring index data to a data processing module; the data acquisition module comprises: multiple types of index collectors (exporters) and a PushGateway cluster;
the data processing module, used for receiving the monitoring index data sent by the data acquisition module, forming alarm information for the monitoring index data that meets a preset alarm rule by using Prometheus, and triggering an alarm for the corresponding resource;
a data storage module, used for storing the monitoring index data received by the data processing module; the data storage module comprises: a local TSDB database and a remote-storage Elasticsearch cluster.
2. The resource monitoring system of claim 1, wherein the data processing module comprises:
a Prometheus Server cluster, used for receiving the monitoring index data, performing aggregation and filtering preprocessing on the monitoring index data, and then forming alarm information and sending it to the Alertmanager alarm module of Prometheus;
a Consul cluster, used for performing service registration and discovery for the monitoring targets, so that the Prometheus Server cluster establishes an HTTP channel to each monitoring target registered in Consul and remotely receives monitoring index data in Pull mode;
and the Alertmanager alarm module of Prometheus, used for receiving the alarm information, performing alarm storage, deduplication and suppression, and triggering a hook operation to raise an alarm for the corresponding resource.
3. The resource monitoring system of claim 2, wherein the PushGateway cluster is specifically configured to:
receiving, in Push mode, monitoring index data pushed by mixed resources from which an exporter cannot collect data;
and transmitting the monitoring index data to the Prometheus Server cluster in Pull mode.
4. The resource monitoring system of claim 2, wherein the Prometheus Server cluster comprises a plurality of Prometheus Servers, every three Prometheus Servers form one monitoring center group, each monitoring center group is designated a Worker node, and a Primary node deployed at the upper layer is connected to the Worker nodes beneath it, forming a Primary-Worker pyramid mode.
5. The resource monitoring system of claim 4, wherein each monitoring center group is deployed on three nodes, a Keepalived instance providing a VIP is deployed on each node, and Keepalived is responsible for starting the Prometheus Server process;
the first node is a Master node, the other two nodes are Backup nodes, and the nodes operate in a cold standby mode.
6. The resource monitoring system of claim 1, wherein the PushGateway cluster is deployed on three nodes, Keepalived is deployed on each node to start the PushGateway process, and HAProxy is responsible for load balancing and forwarding;
the first node is a Master node, the other two nodes are Backup nodes, and the nodes operate in a cold standby mode.
7. The resource monitoring system of claim 1, wherein the remote storage Elasticsearch cluster is specifically configured to:
writing the monitoring index data into the Elasticsearch cluster through PrometheusBeats to serve as cold data for backup storage;
wherein PrometheusBeats is a storage adapter implementing the RemoteWrite specification;
PrometheusBeats is deployed on three nodes, Keepalived, which is responsible for starting the PrometheusBeats process, is deployed on each node, and HAProxy is responsible for load balancing and forwarding;
the first node is a Master node, the other two nodes are Backup nodes, and the nodes operate in a cold standby mode.
8. A method of operating a resource monitoring system as claimed in any one of claims 1 to 7, comprising:
the data acquisition module acquires monitoring index data of the mixed resources and sends the acquired monitoring index data to the data processing module; wherein the data acquisition module includes: multiple types of index collectors (exporters) and a PushGateway cluster;
the data processing module receives the monitoring index data sent by the data acquisition module, and uses Prometheus to form alarm information for the monitoring index data that meets a preset alarm rule, so as to trigger an alarm for the corresponding resource;
the data storage module stores the monitoring index data received by the data processing module; wherein the data storage module includes: a local TSDB database and a remote-storage Elasticsearch cluster.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of claim 8 when executing the computer program.
10. A computer-readable storage medium storing a computer program for executing the method of claim 8.
CN202011132356.2A 2020-10-21 2020-10-21 Resource monitoring system and working method Pending CN112084098A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011132356.2A CN112084098A (en) 2020-10-21 2020-10-21 Resource monitoring system and working method

Publications (1)

Publication Number Publication Date
CN112084098A (en) 2020-12-15

Family

ID=73730904

Country Status (1)

Country Link
CN (1) CN112084098A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination