CN115934464A - Information platform monitoring and collecting system

Info

Publication number: CN115934464A
Application number: CN202211592846.XA
Authority: CN
Inventors: 于德江; 左鹏; 王禹博; 徐士强
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2022-12-13
Filing date: 2022-12-13
Publication date: 2023-04-07

Abstract

The invention discloses an information platform monitoring and collecting system, belonging to the technical field of container performance collection and monitoring. The technical problem to be solved is how to achieve fine-grained management of K8S cluster containers so that problems can be located easily and handled promptly. The technical scheme is as follows: the system comprises a data collection and extraction unit and a monitoring alarm unit, wherein the data collection and extraction unit comprises a data collection layer and a data extraction layer, and the monitoring alarm unit comprises a data display layer, an alarm rule configuration layer, an alarm generation layer and an alarm display layer; the data collection layer is used for collecting host data, system data and container data, standardizing the collected data and storing it; and the data extraction layer is used for normalizing and filtering the data collected by the data collection layer, using the alarm rule language in a YAML file written at deployment time, and for extracting the required data to the monitoring alarm unit.

Description

Information platform monitoring and collecting system

Technical Field

The invention relates to the technical field of container performance acquisition and monitoring, in particular to an information platform monitoring and acquisition system.

Background

Kubernetes, K8S for short, can be used to manage containerized applications across multiple hosts in a cloud platform. Applications are deployed by deploying containers. Containers are isolated from one another: each container has its own file system, processes in different containers cannot affect each other, and computing resources can be partitioned. Compared with a virtual machine, a container can be deployed rapidly, and because containers are decoupled from the underlying infrastructure and the host file system, they can be migrated between different clouds and different operating system versions.

Monitoring is a very important part of K8S cluster operation and maintenance. Collecting operating data from the cluster promptly and comprehensively is the basis for observing the cluster's running state, understanding its trends, and issuing alarm notifications according to defined rules. However, for a cluster with a large number of containers, existing monitoring methods tend to cause excessive gateway pressure and loss of monitoring data.

Therefore, how to achieve fine-grained management of K8S cluster containers, so that problems can be located easily and handled promptly, is a technical problem that urgently needs to be solved.

Disclosure of Invention

The technical task of the invention is to provide an information platform monitoring and collecting system that solves the problem of how to achieve fine-grained management of K8S cluster containers so that problems can be located easily and handled promptly.

The technical task of the invention is achieved as follows: an information platform monitoring and collecting system comprises a data collection and extraction unit and a monitoring alarm unit, wherein the data collection and extraction unit comprises a data collection layer and a data extraction layer, and the monitoring alarm unit comprises a data display layer, an alarm rule configuration layer, an alarm generation layer and an alarm display layer;

the data collection layer is used for collecting host data, system data and container data, carrying out standardized processing on the collected data and storing the data;

the data extraction layer is used for normalizing and filtering the data collected by the data collection layer, using the alarm rule language in a YAML file written at deployment time, and for extracting the required data to the monitoring alarm unit; Prometheus stores the data collected through the exporter, in a uniform format, in its built-in time-series database for Grafana to query;

the data display layer uses a web interface to display the data collected by the data collection layer in a unified way; the display modes include line charts, bar charts and pie charts, and presenting the data graphically helps operation and maintenance personnel understand the running state and trends of a host or network over a period of time, as a basis for troubleshooting and resolving problems;

the alarm rule configuration layer is used for configuring, in the Prometheus configuration file prometheus.yml, the built-in alarm rules for all configured resources, and for pushing alarm information;

the alarm generation layer is used for recording alarm events in real time and notifying the user;

the alarm display layer is a web display interface used for uniformly displaying the monitoring statistics and the alarm and fault results.

Preferably, the data collection layer collects data in the following manner:

(1) building a Kubernetes cluster according to the requirements of actual service and resource conditions, and taking the cluster as a monitoring target;

(2) installing a collection component (exporter, cadvisor or telegraf) in the cluster to collect cluster performance data, wherein the cluster performance data comprises CPU, memory, disk and network resource data;

(3) collecting monitoring metrics of different dimensions through an exporter and exposing them in a data format supported by Prometheus, wherein Prometheus periodically pulls the data and Grafana displays it;

(4) collecting performance metric data related to containers and Pods through cadvisor, which Prometheus scrapes through the exposed metrics interface;

(5) collecting performance metric data of the host through prometheus-node-exporter, which Prometheus scrapes through the exposed metrics interface.
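
As an illustration of steps (3) to (5), the following Python sketch renders samples in the line-oriented text format that Prometheus scrapes from a metrics interface. It is a minimal sketch, not the component used in the invention: the function names, metric names, label names and values are all hypothetical, and a real exporter would normally rely on an existing client library rather than hand-written formatting.

```python
from typing import Dict, Iterable, Tuple


def render_sample(name: str, labels: Dict[str, str], value: float) -> str:
    """Render one sample as a single line of the text exposition format."""
    if not labels:
        return f"{name} {value}"

    def escape(text: str) -> str:
        # Label values escape backslash, double quote and newline.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    body = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return f"{name}{{{body}}} {value}"


def render_metrics(samples: Iterable[Tuple[str, Dict[str, str], float]]) -> str:
    """Join samples into the payload an exporter would serve to the scraper."""
    return "\n".join(render_sample(*sample) for sample in samples) + "\n"


# Hypothetical host and container samples (names and values are made up).
payload = render_metrics([
    ("demo_host_cpu_usage_ratio", {"node": "node-1"}, 0.42),
    ("demo_container_memory_bytes", {"pod": "web-0", "container": "app"}, 1048576.0),
])
```

Serving `payload` over HTTP is left out; the point is only that host-level and container-level data end up in one uniform format that the server can pull.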

Preferably, the Prometheus building and installing process is as follows:

(1) Packaging the Prometheus image and pushing it to the cluster image registry for the subsequent installation of Prometheus;

(2) Creating a namespace named monitoring in the Kubernetes cluster that has been built, to hold the containers run by Prometheus;

(3) Granting read permission on the cluster to the monitoring namespace, so that Prometheus can obtain resource-related information about the cluster through the Kubernetes API;

(4) Creating a ConfigMap in the monitoring namespace to store the configuration of the Prometheus container and the configuration for dynamically discovering the Pods and running services in the Kubernetes cluster;

(5) Creating Prometheus as a Deployment and installing it through a YAML file;

(6) Connecting to Prometheus: the internal port of Prometheus is mapped to an external port through a YAML file, and once the Kubernetes cluster is automatically connected to Prometheus, the deployment of Prometheus has succeeded.
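
The manifests behind steps (2), (5) and (6) might look roughly like the following, built here as Python dictionaries and serialized to JSON, which Kubernetes tooling generally accepts alongside YAML. This is a sketch under assumptions: the image path, the ConfigMap name `prometheus-config`, the node port 30090 and the function name are hypothetical, and the read permission of step (3) and the ConfigMap contents of step (4) are omitted.

```python
import json
from typing import Dict, List


def prometheus_manifests(namespace: str = "monitoring",
                         image: str = "registry.example.local/prometheus:latest",
                         node_port: int = 30090) -> List[Dict]:
    """Return Namespace, Deployment and Service manifests as plain dicts."""
    labels = {"app": "prometheus"}
    namespace_doc = {"apiVersion": "v1", "kind": "Namespace",
                     "metadata": {"name": namespace}}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "prometheus", "namespace": namespace},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "prometheus",
                        "image": image,  # hypothetical image in the cluster registry
                        "args": ["--config.file=/etc/prometheus/prometheus.yml"],
                        "ports": [{"containerPort": 9090}],
                        "volumeMounts": [{"name": "config",
                                          "mountPath": "/etc/prometheus"}],
                    }],
                    # The ConfigMap of step (4); its name is an assumption.
                    "volumes": [{"name": "config",
                                 "configMap": {"name": "prometheus-config"}}],
                },
            },
        },
    }
    # Step (6): map the internal port 9090 to an external node port.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "prometheus", "namespace": namespace},
        "spec": {"type": "NodePort", "selector": labels,
                 "ports": [{"port": 9090, "targetPort": 9090, "nodePort": node_port}]},
    }
    return [namespace_doc, deployment, service]


manifests_json = json.dumps(prometheus_manifests(), indent=2)
```

Generating the documents from one function keeps the namespace and labels consistent across the three objects, which is the part most easily broken when the files are edited by hand.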

More preferably, the working process of Prometheus is as follows:

(1) The Prometheus server periodically pulls metrics from configured exporters;

(2) The Prometheus server stores the collected metrics locally, runs the defined alert.rules, and records new time series or pushes alarms to Grafana;

(3) Grafana processes the received alarms according to its configuration file and sends out the alarms;

(4) The collected data is visualized in the graphical interface.
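
The pull-and-store cycle of steps (1) and (2) can be pictured with a toy model. The Python sketch below is not Prometheus itself: the class `MiniServer`, the callable targets and the averaging rule are hypothetical stand-ins that only show samples being pulled from configured targets, stored as local time series, and used to record a new derived series.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Tuple

# One exporter is modelled as a callable returning metric name -> current value.
Scrape = Callable[[], Dict[str, float]]


class MiniServer:
    """Toy stand-in for the pull, store and record cycle."""

    def __init__(self, targets: Dict[str, Scrape]):
        self.targets = targets
        # (target, metric) -> list of (timestamp, value)
        self.series: Dict[Tuple[str, str], List[Tuple[int, float]]] = defaultdict(list)

    def pull(self, timestamp: int) -> None:
        """Step (1): pull the current value of every metric from every target."""
        for target, scrape in self.targets.items():
            for metric, value in scrape().items():
                self.series[(target, metric)].append((timestamp, value))

    def record_average(self, metric: str, timestamp: int) -> float:
        """Step (2): record a new series, here the mean of the latest samples."""
        latest = [samples[-1][1]
                  for (_, name), samples in self.series.items()
                  if name == metric and samples]
        if not latest:
            raise ValueError(f"no samples stored for {metric!r}")
        value = mean(latest)
        self.series[("recorded", f"avg_{metric}")].append((timestamp, value))
        return value


# Two hypothetical nodes reporting a made-up CPU ratio.
server = MiniServer({
    "node-1": lambda: {"cpu_usage_ratio": 0.25},
    "node-2": lambda: {"cpu_usage_ratio": 0.75},
})
server.pull(timestamp=0)
cluster_average = server.record_average("cpu_usage_ratio", timestamp=0)
```

In a real deployment the interval between pulls is part of the server configuration; here the caller supplies the timestamp so that the example stays deterministic.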

Preferably, the data display layer adopts a Grafana tool, and the Grafana tool deployment process specifically comprises the following steps:

(1) Packaging the Grafana image and pushing it to the cluster image registry for the subsequent installation of Grafana;

(2) Installing Grafana through a YAML file;

(3) Connecting to Grafana: the internal port of Grafana is mapped to an external port through a YAML file, and the Kubernetes cluster is automatically connected to Grafana;

(4) Logging in to Grafana with an administrator account and configuring Prometheus as a data source;

(5) Editing JSON files for the required chart types and importing them into Grafana, so that the style of each chart is applied and charts for every data type are displayed;

(6) Connecting to Grafana: when the monitoring data is visible in the default view, Grafana has been deployed successfully.
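
For step (5), a dashboard JSON file could be generated rather than edited by hand. The Python sketch below assumes a simplified dashboard layout: the field names follow the general shape of Grafana dashboard JSON, the panel type strings `timeseries`, `barchart` and `piechart` are assumed to match the panel identifiers of the installed Grafana version, and the dashboard title, panel titles and queries are hypothetical examples.

```python
import json
from typing import Dict, List, Tuple


def build_dashboard(title: str, panels: List[Tuple[str, str, str]]) -> Dict:
    """Build a minimal dashboard document from (panel title, panel type, query) tuples."""
    doc: Dict = {"title": title, "panels": []}
    for index, (panel_title, panel_type, query) in enumerate(panels):
        doc["panels"].append({
            "id": index + 1,
            "title": panel_title,
            "type": panel_type,
            "targets": [{"expr": query}],
            # Three panels per row on a 24-column grid.
            "gridPos": {"h": 8, "w": 8, "x": 8 * (index % 3), "y": 8 * (index // 3)},
        })
    return doc


# One panel per display mode: line chart, bar chart and pie chart.
dashboard = build_dashboard("Cluster overview (example)", [
    ("CPU usage", "timeseries", "rate(node_cpu_seconds_total[5m])"),
    ("Memory by container", "barchart", "container_memory_usage_bytes"),
    ("Filesystem space", "piechart", "node_filesystem_avail_bytes"),
])
dashboard_json = json.dumps(dashboard, indent=2)
```

The resulting `dashboard_json` string is what would be imported in step (5); whether additional fields are required depends on the Grafana version in use.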

Preferably, the alarm rule configuration layer comprises an alarm rule configuration module, a receiving module, a sending module and a message notification module;

the alarm rule configuration module is used for configuring, in the Prometheus configuration file prometheus.yml, the built-in alarm rules for all configured resources;

the receiving module is used for receiving the alarm information sent by the data collection and extraction unit when instantaneous metric data of a container, scraped on the tenant-side cluster of the data collection and extraction unit, triggers an alarm rule, and for pushing the alarm information to the alarm management component Alertmanager;

the sending module is used for sending the alarm information in the alarm management component Alertmanager to the message notification module;

the message notification module is used for sending the alarm information to the corresponding subscription terminals according to the preset account and preset topic of the message sending channel and the subscription terminals of that topic.

Preferably, after the alarm rule configuration module has loaded its configuration, it accesses the addresses of the data collection and extraction units and the metric scraping rules through the K8S dynamic discovery mechanism and periodically scrapes the instantaneous metrics of each data collection and extraction unit; Prometheus periodically evaluates, according to the alarm rules, whether each alarm rule expression reaches the metric threshold:

when the alarm rule expression meets the condition, Prometheus pushes the alarm information to Alertmanager;

the alarm information comprises the UUID of the container, the name of the container, the node where the container is located, the configured threshold of the monitored metric and the current instantaneous value of the monitored metric.
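
The threshold check and the resulting alarm information can be sketched as follows. This is a simplified Python illustration, not the rule language of Prometheus: the class `AlarmInfo`, the function `check_threshold` and the example values are hypothetical, and "reaches the threshold" is interpreted here as greater than or equal to.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AlarmInfo:
    """The five fields listed above for one alarm."""
    container_uuid: str
    container_name: str
    node: str
    threshold: float
    current_value: float


def check_threshold(container_uuid: str, container_name: str, node: str,
                    threshold: float, current_value: float) -> Optional[AlarmInfo]:
    """Return alarm information when the instantaneous value reaches the threshold."""
    if current_value < threshold:
        return None
    return AlarmInfo(container_uuid, container_name, node, threshold, current_value)


# Hypothetical container whose CPU ratio has crossed a 0.9 threshold.
example_id = str(uuid.uuid4())
alarm = check_threshold(example_id, "web-0", "node-1",
                        threshold=0.9, current_value=0.95)
alarm_payload = asdict(alarm) if alarm is not None else None
```

Returning `None` when the condition is not met keeps the caller's logic simple: only a non-empty result is pushed onward to the alarm management component.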

Preferably, the message sending channels of the message notification module include email, SMS, DingTalk and WeChat.
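
How a message notification module might fan an alarm out to the subscription terminals of a topic is sketched below in Python. All names are hypothetical: the channel keys `email`, `sms`, `dingtalk` and `wechat` stand for the four channels, and each sender only returns a text record instead of calling a real messaging service.

```python
from typing import Callable, Dict, List, Tuple

# A sender takes (address, text) and returns a delivery record.
Sender = Callable[[str, str], str]


def make_sender(channel: str) -> Sender:
    """Create a placeholder sender; a real one would call the channel's service."""
    def send(address: str, text: str) -> str:
        return f"[{channel}] to {address}: {text}"
    return send


CHANNELS: Dict[str, Sender] = {
    name: make_sender(name) for name in ("email", "sms", "dingtalk", "wechat")
}


def notify(topic: str, text: str,
           subscriptions: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Send text to every (channel, address) subscribed to the topic."""
    records = []
    for channel, address in subscriptions.get(topic, []):
        sender = CHANNELS.get(channel)
        if sender is None:
            continue  # unknown channel: skip rather than fail the whole alarm
        records.append(sender(address, text))
    return records


# Hypothetical topic with two subscription terminals.
records = notify("container-cpu", "web-0 on node-1 reached 0.95", {
    "container-cpu": [("email", "ops@example.com"), ("sms", "13800000000")],
})
```

Keeping the topic-to-subscriber table separate from the channel table means a new channel can be added without touching existing subscriptions.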

The information platform monitoring and collecting system has the following advantages:

first, the invention monitors K8S cluster resources and raises alarms on them: it can monitor the CPU and memory of containers on the cluster servers, continue to monitor a container group's resources without interruption after the container group is rescheduled, monitor a set of application services under different replica configurations, and obtain both raw and aggregated monitoring data for multiple container groups; the monitored data is then sent to the user in real time in the form of alarms and displayed in different ways, so that K8S cluster containers are managed at a fine granularity and problems can be located easily and handled promptly;

secondly, the monitoring collection component exporter is used to collect data on K8S container resources, and read permission on the cluster is granted, so that resource-related information about the cluster can be obtained through the Kubernetes API;

thirdly, the invention achieves fine-grained management of K8S cluster containers, makes problems easy to locate and handle promptly, helps in understanding the system behavior of containers, and enables monitoring of resource usage.

Drawings

The invention is further described below with reference to the accompanying drawings.

Fig. 1 is a schematic structural diagram of an information platform monitoring and acquisition system.

Detailed Description

An information platform monitoring and collecting system according to the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

Example:

As shown in Fig. 1, this embodiment provides an information platform monitoring and collecting system, which comprises a data collection and extraction unit and a monitoring alarm unit, wherein the data collection and extraction unit comprises a data collection layer and a data extraction layer, and the monitoring alarm unit comprises a data display layer, an alarm rule configuration layer, an alarm generation layer and an alarm display layer;

the data extraction layer is used for normalizing and filtering the data collected by the data collection layer, using the alarm rule language in a YAML file written at deployment time, and for extracting the required data to the monitoring alarm unit; Prometheus stores the data collected through the exporter in its built-in time-series database for Grafana to query;

the alarm rule configuration layer is used for configuring, in the Prometheus configuration file prometheus.yml, the built-in alarm rules for all configured resources, and for pushing alarm information;

Monitoring is implemented by bringing the hardware resources, software resources, system information and so on involved in the platform and the business systems into a unified operation and maintenance monitoring platform. By eliminating differences between management software and between data collection methods, the platform achieves unified management, unified specification, unified processing and unified display across the various data sources, and ultimately achieves standardized, automated and intelligent large-scale operation and maintenance management. Operation monitoring and fault alarming are the two main functional modules of a monitoring system.

The data collection layer in this embodiment collects data in the following manner:

(1) building a Kubernetes cluster according to actual business and resource condition requirements, and taking the cluster as a monitoring target;

(4) collecting performance metric data related to containers and Pods through cadvisor, which Prometheus scrapes through the exposed metrics interface;

The process of Prometheus building and installation in the embodiment is specifically as follows:

(1) Packaging the Prometheus image and pushing it to the cluster image registry for the subsequent installation of Prometheus;

(5) Creating Prometheus as a Deployment and installing it through a YAML file;

(6) Connecting to Prometheus: the internal port of Prometheus is mapped to an external port through a YAML file, and once the Kubernetes cluster is automatically connected to Prometheus, the deployment of Prometheus has succeeded.

The working process of Prometheus in this embodiment is specifically as follows:

(1) The Prometheus server periodically pulls metrics from configured exporters;

(4) The collected data is visualized in the graphical interface.

The data display layer in this embodiment adopts a Grafana tool, and the deployment process of the Grafana tool is specifically as follows:

(1) Packaging the Grafana image and pushing it to the cluster image registry for the subsequent installation of Grafana;

(2) Installing Grafana through a YAML file;

(3) Connecting to Grafana: the internal port of Grafana is mapped to an external port through a YAML file, and Grafana is automatically connected to the Kubernetes cluster;

The alarm rule configuration layer in the embodiment comprises an alarm rule configuration module, a receiving module, a sending module and a message notification module;

After the alarm rule configuration module in this embodiment has loaded its configuration, it accesses the addresses of the data collection and extraction units and the metric scraping rules through the K8S dynamic discovery mechanism and periodically scrapes the instantaneous metrics of each data collection and extraction unit; Prometheus periodically evaluates, according to the alarm rules, whether each alarm rule expression reaches the metric threshold:

the alarm information comprises the UUID of the container, the name of the container, the node where the container is located, the configured threshold of the monitored metric and the current instantaneous value of the monitored metric.

The message sending channels of the message notification module in this embodiment include email, SMS, DingTalk and WeChat.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An information platform monitoring and collecting system is characterized by comprising a data collecting and extracting unit and a monitoring and alarming unit, wherein the data collecting and extracting unit comprises a data collecting layer and a data extracting layer, and the monitoring and alarming unit comprises a data display layer, an alarming rule configuration layer, an alarming generation layer and an alarming display layer;

the data display layer uses a web interface to display the data collected by the data collection layer in a unified way, and the display modes include line charts, bar charts and pie charts;

2. The information platform monitoring and acquisition system according to claim 1, wherein the data collection layer collects data in the following manner:

3. The information platform monitoring and collecting system according to claim 2, wherein the Prometheus building and installing process is specifically as follows:

(2) Creating a namespace named monitoring in the Kubernetes cluster that has been built, wherein the namespace is used to hold the containers run by Prometheus;

(4) Creating a ConfigMap in the monitoring namespace to store the configuration of the Prometheus container and the configuration for dynamically discovering the Pods and running services in the Kubernetes cluster;

4. The information platform monitoring and acquisition system according to claim 3, wherein the working process of Prometheus is as follows:

(1) The Prometheus server periodically pulls metrics from configured exporters;

(4) The collected data is visualized in the graphical interface.

5. The information platform monitoring and collecting system according to claim 1, wherein the data presentation layer employs a Grafana tool, and a Grafana tool deployment process specifically includes:

(2) Installing Grafana through a yaml file;

(5) Editing JSON files needing the chart types, importing the JSON files into Grafana, calling the styles of all the charts, and displaying the charts of all the data types;

6. The information platform monitoring and acquisition system according to claim 1, wherein the alarm rule configuration layer comprises an alarm rule configuration module, a receiving module, a sending module and a message notification module;

the receiving module is used for receiving the alarm information sent by the data collection and extraction unit when instantaneous metric data of a container, scraped on the tenant-side cluster of the data collection and extraction unit, triggers an alarm rule, and for pushing the alarm information to the alarm management component Alertmanager;

7. The information platform monitoring and collecting system according to claim 6, wherein after the alarm rule configuration module has loaded its configuration, it accesses the addresses of the data collection and extraction units and the metric scraping rules through the K8S dynamic discovery mechanism and periodically scrapes the instantaneous metrics of each data collection and extraction unit, and Prometheus periodically evaluates, according to the alarm rules, whether each alarm rule expression reaches the metric threshold:

8. The information platform monitoring and acquisition system according to claim 6 or 7, wherein the message sending channels of the message notification module include email, SMS, DingTalk and WeChat.