CN112511339B - Container monitoring alarm method, system, equipment and storage medium based on multiple clusters

Info

Publication number: CN112511339B
Application number: CN202011251413.9A
Authority: CN
Inventors: 叶奕珺
Original assignee: Baofu Network Technology Shanghai Co ltd
Current assignee: Baofu Network Technology Shanghai Co ltd
Priority date: 2020-11-09
Filing date: 2020-11-09
Publication date: 2023-04-07
Anticipated expiration: 2040-11-09
Also published as: CN112511339A

Abstract

The application discloses a container monitoring and alarming method, system, device and storage medium based on multiple clusters, wherein the method comprises the following steps: configuring, through a monitoring module, capture rules of indexes of all set resources in the Prometheus configuration file prometheus.yml, and deploying monitoring components of at least one cluster to be monitored, the monitoring components periodically capturing instantaneous index data of each running resource in the cluster according to the preset capture rules; configuring, through an alarm module, alarm rules of all set resources in prometheus.yml, and configuring, through the alarm management component Alertmanager, the sending of alarm information to a message notification module; and, when the instantaneous index data of any running resource captured by the monitoring module triggers an alarm rule, sending the alarm information to the message notification module through Alertmanager. The method and the device can monitor the operation indexes of each node of the multiple clusters and give a timely alarm for abnormal conditions.

Description

Container monitoring alarm method, system, equipment and storage medium based on multiple clusters

Technical Field

The present invention relates to a cluster technology, and in particular, to a container monitoring and warning method, system, device, and storage medium based on multiple clusters.

Background

With the popularization of container technology, more and more enterprises develop applications with a micro-service framework, deliver code as container images, deploy and run services in containers, and shift operation and maintenance monitoring from traditional virtual machines to containers. Currently, the mainstream container monitoring scheme adopts exporters (collection) + Prometheus (pulling and storing) + Grafana (chart display) + Alertmanager (threshold alarm).

The scheme of exporters (collection) + Prometheus (pulling and storing) + Grafana (chart display) + Alertmanager (threshold alarm) places high technical demands on operation and maintenance personnel and is complicated to configure: they need to know the technical details of Prometheus, PromQL query statements and the like, as well as the meanings of the various running states and indexes of the various Kubernetes (K8s for short) resources. In addition, without pruning of the indexes, excessive storage space is wasted, and monitoring and alarming in a multi-cluster environment requires maintaining multiple sets of configuration. The excessive configuration greatly increases the learning and usage cost for operation and maintenance personnel, and is especially unfriendly to developers who want to customize threshold alarms for their applications.

Disclosure of Invention

The present invention is directed to a container monitoring and alarming method, system, device and storage medium based on multiple clusters, so as to solve the problems set forth in the foregoing technical background.

In order to achieve the purpose, the invention adopts the following technical scheme:

the first aspect of the present application provides a container monitoring and alarming method based on multiple clusters, including:

maintaining the Prometheus configuration file prometheus.yml through a monitoring module, configuring capture rules of indexes of all set resources in prometheus.yml, and deploying monitoring components of at least one cluster to be monitored, wherein the monitoring components periodically capture instantaneous index data of each running resource in the cluster according to the preset capture rules;

maintaining the Prometheus configuration file prometheus.yml through an alarm module, configuring alarm rules of all set resources in prometheus.yml, and configuring, through the alarm management component Alertmanager, the sending of alarm information to a message notification module;

configuring account passwords of a message sending channel through a message notification module, and managing different alarm information to be sent to corresponding subscription terminals by adding a theme and the subscription terminal of the theme;

when the alarm rule is triggered by the instantaneous index data of any resource operation captured by the monitoring module, the alarm information is sent to the message notification module through the Alertmanager, and the message notification module sends the alarm information to the corresponding subscription terminal.

Preferably, the cluster is a K8s cluster.

Preferably, the resources include one or more of a cluster, a host, a namespace, an application, and a container.

Preferably, the index includes one or more of a CPU, a memory, a storage disk, and a network.

Preferably, the capture rule includes one or more of a capture address, a capture period, and index relabeling.

Preferably, deploying, by the monitoring module, the monitoring components of at least one cluster to be monitored includes: deploying an index capture storage component Prometheus and an alarm management component Alertmanager on a first cluster; deploying a host index collector node-exporter and a container index collector cAdvisor on each node of each cluster to be monitored; deploying a cluster state index collector kube-state-metrics on each cluster to be monitored; and deploying a middleware collector corresponding to the specified middleware on each cluster to be monitored, wherein each middleware corresponds to an independent middleware collector.

More preferably, the host index collector node-exporter and the container index collector cAdvisor collect the instantaneous index data running on each node (node); the data is captured by the index capture storage component Prometheus and matched against the alarm rules configured in advance in the Prometheus configuration file prometheus.yml.

More preferably, in the Prometheus configuration file prometheus.yml, the capture addresses of the captured indexes include:

index access addresses of the host index collector node-exporter deployed at each node of each cluster;

index access addresses of container index collectors cAdvisor deployed at each node of each cluster;

index access addresses of a cluster state index collector kube-state-metrics deployed on each cluster; and,

the index access address of each middleware collector deployed on each cluster.

More preferably, when at least one second cluster needs to be added to monitoring, the first cluster records the capture address and the access token for capturing the indexes of the second cluster, the capture address and the access token are added to the cluster deployment file yaml, and after configuration is completed, the configuration reloading interface of Prometheus is called so that the configuration takes effect; wherein the first cluster and the second cluster are different clusters.

Preferably, the capture rule comprises: taking cluster/host/namespace/application/container instance as the resource dimension, pulling and storing only the indexes, such as CPU/memory/network/storage disk, that the user cares most about, and filtering out the large number of indexes that are useless to the user.

Preferably, the method further comprises:

generating a first alarm strategy according to a strategy instruction input by a user;

updating the Prometheus configuration file prometheus.yml according to the first alarm policy, wherein the updated prometheus.yml comprises the first alarm policy; and calling the configuration reloading interface of Prometheus so that the configuration takes effect.

Preferably, after the alarm rule is triggered by the instantaneous index data of any captured resource operation, the method further comprises: and the user checks the alarm information through the UI visualization module.

Preferably, the message sending channel configured by the message notification module comprises one or more of a mailbox, a short message, an enterprise WeChat, a voice telephone notification and a QQ notification.

Preferably, the method further comprises: presetting a theme subscribed by a user, wherein the theme comprises alarm information interested by the user; and when the captured instantaneous index data of any resource operation triggers an alarm rule, sending alarm information associated with the theme through a configured message sending channel.

Preferably, the alarm information includes: cluster dimension warning items, node dimension warning items and container group dimension warning items.

More preferably, the cluster dimension alarm item includes at least one of: the utilization rate of the CPU exceeds 80%, the utilization rate of the memory exceeds 80%, the local storage of all nodes of the cluster exceeds 80%, the resource utilization of a namespace exceeds 80%, and the state of a cluster container group (pod) is abnormal.

More preferably, the node dimension alarm item includes at least one of: the utilization rate of the CPU of the node (node) exceeds 80%, the memory utilization rate of the node (node) exceeds 80%, and the local storage utilization condition of the node (node) exceeds 80%.

More preferably, the container group dimension alarm item includes at least one of: the CPU utilization rate of the container group (pod) exceeds 80%, and the memory utilization rate of the container group (pod) exceeds 80%.

A second aspect of the present application provides a container monitoring and warning system based on multiple clusters, including: monitoring module, alarm module and message notice module, wherein:

the monitoring module includes:

the index capture rule maintenance unit is used for configuring capture rules of indexes of all set resources in the Prometheus configuration file prometheus.yml;

the monitoring component deployment unit is used for deploying the monitoring components of at least one cluster to be monitored through a cluster deployment file yaml, and the monitoring components are used for periodically capturing instantaneous index data of running of each resource in the cluster according to a preset capturing rule;

the alarm module comprises:

the alarm rule maintenance unit is used for configuring alarm rules of all set resources in the Prometheus configuration file prometheus.yml;

the receiving unit is used for receiving the alarm information sent by the monitoring module and pushing the alarm information to the alarm management component Alertmanager when the monitoring module determines that the instantaneous index data captured on the cluster to be monitored triggers an alarm rule;

the sending unit is used for sending the alarm information in the alarm management component Alertmanager to the message notification module;

and the message notification module is used for sending the alarm information to the corresponding subscription terminal according to the preset account password of the message sending channel, the theme, and the subscription terminal of the theme.

Preferably, the alarm module further comprises: an alarm rule updating unit, used for recording a strategy instruction input by a user, generating a first alarm strategy, and updating the Prometheus configuration file prometheus.yml according to the first alarm strategy, wherein the updated prometheus.yml comprises the first alarm strategy.

Preferably, the multi-cluster-based container monitoring and warning system further includes: and the UI visualization module is used for inquiring and/or displaying the alarm information sent by the alarm module and/or the instantaneous index data monitored by the monitoring module.

More preferably, the UI visualization module may present the information through dashboard charts.

Preferably, the cluster is a K8s cluster.

Preferably, the monitoring assembly comprises:

the index capture storage component Prometheus is used for being deployed in the first cluster;

the alarm management component Alertmanager is used for being deployed in the first cluster;

the host index collector node-exporter and the container index collector cAdvisor are used for being deployed at each node (node) of each cluster to be monitored;

the cluster state index collector kube-state-metrics is used for being deployed in each cluster to be monitored; and,

and the middleware collector is used for being deployed in each cluster to be monitored, and each middleware collector corresponds to an independent middleware.

More preferably, in the Prometheus configuration file prometheus.yml, the capture addresses of the captured indexes include:

index access addresses of the container index collector cAdvisor deployed at each node of each cluster;

Preferably, the capture rule comprises: taking cluster/host/namespace/application/container instance as the resource dimension, pulling and storing only the indexes, such as CPU/memory/network/storage disk, that the user cares most about, and filtering out the large number of indexes that are useless to the user.

Preferably, the alarm information includes: cluster dimension alarm items, node dimension alarm items and container group dimension alarm items.

More preferably, the cluster dimension alarm item includes at least one of: the utilization rate of a CPU exceeds 80%, the utilization rate of a memory exceeds 80%, the local storage of all nodes of the cluster exceeds 80%, the resource utilization of a namespace exceeds 80%, and the state of a cluster container group (pod) is abnormal.

More preferably, the node dimension alarm item includes at least one of: the utilization rate of a CPU of the node (node) exceeds 80%, the memory utilization rate of the node (node) exceeds 80%, and the local storage utilization condition of the node (node) exceeds 80%.

The third aspect of the present application provides a container monitoring and warning device based on multiple clusters, including:

a memory having a computer program stored therein;

a processor for executing all computer programs in said memory for implementing the steps of said multi-cluster based container monitoring alarm method of the first aspect disclosed herein.

A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the multi-cluster based container monitoring alarm method of the first aspect disclosed herein.

Compared with the prior art, the technical scheme of the invention has the following beneficial effects:

the application discloses a container monitoring and alarming method, system, equipment and storage medium based on multiple clusters, wherein the monitoring module and the alarming module can monitor the operation indexes of each node and container of the multiple clusters and give an alarm in time for abnormal conditions, so that the reasonable adjustment and distribution of system resources are facilitated, and the overall performance of the clusters is improved;

the container monitoring and alarming system based on the Kubernetes cluster can be automatically deployed without complex configuration;

the application simplifies and optimizes the large number of Kubernetes-based resource monitoring indexes;

the method and the device can customize the alarm rule and the push of the alarm information, so that operation and maintenance and developers can smoothly realize monitoring and alarm of the concerned application service on the premise of completely not knowing Prometheus and Kubernetes technologies.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:

FIG. 1 is a block diagram of a multi-cluster based container monitoring and warning system according to a preferred embodiment of the present invention;

FIG. 2 is a flow chart of cluster deployment in a preferred embodiment of the present invention;

FIG. 3 is a diagram of the cluster deployment results of the preferred embodiment of the present invention;

FIG. 4 is a flow chart of a multi-cluster based container monitoring alarm method according to a preferred embodiment of the present invention;

FIG. 5 is a flowchart of a user creating alert rules in accordance with a preferred embodiment of the present invention;

FIG. 6 is a functional block diagram of a multi-cluster based container monitoring and alert system in accordance with a preferred embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a multi-cluster-based container monitoring and warning device in accordance with a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and effects of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, it being understood that the data so used may be interchanged under appropriate circumstances. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

A Kubernetes cluster (hereinafter referred to as a cluster) is composed of a plurality of host nodes. All the applications are managed by the cluster in a container form and distributed and deployed on the nodes through the cluster container orchestration function. The container monitoring and warning system can be deployed on a main cluster and supports monitoring of a plurality of clusters.

Fig. 1 is a block diagram of a container monitoring and warning system based on multiple clusters according to a preferred embodiment of the present invention. As shown in fig. 1, a multi-cluster-based container monitoring and alarming system includes: monitoring module 1, warning module 2, message notification module 3 and UI visualization module 4, wherein:

the monitoring module 1 includes:

the index capture rule maintenance unit is used for configuring capture rules of indexes of all set resources in the Prometheus configuration file prometheus.yml;

the alarm module 2 includes:

the alarm rule maintenance unit is used for configuring alarm rules of all set resources in the Prometheus configuration file prometheus.yml;

the monitoring module 1 is used for capturing instantaneous index data of a cluster to be monitored, and sending the instantaneous index data to the receiving unit;

the sending unit is used for sending the alarm information in the alarm management component Alertmanager to the message notification module 3;

the message notification module 3 is used for sending the alarm information to the corresponding subscription terminal according to the preset account password of the message sending channel, the theme and the subscription terminal of the theme;

and the UI visualization module 4 is used for inquiring and/or displaying the alarm information sent by the alarm module 2 and/or the instantaneous index data monitored by the monitoring module 1.

The monitoring component in the above comprises:

1) The index capture storage component Prometheus is used for being deployed in the main cluster;

2) The alarm management component Alertmanager is used for being deployed in the main cluster;

3) A core collector:

a host index collector node-exporter for being deployed at each node (node) of each cluster to be monitored;

a container index collector cAdvisor for being deployed at each node (node) of each cluster to be monitored;

the cluster state index collector kube-state-metrics is used for being deployed in each cluster to be monitored;

4) Other collectors:

Middleware collectors corresponding to various middleware, such as collectors for MySQL, MongoDB, Redis and the like, can be customized; only the cluster deployment file yaml needs to be provided under the path specified by the monitoring module. An independent middleware collector is deployed for each middleware instance: for example, if the cluster has three MySQL instances, three middleware collectors need to be deployed, each responsible for one MySQL instance.

When a plurality of clusters need to be added into monitoring, the main cluster needs to add information such as access addresses and access tokens of other clusters so as to normally access each cluster and deploy monitoring components.

Fig. 2 is a flow chart of cluster deployment in the present application, and a deployment result chart is shown with reference to fig. 3.

As shown in fig. 2, the deployment process of the cluster is:

step S01: judging whether a basic component (namely a monitoring component) is deployed, if so, executing a step S11, otherwise, executing a step S02;

step S02: generating a main cluster deployment file yaml;

step S11: judging whether a new cluster is deployed at the same time, if so, executing the step S12, otherwise, executing the step S21;

step S12: inputting the access address (capture address) and the access token of the new cluster, and executing step S13;

step S13: judging whether the network is connected, if so, executing a step S14, otherwise, executing a step S12;

step S14: judging whether a collector of a new cluster is deployed or not, if so, executing the step S15, otherwise, executing the step S21;

step S15: generating a new cluster deployment file yaml;

step S21: judging whether a new deployment file is generated, if so, executing the step S31, otherwise, ending the deployment process;

step S31: starting to run the deployment file and ending the deployment process.
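As a non-limiting illustration, the decision flow of steps S01 to S31 can be sketched in Python as follows. The function name, the file names, the two callbacks and the retry limit are all hypothetical. The sketch also assumes that step S02 is followed by step S11 and that step S15 is followed by step S21, which the flow above does not state explicitly.

```python
from typing import Callable, List, Tuple


def plan_deployment(
    base_deployed: bool,
    add_new_cluster: bool,
    read_cluster_access: Callable[[], Tuple[str, str]],
    is_reachable: Callable[[str, str], bool],
    deploy_new_collectors: bool,
    max_attempts: int = 3,
) -> List[str]:
    """Return the deployment files to run, following steps S01 to S31."""
    files: List[str] = []

    # S01/S02: generate the main cluster file if the basic components are missing.
    if not base_deployed:
        files.append("main-cluster.yaml")  # hypothetical file name

    # S11: is a new cluster being deployed in the same pass?
    if add_new_cluster:
        reachable = False
        # The retry limit is an assumption; the flow itself loops S13 -> S12.
        for _ in range(max_attempts):
            # S12: read the access address and access token of the new cluster.
            address, token = read_cluster_access()
            # S13: if the network is not connected, go back to S12.
            if is_reachable(address, token):
                reachable = True
                break
        # S14/S15: generate the new cluster file only if collectors are wanted.
        if reachable and deploy_new_collectors:
            files.append("new-cluster.yaml")  # hypothetical file name

    # S21/S31: the caller runs the files if any exist; an empty list ends the flow.
    return files
```

In this sketch an empty return value corresponds to the "otherwise, ending the deployment process" branch of step S21.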

In the above, the access addresses of the captured indexes of all resources recorded in prometheus.yml include:

1) Index access addresses of the host index collector node-exporter deployed at each node of each cluster;

2) Index access addresses of container index collectors cAdvisor deployed at each node of each cluster;

3) Index access addresses of a cluster state index collector kube-state-metrics deployed on each cluster;

4) The index access address of each middleware collector deployed on each cluster.
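By way of example only, the four kinds of access addresses listed above could be assembled into the `scrape_configs` section of prometheus.yml as sketched below. The cluster name, the target addresses and the 30-second capture period are hypothetical. The structure is printed as JSON for readability; an actual deployment would write the same structure to prometheus.yml as YAML.

```python
import json
from typing import Dict, List


def build_scrape_configs(clusters: Dict[str, Dict[str, List[str]]],
                         interval: str = "30s") -> List[dict]:
    """Create one scrape job per collector type per cluster."""
    jobs = []
    for cluster, collectors in clusters.items():
        for collector, targets in collectors.items():
            jobs.append({
                "job_name": f"{cluster}-{collector}",
                "scrape_interval": interval,  # the capture period
                "static_configs": [{
                    "targets": targets,  # the index access addresses
                    "labels": {"cluster": cluster},
                }],
            })
    return jobs


# Hypothetical addresses for a single cluster named "main".
clusters = {
    "main": {
        "node-exporter": ["10.0.0.1:9100", "10.0.0.2:9100"],
        "cadvisor": ["10.0.0.1:8080", "10.0.0.2:8080"],
        "kube-state-metrics": ["10.0.0.10:8080"],
        "mysql-collector": ["10.0.0.20:9104"],
    },
}
config = {"scrape_configs": build_scrape_configs(clusters)}
print(json.dumps(config, indent=2))
```

The `cluster` label attached to every target is one possible way to keep indexes from different clusters distinguishable after they are stored together.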

The capture rules in the above are: the indexes of the various collectors are filtered and recalculated and, taking cluster/host/namespace/application/container instance as the resource dimension, only the indexes such as CPU/memory/network/disk that the user cares most about are pulled and stored, so that a large number of indexes useless to the user are eliminated, the storage pressure is reduced, and the query performance for the user is greatly improved.
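One possible way to express such filtering, given here only as a sketch, is a Prometheus `metric_relabel_configs` entry with the `keep` action applied to the `__name__` label. The prefix list below is an assumption chosen for illustration, and the `apply_keep` helper merely simulates the effect of the rule in Python; it is not part of Prometheus.

```python
import re
from typing import Iterable, List

# Hypothetical selection of index-name prefixes covering CPU, memory, disk and network.
KEPT_PREFIXES = (
    "node_cpu_", "node_memory_", "node_filesystem_", "node_network_",
    "container_cpu_", "container_memory_",
)


def keep_rule(prefixes: Iterable[str]) -> dict:
    """Build a metric_relabel_configs entry that keeps only matching index names."""
    regex = "(" + "|".join(re.escape(p) for p in prefixes) + ").*"
    return {"source_labels": ["__name__"], "regex": regex, "action": "keep"}


def apply_keep(rule: dict, index_names: Iterable[str]) -> List[str]:
    """Simulate the rule; relabeling regexes are anchored, hence fullmatch."""
    pattern = re.compile(rule["regex"])
    return [name for name in index_names if pattern.fullmatch(name)]


rule = keep_rule(KEPT_PREFIXES)
scraped = [
    "node_cpu_seconds_total",
    "node_memory_MemAvailable_bytes",
    "go_gc_duration_seconds",
    "node_timex_offset_seconds",
    "container_memory_working_set_bytes",
]
kept = apply_keep(rule, scraped)
print(kept)
```

Indexes dropped at this stage are never stored, which is where the reduction in storage pressure described above would come from.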

In the above, when a new cluster is added, after the main cluster records the access address and the access token of the new cluster, the monitoring module adds to the configuration file the index access address and the access token for accessing the collectors of the new cluster and, after configuration is completed, calls the configuration reloading interface of Prometheus so that the configuration takes effect.
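The following Python sketch illustrates this step under stated assumptions. The job layout uses the `authorization` block of a Prometheus scrape job to carry the access token. The reload request targets the Prometheus HTTP endpoint `/-/reload`, which accepts POST requests only when Prometheus is started with `--web.enable-lifecycle`. The function names, cluster name, addresses and token are hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request
from typing import List


def add_cluster_job(scrape_configs: List[dict], cluster: str,
                    targets: List[str], token: str) -> dict:
    """Append a scrape job for the collectors of a newly added cluster."""
    job = {
        "job_name": f"{cluster}-collectors",
        "scheme": "https",
        # The access token recorded by the main cluster.
        "authorization": {"type": "Bearer", "credentials": token},
        "static_configs": [{"targets": targets, "labels": {"cluster": cluster}}],
    }
    scrape_configs.append(job)
    return job


def build_reload_request(prometheus_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks Prometheus to reload."""
    return urllib.request.Request(prometheus_url.rstrip("/") + "/-/reload", method="POST")


scrape_configs: List[dict] = []
add_cluster_job(scrape_configs, "cluster-b", ["203.0.113.5:6443"], "example-token")
request = build_reload_request("http://prometheus.example:9090/")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` after the updated configuration file has been written would trigger the reload.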

Fig. 4 is a flowchart of a container monitoring alarm method based on multiple clusters according to a preferred embodiment of the present invention. As shown in fig. 4, a container monitoring and alarming method based on multiple clusters includes:

Step 01: configuring, through the Prometheus configuration file prometheus.yml, the access addresses (capture addresses) of the captured indexes of all resources and the alarm rules of all resources.

Wherein the access address includes: the index access address of the host index collector node-exporter deployed at each node of each cluster; the index access address of the container index collector cAdvisor deployed at each node of each cluster; the index access address of the cluster state index collector kube-state-metrics deployed on each cluster; and the index access address of each middleware collector deployed on each cluster.

Step 02: deploying the monitoring component of at least one cluster to be monitored through the cluster deployment file yaml, wherein the monitoring component periodically captures instantaneous index data of each resource operation in the cluster according to a preset capture rule.

Deploying, by a monitoring module, the monitoring components of at least one cluster to be monitored comprises: deploying an index capture storage component Prometheus and an alarm management component Alertmanager on a first cluster; deploying a host index collector node-exporter and a container index collector cAdvisor on each node (node) of each cluster to be monitored; deploying a cluster state index collector kube-state-metrics on each cluster to be monitored; and deploying a middleware collector corresponding to the specified middleware on each cluster to be monitored, each middleware corresponding to an independent middleware collector.

Step 03: when the instantaneous index data of any resource operation captured by the monitoring module triggers an alarm rule, the alarm information is sent to the message notification module through the Alertmanager.

The host index collector node-exporter and the container index collector cAdvisor collect the instantaneous index data running on each node; the data is captured by the index capture storage component Prometheus and matched against the alarm rules configured in the Prometheus configuration file prometheus.yml. If an alarm rule is triggered, the alarm management component Alertmanager sends the alarm information to the message notification module as configured.

Step 04: and the message notification module sends the alarm information to a corresponding subscription terminal.

The message notification module is configured with the account password of a message sending channel, and manages the sending of different alarm information to the corresponding subscription terminals by adding themes and the subscription terminals of each theme. The message sending channel configured by the message notification module can be a mailbox, a short message, an enterprise WeChat, a voice telephone notification, a QQ notification and the like. The message notification module presets the themes subscribed to by the user, wherein a theme comprises the alarm information the user is interested in. When the captured instantaneous index data of any running resource triggers an alarm rule, the message notification module sends the alarm information associated with the theme to the subscription terminal through the configured message sending channel.

In a specific application scenario, the threshold for writing the configuration file is high. Taking the yaml file as an example, a user needs to be very familiar with information such as the attributes (names, deployment units and the like) of each container on the cluster to be monitored and the meanings of the various data indexes in order to write a correct yaml file; the operation is complex and reduces monitoring efficiency. Therefore, in the present application, a user can create an alarm rule through the UI visualization module: a configuration page for the alarm rule is generated, a policy instruction is issued through the configuration page to generate a first alarm policy, the Prometheus configuration file prometheus.yml is updated according to the first alarm policy so that the updated prometheus.yml includes the first alarm policy, and the alarm rule is then activated by means of the Prometheus configuration reloading mechanism.

For example, a user may add an alarm rule through the UI visualization module, monitor all container instances (resources) under all clusters, and alarm a subscribing terminal subscribing to a specified topic when the memory usage rate (index) is greater than (condition) 80% (threshold). And the alarm module records the alarm rule created by the user, modifies the Prometheus configuration file and activates the alarm rule by utilizing a Prometheus reloading configuration file mechanism.
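As a minimal sketch of how such a user-created rule might be turned into a Prometheus alerting rule, the Python code below maps a policy (resource, index, condition, threshold, theme) to a rule entry with the standard `alert`, `expr`, `for`, `labels` and `annotations` fields. The `Policy` class, the `USAGE_EXPR` table, the chosen PromQL expressions and the use of a `theme` label are assumptions for illustration and are not taken from the embodiment.

```python
from dataclasses import dataclass

# Hypothetical mapping from (resource, index) to a PromQL usage expression in percent.
USAGE_EXPR = {
    ("container", "memory"):
        "container_memory_working_set_bytes / container_spec_memory_limit_bytes * 100",
    ("node", "memory"):
        "(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100",
}

ALLOWED_OPERATORS = {">", ">=", "<", "<="}


@dataclass
class Policy:
    name: str
    resource: str
    index: str
    operator: str
    threshold: float
    theme: str


def to_alert_rule(policy: Policy, hold: str = "1m") -> dict:
    """Translate a user policy into one Prometheus alerting rule entry."""
    if policy.operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported condition: {policy.operator}")
    usage = USAGE_EXPR[(policy.resource, policy.index)]
    return {
        "alert": policy.name,
        "expr": f"({usage}) {policy.operator} {policy.threshold:g}",
        "for": hold,  # how long the condition must hold before the alert fires
        "labels": {"theme": policy.theme},
        "annotations": {
            "summary": f"{policy.resource} {policy.index} usage "
                       f"{policy.operator} {policy.threshold:g}%",
        },
    }


policy = Policy("ContainerMemoryHigh", "container", "memory", ">", 80, "ops-team")
rule = to_alert_rule(policy)
print(rule["expr"])
```

Keeping the PromQL expressions in a lookup table is one way to let the user choose only a resource, an index, a condition and a threshold without writing query statements.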

In addition, after the alarm rule is triggered by the instantaneous index data of any resource operation, the user can also check alarm information through the UI visualization module.

Specifically, a flow chart of creating the alarm rule is shown in fig. 5.

In the foregoing content, the alarm information includes: cluster dimension alarm items, node dimension alarm items and container group dimension alarm items.

Wherein the cluster dimension alarm item may include: the utilization rate of a CPU exceeds 80%, the utilization rate of a memory exceeds 80%, the local storage of all nodes of the cluster exceeds 80%, the resource utilization of a namespace exceeds 80%, and the state of a cluster container group (pod) is abnormal.

Wherein the node dimension alarm item may include: the utilization rate of the CPU of the node (node) exceeds 80%, the memory utilization rate of the node (node) exceeds 80%, and the local storage utilization condition of the node (node) exceeds 80%.

Wherein the container group dimension alarm item may include: the CPU utilization rate of the container group (pod) exceeds 80%, and the memory utilization rate of the container group (pod) exceeds 80%.
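For illustration only, the alarm items of the three dimensions can be represented as a small table and checked against sampled usage values, as in the Python sketch below. The item keys and the function name are hypothetical. The sketch treats "exceeds 80%" as strictly greater than 80, and handles the abnormal container group state of the cluster dimension as a separate flag because it is not a percentage.

```python
from typing import Dict, List

THRESHOLD_PERCENT = 80.0

# Percentage-based alarm items per dimension, following the lists above.
ALARM_ITEMS: Dict[str, List[str]] = {
    "cluster": ["cpu", "memory", "local_storage", "namespace_resources"],
    "node": ["cpu", "memory", "local_storage"],
    "pod": ["cpu", "memory"],
}


def triggered_items(dimension: str, usage_percent: Dict[str, float],
                    pod_state_abnormal: bool = False) -> List[str]:
    """Return the alarm items of one dimension whose usage exceeds the threshold."""
    fired = [
        item for item in ALARM_ITEMS[dimension]
        if usage_percent.get(item, 0.0) > THRESHOLD_PERCENT
    ]
    # The abnormal container group state applies to the cluster dimension only.
    if dimension == "cluster" and pod_state_abnormal:
        fired.append("pod_state_abnormal")
    return fired


node_sample = {"cpu": 91.5, "memory": 80.0, "local_storage": 42.0}
print(triggered_items("node", node_sample))
```

In the embodiment the comparison itself is performed by Prometheus when it evaluates the alarm rules; the sketch only makes the threshold semantics explicit.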

Referring to fig. 6, the operation principle of the container monitoring and warning system of the present application is as follows:

1) The monitoring module maintains the index access address and the index capture rule of each cluster collector in prometheus.yml.

2) The alarm module maintains the alarm rule expressions in prometheus.yml, and alarm rules are added and modified through the UI visualization module.

3) Prometheus loads the configuration and periodically captures the instantaneous indexes of each collector according to the index access address and the index capture rule; the collectors do not store data, but expose the instantaneous indexes for Prometheus to capture.

4) Prometheus periodically evaluates, according to the alarm rules, whether each alarm rule expression reaches the required index threshold.

5) Prometheus pushes an alert to Alertmanager when an alarm rule expression satisfies its condition, for example when the memory usage of a certain container instance is greater than 80%.

6) Alarm aggregation and pushing: after the alarms are collected in Alertmanager, the alarm information is sent to the message notification module according to the configuration file of Alertmanager.

7) The message notification module is pre-configured with account passwords of message sending channels (short messages, mailboxes, enterprise WeChats and the like), and reasonably manages different alarms to be sent to different subscription terminals by adding themes and terminals (mobile phone numbers, mailbox addresses and the like) subscribed by the themes. Once the alarm rule is triggered, the user can receive a notification through a preset sending channel, a preset theme and a preset subscription terminal.
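Steps 4) to 7) above can be sketched in a simplified, in-memory form as follows. This is only a minimal model written for illustration: the classes `Rule` and `Notifier`, the functions `evaluate` and `dispatch`, the metric name `container_memory_usage_percent`, the topic name and the mailbox address are all hypothetical, and the sketch does not call Prometheus or Alertmanager.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical alarm rule: fires when a metric value exceeds a threshold."""
    name: str
    metric: str
    threshold: float  # percent
    topic: str        # topic whose subscribers receive the alarm


@dataclass
class Notifier:
    """Hypothetical message notification module with topic subscriptions."""
    subscriptions: dict = field(default_factory=dict)  # topic -> [(channel, terminal)]
    sent: list = field(default_factory=list)           # record of delivered messages

    def subscribe(self, topic, channel, terminal):
        self.subscriptions.setdefault(topic, []).append((channel, terminal))

    def notify(self, topic, message):
        # Deliver the message to every terminal subscribed to the topic.
        for channel, terminal in self.subscriptions.get(topic, []):
            self.sent.append((channel, terminal, message))


def evaluate(rules, samples):
    """Compare instantaneous samples with the rules and return firing alerts."""
    alerts = []
    for rule in rules:
        for s in samples:
            if s["metric"] == rule.metric and s["value"] > rule.threshold:
                alerts.append({"rule": rule.name, "instance": s["instance"],
                               "value": s["value"], "topic": rule.topic})
    return alerts


def dispatch(alerts, notifier):
    """Forward each firing alert to the subscribers of its topic."""
    for a in alerts:
        notifier.notify(a["topic"], f'{a["rule"]} on {a["instance"]}: {a["value"]:.0f}%')


rules = [Rule("ContainerMemoryHigh", "container_memory_usage_percent", 80, "app-team")]
samples = [
    {"metric": "container_memory_usage_percent", "instance": "pod-a", "value": 91.0},
    {"metric": "container_memory_usage_percent", "instance": "pod-b", "value": 42.0},
]
notifier = Notifier()
notifier.subscribe("app-team", "mail", "ops@example.com")
dispatch(evaluate(rules, samples), notifier)
```

In this sketch only the sample of `pod-a` exceeds the 80% threshold, so a single message is recorded for the one terminal subscribed to the topic `app-team`.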

The present application further provides a multi-cluster-based container monitoring and alarming device, which may specifically be a client on which a Kubernetes platform is deployed. As shown in fig. 7, the container monitoring and alarming device includes a memory 31 and a processor 32, where the memory 31 stores a computer program, and the processor 32 is configured to execute all the computer programs in the memory 31, so as to implement the steps of the multi-cluster-based container monitoring and alarming method described above.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the multi-cluster-based container monitoring and alarming method described above.

In summary, the present application discloses a container monitoring and alarming method, system, device and storage medium based on multiple clusters. Through the monitoring module and the alarm module, the operation indexes of each node of the multiple clusters can be monitored and abnormal conditions can be alarmed in time, which facilitates reasonable adjustment and allocation of system resources and improves the overall performance of the clusters. A container monitoring and alarming system based on Kubernetes clusters can be deployed automatically without complex configuration. The large number of Kubernetes-based resource monitoring indexes is simplified and optimized. The alarm rules and the pushing of alarm information can be customized, so that operation and maintenance personnel and developers can monitor and alarm on the application services they care about without any knowledge of Prometheus or Kubernetes.

The embodiments of the present invention have been described in detail, but the embodiments are merely examples, and the present invention is not limited to the embodiments described above. Any equivalent modifications and substitutions for the present invention are within the scope of the present invention for those skilled in the art. Accordingly, equivalent alterations and modifications are intended to be included within the scope of the present invention, without departing from the spirit and scope of the invention.

Claims

1. The container monitoring and alarming method based on the multi-cluster is characterized by being applied to a multi-cluster environment and comprising the following steps: the method comprises the steps that a container monitoring and alarming system is deployed on a main cluster and supports monitoring of a plurality of clusters, wherein the container monitoring and alarming system comprises a monitoring module, an alarming module and a message notification module;

when the instantaneous index data of any resource operation captured by the monitoring module triggers an alarm rule, the alarm information is sent to the message notification module through the Alertmanager, and the message notification module sends the alarm information to the corresponding subscription terminal;

the deployment of the monitoring component of at least one cluster to be monitored through the monitoring module comprises the following steps: deploying an index capture storage component Prometheus and an alarm management component Alertmanager on a first cluster, deploying a host index collector node-exporter and a container index collector cAdvisor on each node of each cluster to be monitored respectively, deploying a cluster state index collector kube-state-metrics on each cluster to be monitored respectively, and deploying a middleware collector corresponding to a specified middleware on each cluster to be monitored, each middleware corresponding to an independent middleware collector; the capture rule comprises: filtering and recalculating the indexes of the various collectors, taking cluster/host/namespace/application/container instance as the resource dimensions, pulling and storing only the CPU/memory/network/storage disk indexes that users care about most, and filtering out the indexes that are useless to users; when at least one second cluster needs to be added to monitoring, the first cluster records the capture address and the access token for capturing the indexes of the second cluster, the capture address and the access token are added to the cluster deployment file yaml, and after configuration is completed, the configuration reload interface of Prometheus is called to make the configuration take effect; the first cluster and the second cluster are different clusters, and the first cluster is the main cluster.

2. The multi-cluster-based container monitoring alarm method according to claim 1, wherein the instantaneous index data running on each node is collected by the host index collector node-exporter and the container index collector cAdvisor into the index capture storage component Prometheus and matched against the alarm rules preconfigured in the configuration file prometheus.yml of Prometheus, and if an alarm rule is triggered, the alarm information is sent to the message notification module by the alarm management component Alertmanager.

3. The multi-cluster-based container monitoring alarm method according to claim 1, wherein, in the configuration file prometheus.yml, the capture addresses of the indexes comprise:

index access addresses of a cluster state index collector kube-state-metrics deployed on each cluster; and,

4. The multi-cluster-based container monitoring alarm method according to claim 1, further comprising:

5. The multi-cluster-based container monitoring alarm method according to claim 1, further comprising: presetting a topic subscribed to by a user, wherein the topic comprises alarm information of interest to the user; and when the captured instantaneous index data of any resource operation triggers an alarm rule, sending the alarm information associated with the topic through a configured message sending channel.

6. A multi-cluster-based container monitoring and alarming system, deployed on a main cluster and supporting monitoring of multiple clusters, the system comprising a monitoring module, an alarm module and a message notification module, wherein:

the monitoring module comprises:

the alarm module comprises:

the message notification module is used for sending the alarm information to the corresponding subscription terminal according to the preset account password of the message sending channel, the preset topic and the preset subscription terminal of the topic;

wherein the monitoring assembly comprises:

the index capture storage component Prometheus is used for being deployed in a first cluster, and the first cluster is a main cluster;

the cluster state index collector kube-state-metrics is used for being deployed in each cluster to be monitored; and,

the middleware collector is used for being deployed in each cluster to be monitored, and each middleware collector corresponds to an independent middleware;

wherein the capture rules comprise: filtering and recalculating the indexes of the various collectors, taking cluster/host/namespace/application/container instance as the resource dimensions, pulling and storing only the CPU/memory/network/storage disk indexes that users care about most, and filtering out the indexes that are useless to users.

7. A multi-cluster based container monitoring and warning device, comprising:

a memory having a computer program stored therein;

a processor for executing all computer programs in said memory to implement the steps of the multi-cluster based container monitoring alarm method according to any of claims 1 to 5.

8. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the multi-cluster based container monitoring alarm method according to any of the claims 1 to 5.