CN112256401B - Prometheus high-availability system based on Kubernetes environment and implementation method - Google Patents

Prometheus high-availability system based on Kubernetes environment and implementation method

Info

Publication number
CN112256401B
CN112256401B (application CN202011186088.2A)
Authority
CN
China
Prior art keywords
node
prometheus
pod
configuration
sidecar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011186088.2A
Other languages
Chinese (zh)
Other versions
CN112256401A
Inventor
张建伟
魏金雷
张晖
孙思清
高传集
蔡卫卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202011186088.2A
Publication of CN112256401A
Application granted
Publication of CN112256401B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/455 - Emulation; interpretation; software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 - Hypervisors; virtual machine monitors
    • G06F 9/45558 - Hypervisor-specific management and integration aspects
    • G06F 2009/45595 - Network integration; enabling network access in virtual machine instances
    • G06F 11/00 - Error detection; error correction; monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16 - Error detection or correction of the data by redundancy in hardware
    • G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/202 - Error detection or correction by redundancy in hardware where processing functionality is redundant
    • G06F 11/2023 - Failover techniques
    • G06F 11/30 - Monitoring
    • G06F 11/3003 - Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F 11/301 - Monitoring arrangements where the computing system is a virtual computing platform, e.g. logically partitioned systems
    • G06F 11/3055 - Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Abstract

The invention discloses a Prometheus high-availability system and an implementation method based on a Kubernetes environment, belonging to the technical field of cloud computing, and aims to solve the technical problem of ensuring that multiple replicas of Prometheus nodes work simultaneously while avoiding the risk of losing the monitoring data collected by a single node. The technical scheme is as follows: the system comprises a Manager end and a Client end, both of which are deployed in Kubernetes as Pods; the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes, pulling the Client-end monitoring data at those addresses, deduplicating the data, and sending commands for updating the Prometheus configuration; the Client end is used for determining the node holding the Master role in the cluster through a distributed election strategy. The invention also discloses a Prometheus high-availability implementation method based on the Kubernetes environment.

Description

Prometheus high-availability system based on Kubernetes environment and implementation method
Technical Field
The invention relates to the technical field of cloud computing, in particular to a Prometheus high-availability system and an implementation method based on a Kubernetes environment.
Background
Kubernetes is an open source container cluster management tool used for managing various containerized applications in a cloud platform. Prometheus is an open source monitoring and alerting solution for container environments, was the second project to graduate from the CNCF, and has become the de facto standard for monitoring and alerting in container environments. However, Prometheus currently works mainly as a single node, and there is no good high-availability scheme for it, whereas a mature and stable monitoring and alerting scheme is very important for a cloud platform. Therefore, how to ensure that multiple replicas of Prometheus nodes work simultaneously and to avoid the risk of losing the monitoring data collected by a single node is a technical problem that urgently needs to be solved.
Disclosure of Invention
The technical task of the invention is to provide a Prometheus high-availability system and an implementation method based on a Kubernetes environment, so as to ensure that multiple replicas of Prometheus nodes work simultaneously and to avoid the risk of losing the monitoring data collected by a single node.
The technical task of the invention is achieved in the following way. A Prometheus high-availability system based on a Kubernetes environment comprises a Manager end (server end) and a Client end (client end), both of which are deployed in Kubernetes as Pods;
the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes, pulling the Client-end monitoring data at the corresponding addresses, deduplicating the data, and sending commands for updating the Prometheus configuration;
the Client end is used for determining the node of the Master role in the cluster through a distributed election strategy.
Preferably, the Manager end comprises,
the Metric module is used for acquiring monitoring data from the plurality of Pod nodes and returning the monitoring data to the requester after deduplication;
the configuration module is used for sending configuration commands to the Pod nodes of the Client end;
the Alert module is used for receiving the alarms generated by Prometheus at the Client end.
Preferably, the Pod node of each Client end comprises Prometheus and a Sidecar, and the Sidecar is deployed in the same Pod node as Prometheus in the form of a container;
the Sidecar is used for receiving the commands sent by the configuration module of the Manager end, executing the update of the Prometheus configuration file, and synchronizing the update command to the Slave role nodes other than the Master role node;
Prometheus is used for updating the configuration file and sending the generated alarm data to the Manager end.
Preferably, the working process of the system is as follows:
(I) the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes;
(II) the Manager end pulls the Client-end monitoring data at the corresponding addresses, deduplicates the data, and at the same time sends the command for updating the Prometheus configuration;
(III) the Client end determines the Master role node in the cluster through a distributed election strategy; the Master role node interacts with the Manager end, updates the Prometheus configuration file, and sends the generated alarms to the Manager end;
(IV) the Master role node synchronizes the configuration-update command to the Slave role nodes, ensuring the consistency of the Prometheus monitoring tasks.
A Prometheus high-availability implementation method based on a Kubernetes environment is specifically as follows:
S1, an odd number of Pod nodes are deployed in Kubernetes as a StatefulSet, and each Pod node is labeled with label={"prometheus_cluster": "true"};
the Manager end is deployed as a Deployment, and the number of its Pods is not limited;
S2, the Pod node of each Client end uses a PersistentVolume for persistent storage, ensuring that the data of the original Pod node is not lost after the Pod node is rebuilt, so that Pod node updates are incremental updates;
S3, after the Sidecar at the Client end starts, it enters a sleep state of duration T_sleep, and the Sidecar sets its own state to candidate;
S4, the Sidecar obtains and filters the Pod node IP addresses of all nodes of the Prometheus cluster through the Apiserver of Kubernetes and the identifying label of the Pod nodes used by Prometheus;
S5, the Sidecar in the candidate state enters the election voting stage;
S6, the Sidecar maintains its liveness by sending heartbeat information; the heartbeat information comprises the role of the node and the index number indexId of the latest local log file;
S7, the Sidecar of the node holding the Master role receives the configuration update commands sent by the Manager end and sends the alarms generated by Prometheus to the Manager end; at the same time, the Sidecar sends a reload-configuration command to the Prometheus in the same Pod node, so that Prometheus reloads its configuration file;
S8, the Manager end obtains the set of IP addresses of all running Client-end Pod nodes through the Apiserver; the specific steps are as follows:
(1) the configuration module sends a configuration command to a random Client end;
(2) it is judged whether that Client end is the node holding the Master role:
if yes, the Sidecar on it modifies the Prometheus configuration file and sends a load-configuration signal to the container where Prometheus is located;
if not, the node of the Slave role forwards the configuration modification command to the Master node;
S9, after the Prometheus configuration update of the node holding the Master role is completed, the Sidecar in that Pod node records the configuration update operation and persists it, in the form of a log file, to the local volume mounted on the persistent volume storage, and the index number of the local log file is automatically incremented by 1; indexId is set to the latest log index number and updated into the heartbeat information;
S10, after receiving a user request to query monitoring data, the Manager end calls the Metric module to query the monitoring data;
S11, the Metric module queries the Apiserver of Kubernetes and, according to the label of the Pods, filters and obtains all Client nodes and their IP addresses; meanwhile, the Metric module pulls the monitoring data collected by the Prometheus of each Client end through the obtained Pod node IPs and the monitoring-data URL;
S12, after the Metric module obtains the time-series monitoring data from one Prometheus instance, a min-heap is built in memory, where the key of a heap node is the timestamp of the data and the value is the corresponding time-series data; the monitoring data of the Prometheus instances in the other Pod nodes is pulled in the same manner, specifically:
if data for the corresponding time already exists in the heap, the record is discarded, until the data of all Prometheus instances has been processed;
S13, the user obtains the complete monitoring data through the Metric module of the Manager end, and obtains the alarm data generated by Prometheus through the Alert module.
Preferably, in step S5, the Sidecar in the candidate state enters the election voting stage as follows:
S501, each candidate casts its vote for the Pod with the smallest ordinal number created by the StatefulSet;
S502, voting information is sent to all known nodes, and the voting results are counted;
S503, it is judged whether the number of votes for that node exceeds half:
if yes, the node is set as the Master role node, the remaining nodes are set as Slave role nodes, and step S504 is executed;
if not, jump to step S505;
S504, the node holding the Master role broadcasts its local information <id, indexId>; where id is a unique identifier whose value is incremented by 1 each time the Master role node changes, and indexId represents the latest value of the local log index after an update;
S505, after a random time T in the interval (T_low, T_high), the next round of voting begins.
Preferably, in step S6, maintaining the Sidecar's liveness by sending heartbeat information covers the following two cases:
(I) when a node in the Slave role crashes, Kubernetes rebuilds it after a time T_grace, and the rebuilt node proceeds as follows:
first, it enters the candidate state, and after the Sidecar synchronizes with the node holding the cluster Master role, it obtains the missing log indexes;
then it updates the local configuration file and the indexId in the heartbeat information, and switches to the Slave state;
(II) when the crashed node is the node holding the Master role, the remaining nodes re-enter the election phase.
Preferably, in step S9, setting indexId to the latest log index number and updating it into the heartbeat information comprises the following steps:
S901, the Master node synchronizes the configuration modification command to the nodes of the remaining Slave roles in the form of a log file;
S902, after receiving the synchronized operation log, the Sidecar of a node in the Slave role parses the command and updates the Prometheus configuration file;
S903, it sends a configuration reload command to the corresponding Prometheus;
S904, the nodes in the Slave role update the latest index number indexId in the heartbeat information.
An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executes the computer program stored in the memory, so that the at least one processor performs the Prometheus high-availability implementation method based on the Kubernetes environment as described above.
A computer-readable storage medium, in which a computer program is stored, the computer program being executable by a processor to implement the Prometheus high-availability implementation method based on the Kubernetes environment as described above.
The Prometheus high-availability system and the implementation method based on the Kubernetes environment have the following advantages:
(I) the invention ensures that multiple replicas of Prometheus nodes work simultaneously, avoiding the risk of losing the monitoring data collected by a single node, and at the same time ensures, based on a distributed election strategy, that only one node sends alarms and that the monitoring collection tasks of the multiple Prometheus instances remain consistent;
(II) the method dynamically acquires the Pod IPs of the Clients where the Prometheus nodes are located through the Apiserver of Kubernetes, ensuring that all Client ends of the cluster can still be reached even if the IP addresses change after Pod rebuilding;
(III) the multi-node Prometheus cluster guarantees the high availability of the monitoring tasks and the integrity of the collected data;
(IV) through the distributed election strategy, the invention ensures that the configuration of the multiple nodes is updated consistently and that only the Master node can send alarm feedback information to the Manager at any given moment, avoiding duplicate alarm data;
(V) the cluster deployment of the invention does not change the Prometheus system itself, has no intrusion into the code logic, and is easy to deploy.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a block diagram of the Prometheus high-availability system based on the Kubernetes environment.
Detailed Description
The Prometheus high availability system and the implementation method based on the Kubernetes environment are described in detail below with reference to the drawings and specific embodiments of the specification.
Example 1:
As shown in FIG. 1, the Prometheus high-availability system based on the Kubernetes environment of the present invention comprises a Manager end (server end) and a Client end (client end), both of which are deployed in Kubernetes as Pods;
the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes, pulling the Client-end monitoring data at the corresponding addresses, deduplicating the data, and sending the commands for updating the Prometheus configuration;
the Client end is used for determining the node of the Master role in the cluster through a distributed election strategy.
The Manager end in the present embodiment includes,
the Metric module is used for acquiring monitoring data from the plurality of Pod nodes and returning the monitoring data to the requester after deduplication;
the configuration module is used for sending configuration commands to the Pod nodes of the Client end;
the Alert module is used for receiving the alarms generated by Prometheus at the Client end.
In this embodiment, the Pod node of each Client end includes Prometheus and a Sidecar, and the Sidecar is deployed in the same Pod node as Prometheus in the form of a container. The specifics are as follows:
(1) the Sidecar selects the Master node through a distributed election strategy; to avoid duplicate alarms generated by multiple Prometheus instances, only the Prometheus of the Master node sends the generated alarms to the Manager end at any given moment;
(2) the Sidecar of the Master node at the Client end receives the commands sent by the configuration module of the Manager end and executes the update of the Prometheus configuration file;
(3) the Sidecar of the Master node synchronizes the update command to the other Slave nodes, which execute the same update operation (a sketch of how these responsibilities fit together follows).
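By way of illustration only, the following minimal Python sketch shows how these responsibilities can be arranged in a Sidecar main loop; the class name, the timing constants standing in for T_sleep and the heartbeat period, and the placeholder methods are assumptions of this sketch rather than part of the claimed implementation.

    import time

    T_SLEEP = 5            # hypothetical value for the patent's T_sleep (seconds)
    HEARTBEAT_PERIOD = 2   # hypothetical heartbeat interval (seconds)

    class Sidecar:
        """Skeleton of the per-Pod Sidecar: elect a Master, then keep heartbeating."""

        def __init__(self, pod_name: str):
            self.pod_name = pod_name
            self.state = "candidate"   # candidate -> master | slave
            self.index_id = 0          # index number of the latest local config log

        def run(self) -> None:
            time.sleep(T_SLEEP)                   # S3: sleep, then act as candidate
            peers = self.discover_peers()         # S4: Pod IPs via the Apiserver
            while True:
                if self.state == "candidate":
                    self.state = self.run_election(peers)   # S5: distributed election
                else:
                    self.send_heartbeat(peers)    # S6: role + indexId heartbeat
                    time.sleep(HEARTBEAT_PERIOD)

        # Placeholders; possible bodies are sketched alongside steps S4-S9 below.
        def discover_peers(self) -> dict: ...
        def run_election(self, peers: dict) -> str: ...
        def send_heartbeat(self, peers: dict) -> None: ...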
The working process of the system is as follows:
(I) the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes;
(II) the Manager end pulls the Client-end monitoring data at the corresponding addresses, deduplicates the data, and at the same time sends the command for updating the Prometheus configuration;
(III) the Client end determines the Master role node in the cluster through a distributed election strategy; the Master role node interacts with the Manager end, updates the Prometheus configuration file, and sends the generated alarms to the Manager end;
(IV) the Master role node synchronizes the configuration-update command to the Slave role nodes, ensuring the consistency of the Prometheus monitoring tasks.
Example 2:
the invention relates to a Prometheus high-availability implementation method based on a Kubernetes environment, which comprises the following specific steps:
S1, an odd number of Pod nodes are deployed in Kubernetes as a StatefulSet, and each Pod node is labeled with label={"prometheus_cluster": "true"};
the Manager end is deployed as a Deployment, and the number of its Pods is not limited;
S2, the Pod node of each Client end uses a PersistentVolume for persistent storage, ensuring that the data of the original Pod node is not lost after the Pod node is rebuilt, so that Pod node updates are incremental updates;
S3, after the Sidecar at the Client end starts, it enters a sleep state of duration T_sleep, and the Sidecar sets its own state to candidate;
S4, the Sidecar obtains and filters the Pod node IP addresses of all nodes of the Prometheus cluster through the Apiserver of Kubernetes and the identifying label of the Pod nodes used by Prometheus;
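By way of illustration of this step, a minimal sketch using the official Kubernetes Python client is given below; the namespace monitoring is an assumption, the label selector reuses the prometheus_cluster=true label from step S1, and in-cluster execution with a service account permitted to list Pods is assumed.

    from kubernetes import client, config

    def discover_prometheus_pods(namespace: str = "monitoring") -> dict:
        """Return {pod_name: pod_ip} for the running Pods of the Prometheus cluster."""
        config.load_incluster_config()     # credentials from the Pod's service account
        v1 = client.CoreV1Api()
        pods = v1.list_namespaced_pod(
            namespace, label_selector="prometheus_cluster=true")
        return {
            p.metadata.name: p.status.pod_ip
            for p in pods.items
            if p.status.phase == "Running" and p.status.pod_ip
        }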
S5, the Sidecar in the candidate state enters the election voting stage;
S6, the Sidecar maintains its liveness by sending heartbeat information; the heartbeat information comprises the role of the node and the index number indexId of the latest local log file;
S7, the Sidecar of the node holding the Master role receives the configuration update commands sent by the Manager end and sends the alarms generated by Prometheus to the Manager end; at the same time, the Sidecar sends a reload-configuration command to the Prometheus in the same Pod node, so that Prometheus reloads its configuration file;
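Prometheus exposes a lifecycle endpoint (POST /-/reload, available when Prometheus is started with --web.enable-lifecycle) that reloads its configuration file; the reload-configuration command of step S7 could therefore be realized roughly as sketched below, where the configuration path on a shared volume and the localhost port are assumptions about the Pod layout.

    import requests

    PROM_CONFIG = "/etc/prometheus/prometheus.yml"   # assumed path on a shared volume
    PROM_URL = "http://127.0.0.1:9090"               # Prometheus container in the same Pod

    def apply_config_update(new_config_text: str) -> None:
        """Write the updated configuration and ask the co-located Prometheus to reload it."""
        with open(PROM_CONFIG, "w") as f:
            f.write(new_config_text)
        # Requires Prometheus to run with --web.enable-lifecycle.
        requests.post(f"{PROM_URL}/-/reload", timeout=10).raise_for_status()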
S8, the Manager end obtains the set of IP addresses of all running Client-end Pod nodes through the Apiserver; the specific steps are as follows (a sketch of the decision follows these sub-steps):
(1) the configuration module sends a configuration command to a random Client end;
(2) it is judged whether that Client end is the node holding the Master role:
if yes, the Sidecar on it modifies the Prometheus configuration file and sends a load-configuration signal to the container where Prometheus is located;
if not, the node of the Slave role forwards the configuration modification command to the Master node;
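A sketch of the apply-or-forward decision of step S8 is given below; the Sidecar HTTP port and the /config path are assumptions, and apply_config_update stands for the write-and-reload routine sketched under step S7.

    import requests

    SIDECAR_PORT = 8080   # assumed port on which each Sidecar accepts config commands

    def apply_config_update(new_config_text: str) -> None:
        ...   # write prometheus.yml and POST /-/reload, as sketched under step S7

    def handle_config_command(role: str, master_ip: str, new_config_text: str) -> None:
        """A Master applies the command locally; a Slave forwards it to the Master."""
        if role == "master":
            apply_config_update(new_config_text)
        else:
            requests.post(
                f"http://{master_ip}:{SIDECAR_PORT}/config",
                data=new_config_text, timeout=10,
            ).raise_for_status()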
S9, after the Prometheus configuration update of the node holding the Master role is completed, the Sidecar in that Pod node records the configuration update operation and persists it, in the form of a log file, to the local volume mounted on the persistent volume storage, and the index number of the local log file is automatically incremented by 1; indexId is set to the latest log index number and updated into the heartbeat information;
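Step S9 amounts to an append-only operation log on the PersistentVolume together with an index increment; the directory layout and the JSON log format in the following sketch are hypothetical.

    import json
    import os

    LOG_DIR = "/data/sidecar-log"   # assumed mount point of the PersistentVolume

    def persist_config_update(current_index: int, command: dict) -> int:
        """Append the configuration update as a log file and return the new indexId."""
        new_index = current_index + 1
        os.makedirs(LOG_DIR, exist_ok=True)
        with open(os.path.join(LOG_DIR, f"{new_index:010d}.json"), "w") as f:
            json.dump({"indexId": new_index, "command": command}, f)
        return new_index   # the caller puts this value into subsequent heartbeats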
S10, after receiving a user request to query monitoring data, the Manager end calls the Metric module to query the monitoring data;
S11, the Metric module queries the Apiserver of Kubernetes and, according to the label of the Pods, filters and obtains all Client nodes and their IP addresses; meanwhile, the Metric module pulls the monitoring data collected by the Prometheus of each Client end through the obtained Pod node IPs and the monitoring-data URL;
S12, after the Metric module obtains the time-series monitoring data from one Prometheus instance, a min-heap is built in memory, where the key of a heap node is the timestamp of the data and the value is the corresponding time-series data; the monitoring data of the Prometheus instances in the other Pod nodes is pulled in the same manner (the deduplication is sketched below), specifically:
if data for the corresponding time already exists in the heap, the record is discarded, until the data of all Prometheus instances has been processed;
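The deduplication of step S12 can be sketched as follows, assuming that the samples of each Prometheus replica arrive as (timestamp, value) pairs; Python's heapq module supplies the min-heap keyed by timestamp, and records whose timestamp is already present are discarded.

    import heapq

    def merge_samples(per_replica_samples):
        """Merge samples from several Prometheus replicas, dropping duplicate timestamps."""
        heap, seen = [], set()
        for samples in per_replica_samples:          # one list of (timestamp, value) per Pod
            for ts, value in samples:
                if ts in seen:                       # data for this time is already in the heap
                    continue
                seen.add(ts)
                heapq.heappush(heap, (ts, value))    # min-heap keyed by timestamp
        return [heapq.heappop(heap) for _ in range(len(heap))]

    # Example: two replicas scraped the same target; one of them missed a scrape.
    a = [(1000, 1.0), (1015, 2.0), (1030, 3.0)]
    b = [(1000, 1.0), (1030, 3.0)]
    assert merge_samples([a, b]) == [(1000, 1.0), (1015, 2.0), (1030, 3.0)]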
S13, the user obtains the complete monitoring data through the Metric module of the Manager end, and obtains the alarm data generated by Prometheus through the Alert module.
In this embodiment, step S5, in which the Sidecar in the candidate state enters the election voting stage, proceeds as follows (a sketch of one election round follows these steps):
S501, each candidate casts its vote for the Pod with the smallest ordinal number created by the StatefulSet;
S502, voting information is sent to all known nodes, and the voting results are counted;
S503, it is judged whether the number of votes for that node exceeds half:
if yes, the node is set as the Master role node, the remaining nodes are set as Slave role nodes, and step S504 is executed;
if not, jump to step S505;
S504, the node holding the Master role broadcasts its local information <id, indexId>; where id is a unique identifier whose value is incremented by 1 each time the Master role node changes, and indexId represents the latest value of the local log index after an update;
S505, after a random time T in the interval (T_low, T_high), the next round of voting begins.
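Assuming a request_vote callback for the inter-Sidecar transport (which the patent does not specify), one election round covering steps S501 to S505 could be sketched as follows; T_LOW and T_HIGH stand in for the random back-off bounds T_low and T_high.

    import random
    import time

    T_LOW, T_HIGH = 1.0, 3.0   # hypothetical bounds for the random back-off (T_low, T_high)

    def statefulset_ordinal(pod_name: str) -> int:
        """StatefulSet Pods are named <statefulset>-0, <statefulset>-1, ...; take the ordinal."""
        return int(pod_name.rsplit("-", 1)[1])

    def run_election_round(my_name: str, peers: dict, request_vote) -> str:
        """One round of voting. peers maps pod_name -> pod_ip; request_vote(ip, candidate)
        asks a peer's Sidecar whether it votes for `candidate` and returns True/False."""
        candidate = min(peers, key=statefulset_ordinal)        # S501: lowest-ordinal Pod
        votes = sum(1 for ip in peers.values() if request_vote(ip, candidate))  # S502
        if votes > len(peers) // 2:                            # S503: strict majority
            # S504 (broadcasting <id, indexId>) is omitted from this sketch.
            return "master" if candidate == my_name else "slave"
        time.sleep(random.uniform(T_LOW, T_HIGH))              # S505: wait T in (T_low, T_high)
        return "candidate"                                     # try the next round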
In this embodiment, maintaining the Sidecar's liveness by sending heartbeat information in step S6 covers the following two cases (a heartbeat sketch follows):
(I) when a node in the Slave role crashes, Kubernetes rebuilds it after a time T_grace, and the rebuilt node proceeds as follows:
first, it enters the candidate state, and after the Sidecar synchronizes with the node holding the cluster Master role, it obtains the missing log indexes;
then it updates the local configuration file and the indexId in the heartbeat information, and switches to the Slave state;
(II) when the crashed node is the node holding the Master role, the remaining nodes re-enter the election phase.
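The heartbeat of step S6 and the Master-failure detection that leads to case (II) can be sketched as follows; the field names, the two-second period and the three-period timeout are assumptions of the sketch.

    import time
    from typing import Optional

    HEARTBEAT_PERIOD = 2.0                 # assumed send interval (seconds)
    MASTER_TIMEOUT = 3 * HEARTBEAT_PERIOD  # assumed: three missed heartbeats

    def make_heartbeat(pod_name: str, role: str, index_id: int) -> dict:
        """Heartbeat payload: the node's role and its latest local log index number."""
        return {"node": pod_name, "role": role, "indexId": index_id, "sent_at": time.time()}

    def master_alive(last_master_heartbeat: Optional[dict]) -> bool:
        """If this returns False, the surviving nodes re-enter the election (case II)."""
        if last_master_heartbeat is None:
            return False
        return time.time() - last_master_heartbeat["sent_at"] < MASTER_TIMEOUT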
In this embodiment, setting indexId to the latest log index number and updating it into the heartbeat information in step S9 proceeds as follows (a combined sketch follows these steps):
S901, the Master node synchronizes the configuration modification command to the nodes of the remaining Slave roles in the form of a log file;
S902, after receiving the synchronized operation log, the Sidecar of a node in the Slave role parses the command and updates the Prometheus configuration file;
S903, it sends a configuration reload command to the corresponding Prometheus;
S904, the nodes in the Slave role update the latest index number indexId in the heartbeat information.
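Combining steps S901 to S904, a Slave-side sketch of applying a synchronized operation log is given below; the log entry format follows the hypothetical persistence sketch under step S9.

    import json
    import os

    LOG_DIR = "/data/sidecar-log"   # assumed PersistentVolume mount, as under step S9

    def apply_synced_log(entry: dict, current_index: int, reload_prometheus) -> int:
        """Apply a config-update log entry received from the Master.

        entry: {"indexId": int, "command": ...} as written by the Master (S901).
        reload_prometheus: callable that rewrites prometheus.yml and reloads it (S902-S903).
        Returns the indexId to advertise in the next heartbeat (S904).
        """
        if entry["indexId"] <= current_index:
            return current_index                   # stale or duplicate entry: ignore
        os.makedirs(LOG_DIR, exist_ok=True)
        with open(os.path.join(LOG_DIR, f"{entry['indexId']:010d}.json"), "w") as f:
            json.dump(entry, f)                    # persist the synced log locally
        reload_prometheus(entry["command"])        # update the config and reload Prometheus
        return entry["indexId"]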
Example 3:
an embodiment of the present invention further provides an electronic device, including: a memory and at least one processor;
wherein the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the Prometheus high-availability implementation method based on the Kubernetes environment in any embodiment of the invention.
Example 4:
An embodiment of the present invention further provides a computer-readable storage medium, in which a plurality of instructions are stored; the instructions are loaded by a processor, so that the processor executes the Prometheus high-availability implementation method based on the Kubernetes environment in any embodiment of the present invention. Specifically, a system or an apparatus equipped with a storage medium may be provided, on which software program code realizing the functions of any of the above-described embodiments is stored, and a computer (or a CPU or MPU) of the system or apparatus reads out and executes the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A Prometheus high-availability system based on a Kubernetes environment, characterized by comprising a Manager end and a Client end, wherein the Manager end and the Client end are both deployed in Kubernetes as Pods;
the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes, pulling the Client-end monitoring data at the corresponding addresses, deduplicating the data, and sending commands for updating the Prometheus configuration; wherein the Manager end comprises:
the Metric module, used for acquiring monitoring data from the plurality of Pod nodes and returning the monitoring data to the requester after deduplication;
the configuration module, used for sending configuration commands to the Pod nodes of the Client end;
the Alert module, used for receiving the alarms generated by Prometheus at the Client end;
the Client end is used for determining the node of the Master role in the cluster through a distributed election strategy; the Pod node of each Client end comprises Prometheus and a Sidecar, and the Sidecar is deployed in the same Pod node as Prometheus in the form of a container;
the Sidecar is used for receiving the commands sent by the configuration module of the Manager end, executing the update of the Prometheus configuration file, and synchronizing the update command to the Slave role nodes other than the Master role node;
Prometheus is used for updating the configuration file and sending the generated alarm data to the Manager end.
2. The Prometheus high-availability system based on the Kubernetes environment according to claim 1, wherein the system operates as follows:
(I) the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes;
(II) the Manager end pulls the Client-end monitoring data at the corresponding addresses, deduplicates the data, and at the same time sends the command for updating the Prometheus configuration;
(III) the Client end determines the Master role node in the cluster through a distributed election strategy; the Master role node interacts with the Manager end, updates the Prometheus configuration file, and sends the generated alarms to the Manager end;
(IV) the Master role node synchronizes the configuration-update command to the Slave role nodes, ensuring the consistency of the Prometheus monitoring tasks.
3. A Prometheus high-availability implementation method based on a Kubernetes environment is characterized by comprising the following steps:
S1, an odd number of Pod nodes are deployed in Kubernetes as a StatefulSet, and each Pod node is labeled with label={"prometheus_cluster": "true"};
the Manager end is deployed as a Deployment, and the number of its Pods is not limited;
S2, the Pod node of each Client end uses a PersistentVolume for persistent storage, ensuring that the data of the original Pod node is not lost after the Pod node is rebuilt, so that Pod node updates are incremental updates;
S3, after the Sidecar at the Client end starts, it enters a sleep state of duration T_sleep, and the Sidecar sets its own state to candidate;
S4, the Sidecar obtains and filters the Pod node IP addresses of all nodes of the Prometheus cluster through the Apiserver of Kubernetes and the identifying label of the Pod nodes used by Prometheus;
S5, the Sidecar in the candidate state enters the election voting stage;
S6, the Sidecar maintains its liveness by sending heartbeat information; the heartbeat information comprises the role of the node and the index number indexId of the latest local log file;
S7, the Sidecar of the node holding the Master role receives the configuration update commands sent by the Manager end and sends the alarms generated by Prometheus to the Manager end; at the same time, the Sidecar sends a reload-configuration command to the Prometheus in the same Pod node, so that Prometheus reloads its configuration file;
S8, the Manager end obtains the set of IP addresses of all running Client-end Pod nodes through the Apiserver; the specific steps are as follows:
(1) the configuration module sends a configuration command to a random Client end;
(2) it is judged whether that Client end is the node holding the Master role:
if yes, the Sidecar on it modifies the Prometheus configuration file and sends a load-configuration signal to the container where Prometheus is located;
if not, the node of the Slave role forwards the configuration modification command to the Master node;
S9, after the Prometheus configuration update of the node holding the Master role is completed, the Sidecar in that Pod node records the configuration update operation and persists it, in the form of a log file, to the local volume mounted on the persistent volume storage, and the index number of the local log file is automatically incremented by 1; indexId is set to the latest log index number and updated into the heartbeat information;
S10, after receiving a user request to query monitoring data, the Manager end calls the Metric module to query the monitoring data;
S11, the Metric module queries the Apiserver of Kubernetes and, according to the label of the Pods, filters and obtains all Client nodes and their IP addresses; meanwhile, the Metric module pulls the monitoring data collected by the Prometheus of each Client end through the obtained Pod node IPs and the monitoring-data URL;
S12, after the Metric module obtains the time-series monitoring data from one Prometheus instance, a min-heap is built in memory, where the key of a heap node is the timestamp of the data and the value is the corresponding time-series data; the monitoring data of the Prometheus instances in the other Pod nodes is pulled in the same manner, specifically:
if data for the corresponding time already exists in the heap, the record is discarded, until the data of all Prometheus instances has been processed;
S13, the user obtains the complete monitoring data through the Metric module of the Manager end, and obtains the alarm data generated by Prometheus through the Alert module.
4. The Prometheus high-availability implementation method based on the Kubernetes environment according to claim 3, wherein in step S5 the Sidecar in the candidate state enters the election voting stage as follows:
S501, each candidate casts its vote for the Pod with the smallest ordinal number created by the StatefulSet;
S502, voting information is sent to all known nodes, and the voting results are counted;
S503, it is judged whether the number of votes for that node exceeds half:
if yes, the node is set as the Master role node, the remaining nodes are set as Slave role nodes, and step S504 is executed;
if not, jump to step S505;
S504, the node holding the Master role broadcasts its local information <id, indexId>; where id is a unique identifier whose value is incremented by 1 each time the Master role node changes, and indexId represents the latest value of the local log index after an update;
S505, after a random time T in the interval (T_low, T_high), the next round of voting begins.
5. The Prometheus high-availability implementation method based on the Kubernetes environment according to claim 3, wherein maintaining the Sidecar's liveness by sending heartbeat information in step S6 covers the following two cases:
(I) when a node in the Slave role crashes, Kubernetes rebuilds it after a time T_grace, and the rebuilt node proceeds as follows:
first, it enters the candidate state, and after the Sidecar synchronizes with the node holding the cluster Master role, it obtains the missing log indexes;
then it updates the local configuration file and the indexId in the heartbeat information, and switches to the Slave state;
(II) when the crashed node is the node holding the Master role, the remaining nodes re-enter the election phase.
6. The Prometheus high-availability implementation method based on the Kubernetes environment according to any one of claims 3-5, wherein in step S9 setting indexId to the latest log index number and updating it into the heartbeat information is as follows:
S901, the Master node synchronizes the configuration modification command to the nodes of the remaining Slave roles in the form of a log file;
S902, after receiving the synchronized operation log, the Sidecar of a node in the Slave role parses the command and updates the Prometheus configuration file;
S903, it sends a configuration reload command to the corresponding Prometheus;
S904, the nodes in the Slave role update the latest index number indexId in the heartbeat information.
7. An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executes the computer program stored in the memory, so that the at least one processor performs the Prometheus high-availability implementation method based on the Kubernetes environment according to any one of claims 3 to 6.
8. A computer-readable storage medium, in which a computer program is stored, the computer program being executable by a processor to implement the Prometheus high-availability implementation method based on the Kubernetes environment according to any one of claims 3 to 6.
CN202011186088.2A 2020-10-30 2020-10-30 Prometheus high-availability system based on Kubernetes environment and implementation method Active CN112256401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011186088.2A CN112256401B (en) 2020-10-30 2020-10-30 Prometheus high-availability system based on Kubernetes environment and implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011186088.2A CN112256401B (en) 2020-10-30 2020-10-30 Prometheus high-availability system based on Kubernetes environment and implementation method

Publications (2)

Publication Number Publication Date
CN112256401A CN112256401A (en) 2021-01-22
CN112256401B (en) 2022-03-15

Family

ID=74268968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011186088.2A Active CN112256401B (en) 2020-10-30 2020-10-30 Prometheus high-availability system based on Kubernetes environment and implementation method

Country Status (1)

Country Link
CN (1) CN112256401B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112994935B (en) * 2021-02-04 2022-06-17 烽火通信科技股份有限公司 prometheus management and control method, device, equipment and storage medium
CN112925612A (en) * 2021-03-15 2021-06-08 浪潮软件科技有限公司 Monitoring service static configuration management method based on Kubernetes
CN114598585A (en) * 2022-03-07 2022-06-07 浪潮云信息技术股份公司 Method and system for monitoring hardware through snmptrapd
CN115827393B (en) * 2023-02-21 2023-10-20 德特赛维技术有限公司 Server cluster monitoring and alarming system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10528367B1 (en) * 2016-09-02 2020-01-07 Intuit Inc. Execution of workflows in distributed systems
CN111045901A (en) * 2019-12-11 2020-04-21 东软集团股份有限公司 Container monitoring method and device, storage medium and electronic equipment
CN111147596A (en) * 2019-12-30 2020-05-12 中国移动通信集团江苏有限公司 Prometous cluster deployment method, device, equipment and medium
CN111176783A (en) * 2019-11-20 2020-05-19 航天信息股份有限公司 High-availability method and device for container treatment platform and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10924398B2 (en) * 2018-09-25 2021-02-16 Ebay Inc. Time-series data monitoring with sharded server

Also Published As

Publication number Publication date
CN112256401A (en) 2021-01-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant