CN112256401B - Prometheus high-availability system based on Kubernetes environment and implementation method - Google Patents

Prometheus high-availability system based on Kubernetes environment and implementation method

Info

Publication number
CN112256401B
CN112256401B (application CN202011186088.2A)
Authority
CN
China
Prior art keywords
node
prometheus
pod
configuration
sidecar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011186088.2A
Other languages
Chinese (zh)
Other versions
CN112256401A
Inventor
张建伟
魏金雷
张晖
孙思清
高传集
蔡卫卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202011186088.2A
Publication of CN112256401A
Application granted
Publication of CN112256401B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/455 - Emulation; interpretation; software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 - Hypervisors; virtual machine monitors
    • G06F 9/45558 - Hypervisor-specific management and integration aspects
    • G06F 2009/45595 - Network integration; enabling network access in virtual machine instances
    • G06F 11/00 - Error detection; error correction; monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16 - Error detection or correction of the data by redundancy in hardware
    • G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/202 - Error detection or correction by redundancy in hardware where processing functionality is redundant
    • G06F 11/2023 - Failover techniques
    • G06F 11/30 - Monitoring
    • G06F 11/3003 - Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F 11/301 - Monitoring arrangements where the computing system is a virtual computing platform, e.g. logically partitioned systems
    • G06F 11/3055 - Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Abstract

The invention discloses a Prometheus high-availability system and an implementation method based on a Kubernetes environment, belonging to the technical field of cloud computing, and aims to solve the technical problem of ensuring that multiple replicas of Prometheus nodes work simultaneously while avoiding the risk of losing the monitoring data collected by a single node. The technical scheme is as follows: the system comprises a Manager end and a Client end, both of which are deployed in Kubernetes as Pods; the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes, pulling the Client-end monitoring data at those addresses, deduplicating the data, and sending commands for updating the Prometheus configuration; the Client end is used for determining the node holding the Master role in the cluster through a distributed election strategy. The invention also discloses a Prometheus high-availability implementation method based on the Kubernetes environment.

Description

Prometheus high-availability system based on Kubernetes environment and implementation method
Technical Field
The invention relates to the technical field of cloud computing, in particular to a Prometheus high-availability system and an implementation method based on a Kubernetes environment.
Background
Kubernetes is an open source container cluster management tool used for managing various containerized applications in a cloud platform. Prometheus is an open source monitoring and alerting solution for container environments, was the second project to graduate from the CNCF, and has become the de facto standard for monitoring and alerting in container environments. However, Prometheus currently works mainly as a single node, and there is no good high-availability scheme for it, whereas a mature and stable monitoring and alerting scheme is very important for a cloud platform. Therefore, how to ensure that multiple replicas of Prometheus nodes work simultaneously and to avoid the risk of losing the monitoring data collected by a single node is a technical problem that urgently needs to be solved.
Disclosure of Invention
The technical task of the invention is to provide a Prometheus high-availability system and an implementation method based on a Kubernetes environment, so as to ensure that multiple replicas of Prometheus nodes work simultaneously and to avoid the risk of losing the monitoring data collected by a single node.
The technical task of the invention is achieved in the following way. A Prometheus high-availability system based on a Kubernetes environment comprises a Manager end (server end) and a Client end (client end), both of which are deployed in Kubernetes as Pods;
the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes, pulling the Client-end monitoring data at the corresponding addresses, deduplicating the data, and sending commands for updating the Prometheus configuration;
the Client end is used for determining the node of the Master role in the cluster through a distributed election strategy.
Preferably, the Manager end comprises,
the Metric module is used for acquiring monitoring data from the plurality of Pod nodes and returning the monitoring data to the requester after deduplication;
the configuration module is used for sending configuration commands to the Pod nodes of the Client end;
the Alert module is used for receiving the alarms generated by Prometheus at the Client end.
Preferably, the Pod node of each Client end comprises Prometheus and a Sidecar, and the Sidecar is deployed in the same Pod node as Prometheus in the form of a container;
the Sidecar is used for receiving the commands sent by the configuration module of the Manager end, executing the update of the Prometheus configuration file, and synchronizing the update command to the Slave role nodes other than the Master role node;
Prometheus is used for updating the configuration file and sending the generated alarm data to the Manager end.
Preferably, the working process of the system is as follows:
(I) the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes;
(II) the Manager end pulls the Client-end monitoring data at the corresponding addresses, deduplicates the data, and at the same time sends the command for updating the Prometheus configuration;
(III) the Client end determines the Master role node in the cluster through a distributed election strategy; the Master role node interacts with the Manager end, updates the Prometheus configuration file, and sends the generated alarms to the Manager end;
(IV) the Master role node synchronizes the configuration-update command to the Slave role nodes, ensuring the consistency of the Prometheus monitoring tasks.
A Prometheus high-availability implementation method based on a Kubernetes environment is specifically as follows:
S1, an odd number of Pod nodes are deployed in Kubernetes as a StatefulSet, and each Pod node is labeled with label={"prometheus_cluster": "true"};
the Manager end is deployed as a Deployment, and the number of its Pods is not limited;
S2, the Pod node of each Client end uses a PersistentVolume for persistent storage, ensuring that the data of the original Pod node is not lost after the Pod node is rebuilt, so that Pod node updates are incremental updates;
S3, after the Sidecar at the Client end starts, it enters a sleep state of duration T_sleep, and the Sidecar sets its own state to candidate;
S4, the Sidecar obtains and filters the Pod node IP addresses of all nodes of the Prometheus cluster through the Apiserver of Kubernetes and the identifying label of the Pod nodes used by Prometheus;
S5, the Sidecar in the candidate state enters the election voting stage;
S6, the Sidecar maintains its liveness by sending heartbeat information; the heartbeat information comprises the role of the node and the index number indexId of the latest local log file;
S7, the Sidecar of the node holding the Master role receives the configuration update commands sent by the Manager end and sends the alarms generated by Prometheus to the Manager end; at the same time, the Sidecar sends a reload-configuration command to the Prometheus in the same Pod node, so that Prometheus reloads its configuration file;
S8, the Manager end obtains the set of IP addresses of all running Client-end Pod nodes through the Apiserver; the specific steps are as follows:
(1) the configuration module sends a configuration command to a random Client end;
(2) it is judged whether that Client end is the node holding the Master role:
if yes, the Sidecar on it modifies the Prometheus configuration file and sends a load-configuration signal to the container where Prometheus is located;
if not, the node of the Slave role forwards the configuration modification command to the Master node;
S9, after the Prometheus configuration update of the node holding the Master role is completed, the Sidecar in that Pod node records the configuration update operation and persists it, in the form of a log file, to the local volume mounted on the persistent volume storage, and the index number of the local log file is automatically incremented by 1; indexId is set to the latest log index number and updated into the heartbeat information;
S10, after receiving a user request to query monitoring data, the Manager end calls the Metric module to query the monitoring data;
S11, the Metric module queries the Apiserver of Kubernetes and, according to the label of the Pods, filters and obtains all Client nodes and their IP addresses; meanwhile, the Metric module pulls the monitoring data collected by the Prometheus of each Client end through the obtained Pod node IPs and the monitoring-data URL;
S12, after the Metric module obtains the time-series monitoring data from one Prometheus instance, a min-heap is built in memory, where the key of a heap node is the timestamp of the data and the value is the corresponding time-series data; the monitoring data of the Prometheus instances in the other Pod nodes is pulled in the same manner, specifically:
if data for the corresponding time already exists in the heap, the record is discarded, until the data of all Prometheus instances has been processed;
S13, the user obtains the complete monitoring data through the Metric module of the Manager end, and obtains the alarm data generated by Prometheus through the Alert module.
Preferably, in step S5, the Sidecar in the candidate state enters the election voting stage as follows:
S501, each candidate casts its vote for the Pod with the smallest ordinal number created by the StatefulSet;
S502, voting information is sent to all known nodes, and the voting results are counted;
S503, it is judged whether the number of votes for that node exceeds half:
if yes, the node is set as the Master role node, the remaining nodes are set as Slave role nodes, and step S504 is executed;
if not, jump to step S505;
S504, the node holding the Master role broadcasts its local information <id, indexId>; where id is a unique identifier whose value is incremented by 1 each time the Master role node changes, and indexId represents the latest value of the local log index after an update;
S505, after a random time T in the interval (T_low, T_high), the next round of voting begins.
Preferably, in step S6, maintaining the Sidecar's liveness by sending heartbeat information covers the following two cases:
(I) when a node in the Slave role crashes, Kubernetes rebuilds it after a time T_grace, and the rebuilt node proceeds as follows:
first, it enters the candidate state, and after the Sidecar synchronizes with the node holding the cluster Master role, it obtains the missing log indexes;
then it updates the local configuration file and the indexId in the heartbeat information, and switches to the Slave state;
(II) when the crashed node is the node holding the Master role, the remaining nodes re-enter the election phase.
Preferably, in step S9, setting indexId to the latest log index number and updating it into the heartbeat information comprises the following steps:
S901, the Master node synchronizes the configuration modification command to the nodes of the remaining Slave roles in the form of a log file;
S902, after receiving the synchronized operation log, the Sidecar of a node in the Slave role parses the command and updates the Prometheus configuration file;
S903, it sends a configuration reload command to the corresponding Prometheus;
S904, the nodes in the Slave role update the latest index number indexId in the heartbeat information.
An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executes the computer program stored in the memory, so that the at least one processor performs the Prometheus high-availability implementation method based on the Kubernetes environment as described above.
A computer-readable storage medium, in which a computer program is stored, the computer program being executable by a processor to implement the Prometheus high-availability implementation method based on the Kubernetes environment as described above.
The Prometheus high-availability system and the implementation method based on the Kubernetes environment have the following advantages:
(I) the invention ensures that multiple replicas of Prometheus nodes work simultaneously, avoiding the risk of losing the monitoring data collected by a single node, and at the same time ensures, based on a distributed election strategy, that only one node sends alarms and that the monitoring collection tasks of the multiple Prometheus instances remain consistent;
(II) the method dynamically acquires the Pod IPs of the Clients where the Prometheus nodes are located through the Apiserver of Kubernetes, ensuring that all Client ends of the cluster can still be reached even if the IP addresses change after Pod rebuilding;
(III) the multi-node Prometheus cluster guarantees the high availability of the monitoring tasks and the integrity of the collected data;
(IV) through the distributed election strategy, the invention ensures that the configuration of the multiple nodes is updated consistently and that only the Master node can send alarm feedback information to the Manager at any given moment, avoiding duplicate alarm data;
(V) the cluster deployment of the invention does not change the Prometheus system itself, has no intrusion into the code logic, and is easy to deploy.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a block diagram of the Prometheus high-availability system based on the Kubernetes environment.
Detailed Description
The Prometheus high availability system and the implementation method based on the Kubernetes environment are described in detail below with reference to the drawings and specific embodiments of the specification.
Example 1:
As shown in FIG. 1, the Prometheus high-availability system based on the Kubernetes environment of the present invention comprises a Manager end (server end) and a Client end (client end), both of which are deployed in Kubernetes as Pods;
the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes, pulling the Client-end monitoring data at the corresponding addresses, deduplicating the data, and sending the commands for updating the Prometheus configuration;
the Client end is used for determining the node of the Master role in the cluster through a distributed election strategy.
The Manager end in the present embodiment includes,
the Metric module is used for acquiring monitoring data from the plurality of Pod nodes and returning the monitoring data to the requester after deduplication;
the configuration module is used for sending configuration commands to the Pod nodes of the Client end;
the Alert module is used for receiving the alarms generated by Prometheus at the Client end.
In this embodiment, the Pod node of each Client end includes Prometheus and a Sidecar, and the Sidecar is deployed in the same Pod node as Prometheus in the form of a container. The specifics are as follows:
(1) the Sidecar selects the Master node through a distributed election strategy; to avoid duplicate alarms generated by multiple Prometheus instances, only the Prometheus of the Master node sends the generated alarms to the Manager end at any given moment;
(2) the Sidecar of the Master node at the Client end receives the commands sent by the configuration module of the Manager end and executes the update of the Prometheus configuration file;
(3) the Sidecar of the Master node synchronizes the update command to the other Slave nodes, which execute the same update operation (a sketch of how these responsibilities fit together follows).
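By way of illustration only, the following minimal Python sketch shows how these responsibilities can be arranged in a Sidecar main loop; the class name, the timing constants standing in for T_sleep and the heartbeat period, and the placeholder methods are assumptions of this sketch rather than part of the claimed implementation.

    import time

    T_SLEEP = 5            # hypothetical value for the patent's T_sleep (seconds)
    HEARTBEAT_PERIOD = 2   # hypothetical heartbeat interval (seconds)

    class Sidecar:
        """Skeleton of the per-Pod Sidecar: elect a Master, then keep heartbeating."""

        def __init__(self, pod_name: str):
            self.pod_name = pod_name
            self.state = "candidate"   # candidate -> master | slave
            self.index_id = 0          # index number of the latest local config log

        def run(self) -> None:
            time.sleep(T_SLEEP)                   # S3: sleep, then act as candidate
            peers = self.discover_peers()         # S4: Pod IPs via the Apiserver
            while True:
                if self.state == "candidate":
                    self.state = self.run_election(peers)   # S5: distributed election
                else:
                    self.send_heartbeat(peers)    # S6: role + indexId heartbeat
                    time.sleep(HEARTBEAT_PERIOD)

        # Placeholders; possible bodies are sketched alongside steps S4-S9 below.
        def discover_peers(self) -> dict: ...
        def run_election(self, peers: dict) -> str: ...
        def send_heartbeat(self, peers: dict) -> None: ...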
The working process of the system is as follows:
(I) the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes;
(II) the Manager end pulls the Client-end monitoring data at the corresponding addresses, deduplicates the data, and at the same time sends the command for updating the Prometheus configuration;
(III) the Client end determines the Master role node in the cluster through a distributed election strategy; the Master role node interacts with the Manager end, updates the Prometheus configuration file, and sends the generated alarms to the Manager end;
(IV) the Master role node synchronizes the configuration-update command to the Slave role nodes, ensuring the consistency of the Prometheus monitoring tasks.
Example 2:
the invention relates to a Prometheus high-availability implementation method based on a Kubernetes environment, which comprises the following specific steps:
S1, an odd number of Pod nodes are deployed in Kubernetes as a StatefulSet, and each Pod node is labeled with label={"prometheus_cluster": "true"};
the Manager end is deployed as a Deployment, and the number of its Pods is not limited;
S2, the Pod node of each Client end uses a PersistentVolume for persistent storage, ensuring that the data of the original Pod node is not lost after the Pod node is rebuilt, so that Pod node updates are incremental updates;
S3, after the Sidecar at the Client end starts, it enters a sleep state of duration T_sleep, and the Sidecar sets its own state to candidate;
S4, the Sidecar obtains and filters the Pod node IP addresses of all nodes of the Prometheus cluster through the Apiserver of Kubernetes and the identifying label of the Pod nodes used by Prometheus;
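By way of illustration of this step, a minimal sketch using the official Kubernetes Python client is given below; the namespace monitoring is an assumption, the label selector reuses the prometheus_cluster=true label from step S1, and in-cluster execution with a service account permitted to list Pods is assumed.

    from kubernetes import client, config

    def discover_prometheus_pods(namespace: str = "monitoring") -> dict:
        """Return {pod_name: pod_ip} for the running Pods of the Prometheus cluster."""
        config.load_incluster_config()     # credentials from the Pod's service account
        v1 = client.CoreV1Api()
        pods = v1.list_namespaced_pod(
            namespace, label_selector="prometheus_cluster=true")
        return {
            p.metadata.name: p.status.pod_ip
            for p in pods.items
            if p.status.phase == "Running" and p.status.pod_ip
        }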
S5, the Sidecar in the candidate state enters the election voting stage;
S6, the Sidecar maintains its liveness by sending heartbeat information; the heartbeat information comprises the role of the node and the index number indexId of the latest local log file;
S7, the Sidecar of the node holding the Master role receives the configuration update commands sent by the Manager end and sends the alarms generated by Prometheus to the Manager end; at the same time, the Sidecar sends a reload-configuration command to the Prometheus in the same Pod node, so that Prometheus reloads its configuration file;
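Prometheus exposes a lifecycle endpoint (POST /-/reload, available when Prometheus is started with --web.enable-lifecycle) that reloads its configuration file; the reload-configuration command of step S7 could therefore be realized roughly as sketched below, where the configuration path on a shared volume and the localhost port are assumptions about the Pod layout.

    import requests

    PROM_CONFIG = "/etc/prometheus/prometheus.yml"   # assumed path on a shared volume
    PROM_URL = "http://127.0.0.1:9090"               # Prometheus container in the same Pod

    def apply_config_update(new_config_text: str) -> None:
        """Write the updated configuration and ask the co-located Prometheus to reload it."""
        with open(PROM_CONFIG, "w") as f:
            f.write(new_config_text)
        # Requires Prometheus to run with --web.enable-lifecycle.
        requests.post(f"{PROM_URL}/-/reload", timeout=10).raise_for_status()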
S8, the Manager end obtains the set of IP addresses of all running Client-end Pod nodes through the Apiserver; the specific steps are as follows (a sketch of the decision follows these sub-steps):
(1) the configuration module sends a configuration command to a random Client end;
(2) it is judged whether that Client end is the node holding the Master role:
if yes, the Sidecar on it modifies the Prometheus configuration file and sends a load-configuration signal to the container where Prometheus is located;
if not, the node of the Slave role forwards the configuration modification command to the Master node;
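A sketch of the apply-or-forward decision of step S8 is given below; the Sidecar HTTP port and the /config path are assumptions, and apply_config_update stands for the write-and-reload routine sketched under step S7.

    import requests

    SIDECAR_PORT = 8080   # assumed port on which each Sidecar accepts config commands

    def apply_config_update(new_config_text: str) -> None:
        ...   # write prometheus.yml and POST /-/reload, as sketched under step S7

    def handle_config_command(role: str, master_ip: str, new_config_text: str) -> None:
        """A Master applies the command locally; a Slave forwards it to the Master."""
        if role == "master":
            apply_config_update(new_config_text)
        else:
            requests.post(
                f"http://{master_ip}:{SIDECAR_PORT}/config",
                data=new_config_text, timeout=10,
            ).raise_for_status()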
S9, after the Prometheus configuration update of the node holding the Master role is completed, the Sidecar in that Pod node records the configuration update operation and persists it, in the form of a log file, to the local volume mounted on the persistent volume storage, and the index number of the local log file is automatically incremented by 1; indexId is set to the latest log index number and updated into the heartbeat information;
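Step S9 amounts to an append-only operation log on the PersistentVolume together with an index increment; the directory layout and the JSON log format in the following sketch are hypothetical.

    import json
    import os

    LOG_DIR = "/data/sidecar-log"   # assumed mount point of the PersistentVolume

    def persist_config_update(current_index: int, command: dict) -> int:
        """Append the configuration update as a log file and return the new indexId."""
        new_index = current_index + 1
        os.makedirs(LOG_DIR, exist_ok=True)
        with open(os.path.join(LOG_DIR, f"{new_index:010d}.json"), "w") as f:
            json.dump({"indexId": new_index, "command": command}, f)
        return new_index   # the caller puts this value into subsequent heartbeats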
S10, after receiving a user request to query monitoring data, the Manager end calls the Metric module to query the monitoring data;
S11, the Metric module queries the Apiserver of Kubernetes and, according to the label of the Pods, filters and obtains all Client nodes and their IP addresses; meanwhile, the Metric module pulls the monitoring data collected by the Prometheus of each Client end through the obtained Pod node IPs and the monitoring-data URL;
S12, after the Metric module obtains the time-series monitoring data from one Prometheus instance, a min-heap is built in memory, where the key of a heap node is the timestamp of the data and the value is the corresponding time-series data; the monitoring data of the Prometheus instances in the other Pod nodes is pulled in the same manner (the deduplication is sketched below), specifically:
if data for the corresponding time already exists in the heap, the record is discarded, until the data of all Prometheus instances has been processed;
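The deduplication of step S12 can be sketched as follows, assuming that the samples of each Prometheus replica arrive as (timestamp, value) pairs; Python's heapq module supplies the min-heap keyed by timestamp, and records whose timestamp is already present are discarded.

    import heapq

    def merge_samples(per_replica_samples):
        """Merge samples from several Prometheus replicas, dropping duplicate timestamps."""
        heap, seen = [], set()
        for samples in per_replica_samples:          # one list of (timestamp, value) per Pod
            for ts, value in samples:
                if ts in seen:                       # data for this time is already in the heap
                    continue
                seen.add(ts)
                heapq.heappush(heap, (ts, value))    # min-heap keyed by timestamp
        return [heapq.heappop(heap) for _ in range(len(heap))]

    # Example: two replicas scraped the same target; one of them missed a scrape.
    a = [(1000, 1.0), (1015, 2.0), (1030, 3.0)]
    b = [(1000, 1.0), (1030, 3.0)]
    assert merge_samples([a, b]) == [(1000, 1.0), (1015, 2.0), (1030, 3.0)]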
S13, the user obtains the complete monitoring data through the Metric module of the Manager end, and obtains the alarm data generated by Prometheus through the Alert module.
In this embodiment, step S5, in which the Sidecar in the candidate state enters the election voting stage, proceeds as follows (a sketch of one election round follows these steps):
S501, each candidate casts its vote for the Pod with the smallest ordinal number created by the StatefulSet;
S502, voting information is sent to all known nodes, and the voting results are counted;
S503, it is judged whether the number of votes for that node exceeds half:
if yes, the node is set as the Master role node, the remaining nodes are set as Slave role nodes, and step S504 is executed;
if not, jump to step S505;
S504, the node holding the Master role broadcasts its local information <id, indexId>; where id is a unique identifier whose value is incremented by 1 each time the Master role node changes, and indexId represents the latest value of the local log index after an update;
S505, after a random time T in the interval (T_low, T_high), the next round of voting begins.
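Assuming a request_vote callback for the inter-Sidecar transport (which the patent does not specify), one election round covering steps S501 to S505 could be sketched as follows; T_LOW and T_HIGH stand in for the random back-off bounds T_low and T_high.

    import random
    import time

    T_LOW, T_HIGH = 1.0, 3.0   # hypothetical bounds for the random back-off (T_low, T_high)

    def statefulset_ordinal(pod_name: str) -> int:
        """StatefulSet Pods are named <statefulset>-0, <statefulset>-1, ...; take the ordinal."""
        return int(pod_name.rsplit("-", 1)[1])

    def run_election_round(my_name: str, peers: dict, request_vote) -> str:
        """One round of voting. peers maps pod_name -> pod_ip; request_vote(ip, candidate)
        asks a peer's Sidecar whether it votes for `candidate` and returns True/False."""
        candidate = min(peers, key=statefulset_ordinal)        # S501: lowest-ordinal Pod
        votes = sum(1 for ip in peers.values() if request_vote(ip, candidate))  # S502
        if votes > len(peers) // 2:                            # S503: strict majority
            # S504 (broadcasting <id, indexId>) is omitted from this sketch.
            return "master" if candidate == my_name else "slave"
        time.sleep(random.uniform(T_LOW, T_HIGH))              # S505: wait T in (T_low, T_high)
        return "candidate"                                     # try the next round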
In this embodiment, maintaining the Sidecar's liveness by sending heartbeat information in step S6 covers the following two cases (a heartbeat sketch follows):
(I) when a node in the Slave role crashes, Kubernetes rebuilds it after a time T_grace, and the rebuilt node proceeds as follows:
first, it enters the candidate state, and after the Sidecar synchronizes with the node holding the cluster Master role, it obtains the missing log indexes;
then it updates the local configuration file and the indexId in the heartbeat information, and switches to the Slave state;
(II) when the crashed node is the node holding the Master role, the remaining nodes re-enter the election phase.
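The heartbeat of step S6 and the Master-failure detection that leads to case (II) can be sketched as follows; the field names, the two-second period and the three-period timeout are assumptions of the sketch.

    import time
    from typing import Optional

    HEARTBEAT_PERIOD = 2.0                 # assumed send interval (seconds)
    MASTER_TIMEOUT = 3 * HEARTBEAT_PERIOD  # assumed: three missed heartbeats

    def make_heartbeat(pod_name: str, role: str, index_id: int) -> dict:
        """Heartbeat payload: the node's role and its latest local log index number."""
        return {"node": pod_name, "role": role, "indexId": index_id, "sent_at": time.time()}

    def master_alive(last_master_heartbeat: Optional[dict]) -> bool:
        """If this returns False, the surviving nodes re-enter the election (case II)."""
        if last_master_heartbeat is None:
            return False
        return time.time() - last_master_heartbeat["sent_at"] < MASTER_TIMEOUT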
In this embodiment, setting indexId to the latest log index number and updating it into the heartbeat information in step S9 proceeds as follows (a combined sketch follows these steps):
S901, the Master node synchronizes the configuration modification command to the nodes of the remaining Slave roles in the form of a log file;
S902, after receiving the synchronized operation log, the Sidecar of a node in the Slave role parses the command and updates the Prometheus configuration file;
S903, it sends a configuration reload command to the corresponding Prometheus;
S904, the nodes in the Slave role update the latest index number indexId in the heartbeat information.
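Combining steps S901 to S904, a Slave-side sketch of applying a synchronized operation log is given below; the log entry format follows the hypothetical persistence sketch under step S9.

    import json
    import os

    LOG_DIR = "/data/sidecar-log"   # assumed PersistentVolume mount, as under step S9

    def apply_synced_log(entry: dict, current_index: int, reload_prometheus) -> int:
        """Apply a config-update log entry received from the Master.

        entry: {"indexId": int, "command": ...} as written by the Master (S901).
        reload_prometheus: callable that rewrites prometheus.yml and reloads it (S902-S903).
        Returns the indexId to advertise in the next heartbeat (S904).
        """
        if entry["indexId"] <= current_index:
            return current_index                   # stale or duplicate entry: ignore
        os.makedirs(LOG_DIR, exist_ok=True)
        with open(os.path.join(LOG_DIR, f"{entry['indexId']:010d}.json"), "w") as f:
            json.dump(entry, f)                    # persist the synced log locally
        reload_prometheus(entry["command"])        # update the config and reload Prometheus
        return entry["indexId"]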
Example 3:
an embodiment of the present invention further provides an electronic device, including: a memory and at least one processor;
wherein the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the Prometheus high-availability implementation method based on the Kubernetes environment in any embodiment of the invention.
Example 4:
An embodiment of the present invention further provides a computer-readable storage medium, in which a plurality of instructions are stored; the instructions are loaded by a processor, so that the processor executes the Prometheus high-availability implementation method based on the Kubernetes environment in any embodiment of the present invention. Specifically, a system or an apparatus equipped with a storage medium may be provided, on which software program code realizing the functions of any of the above-described embodiments is stored, and a computer (or a CPU or MPU) of the system or apparatus reads out and executes the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A Prometheus high-availability system based on a Kubernetes environment, characterized by comprising a Manager end and a Client end, wherein the Manager end and the Client end are both deployed in Kubernetes as Pods;
the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes, pulling the Client-end monitoring data at the corresponding addresses, deduplicating the data, and sending commands for updating the Prometheus configuration; wherein the Manager end comprises:
the Metric module, used for acquiring monitoring data from the plurality of Pod nodes and returning the monitoring data to the requester after deduplication;
the configuration module, used for sending configuration commands to the Pod nodes of the Client end;
the Alert module, used for receiving the alarms generated by Prometheus at the Client end;
the Client end is used for determining the node of the Master role in the cluster through a distributed election strategy; the Pod node of each Client end comprises Prometheus and a Sidecar, and the Sidecar is deployed in the same Pod node as Prometheus in the form of a container;
the Sidecar is used for receiving the commands sent by the configuration module of the Manager end, executing the update of the Prometheus configuration file, and synchronizing the update command to the Slave role nodes other than the Master role node;
Prometheus is used for updating the configuration file and sending the generated alarm data to the Manager end.
2. The Prometheus high-availability system based on the Kubernetes environment according to claim 1, wherein the system operates as follows:
(I) the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Apiserver of Kubernetes;
(II) the Manager end pulls the Client-end monitoring data at the corresponding addresses, deduplicates the data, and at the same time sends the command for updating the Prometheus configuration;
(III) the Client end determines the Master role node in the cluster through a distributed election strategy; the Master role node interacts with the Manager end, updates the Prometheus configuration file, and sends the generated alarms to the Manager end;
(IV) the Master role node synchronizes the configuration-update command to the Slave role nodes, ensuring the consistency of the Prometheus monitoring tasks.
3. A Prometheus high-availability implementation method based on a Kubernetes environment is characterized by comprising the following steps:
S1, an odd number of Pod nodes are deployed in Kubernetes as a StatefulSet, and each Pod node is labeled with label={"prometheus_cluster": "true"};
the Manager end is deployed as a Deployment, and the number of its Pods is not limited;
S2, the Pod node of each Client end uses a PersistentVolume for persistent storage, ensuring that the data of the original Pod node is not lost after the Pod node is rebuilt, so that Pod node updates are incremental updates;
S3, after the Sidecar at the Client end starts, it enters a sleep state of duration T_sleep, and the Sidecar sets its own state to candidate;
S4, the Sidecar obtains and filters the Pod node IP addresses of all nodes of the Prometheus cluster through the Apiserver of Kubernetes and the identifying label of the Pod nodes used by Prometheus;
S5, the Sidecar in the candidate state enters the election voting stage;
S6, the Sidecar maintains its liveness by sending heartbeat information; the heartbeat information comprises the role of the node and the index number indexId of the latest local log file;
S7, the Sidecar of the node holding the Master role receives the configuration update commands sent by the Manager end and sends the alarms generated by Prometheus to the Manager end; at the same time, the Sidecar sends a reload-configuration command to the Prometheus in the same Pod node, so that Prometheus reloads its configuration file;
S8, the Manager end obtains the set of IP addresses of all running Client-end Pod nodes through the Apiserver; the specific steps are as follows:
(1) the configuration module sends a configuration command to a random Client end;
(2) it is judged whether that Client end is the node holding the Master role:
if yes, the Sidecar on it modifies the Prometheus configuration file and sends a load-configuration signal to the container where Prometheus is located;
if not, the node of the Slave role forwards the configuration modification command to the Master node;
S9, after the Prometheus configuration update of the node holding the Master role is completed, the Sidecar in that Pod node records the configuration update operation and persists it, in the form of a log file, to the local volume mounted on the persistent volume storage, and the index number of the local log file is automatically incremented by 1; indexId is set to the latest log index number and updated into the heartbeat information;
S10, after receiving a user request to query monitoring data, the Manager end calls the Metric module to query the monitoring data;
S11, the Metric module queries the Apiserver of Kubernetes and, according to the label of the Pods, filters and obtains all Client nodes and their IP addresses; meanwhile, the Metric module pulls the monitoring data collected by the Prometheus of each Client end through the obtained Pod node IPs and the monitoring-data URL;
S12, after the Metric module obtains the time-series monitoring data from one Prometheus instance, a min-heap is built in memory, where the key of a heap node is the timestamp of the data and the value is the corresponding time-series data; the monitoring data of the Prometheus instances in the other Pod nodes is pulled in the same manner, specifically:
if data for the corresponding time already exists in the heap, the record is discarded, until the data of all Prometheus instances has been processed;
S13, the user obtains the complete monitoring data through the Metric module of the Manager end, and obtains the alarm data generated by Prometheus through the Alert module.
4. The Prometheus high-availability implementation method based on the Kubernetes environment according to claim 3, wherein in step S5 the Sidecar in the candidate state enters the election voting stage as follows:
S501, each candidate casts its vote for the Pod with the smallest ordinal number created by the StatefulSet;
S502, voting information is sent to all known nodes, and the voting results are counted;
S503, it is judged whether the number of votes for that node exceeds half:
if yes, the node is set as the Master role node, the remaining nodes are set as Slave role nodes, and step S504 is executed;
if not, jump to step S505;
S504, the node holding the Master role broadcasts its local information <id, indexId>; where id is a unique identifier whose value is incremented by 1 each time the Master role node changes, and indexId represents the latest value of the local log index after an update;
S505, after a random time T in the interval (T_low, T_high), the next round of voting begins.
5. The Prometheus high-availability implementation method based on the Kubernetes environment according to claim 3, wherein maintaining the Sidecar's liveness by sending heartbeat information in step S6 covers the following two cases:
(I) when a node in the Slave role crashes, Kubernetes rebuilds it after a time T_grace, and the rebuilt node proceeds as follows:
first, it enters the candidate state, and after the Sidecar synchronizes with the node holding the cluster Master role, it obtains the missing log indexes;
then it updates the local configuration file and the indexId in the heartbeat information, and switches to the Slave state;
(II) when the crashed node is the node holding the Master role, the remaining nodes re-enter the election phase.
6. The Prometheus high-availability implementation method based on the Kubernetes environment according to any one of claims 3-5, wherein in step S9 setting indexId to the latest log index number and updating it into the heartbeat information is as follows:
S901, the Master node synchronizes the configuration modification command to the nodes of the remaining Slave roles in the form of a log file;
S902, after receiving the synchronized operation log, the Sidecar of a node in the Slave role parses the command and updates the Prometheus configuration file;
S903, it sends a configuration reload command to the corresponding Prometheus;
S904, the nodes in the Slave role update the latest index number indexId in the heartbeat information.
7. An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executes the computer program stored in the memory, so that the at least one processor performs the Prometheus high-availability implementation method based on the Kubernetes environment according to any one of claims 3 to 6.
8. A computer-readable storage medium, in which a computer program is stored, the computer program being executable by a processor to implement the Prometheus high-availability implementation method based on the Kubernetes environment according to any one of claims 3 to 6.
CN202011186088.2A 2020-10-30 2020-10-30 Prometheus high-availability system based on Kubernetes environment and implementation method Active CN112256401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011186088.2A CN112256401B (en) 2020-10-30 2020-10-30 Prometheus high-availability system based on Kubernetes environment and implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011186088.2A CN112256401B (en) 2020-10-30 2020-10-30 Prometheus high-availability system based on Kubernetes environment and implementation method

Publications (2)

Publication Number Publication Date
CN112256401A CN112256401A (en) 2021-01-22
CN112256401B (en) 2022-03-15

Family

ID=74268968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011186088.2A Active CN112256401B (en) 2020-10-30 2020-10-30 Prometheus high-availability system based on Kubernetes environment and implementation method

Country Status (1)

Country Link
CN (1) CN112256401B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112994935B (en) * 2021-02-04 2022-06-17 烽火通信科技股份有限公司 prometheus management and control method, device, equipment and storage medium
CN112925612A (en) * 2021-03-15 2021-06-08 浪潮软件科技有限公司 Monitoring service static configuration management method based on Kubernetes
CN114598585A (en) * 2022-03-07 2022-06-07 浪潮云信息技术股份公司 Method and system for monitoring hardware through snmptrapd
CN115827393B (en) * 2023-02-21 2023-10-20 德特赛维技术有限公司 Server cluster monitoring and alarming system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10528367B1 (en) * 2016-09-02 2020-01-07 Intuit Inc. Execution of workflows in distributed systems
CN111045901A (en) * 2019-12-11 2020-04-21 东软集团股份有限公司 Container monitoring method and device, storage medium and electronic equipment
CN111147596A (en) * 2019-12-30 2020-05-12 中国移动通信集团江苏有限公司 Prometous cluster deployment method, device, equipment and medium
CN111176783A (en) * 2019-11-20 2020-05-19 航天信息股份有限公司 High-availability method and device for container treatment platform and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10924398B2 (en) * 2018-09-25 2021-02-16 Ebay Inc. Time-series data monitoring with sharded server

Also Published As

Publication number Publication date
CN112256401A (en) 2021-01-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant