CN112256401B - Prometheus high-availability system based on Kubernetes environment and implementation method - Google Patents
- Publication number
- CN112256401B (application number CN202011186088.2A)
- Authority
- CN
- China
- Prior art keywords
- node
- prometheus
- pod
- configuration
- sidecar
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/301—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is a virtual computing platform, e.g. logically partitioned systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
Abstract
The invention discloses a Prometheus high-availability system based on a Kubernetes environment and an implementation method, belonging to the technical field of cloud computing. It addresses the technical problem of how to keep multiple replicas of Prometheus nodes working simultaneously and avoid the risk of losing the monitoring data collected by a single node. The technical scheme is as follows: the system comprises a Manager end and a Client end, both deployed in Kubernetes as Pods; the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Kubernetes Apiserver, pulls the Client-end monitoring data at those addresses, deduplicates the data, and sends commands for updating the Prometheus configuration; the Client end determines the node holding the Master role in the cluster through a distributed election strategy. The invention also discloses a Prometheus high-availability implementation method based on the Kubernetes environment.
Description
Technical Field
The invention relates to the technical field of cloud computing, in particular to a Prometheus high-availability system and an implementation method based on a Kubernetes environment.
Background
Kubernetes is an open-source container cluster management tool used for managing containerized applications on a cloud platform. Prometheus is an open-source monitoring and alerting solution for container environments; it was the second project to graduate from the CNCF and has become the de facto standard for monitoring and alerting in container environments. However, Prometheus currently runs mainly as a single node and lacks a good high-availability scheme, while a mature and stable monitoring and alerting scheme is very important for a cloud platform. Therefore, how to ensure that multiple replicas of Prometheus nodes work simultaneously and avoid the risk of losing the monitoring data collected by a single node is a technical problem that urgently needs to be solved.
Disclosure of Invention
The technical task of the invention is to provide a Prometheus high availability system and an implementation method based on a Kubernetes environment, so as to solve the problem of how to ensure that multiple copies of Prometheus nodes work simultaneously and avoid the risk of monitoring data acquisition loss of a single node.
The technical task of the invention is realized as follows: a Prometheus high-availability system based on a Kubernetes environment comprises a Manager end (the server side) and a Client end (the client side), both of which are deployed in Kubernetes as Pods;
the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Kubernetes Apiserver, pulling the Client-end monitoring data at those addresses, deduplicating the data, and sending commands for updating the Prometheus configuration;
the Client end is used for determining the node holding the Master role in the cluster through a distributed election strategy.
Preferably, the Manager end comprises,
the Metric module is used for acquiring monitoring data from the plurality of Pod nodes, and returning the monitoring data to a monitoring data requester after the monitoring data is deduplicated;
the configuration module is used for sending a configuration command to a Pod node of the Client end;
the Alert module is used for receiving an alarm generated by Prometheus at the Client terminal.
Preferably, the Pod node of each Client end comprises Prometheus and a Sidecar, and the Sidecar is deployed as a container in the same Pod node as Prometheus;
the Sidecar is used for receiving commands sent by the configuration module of the Manager end, executing the update operation on the Prometheus configuration file, and synchronizing the update command to the Slave role nodes, that is, all nodes other than the Master role node;
Prometheus is used for applying the updated configuration file and sending the generated alarm data to the Manager end.
Preferably, the working process of the system is as follows:
(I) the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Kubernetes Apiserver;
(II) the Manager end pulls the Client-end monitoring data at the corresponding addresses and deduplicates the data, and also sends commands for updating the Prometheus configuration;
(III) the Client end determines the Master role node in the cluster through a distributed election strategy; the Master role node interacts with the Manager end, updates the Prometheus configuration file, and sends the generated alarms to the Manager end;
(IV) the Master role node synchronizes the configuration update command to the Slave role nodes, so that the monitoring tasks of all Prometheus instances remain consistent.
A Prometheus high-availability implementation method based on a Kubernetes environment is specifically as follows:
S1, deploying an odd number of Client-end Pod nodes in Kubernetes as a StatefulSet, and labeling each Pod node with label {"prometheus_cluster": "true"};
the Manager end is deployed as a Deployment, and the number of its Pods is not limited;
S2, the Pod node of each Client end uses a PersistentVolume for storage, so that the data of the original Pod node is not lost after the Pod node is rebuilt and updates to the Pod node are incremental;
S3, after the Sidecar at the Client end starts, it enters a sleep state of duration T_sleep and sets its own state to candidate;
S4, the Sidecar queries the Kubernetes Apiserver and, filtering by the identifying label of the Pod nodes used by Prometheus, obtains the IP address of the Pod node of each node in the Prometheus cluster;
S5, the Sidecar in the candidate state enters the election voting stage;
S6, the Sidecar signals that it is alive by sending heartbeat information; the heartbeat information comprises the role of the node and the index number indexId of the latest local log file;
S7, the Sidecar of the node with the Master role receives the configuration update command sent by the Manager end and sends the alarms generated by Prometheus to the Manager end; at the same time, the Sidecar sends a reload-configuration command to the Prometheus in the same Pod node so that Prometheus reloads the configuration file;
S8, the Manager end acquires, through the Apiserver, the set of IP addresses of all running Client-end Pod nodes; the specific steps are as follows:
(1) the configuration module sends a configuration command to a random Client end;
(2) it is judged whether that Client end is the node with the Master role:
if yes, the Sidecar on that node modifies the Prometheus configuration file and sends a reload-configuration signal to the container in which Prometheus runs;
if not, the node with the Slave role forwards the configuration modification command to the Master node;
S9, after the Prometheus configuration update on the node with the Master role is completed, the Sidecar in that Pod node records the configuration update operation and persists it as a log file to the local volume mounted from persistent volume storage, and the index number of the local log file is incremented by 1; indexId is set to the latest log index number and updated in the heartbeat information;
S10, after receiving a user request to query monitoring data, the Manager end calls the Metric module to query the monitoring data;
S11, the Metric module queries the Kubernetes Apiserver and, filtering by Pod label, obtains all Client nodes and their IP addresses; the Metric module then pulls the monitoring data collected by Prometheus at the Client end using the obtained Pod node IP and the monitoring data URL;
S12, after the Metric module obtains the time-series monitoring data from one Prometheus, it builds a min-heap in memory, in which the key of each heap node is the timestamp of the data and the value is the corresponding time-series data; it then pulls the monitoring data of Prometheus in the other Pod nodes in the same manner, specifically:
if data for the corresponding time already exists in the heap, the record is discarded, otherwise it is added to the heap; this continues until the data of all Prometheus instances has been processed;
S13, the user obtains the complete monitoring data through the Metric module of the Manager end, and obtains the alarm data generated by Prometheus through the Alert module.
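The deduplication in step S12 can be illustrated with a minimal Python sketch. It assumes each Prometheus replica returns one time series as a list of (timestamp, value) pairs; the function name `merge_samples` and the auxiliary `seen` set (used so that the "already in the heap" check does not scan the heap) are illustrative assumptions, not part of the described method.

```python
import heapq


def merge_samples(replicas):
    """Merge (timestamp, value) samples from several replicas.

    Keeps the first sample seen for each timestamp and returns the
    result in ascending timestamp order.
    """
    heap = []     # min-heap keyed by timestamp, as in step S12
    seen = set()  # assumption: timestamps already present in the heap
    for samples in replicas:
        for ts, value in samples:
            if ts in seen:
                continue  # data for this time already exists: discard
            seen.add(ts)
            heapq.heappush(heap, (ts, value))
    # Popping a min-heap repeatedly yields samples in time order.
    return [heapq.heappop(heap) for _ in range(len(heap))]


if __name__ == "__main__":
    replica_a = [(1, 0.5), (2, 0.7)]
    replica_b = [(2, 0.7), (3, 0.9)]  # overlaps replica_a at timestamp 2
    print(merge_samples([replica_a, replica_b]))
```

A replica that was briefly down simply contributes fewer samples; the gaps are filled by whichever other replica scraped at those timestamps.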
Preferably, the step S5 in which the Sidecar in the candidate state enters the election voting stage is as follows:
S501, casting its vote for the lowest-numbered Pod created by the StatefulSet;
S502, sending the voting information to all known nodes and counting the voting results;
S503, judging whether the number of votes for the node exceeds half:
if yes, setting that node as the Master role node and the remaining nodes as Slave role nodes, and executing step S504;
if not, jumping to step S505;
S504, the node with the Master role broadcasts its local information <id, indexId>; wherein id is a unique identifier whose value is increased by 1 each time the Master role node is replaced, and indexId is the latest value of the local log index after updating;
S505, after waiting a random time T in the interval (T_low, T_high), entering the next round of voting.
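Steps S501 to S505 can be sketched in Python as follows. The sketch relies on the Kubernetes convention that StatefulSet Pods are named `<statefulset-name>-<ordinal>`; the helper names (`ordinal`, `choose_vote`, `tally`, `backoff`) and the pod names in the usage lines are hypothetical, and network messaging between Sidecars is left out.

```python
import random
from collections import Counter


def ordinal(pod_name):
    """Return the ordinal of a StatefulSet pod named <name>-<ordinal>."""
    return int(pod_name.rsplit("-", 1)[1])


def choose_vote(known_pods):
    """S501: vote for the lowest-numbered pod among the known pods."""
    return min(known_pods, key=ordinal)


def tally(votes, cluster_size):
    """S502/S503: return the winner if it holds more than half of the
    cluster's votes, otherwise None (no Master elected this round)."""
    if not votes:
        return None
    candidate, count = Counter(votes).most_common(1)[0]
    return candidate if 2 * count > cluster_size else None


def backoff(t_low, t_high):
    """S505: random wait before the next round of voting."""
    return random.uniform(t_low, t_high)


if __name__ == "__main__":
    pods = ["prom-0", "prom-1", "prom-2"]
    votes = [choose_vote(pods) for _ in pods]
    print(tally(votes, len(pods)))  # every node sees all pods, so prom-0 wins
```

Deploying an odd number of nodes, as step S1 requires, means a strict majority always exists whenever more than half of the nodes agree.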
Preferably, the step S6 in which the Sidecar signals that it is alive by sending heartbeat information covers the following two cases:
(I) when a node with the Slave role crashes, Kubernetes rebuilds it after time T_grace, as follows:
first, the rebuilt node enters the candidate state, and its Sidecar synchronizes with the node holding the Master role in the cluster and obtains the log entries it is missing;
then it updates the local configuration file and the indexId in the heartbeat information, and switches to the Slave state;
(II) when the crashed node is the node with the Master role, the remaining nodes re-enter the election stage.
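Case (I) above, in which a rebuilt Slave catches up with the Master, might look like the following Python sketch. The log entry shape `(index, key, value)` and the `Node` fields are assumptions made for illustration; the text does not specify the log format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    role: str = "candidate"   # a rebuilt node starts as candidate
    index_id: int = 0         # latest log index applied locally
    config: dict = field(default_factory=dict)


def catch_up(node, master_log):
    """Apply every Master log entry newer than node.index_id.

    master_log is assumed to be a list of (index, key, value) tuples
    in ascending index order (hypothetical format).
    """
    for index, key, value in master_log:
        if index > node.index_id:
            node.config[key] = value  # update the local configuration
            node.index_id = index     # reported in the next heartbeat
    node.role = "slave"               # switch to the Slave state
    return node


if __name__ == "__main__":
    rebuilt = Node(index_id=1, config={"scrape_interval": "30s"})
    log = [
        (1, "scrape_interval", "30s"),
        (2, "scrape_interval", "15s"),
        (3, "evaluation_interval", "1m"),
    ]
    print(catch_up(rebuilt, log))
```

Because the node's data sits on a PersistentVolume (step S2), a rebuilt Pod usually starts from a non-zero index and only replays the entries it missed.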
Preferably, in step S9, setting indexId to the latest log index number and updating the index number in the heartbeat information comprises the following steps:
S901, the Master node synchronizes the configuration modification command to the remaining Slave role nodes in the form of a log file;
S902, after receiving the synchronized operation log, the Sidecar of each Slave role node parses the command and updates the Prometheus configuration file;
S903, the Sidecar sends a reload-configuration command to the corresponding Prometheus;
S904, the Slave role node updates the latest index number indexId in its heartbeat information.
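The log bookkeeping of step S9 and steps S901 to S904 can be sketched as an append-only file with an auto-incrementing index. This is a minimal Python illustration under stated assumptions: the class name `ConfigLog`, the JSON-lines file format, and the field names `index` and `command` are hypothetical; only `role` and `indexId` in the heartbeat come from the text.

```python
import json
import os


class ConfigLog:
    """Append-only configuration log persisted to a file on the volume."""

    def __init__(self, path):
        self.path = path
        self.index_id = 0
        # After a Pod rebuild, recover the latest index from the file.
        if os.path.exists(path):
            with open(path) as f:
                for line in f:
                    self.index_id = json.loads(line)["index"]

    def append(self, command):
        """Record one configuration update; the index increases by 1."""
        self.index_id += 1
        entry = {"index": self.index_id, "command": command}
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make sure the entry reaches the disk
        return entry

    def heartbeat(self, role):
        """Heartbeat payload: node role plus latest log index."""
        return {"role": role, "indexId": self.index_id}
```

A Master would call `append` after each successful configuration update and send the returned entry to the Slaves, which append the same entry to their own log and then report the new `indexId` in their heartbeat.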
An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executes the computer program stored in the memory, so that the at least one processor performs the Prometheus high-availability implementation method based on the Kubernetes environment as described above.
A computer-readable storage medium, in which a computer program is stored, the computer program being executable by a processor to implement the Prometheus high-availability implementation method based on the Kubernetes environment as described above.
The Prometheus high-availability system and implementation method based on the Kubernetes environment have the following advantages:
(I) the invention ensures that multiple replicas of Prometheus nodes work simultaneously, avoiding the risk of losing the monitoring data collected by a single node, and, based on a distributed election strategy, ensures that only one node sends alarms and that the monitoring and collection tasks of the multiple Prometheus instances are consistent;
(II) the invention dynamically acquires, through the Kubernetes Apiserver, the Pod IP of the Client end where each Prometheus node is located, ensuring that all Client ends of the cluster can still be accessed even if an IP address changes after a Pod is rebuilt;
(III) the multi-node Prometheus cluster ensures high availability of monitoring tasks and integrity of the collected data;
(IV) through the distributed election strategy, the invention ensures that the configuration of the multiple nodes is updated consistently and that only the Master node sends alarm information to the Manager end at any given time, avoiding repeated sending of alarm data;
(V) the cluster deployment of the invention does not modify Prometheus itself, is non-intrusive to its code logic, and is easy to deploy.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a block diagram of a Prometheus high-availability system based on the Kubernetes environment.
Detailed Description
The Prometheus high availability system and the implementation method based on the Kubernetes environment are described in detail below with reference to the drawings and specific embodiments of the specification.
Example 1:
As shown in FIG. 1, the Prometheus high-availability system based on the Kubernetes environment of the present invention includes a Manager end (the server side) and a Client end (the client side), both of which are deployed in Kubernetes as Pods;
the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Kubernetes Apiserver, pulling the Client-end monitoring data at those addresses, deduplicating the data, and sending commands for updating the Prometheus configuration;
the Client end is used for determining the node holding the Master role in the cluster through a distributed election strategy.
The Manager end in the present embodiment includes,
the Metric module is used for acquiring monitoring data from the plurality of Pod nodes, and returning the monitoring data to a monitoring data requester after the monitoring data is deduplicated;
the configuration module is used for sending a configuration command to a Pod node of the Client end;
the Alert module is used for receiving an alarm generated by Prometheus at the Client terminal.
In this embodiment, the Pod node of each Client end includes Prometheus and a Sidecar, and the Sidecar is deployed as a container in the same Pod node as Prometheus; specifically:
(1) the Sidecars elect the Master node through a distributed election strategy; to avoid duplicate alarms from multiple Prometheus instances, only the Prometheus of the Master node sends its generated alarms to the Manager end at any given moment;
(2) the Sidecar of the Client-end Master node receives the command sent by the configuration module of the Manager end and executes the update operation on the Prometheus configuration file;
(3) the Sidecar of the Master node synchronizes the update command to the other Slave nodes, which execute the same update operation.
The working process of the system is as follows:
(I) the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Kubernetes Apiserver;
(II) the Manager end pulls the Client-end monitoring data at the corresponding addresses and deduplicates the data, and also sends commands for updating the Prometheus configuration;
(III) the Client end determines the Master role node in the cluster through a distributed election strategy; the Master role node interacts with the Manager end, updates the Prometheus configuration file, and sends the generated alarms to the Manager end;
(IV) the Master role node synchronizes the configuration update command to the Slave role nodes, so that the monitoring tasks of all Prometheus instances remain consistent.
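The address discovery in the first step of this working process can be illustrated with a small Python sketch. It assumes the Manager end has already fetched a Pod list from the Apiserver and holds it as dictionaries shaped like Kubernetes Pod objects (`metadata.labels`, `status.phase`, `status.podIP`); the function name `client_pod_ips` is hypothetical, the label is the one applied to Client Pods in step S1 of the method, and the actual API call is deliberately left out.

```python
def client_pod_ips(pods, label_key="prometheus_cluster", label_value="true"):
    """Return the IPs of running pods that carry the Client-end label.

    `pods` is a list of dicts shaped like Kubernetes Pod objects.
    """
    ips = []
    for pod in pods:
        labels = pod.get("metadata", {}).get("labels", {})
        status = pod.get("status", {})
        if (
            labels.get(label_key) == label_value
            and status.get("phase") == "Running"
            and status.get("podIP")
        ):
            ips.append(status["podIP"])
    return ips


if __name__ == "__main__":
    pods = [
        {"metadata": {"labels": {"prometheus_cluster": "true"}},
         "status": {"phase": "Running", "podIP": "10.0.0.1"}},
        {"metadata": {"labels": {"app": "other"}},
         "status": {"phase": "Running", "podIP": "10.0.0.2"}},
    ]
    print(client_pod_ips(pods))
```

Running this lookup on every request, rather than caching addresses, is what lets the Manager end keep reaching Client Pods whose IP changed after a rebuild.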
Example 2:
The Prometheus high-availability implementation method based on a Kubernetes environment of the invention comprises the following specific steps:
S1, deploying an odd number of Client-end Pod nodes in Kubernetes as a StatefulSet, and labeling each Pod node with label {"prometheus_cluster": "true"};
the Manager end is deployed as a Deployment, and the number of its Pods is not limited;
S2, the Pod node of each Client end uses a PersistentVolume for storage, so that the data of the original Pod node is not lost after the Pod node is rebuilt and updates to the Pod node are incremental;
S3, after the Sidecar at the Client end starts, it enters a sleep state of duration T_sleep and sets its own state to candidate;
S4, the Sidecar queries the Kubernetes Apiserver and, filtering by the identifying label of the Pod nodes used by Prometheus, obtains the IP address of the Pod node of each node in the Prometheus cluster;
S5, the Sidecar in the candidate state enters the election voting stage;
S6, the Sidecar signals that it is alive by sending heartbeat information; the heartbeat information comprises the role of the node and the index number indexId of the latest local log file;
S7, the Sidecar of the node with the Master role receives the configuration update command sent by the Manager end and sends the alarms generated by Prometheus to the Manager end; at the same time, the Sidecar sends a reload-configuration command to the Prometheus in the same Pod node so that Prometheus reloads the configuration file;
S8, the Manager end acquires, through the Apiserver, the set of IP addresses of all running Client-end Pod nodes; the specific steps are as follows:
(1) the configuration module sends a configuration command to a random Client end;
(2) it is judged whether that Client end is the node with the Master role:
if yes, the Sidecar on that node modifies the Prometheus configuration file and sends a reload-configuration signal to the container in which Prometheus runs;
if not, the node with the Slave role forwards the configuration modification command to the Master node;
S9, after the Prometheus configuration update on the node with the Master role is completed, the Sidecar in that Pod node records the configuration update operation and persists it as a log file to the local volume mounted from persistent volume storage, and the index number of the local log file is incremented by 1; indexId is set to the latest log index number and updated in the heartbeat information;
S10, after receiving a user request to query monitoring data, the Manager end calls the Metric module to query the monitoring data;
S11, the Metric module queries the Kubernetes Apiserver and, filtering by Pod label, obtains all Client nodes and their IP addresses; the Metric module then pulls the monitoring data collected by Prometheus at the Client end using the obtained Pod node IP and the monitoring data URL;
S12, after the Metric module obtains the time-series monitoring data from one Prometheus, it builds a min-heap in memory, in which the key of each heap node is the timestamp of the data and the value is the corresponding time-series data; it then pulls the monitoring data of Prometheus in the other Pod nodes in the same manner, specifically:
if data for the corresponding time already exists in the heap, the record is discarded, otherwise it is added to the heap; this continues until the data of all Prometheus instances has been processed;
S13, the user obtains the complete monitoring data through the Metric module of the Manager end, and obtains the alarm data generated by Prometheus through the Alert module.
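The routing rule in step S8 (send to a random Client end, which either applies the command or forwards it to the Master) can be sketched in Python as below. The `Sidecar` class, its `applied` list standing in for "modify the configuration file and signal Prometheus", and the function name `dispatch` are illustrative assumptions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Sidecar:
    name: str
    role: str                                    # "master" or "slave"
    applied: list = field(default_factory=list)  # commands applied here


def dispatch(command, target, cluster):
    """Deliver a configuration command sent to `target`.

    A Master applies it directly; a Slave forwards it to the Master.
    Returns the name of the node that applied the command.
    """
    if target.role != "master":
        # Slave: forward the modification command to the Master node.
        target = next(n for n in cluster if n.role == "master")
    target.applied.append(command)
    return target.name


if __name__ == "__main__":
    cluster = [Sidecar("prom-0", "master"),
               Sidecar("prom-1", "slave"),
               Sidecar("prom-2", "slave")]
    # The configuration module picks a random Client end.
    print(dispatch("reload rules", random.choice(cluster), cluster))
```

Whichever Client end the Manager picks, the command is applied exactly once, on the Master, which then replicates it as described in step S9.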
In this embodiment, the step S5 in which the Sidecar in the candidate state enters the election voting stage is as follows:
S501, casting its vote for the lowest-numbered Pod created by the StatefulSet;
S502, sending the voting information to all known nodes and counting the voting results;
S503, judging whether the number of votes for the node exceeds half:
if yes, setting that node as the Master role node and the remaining nodes as Slave role nodes, and executing step S504;
if not, jumping to step S505;
S504, the node with the Master role broadcasts its local information <id, indexId>; wherein id is a unique identifier whose value is increased by 1 each time the Master role node is replaced, and indexId is the latest value of the local log index after updating;
S505, after waiting a random time T in the interval (T_low, T_high), entering the next round of voting.
In this embodiment, the step S6 in which the Sidecar signals that it is alive by sending heartbeat information covers the following two cases:
(I) when a node with the Slave role crashes, Kubernetes rebuilds it after time T_grace, as follows:
first, the rebuilt node enters the candidate state, and its Sidecar synchronizes with the node holding the Master role in the cluster and obtains the log entries it is missing;
then it updates the local configuration file and the indexId in the heartbeat information, and switches to the Slave state;
(II) when the crashed node is the node with the Master role, the remaining nodes re-enter the election stage.
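How the surviving nodes might decide which of the two cases applies can be sketched with a heartbeat timeout. This Python illustration assumes each Sidecar records the time it last heard from every peer; the timeout value and the function names `find_dead` and `needs_election` are assumptions, since the text does not state how a missed heartbeat is detected.

```python
def find_dead(last_seen, now, timeout):
    """Return the nodes whose last heartbeat is older than `timeout`."""
    return sorted(n for n, t in last_seen.items() if now - t > timeout)


def needs_election(last_seen, roles, now, timeout):
    """Case (II): a new election is needed if the Master stopped
    sending heartbeats. A dead Slave (case (I)) needs no election."""
    dead = find_dead(last_seen, now, timeout)
    return any(roles.get(n) == "master" for n in dead)


if __name__ == "__main__":
    roles = {"prom-0": "master", "prom-1": "slave", "prom-2": "slave"}
    last_seen = {"prom-0": 100.0, "prom-1": 109.0, "prom-2": 109.5}
    # At time 110 with a 5-second timeout, only the Master is overdue.
    print(find_dead(last_seen, now=110.0, timeout=5.0))
    print(needs_election(last_seen, roles, now=110.0, timeout=5.0))
```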
In this embodiment, in step S9, setting indexId to the latest log index number and updating the index number in the heartbeat information comprises the following steps:
S901, the Master node synchronizes the configuration modification command to the remaining Slave role nodes in the form of a log file;
S902, after receiving the synchronized operation log, the Sidecar of each Slave role node parses the command and updates the Prometheus configuration file;
S903, the Sidecar sends a reload-configuration command to the corresponding Prometheus;
S904, the Slave role node updates the latest index number indexId in its heartbeat information.
Example 3:
an embodiment of the present invention further provides an electronic device, including: a memory and at least one processor;
wherein the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the Prometheus high-availability implementation method based on the Kubernetes environment in any embodiment of the invention.
Example 4:
An embodiment of the present invention further provides a computer-readable storage medium in which a plurality of instructions are stored; the instructions are loaded by a processor so that the processor executes the Prometheus high-availability implementation method based on the Kubernetes environment in any embodiment of the present invention. Specifically, a system or an apparatus may be provided with a storage medium on which software program code realizing the functions of any of the above-described embodiments is stored, and a computer (or a CPU or MPU) of the system or apparatus reads out and executes the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (8)
1. A Prometheus high-availability system based on a Kubernetes environment, characterized by comprising a Manager end and a Client end, both of which are deployed in Kubernetes as Pods;
the Manager end is used for dynamically acquiring the access addresses of the Client-end Pod nodes through the Kubernetes Apiserver, pulling the Client-end monitoring data at those addresses, deduplicating the data, and sending commands for updating the Prometheus configuration; wherein the Manager end comprises,
the Metric module is used for acquiring monitoring data from the plurality of Pod nodes, and returning the monitoring data to a monitoring data requester after the monitoring data is deduplicated;
the configuration module is used for sending a configuration command to a Pod node of the Client end;
the Alert module is used for receiving alarms generated by Prometheus at the Client end;
the Client end is used for determining the node holding the Master role in the cluster through a distributed election strategy; the Pod node of each Client end comprises Prometheus and a Sidecar, and the Sidecar is deployed as a container in the same Pod node as Prometheus;
the Sidecar is used for receiving commands sent by the configuration module of the Manager end, executing the update operation on the Prometheus configuration file, and synchronizing the update command to the Slave role nodes, that is, all nodes other than the Master role node;
Prometheus is used for applying the updated configuration file and sending the generated alarm data to the Manager end.
2. The Prometheus high-availability system based on the Kubernetes environment of claim 1, wherein the system specifically operates as follows:
(I) the Manager end dynamically acquires the access addresses of the Client-end Pod nodes through the Kubernetes Apiserver;
(II) the Manager end pulls the Client-end monitoring data at the corresponding addresses and deduplicates the data, and also sends commands for updating the Prometheus configuration;
(III) the Client end determines the Master role node in the cluster through a distributed election strategy; the Master role node interacts with the Manager end, updates the Prometheus configuration file, and sends the generated alarms to the Manager end;
(IV) the Master role node synchronizes the configuration update command to the Slave role nodes, so that the monitoring tasks of all Prometheus instances remain consistent.
3. A Prometheus high-availability implementation method based on a Kubernetes environment is characterized by comprising the following steps:
S1, deploying an odd number of Pod nodes in Kubernetes as a StatefulSet, and labeling each Pod node with label: {"prometheus_cluster": "true"};
the Manager end is deployed as a Deployment, and the number of its Pods is not limited;
S2, the Pod node of each Client end uses PersistentVolume persistent storage, which ensures that the data of the original Pod node is not lost after the Pod node is rebuilt, so that updates to the Pod node are incremental;
S3, after the Sidecar at the Client end is started, it enters a sleep state of duration T_sleep and sets its own state to candidate;
S4, the Sidecar obtains and screens the IP addresses of the Pod nodes of the Prometheus cluster through the Apiserver of Kubernetes, using the identification label of the Pod nodes used by Prometheus;
S5, the Sidecar in the candidate state enters the voting phase;
S6, the Sidecar guarantees its survival state by sending heartbeat information; the heartbeat information comprises the role of the node and the local latest log file index number indexId;
S7, the Sidecar of the Master role node receives the configuration update command sent by the Manager end and sends the alarm generated by Prometheus to the Manager end; at the same time, the Sidecar sends a reload-configuration command to the Prometheus in the same Pod node so that Prometheus reloads the configuration file;
S8, the Manager end acquires, through the Apiserver, the set of IPs of all running Pod nodes of the Client end; the specific steps are as follows:
(1) the configuration module sends a configuration command to a random Client end;
(2) judging whether that Client end is the Master role node:
if so, the Sidecar on that node modifies the Prometheus configuration file and sends a reload-configuration signal to the container in which Prometheus runs;
if not, the Slave role node forwards the configuration modification command to the Master node;
S9, after the Prometheus configuration update on the Master role node is completed, the Sidecar in the Pod node records the configuration update operation and persists it, in the form of a log file, to a local volume mounted from the persistent volume storage, and the index number of the local log file is automatically incremented by 1; indexId is set to the latest log index number and updated into the heartbeat information;
S10, after receiving a user request to query the monitoring data, the Manager end calls the Metric module to query the monitoring data;
S11, the Metric module queries the Apiserver of Kubernetes, and screens and acquires all Client end nodes and their IP addresses according to the label of the Pod; the Metric module then pulls the monitoring data collected in the Prometheus of the Client end using the acquired IP of the Pod node and the URL of the monitoring data;
S12, after the Metric module obtains the time-series data of the monitoring data in Prometheus, a min-heap is constructed in memory, in which the key of a heap node is the timestamp of the data and the value is the corresponding time-series data; the monitoring data of Prometheus in the other Pod nodes is pulled in the same manner, specifically:
if data for the corresponding time already exists in the heap, the record is discarded, and this continues until the data of all Prometheus instances has been processed;
S13, the user obtains complete monitoring data through the Metric module of the Manager end, and obtains the alarm data generated by Prometheus through the Alert module.
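The deduplication described in step S12 can be illustrated with a minimal sketch. It assumes a single time series whose samples are (timestamp, value) pairs, one list per Prometheus replica; the function name `merge_samples` and the auxiliary `seen` set (used only for a fast membership check on heap keys) are illustrative assumptions and are not part of the claimed method.

```python
import heapq


def merge_samples(replicas):
    """Merge samples from several Prometheus replicas, keeping one sample per timestamp.

    `replicas` is a list of lists of (timestamp, value) pairs (hypothetical format).
    The first replica that reports a timestamp wins; later duplicates are discarded.
    """
    heap = []      # min-heap keyed by timestamp
    seen = set()   # timestamps already present in the heap (assumed helper)
    for samples in replicas:
        for timestamp, value in samples:
            if timestamp in seen:
                continue  # data for this time already exists in the heap: discard
            seen.add(timestamp)
            heapq.heappush(heap, (timestamp, value))
    # Popping from the min-heap yields the samples in ascending time order.
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

Because every timestamp enters the heap at most once, the merged result contains each point in time exactly once even when one replica has gaps that another replica fills.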
4. The Prometheus high-availability implementation method based on the Kubernetes environment according to claim 3, wherein in step S5 the Sidecar in the candidate state enters the voting phase as follows:
S501, casting its vote for the Pod with the smallest ordinal number created by the StatefulSet;
S502, sending the voting information to all known nodes, and counting the voting results;
S503, judging whether the number of votes received by the node exceeds half:
if so, setting the node as the Master role node and the remaining nodes as Slave role nodes, and executing step S504;
if not, jumping to step S505;
S504, the Master role node broadcasts its local information <id, indexId>; wherein id is a unique identifier whose value is incremented by 1 each time the Master role node is replaced; indexId represents the latest value of the local log index after updating;
S505, waiting for a random time T = T(T_low, T_high), and then entering the next round of voting.
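Steps S501 to S505 can be sketched as follows. This is a hedged illustration only: it relies on the Kubernetes convention that StatefulSet Pods are named `<name>-<ordinal>`, while the Pod names, the function names, and the uniform distribution used for the random wait are assumptions not stated in the claims.

```python
import random
from collections import Counter


def ordinal(pod_name):
    """Return the StatefulSet ordinal encoded at the end of a Pod name."""
    return int(pod_name.rsplit("-", 1)[1])


def pick_candidate(known_pods):
    """S501: vote for the Pod with the smallest ordinal."""
    return min(known_pods, key=ordinal)


def tally(votes, cluster_size):
    """S502/S503: return the Pod holding more than half of the cluster's votes, else None."""
    if not votes:
        return None
    name, count = Counter(votes).most_common(1)[0]
    return name if 2 * count > cluster_size else None


def backoff(t_low, t_high, rng=random):
    """S505: pick a random wait between t_low and t_high before the next round."""
    return rng.uniform(t_low, t_high)
```

With an odd number of nodes, as required in step S1, a strict majority always exists once every live node votes for the same smallest-ordinal Pod; the random wait reduces the chance that repeated rounds collide.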
5. The Prometheus high-availability implementation method based on the Kubernetes environment according to claim 3, wherein in step S6 the Sidecar guarantees its survival state by sending heartbeat information, which covers the following two cases:
(I) when a node in the Slave role crashes, Kubernetes rebuilds it after a time T_grace, and the rebuilt node proceeds as follows:
first, the node enters the candidate state, and after its Sidecar synchronizes with the Master role node of the cluster, it acquires the log indexes it is missing;
then, it updates its local configuration file and the indexId in its heartbeat information, and switches to the Slave state;
(II) when the crashed node is the Master role node, the remaining nodes enter the election phase again.
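Case (I) above, in which a rebuilt Slave node catches up with the Master, can be sketched as below. The log entry layout `(index, key, value)`, the `Sidecar` dataclass, and the `recover` function are hypothetical names chosen for illustration; the claims only state that the missing log indexes are obtained from the Master and then applied.

```python
from dataclasses import dataclass, field


@dataclass
class Sidecar:
    role: str = "candidate"
    index_id: int = 0                      # latest local log index (indexId)
    config: dict = field(default_factory=dict)


def recover(node, master_log):
    """Replay the Master's log entries that the rebuilt node is missing.

    `master_log` is a list of (index, key, value) tuples in ascending index
    order (assumed format). Returns the list of indexes that were replayed.
    """
    node.role = "candidate"                # the rebuilt node starts as candidate
    missing = [entry for entry in master_log if entry[0] > node.index_id]
    for index, key, value in missing:
        node.config[key] = value           # update the local configuration
        node.index_id = index              # update indexId for the heartbeat
    node.role = "slave"                    # switch to the Slave state
    return [entry[0] for entry in missing]
```

Since the node's data survives on persistent storage (step S2), only the entries after its last known index need to be replayed, which keeps recovery incremental.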
6. The Prometheus high-availability implementation method based on the Kubernetes environment according to any one of claims 3 to 5, wherein in step S9 setting indexId to the latest log index number and updating it into the heartbeat information proceeds as follows:
S901, the Master node synchronizes the configuration modification command to the remaining Slave role nodes in the form of a log file;
S902, after receiving the synchronized operation log, the Sidecar of a Slave role node parses the command and updates the Prometheus configuration file;
S903, the Sidecar sends a configuration reload command to the corresponding Prometheus;
S904, the Slave role node updates the latest index number indexId in its heartbeat information.
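The interplay of step S9 and steps S901 to S904 can be illustrated with an in-memory sketch. The `Node` class, the `(key, value)` command format, and the `reloads` counter are assumptions made for illustration; in a real deployment the reload would be an actual signal or HTTP request to Prometheus, and replication would go over the network rather than through direct method calls.

```python
class Node:
    """A Sidecar paired with its Prometheus instance (illustrative model)."""

    def __init__(self, name):
        self.name = name
        self.config = {}     # stands in for the Prometheus configuration file
        self.log = []        # persisted operation log of (index, command)
        self.index_id = 0    # latest log index, reported in heartbeats
        self.reloads = 0     # counts reload commands sent to Prometheus

    def apply(self, index, command):
        key, value = command
        self.config[key] = value           # S902: update the configuration file
        self.log.append((index, command))  # S9: persist the operation as a log entry
        self.index_id = index              # S904: refresh indexId
        self.reloads += 1                  # S903: stand-in for the reload command

    def heartbeat(self, role):
        return {"node": self.name, "role": role, "indexId": self.index_id}


def update_config(master, slaves, command):
    """Apply a command on the Master, then replicate it to every Slave (S901)."""
    index = master.index_id + 1            # log index is incremented by 1
    master.apply(index, command)
    for slave in slaves:
        slave.apply(index, command)
    return index
```

After each call every node reports the same indexId in its heartbeat, which is how a lagging node can be recognised by comparing heartbeat values.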
7. An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executes the computer program stored in the memory, causing the at least one processor to perform the Prometheus high-availability implementation method based on the Kubernetes environment according to any one of claims 3 to 6.
8. A computer-readable storage medium, in which a computer program is stored, the computer program being executable by a processor to implement the Prometheus high-availability implementation method based on the Kubernetes environment according to any one of claims 3 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011186088.2A CN112256401B (en) | 2020-10-30 | 2020-10-30 | Prometheus high-availability system based on Kubernetes environment and implementation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112256401A (en) | 2021-01-22 |
CN112256401B (en) | 2022-03-15 |
Family
ID=74268968
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011186088.2A Active CN112256401B (en) | 2020-10-30 | 2020-10-30 | Prometheus high-availability system based on Kubernetes environment and implementation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112256401B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112994935B (en) * | 2021-02-04 | 2022-06-17 | 烽火通信科技股份有限公司 | prometheus management and control method, device, equipment and storage medium |
CN112925612A (en) * | 2021-03-15 | 2021-06-08 | 浪潮软件科技有限公司 | Monitoring service static configuration management method based on Kubernetes |
CN114598585A (en) * | 2022-03-07 | 2022-06-07 | 浪潮云信息技术股份公司 | Method and system for monitoring hardware through snmptrapd |
CN115827393B (en) * | 2023-02-21 | 2023-10-20 | 德特赛维技术有限公司 | Server cluster monitoring and alarming system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10528367B1 (en) * | 2016-09-02 | 2020-01-07 | Intuit Inc. | Execution of workflows in distributed systems |
CN111045901A (en) * | 2019-12-11 | 2020-04-21 | 东软集团股份有限公司 | Container monitoring method and device, storage medium and electronic equipment |
CN111147596A (en) * | 2019-12-30 | 2020-05-12 | 中国移动通信集团江苏有限公司 | Prometous cluster deployment method, device, equipment and medium |
CN111176783A (en) * | 2019-11-20 | 2020-05-19 | 航天信息股份有限公司 | High-availability method and device for container treatment platform and electronic equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10924398B2 (en) * | 2018-09-25 | 2021-02-16 | Ebay Inc. | Time-series data monitoring with sharded server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||