CN115373799A - Cluster management method and device and electronic equipment

Cluster management method and device and electronic equipment

Info

Publication number
CN115373799A
CN115373799A
Authority
CN
China
Prior art keywords
node, cluster, nodes, database, deploying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210919560.1A
Other languages
Chinese (zh)
Inventor
唐钰杰
蒋伟
周国剑
苏飞
曾令江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202210919560.1A
Publication of CN115373799A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances

Abstract

The application relates to a cluster management method and apparatus and an electronic device, intended to address the urgent need for a highly secure and highly available deployment and management scheme for database (e.g., PostgreSQL) clusters. The method includes: in response to successful deployment of a cluster controller for managing a database cluster, applying a cluster deployment file; deploying a database cluster composed of n nodes based on the cluster deployment file, where each node includes at least a database system and a high-availability component for storing the state information of the n nodes; and, for each of the n nodes, monitoring the n nodes based on the high-availability component in that node to manage the database cluster. Based on this method, automatic deployment, operation, maintenance, and testing of the database cluster are realized, the high performance, high availability, and high security of the database cluster are ensured, and the corresponding hardware resources are effectively saved.

Description

Cluster management method and device and electronic equipment
Technical Field
The present application relates to the field of cloud-native technologies, and in particular, to a method and an apparatus for cluster management, and an electronic device.
Background
With the development of cloud-native technologies, Kubernetes (K8s), a system for managing containerized applications across multiple hosts in a cloud platform, introduced the Operator pattern for managing stateful service applications, which made it possible to move containerized databases to the cloud.
Further, for a stateful service application such as PostgreSQL (a relational database), the prior art uses the K8s cluster state storage component Etcd to store and manage the state data and address/port information of a PostgreSQL cluster, thereby implementing highly available deployment and management of the PostgreSQL cluster under the K8s Operator pattern.
However, in practical applications, Etcd is relatively exposed to external observation; in other words, the state data stored in Etcd can likewise be observed externally, so the scheme lacks a security guarantee.
In view of this, a highly secure and highly available deployment and management scheme for PostgreSQL clusters is needed.
Disclosure of Invention
The application provides a cluster management method and apparatus and an electronic device, which are used to realize automatic deployment of a highly available database cluster, making deployment, operation and maintenance, and testing more automated; furthermore, by deploying a high-availability component in each node, the high performance, high availability, and high security of database cluster management are ensured, and hardware resources such as storage and network required for cluster management are effectively saved.
In a first aspect, the present application provides a method for cluster management, where the method includes:
in response to successfully deploying the cluster controller, applying the cluster deployment file; wherein the cluster controller is used for managing a database cluster;
deploying a database cluster consisting of n nodes based on the cluster deployment file; wherein a single node comprises at least a database system and a high availability component for storing state information of the n nodes;
for each of the n nodes, monitoring the n nodes based on the high-availability component in that node, so as to manage the database cluster.
In one possible design, the applying the cluster deployment file in response to successfully deploying the cluster controller includes: applying a controller deployment file and deploying the cluster controller based on the controller deployment file; and applying the cluster deployment file in response to successfully deploying the cluster controller. The cluster deployment file includes at least a first subfile for deploying the high-availability component and a second subfile for deploying the database system.
In one possible design, the deploying a database cluster composed of n nodes based on the cluster deployment file includes: in response to creating the database cluster, determining the total number of nodes that need to be created based on the cluster deployment file; if the number of created nodes is less than the total number of nodes that need to be created, cyclically executing the following operations to create a single node: creating the single node, and deploying the high-availability component in the single node using the first subfile in the cluster deployment file; after deploying the high-availability component, deploying the database system in the single node using the second subfile in the cluster deployment file; and, according to the above operations, creating n nodes matching the total number of nodes that need to be created, to obtain a database cluster composed of the n nodes.
In one possible design, the state information of a first node of the n nodes includes at least master node information; the first node is a master node, and the first node is a first created node; the state information of other nodes in the n nodes at least comprises slave node information; wherein the other nodes are slave nodes, and the other nodes are nodes which are not the first node among the n nodes.
In one possible design, the monitoring, for each of the n nodes, the n nodes based on the high-availability component in that node to manage the database cluster includes: for each of the n nodes, in response to the state information in the high-availability component of that node not being synchronized, obtaining the state information of that node; determining a master node and at least one slave node according to the state information of each node; judging whether the state information in the high-availability component of the master node is less than the state information in the high-availability component of any slave node; and if so, determining that the master node has failed, and selecting a target node from the at least one slave node to replace the master node.
In one possible design, after the judging whether the state information in the high-availability component of the master node is less than the state information in the high-availability component of any slave node, the method further includes: if not, in response to the state information in the high-availability component of the master node being more than the state information in the high-availability component of any slave node, synchronizing the state information in the high-availability components of the n nodes.
In one possible design, the selecting a target node from the at least one slave node includes: acquiring the node identifier corresponding to each of the n nodes to obtain n node identifiers, where a node identifier characterizes the creation order of its node; arranging the n node identifiers in ascending order to obtain an arrangement order of the n node identifiers, and taking this as the arrangement order of the n nodes; and taking the slave node next after the master node in the arrangement order of the n nodes as the target node.
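The comparison logic in the two designs above can be sketched as follows. This is an illustrative reading that models each node's state information as a single comparable quantity (for example, a replication log position); the function name is invented for the sketch and is not taken from the patent:

```python
# Hypothetical sketch of the monitoring decision: compare the master's state
# against every slave's state, as stored in the nodes' HA components.
def check_master(master_state, slave_states):
    """Decide what the monitoring loop should do for the current master."""
    if any(master_state < s for s in slave_states):
        # The master holds less state than some slave: treat it as failed
        # and trigger selection of a replacement target node.
        return "failover"
    if any(master_state > s for s in slave_states):
        # The master is ahead of some slave: synchronize the state
        # information across the HA components of all n nodes.
        return "sync"
    return "ok"
```

Under this reading, a lagging master is evidence of failure, while a leading master simply means the slaves need to catch up.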
To sum up, the application provides a deployment and management scheme for a highly secure, highly available, and high-performance database cluster (such as a PostgreSQL cluster), which effectively improves the resource utilization of the database service and has the advantages of simple deployment, convenient operation and maintenance, high performance, high availability, and high security.
In a second aspect, the present application provides an apparatus for cluster management, the apparatus comprising:
the application module is used for responding to the successful deployment of the cluster controller and applying the cluster deployment file; wherein the cluster controller is used for managing a database cluster;
the deployment module is used for deploying a database cluster consisting of n nodes based on the cluster deployment file; wherein a single node comprises at least a database system and a high availability component for storing state information of the n nodes;
a management module, configured to monitor, for each of the n nodes, the n nodes based on the high-availability component in that node, so as to manage the database cluster.
In one possible design, in applying the cluster deployment file in response to successfully deploying the cluster controller, the application module is specifically configured to: apply a controller deployment file and deploy the cluster controller based on the controller deployment file; and apply the cluster deployment file in response to successfully deploying the cluster controller. The cluster deployment file includes at least a first subfile for deploying the high-availability component and a second subfile for deploying the database system.
In one possible design, in deploying a database cluster composed of n nodes based on the cluster deployment file, the deployment module is specifically configured to: in response to creating the database cluster, determine the total number of nodes that need to be created based on the cluster deployment file; if the number of created nodes is less than the total number of nodes that need to be created, cyclically execute the following operations to create a single node: create the single node, and deploy the high-availability component in the single node using the first subfile in the cluster deployment file; after deploying the high-availability component, deploy the database system in the single node using the second subfile in the cluster deployment file; and, according to the above operations, create n nodes matching the total number of nodes that need to be created, obtaining a database cluster composed of the n nodes.
In one possible design, the state information of a first node of the n nodes includes at least master node information; the first node is a master node, and the first node is a first created node; the state information of other nodes in the n nodes at least comprises slave node information; wherein the other nodes are slave nodes, and the other nodes are nodes which are not the first node among the n nodes.
In one possible design, in monitoring, for each of the n nodes, the n nodes based on the high-availability component in that node to manage the database cluster, the management module is specifically configured to: for each of the n nodes, in response to the state information in the high-availability component of that node not being synchronized, obtain the state information of that node; determine a master node and at least one slave node according to the state information of each node; judge whether the state information in the high-availability component of the master node is less than the state information in the high-availability component of any slave node; and if so, determine that the master node has failed, and select a target node from the at least one slave node to replace the master node.
In one possible design, after the judging whether the state information in the high-availability component of the master node is less than the state information in the high-availability component of any slave node, the management module is further configured to: if not, in response to the state information in the high-availability component of the master node being more than the state information in the high-availability component of any slave node, synchronize the state information in the high-availability components of the n nodes.
In a possible design, in selecting a target node from the at least one slave node, the management module is specifically configured to: acquire the node identifier corresponding to each of the n nodes to obtain n node identifiers, where a node identifier characterizes the creation order of its node; arrange the n node identifiers in ascending order to obtain an arrangement order of the n node identifiers, taking this as the arrangement order of the n nodes; and take the slave node next after the master node in the arrangement order of the n nodes as the target node.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
a processor, configured to implement the steps of the above cluster management method when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the cluster management method described above.
For the technical effects of the second to fourth aspects and their possible designs, refer to the description of the first aspect and its possible solutions above; repeated description is omitted here.
Drawings
Fig. 1 is a first schematic diagram of a possible application scenario provided in the present application;
FIG. 2 is a first flowchart of a method for cluster management provided herein;
FIG. 3 is a schematic diagram of a database cluster deployment provided herein;
FIG. 4 is a second diagram illustrating a possible application scenario provided herein;
FIG. 5 is an interaction diagram of a node resource provided in the present application;
fig. 6 is a second flowchart of an automatic disaster recovery switching method provided in the present application;
FIG. 7 is a schematic diagram of an apparatus for cluster management provided in the present application;
fig. 8 is a schematic diagram of a structure of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clear, the present application will be further described in detail with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied to apparatus embodiments or system embodiments.
In the description of the present application, "plurality" is understood as "at least two". "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone. "A is connected with B" may indicate that A and B are directly connected, or that A and B are connected through C. In addition, in the description of the present application, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
The scheme provided by the embodiment of the application relates to cloud-native technology. Specifically, the embodiment of the application is applied to the management of containerized applications on multiple hosts in a cloud platform, and particularly relates to a K8s cluster.
As shown in fig. 1, the first schematic diagram of a possible application scenario provided in the embodiment of the present application includes a management module (cluster controller) and a cluster module.
The management module is used for processing monitored events and sending management instructions for the monitored events to the cluster module.
Specifically, in fig. 1, the management module may include a listening submodule, a queue submodule, and a management submodule. The listening submodule is used to listen for monitored events, the queue submodule is used to obtain the monitored events from the listening submodule, and the management submodule sequentially extracts the monitored events from the queue submodule for processing and sends management instructions for the monitored events to the cluster module.
The cluster module is used for receiving a calling instruction of the management submodule and managing cluster resources based on the calling instruction so as to provide application service for the external module.
Specifically, the cluster module in fig. 1 may include a storage sub-module, a connection sub-module, a status sub-module, a termination node sub-module, and a service sub-module.
The storage submodule in fig. 1 is configured to store cluster data, and the connection submodule is configured to forward the cluster data stored in the storage submodule; that is, the storage submodule and the connection submodule together guarantee data persistence, and this design also supports data disaster-tolerance scheduling.
The status submodule in fig. 1 is configured to maintain the status information of each node in the cluster. Further, each node resource includes at least a high-availability component, which is configured to store the state information of the node, where the state information includes at least the master/slave state. Of course, each node resource may also include a number of different application processes, which can be set according to the actual application; for example, node A may include a database process and a status process.
The termination node submodule in fig. 1 is used to connect the status submodule and the service submodule; specifically, it may be used to determine the network address corresponding to the status information of each node and transmit that network address to the service submodule.
The service submodule in fig. 1 is configured to receive the network address corresponding to the state information of each node and then provide application services to the corresponding external module based on that network address.
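The handoff between the status, termination node, and service submodules can be illustrated with a small sketch. The function name and data shapes below are hypothetical stand-ins for the description above, not the actual implementation:

```python
# Hypothetical sketch: given each node's status and each node's network
# address, determine the address the service submodule should expose, i.e.
# the address of the current master node.
def resolve_endpoint(node_states, addresses):
    """Return the network address of the master node, or None if absent."""
    for node, state in node_states.items():
        if state == "master":
            # The termination node submodule maps the master's status to its
            # address, which the service submodule then serves externally.
            return addresses[node]
    return None
```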
It should be noted that fig. 1 is only an example, and is not limited in the embodiments of the present application. Of course, the method provided in the embodiment of the present application is not limited to be used in the application scenario shown in fig. 1, and may also be used in other possible application scenarios, and the embodiment of the present application is not limited. The functions that can be realized by each module of the application scenario shown in fig. 1 will be described together in the following method embodiments, and will not be described in detail herein.
The method provided by the exemplary embodiments of the present application is described below with reference to the accompanying drawings in conjunction with the application scenarios described above, it should be noted that the above application scenarios are only shown for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect.
An embodiment of the present application provides a method for cluster management, which is shown in fig. 2, and is used for automatically deploying and managing a database cluster, and the specific implementation process of the method is as follows:
step 201: in response to successfully deploying the cluster controller, applying the cluster deployment file; wherein the cluster controller is used for managing a database cluster;
In this embodiment of the application, taking the master node of K8s as the execution subject, before step 201 is executed the cluster controller needs to be deployed according to the applied controller deployment file. Specifically, the controller deployment file is first set, that is, it may be set (edited) according to actual conditions; the emphasis here is on applying the controller deployment file, then deploying the cluster controller based on it, and then applying the cluster deployment file in response to the successful deployment of the cluster controller.
As shown in fig. 3, information can be transmitted between Kubectl (the K8s client) and the Kubernetes Master (the K8s server side). A user instruction to create a PostgreSQL cluster (database cluster) is received at the Kubectl end and sent to the Kubernetes Master. The Kubernetes Master then applies, based on the instruction, the controller deployment file operator.yaml and the cluster deployment file cluster.yaml for creating the PostgreSQL cluster.
Further, the Kubernetes Master first needs to create a node for deploying the postgres Operator (cluster controller), that is, to allocate the corresponding storage space for the deployment of the postgres Operator. Then operator.yaml is invoked to deploy the postgres Operator on the aforementioned node, denoted postgres Operator Pod in fig. 3, where the postgres Operator Pod can be understood as a Docker container, which in turn can be understood as a virtual machine allocated on the Kubernetes Master, and the postgres Operator deployed in it can be understood as the running software. Specifically, the postgres Operator Pod is used to listen for events sent from Kubectl and to process those events.
In detail, as shown in fig. 4, each module in fig. 4 has a correspondence with fig. 1. Among them, the Operator module (management module) is the postgres Operator Pod in fig. 3; in other words, the process by which the postgres Operator Pod listens for and processes events is the process by which the Operator module in fig. 4 listens for and processes events. Specifically, as shown in fig. 4, client-go is configured to listen for events sent by Kubectl and to place the observed events in the workqueue (queue submodule), and the controller (management submodule) object calls the pgcluster module to respond to the corresponding events.
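The listen → queue → consume pipeline described above can be sketched as follows. This is a minimal stand-in using a plain in-process queue, with the class and method names invented for illustration; the real implementation relies on client-go and the K8s workqueue:

```python
from queue import Queue

class MiniOperator:
    """Toy model of the Operator module's listen/queue/manage pipeline."""
    def __init__(self):
        self.workqueue = Queue()   # stands in for the queue submodule
        self.handled = []          # record of events already processed

    def on_event(self, event):
        # client-go-style handler: enqueue the event, never process inline
        self.workqueue.put(event)

    def run_once(self):
        # controller object: drain the queue and dispatch each event in order
        while not self.workqueue.empty():
            self.reconcile(self.workqueue.get())

    def reconcile(self, event):
        # stand-in for calling the pgcluster module to act on the event
        self.handled.append(event)
```

The key design point mirrored here is that the listener only enqueues, so event arrival is decoupled from (possibly slow) reconciliation.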
Returning, now that the role of the cluster controller and the application process have been exemplarily described with the drawings, to the point at which the cluster controller has been successfully deployed: once the cluster controller is successfully deployed, a cluster deployment file is applied, where the cluster deployment file includes at least a first subfile for deploying the high-availability component and a second subfile for deploying the database system.
Illustratively, as shown in FIG. 3, the first subfile is used to deploy a node's agent (high-availability component), and the second subfile is used to deploy the node's PostgreSQL (database system).
In summary, in response to a successful deployment of a cluster controller, a cluster deployment file is applied.
Step 202: deploying a database cluster consisting of n nodes based on the cluster deployment file;
In the embodiment of the application, the total number of nodes that need to be created is first determined based on the number of nodes indicated in the cluster deployment file; it is then judged whether the number of created nodes is smaller than the total number of nodes that need to be created, and if so, the following operations are executed in a loop to create a single node.
The operations performed in a single loop iteration include: first creating the space resources of a single node, then deploying the high-availability component in the node using the first subfile of the cluster deployment file, and then deploying the database system in the node using the second subfile of the cluster deployment file.
The above operations are executed in a loop; once the number of created nodes is greater than or equal to the total number of nodes that need to be created, the n created nodes, and the database cluster composed of the n nodes, are obtained.
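A minimal sketch of the creation loop above, in which applying the two subfiles is reduced to markers on a node record; all names are invented for illustration, and the first-created node is marked as master per the design described below:

```python
# Hypothetical sketch of the node-creation loop driven by the cluster
# deployment file's node count.
def create_cluster(total_nodes):
    nodes = []
    while len(nodes) < total_nodes:          # loop until the target count is met
        node = {"id": len(nodes)}            # create the node's space resources
        node["agent"] = "deployed"           # first subfile: high-availability component
        node["postgresql"] = "deployed"      # second subfile: database system
        # first-created node (identifier 0) becomes the master; later nodes are slaves
        node["role"] = "master" if node["id"] == 0 else "slave"
        nodes.append(node)
    return nodes
```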
It is worth mentioning that, in the above node-creation process, state information indicating the master/slave relationship is added for each node, and the state information is stored in the high-availability component of each node. Specifically, among the n nodes, the state information of the first-created node includes at least master node information, that is, the first-created node is the master node; the state information of the other, not first-created, nodes includes at least slave node information, that is, the other nodes are slave nodes.
Illustratively, as shown in FIG. 3, the space resources of the first node Pod-0 are first created, and the agent component (high-availability component) is deployed in Pod-0 based on the first subfile in cluster.yaml; the PostgreSQL database system is then deployed in Pod-0 based on the second subfile in cluster.yaml. Specifically, the agent component is configured to initialize the PostgreSQL data directory, optimize the postgresql.conf file, start the postgres service process, cyclically monitor the state of the PostgreSQL database cluster, perform cluster disaster recovery switching and scaling operations, and identify the state information of the database system in Pod-0 (for example, the master/slave state).
Further, as shown in fig. 3, taking Pod-0 as an example, after the agent component is deployed, a corresponding K8s cluster resource is also created, which includes: creating PVC-0 (PersistentVolumeClaim), binding PVC-0 and PV-0 (PersistentVolume), creating Service-0 and Endpoint-0, and binding Service-0 and Endpoint-0.
In addition, the postgres Operator Pod is further configured to monitor the K8s cluster resources created by the Operator, analyze whether the currently created K8s cluster resources are consistent with the desired resources, and, if not, call an Application Programming Interface (API) operation to bring the created K8s cluster resources into line with the desired resources.
It is worth noting that "-0" in fig. 3 denotes the cluster resources corresponding to Pod-0. In more detail, these cluster resources can be interpreted as the pgcluster module (cluster module) in fig. 4, and Pod-0 in fig. 3 can be equated with a pod in the StatefulSet module in fig. 4.
In fig. 4, the PV (storage submodule) is used to back up data, and the PVC (connection submodule) is used to connect the PV with the pods (nodes) in the StatefulSet module (status submodule), which maintains the status of the pods. The StatefulSet module may include multiple pods, and each pod calls the PostgreSQL high-availability component agent through an internal startup script init.
Taking a single pod as an example, fig. 5 shows the program interaction within a single pg-pod (database node). One pg-pod resource may include the PostgreSQL (database) service process postgres and the agent process, where postgres is the PostgreSQL database service process (i.e., the application service process) and the agent is the high-availability component used to maintain the master/slave state of the database. In addition, in the pgcluster module of fig. 4, the Endpoint (termination node submodule) and the Service (service submodule) are used to provide application services externally. Specifically, the Endpoint is configured to determine the corresponding network address according to the status of the pod, and the Service provides the application service to the corresponding external module based on that network address.
Further, the following describes in detail the creation process of Pod-0 in fig. 3 with reference to multiple figures, and in the process of generating a node according to the embodiment of the present application, a corresponding node identifier may also be added to the node based on the creation order of the node. For example, the first node created is identified by node identification 0, the second node created is identified by node identification 1, the third node created is identified by node identification 2, and so on. Here, the node with node identification 0 is generally used as the master node, and the node with node identification greater than 0 is used as the slave node, which will help to automatically deploy and manage the cluster.
In other words, in the embodiment of the present application, the first created node is defined as the master node; only after the high-availability component of the master node initializes successfully is a second node created, which serves as a slave node, and the remaining nodes follow in sequence. This helps ensure that automatic disaster-recovery switching can function during cluster operation.
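As a minimal illustration (not from the patent text; the names are hypothetical), the creation-order identifier scheme described above could be sketched as:

```python
# Hypothetical sketch of the creation-order identifier scheme described
# above: the node created first gets identifier 0 and acts as the master,
# while every later node (identifier > 0) acts as a slave.

def role_for_identifier(node_id: int) -> str:
    """Map a creation-order node identifier to a cluster role."""
    if node_id < 0:
        raise ValueError("node identifier must be non-negative")
    return "master" if node_id == 0 else "slave"

# Identifiers follow creation order: 0, 1, 2, ...
roles = {f"pg-pod-{i}": role_for_identifier(i) for i in range(3)}
# e.g. {"pg-pod-0": "master", "pg-pod-1": "slave", "pg-pod-2": "slave"}
```

The one-to-one mapping from creation order to role is what makes the later master-slave switching deterministic.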
It is worth noting that node creation generally uses a default one-master-two-slaves topology, that is, one master node and two slave nodes. On one hand, this design ensures stable operation of automatic disaster-recovery switching, since the cluster can survive the master node and one slave node failing simultaneously; on the other hand, it saves resources compared with configuring more slave nodes. If two slave nodes cannot guarantee normal operation of the service, error information is reported to the cloud platform, so that technical staff can obtain the corresponding error information through the cloud platform, intervene in time, and thereby resolve the problem effectively.
In summary, based on the node-creation mode provided by the present application, automatic deployment of the designated cluster can be realized in a cloud platform environment, and environment functions such as intelligent parameter optimization, capacity expansion, and customized parameter configuration can be realized. In addition, because the high-availability components are deployed in the nodes, functions of the designated cluster such as metric monitoring of the high-availability deployment, fault detection and localization, and automatic disaster-recovery switching can be realized. Further, in practical applications, deploying the high-availability components inside the nodes also helps ensure the security of the cluster.
Step 203: for each of the n nodes, monitoring the n nodes based on the high-availability component in each node, so as to manage the database cluster.
In the embodiment of the present application, for each of the n nodes, the high-availability component of that node achieves automatic disaster-recovery switching by monitoring and managing the database cluster. Specifically, in response to the state information in the high-availability components of the nodes not being synchronized, the state information of each node is obtained, and then a master node and at least one slave node are determined according to the state information of each node.
Further, after the master node and the slave nodes are determined, it is judged whether the state information in the high-availability component of the master node is less than the state information in the high-availability component of any slave node: if not, that is, in response to the state information in the high-availability component of the master node being not less than that of any slave node, the state information in the high-availability components of the n nodes is synchronized, completing master-slave synchronization; if yes, it is determined that the master node has failed, and a target node is selected from the at least one slave node to replace the master node, completing master-slave switching.
Specifically, the master-slave switching process involves the selection of a slave node. The node identifier corresponding to each of the n nodes is obtained (that is, the creation order is obtained), yielding n node identifiers; the n node identifiers are arranged in ascending order (from small to large) to obtain their arrangement order, which is used as the arrangement order of the n nodes; then, according to this arrangement order, the slave node following the master node is used as the target node.
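The selection logic just described can be sketched as follows (a simplified illustration with hypothetical names, not the patent's actual code; the fallback branch for a master with no successor is an assumption):

```python
# Sketch of target-node selection: sort the n node identifiers in
# ascending order (i.e., creation order) and pick the slave node that
# follows the failed master in that order.

def select_target_node(node_ids, master_id):
    ordered = sorted(node_ids)               # ascending = creation order
    successors = [i for i in ordered if i > master_id]
    if successors:
        return successors[0]                 # next slave after the master
    # Assumption: if the master has no successor in the order,
    # fall back to the smallest remaining identifier.
    remaining = [i for i in ordered if i != master_id]
    return remaining[0] if remaining else None
```

For a one-master-two-slaves cluster with identifiers 0, 1, 2 and master 0, this selects node 1 as the target.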
For example, in the embodiment of the present application, a flow for managing a database cluster by a high-availability component to implement automatic disaster recovery switching may be shown in fig. 6.
As shown in fig. 6, the high-availability component first determines whether the state information of the current nodes is synchronized: if synchronized, all current nodes are confirmed to be operating normally; if not synchronized, it is necessary to further determine, based on the acquired state information, whether the database of the current master node (i.e., the master library) is unavailable (i.e., has failed).
In detail, the state information may be used to distinguish the master-slave relationship of each node. If the state information of the current master library is less than that of any slave library, master-slave switching is performed, and node discovery is then carried out to ensure that the switched-in master node can work normally; if it cannot, the synchronization-determination operation is repeated. If the switched-in master node can work normally, it serves as the new master node, and the high-availability component in each node updates the corresponding state information, that is, the role labels are updated.
Further, after the role labels are updated, the state information of each node continues to be monitored periodically, and if the original master node recovers, it is switched back to being the master node.
In other words, the state information of each node is monitored periodically (for example, every 20 s), and whether the current master node has failed is then determined from whether the state information of the nodes is synchronized: if the state information of the nodes is not synchronized and the state information of the master node is less than that of any slave node, it can be directly determined that the current master node has failed.
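A hedged sketch of this failure-determination rule (assuming the state information of each node can be compared as a single scalar value, which is an assumption about how the comparison is performed; names are illustrative):

```python
# Sketch of the failure check described above: the master is considered
# failed only when (a) the nodes' state information is not synchronized
# and (b) the master's state value is less than that of some slave.

def master_failed(state: dict, master: str) -> bool:
    synchronized = len(set(state.values())) == 1
    if synchronized:
        return False                         # all nodes operating normally
    return any(value > state[master]
               for node, value in state.items() if node != master)
```

A periodic monitor would call this check on each cycle and trigger master-slave switching only when it returns true.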
In view of this, the embodiments of the present application may select a slave node as the target node based on the node identifier to perform master-slave switching, that is, perform cluster management through the high-availability component, thereby achieving automatic disaster-recovery switching. According to the technical solution provided by the embodiments of the present application, not only can automatic deployment of a database high-availability cluster be realized, but deployment, operation and maintenance, and testing can also be more automated; moreover, deploying a high-availability component in each node further ensures the high performance, high availability, and high security of database cluster management, and effectively saves hardware resources such as storage and network required for cluster management.
To sum up, the embodiments of the present application provide a deployment and management scheme for a PostgreSQL cluster with high security, high availability, and high performance. The scheme improves the resource utilization of the database service and has the advantages of simple deployment, convenient operation and maintenance, high performance, high availability, and high security.
On one hand, based on the design of an operator module, the PostgreSQL cluster is automatically deployed in the K8s environment, and containerization environment functions such as intelligent parameter optimization, capacity expansion, customized parameter configuration and the like are realized.
On the other hand, based on the design of the pgcluster module, functions related to high-availability deployment of the PostgreSQL cluster are realized, including fault detection and localization, automatic disaster-recovery switching, and metric monitoring of transactions per second (TPS), queries per second (QPS), and the like.
Based on the same inventive concept, the present application further provides a cluster management apparatus, so as to offer a deployment and management scheme for a high-security, high-availability, high-performance database (e.g., PostgreSQL) cluster and effectively improve the resource utilization of database services. The apparatus has the advantages of simple deployment, convenient operation and maintenance, high performance, high availability, and high security. As shown in fig. 7, the apparatus includes:
an application module 701, configured to apply the cluster deployment file in response to successful deployment of the cluster controller; wherein the cluster controller is used for managing a database cluster;

a deployment module 702, configured to deploy a database cluster composed of n nodes based on the cluster deployment file; wherein a single node comprises at least a database system and a high-availability component used for storing state information of the n nodes;

a management module 703, configured to, for each of the n nodes, monitor the n nodes based on the high-availability component in each node, so as to manage the database cluster.
In one possible design, the application module 701 is specifically configured to: apply a controller deployment file and deploy a cluster controller based on the controller deployment file; and apply a cluster deployment file in response to successfully deploying the cluster controller. The cluster deployment file comprises at least a first subfile and a second subfile, wherein the first subfile is used for deploying the high-availability component, and the second subfile is used for deploying the database system.
In one possible design, the deployment module 702 is specifically configured to deploy the database cluster composed of n nodes based on the cluster deployment file by: in response to creating the database cluster, determining the total number of nodes to be created based on the cluster deployment file; and, if the number of created nodes is less than the total number of nodes to be created, cyclically performing the following operations to create a single node: creating the single node and deploying the high-availability component in the single node using the first subfile in the cluster deployment file; and, after deploying the high-availability component, deploying the database system in the single node using the second subfile in the cluster deployment file. According to these operations, n nodes matching the total number of nodes to be created are generated, obtaining a database cluster composed of the n nodes.
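The creation loop described above could be sketched as follows (hypothetical helper callables stand in for applying the first and second subfiles; this is an illustration, not the patent's implementation):

```python
# Sketch of the deployment loop: keep creating nodes until the total
# indicated by the cluster deployment file is reached; for each node the
# high-availability component (first subfile) is deployed before the
# database system (second subfile).

def deploy_cluster(total_nodes, deploy_ha_component, deploy_database):
    created = []
    while len(created) < total_nodes:        # fewer created than required
        node = f"pg-pod-{len(created)}"      # identifier = creation order
        deploy_ha_component(node)            # first subfile
        deploy_database(node)                # second subfile
        created.append(node)
    return created
```

Ordering the two deployments per node mirrors the requirement that the high-availability component be in place before the database system it supervises.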
In one possible design, the state information of a first node of the n nodes includes at least master node information; the first node is a master node, and the first node is a first created node; the state information of other nodes in the n nodes at least comprises slave node information; wherein the other nodes are slave nodes, and the other nodes are nodes which are not the first node among the n nodes.
In one possible design, the management module 703 is specifically configured to: for each of the n nodes, in response to the state information in the high-availability components of the nodes not being synchronized, obtain the state information of each node; determine a master node and at least one slave node according to the state information of each node; judge whether the state information in the high-availability component of the master node is less than the state information in the high-availability component of any slave node; and, if yes, determine that the master node has failed and select a target node from the at least one slave node to replace the master node.
In one possible design, after judging whether the state information in the high-availability component of the master node is less than the state information in the high-availability component of any slave node, the management module 703 is further configured to: if not, in response to the state information in the high-availability component of the master node being not less than that of any slave node, synchronize the state information in the high-availability components of the n nodes.
In one possible design, the management module 703 is specifically configured to: obtain the node identifier corresponding to each of the n nodes to obtain n node identifiers, wherein a node identifier characterizes the creation order of its node; arrange the n node identifiers in ascending order to obtain their arrangement order, and use it as the arrangement order of the n nodes; and, according to this arrangement order, use the slave node following the master node as the target node.
Based on this apparatus, not only can automatic deployment of a database high-availability cluster be realized, but deployment, operation and maintenance, and testing can also be more automated; moreover, deploying a high-availability component in each node further ensures the high performance, high availability, and high security of database cluster management, and effectively saves hardware resources such as storage and network required for cluster management.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, where the electronic device may implement the function of the foregoing cluster management apparatus, and with reference to fig. 8, the electronic device includes:
at least one processor 801 and a memory 802 connected to the at least one processor 801. The specific connection medium between the processor 801 and the memory 802 is not limited in this embodiment of the application; fig. 8 takes the case where the processor 801 and the memory 802 are connected by a bus 800 as an example. The bus 800 is shown in fig. 8 as a thick line; the connection manner between other components is merely illustrative and not limiting. The bus 800 may be divided into an address bus, a data bus, a control bus, and so on, and is shown in fig. 8 with only one thick line for ease of illustration, but this does not mean there is only one bus or one type of bus. Alternatively, the processor 801 may also be referred to as a controller; the name is not limited.
In the embodiment of the present application, the memory 802 stores instructions executable by the at least one processor 801, and the at least one processor 801 may execute the cluster management method discussed above by executing the instructions stored in the memory 802. The processor 801 may implement the functions of the various modules in the apparatus shown in fig. 7.
The processor 801 is a control center of the apparatus/system, and may connect various parts of the entire control device by using various interfaces and lines, and perform various functions and process data of the apparatus/system by operating or executing instructions stored in the memory 802 and calling up data stored in the memory 802, thereby performing overall monitoring of the apparatus/system.
In one possible design, the processor 801 may include one or more processing units, and the processor 801 may integrate an application processor that handles primarily operating systems, user interfaces, application programs, and the like, and a modem processor that handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 801. In some embodiments, the processor 801 and the memory 802 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 801 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the cluster management method disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
The memory 802, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 802 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 802 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 802 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the processor 801, the code corresponding to the cluster management method described in the foregoing embodiment may be solidified into a chip, so that the chip can execute the steps of the cluster management method in the embodiment shown in fig. 2 when running. How to program the processor 801 is well known to those skilled in the art and will not be described in detail herein.
Based on the same inventive concept, embodiments of the present application further provide a storage medium storing computer instructions, which, when executed on a computer, cause the computer to perform the cluster management method discussed above.
In some possible embodiments, the aspects of the cluster management method provided in this application may also be implemented in the form of a program product comprising program code for causing a control device to perform the steps of the cluster management method according to various exemplary embodiments of this application described above in this specification when the program product is run on an apparatus.
It should be apparent to one skilled in the art that embodiments of the present application may be provided as a method, apparatus/system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of cluster management, the method comprising:
in response to successfully deploying the cluster controller, applying the cluster deployment file; wherein the cluster controller is used for managing a database cluster;
deploying a database cluster consisting of n nodes based on the cluster deployment file; wherein a single node comprises at least a database system and a high availability component for storing state information of the n nodes;
for each of the n nodes, monitoring the n nodes based on the high-availability component in each node to manage the database cluster.
2. The method of claim 1, wherein applying the cluster deployment file in response to successfully deploying the cluster controller comprises:
applying a controller deployment file and deploying a cluster controller based on the controller deployment file;
applying a cluster deployment file in response to successfully deploying the cluster controller; the cluster deployment file at least comprises a first subfile and a second subfile, wherein the first subfile is used for deploying the high-availability components, and the second subfile is used for deploying the database system.
3. The method of claim 1, wherein said deploying a database cluster comprised of n nodes based on said cluster deployment file comprises:
determining the total number of nodes needing to be created based on the number of the nodes indicated in the cluster deployment file;
if the number of the created nodes is less than the total number of the nodes needing to be created, circularly executing the following operations to create a single node:
creating the single node and deploying the highly available component in the single node by adopting a first subfile in the cluster deployment file;
deploying the database system in the single node using a second subfile in the cluster deployment file after deploying the highly available component;
and according to the operation, creating n nodes matched with the total number of the nodes needing to be created, and obtaining a database cluster consisting of the n nodes.
4. The method of claim 1, wherein the state information of a first node of the n nodes comprises at least master node information; the first node is a master node, and the first node is a first created node;
the state information of other nodes in the n nodes at least comprises slave node information; wherein the other nodes are slave nodes, and the other nodes are nodes which are not the first node in the n nodes.
5. The method of any one of claims 1-4, wherein said monitoring, for each of the n nodes, the n nodes based on the high-availability component in each node to manage the database cluster comprises:
for each of the n nodes, in response to the state information in the high-availability components of the nodes not being synchronized, obtaining the state information of each node;
determining a master node and at least one slave node according to the state information of each node;
judging whether the state information in the high-availability component of the master node is less than the state information in the high-availability component of any slave node;
if yes, determining that the master node has failed, and selecting a target node from the at least one slave node to replace the master node.
6. The method of claim 5, wherein after said determining whether the state information in the highly available components of the master node is less than the state information in the highly available components of any of the slave nodes, further comprising:
if not, in response to the state information in the high-availability component of the master node being not less than the state information in the high-availability component of any slave node, synchronizing the state information in the high-availability components of the n nodes.
7. The method of claim 5, wherein said selecting a target node from said at least one slave node comprises:
acquiring a node identifier corresponding to each node in the n nodes to obtain n node identifiers; wherein the node identification characterizes a creation order of the each node;
arranging the n node identifications in a sequence from small to large to obtain an arrangement sequence of the n node identifications, and taking the arrangement sequence as the arrangement sequence of the n nodes;
and taking the next slave node of the master node as a target node according to the arrangement sequence of the n nodes.
8. An apparatus for cluster management, the apparatus comprising:
the application module is used for responding to the successful deployment of the cluster controller and applying the cluster deployment file; wherein the cluster controller is used for managing a database cluster;
the deployment module is used for deploying a database cluster consisting of n nodes based on the cluster deployment file; wherein a single node comprises at least a database system and a high availability component for storing state information of the n nodes;
a management module, configured to, for each of the n nodes, monitor the n nodes based on the high-availability component in each node to manage the database cluster.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-7 when executing the computer program stored on the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN202210919560.1A 2022-08-02 2022-08-02 Cluster management method and device and electronic equipment Pending CN115373799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210919560.1A CN115373799A (en) 2022-08-02 2022-08-02 Cluster management method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210919560.1A CN115373799A (en) 2022-08-02 2022-08-02 Cluster management method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115373799A true CN115373799A (en) 2022-11-22

Family

ID=84063151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210919560.1A Pending CN115373799A (en) 2022-08-02 2022-08-02 Cluster management method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115373799A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115757361A (en) * 2022-12-26 2023-03-07 北京柏睿数据技术股份有限公司 Method and system for automatic deployment of database cluster system
CN115757361B (en) * 2022-12-26 2023-08-18 北京柏睿数据技术股份有限公司 Automatic deployment method and system for database cluster system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination