CN116627721A

CN116627721A - Cloud-native database recovery method, device and storage medium based on hybrid cloud

Info

Publication number: CN116627721A
Application number: CN202310658588.9A
Authority: CN
Inventors: 孙斌; 尹萍; 王阳; 庞滨
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2023-06-06
Filing date: 2023-06-06
Publication date: 2023-08-22

Abstract

The invention discloses a cloud-native database recovery method, device and storage medium based on hybrid cloud, belonging to the technical field of data processing. The technical problem to be solved is how to implement database recovery that is adapted to Kubernetes and to complete data migration, thereby meeting customers' need to move to the cloud quickly in the cloud era. The technical scheme is as follows: a recovery medium, deployed in the Kubernetes cluster as an application, monitors Kubernetes custom resource events in real time; when the recovery medium observes a custom resource creation event, it creates the Kubernetes Job resources required for recovery and restores the data, where the Job resources required for recovery comprise a recovery task Job and a master-slave construction task Job; the recovery medium then monitors the events of these Kubernetes Job resources, determines whether the recovery task Job and the master-slave construction task Job succeeded, and updates the Kubernetes custom resource according to the task execution result.

Description

Cloud-native database recovery method, device and storage medium based on hybrid cloud

Technical Field

The invention relates to the technical field of data processing, and in particular to a cloud-native database recovery method, device and storage medium based on hybrid cloud.

Background

Before discussing database restoration, two mainstream backup tools need to be introduced: the open-source xtrabackup software, which performs physical backups, and the mysqldump software, which performs logical backups. Database restoration implements the restore function using an existing physical or logical backup as the source data.

Kubernetes is an open-source system for managing containerized applications across multiple hosts in a cloud platform. Its goal is to make deploying containerized applications simple and efficient, and it provides mechanisms for application deployment, scheduling, updating and maintenance. The Operator, developed by CoreOS to extend the Kubernetes API, is an application-specific controller used to create, configure and manage complex stateful applications such as databases, caches and monitoring systems. Operators build on the Kubernetes concepts of resources and controllers, but also contain application-specific domain knowledge.

Kubernetes is a container orchestration engine open-sourced by Google that supports automated deployment, large-scale scalability and containerized application management. When an application is deployed in a production environment, multiple instances of it are typically deployed so that application requests can be load balanced.

In Kubernetes, multiple containers can be created, each running one application instance. A built-in load-balancing policy then handles management, discovery and access for this set of instances, and none of these details require complex manual configuration or handling by operations staff.

With the gradual rise of cloud computing, higher requirements are placed on databases. How to implement database recovery adapted to Kubernetes, complete data migration, and thereby meet customers' need to move to the cloud quickly in the cloud era is the technical problem to be solved.

Disclosure of Invention

The technical task of the invention is to provide a cloud-native database recovery method, device and storage medium based on hybrid cloud, so as to solve the problem of how to implement database recovery adapted to Kubernetes, complete data migration, and thereby meet customers' need to move to the cloud quickly in the cloud era.

The technical task of the invention is achieved as follows: a cloud-native database recovery method based on hybrid cloud, comprising the following steps:

monitoring Kubernetes custom resource events in real time through a recovery medium deployed in the Kubernetes cluster as an application;

when the recovery medium observes a Kubernetes custom resource creation event, creating the Kubernetes Job resources required for recovery and restoring the data; wherein the Kubernetes Job resources required for recovery comprise a recovery task Job and a master-slave construction task Job;

monitoring the events of these Kubernetes Job resources through the recovery medium, determining whether the recovery task Job and the master-slave construction task Job succeeded, and updating the Kubernetes custom resource according to the task execution result.

Preferably, the recovery medium supports restoring two types of data files, namely database logical backups and full physical backups. The recovery medium and Kubernetes rely on a custom resource (CR) to pass user request information. The definition of the recovery custom resource is completed when the recovery medium is initialized, so the user does not need to be concerned with creating the resource and only needs to understand its attribute configuration.

More preferably, the recovery medium provides a RESTful API for the user, comprising an interface for initiating a recovery request, an interface for viewing recovery task history, an interface for viewing the current recovery task, and interfaces for stopping and deleting a recovery task; the viewing interfaces provided by the recovery medium return the state information of the recovery CR and are implemented by calling Kubernetes commands, so that the user does not have to interact with Kubernetes directly, which lowers the barrier to use.
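The interface set described above can be sketched as a small service class. This is only an illustration under stated assumptions: the class name `RestoreService`, the method names, the status strings and the in-memory dictionary are all hypothetical, and the dictionary merely stands in for the recovery CRs that the recovery medium would read and write through Kubernetes.

```python
from typing import Dict, List

# Hypothetical status names for tasks that are still in progress.
ACTIVE = {"starting", "executing", "cluster building"}


class RestoreService:
    """In-memory stand-in for the recovery medium's REST-style operations."""

    def __init__(self) -> None:
        # Assumption: a dict replaces the recovery CRs stored in Kubernetes.
        self._tasks: Dict[str, Dict] = {}

    def start(self, name: str, spec: Dict) -> Dict:
        """Initiate a recovery request."""
        if name in self._tasks:
            raise ValueError(f"recovery task {name!r} already exists")
        self._tasks[name] = {"name": name, "spec": spec, "status": "starting"}
        return self._tasks[name]

    def current(self) -> List[Dict]:
        """View recovery tasks that are still in progress."""
        return [t for t in self._tasks.values() if t["status"] in ACTIVE]

    def history(self) -> List[Dict]:
        """View recovery tasks that have finished or were stopped."""
        return [t for t in self._tasks.values() if t["status"] not in ACTIVE]

    def stop(self, name: str) -> Dict:
        """Stop a recovery task that is still in progress."""
        task = self._tasks[name]
        if task["status"] in ACTIVE:
            task["status"] = "stopped"
        return task

    def delete(self, name: str) -> None:
        """Delete a recovery task record."""
        del self._tasks[name]
```

A caller only sees these five operations; whether each one maps to a Kubernetes create, get, patch or delete call is hidden behind the service, which is the point of the design described in the text.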

More preferably, the database recovery file for a logical backup is imported from external object storage, and the recovery medium supports the Swift protocol and the S3 protocol for the user to choose from; the data recovery CR defines the location of the backup in the object store (OSS), comprising a domain name, a bucket and a file name, together with the database nodes to restore to (one or more; multiple database nodes form the database cluster to which the recovery task restores).
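As a rough sketch of how such a recovery CR might be assembled, the following builds a manifest dictionary from the fields named above (protocol, domain name, bucket, file name, database nodes). The API group `example.com/v1`, the kind `DatabaseRestore` and all field names are assumptions made for illustration; the patent does not disclose the actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BackupLocation:
    """Where the backup file lives in the object store."""
    protocol: str   # "s3" or "swift", the two protocols named in the text
    domain: str     # object-store domain name
    bucket: str
    file_name: str


@dataclass
class RestoreRequest:
    name: str
    backup_type: str                                # "logical" or "physical"
    location: BackupLocation
    nodes: List[str] = field(default_factory=list)  # one node, or several for a cluster

    def to_custom_resource(self) -> Dict:
        """Render the request as a CR manifest (hypothetical schema)."""
        if self.location.protocol not in ("s3", "swift"):
            raise ValueError(f"unsupported protocol: {self.location.protocol}")
        if not self.nodes:
            raise ValueError("at least one database node is required")
        return {
            "apiVersion": "example.com/v1",   # hypothetical group/version
            "kind": "DatabaseRestore",        # hypothetical kind
            "metadata": {"name": self.name},
            "spec": {
                "backupType": self.backup_type,
                "objectStore": {
                    "protocol": self.location.protocol,
                    "domain": self.location.domain,
                    "bucket": self.location.bucket,
                    "fileName": self.location.file_name,
                },
                "nodes": list(self.nodes),
            },
        }
```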

More preferably, the recovery of a full physical backup proceeds as follows:

the user calls the initiate-recovery-request interface provided by the recovery medium;

the recovery medium generates a recovery request custom resource from the parameters passed by the user, and calls Kubernetes to create the custom resource;

after Kubernetes receives the request, it creates the recovery request custom resource and generates a recovery custom resource creation event;

after observing the recovery request custom resource creation event, the recovery medium calls Kubernetes to update the custom resource state to "recovery task starting"; at the same time, the recovery medium calls Kubernetes to create the recovery task Job;

when the recovery medium observes the event indicating that the recovery task Job was created successfully, it updates the custom resource state to "recovery task executing" and waits for the execution result of the recovery task Job;

after the recovery task Job is created, Kubernetes allocates resources according to the Job configuration, and once the Pod that carries the Job workload has been created successfully, the database recovery operation is executed; when the backup is restored to a database cluster, the recovery task Job covers the restoration of all database nodes;

after the recovery task executes successfully, Kubernetes generates a recovery task Job success event, which the recovery medium observes, and the recovery medium sets the recovery custom resource state to "recovery task succeeded"; conversely, when the workload (Pod resource) of the recovery task Job is not created successfully, or the recovery process encounters an exception, the recovery task Job fails, and the recovery medium writes the specific failure reason into the recovery custom resource and updates the state to "recovery task failed".

The recovery of a database logical backup is largely the same as the process described above. The only difference is that the recovery of a logical backup is performed only on the master node.
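The recovery task Job that the recovery medium asks Kubernetes to create could look roughly like the manifest built below. The `batch/v1` Job fields used here (`backoffLimit`, `template`, `restartPolicy`, `containers`) are standard Kubernetes fields; the image name, the label key `restore-cr`, the Job naming scheme and the container arguments are hypothetical placeholders, since the patent does not specify them.

```python
from typing import Dict, List


def build_restore_job(cr_name: str, nodes: List[str],
                      image: str = "example/restore:latest") -> Dict:
    """Build a batch/v1 Job manifest for one recovery request (illustrative)."""
    if not nodes:
        raise ValueError("at least one database node is required")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"{cr_name}-restore",
            # Hypothetical label used to map Job events back to the recovery CR.
            "labels": {"restore-cr": cr_name},
        },
        "spec": {
            # Do not retry automatically: a failure is reported on the CR instead.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "restore",
                        "image": image,                       # placeholder image
                        "args": ["--nodes", ",".join(nodes)],  # hypothetical flag
                    }],
                }
            },
        },
    }
```

A single Job carries the full node list, matching the statement that one recovery task Job covers the restoration of all database nodes in a cluster.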

More preferably, when the recovery task Job fails, the recovery medium traces the cause of failure in the following ways:

(1) monitoring recovery task Job events;

(2) monitoring the events and execution log of the recovery task Job's workload (Pod resource);

the recovery medium ensures that the information in the recovery request custom resource is complete and is presented to the user in full; the recovery medium observes the failure event of the recovery task Job and filters on the information in that event:

if the recovery task Job event message contains the failure reason, the failure reason is written into the recovery request CR;

if the recovery task Job event message does not make the failure reason clear, the recovery medium monitors the events of the Pod resource (the recovery Job's workload), evaluates the event information generated by the Pod resource, and determines the failure reason from it.
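The two-level lookup above can be sketched as a simple fallback function. The rule for deciding that a Job message is "not clear" is an assumption made here for illustration (an empty message, or a generic backoff-limit message), as are the function name and the `"unknown failure"` default.

```python
from typing import Iterable

# Assumption: Job messages in this set say that the Job failed but not why.
GENERIC_MESSAGES = {"", "Job has reached the specified backoff limit"}


def failure_reason(job_message: str, pod_messages: Iterable[str]) -> str:
    """Pick the failure reason to record on the recovery request CR."""
    job_message = job_message.strip()
    if job_message not in GENERIC_MESSAGES:
        # The Job event itself explains the failure.
        return job_message
    # Otherwise fall back to the events produced by the Job's Pod.
    for message in pod_messages:
        message = message.strip()
        if message:
            return message
    return "unknown failure"
```

For example, a Job event that only reports a backoff limit falls through to the Pod events, so an image pull or scheduling error recorded on the Pod ends up on the CR instead.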

More preferably, the recovery medium implements data recovery for both single-node databases and cluster databases from a full physical backup; when the user request contains multiple database nodes to restore, the recovery medium treats the database nodes provided by the user as one cluster, restores the backup and then performs master-slave construction; by default the first database node provided by the user is selected as the master node of the cluster, and otherwise master-slave construction follows the user's designation;

the database recovery process of a cluster database is the same as that of a single database.
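The master selection rule can be illustrated with a short helper. The function name and the returned dictionary layout are hypothetical; only the rule itself (first node by default, otherwise the node the user designates) comes from the text.

```python
from typing import Dict, List, Optional


def plan_topology(nodes: List[str], master: Optional[str] = None) -> Dict:
    """Decide which node is the master and which are replicas (illustrative)."""
    if not nodes:
        raise ValueError("at least one database node is required")
    if master is None:
        master = nodes[0]          # default: first node provided by the user
    elif master not in nodes:
        raise ValueError(f"designated master {master!r} is not in the node list")
    replicas = [n for n in nodes if n != master]
    return {"master": master, "replicas": replicas, "cluster": len(nodes) > 1}
```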

More preferably, cluster recovery and construction proceed as follows:

the cluster is built after the recovery of every database node is complete; when the recovery medium observes the message that the recovery task Job executed successfully, it changes the custom resource state to "cluster building" and at the same time calls Kubernetes to create the cluster-building Job;

the metadata of the cluster-building Job is generated from the user request data, and while the cluster is being built the recovery medium automatically parses the node backup files to obtain the data needed to build the cluster (the user only needs to provide a database account for master-slave construction);

when the recovery medium observes the completion message of the cluster-building Job, it calls Kubernetes to update the custom resource state to "recovery successful", and the cluster construction is complete.

An electronic device, comprising: a memory and at least one processor;

wherein the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the cloud-native database recovery method based on hybrid cloud as described above.

A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the cloud-native database recovery method based on hybrid cloud as described above.

Here, the Operator, developed by CoreOS to extend the Kubernetes API, is an application-specific controller used to create, configure and manage complex stateful applications such as databases, caches and monitoring systems. Operators build on the Kubernetes concepts of resources and controllers, but also contain application-specific domain knowledge. The key to creating an Operator is the design of the CRD (custom resource definition). An Operator follows essentially the same controller pattern as Kubernetes itself: it watches or otherwise checks the expected state of its resource, compares it with the current state, and performs the corresponding update whenever the two differ.
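The compare-and-update behaviour of a controller can be shown with a minimal diff function. This is a generic sketch of the controller pattern, not the recovery medium's actual logic; the function name and the `(action, key, value)` tuples are illustrative.

```python
from typing import Any, Dict, List, Optional, Tuple

Action = Tuple[str, str, Optional[Any]]


def reconcile(desired: Dict[str, Any], current: Dict[str, Any]) -> List[Action]:
    """Return the actions needed to move the current state to the desired state."""
    actions: List[Action] = []
    for key, want in desired.items():
        if key not in current:
            actions.append(("create", key, want))
        elif current[key] != want:
            actions.append(("update", key, want))
    for key in current:
        if key not in desired:
            actions.append(("delete", key, None))
    return actions
```

When the two states already match, the function returns an empty list, which corresponds to the controller doing nothing until the next event arrives.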

The cloud-native database recovery method, device and storage medium based on hybrid cloud of the invention have the following advantages:

first, the invention provides a database recovery approach with low development cost, high development efficiency and high extensibility, which is highly adaptable and can be widely applied to various database recovery systems;

second, the invention is implemented on cloud-native technology, supports cross-cloud recovery, and addresses users' difficulties in moving to the cloud and migrating data;

third, the invention extends the functionality of Kubernetes by relying on its cloud-native characteristics, and implements the database recovery task with a custom resource object (Custom Resource), thereby meeting customers' need to move to the cloud quickly in the cloud era.

Drawings

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of the CR state transitions used for backup recovery;

FIG. 2 is a flow chart for restoring a backup file.

Detailed Description

The cloud-native database recovery method, device and storage medium based on hybrid cloud of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.

Example 1:

This embodiment provides a cloud-native database recovery method based on hybrid cloud, which comprises the following steps:

S1, monitoring Kubernetes custom resource events in real time through a recovery medium deployed in the Kubernetes cluster as an application;

S2, when the recovery medium observes a Kubernetes custom resource creation event, creating the Kubernetes Job resources required for recovery and restoring the data; wherein the Kubernetes Job resources required for recovery comprise a recovery task Job and a master-slave construction task Job;

S3, monitoring the events of these Kubernetes Job resources through the recovery medium, determining whether the recovery task Job and the master-slave construction task Job succeeded, and updating the Kubernetes custom resource according to the task execution result.

In this embodiment, the recovery medium supports restoring two types of data files, namely database logical backups and full physical backups. The recovery medium and Kubernetes rely on a custom resource (CR) to pass user request information. The definition of the recovery custom resource is completed when the recovery medium is initialized, so the user does not need to be concerned with creating the resource and only needs to understand its attribute configuration.

The recovery medium in this embodiment provides a RESTful API for the user, comprising an interface for initiating a recovery request, an interface for viewing recovery task history, an interface for viewing the current recovery task, and interfaces for stopping and deleting a recovery task; the viewing interfaces provided by the recovery medium return the state information of the recovery CR and are implemented by calling Kubernetes commands, so that the user does not have to interact with Kubernetes directly, which lowers the barrier to use.

In this embodiment, the database recovery file for a logical backup is imported from external object storage, and the recovery medium supports the Swift protocol and the S3 protocol for the user to choose from; the data recovery CR defines the location of the backup in the object store (OSS), comprising a domain name, a bucket and a file name, together with the database nodes to restore to (one or more; multiple database nodes form the database cluster to which the recovery task restores).

The recovery of a full physical backup in this embodiment proceeds as follows:

(1) The user calls the initiate-recovery-request interface provided by the recovery medium;

(2) The recovery medium generates a recovery request custom resource from the parameters passed by the user, and calls Kubernetes to create the custom resource;

(3) After Kubernetes receives the request, it creates the recovery request custom resource and generates a recovery custom resource creation event;

(4) After observing the recovery request custom resource creation event, the recovery medium calls Kubernetes to update the custom resource state to "recovery task starting"; at the same time, the recovery medium calls Kubernetes to create the recovery task Job;

(5) When the recovery medium observes the event indicating that the recovery task Job was created successfully, it updates the custom resource state to "recovery task executing" and waits for the execution result of the recovery task Job;

(6) After the recovery task Job is created, Kubernetes allocates resources according to the Job configuration, and once the Pod that carries the Job workload has been created successfully, the database recovery operation is executed; when the backup is restored to a database cluster, the recovery task Job covers the restoration of all database nodes;

(7) After the recovery task executes successfully, Kubernetes generates a recovery task Job success event, which the recovery medium observes, and the recovery medium sets the recovery custom resource state to "recovery task succeeded"; conversely, when the workload (Pod resource) of the recovery task Job is not created successfully, or the recovery process encounters an exception, the recovery task Job fails, and the recovery medium writes the specific failure reason into the recovery custom resource and updates the state to "recovery task failed".

The recovery of a database logical backup is largely the same as the process described above. The only difference is that the recovery of a logical backup is performed only on the master node.

In this embodiment, when the recovery task Job fails, the recovery medium traces the cause of failure in the following ways:

(1) monitoring recovery task Job events;

The recovery medium in this embodiment implements data recovery for both single-node databases and cluster databases from a full physical backup; when the user request contains multiple database nodes to restore, the recovery medium treats the database nodes provided by the user as one cluster, restores the backup and then performs master-slave construction; by default the first database node provided by the user is selected as the master node of the cluster, and otherwise master-slave construction follows the user's designation;

the database recovery process of a cluster database is the same as that of a single database.

As shown in FIG. 1, cluster recovery and construction in this embodiment proceed as follows:

(1) The cluster is built after the recovery of every database node is complete; when the recovery medium observes the message that the recovery task Job executed successfully, it changes the custom resource state to "cluster building" and at the same time calls Kubernetes to create the cluster-building Job;

(2) The metadata of the cluster-building Job is generated from the user request data, and while the cluster is being built the recovery medium automatically parses the node backup files to obtain the data needed to build the cluster (the user only needs to provide a database account for master-slave construction);

(3) When the recovery medium observes the completion message of the cluster-building Job, it calls Kubernetes to update the custom resource state to "recovery successful", and the cluster construction is complete.

As shown in FIG. 1, the CR states used for backup recovery are as follows:

a) After the recovery data request has successfully created the CR resource, the state is "starting";

b) After the recovery Job has been generated from the CR resource information, the state changes to "executing";

c) If the recovery data Job fails, the state changes to "failed";

d) If the recovery data Job succeeds, the state changes to "successful";

e) A single-node restore has no "cluster building" state; that state applies only to cluster instances (restores to multiple nodes).
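The state flow in a) to e) can be written down as a small transition function. The enum members mirror the states listed above; the event names (`job_created`, `job_succeeded`, and so on) are hypothetical labels for the Kubernetes events the recovery medium observes, and the cluster-building outcomes are taken from the cluster flow described elsewhere in this embodiment.

```python
from enum import Enum


class State(Enum):
    STARTING = "starting"
    EXECUTING = "executing"
    CLUSTER_BUILDING = "cluster building"
    SUCCEEDED = "successful"
    FAILED = "failed"


def next_state(state: State, event: str, is_cluster: bool) -> State:
    """Return the CR state that follows `state` when `event` is observed."""
    if state is State.STARTING and event == "job_created":
        return State.EXECUTING
    if state is State.EXECUTING and event == "job_failed":
        return State.FAILED
    if state is State.EXECUTING and event == "job_succeeded":
        # Only multi-node restores pass through the cluster-building state.
        return State.CLUSTER_BUILDING if is_cluster else State.SUCCEEDED
    if state is State.CLUSTER_BUILDING and event == "cluster_job_succeeded":
        return State.SUCCEEDED
    if state is State.CLUSTER_BUILDING and event == "cluster_job_failed":
        return State.FAILED
    raise ValueError(f"event {event!r} is not valid in state {state.value!r}")
```

Rejecting unexpected events with an exception, rather than ignoring them, is a design choice of this sketch; a real controller might instead log and skip stale events.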

As shown in FIG. 2, the logical backup recovery flow for single-node and cluster databases, and the physical backup recovery flow for a single-node database, are as follows:

(1) The recovery medium receives a recovery request;

(2) The recovery medium calls Kubernetes to create the recovery data CR;

(3) Kubernetes creates the recovery data CR;

(4) The recovery medium observes the recovery data CR creation event;

(5) The recovery medium creates the recovery data task Job and updates the recovery data CR state to "in recovery";

(6) Kubernetes executes the recovery data Job;

(7) The recovery medium monitors the Kubernetes recovery task Job and determines whether the recovery task succeeded;

(8) The recovery medium updates the recovery data CR state to "recovery successful" or "recovery failed" according to the result of the recovery task Job;

The physical backup recovery flow for a cluster database is as follows:

(1) The recovery medium receives a recovery request;

(2) The recovery medium calls Kubernetes to create the recovery data CR;

(3) Kubernetes creates the recovery data CR;

(4) The recovery medium observes the recovery data CR creation event;

(6) Kubernetes executes the recovery data Job;

(8) The recovery medium updates the recovery data CR state to "cluster building" or "recovery failed" according to the result of the recovery task Job; if the recovery task Job executed successfully, the flow proceeds to step (9);

(9) The recovery medium calls Kubernetes to create the database cluster-building Job according to the database node information defined in the recovery data CR;

(10) Kubernetes executes the database cluster-building Job;

(11) The recovery medium monitors the Kubernetes database cluster-building Job and determines whether the task succeeded;

(12) The recovery medium updates the recovery data CR state to "recovery successful" or "recovery failed" according to the result of the database cluster-building Job.
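The cluster flow above, reduced to its status updates, can be sketched as follows. The `run_job` callable is a hypothetical stand-in for "create the Job through Kubernetes and wait for its result"; the Job names passed to it are illustrative, while the status strings are the ones used in the steps above.

```python
from typing import Callable, List


def restore_cluster(run_job: Callable[[str], bool]) -> List[str]:
    """Run the two Jobs in order and return the sequence of CR states."""
    history = ["in recovery"]
    # First Job: restore the backup onto every database node.
    if not run_job("restore-data"):
        history.append("recovery failed")
        return history
    history.append("cluster building")
    # Second Job: build the master-slave cluster from the restored nodes.
    if not run_job("build-cluster"):
        history.append("recovery failed")
        return history
    history.append("recovery successful")
    return history
```

The cluster-building Job is only started after the recovery Job reports success, which is why a failed restore never reaches the "cluster building" state.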

Example 3:

the embodiment also provides an electronic device, including: a memory and at least one processor;

wherein the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the cloud-native database recovery method based on hybrid cloud of any embodiment of the present invention.

The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

The memory may be used to store computer programs and/or modules, and the processor implements the various functions of the electronic device by running or executing the computer programs and/or modules stored in the memory and by invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the terminal, and the like. The memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.

Example 4:

This embodiment also provides a computer-readable storage medium storing a plurality of instructions, the instructions being loaded by a processor to cause the processor to execute the cloud-native database recovery method based on hybrid cloud of any embodiment of the present invention. Specifically, a system or apparatus may be provided with a storage medium on which software program code implementing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus may be caused to read out and execute the program code stored in the storage medium.

In this case, the program code itself read from the storage medium may realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code form part of the present invention.

Examples of the storage medium for providing the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer by a communication network.

Further, it should be apparent that the functions of any of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform part or all of the actual operations based on the instructions of the program code.

Further, it is understood that the program code read out from the storage medium may be written into a memory provided on an expansion board inserted into the computer, or into a memory provided in an expansion unit connected to the computer, after which a CPU or the like mounted on the expansion board or expansion unit performs part or all of the actual operations based on the instructions of the program code, thereby realizing the functions of any of the above embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A cloud-native database recovery method based on hybrid cloud, characterized by comprising the following steps:

2. The cloud-native database recovery method based on hybrid cloud according to claim 1, wherein the recovery medium supports restoring two types of data files, namely database logical backups and full physical backups, the recovery medium and Kubernetes rely on a custom resource to pass user request information, and the definition of the recovery custom resource is completed when the recovery medium is initialized.

3. The cloud-native database recovery method based on hybrid cloud according to claim 1 or 2, wherein the recovery medium provides a RESTful API for the user, comprising an interface for initiating a recovery request, an interface for viewing recovery task history, an interface for viewing the current recovery task, and interfaces for stopping and deleting a recovery task; the viewing interfaces provided by the recovery medium return the state information of the recovery CR and are implemented by calling Kubernetes commands.

4. The cloud-native database recovery method based on hybrid cloud according to claim 3, wherein the database recovery file for a logical backup is imported from external object storage, and the recovery medium supports the Swift protocol and the S3 protocol for the user to choose from; the data recovery CR defines the location of the backup in the object store, including the domain name, bucket and file name, and the database nodes to restore to.

5. The cloud-native database recovery method based on hybrid cloud according to claim 4, wherein the recovery of a full physical backup proceeds as follows:

after the recovery task executes successfully, Kubernetes generates a recovery task Job success event, which the recovery medium observes, and the recovery medium sets the recovery custom resource state to "recovery task succeeded"; conversely, when the workload of the recovery task Job is not created successfully or the recovery process encounters an exception, the recovery task Job fails, and the recovery medium writes the specific failure reason into the recovery custom resource and updates the state to "recovery task failed".

6. The cloud-native database recovery method based on hybrid cloud according to claim 5, wherein when the recovery task Job fails, the recovery medium traces the cause of failure in the following ways:

(1) monitoring recovery task Job events;

(2) monitoring the events and execution log of the recovery task Job's workload;

if the recovery task Job event message does not make the failure reason clear, the recovery medium monitors the Pod resource events, evaluates the event information generated by the Pod resource, and determines the failure reason from it.

7. The cloud-native database recovery method based on hybrid cloud according to claim 6, wherein the recovery medium implements data recovery for both single-node databases and cluster databases from a full physical backup; when the user request contains multiple database nodes to restore, the recovery medium treats the database nodes provided by the user as one cluster, restores the backup and then performs master-slave construction; by default the first database node provided by the user is selected as the master node of the cluster, and otherwise master-slave construction follows the user's designation.

8. The cloud-native database recovery method based on hybrid cloud according to claim 7, wherein cluster recovery and construction proceed as follows:

the metadata of the cluster-building Job is generated from the user request data, and while the cluster is being built the recovery medium automatically parses the node backup files to obtain the data needed to build the cluster;

9. An electronic device, comprising: a memory and at least one processor;

wherein the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the cloud-native database recovery method based on hybrid cloud of any of claims 1 to 8.

10. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the cloud-native database recovery method based on hybrid cloud of any of claims 1 to 8.