CN113064766A - Data backup method, device, equipment and storage medium - Google Patents

Data backup method, device, equipment and storage medium

Info

Publication number
CN113064766A
CN113064766A
Authority
CN
China
Prior art keywords
data
cluster
processed
target
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110495305.4A
Other languages
Chinese (zh)
Inventor
李祖金
罗贤通
伍俊杰
罗新良
梁锦辉
周添伟
肖兴钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Guangdong Network Construction Co Ltd
Original Assignee
Digital Guangdong Network Construction Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Guangdong Network Construction Co Ltd filed Critical Digital Guangdong Network Construction Co Ltd
Priority to CN202110495305.4A
Publication of CN113064766A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 - Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 - Saving, restoring, recovering or retrying
    • G06F 11/1446 - Point-in-time backing up or restoration of persistent data
    • G06F 11/1448 - Management of the data involved in backup or backup restore
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 16/275 - Synchronous replication

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a data backup method, a data backup apparatus, data backup equipment, and a storage medium. The method comprises the following steps: acquiring target update data in a first cluster of a target database; acquiring a data source identifier of the target update data, and generating data to be processed according to the target update data and the data source identifier; and copying the data to be processed to a second cluster of the target database. The embodiment of the invention can realize data backup among database clusters and effectively avoid circular replication in the data backup process.

Description

Data backup method, device, equipment and storage medium
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to a data backup method, device, equipment and storage medium.
Background
The rapid growth in the number of internet users places great pressure on databases. In the prior art, a multi-machine-room deployment, such as a same-city active-active scheme across databases, is generally adopted to achieve horizontal scaling of the database.
In database implementations, the prior art supports data replication within a cluster; that is, synchronization and backup of data can be achieved between different nodes in the same logical unit by means of replication. For example, a MySQL database may adopt a Binlog-based master-slave synchronization mechanism, a Redis database may implement AOF (Append Only File) persistence-based master-slave synchronization on top of the Sync/Psync mechanism, and a MongoDB database may adopt an Oplog-based master-slave synchronization mechanism. These mechanisms support high availability through data redundancy and load sharing through read-write separation within a single unit.
However, data synchronization within a single logical unit is generally insufficient: many services require the capability to synchronize data across logical units. The data backup methods provided in the prior art cannot form mutual backups between different clusters across multiple machine rooms, nor can they solve the problem of circular replication during data backup.
Disclosure of Invention
Embodiments of the present invention provide a data backup method, apparatus, device, and storage medium, so as to implement data backup between database clusters, and effectively avoid circular replication in a data backup process.
In a first aspect, an embodiment of the present invention provides a data backup method, including:
acquiring target updating data in a first cluster of a target database;
acquiring a data source identifier of the target updating data, and generating data to be processed according to the target updating data and the data source identifier;
and copying the data to be processed to a second cluster of the target database.
In a second aspect, an embodiment of the present invention further provides a data backup apparatus, including:
the data acquisition module is used for acquiring target update data in a first cluster of a target database;
the source identification module is used for acquiring a data source identification of the target updating data and generating data to be processed according to the target updating data and the data source identification;
and the data copying module is used for copying the data to be processed to the second cluster of the target database.
In a third aspect, an embodiment of the present invention further provides a server device, where the server device includes:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data backup method provided by any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data backup method provided in any embodiment of the present invention.
In the embodiments of the invention, target update data generated in a cluster of the target database in which a data update occurs is acquired, the data source identifier of the target update data is acquired, data to be processed is generated from the target update data and its data source identifier, and the data to be processed is then copied to the other clusters of the target database. Newly generated data carrying its data source identifier is thereby synchronized among the clusters of the target database, realizing data backup between database clusters while effectively avoiding circular replication during the backup process.
Drawings
Fig. 1 is a flowchart of a data backup method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an implementation of a MongoDB database according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a workflow of a MongoDB database according to an embodiment of the present invention.
Fig. 4 is a flowchart of a data backup method according to a second embodiment of the present invention.
Fig. 5 is a schematic diagram of a data backup service of a MongoDB database according to a second embodiment of the present invention.
Fig. 6 is a schematic diagram of a disaster recovery process of the MongoDB database according to the second embodiment of the present invention.
Fig. 7 is a schematic structural diagram of a data backup device according to a third embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a server device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.
It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of a data backup method provided in an embodiment of the present invention, where this embodiment is applicable to a case where data is synchronized between clusters of a database, and the method may be executed by a data backup apparatus provided in an embodiment of the present invention, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in a computer device. Accordingly, as shown in fig. 1, the method comprises the following operations:
s110, acquiring target updating data in the first cluster of the target database.
Wherein the target database may be any type of database implemented by at least two clusters. The first cluster may be any cluster of the target database where data updates occur. The target update data may be data resulting from an update of data by the first cluster.
Correspondingly, the target database can be implemented by at least two clusters, and data synchronization is required between the clusters of the target database. When any one of those clusters is updated, the data generated by the update needs to be synchronized to the other clusters of the target database. Thus, the target update data in the first cluster of the target database may be obtained in order to back up the target update data to the other clusters of the target database.
Optionally, the target database may be a MongoDB database, and fig. 2 is a schematic diagram of an implementation of the MongoDB database provided in the embodiment of the present invention. As shown in fig. 2, a same-city active-active scheme may be adopted, with the target database implemented by cluster 1 in machine room A and cluster 2 in machine room B. Within each cluster of the target database, a master-slave mode may be adopted, and data may be stored in shards according to a unique identifier (UID).
For example, fig. 3 is a schematic diagram of a workflow of the MongoDB database according to an embodiment of the present invention. As shown in fig. 3, in a specific example, when user a of an application initiates a login request, a DBRouter (database router) may send the corresponding database read request to database cluster 1 according to the application's UID. When user b of the application initiates an operation request at the same time, the DBRouter may send the corresponding database operation request to database cluster 2. The DBRouter may determine the node that performs the database operation according to the network environment in which the cluster is located, the real-time requirement of the user's operation request on the data, and the most recent CacheEviction (cache eviction) configuration of the data; specifically, it may determine the node according to a reference priority order over the network environment, the real-time requirement, and the CacheEviction configuration.
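As an illustrative sketch of this routing step only, the following Python snippet shows how a UID-sharded router might select a target cluster; the cluster endpoints, the route function, and the two-cluster layout are assumptions made for illustration rather than the patent's actual implementation.

    # Hypothetical sketch of UID-based routing across two same-city clusters.
    # The endpoints and the modulo rule are illustrative assumptions.
    CLUSTERS = {
        0: "mongodb://cluster1.room-a:27017",  # machine room A
        1: "mongodb://cluster2.room-b:27017",  # machine room B
    }

    def route(uid: int) -> str:
        """Map a user's UID to the cluster holding that user's shard."""
        return CLUSTERS[uid % len(CLUSTERS)]

    print(route(1001))  # user a's login read might resolve to one cluster
    print(route(1002))  # user b's operation request to the other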
S120, acquiring a data source identifier of the target updating data, and generating data to be processed according to the target updating data and the data source identifier.
The data source identifier may be data indicating a source of the target update data, and the source of the target update data may be a cluster of the target database that generates the target update data. The data to be processed may be data of other clusters than the first cluster that need to be synchronized to the target database.
Accordingly, a data source identifier of the target update data may be obtained; the data source identifier may indicate that the target update data was generated by the first cluster. Optionally, the data source identifier may be any data that can uniquely represent the first cluster, for example, a unique code of the first cluster, or data generated from that unique code.
Further, once the data source identifier of the target update data is obtained, the data to be processed may be generated from the target update data and its data source identifier, so that the target update data synchronized to the clusters other than the first cluster carries its data source identifier.
S130, copying the data to be processed to a second cluster of the target database.
Wherein the second cluster may be at least one cluster of the target database other than the first cluster.
Correspondingly, the data to be processed may be copied to the second cluster, so that the target update data carrying the data source identifier is written into the second cluster and the data in the second cluster and the updated data in the first cluster become mutual backups. Because the target update data written into the second cluster carries the data source identifier, the data copied from the first cluster is marked; when the data of the second cluster is later synchronized to the first cluster, data whose source identifier shows it was copied from the first cluster can be skipped rather than copied back, thereby effectively avoiding circular replication.
Optionally, parallel replication may be adopted during data replication: depending on the replication granularity option, different documents or tables may enter different hash queues and be executed concurrently, as sketched below.
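A minimal sketch of the hash-queue idea described above, assuming a shard_key granularity of either document id or collection name; the queue count and entry shape are illustrative assumptions.

    import queue

    NUM_QUEUES = 4
    hash_queues = [queue.Queue() for _ in range(NUM_QUEUES)]

    def dispatch(entry: dict, shard_key: str = "id") -> None:
        """Hash an entry into one of several queues so that entries sharing
        a key stay ordered while different keys replicate in parallel."""
        if shard_key == "collection":
            key = entry["ns"]            # hash by table/collection name
        else:
            key = str(entry["doc_id"])   # hash by document primary key
        hash_queues[hash(key) % NUM_QUEUES].put(entry)

    dispatch({"ns": "app.users", "doc_id": 42, "op": "u"})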
The embodiment of the invention provides a data backup method: target update data generated in a cluster of the target database in which a data update has occurred is acquired, the data source identifier of the target update data is acquired, data to be processed is generated from the target update data and its data source identifier, and the data to be processed is then copied to the other clusters of the target database. The newly generated data, carrying its data source identifier, is thereby synchronized among the clusters of the target database, realizing data backup between database clusters while effectively avoiding circular replication during the backup process.
Example two
Fig. 4 is a flowchart of a data backup method according to a second embodiment of the present invention. In the embodiment of the present invention, a specific optional implementation manner is provided for acquiring the data source identifier of the target update data and generating the data to be processed according to the target update data and the data source identifier.
As shown in fig. 4, the method of the embodiment of the present invention specifically includes:
s210, acquiring target updating data in the first cluster of the target database.
In an optional embodiment of the present invention, the obtaining target update data in the first cluster of the target database may include: in an instance in which it is determined that the target update data exists in a first cluster of a target database, a read operation is performed on a secondary replica node of the first cluster to read the target update data.
Wherein a Secondary replica node may be a node for maintaining a Secondary replica.
Accordingly, when it is determined that the target update data exists in the first cluster of the target database, that is, when a data update has occurred in the first cluster, the update needs to be synchronized to the other clusters of the target database. The Secondary replica node may be selected to perform the read operation, reading the target update data from the Secondary replica. By performing read operations on the secondary replica node, this embodiment reduces read and write operations on the central node of the cluster that maintains the Primary replica, avoiding consumption of the central-node I/O (input/output) bandwidth occupied by the database's original services.
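A rough sketch of this read path, assuming pymongo is used; the connection URI and the checkpoint handling are illustrative assumptions, and the patent does not prescribe a particular client library.

    import pymongo
    from pymongo import CursorType
    from bson.timestamp import Timestamp

    # Illustrative assumption: cluster 1 is reachable at this URI.
    client = pymongo.MongoClient(
        "mongodb://cluster1.room-a:27017/?replicaSet=rs0",
        readPreference="secondary",  # read from a Secondary replica node,
    )                                # keeping I/O load off the Primary

    oplog = client.local["oplog.rs"]
    last_ts = Timestamp(0, 0)  # in practice, a persisted checkpoint

    cursor = oplog.find({"ts": {"$gt": last_ts}},
                        cursor_type=CursorType.TAILABLE_AWAIT)
    for entry in cursor:
        print(entry["op"], entry.get("ns"))  # hand off to the next stage
        last_ts = entry["ts"]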
S220, acquiring a data source identifier of the target updating data, and generating data to be processed according to the target updating data and the data source identifier.
In an optional embodiment of the present invention, S220 may specifically include:
s221, obtaining a first cluster identifier of the first cluster, and determining the first cluster identifier as a data source identifier of the target update data.
Wherein the first cluster identification may be data for uniquely representing the first cluster.
Accordingly, the first cluster identifier may be any pre-generated data that can uniquely represent the first cluster, and since the target update data is generated by data update occurring in the first cluster, the first cluster identifier may be obtained as a data source identifier of the target update data.
Optionally, the first cluster identifier may be the Opid of the first cluster. The Opid of the first cluster may be written into the Oplog of the first cluster of the target database, so the Opid in the first cluster's Oplog can be pulled by polling and used as the data source identifier of the target update data generated in the first cluster.
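A minimal sketch of this polling step, under the assumption that each pulled Oplog entry is tagged with the first cluster's Opid before queueing; the CLUSTER_OPID constant and the entry fields are hypothetical.

    import time

    CLUSTER_OPID = "id_A"  # hypothetical Opid uniquely identifying cluster 1

    def tag_with_source(entries):
        """Attach this cluster's Opid to each pulled Oplog entry so the
        entry carries its data source identifier through replication."""
        for entry in entries:
            entry["gid"] = CLUSTER_OPID
        return entries

    def poll_loop(read_batch, interval=0.1, max_polls=3):
        """Pull the Oplog by polling, tagging each batch before queueing."""
        for _ in range(max_polls):  # bounded only to keep the demo finite
            for entry in tag_with_source(read_batch()):
                print("ready to queue:", entry)
            time.sleep(interval)

    poll_loop(lambda: [{"doc_id": 7, "op": "u"}])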
S222, writing the target updating data and the data source identification into a queue to be processed, and generating data to be processed in the queue to be processed.
The pending queue may be a data queue for storing pending data.
Correspondingly, a queue to be processed with a certain data capacity may be preset. After the target update data and its data source identifier are obtained, the target update data, together with its data source identifier, is copied and written into the preset queue, where it serves as the data to be processed in the queue.
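The queue to be processed can be sketched as a bounded FIFO; the capacity value below is an illustrative assumption (the bound matters again later, when fault recovery compares the synchronization offset against the queue capacity).

    import queue

    PENDING_CAPACITY = 10_000  # assumed capacity of the queue to be processed
    pending_queue = queue.Queue(maxsize=PENDING_CAPACITY)

    def enqueue_pending(update: dict, source_id: str) -> None:
        """Combine target update data with its data source identifier and
        write the result into the queue as data to be processed."""
        pending_queue.put({"update": update, "gid": source_id})

    enqueue_pending({"op": "u", "ns": "app.users", "doc_id": 42}, "id_A")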
Optionally, where the Opid is written into the Oplog of the first cluster of the target database to identify the first cluster, so that the Opid can be obtained as the data source identifier of the target update data generated in the first cluster, the Opid may be carried by an op_command during the data replication process.
For example, if an operation request of user a modifies cluster 1, then after the Oplog of cluster 1 is updated accordingly, the updated Oplog of cluster 1 may be pulled by polling, and the target update data and its Opid may be recorded and copied into the queue to be processed, so as to synchronize the update to cluster 2.
Further, the target update data finally copied to cluster 2 carries the Opid of cluster 1, which may, for example, be equal to id_A. When synchronizing data from cluster 2 to cluster 1, only data whose Gid is the Opid of cluster 2 may be fetched from cluster 2 (for example, data with Gid equal to id_B, where id_A is the Opid of cluster 1 and id_B is the Opid of cluster 2), and data carrying id_A is not fetched. In this way, the above embodiment avoids circular replication.
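A minimal sketch of the anti-loop filter implied by this example, assuming each entry carries the gid tag attached earlier; the entry shapes and identifiers are illustrative.

    def entries_to_sync(entries, local_opid: str):
        """When pulling from a cluster, keep only entries that cluster itself
        produced; entries tagged with the destination's Opid were copied from
        it in the first place and would cause circular replication."""
        return [e for e in entries if e.get("gid") != local_opid]

    oplog_of_cluster2 = [
        {"doc_id": 1, "gid": "id_B"},  # produced locally in cluster 2
        {"doc_id": 2, "gid": "id_A"},  # previously copied from cluster 1
    ]
    # Syncing cluster 2 -> cluster 1: only the id_B entry flows back.
    print(entries_to_sync(oplog_of_cluster2, local_opid="id_A"))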
S230, copying the data to be processed to a second cluster of the target database.
In an optional embodiment of the present invention, the copying the to-be-processed data to the second cluster of the target database may include: respectively distributing at least one piece of data to be processed in the queue to be processed to at least one worker node; and respectively writing the received data to be processed into a second cluster of the target database through the worker nodes.
The worker node may be a preset node, may receive the data to be processed in the queue to be processed, and may further perform a write operation on the second cluster.
Correspondingly, the to-be-processed data stored in the to-be-processed queue can be distributed to preset worker nodes, and the worker nodes can copy and write the received to-be-processed data into the second cluster to complete synchronization of the target update data.
For example, fig. 5 is a schematic diagram of a data backup service of a MongoDB database according to an embodiment of the present invention. As shown in fig. 5, a read operation is performed on the Secondary node of the first cluster to obtain the target update data and the Opid in the Oplog, which are copied into the Pending Queue; the data in the Pending Queue is then distributed to the preset worker nodes worker1, worker2, worker3 and worker4, which copy the data they each received to the second cluster by performing write operations on the Primary node of the second cluster, so that the first cluster and the second cluster are mutually backed up.
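The worker stage might look like the following sketch, in which a small thread pool drains the pending queue and writes to the second cluster; the thread count and the stand-in write_to_cluster2 function are assumptions made for illustration.

    import queue
    import threading

    pending_queue: "queue.Queue[dict]" = queue.Queue()

    def write_to_cluster2(item: dict) -> None:
        # Stand-in for a write against the second cluster's Primary node.
        print("replicated:", item)

    def worker() -> None:
        """Drain data to be processed and write it into the second cluster."""
        while True:
            item = pending_queue.get()
            write_to_cluster2(item)
            pending_queue.task_done()

    for _ in range(4):  # worker1..worker4 as in fig. 5
        threading.Thread(target=worker, daemon=True).start()

    for doc in ({"doc_id": 1, "gid": "id_A"}, {"doc_id": 2, "gid": "id_A"}):
        pending_queue.put(doc)
    pending_queue.join()  # wait until both items have been written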
In an optional embodiment of the present invention, the distributing at least one piece of to-be-processed data in the to-be-processed queue to at least one worker node respectively may include: according to the sequence of writing the data to be processed into the queue to be processed, sequentially distributing the data to be processed to the worker nodes according to the priority sequence of the worker nodes; the writing, by each worker node, the received data to be processed into the second cluster of the target database includes: and writing the received data to be processed into a second cluster of the target database in sequence through each worker node according to the priority sequence.
The priority order of the worker nodes may be an execution priority order of the worker nodes when executing the write operation on the second cluster, and the priority order may be preset.
Correspondingly, the data to be processed may be distributed according to the order in which it was written into the queue to be processed: data written into the queue earlier, which contains the target update data read earlier from the first cluster, is distributed first, and data written into the queue later, which contains the target update data read later from the first cluster, is distributed later. Following the priority order of the worker nodes, the data written into the queue earlier is distributed to a worker node with a higher write-execution priority, and the data written later is distributed to a worker node with a lower write-execution priority.
Further, each worker node writes the received data to be processed into the second cluster according to the priority order: the worker node with the higher execution priority writes its data first, so that the data written into the queue to be processed first is also written into the second cluster first, with later data following in turn. The target update data, carrying its data source identifier, is thus backed up to the second cluster in the same order in which it was read from the first cluster, ensuring the order consistency of read operations and write operations.
In the above embodiment, during parallel replication, the replication shard_key (granularity option) may be id, collection, or auto, and different documents or tables may enter different hash queues to be executed concurrently. Here, id means hashing by document, collection means hashing by table, and auto means automatically choosing between document and table hashing; specifically, if a unique key exists in a table, auto degenerates to collection, and otherwise it is equivalent to id. Table-level hashing can guarantee the order consistency of read and write operations on the data within one table, but it cannot guarantee the order consistency of operations between different tables; document-level hashing can guarantee the order consistency of operations on the same document, that is, on one primary key id within a table, but it cannot guarantee the order consistency of operations on different documents. In the above embodiment, the order consistency of read and write operations on the data can be ensured based on the priority order of the worker nodes.
For example, if hashing is performed by table, the tables can be distributed in turn to worker nodes worker1, worker2, worker3 and worker4 in the order in which they were written into the queue to be processed, with the priority order running from worker1 (highest execution priority) down to worker4 (lowest). Within each worker node, the order in which a table's data is written into the second cluster is then guaranteed to match the order in which it was copied into the queue to be processed, and at the same time the write order between tables matches the order in which the tables were copied into the queue, as the sketch below illustrates.
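One way to realize this priority-ordered writing is a round-robin assignment plus a shared sequence gate, sketched below under the assumption of four workers; the gating mechanism is an illustrative choice, not one prescribed by the patent.

    import threading

    NUM_WORKERS = 4  # worker1..worker4, worker1 having the highest priority
    commit_turn = 0  # global sequence number whose turn it is to write
    turn_cv = threading.Condition()

    def worker_write(seq: int, item: str) -> None:
        """Block until this item's global turn arrives, so data reaches the
        second cluster in exactly the order it entered the pending queue."""
        global commit_turn
        with turn_cv:
            while seq != commit_turn:
                turn_cv.wait()
            print(f"worker{seq % NUM_WORKERS + 1} wrote {item}")  # stand-in
            commit_turn += 1
            turn_cv.notify_all()

    items = [f"table_{i}" for i in range(8)]  # queued in write order
    threads = [threading.Thread(target=worker_write, args=(i, it))
               for i, it in enumerate(items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()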
In an optional embodiment of the invention, the method may further comprise: recording a synchronization offset between the first cluster and the second cluster during the deactivation of the working state of the failed cluster in the case that the first cluster or the second cluster is determined to be the failed cluster; and executing fault recovery operation on the fault cluster according to the synchronous offset under the condition of determining that the working state of the fault cluster is recovered.
Wherein the synchronization offset is a data amount of the target update data generated in the first cluster or the second cluster. The failure recovery operation may be an operation to recover from data synchronization errors caused by the failed cluster.
Accordingly, it may be determined that the working state of a cluster is deactivated, that is, that the cluster is a faulty cluster, by polling the cluster's liveness state or by detecting the cluster's heartbeat signal. For example, fig. 6 is a schematic diagram of a disaster recovery process of a MongoDB database according to an embodiment of the present invention. As shown in fig. 6, when a faulty cluster appears among the clusters of the target database, the DBRouter may send all operation requests to the other clusters whose working states are normal by modifying the database driver configuration interface, thereby implementing traffic forwarding and disaster recovery.
At this time, the clusters in the normal working state perform data updates as they execute the operation requests and generate target update data, and the data offset between the faulty cluster and the clusters in the normal working state may then be recorded according to the amount of target update data generated.
Optionally, when it is determined that the first cluster or the second cluster is the faulty cluster, the synchronization offset between the first cluster and the second cluster may be recorded at multiple time points while the working state of the faulty cluster is deactivated, and a timestamp may also be recorded each time the synchronization offset is recorded.
Further, once it is determined that the working state of the faulty cluster has recovered, the data synchronization gap that arose between the faulty cluster and the other, normally working clusters during the deactivation period can be determined from the recorded synchronization offset, so that a fault recovery operation can be performed on the faulty cluster according to the synchronization offset.
In an optional embodiment of the present invention, the performing a fault recovery operation on the fault cluster according to the synchronization offset may include: executing a failover operation on the failure cluster if it is determined that the synchronization offset exceeds the capacity of the queue to be processed; under the condition that the synchronous offset does not exceed the capacity of the queue to be processed, copying the data to be processed in the queue to be processed to the fault cluster with the recovered working state.
The pending queue capacity may be a preset data capacity of a queue for storing pending data. The failover operation may be an operation that migrates a database that was originally implemented by a cluster to other clusters.
Correspondingly, while the working state of the faulty cluster is deactivated, target update data produced in the other, normally working clusters, together with its data source identifier, can be copied into the queue to be processed to generate data to be processed, so that after the working state of the faulty cluster recovers, the data to be processed can be copied into it to complete data synchronization between the clusters. If the synchronization offset exceeds the capacity of the queue to be processed, the queue became full during the deactivation period and further data to be processed could not be copied into it. In that case, even if data backup is performed from the queue to be processed after the faulty cluster's working state recovers, not all target update data from the deactivation period can be synchronized, and a failover operation may instead be performed to replace the faulty cluster.
Optionally, for a MongoDB database in the same-city active-active mode, when a machine room fails and its liveness state can no longer be obtained, the synchronization offset and timestamp are recorded, a Raft algorithm (a consensus algorithm) can be adopted to synchronously record the term, and the failover of the faulty machine room is completed through DNS (Domain Name System) traffic switching of the service application and DBRouter reconfiguration, thereby realizing same-city active-active operation across machine rooms.
Correspondingly, if the synchronization offset does not exceed the capacity of the queue to be processed, all the data to be processed that needs to be backed up is held in the queue, and that data can simply be copied to the faulty cluster once its working state has recovered, realizing data synchronization between the clusters.
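The recovery decision then reduces to a comparison against the queue capacity, as in the sketch below; PENDING_CAPACITY, the offset bookkeeping, and the two handler stubs are illustrative assumptions.

    PENDING_CAPACITY = 10_000  # assumed capacity of the queue to be processed

    def perform_failover() -> None:
        print("failover: migrating the database away from the faulty cluster")

    def write_to_recovered_cluster(item: dict) -> None:
        print("resync:", item)

    def recover(sync_offset: int, pending_items: list) -> None:
        """On recovery of the faulty cluster, either replay the pending
        queue or fail over, depending on whether the offset fit in it."""
        if sync_offset > PENDING_CAPACITY:
            # The queue overflowed during the outage, so replaying it
            # cannot resynchronize every missed update: replace the cluster.
            perform_failover()
        else:
            for item in pending_items:  # replay in original order
                write_to_recovered_cluster(item)

    recover(sync_offset=12, pending_items=[{"doc_id": 1}, {"doc_id": 2}])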
In this embodiment, recording the data offset enables traffic forwarding and disaster recovery for the database clusters, ensures data synchronization between the clusters, and avoids data loss caused by cluster failures.
The embodiment of the invention provides a data backup method: target update data generated in a cluster of the target database in which a data update has occurred is acquired, the data source identifier of the target update data is acquired, data to be processed is generated from the target update data and its data source identifier, and the data to be processed is then copied to the other clusters of the target database, so that the newly generated data carrying its data source identifier is synchronized among the clusters of the target database, realizing data backup between database clusters while effectively avoiding circular replication during the backup process. Further, during data replication, the order consistency of read and write operations on the data is ensured based on the priority order of the worker nodes.
EXAMPLE III
Fig. 7 is a schematic structural diagram of a data backup device according to a third embodiment of the present invention, as shown in fig. 7, the device includes: a data acquisition module 310, a source identification module 320, and a data replication module 330.
The data obtaining module 310 is configured to obtain target update data in a first cluster of a target database.
The source identification module 320 is configured to obtain a data source identification of the target update data, and generate to-be-processed data according to the target update data and the data source identification.
A data copying module 330, configured to copy the to-be-processed data to the second cluster of the target database.
In an optional implementation manner of the embodiment of the present invention, the data obtaining module 310 may be specifically configured to: in an instance in which it is determined that the target update data exists in a first cluster of a target database, a read operation is performed on a secondary replica node of the first cluster to read the target update data.
In an optional implementation manner of the embodiment of the present invention, the source identification module 320 may be specifically configured to: acquiring a first cluster identifier of the first cluster, and determining the first cluster identifier as a data source identifier of the target update data; and writing the target updating data and the data source identification into a queue to be processed to generate data to be processed in the queue to be processed.
In an optional implementation manner of the embodiment of the present invention, the data copying module 330 may include: the data distribution submodule is used for respectively distributing at least one piece of data to be processed in the queue to be processed to at least one worker node; and the data writing sub-module is used for writing the received data to be processed into the second cluster of the target database through each worker node.
In an optional implementation manner of the embodiment of the present invention, the data distribution sub-module may be specifically configured to: according to the sequence of writing the data to be processed into the queue to be processed, sequentially distributing the data to be processed to the worker nodes according to the priority sequence of the worker nodes; the data writing submodule may be specifically configured to: and writing the received data to be processed into a second cluster of the target database in sequence through each worker node according to the priority sequence.
In an optional implementation manner of the embodiment of the present invention, the apparatus may further include: a synchronization offset recording module, configured to record, when it is determined that the first cluster or the second cluster is a faulty cluster, a synchronization offset between the first cluster and the second cluster during an inactive period of an operating state of the faulty cluster; wherein the synchronization offset is a data amount of the target update data generated in the first cluster or the second cluster; and the fault recovery module is used for executing fault recovery operation on the fault cluster according to the synchronous offset under the condition of determining that the working state of the fault cluster is recovered.
In an optional implementation manner of the embodiment of the present invention, the failure recovery module may be specifically configured to: executing a failover operation on the failure cluster if it is determined that the synchronization offset exceeds the capacity of the queue to be processed; under the condition that the synchronous offset does not exceed the capacity of the queue to be processed, copying the data to be processed in the queue to be processed to the fault cluster with the recovered working state.
The device can execute the data backup method provided by any embodiment of the invention, and has the corresponding functional module and the beneficial effect of executing the data backup method.
The embodiment of the invention provides a data backup apparatus that acquires target update data generated in a cluster of the target database in which a data update has occurred, acquires the data source identifier of the target update data, generates data to be processed from the target update data and its data source identifier, and copies the data to be processed to the other clusters of the target database, so that the newly generated data carrying its data source identifier is synchronized among the clusters of the target database, realizing data backup between database clusters and effectively avoiding circular replication during the backup process.
Example four
Fig. 8 is a schematic structural diagram of a server device according to a fourth embodiment of the present invention. FIG. 8 illustrates a block diagram of an exemplary server device 12 suitable for use in implementing embodiments of the present invention. The server device 12 shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in FIG. 8, the server device 12 is in the form of a general purpose computing device. The components of server device 12 may include, but are not limited to: one or more processors 16, a memory 28, and a bus 18 that connects the various system components (including the memory 28 and the processors 16).
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Server device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by server device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Server device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, and commonly referred to as a "hard drive"). Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The server device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the server device 12, and/or with any devices (e.g., network card, modem, etc.) that enable the server device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, server device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, the network adapter 20 communicates with the other modules of the server device 12 via the bus 18. It should be appreciated that although not shown in FIG. 8, other hardware and/or software modules may be used in conjunction with server device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 16 executes various functional applications and data processing by running the program stored in the memory 28, so as to implement the data backup method provided by the embodiment of the present invention: acquiring target updating data in a first cluster of a target database; acquiring a data source identifier of the target updating data, and generating data to be processed according to the target updating data and the data source identifier; and copying the data to be processed to a second cluster of the target database.
EXAMPLE five
Fifth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where when the computer program is executed by a processor, the computer program implements a data backup method provided in the embodiments of the present invention: acquiring target updating data in a first cluster of a target database; acquiring a data source identifier of the target updating data, and generating data to be processed according to the target updating data and the data source identifier; and copying the data to be processed to a second cluster of the target database.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or computer device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for data backup, comprising:
acquiring target updating data in a first cluster of a target database;
acquiring a data source identifier of the target updating data, and generating data to be processed according to the target updating data and the data source identifier;
and copying the data to be processed to a second cluster of the target database.
2. The method of claim 1, wherein obtaining target update data in the first cluster of target databases comprises:
in an instance in which it is determined that the target update data exists in a first cluster of a target database, a read operation is performed on a secondary replica node of the first cluster to read the target update data.
3. The method according to claim 1, wherein the obtaining of the data source identifier of the target update data and the generating of the data to be processed according to the target update data and the data source identifier comprise:
acquiring a first cluster identifier of the first cluster, and determining the first cluster identifier as a data source identifier of the target update data;
and writing the target updating data and the data source identification into a queue to be processed to generate data to be processed in the queue to be processed.
4. The method of claim 3, wherein the copying the data to be processed to the second cluster of the target database comprises:
respectively distributing at least one piece of data to be processed in the queue to be processed to at least one worker node;
and respectively writing the received data to be processed into a second cluster of the target database through the worker nodes.
5. The method of claim 4, wherein said distributing at least one of the pending data in the pending queue to at least one worker node, respectively, comprises:
according to the sequence of writing the data to be processed into the queue to be processed, sequentially distributing the data to be processed to the worker nodes according to the priority sequence of the worker nodes;
the writing, by each worker node, the received data to be processed into the second cluster of the target database includes:
and writing the received data to be processed into a second cluster of the target database in sequence through each worker node according to the priority sequence.
6. The method of claim 1, further comprising:
recording a synchronization offset between the first cluster and the second cluster during the deactivation of the working state of the failed cluster in the case that the first cluster or the second cluster is determined to be the failed cluster; wherein the synchronization offset is a data amount of the target update data generated in the first cluster or the second cluster;
and executing fault recovery operation on the fault cluster according to the synchronous offset under the condition of determining that the working state of the fault cluster is recovered.
7. The method of claim 6, wherein performing a fault recovery operation on the faulty cluster according to the synchronization offset comprises:
executing a failover operation on the failure cluster if it is determined that the synchronization offset exceeds the capacity of the queue to be processed;
under the condition that the synchronous offset does not exceed the capacity of the queue to be processed, copying the data to be processed in the queue to be processed to the fault cluster with the recovered working state.
8. A data backup apparatus, comprising:
the data acquisition module is used for acquiring target update data in a first cluster of a target database;
the source identification module is used for acquiring a data source identification of the target updating data and generating data to be processed according to the target updating data and the data source identification;
and the data copying module is used for copying the data to be processed to the second cluster of the target database.
9. A server device, characterized in that the server device comprises:
one or more processors;
storage means for storing one or more programs;
when executed by the one or more processors, cause the one or more processors to implement the data backup method of any of claims 1-7.
10. A computer storage medium having a computer program stored thereon, the program, when executed by a processor, implementing a data backup method according to any one of claims 1-7.
CN202110495305.4A (filed 2021-05-07, priority 2021-05-07): Data backup method, device, equipment and storage medium. Pending. Published as CN113064766A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110495305.4A | 2021-05-07 | 2021-05-07 | Data backup method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110495305.4A | 2021-05-07 | 2021-05-07 | Data backup method, device, equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN113064766A | 2021-07-02

Family

ID=76568371

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110495305.4A (published as CN113064766A, Pending) | Data backup method, device, equipment and storage medium | 2021-05-07 | 2021-05-07

Country Status (1)

Country | Link
CN | CN113064766A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106445409A (en) * 2016-09-13 2017-02-22 郑州云海信息技术有限公司 Distributed block storage data writing method and device
US20180300203A1 (en) * 2017-04-18 2018-10-18 Netapp, Inc. Systems and methods for backup and restore of distributed master-slave database clusters
CN107423404A (en) * 2017-07-27 2017-12-01 东软集团股份有限公司 Flow instance data synchronizing processing method and device
CN109561151A (en) * 2018-12-12 2019-04-02 北京达佳互联信息技术有限公司 Date storage method, device, server and storage medium
CN109788053A (en) * 2019-01-04 2019-05-21 深圳壹账通智能科技有限公司 Method of data synchronization and system
CN110990432A (en) * 2019-11-18 2020-04-10 北京禧云信息科技有限公司 Device and method for synchronizing distributed cache clusters across machine rooms
CN111090648A (en) * 2019-12-07 2020-05-01 杭州安恒信息技术股份有限公司 Relational database data synchronization conflict resolution method
CN112163038A (en) * 2020-09-18 2021-01-01 中国建设银行股份有限公司 Cross-cluster data synchronization method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
林夕_YUME: "2.Otter原理介绍" ("Introduction to the Principles of Otter"), pages 2-4, retrieved from the Internet: <URL:https://blog.csdn.net/weixin_39545874/article/details/113860536> *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779143A (en) * 2021-08-20 2021-12-10 中国邮政储蓄银行股份有限公司 Double-activity data center and business system
CN114816790A (en) * 2022-04-11 2022-07-29 深圳乐信软件技术有限公司 Data processing method, device, equipment and medium for tangential measurement process

Similar Documents

Publication Publication Date Title
US9141685B2 (en) Front end and backend replicated storage
US10303570B2 (en) Method and apparatus for managing data recovery of distributed storage system
US10402115B2 (en) State machine abstraction for log-based consensus protocols
US8433947B2 (en) Computer program, method, and apparatus for controlling data allocation
CN107533499B (en) Method and system for performing failover between storage systems
JP5102901B2 (en) Method and system for maintaining data integrity between multiple data servers across a data center
US8127174B1 (en) Method and apparatus for performing transparent in-memory checkpointing
US20100023564A1 (en) Synchronous replication for fault tolerance
CN106776130B (en) Log recovery method, storage device and storage node
US8028192B1 (en) Method and system for rapid failback of a computer system in a disaster recovery environment
US10216589B2 (en) Smart data replication recoverer
US10719407B1 (en) Backing up availability group databases configured on multi-node virtual servers
US7644301B2 (en) Fault management system in multistage copy configuration
US10185636B2 (en) Method and apparatus to virtualize remote copy pair in three data center configuration
US10719244B2 (en) Multi-mode data replication for data loss risk reduction
US9378078B2 (en) Controlling method, information processing apparatus, storage medium, and method of detecting failure
JP2007518195A (en) Cluster database using remote data mirroring
CN113064766A (en) Data backup method, device, equipment and storage medium
US20130013566A1 (en) Storage group synchronization in data replication environments
US20090063486A1 (en) Data replication using a shared resource
CN106873902B (en) File storage system, data scheduling method and data node
CN110633046A (en) Storage method and device of distributed system, storage equipment and storage medium
JP3967499B2 (en) Restoring on a multicomputer system
US10169441B2 (en) Synchronous data replication in a content management system
CN116389233B (en) Container cloud management platform active-standby switching system, method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination