CN109144785B - Method and apparatus for backing up data - Google Patents

Method and apparatus for backing up data

Info

Publication number
CN109144785B
Authority
CN
China
Prior art keywords
database
data
backup
transaction
transaction identifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810980327.8A
Other languages
Chinese (zh)
Other versions
CN109144785A (en)
Inventor
江俊汝
周坤龙
赖宝华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810980327.8A
Publication of CN109144785A
Application granted
Publication of CN109144785B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore

Abstract

Embodiments of the present application disclose a method and an apparatus for backing up data. One embodiment of the method comprises: receiving a data backup instruction, wherein the data backup instruction instructs that data in a master database included in a database shard be backed up, and the database shard further includes a slave database; controlling the slave database to suspend backup of the data of the master database, and controlling the slave database to acquire a binary log from the master database; acquiring identifiers of committed transactions from a third-party server to obtain a transaction identifier set; determining a first target transaction identifier from the transaction identifier set; and controlling the slave database to back up the data in the master database according to the binary log and the first target transaction identifier. This embodiment achieves strong consistency of the distributed database.

Description

Method and apparatus for backing up data
Technical Field
The embodiment of the application relates to the technical field of databases, in particular to a method and a device for backing up data.
Background
With the rapid development of computer technology, society is becoming increasingly information-driven, more data is generated every day, and many applications need to process massive amounts of data, which places very high demands on data storage and retrieval. Databases, as tools for processing structured data, face a major challenge: a database deployed on a single machine can no longer keep up with the processing demands of rapidly growing data volumes. Distributed databases emerged to meet this challenge. Through a distributed architecture, the storage and querying of data can be spread across multiple nodes, so that system capacity and performance can be increased by adding nodes, greatly increasing the volume of data the database can handle.
To ensure data security, backup (especially hot backup) and recovery are essential functions of a database. A hot backup is a backup taken while the database continues to provide service. Many databases currently provide tools and methods for hot backup and recovery; for distributed databases, however, the distributed architecture makes hot backup and recovery technically more demanding.
Disclosure of Invention
The embodiment of the application provides a method and a device for backing up data.
In a first aspect, an embodiment of the present application provides a method for backing up data, including: receiving a data backup instruction, wherein the data backup instruction instructs that data in a master database included in a database shard be backed up, and the database shard further includes a slave database; controlling the slave database to suspend backup of the data of the master database, and controlling the slave database to acquire a binary log from the master database; acquiring identifiers of committed transactions from a third-party server to obtain a transaction identifier set; determining a first target transaction identifier according to the transaction identifier set; and controlling the slave database to back up the data in the master database according to the binary log and the first target transaction identifier.
In some embodiments, a first thread and a second thread run in the slave database; and the controlling the slave database to suspend backup of the data of the master database and controlling the slave database to acquire the binary log from the master database comprises: controlling the first thread of the slave database to suspend backup of the data of the master database, and controlling the second thread of the slave database to acquire the binary log from the master database.
In some embodiments, the controlling the slave database to back up the data in the master database according to the binary log and the first target transaction identifier includes: controlling the first thread of the slave database to replay, from the binary log, the transactions committed before the transaction corresponding to the first target transaction identifier was committed, so as to back up the data in the master database.
In some embodiments, the acquiring identifiers of committed transactions from the third-party server includes: acquiring the transaction identifier set of committed transactions from the third-party server at intervals of a first preset duration; and the method further comprises: storing the transaction identifier set and its acquisition time in a preset metadata database.
In some embodiments, the method further comprises: controlling the slave database to upload the backup file obtained upon completion of the backup to a cloud server; and updating backup state information in the metadata database in response to completion of the upload of the backup file.
In some embodiments, the controlling the slave database to acquire the binary log from the master database includes: acquiring the binary log from the master database at intervals of a second preset duration; and the method further comprises: storing the acquired binary log; and storing at least one of the following items of information about the binary log in the metadata database: a storage path, a name, a storage time, at least one transaction identifier included, and a commit time of the at least one transaction identifier.
In some embodiments, the method further comprises: receiving a data recovery instruction, wherein the data recovery instruction includes a data recovery time point; determining, according to information recorded in the metadata database, a second target transaction identifier of a transaction committed at the data recovery time point; determining the binary log to which the second target transaction identifier belongs as a target binary log; and controlling the slave database and the master database to recover data based on the target binary log.
In a second aspect, an embodiment of the present application provides an apparatus for backing up data, including: a backup instruction receiving unit configured to receive a data backup instruction, wherein the data backup instruction instructs that data in a master database included in a database shard be backed up, and the database shard further includes a slave database; a log obtaining unit configured to control the slave database to suspend backup of the data of the master database and to control the slave database to acquire a binary log from the master database; an identifier obtaining unit configured to acquire identifiers of committed transactions from a third-party server to obtain a transaction identifier set; a first identifier determination unit configured to determine a first target transaction identifier according to the transaction identifier set; and a data backup unit configured to control the slave database to back up the data in the master database according to the binary log and the first target transaction identifier.
In some embodiments, a first thread and a second thread run in the slave database; and the log obtaining unit is further configured to: control the first thread of the slave database to suspend backup of the data of the master database, and control the second thread of the slave database to acquire the binary log from the master database.
In some embodiments, the data backup unit is further configured to: control the first thread of the slave database to replay, from the binary log, the transactions committed before the transaction corresponding to the first target transaction identifier was committed, so as to back up the data in the master database.
In some embodiments, the identifier obtaining unit is further configured to: acquire the transaction identifier set of committed transactions from the third-party server at intervals of a first preset duration; and the apparatus further comprises: a storage unit configured to store the transaction identifier set and its acquisition time in a preset metadata database.
In some embodiments, the apparatus further comprises: a backup file uploading unit configured to control the slave database to upload the backup file obtained upon completion of the backup to a cloud server; and the storage unit is further configured to: update backup state information in the metadata database in response to completion of the upload of the backup file.
In some embodiments, the log obtaining unit is further configured to: acquire the binary log from the master database at intervals of a second preset duration; and the storage unit is further configured to: store the acquired binary log; and store at least one of the following items of information about the binary log in the metadata database: a storage path, a name, a storage time, at least one transaction identifier included, and a commit time of the at least one transaction identifier.
In some embodiments, the apparatus further comprises: a recovery instruction receiving unit configured to receive a data recovery instruction including a data recovery time point; a second identifier determination unit configured to determine, based on information recorded in the metadata database, a second target transaction identifier of a transaction committed at the data recovery time point; a log determining unit configured to determine the binary log to which the second target transaction identifier belongs as a target binary log; and a data recovery unit configured to control the slave database and the master database to recover data based on the target binary log.
In a third aspect, an embodiment of the present application provides an apparatus, including: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any embodiment of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method as described in any one of the embodiments of the first aspect.
According to the method and apparatus for backing up data provided by the above embodiments of the present application, after the data backup instruction is received, the slave database included in a database shard of the distributed database may be controlled to suspend backup of the data of the master database included in that database shard, and the slave database may be controlled to acquire the binary log from the master database. Identifiers of committed transactions may be acquired from a third-party server to obtain a transaction identifier set. A first target transaction identifier may then be determined from the transaction identifier set. Finally, the slave database is controlled to back up the data in the master database according to the binary log and the first target transaction identifier. The method and apparatus of these embodiments can achieve strong consistency across the database shards of the distributed database.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a diagram of an exemplary system architecture in which an embodiment of the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for backing up data according to the present application;
FIG. 3 is a schematic diagram of one application scenario of a method for backing up data according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for backing up data according to the present application;
FIG. 5 is a structural schematic diagram of one embodiment of an apparatus for backing up data according to the present application;
FIG. 6 is a block diagram of a computer system suitable for use in implementing a local transaction manager of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for backing up data or the apparatus for backing up data of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include a terminal device 101, a distributed transaction manager (DTM) 102, a distributed database 103, and a third-party server 104. The terminal device 101 and the distributed transaction manager 102, the distributed transaction manager 102 and the distributed database 103, the distributed transaction manager 102 and the third-party server 104, and the distributed database 103 and the third-party server 104 may each interact through a network. The network may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
A user may use the terminal device 101 to interact with the distributed transaction manager 102 to receive or send messages or the like. Various communication client applications, such as database management applications, shopping-like applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal device 101.
The terminal device 101 may be hardware or software. When the terminal device 101 is hardware, it may be any of various electronic devices having a display screen, including but not limited to a smartphone, a tablet computer, a laptop computer, a desktop computer, and the like. When the terminal device 101 is software, it can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed here.
Distributed transaction manager 102 may be a device for managing transactions of a distributed database. The distributed transaction manager 102 may process the received data such as the database operation request, and feed back the processing result to the terminal device 101.
It should be noted that the distributed transaction manager 102 may be hardware or software. When distributed transaction manager 102 is hardware, it may be implemented as a distributed transaction manager cluster of multiple distributed transaction managers. When the distributed transaction manager 102 is software, it may be implemented as a plurality of software or software modules.
The distributed database 103 may be any of various databases that provide data storage services. The distributed database 103 may include a plurality of database shards 1031, 1032, 1033. Each database shard may include a master database and a slave database. Each database shard may also include a local transaction manager (Local Transaction Manager), which manages the transactions of that database shard.
The third party server 104 may be used to store identifiers of transactions that the distributed database 103 has committed. The distributed transaction manager 102 may send an identifier of the committed transaction to the third party server 104 after the transaction is committed. Each database shard in distributed database 103 may obtain an identifier of the committed transaction from third party server 104.
It should be noted that the third-party server 104 may be hardware or software. When the third-party server 104 is hardware, it may be implemented as a cluster of third-party servers 104 consisting of a plurality of third-party servers 104. When the third party server 104 is software, it may be implemented as a plurality of software or software modules.
The method for backing up data provided by the embodiment of the application is generally executed by a local transaction manager. Accordingly, the means for backing up data is typically located in the local transaction manager.
It should be understood that the numbers of terminal devices, distributed transaction managers, database shards in the distributed database, and third-party servers in fig. 1 are merely illustrative. There may be any number of terminal devices, distributed transaction managers, database shards in the distributed database, and third-party servers, depending on implementation needs.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for backing up data in accordance with the present application is shown. The method for backing up data of the embodiment comprises the following steps:
step 201, receiving a data backup instruction.
In this embodiment, an execution subject of the method for backing up data (e.g., the local transaction manager shown in fig. 1) may receive a data backup instruction. The data backup instruction may be sent to the execution subject by a distributed transaction manager (e.g., the distributed transaction manager 102 shown in fig. 1), or may be triggered by a timed instruction inside the execution subject.
The data backup instruction instructs that the data in the master database included in a database shard be backed up. Sharding is an effective way to spread a database across multiple physical nodes, and its main purpose is to overcome the input/output limits of a single-node database server. Each physical node may be referred to as a database shard. Each database shard stores different data, and different shards can process different transactions simultaneously. A transaction is the smallest logical unit of work that accesses a database in order to implement a particular business function. A transaction is a sequence of operations.
In a distributed database system, a database operation request sent by a user to the distributed transaction manager through the terminal device is ultimately converted into a sequence of access operations on the database shards distributed across the corresponding sites in the network. A transaction in a distributed database system is therefore a distributed sequence of operations, called a distributed transaction. After a database shard finishes executing its sequence of operations, it sends a prepare-to-commit message to the execution subject. Upon receiving the prepare-to-commit message, the execution subject sends a ready message to the distributed transaction manager. The distributed transaction manager commits the distributed transaction after receiving the ready message sent by each local transaction manager.
To improve the disaster tolerance, load balancing capability, and the like of each database shard, a master database and a slave database may be arranged in each database shard. In this way, when a problem occurs in the master database, the slave database can continue to process database requests without affecting the user experience. In this embodiment, the database shard further includes a slave database. The slave database may back up the data in the master database.
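The commit rule described above (the distributed transaction manager commits only after every local transaction manager has reported ready) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name, the `on_ready` method, and the simplifying assumption that every shard participates in every transaction are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DistributedTransactionManager:
    """Hypothetical coordinator: commits once every shard reports ready."""

    shard_ids: set                                   # assumed: all shards take part
    ready: dict = field(default_factory=dict)        # txn id -> shards that are ready
    committed: list = field(default_factory=list)    # txn ids in commit order

    def on_ready(self, txn_id: str, shard_id: str) -> bool:
        """Record a ready message; return True if it triggers the commit."""
        if txn_id in self.committed:
            return False  # already committed, ignore duplicates
        self.ready.setdefault(txn_id, set()).add(shard_id)
        if self.ready[txn_id] == self.shard_ids:
            self.committed.append(txn_id)
            return True
        return False
```

In this sketch the `committed` list plays the role of the identifiers that the distributed transaction manager would forward to the third-party server after each commit.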
Step 202, controlling the slave database to suspend backup of the data of the master database, and controlling the slave database to acquire the binary log from the master database.
After receiving the data backup instruction, the execution subject can control the slave database in the database shard to suspend backup of the data in the master database. Specifically, the execution subject may send a control instruction to the slave database to make it suspend backup of the data in the master database. Likewise, the execution subject may send another control instruction to the slave database to make it acquire the binary log from the master database. The binary log records the modifications that occur in the database, such as data changes and the creation and modification of tables. The binary log may further include a plurality of transaction identifiers, recorded in order of execution time, and the database operation statements corresponding to each transaction identifier. A transaction identifier is generated by the distributed transaction manager after it receives the database operation request sent by the terminal device, and is then forwarded to each database shard. The slave database can acquire the binary log of the master database and then execute the operation statements in it, thereby keeping the slave database synchronized with the master database.
In some optional implementations of this embodiment, a first thread and a second thread may run in the slave database. The first thread is used for synchronizing the data in the master database, and the second thread is used for acquiring the binary log from the master database. Step 202 may be implemented by the following steps not shown in fig. 2: controlling the first thread of the slave database to suspend backup of the data of the master database, and controlling the second thread of the slave database to acquire the binary log from the master database.
In this implementation, the first thread may be the SQL thread of the slave database, which is used to synchronize the data in the master database.
After receiving the request, the master database may, through an I/O thread running on it, read from the storage location the binary logs that follow the binary log named in the request, and return them to the I/O thread of the slave database. Here, the binary logs that follow the named binary log are those whose generation time is later than the generation time of the named binary log.
Step 203, acquiring identifiers of committed transactions from the third-party server to obtain a transaction identifier set.
In this embodiment, the identifiers of committed transactions may be stored in the third-party server. After committing a distributed transaction, the distributed transaction manager can send the identifier of the committed distributed transaction to the third-party server, and the third-party server may store it. During data backup, the local transaction manager of each database shard may acquire the identifiers of committed transactions from the third-party server to obtain a transaction identifier set. In practical applications, the third-party server may be a Redis server.
Step 204, determining a first target transaction identifier according to the transaction identifier set.
After obtaining the transaction identifier set, the execution subject may determine a transaction identifier from the transaction identifier set as the first target transaction identifier. It will be appreciated that in order to ensure strong consistency of the distributed database, the first target transaction identifier determined by the local transaction manager of each database shard is the same. In a specific implementation, the local transaction manager of each database shard may agree in advance to use the identifier of the transaction with the latest commit time in the transaction identifier set as the first target transaction identifier.
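The convention mentioned above, taking the identifier of the transaction with the latest commit time, can be sketched as a small pure function. The function name, the dictionary layout (identifier mapped to commit time), and the tie-break on the identifier itself are assumptions added for illustration; the text only states that the shards agree on the latest-committed transaction.

```python
from typing import Dict


def pick_first_target(committed: Dict[str, float]) -> str:
    """Return the identifier of the transaction with the latest commit time.

    `committed` maps transaction identifier -> commit time (assumed layout).
    """
    if not committed:
        raise ValueError("empty transaction identifier set")
    # Tie-break on the identifier (an assumption) so that every shard
    # deterministically picks the same target from the same set.
    return max(committed, key=lambda xid: (committed[xid], xid))
```

Because the function depends only on its input, every local transaction manager that sees the same transaction identifier set selects the same first target transaction identifier, which is the property the consistency argument relies on.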
And step 205, controlling the slave database to backup the data in the master database according to the binary log and the first target transaction identifier.
The execution subject can determine the transaction identifiers in the binary log whose execution time is before the execution time corresponding to the first target transaction identifier, and then control the SQL thread of the slave database to execute the operation statements corresponding to the determined transaction identifiers, thereby backing up the data in the master database.
In some optional implementations of this embodiment, step 205 may be specifically implemented by the following steps not shown in fig. 2: controlling the first thread of the slave database to replay, from the binary log, the transactions committed before the transaction corresponding to the first target transaction identifier was committed, so as to back up the data in the master database.
In this implementation, the execution subject may send a "start slave sql_thread negative xid" command to the SQL thread of the slave database, to control the SQL thread to start synchronizing the data in the master database and to stop at the transaction indicated by the first target transaction identifier.
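The selection of transactions to replay can be illustrated with a minimal sketch. The `BinlogEntry` type and the function name are hypothetical, a real binary log is not a Python list, and the choice to exclude the target transaction itself follows the "before" wording above and is an assumption.

```python
from typing import List, NamedTuple


class BinlogEntry(NamedTuple):
    """Hypothetical in-memory view of one transaction in the binary log."""
    xid: str
    commit_time: float
    statements: List[str]


def entries_to_replay(binlog: List[BinlogEntry], target_xid: str) -> List[BinlogEntry]:
    """Return the entries committed before the target transaction, in commit order."""
    replay = []
    for entry in sorted(binlog, key=lambda e: e.commit_time):
        if entry.xid == target_xid:
            return replay  # stop at the target; it is not replayed itself
        replay.append(entry)
    raise LookupError(f"transaction {target_xid!r} not found in binary log")
```

Raising an error when the target is absent reflects a design choice in this sketch: if the slave has not yet received the log containing the target transaction, replaying everything available would not yield a consistent cut across shards.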
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for backing up data according to the present embodiment. In the application scenario of fig. 3, after receiving the data backup instruction, the local transaction manager 301 sends a message to the slave database 302 instructing it to stop the backup and acquire the binary log. The slave database 302 acquires the binary log from the master database 303. The local transaction manager 301 acquires the transaction identifier set from the third-party server 304, determines the first target transaction identifier, and sends it to the slave database 302. The slave database 302 starts the backup upon receiving the first target transaction identifier.
According to the method for backing up data provided by the above embodiment of the application, after the data backup instruction is received, the slave database included in a database shard of the distributed database may be controlled to suspend backup of the data of the master database included in that database shard, and the slave database may be controlled to acquire the binary log from the master database. Identifiers of committed transactions may also be acquired from a third-party server to obtain a transaction identifier set. A first target transaction identifier may then be determined from the transaction identifier set. Finally, the slave database is controlled to back up the data in the master database according to the binary log and the first target transaction identifier. The method of this embodiment can achieve strong consistency across the database shards of the distributed database.
In some optional implementations of this embodiment, step 203 may specifically include the following step not shown in fig. 2: acquiring the transaction identifier set of committed transactions from the third-party server at intervals of a first preset duration. Accordingly, the method for backing up data may further include the following step not shown in fig. 2: storing the transaction identifier set and its acquisition time in a preset metadata database.
In this implementation, the execution subject may obtain the transaction identifier set of the committed transaction from the third-party server at a first preset duration (e.g., 1 second). And then storing the transaction identifier set and the acquisition time acquired each time into a preset metadata base. The metadata database may include a plurality of metadata tables, each for storing a set of transaction identifiers and a retrieval time.
In some optional implementations, when the execution subject obtains the transaction identifier set from the third-party server, the execution subject may further obtain a timestamp of the third-party server, and use the timestamp as the obtaining time of the transaction identifier set.
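Storing each acquired transaction identifier set together with its acquisition time might look like the sketch below. SQLite stands in for the metadata database, and the table name, column names, and JSON encoding of the set are assumptions; the text does not specify a schema. The periodic polling itself is omitted: a caller would invoke `record_snapshot` once per first preset duration, passing the timestamp obtained from the third-party server.

```python
import json
import sqlite3


def open_metadata_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) metadata database and create the snapshot table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS xid_snapshots ("
        "acquired_at REAL NOT NULL, xids TEXT NOT NULL)"
    )
    return conn


def record_snapshot(conn: sqlite3.Connection, xids: set, acquired_at: float) -> None:
    """Store one transaction identifier set with its acquisition time."""
    conn.execute(
        "INSERT INTO xid_snapshots (acquired_at, xids) VALUES (?, ?)",
        (acquired_at, json.dumps(sorted(xids))),
    )
    conn.commit()


def latest_snapshot(conn: sqlite3.Connection):
    """Return (acquired_at, sorted list of identifiers) of the newest snapshot, or None."""
    row = conn.execute(
        "SELECT acquired_at, xids FROM xid_snapshots "
        "ORDER BY acquired_at DESC LIMIT 1"
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))
```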
In some optional implementations of this embodiment, the method may further include the following steps not shown in fig. 2: controlling the slave database to upload the backup file obtained upon completion of the backup to a cloud server; and updating the backup state information in the metadata database in response to completion of the upload of the backup file.
In this implementation, the slave database may notify the execution subject after the backup is completed. For example, the slave database may send a backup-complete message to the execution subject to indicate that the backup is complete. After receiving the message, the execution subject may control the slave database to upload the backup file obtained from the backup to the cloud server. Specifically, the execution subject may send the slave database a control instruction to upload the backup file, and the slave database may upload the backup file to the cloud server after receiving the control instruction. After the upload is complete, the slave database may notify the execution subject. For example, the slave database may send an upload-complete message to the execution subject to indicate that the upload of the backup file is complete. Upon receiving the message, the execution subject may update the backup state information in the metadata database. In particular, the execution subject may set the backup state information in the metadata database to "backed up" or to other information indicating that the backup has been completed.
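The message exchange above amounts to a small state machine on the execution subject's side. The sketch below is illustrative only: the message names, the intermediate state names, and the `commands` list standing in for control instructions sent to the slave database are hypothetical; only the final "backed up" state comes from the text.

```python
class BackupTracker:
    """Hypothetical tracker for the backup/upload handshake with the slave database."""

    def __init__(self) -> None:
        self.state = "backing up"   # assumed initial state
        self.commands = []          # control instructions "sent" to the slave database

    def on_message(self, message: str) -> str:
        """Handle a notification from the slave database and return the new state."""
        if message == "backup_complete" and self.state == "backing up":
            # Backup finished: instruct the slave database to upload the file.
            self.commands.append("upload_backup_file")
            self.state = "uploading"
        elif message == "upload_complete" and self.state == "uploading":
            # Upload finished: this is where the metadata database would be updated.
            self.state = "backed up"
        return self.state
```

Messages that arrive out of order are ignored in this sketch, so the state cannot reach "backed up" unless the backup-complete notification was seen first.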
In some optional implementations of this embodiment, step 202 may further include the following step not shown in fig. 2: acquiring a binary log from the master database at intervals of a second preset duration. Correspondingly, the above method may further include the following steps not shown in fig. 2: storing the obtained binary log; and storing at least one of the following items of information about the binary log in the metadata database: a storage path, a name, a storage time, at least one transaction identifier included, and a commit time of the at least one transaction identifier.
In this implementation, the execution subject may obtain the binary log from the master database at intervals of a second preset duration. Specifically, the execution subject may emulate a slave database by running a process, and obtain the binary log from the master database through a thread of that process. The thread may be scheduled, for example through a crontab command, to acquire the binary log from the master database once every second preset duration. The execution subject may also store the acquired binary log locally or in a database. The execution subject may then store information about the acquired binary log in the metadata database; this information may include, but is not limited to: a storage path, a name, a storage time, at least one transaction identifier included, and a commit time of the at least one transaction identifier. Specifically, the I/O thread of the slave database may send this information to the execution subject when the binary log is acquired, and the execution subject stores the information in the metadata database after receiving it.
Through these implementations, meta-information about the incremental part of the master database's log data can be backed up, so that the incremental log data can be located quickly.
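As an illustration of the meta-information listed above, the following sketch records one entry per stored binary log and looks up the logs stored after a given time. The names `BinlogMeta`, `BinlogCatalog`, and `stored_after` are hypothetical, and an in-memory list stands in for the metadata database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BinlogMeta:
    storage_path: str
    name: str
    storage_time: float
    # Maps each transaction identifier in the log to its commit time.
    commit_times: Dict[str, float] = field(default_factory=dict)


class BinlogCatalog:
    """In-memory stand-in for the binary log tables of the metadata database."""

    def __init__(self) -> None:
        self._entries: List[BinlogMeta] = []

    def register(self, meta: BinlogMeta) -> None:
        self._entries.append(meta)

    def stored_after(self, timestamp: float) -> List[BinlogMeta]:
        # Return the incremental logs stored after the given time, oldest first.
        newer = [m for m in self._entries if m.storage_time > timestamp]
        return sorted(newer, key=lambda m: m.storage_time)
```

Keeping the commit time next to each transaction identifier is what later allows a time point to be mapped back to a transaction and then to a binary log.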
With continued reference to FIG. 4, a flow 400 of another embodiment of a method for backing up data according to the present application is shown. As shown in fig. 4, the method for backing up data of the present embodiment may include the following steps:
step 401, a data recovery instruction is received.
In this embodiment, the user may send the data recovery instruction to the distributed transaction manager through a terminal device, and the distributed transaction manager may then forward the instruction to the execution subject. The data recovery instruction may include a data recovery time point, which indicates that the data in the master database and the slave database is to be restored to its state at that time point.
Step 402, determining a second target transaction identifier submitted at the data recovery time point based on the information recorded in the metadata repository.
After receiving the data recovery instruction, the execution subject may determine, according to the information recorded in the metadata database, a second target transaction identifier submitted at the data recovery time point. Since the commit time of each transaction identifier is recorded in the metadata database, the data recovery time point can be matched against the commit times to obtain the second target transaction identifier committed at the data recovery time point.
Step 403, determining that the binary log to which the second target transaction identifier belongs is a target binary log.
After determining the second target transaction identifier, the execution subject may further determine, according to the information recorded in the metadata database, the binary log to which the second target transaction identifier belongs. Since the transaction identifiers included in each binary log are recorded in the metadata database, the binary log to which the second target transaction identifier belongs can be found by matching the second target transaction identifier against those transaction identifiers, and that binary log is determined as the target binary log.
Step 404, controlling the slave database and the master database to recover data based on the target binary log.
After determining the target binary log, the execution subject may determine the other binary logs whose generation time is later than the generation time of the target binary log. The execution subject may then control the master database and the slave database to recover data from the target binary log and these other binary logs.
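Steps 402 to 404 can be sketched as two lookups over the recorded meta-information. The sketch below is illustrative only and rests on assumptions: `LogRecord` and both function names are hypothetical, and because a recovery time point may not coincide exactly with any commit time, the sketch assumes the second target is the transaction with the latest commit time not later than the recovery time point. The text itself only says that the time point is matched against the commit times.

```python
from typing import List, NamedTuple, Optional, Tuple


class LogRecord(NamedTuple):
    name: str
    generated_at: float
    # (transaction identifier, commit time) pairs recorded for this log.
    commits: Tuple[Tuple[str, float], ...]


def find_second_target(
    logs: List[LogRecord], recovery_point: float
) -> Optional[Tuple[str, LogRecord]]:
    """Return (second target transaction id, target binary log), or None."""
    best: Optional[Tuple[str, float, LogRecord]] = None
    for log in logs:
        for txn_id, commit_time in log.commits:
            # Assumption: pick the latest commit at or before the time point.
            if commit_time <= recovery_point and (
                best is None or commit_time > best[1]
            ):
                best = (txn_id, commit_time, log)
    if best is None:
        return None
    return best[0], best[2]


def logs_for_recovery(logs: List[LogRecord], target: LogRecord) -> List[LogRecord]:
    # Target log plus the logs generated after it, as described in step 404.
    later = [log for log in logs if log.generated_at > target.generated_at]
    return [target] + sorted(later, key=lambda log: log.generated_at)
```

A caller would first resolve the target with `find_second_target` and then pass the resulting log to `logs_for_recovery` to obtain the list of logs handed to the master and slave databases.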
The method for backing up data provided by the above embodiment of the application can realize data recovery at any time point; meanwhile, when data is recovered, a plurality of database fragments can be recovered in parallel, so that the time required by data recovery is reduced, and the efficiency of data recovery is improved.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for backing up data, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for backing up data of the present embodiment includes: a backup instruction receiving unit 501, a log obtaining unit 502, an identifier obtaining unit 503, a first identifier determining unit 504, and a data backup unit 505.
The backup instruction receiving unit 501 is configured to receive a data backup instruction. The data backup instruction is used for indicating that data in a master database included in a database shard is to be backed up, and the database shard further includes a slave database.
A log obtaining unit 502 configured to control the slave database to suspend backup of data of the master database and to control the slave database to obtain the binary log from the master database.
An identifier obtaining unit 503 is configured to obtain an identifier of the committed transaction from the third-party server, resulting in a transaction identifier set.
A first identifier determination unit 504 configured to determine a first target transaction identifier from the set of transaction identifiers.
And a data backup unit 505 configured to control the slave database to backup the data in the master database according to the binary log and the first target transaction identifier.
In some optional implementations of the present embodiment, a first thread and a second thread run in the slave database. The log obtaining unit 502 may be further configured to: control the first thread of the slave database to suspend backup of data of the master database, and control the second thread of the slave database to obtain the binary log from the master database.
In some optional implementations of this embodiment, the data backup unit 505 may be further configured to: control the first thread of the slave database to replay the transactions in the binary log that were committed before the transaction corresponding to the first target transaction identifier was committed, so as to back up the data in the master database.
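The replay boundary can be illustrated with a short sketch. It assumes, as the claims of this application state, that the first target transaction identifier is the identifier with the latest commit time in the transaction identifier set. The function names and the representation of the binary log as (identifier, commit time) pairs are hypothetical, and the sketch reads "committed before" literally, so the target transaction itself is excluded; whether it should be included is not settled by the text.

```python
from typing import Dict, List, Tuple


def pick_first_target(committed: Dict[str, float]) -> str:
    """Pick the identifier with the latest commit time from the set."""
    return max(committed, key=lambda txn_id: committed[txn_id])


def transactions_to_replay(
    binlog_commits: List[Tuple[str, float]], target_commit_time: float
) -> List[str]:
    """Return, in commit order, the transactions committed before the target."""
    ordered = sorted(binlog_commits, key=lambda item: item[1])
    return [
        txn_id
        for txn_id, commit_time in ordered
        if commit_time < target_commit_time  # strictly before the target
    ]
```

Because every database shard derives the same first target from the same identifier set, each shard stops replay at the same boundary, which is what the application relies on for consistency across shards.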
In some optional implementations of this embodiment, the identifier obtaining unit 503 may be further configured to: acquire the transaction identifier set of committed transactions from the third-party server at intervals of a first preset duration. The apparatus 500 may further include a storage unit, not shown in fig. 5, configured to store the transaction identifier set and the acquisition time in a preset metadata database.
In some optional implementations of this embodiment, the apparatus 500 may further include a backup file uploading unit, not shown in fig. 5, configured to control the slave database to upload the backup file obtained by completing the backup to the cloud server. The storage unit may be further configured to: update the backup state information in the metadata database in response to the uploading of the backup file being completed.
In some optional implementations of the present embodiment, the log obtaining unit 502 may be further configured to: acquire a binary log from the master database at intervals of a second preset duration. The storage unit may be further configured to: store the obtained binary log; and store at least one of the following items of information about the binary log in the metadata database: a storage path, a name, a storage time, at least one transaction identifier included, and a commit time of the at least one transaction identifier.
In some optional implementations of this embodiment, the apparatus 500 may further include a recovery instruction receiving unit, a second identifier determining unit, a log determining unit, and a data recovery unit, which are not shown in fig. 5.
Wherein the recovery instruction receiving unit is configured to receive a data recovery instruction. The data recovery instruction includes a data recovery time point.
A second identifier determination unit configured to determine a second target transaction identifier submitted at the data recovery time point based on information recorded in the metadata database.
A log determination unit configured to determine that the binary log to which the second target transaction identifier belongs is a target binary log.
And a data recovery unit configured to control the slave database and the master database to recover data based on the target binary log.
After receiving the data backup instruction, the apparatus for backing up data provided in the foregoing embodiment of the present application may control the slave database included in a database shard of the distributed database to suspend backing up the data of the master database included in that database shard, and may control the slave database to obtain the binary log from the master database. Identifiers of committed transactions can also be obtained from a third-party server, resulting in a transaction identifier set. A first target transaction identifier may then be determined from the transaction identifier set. Finally, the slave database is controlled to back up the data in the master database according to the binary log and the first target transaction identifier. The apparatus of this embodiment can achieve strong consistency across the database shards of the distributed database.
It should be understood that units 501 to 505 recited in the apparatus 500 for backing up data correspond to respective steps in the method described with reference to fig. 2. Thus, the operations and features described above for the method for backing up data are equally applicable to the apparatus 500 and the units included therein and will not be described again here.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use in implementing the local transaction manager of embodiments of the present application is shown. The local transaction manager shown in fig. 6 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 606 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An Input/Output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: a storage section 606 including a hard disk or the like; and a communication section 607 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 607 performs communication processing via a network such as the Internet. A drive 608 is also connected to the I/O interface 605 as necessary. A removable medium 609, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 608 as necessary, so that a computer program read out therefrom is installed into the storage section 606 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 607 and/or installed from the removable medium 609. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a backup instruction receiving unit, a log obtaining unit, an identifier obtaining unit, a first identifier determining unit, and a data backup unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the backup instruction receiving unit may also be described as a "unit that receives a data backup instruction".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: receive a data backup instruction, wherein the data backup instruction is used for indicating that data in a master database included in a database shard is to be backed up, and the database shard further includes a slave database; control the slave database to suspend backing up the data of the master database and to obtain the binary log from the master database; acquire identifiers of committed transactions from a third-party server to obtain a transaction identifier set; determine a first target transaction identifier from the transaction identifier set; and control the slave database to back up the data in the master database according to the binary log and the first target transaction identifier.
The foregoing description is only exemplary of the preferred embodiments of this application and is made for the purpose of illustrating the general principles of the technology. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for backing up data, the method being applied to a distributed database comprising at least two database shards, the method comprising:
receiving a data backup instruction, wherein the data backup instruction is used for indicating that data in a master database included in a database shard is to be backed up, the database shard further comprises a slave database, and a first thread and a second thread run in the slave database;
controlling a first thread of the slave database to suspend backup of data of the master database and controlling a second thread of the slave database to acquire a binary log from the master database;
acquiring identifiers of submitted transactions from a third-party server to obtain a transaction identifier set;
determining a first target transaction identifier according to the transaction identifier set, wherein the first target transaction identifiers determined by the local transaction managers of the database shards are the same;
controlling the slave database to backup the data in the master database according to the binary log and the first target transaction identifier;
determining a first target transaction identifier from the set of transaction identifiers, comprising:
taking an identifier of a transaction with the latest commit time in the transaction identifier set as the first target transaction identifier;
the method further comprises the following steps:
receiving a data recovery instruction, wherein the data recovery instruction comprises a data recovery time point;
determining a second target transaction identifier submitted at the data recovery time point according to information recorded in a metadata database, wherein the metadata database is used for storing the binary log;
determining that the binary log to which the second target transaction identifier belongs is a target binary log;
controlling the slave database and the master database to recover data based on the target binary log.
2. The method of claim 1, wherein the controlling the slave database to backup data in the master database according to the binary log and the first target transaction identifier comprises:
and controlling a first thread of the slave database to perform transaction playback on a transaction submitted before the transaction corresponding to the first target transaction identifier in the binary log is submitted so as to backup the data in the master database.
3. The method of claim 1, wherein said obtaining identifiers of committed transactions from a third party server, resulting in a set of transaction identifiers, comprises:
acquiring identifiers of submitted transactions from a third-party server at intervals of a first preset duration to obtain a transaction identifier set; and
the method further comprises the following steps:
and storing the transaction identifier set and the acquisition time into a preset metadata database.
4. The method of claim 3, wherein the method further comprises:
controlling the slave database to upload the backup file obtained by completing the backup to a cloud server;
updating backup state information in the metadata database in response to the uploading of the backup file being completed.
5. The method of claim 3, wherein the controlling the slave database to retrieve a binary log from the master database comprises:
acquiring a binary log from the master database at intervals of a second preset time length; and
the method further comprises the following steps:
storing the obtained binary log;
storing at least one of the following items of information about the binary log into the metadata database: a storage path, a name, a storage time, at least one transaction identifier included, and a commit time of the at least one transaction identifier.
6. An apparatus for backing up data, the apparatus being disposed in a distributed database, the distributed database including at least two database shards, the apparatus comprising:
a backup instruction receiving unit configured to receive a data backup instruction, wherein the data backup instruction is used for indicating that data in a master database included in a database shard is to be backed up, the database shard further comprises a slave database, and a first thread and a second thread run in the slave database;
a log obtaining unit configured to control a first thread of the slave database to suspend backup of data of the master database and to control a second thread of the slave database to obtain a binary log from the master database;
an identifier obtaining unit configured to obtain an identifier of the submitted transaction from a third-party server, resulting in a transaction identifier set;
a first identifier determining unit configured to determine a first target transaction identifier according to the transaction identifier set, where the first target transaction identifiers determined by the local transaction managers of the database shards are the same;
a data backup unit configured to control the slave database to backup data in the master database according to the binary log and the first target transaction identifier;
the first identifier determination unit is further configured to:
taking an identifier of a transaction with the latest commit time in the transaction identifier set as the first target transaction identifier;
the device further comprises:
a restoration instruction receiving unit configured to receive a data restoration instruction including a data restoration time point;
a second identifier determination unit configured to determine a second target transaction identifier submitted at the data recovery time point based on information recorded in a metadata database for storing the binary log;
a log determination unit configured to determine that the binary log to which the second target transaction identifier belongs is a target binary log;
a data recovery unit configured to control the slave database and the master database to recover data based on the target binary log.
7. The apparatus of claim 6, wherein the data backup unit is further configured to:
and controlling a first thread of the slave database to perform transaction playback on a transaction submitted before the transaction corresponding to the first target transaction identifier in the binary log is submitted so as to backup the data in the master database.
8. The apparatus of claim 6, wherein the identifier acquisition unit is further configured to:
acquiring identifiers of submitted transactions from the third-party server at intervals of a first preset duration to obtain a transaction identifier set; and
the device further comprises:
a storage unit configured to store the transaction identifier set and the acquisition time in a preset metadata database.
9. The apparatus of claim 8, wherein the apparatus further comprises:
the backup file uploading unit is configured to control the slave database to upload the backup file obtained by completing the backup to a cloud server; and
the storage unit is further configured to update backup state information in the metadata database in response to the upload of the backup file being completed.
10. The apparatus of claim 8, wherein the log acquisition unit is further configured to:
acquiring a binary log from the master database at intervals of a second preset time length; and
the storage unit is further configured to store the acquired binary log, and to store at least one of the following items of information about the binary log into the metadata database: a storage path, a name, a storage time, at least one transaction identifier included, and a commit time of the at least one transaction identifier.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201810980327.8A 2018-08-27 2018-08-27 Method and apparatus for backing up data Active CN109144785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810980327.8A CN109144785B (en) 2018-08-27 2018-08-27 Method and apparatus for backing up data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810980327.8A CN109144785B (en) 2018-08-27 2018-08-27 Method and apparatus for backing up data

Publications (2)

Publication Number Publication Date
CN109144785A CN109144785A (en) 2019-01-04
CN109144785B true CN109144785B (en) 2020-07-28

Family

ID=64828254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810980327.8A Active CN109144785B (en) 2018-08-27 2018-08-27 Method and apparatus for backing up data

Country Status (1)

Country Link
CN (1) CN109144785B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111913925B (en) * 2019-05-08 2023-08-18 厦门网宿有限公司 Data processing method and system in distributed storage system
CN111913972A (en) * 2019-05-10 2020-11-10 阿里巴巴集团控股有限公司 Data processing method, device and equipment
CN110209534B (en) * 2019-06-14 2022-09-16 四川长虹电器股份有限公司 System and method for automatically backing up mysql database
CN110209554B (en) * 2019-06-14 2023-08-11 上海中通吉网络技术有限公司 Database log distribution method, device and equipment
CN113254425B (en) * 2021-06-24 2022-01-11 阿里云计算有限公司 Method, apparatus, system, program and storage medium for database transaction retention

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106610876A (en) * 2015-10-23 2017-05-03 中兴通讯股份有限公司 Method and device for recovering data snapshot

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9570124B2 (en) * 2012-01-11 2017-02-14 Viavi Solutions Inc. High speed logging system
US9449039B2 (en) * 2012-11-26 2016-09-20 Amazon Technologies, Inc. Automatic repair of corrupted blocks in a database
CN106407356B (en) * 2016-09-07 2020-01-14 网易(杭州)网络有限公司 Data backup method and device
CN107451013B (en) * 2017-06-30 2020-12-25 北京奇虎科技有限公司 Data recovery method, device and system based on distributed system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant