WO2017181430A1 - Database replication method and apparatus for a distributed system - Google Patents
Database replication method and apparatus for a distributed system
- Publication number
- WO2017181430A1 · PCT/CN2016/080068 · CN2016080068W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- partition
- timestamp
- transaction
- target
- log
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1873—Versioning file systems, temporal file systems, e.g. file system supporting different historic versions of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/273—Asynchronous replication or reconciliation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
Definitions
- the present invention relates to the field of databases, and in particular, to a database replication method and apparatus for a distributed system.
- Database replication refers to copying the database in the primary cluster to the standby cluster.
- When the primary cluster suffers an overall outage in a disaster, data services are provided through the standby cluster, solving the problem of remote disaster recovery.
- a common distributed system architecture refers to a logically unified system architecture that connects a plurality of physically dispersed nodes through a computer network.
- The nodes may be ordinary computers, mobile terminals, workstations, general-purpose servers, dedicated servers, or virtual nodes, that is, virtual machines.
- The data in the distributed system is divided into multiple partitions, each partition holds a part of the data, the union of the data in all partitions constitutes the complete data set, and each node may include one or more partitions.
- the distributed system includes a primary cluster and a standby cluster.
- the primary cluster and the standby cluster each include multiple nodes, and each of the nodes includes one partition or multiple partitions.
- The partitions of the active and standby clusters correspond one to one, and each partition of the active and standby clusters has a log buffer that stores the replication log used to record the transactions contained in the current partition. The replication log contains multiple log records, each of which records one transaction.
- Transactions are divided into single-partition transactions and multi-partition transactions: a single-partition transaction runs in only one partition, while a multi-partition transaction runs in all partitions.
- the active and standby clusters use the replication log to implement database replication.
- An existing method of database replication usually includes: when the log buffer of a partition in the primary cluster is full or a certain period elapses, the primary cluster sends the replication log of that partition to the corresponding partition in the standby cluster; the corresponding partition in the standby cluster executes all the log records in the replication log to implement database replication.
- The replication log of a multi-partition transaction is saved in all partitions of the primary cluster. However, because the buffer conditions of the partitions differ or their periods are not synchronized, some partitions in the primary cluster may have sent the replication log of the multi-partition transaction to the corresponding partitions of the standby cluster while other partitions have not.
- As a result, some partitions of the standby cluster execute the replication log of a multi-partition transaction while other partitions do not, so that the data of the partitions becomes inconsistent.
- the embodiment of the present invention provides a database replication method and device for a distributed system, which solves the problem of data inconsistency in each partition of the standby cluster.
- the technical solution is as follows:
- A first aspect provides a database replication method for a distributed system, where the distributed system includes a primary cluster and a standby cluster, the primary cluster and the standby cluster each include multiple partitions of a database, the multiple partitions in the primary cluster are in one-to-one correspondence with the multiple partitions in the standby cluster, and each partition in the primary cluster sends a replication log of the partition to the corresponding partition in the standby cluster, where transactions of data operations are recorded in the replication log. The method includes:
- The first partition in the standby cluster sends a timestamp of a newly added multi-partition transaction in the first partition to the coordination server, where the newly added multi-partition transaction is a multi-partition transaction among the transactions recorded in the replication log that the first partition received from the corresponding partition in the primary cluster;
- The coordination server determines a target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the timestamps of the multi-partition transactions of each partition of the standby cluster stored by the coordination server, where the target timestamp is used to indicate which multi-partition transactions the first partition can execute;
- the coordination server sends the target timestamp to the first partition, and the first partition executes the replication log in the first partition according to the target timestamp.
- In a possible implementation, determining the target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the stored timestamps of the multi-partition transactions of each partition of the standby cluster includes:
- determining whether the timestamps of the multi-partition transactions of each partition of the standby cluster completely coincide; if they do not completely coincide, determining the intersection of the timestamps of the multi-partition transactions of each partition of the standby cluster, and obtaining, from the timestamps of the multi-partition transactions outside the intersection, the timestamp with the smallest value as the target timestamp of the first partition; if they completely coincide, using a first specified timestamp as the target timestamp of the first partition, where the first specified timestamp instructs the first partition to execute the log records in its replication log in sequence until the log record of the next newly added multi-partition transaction is encountered and then stop.
- Through the timestamps of the multi-partition transactions of each partition that it maintains, the coordination server can determine which multi-partition transactions already exist in all partitions and which do not, and uses the target timestamp to notify the corresponding partition which log records can be executed, so that the corresponding partition executes only multi-partition transactions that already exist in all partitions but have not necessarily been executed, avoiding data inconsistency among the partitions in the standby cluster.
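- The first determination rule above can be sketched in Python (a hypothetical illustration; the function name, the dictionary shape, and the `FIRST_SPECIFIED` sentinel are assumptions, since the patent does not fix concrete data structures):

```python
from functools import reduce

# Assumed sentinel meaning "execute until the next newly added
# multi-partition transaction"; the patent does not fix its value.
FIRST_SPECIFIED = -1

def target_timestamp(partition_timestamps):
    """partition_timestamps maps each standby partition to the set of
    multi-partition transaction timestamps stored for it by the
    coordination server."""
    sets = list(partition_timestamps.values())
    union = set().union(*sets)
    intersection = reduce(set.intersection, sets)
    outside = union - intersection      # transactions not yet present in every partition
    if outside:                         # timestamps do not completely coincide
        return min(outside)             # smallest timestamp outside the intersection
    return FIRST_SPECIFIED              # timestamps completely coincide
```

- For example, with partitions A1 = {1, 2}, B1 = {1, 2, 3}, C1 = {1, 2}, timestamp 3 lies outside the intersection and becomes the target, so the first partition stops before transaction 3.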
- In a possible implementation, executing, by the first partition, the replication log in the first partition according to the target timestamp includes:
- if the target timestamp of the first partition is the timestamp with the smallest value among the timestamps of the multi-partition transactions outside the intersection, executing, in the replication log of the first partition, the log records before the log record of the multi-partition transaction corresponding to the target timestamp.
- the first partition ensures data consistency with other partitions in the standby cluster by performing log records within the partition according to the target timestamp.
- In another possible implementation, determining the target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the stored timestamps of the multi-partition transactions of each partition of the standby cluster includes:
- determining the intersection of the timestamps of the multi-partition transactions of each partition of the standby cluster; if the intersection is an empty set, the first partition executes the log records in its replication log in sequence and stops when a log record of a multi-partition transaction is encountered.
- Through the timestamps of the multi-partition transactions of each partition that it maintains, the coordination server can determine which multi-partition transactions already exist in all partitions and which do not, and uses the target timestamp to notify the corresponding partition which log records can be executed, so that the corresponding partition executes only multi-partition transactions that already exist in all partitions but have not necessarily been executed. The partition need not enter a wait state every time a multi-partition transaction is encountered, and data inconsistency among the partitions in the standby cluster is avoided.
- In a possible implementation, executing, by the first partition, the replication log in the first partition according to the target timestamp includes:
- if the target timestamp of the first partition is the timestamp with the largest value in the intersection, executing, in the first partition, the log records before the log record of the first multi-partition transaction after the target timestamp.
- the first partition ensures data consistency with other partitions in the standby cluster by performing log records within the partition according to the target timestamp.
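- The execution rule for the largest-value-in-intersection case can be sketched as follows (the `LogRecord` shape and function names are illustrative assumptions, not the patent's log format):

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    timestamp: int          # transaction timestamp / identifier
    multi_partition: bool   # transaction type: multi-partition or single-partition

def replay(log, target):
    """Execute log records up to, but not including, the log record of the
    first multi-partition transaction after the target timestamp.
    Returns the timestamps of the executed records."""
    executed = []
    for rec in log:
        if rec.multi_partition and rec.timestamp > target:
            break                       # first multi-partition txn after target: stop
        executed.append(rec.timestamp)  # safe to execute
    return executed
```

- With target = 2 and a log of transactions 1..5 in which 2 and 4 are multi-partition, the partition executes records 1, 2, and 3 and stops before transaction 4.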
- the coordination server is an independent device in the standby cluster, or is deployed on all nodes in the standby cluster.
- Providing multiple implementations of the coordination server increases the flexibility of implementing the database replication method.
- In a possible implementation, the condition that triggers the first partition in the standby cluster to send the timestamp of the newly added multi-partition transaction to the coordination server includes: the log buffer of the first partition is full, a preset period is reached, or the first partition is in an idle state.
- a distributed system in a second aspect, includes a primary cluster and a standby cluster, the primary cluster and the standby cluster respectively include multiple partitions of a database, and multiple partitions in the primary cluster and The plurality of partitions in the standby cluster are in one-to-one correspondence, and each partition in the primary cluster sends a replication log of the partition to a corresponding partition in the standby cluster, where the transaction of the data operation is recorded in the replication log.
- The standby cluster further includes a coordination server. The first partition in the standby cluster is configured to send, to the coordination server, a timestamp of a newly added multi-partition transaction in the first partition, where the newly added multi-partition transaction is a multi-partition transaction among the transactions recorded in the replication log received from the corresponding partition in the primary cluster after the first partition last sent timestamps of multi-partition transactions to the coordination server;
- The coordination server is configured to determine a target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the timestamps of the multi-partition transactions of each partition of the standby cluster stored by the coordination server, where the target timestamp is used to indicate which multi-partition transactions the first partition can execute; the coordination server is further configured to send the target timestamp to the first partition; and the first partition is further configured to execute the replication log in the first partition according to the target timestamp.
- a database replication method for a distributed system comprising:
- a first partition in the standby cluster sends, to the coordination server, a timestamp of a newly added multi-partition transaction, where the newly added multi-partition transaction is a multi-partition transaction among the transactions recorded in the replication log received from the corresponding partition in the primary cluster since timestamps of multi-partition transactions were last sent;
- the coordination server determines a target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the stored timestamps of the multi-partition transactions of each partition of the standby cluster, where the target timestamp is used to indicate which multi-partition transactions the first partition can execute; and the coordination server sends the target timestamp to the first partition, so that the first partition executes the replication log in the first partition according to the target timestamp.
- In a possible implementation, determining the target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the stored timestamps of the multi-partition transactions of each partition of the standby cluster includes:
- Through the timestamps of the multi-partition transactions of each partition that it maintains, the coordination server can determine which multi-partition transactions already exist in all partitions and which do not, and uses the target timestamp to notify the corresponding partition which log records can be executed, so that the corresponding partition executes only multi-partition transactions that already exist in all partitions but have not necessarily been executed. The partition need not enter a wait state every time a multi-partition transaction is encountered, and data inconsistency among the partitions in the standby cluster is avoided.
- In a possible implementation, the target timestamp of the first partition is determined by: determining the intersection of the timestamps of the multi-partition transactions of each partition of the standby cluster, and determining whether the intersection is an empty set; if the intersection is not an empty set, obtaining, from the intersection, the timestamp with the largest value as the target timestamp of the first partition, where the timestamp with the largest value is used to instruct the first partition to execute the corresponding log records in the first partition.
- The first partition can ensure data consistency between the first partition and the other partitions in the standby cluster by executing the log records within the partition according to the target timestamp.
- a database replication method for a distributed system comprising:
- In a possible implementation, executing the replication log in the first partition according to the target timestamp of the first partition includes:
- if the target timestamp of the first partition is the timestamp with the smallest value among the timestamps of the multi-partition transactions outside the intersection of the timestamps of the multi-partition transactions of each partition of the standby cluster, executing, in the first partition, the log records before the log record of the multi-partition transaction corresponding to the target timestamp.
- Executing log records based on the target timestamp of the first partition ensures that every executed multi-partition transaction exists in all partitions. The first partition need not enter a wait state every time a multi-partition transaction is encountered, avoiding data inconsistency between the first partition and the other partitions in the standby cluster.
- In a possible implementation, executing the replication log in the first partition according to the target timestamp of the first partition includes:
- if the target timestamp of the first partition is the timestamp with the largest value in the intersection of the timestamps of the multi-partition transactions of each partition of the standby cluster, executing, in the first partition, the log records before the log record of the first multi-partition transaction after the target timestamp;
- if the target timestamp of the first partition is the second specified timestamp, executing the log records in the first partition in sequence and stopping when a log record of a multi-partition transaction is encountered, where the second specified timestamp indicates that the intersection of the timestamps of the multi-partition transactions of each partition of the standby cluster is an empty set.
- Executing log records based on the target timestamp of the first partition ensures that every executed multi-partition transaction exists in all partitions. The first partition need not enter a wait state every time a multi-partition transaction is encountered, avoiding data inconsistency between the first partition and the other partitions in the standby cluster.
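- The empty-intersection fall-back above can be sketched as follows (the `(timestamp, is_multi_partition)` record shape is an illustrative assumption):

```python
# When the coordination server returns the second specified timestamp
# (intersection empty), the partition replays log records in order and
# stops at the first multi-partition transaction it encounters.
def replay_until_multi(log):
    """log: list of (timestamp, is_multi_partition) pairs;
    returns the timestamps of the executed records."""
    executed = []
    for ts, is_multi in log:
        if is_multi:
            break           # stop at the first multi-partition transaction
        executed.append(ts)
    return executed
```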
- a database replication apparatus for a distributed system, the apparatus comprising a plurality of functional modules for performing the method of the third aspect described above.
- the apparatus further includes other functional modules for performing the methods described in the various possible implementations of the third aspect above.
- Through the timestamps of the multi-partition transactions of each partition that it maintains, the coordination server can determine which multi-partition transactions already exist in all partitions and which do not, and uses the target timestamp to notify the corresponding partition which log records can be executed, so that the corresponding partition executes only multi-partition transactions that already exist in all partitions but have not necessarily been executed.
- the first partition ensures data consistency with other partitions by performing log records within the partition according to the target timestamp.
- a database replication apparatus for a distributed system comprising a plurality of functional modules for performing the method of the fourth aspect described above.
- the apparatus further includes other functional modules for performing the method described in the multiple possible implementation manners of the foregoing fourth aspect.
- Through the timestamps of the multi-partition transactions of each partition that it maintains, the coordination server can determine which multi-partition transactions already exist in all partitions and which do not, and uses the target timestamp to notify the corresponding partition which log records can be executed, so that the corresponding partition executes only multi-partition transactions that already exist in all partitions but have not necessarily been executed.
- the first partition ensures data consistency with other partitions by performing log records within the partition according to the target timestamp.
- A coordination server includes a memory and a processor, the memory being configured to store processor-executable instructions and the processor being configured to perform the method of the third aspect above. In a possible implementation, the processor is further configured to perform the methods described in the various possible implementations of the third aspect above.
- the first partition ensures data consistency with other partitions by performing log records within the partition according to the target timestamp.
- a database replication apparatus for a distributed system includes a memory and a processor, the memory is configured to store processor executable instructions, and the processor is configured to perform the method of the fourth aspect described above; In a possible implementation, the processor is further configured to perform the method described in the various possible implementations of the fourth aspect above.
- Through the timestamps of the multi-partition transactions of each partition that it maintains, the coordination server can determine which multi-partition transactions already exist in all partitions and which do not, and uses the target timestamp to notify the corresponding partition which log records can be executed, so that the corresponding partition executes only multi-partition transactions that already exist in all partitions but have not necessarily been executed.
- the first partition ensures data consistency with other partitions by performing log records within the partition according to the target timestamp.
- Each partition in the standby cluster sends the timestamps of the multi-partition transactions it contains to the coordination server, so that the coordination server can determine, from which partitions contain which multi-partition transactions, which multi-partition transactions already exist in all partitions and which do not, and can use target timestamps to inform the corresponding partitions which log records can be executed. The corresponding partitions thus execute only multi-partition transactions that exist in all partitions but have not necessarily been executed, avoiding inconsistency of the data of the partitions in the standby cluster.
- FIG. 1 is an architectural diagram of a distributed system according to an embodiment of the present invention
- FIG. 2 is a flowchart of a database replication method of a distributed system according to an embodiment of the present invention
- FIG. 3 is a schematic diagram of interaction between a partition included in a standby cluster and a coordination server according to an embodiment of the present invention
- FIG. 4 is a schematic diagram of interaction between a partition included in a standby cluster and a coordination server according to an embodiment of the present invention
- FIG. 5 is a flowchart of a database replication method of a distributed system according to an embodiment of the present invention.
- FIG. 6 is a schematic diagram of interaction between a partition included in a standby cluster and a coordination server according to an embodiment of the present invention
- FIG. 7 is a schematic diagram of interaction between a partition included in a standby cluster and a coordination server according to an embodiment of the present invention
- FIG. 8 is a block diagram of a database replication apparatus of a distributed system according to an embodiment of the present invention.
- FIG. 9 is a block diagram of a database replication apparatus of a distributed system according to an embodiment of the present invention.
- FIG. 10 is a schematic structural diagram of a coordination server according to an embodiment of the present invention.
- FIG. 11 is a schematic structural diagram of a database replication apparatus of a distributed system according to an embodiment of the present invention.
- FIG. 1 is a structural diagram of a distributed system according to an embodiment of the present invention.
- the distributed system includes a primary cluster and a standby cluster.
- the primary cluster and the standby cluster respectively include multiple partitions, and multiple partitions in the primary cluster have a one-to-one correspondence with multiple partitions in the standby cluster.
- the active and standby clusters each include three nodes, with each node including one partition as an example.
- the three partitions in the primary cluster are partition A, partition B, and partition C.
- the corresponding partitions in the standby cluster are partition A1, partition B1, and partition C1.
- the number of the partitions in the active and standby clusters is not limited.
- Each partition in the primary cluster maintains a replication log, that is, each partition in the primary cluster has a log buffer that holds a replication log for recording the transactions contained in the current partition. Multiple log records are recorded in the replication log of each partition, and each log record is used to record one transaction.
- Each log record of the replication log includes at least the timestamp and transaction type of the transaction, and the transaction type includes a single partition transaction and a multi-partition transaction.
- each log record also includes the specific content of the transaction, such as the operation performed by the transaction on the database or the database record modified by the transaction, so that the data can be copied by executing the log record.
- a single-partition transaction is a transaction that runs only in one partition
- a multi-partition transaction is a transaction that runs in all partitions.
- The timestamp of the transaction may be a transaction identifier (ID) added by the system for the transaction when the transaction occurs, used to uniquely identify the transaction.
- Each partition in the primary cluster sends the maintained replication logs to the corresponding partitions in the standby cluster.
- Taking partition A in FIG. 1 as an example: when partition A of the primary cluster acquires a transaction, a log record of the transaction is added to the replication log maintained by partition A; when the log buffer of partition A is full, the specified period is reached, or partition A is in an idle state, the replication log is sent to the corresponding partition A1 in the standby cluster.
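- The primary-side buffering behaviour described for partition A might be sketched as follows (the class and parameter names such as `send_fn` and `capacity` are assumptions for illustration):

```python
class PrimaryPartition:
    """Buffers log records and ships the replication log to the
    corresponding standby partition when a trigger condition is met."""
    def __init__(self, capacity, send_fn):
        self.capacity = capacity
        self.send_fn = send_fn      # delivers the log to the standby partition
        self.buffer = []

    def add_transaction(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.capacity:   # buffer full: one trigger condition
            self.flush()

    def flush(self):                # also called on period expiry or when idle
        if self.buffer:
            self.send_fn(list(self.buffer))
            self.buffer.clear()
```

- A period timer or an idle-state check would call `flush()` for the other two trigger conditions; only the buffer-full path is shown.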
- Each partition in the standby cluster maintains a log buffer that is used to store replication logs for the corresponding partitions in the primary cluster. For example, when the partition A1 in the standby cluster receives the replication log of the corresponding partition A in the primary cluster, the replication log is stored in the log buffer of the partition A1.
- Alternatively, each partition in the standby cluster may obtain the replication log by periodically pulling it from the corresponding partition of the primary cluster, which is not specifically limited in this embodiment of the present invention.
- The distributed system also includes a coordination server, which may be included in the standby cluster, may be included in the primary cluster, or may be a separate entity other than the standby cluster and the primary cluster, which is not specifically limited herein.
- Each partition in the standby cluster is also used to send a multi-partition transaction timestamp within the partition to the coordination server.
- the coordination server is used to store timestamps for all multi-partition transactions in the standby cluster.
- The coordination server may be a standalone device in the distributed system other than the primary cluster and the standby cluster, may belong to the standby cluster as a standalone device within it, or may be deployed on one or all of the nodes in the distributed system, preferably on one or all of the nodes in the standby cluster, where each node includes one or more partitions.
- the function of the coordination server may be implemented by using a distributed service framework, which may be a framework such as Zookeeper, which is not specifically limited in this embodiment of the present invention.
- FIG. 2 is a flowchart of a database replication method of a distributed system according to an embodiment of the present invention.
- the method process provided by the embodiment of the present invention includes:
- the first partition in the standby cluster sends a timestamp of the newly added multi-partition transaction in the first partition to the coordination server.
- the timestamp of the multi-partition transaction is used to indicate the logging of the multi-partition transaction in the replication log.
- The newly added multi-partition transaction is a multi-partition transaction among the transactions recorded in the replication log received from the corresponding partition in the primary cluster after the first partition last sent timestamps of multi-partition transactions to the coordination server.
- the number of the newly added multi-partition transactions may be one or more, and is not specifically limited in the embodiment of the present invention.
- the replication log is stored to the log buffer of the first partition.
- The condition that triggers the first partition to send the timestamp of the newly added multi-partition transaction to the coordination server may be that the log buffer of the first partition is full, that a preset period is reached, or that the first partition is in an idle state, which is not specifically limited herein.
- the preset period may be any value, and the preset period is configured according to system performance requirements, which is not specifically limited in this embodiment of the present invention.
- The process of sending the timestamp of the newly added multi-partition transaction to the coordination server by the first partition may be: detecting whether the log buffer of the current partition meets the trigger condition; if it does, obtaining the timestamps of the newly added multi-partition transactions from the stored replication log, that is, obtaining, from the log records included in the replication log, the timestamps whose transaction type is multi-partition transaction; and then sending the timestamps of the newly added multi-partition transactions to the coordination server.
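- The extraction step above can be sketched as follows (the `(timestamp, is_multi_partition)` record shape and the `last_sent_index` bookkeeping are illustrative assumptions):

```python
def new_multi_partition_timestamps(replication_log, last_sent_index):
    """replication_log: list of (timestamp, is_multi_partition) pairs;
    last_sent_index: number of records already reported to the coordination
    server. Returns the timestamps of newly added multi-partition
    transactions to be sent."""
    fresh = replication_log[last_sent_index:]       # records added since the last send
    return [ts for ts, is_multi in fresh if is_multi]
```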
- The first partition being in an idle state means that all currently executable transactions of the partition have been completed and the partition is waiting to process new transactions.
- For example, the first partition includes the transaction logs of transaction 1, transaction 2, transaction 3, and transaction 4, where transaction 1 and transaction 2 are currently executable transactions and transaction 3 is a currently unexecutable transaction; for example, transaction 3 is a multi-partition transaction and not all other partitions of the standby cluster contain the replication log of transaction 3. The first partition then enters the idle state, that is, waits for transaction 3 to become executable.
- the first partition may be any one of the partitions in the standby cluster.
- the storage capacity of the log buffer of each partition in the standby cluster may be the same or different, which is not specifically limited in this embodiment of the present invention.
- After receiving the timestamp of the newly added multi-partition transaction, the coordination server determines whether the timestamps of the multi-partition transactions of the partitions of the standby cluster completely coincide. If they do not completely coincide, the following step 203 is performed; if they coincide, the following step 204 is performed.
- the coordination server is configured to maintain a timestamp of the multi-partition transaction of each partition in the standby cluster.
- The coordination server may maintain a timestamp storage table in which a corresponding timestamp storage area is opened for each partition in the standby cluster, for storing the timestamps of the multi-partition transactions of that partition.
- the form of the timestamp storage table can be as shown in Table 1.
- The coordination server receives the timestamps of the newly added multi-partition transactions of the first partition of the standby cluster and stores them in the timestamp storage area of the first partition. It then determines whether the timestamps of the multi-partition transactions of the partitions of the standby cluster completely coincide, that is, whether there is, among those timestamps, a minimum timestamp that does not exist in the timestamp storage area of at least one partition.
- Certainly, the coordination server may also directly know, without a judgment step, whether the timestamps of the multi-partition transactions of the partitions of the standby cluster completely coincide.
- the embodiment of the present invention does not specifically limit whether there is a judgment step.
- The coordination server determines the intersection of the timestamps of the multi-partition transactions of the partitions of the standby cluster and, from the timestamps of the multi-partition transactions outside the intersection, obtains the timestamp with the smallest value as the target timestamp of the first partition.
- The coordination server can obtain the target timestamp of the first partition from the timestamps of the multi-partition transactions outside the intersection.
- The target timestamp of the first partition is used to indicate information about the multi-partition transactions that the first partition can execute, that is, to indicate the log records that the first partition can execute, so as to prevent the first partition from executing log records of multi-partition transactions not included in other partitions, which would result in inconsistent data across partitions.
- The process of obtaining the target timestamp of the first partition may be: determine the timestamps of the multi-partition transactions on which all partitions of the standby cluster coincide, that is, determine the intersection of the timestamps of the multi-partition transactions of the partitions of the standby cluster; then, from the timestamps of the multi-partition transactions outside the intersection, obtain the timestamp with the smallest value as the target timestamp of the first partition.
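- A minimal sketch of this selection rule, assuming integer timestamps and assuming the first specified timestamp is the string "execute all" (the embodiment deliberately leaves its concrete value open):

```python
EXECUTE_ALL = "execute all"  # assumed value of the first specified timestamp

def target_timestamp(partition_timestamps):
    """partition_timestamps: dict mapping partition id -> set of
    multi-partition transaction timestamps in its timestamp storage area."""
    sets = list(partition_timestamps.values())
    intersection = set.intersection(*sets)      # timestamps present in every partition
    outside = set.union(*sets) - intersection   # timestamps missing from some partition
    if not outside:
        return EXECUTE_ALL  # timestamps of all partitions completely coincide
    return min(outside)     # smallest timestamp not yet present everywhere
```

- With the values of the Figure 3 example, `target_timestamp({1: {4, 11, 15}, 2: {4}})` yields 11, matching the target timestamp chosen for partition two.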
- the standby cluster contains two partitions, partition one and partition two.
- Figure 3 is a schematic diagram of the partitions included in the standby cluster interacting with the coordination server.
- The log buffer of partition one contains the timestamps 4, 5, 6, 11, 13, 14, and 15 corresponding to transactions, wherein 4, 11, and 15 are timestamps of multi-partition transactions and the rest are timestamps of single-partition transactions.
- The log buffer of partition two contains the timestamps 3, 4, 7, 8, 9, and 10 corresponding to transactions, where 4 is the timestamp of a multi-partition transaction and the rest are timestamps of single-partition transactions.
- When a partition reaches its trigger condition, the timestamps of its newly added multi-partition transactions are sent to the coordination server.
- The timestamp storage area of partition one includes the timestamps 4, 11, and 15 of multi-partition transactions, while the timestamp storage area of partition two includes only the multi-partition transaction timestamp 4, indicating that partition two of the standby cluster does not include the log records corresponding to the multi-partition transaction timestamps 11 and 15.
- In this case, the coordination server can clearly know that the timestamps of the multi-partition transactions of partition one and partition two do not completely coincide: the intersection of the timestamps of the multi-partition transactions of the two is 4, and the timestamps of the multi-partition transactions outside the intersection are 11 and 15. The coordination server therefore selects the timestamp with the smallest value among 11 and 15, that is, selects the multi-partition transaction timestamp 11, as the target timestamp of partition two.
- Alternatively, after determining the intersection of the timestamps of the multi-partition transactions of the partitions of the standby cluster, the coordination server may obtain, from the first partition's multi-partition transaction timestamps outside the intersection, the timestamp with the smallest value as the target timestamp of the first partition.
- The coordination server uses the first specified timestamp as the target timestamp of the first partition, where the first specified timestamp is used to instruct the first partition to continue executing log records in its intra-partition replication log until the log record of the first newly added multi-partition transaction is encountered, at which point execution stops.
- the specific content of the first specified timestamp may be preset.
- The value of the first specified timestamp may be 0 or infinity, or the first specified timestamp may be a special string, for example, "execute all".
- the specific content of the first specified timestamp is not specifically limited in the embodiment of the present invention.
- The coordination server obtains the first specified timestamp and uses it as the target timestamp, where the target timestamp is used to indicate that the first partition can execute the log records of all transactions in the partition without causing the data of the first partition to become inconsistent with that of the other partitions in the standby cluster.
- the standby cluster contains two partitions, partition one and partition two.
- Figure 4 is a schematic diagram of the partitions included in the standby cluster interacting with the coordination server.
- The log buffer of partition one contains the timestamps 4, 5, 6, 11, 13, and 14 corresponding to transactions, wherein 4 and 11 are timestamps of multi-partition transactions and the rest are timestamps of single-partition transactions.
- The log buffer of partition two contains the timestamps 3, 4, 7, 8, 9, 10, and 11 corresponding to transactions, wherein 4 and 11 are timestamps of multi-partition transactions and the rest are timestamps of single-partition transactions.
- Take the trigger condition of the partition as the log buffer being full. If the log buffer of partition two reaches the trigger condition and 11 is the timestamp of the newly added multi-partition transaction of the partition, partition two sends the multi-partition transaction timestamp 11 to the coordination server, which stores it in the timestamp storage area of partition two.
- The timestamp storage area of partition one includes the timestamps 4 and 11 of multi-partition transactions, and the timestamp storage area of partition two also includes the timestamps 4 and 11, indicating that both partitions of the standby cluster contain the log records corresponding to the multi-partition transaction timestamps 4 and 11.
- In this case, the coordination server can clearly know that the timestamps of the multi-partition transactions of partition one and partition two completely coincide. Therefore, the coordination server determines the first specified timestamp as the target timestamp of partition two.
- The coordination server can coordinate the execution of the intra-partition replication logs by the partitions of the standby cluster according to the maintained timestamps of the multi-partition transactions of each partition in the standby cluster.
- The process may include: the coordination server determines the target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the timestamps of the multi-partition transactions of each partition of the standby cluster stored by the coordination server.
- That is, the coordination server updates the locally stored timestamps of the multi-partition transactions of the first partition and determines the target timestamp of the first partition according to the updated timestamps of the multi-partition transactions of each partition.
- In step 201, the first partition may reach the trigger condition without obtaining the timestamp of any newly added multi-partition transaction, that is, after last sending the timestamps of newly added multi-partition transactions, the first partition has not received a replication log of a multi-partition transaction from the corresponding partition in the primary cluster. In this case, the first partition may send a designated identifier to the coordination server, where the designated identifier is used to indicate that there is no newly added multi-partition transaction timestamp in the first partition. After receiving the designated identifier sent by the first partition, the coordination server obtains the target timestamp of the first partition in the same manner as steps 202 to 204 above, and then performs the following steps 205 and 206.
- the coordinating server sends a target timestamp to the first partition.
- After receiving the target timestamp, the first partition may feed back a confirmation message to the coordination server. If the coordination server does not receive the confirmation message from the first partition within a specified time after sending the target timestamp, it resends the target timestamp to the first partition to ensure that the first partition receives it.
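- The resend behaviour can be sketched generically as below; the transport, the timeout value, and the bounded retry budget are assumptions not fixed by the embodiment.

```python
def send_with_retry(send, wait_for_ack, target_timestamp, timeout=1.0, max_attempts=3):
    """Send the target timestamp, resending until a confirmation message
    arrives within the specified time or the attempt budget is exhausted."""
    for _ in range(max_attempts):
        send(target_timestamp)
        if wait_for_ack(timeout):  # True once the first partition's confirmation arrives
            return True
    return False
```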
- the first partition executes the replication log in the first partition according to the target timestamp.
- The process by which the first partition executes its replication log may be: if the target timestamp of the first partition is the timestamp with the smallest value among the timestamps of the multi-partition transactions outside the intersection, the log records whose timestamps are smaller than the target timestamp are executed; if the target timestamp is the first specified timestamp, all the log records in the replication log in the first partition are executed, and execution stops when the log record of the next newly added multi-partition transaction is encountered.
- Specifically, the first partition may determine whether the target timestamp is the first specified timestamp. If the target timestamp is not the first specified timestamp, the first partition determines that the target timestamp is the timestamp with the smallest value among the timestamps of the multi-partition transactions outside the intersection, and, according to the value indicated by the target timestamp, obtains from the replication log stored in the first partition the to-be-executed log records whose timestamps are smaller than the target timestamp and executes them. If the target timestamp is the first specified timestamp, the log records in the replication log in the first partition are executed until the log record of the next newly added multi-partition transaction is encountered.
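- A sketch of this execution rule, under an assumed record layout of `(timestamp, is_multi_partition)` pairs and with "execute all" standing in for the first specified timestamp; the `new_multi` set is assumed bookkeeping marking multi-partition records that arrived after the target timestamp was received.

```python
EXECUTE_ALL = "execute all"  # assumed value of the first specified timestamp

def executable_timestamps(records, target, new_multi=frozenset()):
    """records: timestamp-ordered list of (timestamp, is_multi_partition) pairs.
    Returns the timestamps of the log records the partition may execute."""
    if target == EXECUTE_ALL:
        out = []
        for ts, is_multi in records:
            if is_multi and ts in new_multi:
                break  # stop at the next newly added multi-partition log record
            out.append(ts)
        return out
    # target is the smallest timestamp outside the intersection:
    # execute every pending record with a smaller timestamp
    return [ts for ts, _ in records if ts < target]
```

- For partition two in the Figure 3 example, `executable_timestamps([(3, False), (4, True), (7, False), (8, False), (9, False), (10, False)], 11)` returns `[3, 4, 7, 8, 9, 10]`.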
- The next newly added multi-partition transaction refers to the first multi-partition transaction whose replication log the first partition receives from the corresponding partition of the primary cluster after receiving the target timestamp.
- The case where the target timestamp is the timestamp with the smallest value among the multi-partition transaction timestamps outside the intersection of the timestamps of the multi-partition transactions of the partitions in the standby cluster is explained first.
- the coordination server sends the target timestamp to the partition 2.
- After receiving the target timestamp, partition two determines that the target timestamp is not the first specified timestamp "0" and, according to the value 11 indicated by the target timestamp, obtains the log records whose timestamps are less than 11, namely the log records with timestamps 3, 4, 7, 8, 9, and 10. Partition two then executes the log records corresponding to timestamps 3, 4, 7, 8, 9, and 10.
- If the timestamps of the multi-partition transactions of the partitions completely coincide, the coordination server sends the first specified timestamp, for example "0", to partition two as its target timestamp. After receiving the target timestamp, partition two determines, from the content "0" of the target timestamp, that it is the first specified timestamp. Partition two then executes the log records in the partition until the log record of the first newly added multi-partition transaction is encountered. Because partition two has not received log records of new multi-partition transactions other than the existing multi-partition transactions 4 and 11, partition two executes the log records with timestamps 3, 4, 7, 8, 9, 10, and 11.
- In the method provided by this embodiment of the present invention, each partition of the standby cluster sends the timestamps of all the multi-partition transactions it includes to the coordination server, so that the coordination server can determine, according to the multi-partition transactions included in each partition of the standby cluster, which multi-partition transactions already exist in all partitions and which do not. The coordination server uses the target timestamp to inform the corresponding partition which log records can be executed, so that the corresponding partition can execute the multi-partition transactions that exist in all partitions but have not yet been executed, without entering a wait state every time a multi-partition transaction is encountered, thereby avoiding data inconsistency across partitions.
- FIG. 5 is a flowchart of a database replication method of a distributed system according to an embodiment of the present invention.
- the method process provided by the embodiment of the present invention includes:
- the first partition sends a timestamp of the newly added multi-partition transaction in the first partition to the coordination server in the standby cluster.
- this step is the same as the content of step 201 above, and details are not described herein again.
- After receiving the timestamp of the newly added multi-partition transaction, the coordination server determines the intersection of the timestamps of the multi-partition transactions of the partitions of the standby cluster and determines whether the intersection is an empty set. If the intersection is not an empty set, the following step 503 is performed; if the intersection is an empty set, the following step 504 is performed.
- the coordination server is configured to maintain a timestamp of the multi-partition transaction of each partition in the standby cluster.
- The coordination server may maintain a timestamp storage table in which a corresponding timestamp storage area is opened for each partition in the standby cluster, for storing the timestamps of the multi-partition transactions of that partition.
- the form of the timestamp storage table can be as shown in Table 2.
- Certainly, the coordination server may also directly know whether the intersection is an empty set without a judgment step.
- the coordination server obtains the timestamp with the largest value from the intersection as the target timestamp of the first partition.
- The coordination server can determine the timestamp with the largest value in the intersection as the target timestamp, where the target timestamp is used to inform the first partition of the log records it can execute, so as to prevent the first partition from executing log records of multi-partition transactions not included in other partitions, which would result in inconsistent data across partitions.
- the standby cluster contains two partitions, partition one and partition two.
- FIG. 6 is a schematic diagram of interaction between a partition included in the standby cluster and the coordination server.
- The log buffer of partition one contains the timestamps 4, 5, 6, 11, 13, 14, and 15 corresponding to transactions, wherein 4, 11, and 15 are timestamps of multi-partition transactions and the rest are timestamps of single-partition transactions.
- The log buffer of partition two contains the timestamps 3, 4, 7, 8, 9, 10, and 11 corresponding to transactions, wherein 4 and 11 are timestamps of multi-partition transactions and the rest are timestamps of single-partition transactions.
- Take the trigger condition of the partition as the log buffer being full. If the log buffer of partition one reaches the trigger condition and 11 and 15 are the timestamps of the newly added multi-partition transactions of the partition, partition one sends the multi-partition transaction timestamps 11 and 15 to the coordination server, which stores them in the timestamp storage area of partition one.
- As shown in Figure 6, the timestamp storage area of partition one includes the timestamps 4, 11, and 15 of multi-partition transactions, and the timestamp storage area of partition two includes the timestamps 4 and 11, indicating that both partitions of the standby cluster contain the log records corresponding to the multi-partition transaction timestamps 4 and 11.
- In this case, the coordination server can clearly know that the intersection of the timestamps of the multi-partition transactions of partition one and partition two is 4 and 11. The coordination server selects the timestamp with the largest value in the intersection, that is, the largest of 4 and 11; therefore, the coordination server selects the multi-partition transaction timestamp 11 as the target timestamp.
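- This second selection rule can be sketched similarly, assuming the second specified timestamp is the string "execute single" (again, the embodiment leaves its concrete value open):

```python
EXECUTE_SINGLE = "execute single"  # assumed value of the second specified timestamp

def target_timestamp_v2(partition_timestamps):
    """partition_timestamps: dict mapping partition id -> set of
    multi-partition transaction timestamps in its timestamp storage area."""
    intersection = set.intersection(*partition_timestamps.values())
    if not intersection:
        return EXECUTE_SINGLE  # no multi-partition transaction is present everywhere
    return max(intersection)   # largest timestamp already present in every partition
```

- With the values of the Figure 6 example, `target_timestamp_v2({1: {4, 11, 15}, 2: {4, 11}})` yields 11.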
- The coordination server uses the second specified timestamp as the target timestamp of the first partition, where the second specified timestamp is used to instruct the first partition to sequentially execute the log records in its intra-partition replication log until the log record of a multi-partition transaction is encountered, at which point execution stops.
- the specific content of the second specified timestamp may be preset.
- The value of the second specified timestamp may be 0 or infinity, or the second specified timestamp may be a special string, for example, "execute single".
- the specific content of the second specified timestamp is not specifically limited in the embodiment of the present invention.
- The coordination server obtains the second specified timestamp and uses it as the target timestamp of the first partition, where the target timestamp is used to indicate that the first partition can continue to execute the log records of single-partition transactions within the partition, so as to avoid the data inconsistency across partitions that executing a multi-partition transaction would cause.
- the standby cluster contains two partitions, partition one and partition two.
- FIG. 7 is a schematic diagram of a partition included in the standby cluster interacting with the coordination server.
- The log buffer of partition one contains the timestamps 1, 2, 4, 5, 6, and 7 corresponding to transactions, where 4 is the timestamp of a multi-partition transaction and the rest are timestamps of single-partition transactions.
- The log buffer of partition two contains the timestamp 3 corresponding to a transaction, where 3 is the timestamp of a single-partition transaction.
- Take the trigger condition of the partition as the log buffer being full. If the log buffer of partition one reaches the trigger condition and 4 is the timestamp of the newly added multi-partition transaction of the partition, partition one sends the multi-partition transaction timestamp 4 to the coordination server, which stores it in the timestamp storage area of partition one.
- As shown in Figure 7, the timestamp storage area of partition one includes the multi-partition transaction timestamp 4, while the timestamp storage area of partition two contains no multi-partition transaction timestamp, indicating that the two partitions of the standby cluster do not include the timestamp of the same multi-partition transaction. Therefore, the coordination server determines the second specified timestamp as the target timestamp of partition one, so that partition one continues to execute the log records of single-partition transactions within the partition.
- The coordination server can coordinate the execution of the intra-partition replication logs by the partitions of the standby cluster according to the maintained timestamps of the multi-partition transactions of each partition. The process may include: the coordination server determines the target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the timestamps of the multi-partition transactions of each partition of the standby cluster stored by the coordination server.
- In step 501, the first partition may reach the trigger condition without obtaining the timestamp of any newly added multi-partition transaction, that is, after last sending the timestamps of newly added multi-partition transactions to the coordination server, the first partition has not received a replication log of a multi-partition transaction from the corresponding partition in the primary cluster. In this case, the first partition may send a designated identifier to the coordination server, where the designated identifier is used to indicate that there is no newly added multi-partition transaction timestamp in the first partition. After receiving the designated identifier sent by the first partition, the coordination server obtains the target timestamp of the first partition in the same manner as steps 502 to 504 above, and then performs the following steps 505 and 506.
- the coordinating server sends a target timestamp to the first partition.
- This step is the same as step 205 above, and details are not described herein again.
- the first partition executes the replication log in the first partition according to the target timestamp.
- The process by which the first partition executes its replication log may be: if the target timestamp of the first partition is the timestamp with the largest value in the intersection, the first partition executes the log records before the first multi-partition transaction whose timestamp follows the target timestamp; if the target timestamp of the first partition is the second specified timestamp, the log records in the first partition are executed in sequence, and execution stops when the log record of a multi-partition transaction is encountered.
- Specifically, the first partition may determine whether the target timestamp is the second specified timestamp. If the target timestamp is not the second specified timestamp, the first partition determines that the target timestamp is the timestamp with the largest value in the intersection and, according to the value indicated by the target timestamp, executes the log records in the replication log; if the log records stored in the first partition include a multi-partition transaction log record whose timestamp is larger than the target timestamp, the first partition executes only the log records before that multi-partition transaction log record.
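- The corresponding execution rule can be sketched as below, again with an assumed `(timestamp, is_multi_partition)` record layout and "execute single" standing in for the second specified timestamp.

```python
EXECUTE_SINGLE = "execute single"  # assumed value of the second specified timestamp

def executable_timestamps_v2(records, target):
    """records: timestamp-ordered list of (timestamp, is_multi_partition) pairs.
    Returns the timestamps of the log records the partition may execute."""
    if target == EXECUTE_SINGLE:
        out = []
        for ts, is_multi in records:
            if is_multi:
                break  # stop at the first multi-partition log record
            out.append(ts)
        return out
    # target is the largest timestamp in the intersection: execute up to,
    # but not including, the first multi-partition record after the target
    bound = next((ts for ts, m in records if m and ts > target), None)
    return [ts for ts, _ in records if bound is None or ts < bound]
```

- For partition one in the Figure 6 example, with records `[(4, True), (5, False), (6, False), (11, True), (13, False), (14, False), (15, True)]` and target timestamp 11, this returns `[4, 5, 6, 11, 13, 14]`.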
- After selecting the multi-partition transaction timestamp 11 as the target timestamp of partition one, the coordination server sends the target timestamp to partition one.
- After receiving the target timestamp, partition one determines that the target timestamp is not the second specified timestamp "0". The multi-partition transaction timestamp indicated by the target timestamp is 11, and the timestamp of the first multi-partition transaction after timestamp 11 in partition one is 15, so partition one executes the log records whose timestamps precede timestamp 15, that is, the log records with timestamps 4, 5, 6, 11, 13, and 14.
- If the intersection is an empty set, the coordination server sends the second specified timestamp, for example "0", to partition one as the target timestamp. After receiving the target timestamp, partition one determines, from the content "0" of the target timestamp, that it is the second specified timestamp, and continues to execute the log records in the partition in sequence. As shown in Figure 7, the log record to be executed in partition one is the log record with timestamp 1, because timestamp 1 corresponds to a single-partition transaction.
- In the method provided by this embodiment of the present invention, each partition of the standby cluster sends the timestamps of all the multi-partition transactions it includes to the coordination server, so that the coordination server can determine, according to the multi-partition transactions included in each partition of the standby cluster, which multi-partition transactions already exist in all partitions and which do not. The coordination server uses the target timestamp to inform the corresponding partition which log records can be executed, so that the corresponding partition can execute the multi-partition transactions that exist in all partitions but have not yet been executed, without entering a wait state every time a multi-partition transaction is encountered, thereby avoiding data inconsistency across partitions.
- An embodiment of the present invention provides a distributed system, where the system includes a primary cluster and a standby cluster, the primary cluster and the standby cluster each include multiple partitions of the database, the multiple partitions in the primary cluster correspond one-to-one to the multiple partitions in the standby cluster, each partition in the primary cluster sends its replication log to the corresponding partition in the standby cluster, the replication log records the transactions of data operations, and the standby cluster further includes a coordination server.
- The first partition in the standby cluster is configured to send the timestamp of the newly added multi-partition transaction in the first partition to the coordination server, where the newly added multi-partition transaction is a multi-partition transaction among the transactions recorded in the replication log that the first partition receives from the corresponding partition in the primary cluster after the first partition last sent the timestamps of multi-partition transactions to the coordination server;
- The coordination server is configured to determine the target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the timestamps of the multi-partition transactions of each partition of the standby cluster stored by the coordination server, where the target timestamp is used to indicate information about the multi-partition transactions that the first partition can execute; the coordination server is further configured to send the target timestamp to the first partition;
- the first partition is further configured to execute the replication log in the first partition according to the target timestamp.
- The timestamps of all the multi-partition transactions included in each partition of the standby cluster are sent to the coordination server, so that the coordination server can determine, according to the multi-partition transactions included in each partition of the standby cluster, which multi-partition transactions already exist in all partitions and which do not. The coordination server uses the target timestamp to inform the corresponding partition which log records can be executed, so that the corresponding partition can execute the multi-partition transactions that exist in all partitions but have not yet been executed, without entering a wait state every time a multi-partition transaction is encountered, thereby avoiding data inconsistency across partitions.
- FIG. 8 is a block diagram of a database replication apparatus of a distributed system according to an embodiment of the present invention, including: a receiving module 801, an obtaining module 802, and a sending module 803.
- The receiving module 801 is connected to the obtaining module 802 and is configured to receive the timestamp of the newly added multi-partition transaction in the first partition of the standby cluster, where the newly added multi-partition transaction is a multi-partition transaction among the transactions recorded in the replication log that the first partition receives from the corresponding partition in the primary cluster after the first partition last sent the timestamps of multi-partition transactions; the obtaining module 802 is connected to the sending module 803 and is configured to determine the target timestamp of the first partition according to the received timestamp of the newly added multi-partition transaction and the stored timestamps of the multi-partition transactions of each partition of the standby cluster, where the target timestamp is used to indicate information about the multi-partition transactions that the first partition can execute; the sending module 803 is configured to send the target timestamp to the first partition, and the first partition executes the replication log in the first partition according to the target timestamp.
- The obtaining module 802 is configured to determine whether the timestamps of the multi-partition transactions of the partitions of the standby cluster completely coincide. If they do not completely coincide, the obtaining module determines the intersection of the timestamps of the multi-partition transactions of the partitions of the standby cluster and obtains, from the timestamps of the multi-partition transactions outside the intersection, the timestamp with the smallest value as the target timestamp of the first partition, where the timestamp with the smallest value is used to instruct the first partition to execute, in the replication log of the first partition, the log records before the multi-partition transaction corresponding to the target timestamp. If the timestamps of the multi-partition transactions of the partitions completely coincide, the obtaining module uses the first specified timestamp as the target timestamp of the first partition, where the first specified timestamp is used to instruct the first partition to execute the log records in the intra-partition replication log until the log record of the next newly added multi-partition transaction is encountered.
- Because the coordination server maintains, for each partition, the timestamps of the multi-partition transactions it contains, it can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute.
- The obtaining module 802 is configured to determine the intersection of the multi-partition transaction timestamps of each partition of the standby cluster and to determine whether the intersection is an empty set; if the intersection is not empty, to obtain the largest timestamp in the intersection as the target timestamp of the first partition, the largest timestamp indicating that the first partition executes the log records that precede the multi-partition transaction corresponding to the timestamp of the first multi-partition transaction after the target timestamp; and if the intersection is empty, to take a second specified timestamp as the target timestamp of the first partition, the second specified timestamp instructing the first partition to execute the log records in its replication log in order and to stop when the log record of a multi-partition transaction is encountered.
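The second determination rule can be sketched similarly; again this is only an illustrative assumption of how the coordination server might compute the target timestamp, with `SECOND_SPECIFIED` a hypothetical sentinel value:

```python
SECOND_SPECIFIED = "execute single"  # hypothetical sentinel value

def second_rule_target(partition_timestamps):
    """Return the largest multi-partition transaction timestamp present in
    every partition of the standby cluster, or a sentinel if none exists."""
    sets = list(partition_timestamps.values())
    intersection = set.intersection(*sets)
    if not intersection:
        # Empty intersection: only single-partition log records may run.
        return SECOND_SPECIFIED
    # Largest timestamp shared by all partitions: log records up to the first
    # multi-partition transaction after it are safe to execute.
    return max(intersection)

print(second_rule_target({"one": {4, 11, 15}, "two": {4, 11}}))  # → 11
```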
- Because the coordination server maintains, for each partition, the timestamps of the multi-partition transactions it contains, it can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without waiting on each multi-partition transaction. This avoids data inconsistency, improves replication efficiency, and gives the database replication method good performance.
- In the device provided by this embodiment of the present invention, each partition of the standby cluster sends all the multi-partition transaction timestamps it contains to the coordination server, so that the coordination server can determine, from the multi-partition transactions each partition contains, which multi-partition transactions already exist in all partitions and which do not.
- The target timestamp tells the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, and the data of the partitions of the standby cluster remains consistent.
- FIG. 9 is a block diagram of a database replication apparatus of a distributed system according to an embodiment of the present invention, including: a sending module 901 and an executing module 902.
- The sending module 901 is connected to the executing module 902 and is configured to send, to the coordination server in the standby cluster, timestamps of multi-partition transactions newly added in the first partition; the coordination server determines a target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the stored timestamps of the multi-partition transactions of each partition of the standby cluster, and sends the target timestamp to the first partition, the target timestamp indicating which multi-partition transactions the first partition may execute. The executing module 902 is configured to execute the replication log in the first partition according to the target timestamp of the first partition.
- The executing module 902 is configured to: if the target timestamp of the first partition is the smallest timestamp among the multi-partition transaction timestamps outside the intersection of the multi-partition transaction timestamps of each partition of the standby cluster, execute the log records in the first partition's replication log that precede the multi-partition transaction corresponding to the target timestamp; and if the target timestamp is the first specified timestamp, execute the log records in the first partition's replication log until the log record of the next newly added multi-partition transaction is encountered, at which point execution stops, the first specified timestamp indicating that the multi-partition transaction timestamps of each partition of the standby cluster fully coincide.
- Executing log records according to the target timestamp of the first partition ensures that every executed multi-partition transaction exists in all partitions, without entering a wait state each time a multi-partition transaction is encountered, which keeps the data of the first partition consistent with the other partitions of the standby cluster.
- The executing module 902 is configured to: if the target timestamp of the first partition is the largest timestamp in the intersection of the multi-partition transaction timestamps of each partition of the standby cluster, execute the log records in the first partition that precede the multi-partition transaction corresponding to the timestamp of the first multi-partition transaction after the target timestamp; and if the target timestamp of the first partition is the second specified timestamp, execute the log records in the first partition in order until the log record of a multi-partition transaction is encountered, at which point execution stops, the second specified timestamp indicating that the intersection of the multi-partition transaction timestamps of each partition of the standby cluster is an empty set.
- Executing log records according to the target timestamp of the first partition ensures that every executed multi-partition transaction exists in all partitions, without entering a wait state each time a multi-partition transaction is encountered, which keeps the data of the first partition consistent with the other partitions of the standby cluster.
- In the device provided by this embodiment of the present invention, each partition of the standby cluster sends all the multi-partition transaction timestamps it contains to the coordination server, so that the coordination server can determine, from the multi-partition transactions each partition contains, which multi-partition transactions already exist in all partitions and which do not.
- The target timestamp tells the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency among the partitions of the standby cluster.
- FIG. 10 is a schematic structural diagram of a coordination server according to an embodiment of the present invention.
- The coordination server is used to implement the method performed by the coordination server in the above method embodiments.
- the coordination server includes a processing component 1022 that further includes one or more processors, and memory resources represented by memory 1032 for storing instructions executable by processing component 1022, such as an application.
- An application stored in memory 1032 can include one or more modules each corresponding to a set of instructions.
- processing component 1022 is configured to execute instructions to perform the steps of:
- receiving timestamps of multi-partition transactions newly added in a first partition of the standby cluster, the newly added multi-partition transactions being the multi-partition transactions among the transactions recorded in the replication logs received from the corresponding partition of the primary cluster since the first partition last finished sending multi-partition transaction timestamps; determining a target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the stored timestamps of the multi-partition transactions of each partition of the standby cluster, the target timestamp indicating which multi-partition transactions the first partition may execute; and sending the target timestamp to the first partition, the first partition executing the replication log in the first partition according to the target timestamp.
- Determining the target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the stored timestamps of the multi-partition transactions of each partition of the standby cluster may include: determining whether the multi-partition transaction timestamps of each partition of the standby cluster fully coincide; if they do not fully coincide, determining the intersection of those timestamps and, from the timestamps outside the intersection, obtaining the smallest timestamp as the target timestamp of the first partition; and if they fully coincide, taking a first specified timestamp as the target timestamp of the first partition.
- Because the coordination server maintains, for each partition, the timestamps of the multi-partition transactions it contains, it can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency among the partitions of the standby cluster.
- Determining the target timestamp of the first partition may alternatively include: determining the intersection of the multi-partition transaction timestamps of each partition of the standby cluster and determining whether the intersection is an empty set; if the intersection is not empty, obtaining the largest timestamp in the intersection as the target timestamp of the first partition, the largest timestamp indicating that the first partition executes the log records that precede the multi-partition transaction corresponding to the timestamp of the first multi-partition transaction after the target timestamp; and if the intersection is empty, taking a second specified timestamp as the target timestamp, the second specified timestamp instructing the first partition to execute the log records in its replication log in order until the log record of a multi-partition transaction is encountered, at which point execution stops.
- By executing the log records within the partition according to the target timestamp, the first partition ensures data consistency between the first partition and the other partitions of the standby cluster.
- the coordination server may also include a power component 1026 configured to perform power management of the coordination server, a wired or wireless network interface 1050 configured to connect the coordination server to the network, and an input/output (I/O) interface 1058.
- The coordination server can operate based on an operating system stored in memory 1032, for example, Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
- By using the multi-partition transaction timestamps it maintains for each partition, the coordination server provided by this embodiment of the present invention can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without waiting on each multi-partition transaction, which avoids data inconsistency among the partitions of the standby cluster.
- FIG. 11 is a schematic structural diagram of a database replication apparatus of a distributed system according to an embodiment of the present invention.
- A processing component 1122 is included, which further includes one or more processors, and memory resources represented by memory 1132 for storing instructions executable by processing component 1122, such as an application.
- An application stored in memory 1132 can include one or more modules, each corresponding to a set of instructions. Additionally, processing component 1122 is configured to:
- send, to the coordination server, timestamps of multi-partition transactions newly added in the first partition;
- receive the target timestamp of the first partition, which the coordination server determines according to the received timestamps of the newly added multi-partition transactions and the stored timestamps of the multi-partition transactions of each partition of the standby cluster and sends to the first partition, the target timestamp indicating which multi-partition transactions the first partition may execute; and
- execute the replication log in the first partition according to the target timestamp of the first partition.
- The processor is further configured to: if the target timestamp of the first partition is the smallest timestamp among the multi-partition transaction timestamps outside the intersection of the multi-partition transaction timestamps of each partition of the standby cluster, execute the log records in the first partition's replication log that precede the multi-partition transaction corresponding to the target timestamp; and if the target timestamp is the first specified timestamp, execute the log records in the first partition's replication log until the log record of the next newly added multi-partition transaction is encountered, at which point execution stops, the first specified timestamp indicating that the multi-partition transaction timestamps of each partition of the standby cluster fully coincide.
- The first partition executes log records according to the target timestamp, which ensures that every executed multi-partition transaction exists in all partitions, without entering a wait state each time a multi-partition transaction is encountered, thereby keeping the data of the first partition consistent with the other partitions of the standby cluster.
- The processor is further configured to: if the target timestamp of the first partition is the largest timestamp in the intersection of the multi-partition transaction timestamps of each partition of the standby cluster, execute the log records in the first partition that precede the multi-partition transaction corresponding to the timestamp of the first multi-partition transaction after the target timestamp; and if the target timestamp of the first partition is the second specified timestamp, execute the log records in the first partition in order until the log record of a multi-partition transaction is encountered, at which point execution stops, the second specified timestamp indicating that the intersection of the multi-partition transaction timestamps of each partition of the standby cluster is an empty set.
- The first partition executes log records according to the target timestamp, which ensures that every executed multi-partition transaction exists in all partitions, without entering a wait state each time a multi-partition transaction is encountered, thereby keeping the data of the first partition consistent with the other partitions of the standby cluster.
- The device provided by this embodiment of the present invention executes log records according to the target timestamp of the first partition, which ensures that every executed multi-partition transaction exists in all partitions, without entering a wait state each time a multi-partition transaction is encountered, thereby keeping the data of the first partition consistent with the other partitions of the standby cluster.
- A person skilled in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware, and the program may be stored in a computer-readable storage medium.
- The storage medium mentioned may be a read-only memory, a magnetic disk, an optical disc, or the like.
Abstract
A database replication method and apparatus for a distributed system, relating to the field of databases. The method includes: a first partition in a standby cluster sends, to a coordination server, timestamps of multi-partition transactions newly added in the first partition; the coordination server determines a target timestamp of the first partition according to the timestamps of the newly added multi-partition transactions and the timestamps, stored by the coordination server, of the multi-partition transactions of each partition of the standby cluster; the coordination server sends the target timestamp to the first partition; and the first partition executes the replication log in the first partition according to the target timestamp. By having the coordination server maintain the timestamps of the multi-partition transactions contained in each partition of the standby cluster, it can be determined which multi-partition transactions exist in all partitions, and the corresponding partition can be told which log records it may execute, so that the partition can execute, without waiting, multi-partition transactions that exist in all partitions but have not necessarily been executed, which avoids data inconsistency and improves replication efficiency.
Description
The present invention relates to the field of databases, and in particular to a database replication method and apparatus for a distributed system.
In a distributed system, database replication refers to copying the database of a primary cluster to a standby cluster, so that when the primary cluster suffers a disaster and goes down as a whole, data services are provided by the standby cluster, thereby achieving geo-redundant disaster recovery.
In general, a common distributed system architecture connects multiple physically dispersed nodes through a computer network into one logically unified system. A node may be an ordinary computer, a mobile terminal, a workstation, a general-purpose or dedicated server, or a virtual node, that is, a virtual machine. The data in a distributed system is divided into multiple partitions, each partition holding part of the data, the union of all partitions constituting the complete data set; each node may contain one or more partitions. The distributed system includes a primary cluster and a standby cluster, each containing multiple nodes, and each node containing one or more partitions. The partitions of the primary and standby clusters correspond one to one, and every partition of each cluster has a log buffer that holds a replication log recording the transactions of that partition. The replication log contains multiple log records, each recording one transaction. Transactions in a distributed system are divided into single-partition transactions, which run in only one partition, and multi-partition transactions, which run in all partitions.
Database replication between the primary and standby clusters is implemented through replication logs. An existing approach typically works as follows: when the buffer of a partition in the primary cluster is full or a certain period elapses, the primary cluster sends that partition's replication log to the corresponding partition in the standby cluster, which then executes all log records in that log. With this approach, a multi-partition transaction's replication log is kept in every partition of the primary cluster; however, because the buffers of the partitions fill at different rates or their periods are not synchronized, some partitions of the primary cluster may have sent the replication log containing the multi-partition transaction to their counterparts in the standby cluster while others have not. As a result, some partitions in the standby cluster execute the replication log of a given multi-partition transaction while others do not, leaving the data of the partitions inconsistent.
Summary of the Invention
To solve the problems in the prior art, embodiments of the present invention provide a database replication method and apparatus for a distributed system that solve the problem of inconsistent data among the partitions of the standby cluster. The technical solutions are as follows:
According to a first aspect, a database replication method for a distributed system is provided. The distributed system includes a primary cluster and a standby cluster, each of which includes multiple partitions of a database, the partitions of the primary cluster corresponding one to one with the partitions of the standby cluster; each partition of the primary cluster sends its replication log, which records data-operation transactions, to the corresponding partition of the standby cluster. A first partition in the standby cluster sends, to a coordination server, timestamps of multi-partition transactions newly added in the first partition, the newly added multi-partition transactions being the multi-partition transactions among the transactions recorded in the replication logs received from the corresponding partition of the primary cluster since the first partition last finished sending multi-partition transaction timestamps to the coordination server; the coordination server determines a target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the timestamps, stored by the coordination server, of the multi-partition transactions of each partition of the standby cluster, the target timestamp indicating which multi-partition transactions the first partition may execute; the coordination server sends the target timestamp to the first partition; and the first partition executes the replication log in the first partition according to the target timestamp.
With reference to the first aspect, in a first possible implementation of the first aspect, determining the target timestamp of the first partition by the coordination server according to the received timestamps of the newly added multi-partition transactions and the timestamps, stored by the coordination server, of the multi-partition transactions of each partition of the standby cluster includes:
determining whether the multi-partition transaction timestamps of each partition of the standby cluster fully coincide; if they do not fully coincide, determining the intersection of the multi-partition transaction timestamps of each partition of the standby cluster and, from the multi-partition transaction timestamps of each partition outside the intersection, obtaining the smallest timestamp as the target timestamp of the first partition; and if they fully coincide, taking a first specified timestamp as the target timestamp of the first partition, the first specified timestamp instructing the first partition to execute the log records in its replication log until the log record of the next newly added multi-partition transaction is encountered, at which point execution stops. Because the coordination server maintains, for each partition, the timestamps of the multi-partition transactions it contains, it can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, avoiding data inconsistency among the partitions of the standby cluster.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, executing the replication log in the first partition by the first partition according to the target timestamp includes:
if the target timestamp of the first partition is the smallest timestamp among the multi-partition transaction timestamps outside the intersection, executing the log records in the first partition's replication log that precede the multi-partition transaction corresponding to the target timestamp; and if the target timestamp is the first specified timestamp, executing the log records in the first partition's replication log until the log record of the next newly added multi-partition transaction is encountered, at which point execution stops. By executing the log records within the partition according to the target timestamp, the first partition ensures data consistency with the other partitions of the standby cluster.
With reference to the first aspect, in a third possible implementation of the first aspect, determining the target timestamp of the first partition by the coordination server according to the received timestamps of the newly added multi-partition transactions and the timestamps, stored by the coordination server, of the multi-partition transactions of each partition of the standby cluster includes:
determining the intersection of the multi-partition transaction timestamps of each partition of the standby cluster and determining whether the intersection is an empty set; if the intersection is not empty, obtaining the largest timestamp in the intersection as the target timestamp of the first partition; and if the intersection is empty, taking a second specified timestamp as the target timestamp of the first partition, the second specified timestamp instructing the first partition to execute the log records in its replication log in order until the log record of a multi-partition transaction is encountered, at which point execution stops. Because the coordination server maintains, for each partition, the timestamps of the multi-partition transactions it contains, it can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency among the partitions of the standby cluster.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, executing the replication log in the first partition by the first partition according to the target timestamp includes:
if the target timestamp of the first partition is the largest timestamp in the intersection, executing the log records in the first partition that precede the multi-partition transaction corresponding to the timestamp of the first multi-partition transaction after the target timestamp; and if the target timestamp of the first partition is the second specified timestamp, executing the log records in the first partition in order until the log record of a multi-partition transaction is encountered, at which point execution stops. By executing the log records within the partition according to the target timestamp, the first partition ensures data consistency with the other partitions of the standby cluster.
With reference to the first aspect, in a fifth possible implementation of the first aspect, the coordination server is an independent device in the standby cluster, or is deployed on all the nodes of the standby cluster. Providing multiple embodiments of the coordination server improves the flexibility of implementing the database replication method.
With reference to the first aspect, in a sixth possible implementation of the first aspect, sending, by the first partition in the standby cluster, the timestamps of the multi-partition transactions newly added in the first partition to the coordination server includes:
when the first partition reaches a trigger condition, sending to the coordination server the timestamps of the multi-partition transactions newly added in the first partition, the trigger condition being that the log buffer of the first partition is full, or that a preset period has elapsed, or that the first partition is in an idle state.
According to a second aspect, a distributed system is provided. The system includes a primary cluster and a standby cluster, each of which includes multiple partitions of a database, the partitions of the primary cluster corresponding one to one with the partitions of the standby cluster; each partition of the primary cluster sends its replication log, which records data-operation transactions, to the corresponding partition of the standby cluster; and the standby cluster further includes a coordination server. A first partition in the standby cluster is configured to send, to the coordination server, timestamps of multi-partition transactions newly added in the first partition, the newly added multi-partition transactions being the multi-partition transactions among the transactions recorded in the replication logs received from the corresponding partition of the primary cluster since the first partition last finished sending multi-partition transaction timestamps to the coordination server; the coordination server is configured to determine a target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the timestamps, stored by the coordination server, of the multi-partition transactions of each partition of the standby cluster, the target timestamp indicating which multi-partition transactions the first partition may execute; the coordination server is further configured to send the target timestamp to the first partition; and the first partition is further configured to execute the replication log in the first partition according to the target timestamp.
According to a third aspect, a database replication method for a distributed system is provided. The method includes:
receiving timestamps of multi-partition transactions newly added in a first partition of a standby cluster, the newly added multi-partition transactions being the multi-partition transactions among the transactions recorded in the replication logs received from the corresponding partition of a primary cluster since the first partition last finished sending multi-partition transaction timestamps; determining a target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the stored timestamps of the multi-partition transactions of each partition of the standby cluster, the target timestamp indicating which multi-partition transactions the first partition may execute; and sending the target timestamp to the first partition, the first partition executing the replication log in the first partition according to the target timestamp.
With reference to the third aspect, in a first possible implementation of the third aspect, determining the target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the stored timestamps of the multi-partition transactions of each partition of the standby cluster includes:
determining whether the multi-partition transaction timestamps of each partition of the standby cluster fully coincide; if they do not fully coincide, determining the intersection of those timestamps and, from the multi-partition transaction timestamps of each partition outside the intersection, obtaining the smallest timestamp as the target timestamp of the first partition, the smallest timestamp indicating that the first partition executes the log records in its replication log that precede the multi-partition transaction corresponding to the target timestamp; and if the timestamps fully coincide, taking a first specified timestamp as the target timestamp of the first partition, the first specified timestamp instructing the first partition to execute the log records in its replication log until the log record of the next newly added multi-partition transaction is encountered, at which point execution stops. Because the coordination server maintains, for each partition, the timestamps of the multi-partition transactions it contains, it can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency among the partitions of the standby cluster.
With reference to the third aspect, in a second possible implementation of the third aspect, determining the target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the stored timestamps of the multi-partition transactions of each partition of the standby cluster includes: determining the intersection of the multi-partition transaction timestamps of each partition of the standby cluster and determining whether the intersection is an empty set; if the intersection is not empty, obtaining the largest timestamp in the intersection as the target timestamp of the first partition, the largest timestamp indicating that the first partition executes the log records in the first partition that precede the multi-partition transaction corresponding to the timestamp of the first multi-partition transaction after the target timestamp; and if the intersection is empty, taking a second specified timestamp as the target timestamp of the first partition, the second specified timestamp instructing the first partition to execute the log records in its replication log in order until the log record of a multi-partition transaction is encountered, at which point execution stops. By sending the target timestamp to the first partition, the first partition can execute the log records within the partition according to the target timestamp, ensuring data consistency between the first partition and the other partitions of the standby cluster.
According to a fourth aspect, a database replication method for a distributed system is provided. The method includes:
sending, to a coordination server, timestamps of multi-partition transactions newly added in a first partition, the coordination server determining a target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the stored timestamps of the multi-partition transactions of each partition of the standby cluster and sending the target timestamp of the first partition to the first partition, the target timestamp indicating which multi-partition transactions the first partition may execute; and executing the replication log in the first partition according to the target timestamp of the first partition.
With reference to the fourth aspect, in a first possible implementation of the fourth aspect, executing the replication log in the first partition according to the target timestamp of the first partition includes:
if the target timestamp of the first partition is the smallest timestamp among the multi-partition transaction timestamps outside the intersection of the multi-partition transaction timestamps of each partition of the standby cluster, executing the log records in the first partition's replication log that precede the multi-partition transaction corresponding to the target timestamp; and if the target timestamp is the first specified timestamp, executing the log records in the first partition's replication log until the log record of the next newly added multi-partition transaction is encountered, at which point execution stops, the first specified timestamp indicating that the multi-partition transaction timestamps of each partition of the standby cluster fully coincide. Executing log records according to the target timestamp of the first partition ensures that every executed multi-partition transaction exists in all partitions, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency between the first partition and the other partitions of the standby cluster.
With reference to the fourth aspect, in a second possible implementation of the fourth aspect, executing the replication log in the first partition according to the target timestamp of the first partition includes:
if the target timestamp of the first partition is the largest timestamp in the intersection of the multi-partition transaction timestamps of each partition of the standby cluster, executing the log records in the first partition that precede the multi-partition transaction corresponding to the timestamp of the first multi-partition transaction after the target timestamp; and if the target timestamp of the first partition is the second specified timestamp, executing the log records in the first partition in order until the log record of a multi-partition transaction is encountered, at which point execution stops, the second specified timestamp indicating that the intersection of the multi-partition transaction timestamps of each partition of the standby cluster is an empty set. Executing log records according to the target timestamp of the first partition ensures that every executed multi-partition transaction exists in all partitions, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency between the first partition and the other partitions of the standby cluster.
According to a fifth aspect, a database replication apparatus for a distributed system is provided, the apparatus including multiple functional modules configured to perform the method of the third aspect above. In a possible implementation, the apparatus further includes other functional modules configured to perform the methods of the possible implementations of the third aspect. Because the coordination server maintains, for each partition, the timestamps of the multi-partition transactions it contains, it can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency among the partitions of the standby cluster. By executing the log records within the partition according to the target timestamp, the first partition ensures data consistency with the other partitions.
According to a sixth aspect, a database replication apparatus for a distributed system is provided, the apparatus including multiple functional modules configured to perform the method of the fourth aspect above. In a possible implementation, the apparatus further includes other functional modules configured to perform the methods of the possible implementations of the fourth aspect. Because the coordination server maintains, for each partition, the timestamps of the multi-partition transactions it contains, it can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency among the partitions of the standby cluster. By executing the log records within the partition according to the target timestamp, the first partition ensures data consistency with the other partitions.
According to a seventh aspect, a coordination server is provided, including a memory and a processor, the memory storing instructions executable by the processor, the processor being configured to perform the method of the third aspect above; in a possible implementation, the processor is further configured to perform the methods of the possible implementations of the third aspect. By using the multi-partition transaction timestamps it maintains for each partition, the coordination server can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency among the partitions of the standby cluster. By executing the log records within the partition according to the target timestamp, the first partition ensures data consistency with the other partitions.
According to an eighth aspect, a database replication apparatus for a distributed system is provided, including a memory and a processor, the memory storing instructions executable by the processor, the processor being configured to perform the method of the fourth aspect above; in a possible implementation, the processor is further configured to perform the methods of the possible implementations of the fourth aspect. Because the coordination server maintains, for each partition, the timestamps of the multi-partition transactions it contains, it can determine which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute, so that the partition can execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, avoiding data inconsistency among the partitions of the standby cluster. By executing the log records within the partition according to the target timestamp, the first partition ensures data consistency with the other partitions.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects:
Each partition of the standby cluster sends all the multi-partition transaction timestamps it contains to the coordination server, so that the coordination server can determine, from the multi-partition transactions each partition contains, which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute. The partition can thus execute multi-partition transactions that exist in all partitions but have not necessarily been executed, which solves the problem of data inconsistency among the partitions of the standby cluster.
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for the description of the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is an architecture diagram of a distributed system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a database replication method for a distributed system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the interaction between the partitions of a standby cluster and a coordination server according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the interaction between the partitions of a standby cluster and a coordination server according to an embodiment of the present invention;
FIG. 5 is a flowchart of a database replication method for a distributed system according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the interaction between the partitions of a standby cluster and a coordination server according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the interaction between the partitions of a standby cluster and a coordination server according to an embodiment of the present invention;
FIG. 8 is a block diagram of a database replication apparatus of a distributed system according to an embodiment of the present invention;
FIG. 9 is a block diagram of a database replication apparatus of a distributed system according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a coordination server according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a database replication apparatus of a distributed system according to an embodiment of the present invention.
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
FIG. 1 is an architecture diagram of a distributed system according to an embodiment of the present invention. Referring to FIG. 1, the distributed system includes a primary cluster and a standby cluster, each of which includes multiple partitions, the partitions of the primary cluster corresponding one to one with the partitions of the standby cluster.
As shown in FIG. 1, the primary and standby clusters each include three nodes, taking one partition per node as an example. The three partitions of the primary cluster are partition A, partition B, and partition C, and the corresponding partitions of the standby cluster are partition A1, partition B1, and partition C1. FIG. 1 shows only the example of three partitions per cluster; the embodiments of the present invention do not specifically limit the number of partitions in the primary and standby clusters.
Each partition of the primary cluster maintains a replication log; that is, each partition of the primary cluster has a log buffer that holds the replication log recording the transactions of that partition. Each partition's replication log contains multiple log records, each recording one transaction. Every log record includes at least the transaction's timestamp and transaction type, the transaction types being single-partition transaction and multi-partition transaction. Each log record also includes the specific content of the transaction, for example the operations the transaction performs on the database or the database records it modifies, so that data can be replicated by executing the log record. A single-partition transaction runs in only one partition, whereas a multi-partition transaction runs in all partitions. The transaction's timestamp may be a transaction identifier (ID) assigned to the transaction by the system when it occurs, which uniquely identifies the transaction.
Each partition of the primary cluster sends the replication log it maintains to the corresponding partition of the standby cluster. Taking partition A in FIG. 1 as an example, when partition A of the primary cluster obtains a transaction, it adds a log record of that transaction to the replication log it maintains, and when its log buffer is full, or a specified period has elapsed, or it is idle, it sends the replication log to the corresponding partition A1 of the standby cluster.
Each partition of the standby cluster maintains a log buffer for storing the replication log of the corresponding partition of the primary cluster. For example, when partition A1 of the standby cluster receives the replication log of the corresponding partition A of the primary cluster, it stores the log in partition A1's log buffer. Alternatively, each partition of the standby cluster may periodically fetch the replication log from the corresponding partition of the primary cluster; this is not specifically limited in the embodiments of the present invention.
The distributed system further includes a coordination server, which may be contained in the standby cluster, contained in the primary cluster, or be an independent entity outside both clusters; this is not specifically limited in the embodiments of the present invention. Each partition of the standby cluster is further configured to send its multi-partition transaction timestamps to the coordination server, which stores the timestamps of all multi-partition transactions in the standby cluster. The coordination server may be an independent device outside the primary and standby clusters of the distributed system, or may belong to the standby cluster as an independent device within it, or may be deployed on all nodes or one node of the distributed system, preferably on all nodes or one node of the standby cluster, each node including one or more partitions. In a specific implementation, the functions of the coordination server may be realized using a distributed service framework such as ZooKeeper, which is not specifically limited in the embodiments of the present invention.
FIG. 2 is a flowchart of a database replication method for a distributed system according to an embodiment of the present invention. Referring to FIG. 2, the method provided by this embodiment includes the following steps:
201. The first partition in the standby cluster sends, to the coordination server, timestamps of multi-partition transactions newly added in the first partition.
A multi-partition transaction timestamp indicates the log record of that multi-partition transaction in the replication log. The newly added multi-partition transactions are the multi-partition transactions among the transactions recorded in the replication logs received from the corresponding partition of the primary cluster since the first partition last finished sending multi-partition transaction timestamps to the coordination server; there may be one or more of them, which this embodiment does not specifically limit.
Specifically, when the first partition receives a replication log from the corresponding partition of the primary cluster, it stores the log in the first partition's log buffer. The condition that triggers the first partition to send the timestamps of newly added multi-partition transactions to the coordination server may be that the first partition's log buffer is full, or that a preset period has elapsed, or that the first partition is in an idle state, and so on; this embodiment does not specifically limit the trigger condition. The preset period may be any value and is configured according to system performance requirements. Accordingly, when the trigger condition is a full log buffer, the process of sending the timestamps may be: detect whether the current partition's log buffer has reached the trigger condition; if so, obtain the timestamps of the newly added multi-partition transactions from the stored replication log, that is, among the log records of the replication log, obtain the timestamps whose transaction type is multi-partition transaction; and then send those timestamps to the coordination server.
The first partition being in an idle state means that all of the partition's currently executable transactions have completed and the partition is waiting to process new transactions. For example, the first partition contains the replication logs of transaction 1, transaction 2, transaction 3, and transaction 4, where transactions 1 and 2 are currently executable and transaction 3 is not; for instance, transaction 3 is a multi-partition transaction and, when the first partition is about to execute it, the other partitions of the standby cluster do not all contain the replication log of transaction 3 yet, so the first partition enters the idle state, that is, the state of waiting to execute transaction 3.
The first partition may be any partition of the standby cluster, and the log buffers of the partitions of the standby cluster may have the same or different storage capacities; this embodiment does not specifically limit this.
202. After receiving the timestamps of the newly added multi-partition transactions, the coordination server determines whether the multi-partition transaction timestamps of each partition of the standby cluster fully coincide; if they do not fully coincide, the following step 203 is performed; if they fully coincide, the following step 204 is performed.
In this embodiment of the present invention, the coordination server maintains the multi-partition transaction timestamps of each partition of the standby cluster. Specifically, the coordination server may maintain a timestamp storage table in which a timestamp storage area is allocated to each partition of the standby cluster for storing that partition's multi-partition transaction timestamps. The table may take the form shown in Table 1.
Table 1
Partition | Timestamps of multi-partition transactions |
Partition one | 4, 11, 15 |
Partition two | 4, 11 |
Partition three | 4, 11, 13 |
…… | …… |
Specifically, the coordination server receives the timestamps of the multi-partition transactions newly added in the first partition of the standby cluster and stores them in the first partition's timestamp storage area. It then determines whether the multi-partition transaction timestamps of each partition of the standby cluster fully coincide, that is, whether a minimum timestamp exists among them, the minimum timestamp being a multi-partition transaction timestamp that is absent from the timestamp storage area of at least one partition.
It should be noted that, after receiving the timestamps of newly added multi-partition transactions sent by a partition of the standby cluster, the coordination server may know whether the multi-partition transaction timestamps of each partition fully coincide without an explicit judgment step; this embodiment does not specifically limit whether a judgment step is present.
203. If the multi-partition transaction timestamps of each partition of the standby cluster do not fully coincide, the coordination server determines the intersection of the multi-partition transaction timestamps of each partition of the standby cluster and, from the multi-partition transaction timestamps outside the intersection, obtains the smallest timestamp as the target timestamp of the first partition.
In this embodiment of the present invention, if the multi-partition transaction timestamps of each partition of the standby cluster do not fully coincide, that is, a minimum timestamp exists among them, the multi-partition transactions contained in the partitions of the standby cluster are inconsistent. To keep data processing consistent across the partitions, the coordination server may obtain the target timestamp of the first partition from the multi-partition transaction timestamps outside the intersection. The target timestamp of the first partition indicates which multi-partition transactions the first partition may execute, that is, which log records it may execute, so that the first partition does not execute log records of multi-partition transactions not contained in the other partitions, which would make the partitions' data inconsistent.
The target timestamp of the first partition may be obtained as follows: determine the multi-partition transaction timestamps shared by every partition of the standby cluster, that is, the intersection of the partitions' multi-partition transaction timestamps; then, from the multi-partition transaction timestamps outside the intersection, take the smallest timestamp as the target timestamp of the first partition.
For example, the standby cluster contains two partitions, partition one and partition two, and FIG. 3 is a schematic diagram of the interaction between the partitions of a standby cluster and the coordination server. In FIG. 3, the log buffer of partition one contains transactions with timestamps 4, 5, 6, 11, 13, 14, and 15, of which 4, 11, and 15 are multi-partition transaction timestamps and the rest are single-partition transaction timestamps. The log buffer of partition two contains transactions with timestamps 3, 4, 7, 8, 9, and 10, of which 4 is a multi-partition transaction timestamp and the rest are single-partition transaction timestamps.
Taking as an example a partition sending the timestamps of newly added multi-partition transactions when it reaches the trigger condition: suppose the log buffer of partition two reaches the trigger condition and 4 is the timestamp of the multi-partition transaction newly added in that partition. Partition two then sends multi-partition transaction timestamp 4 to the coordination server, which stores it in partition two's timestamp storage area. As shown in FIG. 3, partition one's timestamp storage area contains the multi-partition transaction timestamps 4, 11, and 15, while partition two's contains only 4, meaning that partition two of the standby cluster does not contain the log records corresponding to multi-partition transaction timestamps 11 and 15. The coordination server can readily see that the multi-partition transaction timestamps of partition one and partition two do not fully coincide, that their intersection is 4, and that the timestamps outside the intersection are 11 and 15; it therefore selects the smaller of 11 and 15 as the target timestamp, that is, multi-partition transaction timestamp 11 becomes partition two's target timestamp.
In another embodiment, to make the obtained target timestamp better match the execution state of the first partition's current transactions, the coordination server may, after determining the intersection of the multi-partition transaction timestamps of each partition of the standby cluster, take the smallest timestamp among the first partition's own multi-partition transaction timestamps outside the intersection as the target timestamp of the first partition.
204. If the multi-partition transaction timestamps of each partition fully coincide, the coordination server takes a first specified timestamp as the target timestamp of the first partition, the first specified timestamp instructing the first partition to execute the log records in its replication log until the log record of the next newly added multi-partition transaction is encountered, at which point execution stops.
The specific content of the first specified timestamp may be preset; for example, its value may be 0 or infinity, or it may be a special string such as "execute all". This embodiment does not specifically limit the content of the first specified timestamp.
Specifically, if the multi-partition transaction timestamps of each partition of the standby cluster fully coincide, that is, no minimum timestamp exists, the multi-partition transactions contained in the partitions of the standby cluster are completely consistent. The coordination server then obtains the first specified timestamp and uses it as the target timestamp, which indicates that the first partition may execute the log records of all transactions in the partition without causing its data to become inconsistent with the other partitions of the standby cluster.
For example, the standby cluster contains two partitions, partition one and partition two, and FIG. 4 is a schematic diagram of the interaction between the partitions of a standby cluster and the coordination server. In FIG. 4, the log buffer of partition one contains transactions with timestamps 4, 5, 6, 11, 13, and 14, of which 4 and 11 are multi-partition transaction timestamps and the rest are single-partition transaction timestamps. The log buffer of partition two contains transactions with timestamps 3, 4, 7, 8, 9, 10, and 11, of which 4 and 11 are multi-partition transaction timestamps and the rest are single-partition transaction timestamps.
Taking a full log buffer as the trigger condition: suppose partition two's log buffer reaches the trigger condition and 11 is the timestamp of the multi-partition transaction newly added in that partition. Partition two then sends multi-partition transaction timestamp 11 to the coordination server, which stores it in partition two's timestamp storage area. As shown in FIG. 4, the timestamp storage areas of both partition one and partition two contain the multi-partition transaction timestamps 4 and 11, meaning that both partitions of the standby cluster contain the log records corresponding to timestamps 4 and 11. The coordination server can readily see that the multi-partition transaction timestamps of partition one and partition two fully coincide, and therefore determines the first specified timestamp to be partition two's target timestamp.
It should be noted that steps 202 to 204 above describe in detail the process by which the coordination server, according to the multi-partition transaction timestamps it maintains for each partition of the standby cluster, coordinates each partition's execution of its replication log. The process may include: the coordination server determines the target timestamp of the first partition according to the received timestamps of the newly added multi-partition transactions and the multi-partition transaction timestamps, stored by the coordination server, of each partition of the standby cluster. Specifically, after receiving the timestamps of the newly added multi-partition transactions sent by the first partition, the coordination server updates the locally stored multi-partition transaction timestamps of the first partition and determines the first partition's target timestamp according to the updated multi-partition transaction timestamps of all partitions.
It should also be noted that if, in step 201, the first partition reaches the trigger condition without having obtained any newly added multi-partition transaction timestamp, that is, since it last sent such timestamps to the coordination server the first partition has received no replication log of a multi-partition transaction from the corresponding partition of the primary cluster, the first partition may send the coordination server a specified identifier indicating that it has no newly added multi-partition transaction timestamps. After receiving this identifier, the coordination server obtains the first partition's target timestamp in the same manner as steps 202 to 204 and then performs steps 205 and 206 below.
205. The coordination server sends the target timestamp to the first partition.
In this embodiment of the present invention, after receiving the target timestamp, the first partition may feed an acknowledgement back to the coordination server. If, after sending the target timestamp, the coordination server does not receive the first partition's acknowledgement within a specified time, it resends the target timestamp to the first partition to ensure that the first partition receives it.
206. The first partition executes the replication log in the first partition according to the target timestamp.
In this embodiment of the present invention, the first partition may execute its replication log as follows: if the target timestamp of the first partition is the smallest timestamp among the multi-partition transaction timestamps outside the intersection, execute the log records in the first partition's replication log that precede the multi-partition transaction corresponding to the target timestamp; if the target timestamp is the first specified timestamp, execute all the log records in the first partition's replication log until the log record of the next newly added multi-partition transaction is encountered, at which point execution stops.
Specifically, after receiving the target timestamp, the first partition may determine whether it is the first specified timestamp. If it is not, the target timestamp is the smallest timestamp among the multi-partition transaction timestamps outside the intersection, and according to the value it indicates, the first partition obtains, from its stored replication log, the timestamps smaller than the target timestamp together with the corresponding pending log records, and executes those log records. If the target timestamp is the first specified timestamp, the first partition executes the log records in its replication log until the log record of the next newly added multi-partition transaction is encountered, at which point execution stops, the next newly added multi-partition transaction being the first multi-partition transaction that the first partition receives from the corresponding partition of the primary cluster after receiving the target timestamp.
It should be noted that after the first partition has executed any log record in the replication log, the executed log record is deleted from the replication log.
Taking the example given in step 203 above, the case where the target timestamp is the smallest timestamp outside the intersection of the partitions' multi-partition transaction timestamps is explained. Referring to FIG. 3, after selecting multi-partition transaction timestamp 11 as partition two's target timestamp, the coordination server sends the target timestamp to partition two. Suppose the first specified timestamp is "0": on receiving the target timestamp, partition two determines that it is not the first specified timestamp "0" and, according to the value 11 indicated by the multi-partition transaction timestamp, obtains from its stored replication log the timestamps smaller than 11, namely 3, 4, 7, 8, 9, and 10. Partition two then executes the log records corresponding to timestamps 3, 4, 7, 8, 9, and 10.
Taking the example given in step 204 above, the case where the target timestamp is the first specified timestamp is explained. Referring to FIG. 4, the coordination server sends the first specified timestamp to partition two as its target timestamp; suppose the first specified timestamp is "0". On receiving the target timestamp, partition two determines from its content "0" that it is the first specified timestamp. Following the replication-log execution mode indicated by the first specified timestamp, partition two executes the log records in the partition until the log record of the first subsequently added multi-partition transaction is encountered. Since partition two currently contains no newly added multi-partition transaction log records apart from the existing multi-partition transactions 4 and 11, it executes the log records with timestamps 3, 4, 7, 8, 9, 10, and 11.
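The two execution cases just illustrated can be sketched in Python. This is an illustrative assumption rather than the embodiment's actual code: the replication log is modeled as ordered (timestamp, is_multi_partition) pairs, and "0" stands in for the first specified timestamp:

```python
def execute_replication_log(log, target, first_specified="0"):
    """log: ordered (timestamp, is_multi_partition) pairs in the log buffer.
    Returns the timestamps of the executed records; as noted in step 206,
    an executed record would also be deleted from the replication log."""
    if target == first_specified:
        # Run every record currently buffered; execution would stop only at
        # the next *newly added* multi-partition transaction, which by
        # definition is not yet in the buffer.
        return [ts for ts, _ in log]
    # Numeric target: execute only records whose timestamp precedes it.
    return [ts for ts, _ in log if ts < target]

# Partition two of FIG. 3 with target timestamp 11:
log = [(3, False), (4, True), (7, False), (8, False), (9, False), (10, False)]
print(execute_replication_log(log, 11))  # → [3, 4, 7, 8, 9, 10]
```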
In the method provided by this embodiment of the present invention, each partition of the standby cluster sends all its multi-partition transaction timestamps to the coordination server, so that the coordination server can determine, from the multi-partition transactions each partition of the standby cluster contains, which multi-partition transactions already exist in all partitions and which do not, and use the target timestamp to tell the corresponding partition which log records it may execute. The partition can thus execute multi-partition transactions that exist in all partitions but have not necessarily been executed, without entering a wait state each time a multi-partition transaction is encountered, which avoids data inconsistency among the partitions.
FIG. 5 is a flowchart of a database replication method for a distributed system according to an embodiment of the present invention. Referring to FIG. 5, the method provided by this embodiment includes the following steps:
501. The first partition sends, to the coordination server in the standby cluster, the timestamps of the multi-partition transactions newly added in the first partition.
Specifically, this step is the same as step 201 above and is not repeated here.
502. After receiving the timestamps of the newly added multi-partition transactions, the coordination server determines the intersection of the multi-partition transaction timestamps of each partition of the standby cluster and determines whether the intersection is an empty set; if the intersection is not empty, the following step 503 is performed; if the intersection is empty, the following step 504 is performed.
In this embodiment of the present invention, the coordination server maintains the multi-partition transaction timestamps of each partition of the standby cluster. Specifically, the coordination server may maintain a timestamp storage table in which a timestamp storage area is allocated to each partition of the standby cluster for storing that partition's multi-partition transaction timestamps. The table may take the form shown in Table 2.
Table 2
Partition | Timestamps of multi-partition transactions |
Partition one | 4, 11, 15 |
Partition two | 4, 11 |
Partition three | 4, 11, 13 |
…… | …… |
Specifically, the coordination server receives the timestamps of the multi-partition transactions newly added in the first partition of the standby cluster and stores them in the first partition's timestamp storage area. It then determines the intersection of the multi-partition transaction timestamps of each partition and whether the intersection is empty, that is, whether a maximum timestamp exists among the partitions' multi-partition transaction timestamps, the maximum timestamp being the largest of the multi-partition transaction timestamps present in the timestamp storage areas of all partitions.
It should be noted that, after determining the intersection of the multi-partition transaction timestamps of each partition, the coordination server may know whether the intersection is empty without an explicit judgment step; this embodiment does not specifically limit whether a judgment step is present.
503、若该交集不为空集,则协调服务器从该交集中获取数值最大的时间戳作为第一分区的目标时间戳。
在本发明实施例中,若该交集不为空集,也即是,备集群的每个分区的多分区事务的时间戳中存在最大时间戳,说明备集群中每个分区中包含有相同的多分区事务,则为了保证每个分区的数据处理的一致性,协调服务器可以在该交集中,将数值最大的时间戳确定为目标时间戳,该目标时间戳用于告知第一分区可以执行的日志记录,以避免第一分区执行了其他分区中未包含的多分区事务日志记录,造成各分区的数据不一致。
例如,备集群中包含两个分区,分区一和分区二,图6为一种备集群包含的分区与协调服务器交互的示意图。在图6中,分区一的日志缓冲区中包含事务对应的时间戳为4、5、6、11、13、14、15,其中,4、11、15为多分区事务的时间戳,其余的为单分区事务的时间戳。分区二的日志缓冲区中包含事务对应的时间戳为3、4、7、8、9、10、11,其中,4、11为多分区事务的时间戳,其余的为单分区事务的时间戳。
以分区的触发条件为日志缓冲区满为例,假如此时分区一的日志缓冲区达到触发条件,11、15为该分区新增的多分区事务的时间戳,则分区一将多分区事务的时间戳11、15发送至协调服务器,协调服务器将该多分区事务的时间戳存储至分区一的时间戳存储区域中。如图6所示,分区一的时间戳存储区域中包括多分区事务的时间戳4、11、15,分区二的时间戳存储区域中包括多分区事务的时间戳4、11,说明备集群的两个分区中均包含多分区事务的时间戳4和11对应的日志记录。协调服务器据此可以获知分区一和分区二的多分区事务的时间戳的交集为4、11,则协调服务器在交集中选取数值最大的时间戳作为目标时间戳,即获取4和11中最大的时间戳,因此,协调服务器选取多分区事务的时间戳11作为目标时间戳。
504、若该交集为空集,则协调服务器将第二指定时间戳作为第一分区的目标时间戳,第二指定时间戳用于指示第一分区依次执行分区内复制日志中的日志记录,直到遇到多分区事务的日志记录时停止执行。
其中,第二指定时间戳的具体内容可以进行预先设置,比如,第二指定时间戳的数值可以为0或无穷大,或者第二指定时间戳可以是一个特殊字符串,比如,“execute single”等,本发明实施例对第二指定时间戳的具体内容不做具体限定。
具体地,若该交集为空集,也即是,不存在最大时间戳,说明备集群中每个分区中不包含相同的多分区事务,此时,协调服务器获取第二指定时间戳,并将该第二指定时间戳作为第一分区的目标时间戳,该目标时间戳用于指示第一分区可以继续执行分区内的单分区事务的日志记录,以避免由于执行了多分区事务造成各分区的数据不一致。
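步骤503与步骤504中协调服务器选取目标时间戳的逻辑,可概括为如下草图(其中 SECOND_SPECIAL_TS 的取值"0"为假设,实际也可预先设置为无穷大或特殊字符串):

```python
SECOND_SPECIAL_TS = 0  # 假设的第二指定时间戳取值


def pick_target_ts(partition_ts_sets):
    """partition_ts_sets: 备集群各分区的多分区事务时间戳集合列表。
    交集非空时返回其中数值最大的时间戳(步骤503);
    交集为空时返回第二指定时间戳(步骤504)。"""
    common = (set.intersection(*partition_ts_sets)
              if partition_ts_sets else set())
    return max(common) if common else SECOND_SPECIAL_TS
```

以图6的例子代入,两个分区的时间戳集合为 {4, 11, 15} 与 {4, 11},交集为 {4, 11},因此选取 11 作为目标时间戳;以图7的例子代入,交集为空集,返回第二指定时间戳。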
例如,备集群中包含两个分区,分区一和分区二,图7为一种备集群包含的分区与协调服务器交互的示意图。在图7中,分区一的日志缓冲区中包含事务对应的时间戳为1、2、4、5、6、7,其中,4为多分区事务的时间戳,其余的为单分区事务时间戳。分区二的日志缓冲区中包含事务对应的时间戳为3,其中,3为单分区事务时间戳。
以分区的触发条件为日志缓冲区满为例,假如,此时分区一的日志缓冲区达到触发条件,4为该分区新增的多分区事务的时间戳,则分区一将多分区事务的时间戳4发送至协调服务器,协调服务器将该多分区事务的时间戳存储至分区一的时间戳存储区域中。如图7所示,分区一的时间戳存储区域中包括多分区事务的时间戳4,分区二的时间戳存储区域中没有多分区事务的时间戳,说明备集群的两个分区中不包含相同的多分区事务的时间戳。因此,协调服务器确定第二指定时间戳为分区一的目标时间戳,以便分区一继续执行分区内的单分区事务的日志记录。
上述步骤502至步骤504详细介绍了协调服务器根据维护的备集群的多分区事务的时间戳,协调备集群的各个分区执行分区内的复制日志的过程,该过程可概括为:协调服务器根据接收到的新增的多分区事务的时间戳以及协调服务器存储的备集群的每个分区的多分区事务的时间戳,确定第一分区的目标时间戳。
需要说明的是,在步骤501中,如果第一分区达到触发条件时未获取到新增的多分区事务的时间戳,也即是,在上一次向协调服务器发送新增的多分区事务的时间戳后,第一分区未接收到主集群中对应分区发送的多分区事务的复制日志,此时,第一分区可以向协调服务器发送指定标识,该指定标识用于指示第一分区中没有新增加的多分区事务的时间戳。协调服务器在接收到第一分区发送的指定标识后,采用与上述步骤502至步骤504相同的方式获取第一分区的目标时间戳后,再执行下述步骤505和步骤506。
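上述第一分区在触发条件满足时向协调服务器上报的消息内容,可用如下假设性草图示意(指定标识的取值为笔者假设):

```python
NO_NEW_MULTI_TS = "NO_NEW_MULTI_TS"  # 假设的指定标识取值


def build_report(new_multi_ts):
    """有新增多分区事务的时间戳时上报时间戳列表;
    否则上报指定标识,协调服务器按相同流程确定目标时间戳。"""
    return sorted(new_multi_ts) if new_multi_ts else NO_NEW_MULTI_TS
```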
505、协调服务器向第一分区发送目标时间戳。
具体地,该步骤与上述步骤205的内容相同,在此不再赘述。
506、第一分区根据目标时间戳,执行第一分区中的复制日志。
在本发明实施例中,第一分区执行第一分区的复制日志的过程可以为:若第一分区的目标时间戳为交集中数值最大的时间戳,则执行第一分区中该目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若第一分区的目标时间戳为第二指定时间戳,则依次执行第一分区中的日志记录,直到遇到多分区事务的日志记录时停止执行。
具体地,第一分区在接收到目标时间戳后,可以判断该目标时间戳是否为第二指定时间戳;若该目标时间戳不是第二指定时间戳,则确定该目标时间戳为交集中数值最大的时间戳,根据该目标时间戳指示的数值,执行复制日志中的日志记录,如果第一分区存储的日志记录中,存在比目标时间戳的数值大的多分区事务日志记录,则获取目标时间戳对应的多分区事务之后第一个多分区事务,并执行该第一个多分区事务的时间戳之前的日志记录;如果第一分区存储的日志记录中,不存在比目标时间戳的数值大的多分区事务日志记录,则执行第一分区中的所有日志记录,直到遇到后续新增的第一个多分区事务的日志记录时停止执行。若该目标时间戳是第二指定时间戳,则继续执行当前分区中的日志记录,若即将执行的日志记录对应单分区事务,则执行该日志记录;若即将执行的日志记录对应多分区事务,则停止执行。
需要说明的是,当第一分区执行了复制日志中的任一日志记录后,在复制日志中删除该执行过的日志记录。
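作为示意,第一分区在该方案下根据目标时间戳执行复制日志的两个分支,可用如下Python草图表示(函数名、数据结构及第二指定时间戳取值"0"均为假设):

```python
SECOND_SPECIAL_TS = 0  # 假设的第二指定时间戳取值


def apply_log_v2(log, target_ts, multi_ts):
    """log: 按时间戳升序排列的 [(时间戳, 日志记录)] 复制日志;
    multi_ts: 本分区中多分区事务的时间戳集合。
    返回已执行的日志记录,执行过的记录同时从 log 中删除。"""
    executed = []
    while log:
        ts, record = log[0]
        if target_ts == SECOND_SPECIAL_TS:
            # 交集为空:依次执行日志记录,遇到多分区事务即停止
            if ts in multi_ts:
                break
        else:
            # 目标时间戳为交集中数值最大的时间戳:执行到该时间戳
            # 之后的第一个多分区事务之前为止
            if ts in multi_ts and ts > target_ts:
                break
        executed.append(record)
        del log[0]  # 执行后即从复制日志中删除该日志记录
    return executed
```

以图6的例子代入:分区一的日志含时间戳4、5、6、11、13、14、15,多分区事务为4、11、15,目标时间戳为11时,执行到时间戳15之前为止;以图7的例子代入,目标时间戳为第二指定时间戳时,仅执行时间戳1、2的单分区事务日志记录。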
以上述步骤503给出的例子为例,对上述目标时间戳为交集中数值最大的时间戳的情况进行解释说明。参见图6,协调服务器在选取多分区事务的时间戳11作为分区一的目标时间戳后,将该目标时间戳发送至分区一。假如第二指定时间戳为“0”,则分区一在接收到目标时间戳后,确定该目标时间戳不是第二指定时间戳“0”。由于目标时间戳指示的多分区事务的时间戳为11,且分区一中时间戳11之后的第一个多分区事务的时间戳为15,因此,分区一会执行时间戳15之前的时间戳对应的日志记录,也即是,分区一执行时间戳4、5、6、11、13、14对应的日志记录。
以上述步骤504给出的例子为例,对上述目标时间戳是第二指定时间戳的情况进行解释说明。参见图7,协调服务器将第二指定时间戳作为目标时间戳发送至分区一,假如第二指定时间戳为“0”。分区一在接收到目标时间戳后,根据该目标时间戳的内容是“0”,确定该目标时间戳为第二指定时间戳。根据第二指定时间戳指示的复制日志执行方式,继续执行分区内的日志记录,如图7所示,分区一中即将执行的日志记录为时间戳为1的日志记录,由于时间戳1对应单分区事务,因此执行该日志记录,之后,执行时间戳2对应的单分区事务的日志记录。当遇到时间戳4对应的日志记录时,由于时间戳4对应多分区事务,因此,停止执行。
本发明实施例提供的方法,备集群中每个分区将包含的所有多分区事务的时间戳发送至协调服务器,使得协调服务器可以根据备集群每个分区包含多分区事务的情况,确定哪些多分区事务在所有分区中都已经存在,哪些多分区事务不是在所有分区中都存在,并通过目标时间戳告知相应分区能够执行哪些日志记录,使得相应分区能够执行在所有分区中都存在但未必执行的多分区事务,不必每遇到一个多分区事务就进入等待状态,避免了各分区的数据不一致。
本发明实施例提供了一种分布式系统,该系统包括主集群和备集群,主集群和备集群分别包括数据库的多个分区,且主集群中的多个分区与备集群中的多个分区一一对应,主集群中的每个分区将本分区的复制日志发送至备集群中的对应分区,复制日志中记录数据操作的事务,备集群还包括协调服务器,
备集群中的第一分区用于向协调服务器发送第一分区中新增的多分区事务的时间戳,新增的多分区事务为第一分区自上一次完成向协调服务器发送多分区事务的时间戳后,收到的主集群中的对应分区发送的复制日志中记录的事务中的多分区事务;
协调服务器用于根据接收到新增的多分区事务的时间戳以及协调服务器存储的备集群的每个分区的多分区事务的时间戳,确定第一分区的目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;协调服务器还用于向第一分区发送该目标时间戳;
第一分区还用于根据该目标时间戳,执行第一分区中的复制日志。
本发明实施例提供的系统,备集群中每个分区将包含的所有多分区事务时间戳发送至协调服务器,使得协调服务器可以根据备集群中每个分区包含多分区事务的情况,确定哪些多分区事务在所有分区中都已经存在,哪些多分区事务不是在所有分区中都存在,并通过目标时间戳告知相应分区能够执行哪些日志记录,使得相应分区能够执行在所有分区中都存在但未必执行的多分区事务,不必每遇到一个多分区事务就进入等待状态,避免了各分区的数据不一致。
图8是本发明实施例提供的一种分布式系统的数据库复制装置的框图,包括:接收模块801,获取模块802和发送模块803。
其中,接收模块801与获取模块802连接,用于接收备集群的第一分区中新增的多分区事务的时间戳,新增的多分区事务为第一分区自上一次完成发送多分区事务的时间戳后,收到的主集群中的对应分区发送的复制日志中记录的事务中的多分区事务;获取模块802与发送模块803连接,用于根据接收到新增的多分区事务的时间戳以及存储的备集群的每个分区的多分区事务的时间戳,确定第一分区的目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;发送模块803,用于向第一分区发送目标时间戳,由第一分区根据目标时间戳执行第一分区中的复制日志。
可选地,获取模块802用于判断备集群的每个分区的多分区事务的时间戳是否完全重合;若备集群的每个分区的多分区事务的时间戳不完全重合,则确定备集群的每个分区的多分区事务的时间戳的交集,从备集群的每个分区除交集以外的多分区事务的时间戳中,获取数值最小的时间戳作为第一分区的目标时间戳,数值最小的时间戳用于指示第一分区执行第一分区的复制日志中该目标时间戳对应的多分区事务之前的日志记录;若每个分区的多分区事务时间戳完全重合,则将第一指定时间戳作为第一分区的目标时间戳,第一指定时间戳用于指示第一分区执行分区内复制日志中的日志记录,直到遇到下一个新增的多分区事务的日志记录时停止执行。协调服务器通过维护的每个分区包含多分区事务的时间戳,可以确定哪些多分区事务在所有分区中都已经存在,哪些多分区事务不是在所有分区中都存在,并通过目标时间戳告知相应分区能够执行哪些日志记录,使得相应分区能够执行在所有分区中都存在但未必执行的多分区事务,不必每遇到一个多分区事务就进入等待状态,避免备集群中各分区的数据不一致。
可选地,获取模块802用于确定备集群的每个分区的多分区事务的时间戳的交集,并判断交集是否为空集;若交集不为空集,则从交集中获取数值最大的时间戳作为第一分区的目标时间戳,数值最大的时间戳用于指示第一分区执行第一分区中该目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若交集为空集,则将第二指定时间戳作为第一分区的目标时间戳,第二指定时间戳用于指示第一分区依次执行分区内复制日志中的日志记录,直到遇到多分区事务的日志记录时停止执行。协调服务器通过维护的每个分区包含多分区事务的时间戳,可以确定哪些多分区事务在所有分区中都已经存在,哪些多分区事务不是在所有分区中都存在,并通过目标时间戳告知相应分区能够执行哪些日志记录,使得相应分区能够执行在所有分区中都存在但未必执行的多分区事务,不必每遇到一个多分区事务就进入等待状态,在避免了数据不一致的同时,提高了复制效率,使得该种数据库复制方法性能好。
本发明实施例提供的装置,备集群中每个分区将包含的所有多分区事务时间戳发送至协调服务器,使得协调服务器可以根据每个分区包含多分区事务的情况,确定哪些多分区事务在所有分区中都已经存在,哪些多分区事务不是在所有分区中都存在,并通过目标时间戳告知相应分区能够执行哪些日志记录,使得相应分区能够执行在所有分区中都存在但未必执行的多分区事务,不必每遇到一个多分区事务就进入等待状态,避免备集群中各分区的数据不一致。
图9是本发明实施例提供的一种分布式系统的数据库复制装置的框图,包括:发送模块901和执行模块902。
其中,发送模块901与执行模块902连接,用于向备集群中的协调服务器发送第一分区中新增的多分区事务的时间戳,由协调服务器根据接收到所述新增的多分区事务的时间戳以及存储的备集群的每个分区的多分区事务的时间戳,确定第一分区的目标时间戳,并向第一分区发送该目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;执行模块902,用于根据第一分区的目标时间戳,执行第一分区内的复制日志。
可选地,执行模块902用于若第一分区的目标时间戳为备集群的每个分区的多分区事务的时间戳的交集以外的多分区事务的时间戳中数值最小的时间戳,则执行第一分区的复制日志中该目标时间戳对应的多分区事务之前的日志记录;若目标时间戳为第一指定时间戳,则执行第一分区内复制日志中的日志记录,直到遇到下一个新增多分区事务的日志记录时停止执行,所述第一指定时间戳指示所述备集群的每个分区的多分区事务的时间戳完全重合。根据第一分区的目标时间戳执行日志记录,可以保证执行的多分区事务在所有分区中都存在,不必每遇到一个多分区事务就进入等待状态,避免了第一分区与备集群中其他分区之间的数据不一致。
可选地,执行模块902用于若第一分区的目标时间戳为备集群的每个分区的多分区事务的时间戳的交集中数值最大的时间戳,则执行第一分区中该目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若第一分区的目标时间戳为第二指定时间戳,则依次执行第一分区中的日志记录,直到遇到多分区事务的日志记录时停止执行,所述第二指定时间戳指示所述备集群的每个分区的多分区事务的时间戳的交集为空集。根据第一分区的目标时间戳执行日志记录,可以保证执行的多分区事务在所有分区中都存在,不必每遇到一个多分区事务就进入等待状态,避免了第一分区与备集群中其他分区之间的数据不一致。
本发明实施例提供的装置,备集群中每个分区将包含的所有多分区事务时间戳发送至协调服务器,使得协调服务器可以根据每个分区包含多分区事务的情况,确定哪些多分区事务在所有分区中都已经存在,哪些多分区事务不是在所有分区中都存在,并通过目标时间戳告知相应分区能够执行哪些日志记录,使得相应分区能够执行在所有分区中都存在但未必执行的多分区事务,不必每遇到一个多分区事务就进入等待状态,避免了备集群中各分区的数据不一致。
需要说明的是:上述实施例提供的数据传输装置在进行数据传输时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的数据传输装置与数据传输方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图10是本发明实施例示出的一种协调服务器的结构示意图。该协调服务器用于实现上述方法中的协调服务器所执行的方法。参照图10,协调服务器包括处理组件1022,其进一步包括一个或多个处理器,以及由存储器1032所代表的存储器资源,用于存储可由处理组件1022执行的指令,例如应用程序。存储器1032中存储的应用程序可以包括一个或一个以上的模块,每一个模块对应于一组指令。此外,处理组件1022被配置为执行指令,以执行下述步骤:
接收备集群的第一分区中新增的多分区事务的时间戳,所述新增的多分区事务为所述第一分区自上一次完成发送多分区事务的时间戳后,收到的主集群中的对应分区发送的复制日志中记录的事务中的多分区事务;根据接收到所述新增的多分区事务的时间戳以及存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;向所述第一分区发送所述目标时间戳,由所述第一分区根据所述目标时间戳执行所述第一分区中的复制日志。
可选地,所述根据接收到所述新增的多分区事务的时间戳以及存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳包括:
判断所述备集群的每个分区的多分区事务的时间戳是否完全重合;若所述备集群的每个分区的多分区事务的时间戳不完全重合,则确定所述备集群的每个分区的多分区事务的时间戳的交集,从所述备集群的每个分区除所述交集以外的多分区事务的时间戳中,获取数值最小的时间戳作为所述第一分区的目标时间戳,所述数值最小的时间戳用于指示所述第一分区执行所述第一分区的复制日志中所述目标时间戳对应的多分区事务之前的日志记录;若所述每个分区的多分区事务时间戳完全重合,则将第一指定时间戳作为所述第一分区的目标时间戳,所述第一指定时间戳用于指示所述第一分区执行分区内复制日志中的日志记录,直到遇到下一个新增的多分区事务的日志记录时停止执行。协调服务器通过维护的每个分区包含多分区事务的时间戳,可以确定哪些多分区事务在所有分区中都已经存在,哪些多分区事务不是在所有分区中都存在,并通过目标时间戳告知相应分区能够执行哪些日志记录,使得相应分区能够执行在所有分区中都存在但未必执行的多分区事务,不必每遇到一个多分区事务就进入等待状态,避免了备集群中各分区的数据不一致。
可选地,所述根据接收到所述新增的多分区事务的时间戳以及存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳包括:确定所述备集群的每个分区的多分区事务的时间戳的交集,并判断所述交集是否为空集;若所述交集不为空集,则从所述交集中获取数值最大的时间戳作为所述第一分区的目标时间戳,所述数值最大的时间戳用于指示所述第一分区执行所述第一分区中所述目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若所述交集为空集,则将第二指定时间戳作为所述第一分区的目标时间戳,所述第二指定时间戳用于指示所述第一分区依次执行分区内复制日志中的日志记录,直到遇到多分区事务的日志记录时停止执行。通过向第一分区发送目标时间戳,使得第一分区可以根据目标时间戳执行分区内的日志记录,保证了第一分区与备集群中其他分区之间的数据一致性。
协调服务器还可以包括一个电源组件1026,被配置为执行协调服务器的电源管理;一个有线或无线网络接口1050,被配置为将协调服务器连接到网络;和一个输入输出(I/O)接口1058。协调服务器可以基于存储在存储器1032中的操作系统进行操作,例如Windows Server™、Mac OS X™、Unix™、Linux™、FreeBSD™或类似。
本发明实施例提供的协调服务器,通过维护的每个分区包含多分区事务的时间戳,可以确定哪些多分区事务在所有分区中都已经存在,哪些多分区事务不是在所有分区中都存在,并通过目标时间戳告知相应分区能够执行哪些日志记录,使得相应分区能够执行在所有分区中都存在但未必执行的多分区事务,不必每遇到一个多分区事务就进入等待状态,避免了备集群中各分区的数据不一致。
图11是本发明实施例示出的一种分布式系统的数据库复制装置的结构示意图。参照图11,该装置包括处理组件1122,其进一步包括一个或多个处理器,以及由存储器1132所代表的存储器资源,用于存储可由处理组件1122执行的指令,例如应用程序。存储器1132中存储的应用程序可以包括一个或一个以上的模块,每一个模块对应于一组指令。此外,处理组件1122被配置为:
向协调服务器发送第一分区中新增的多分区事务的时间戳,由协调服务器根据接收到新增的多分区事务的时间戳以及存储的备集群的每个分区的多分区事务的时间戳,确定第一分区的目标时间戳,并向第一分区发送第一分区的目标时间戳,该目标时间戳用于指示第一分区可以执行的多分区事务的信息;
根据第一分区的目标时间戳,执行第一分区内的复制日志。
可选地,处理器还被配置为:若第一分区的目标时间戳为备集群的每个分区的多分区事务的时间戳的交集以外的多分区事务的时间戳中数值最小的时间戳,则执行第一分区的复制日志中目标时间戳对应的多分区事务之前的日志记录;若目标时间戳为第一指定时间戳,则执行第一分区内复制日志中的日志记录,直到遇到下一个新增多分区事务的日志记录时停止执行,第一指定时间戳指示备集群的每个分区的多分区事务的时间戳完全重合。第一分区根据目标时间戳执行日志记录,可以保证执行的多分区事务在所有分区中都存在,不必每遇到一个多分区事务就进入等待状态,避免了第一分区与备集群中其他分区之间的数据不一致。
可选地,处理器还被配置为:若第一分区的目标时间戳为备集群的每个分区的多分区事务的时间戳的交集中数值最大的时间戳,则执行第一分区中目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若第一分区的目标时间戳为第二指定时间戳,则依次执行第一分区中的日志记录,直到遇到多分区事务的日志记录时停止执行,第二指定时间戳指示备集群的每个分区的多分区事务的时间戳的交集为空集。第一分区根据目标时间戳执行日志记录,可以保证执行的多分区事务在所有分区中都存在,不必每遇到一个多分区事务就进入等待状态,避免了第一分区与备集群中其他分区之间的数据不一致。
本发明实施例提供的装置,根据第一分区的目标时间戳执行日志记录,可以保证执行的多分区事务在所有分区中都存在,不必每遇到一个多分区事务就进入等待状态,避免了第一分区与备集群中其他分区之间的数据不一致。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本发明的较佳实施例,并不用以限制本发明,凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。
Claims (20)
- 一种分布式系统的数据库复制方法,其特征在于,所述分布式系统包括主集群和备集群,所述主集群和所述备集群分别包括数据库的多个分区,且所述主集群中的多个分区与所述备集群中的多个分区一一对应,所述主集群中的每个分区将本分区的复制日志发送至所述备集群中的对应分区,所述复制日志中记录数据操作的事务;所述备集群中的第一分区向协调服务器发送所述第一分区中新增的多分区事务的时间戳,所述新增的多分区事务为所述第一分区自上一次完成向所述协调服务器发送多分区事务的时间戳后,收到的所述主集群中的对应分区发送的复制日志中记录的事务中的多分区事务;所述协调服务器根据接收到所述新增的多分区事务的时间戳以及所述协调服务器存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;所述协调服务器向所述第一分区发送所述目标时间戳;所述第一分区根据所述目标时间戳,执行所述第一分区中的复制日志。
- 根据权利要求1所述的方法,其特征在于,所述协调服务器根据接收到所述新增的多分区事务的时间戳以及所述协调服务器存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳包括:判断所述备集群的每个分区的多分区事务的时间戳是否完全重合;若所述备集群的每个分区的多分区事务的时间戳不完全重合,则确定所述备集群的每个分区的多分区事务的时间戳的交集,从所述备集群的每个分区除所述交集以外的多分区事务的时间戳中,获取数值最小的时间戳作为所述第一分区的目标时间戳;若所述备集群的每个分区的多分区事务的时间戳完全重合,则将第一指定时间戳作为所述第一分区的目标时间戳,所述第一指定时间戳用于指示所述第一分区执行分区内复制日志中的日志记录,直到遇到下一个新增的多分区事务的日志记录时停止执行。
- 根据权利要求2所述的方法,其特征在于,所述第一分区根据所述目标时间戳,执行所述第一分区中的复制日志包括:若所述第一分区的目标时间戳为所述交集以外的多分区事务的时间戳中数值最小的时间戳,则执行所述第一分区的复制日志中所述目标时间戳对应的多分区事务之前的日志记录;若所述目标时间戳为所述第一指定时间戳,则执行所述第一分区内复制日志中的日志记录,直到遇到下一个新增多分区事务的日志记录时停止执行。
- 根据权利要求1所述的方法,其特征在于,所述协调服务器根据接收到所述新增的多分区事务的时间戳以及所述协调服务器存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳包括:确定所述备集群的每个分区的多分区事务的时间戳的交集,并判断所述交集是否为空集;若所述交集不为空集,则从所述交集中获取数值最大的时间戳作为所述第一分区的目标时间戳;若所述交集为空集,则将第二指定时间戳作为所述第一分区的目标时间戳,所述第二指定时间戳用于指示所述第一分区依次执行分区内复制日志中的日志记录,直到遇到多分区事务的日志记录时停止执行。
- 根据权利要求4所述的方法,其特征在于,所述第一分区根据所述目标时间戳,执行所述第一分区中的复制日志包括:若所述第一分区的目标时间戳为所述交集中数值最大的时间戳,则执行所述第一分区中所述目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若所述第一分区的目标时间戳为所述第二指定时间戳,则依次执行所述第一分区中的日志记录,直到遇到多分区事务的日志记录时停止执行。
- 一种分布式系统,其特征在于,所述系统包括主集群和备集群,所述主集群和所述备集群分别包括数据库的多个分区,且所述主集群中的多个分区与所述备集群中的多个分区一一对应,所述主集群中的每个分区将本分区的复制日志发送至所述备集群中的对应分区,所述复制日志中记录数据操作的事务, 所述备集群还包括协调服务器;所述备集群中的第一分区用于向所述协调服务器发送所述第一分区中新增的多分区事务的时间戳,所述新增的多分区事务为所述第一分区自上一次完成向所述协调服务器发送多分区事务的时间戳后,收到的所述主集群中的对应分区发送的复制日志中记录的事务中的多分区事务;所述协调服务器用于根据接收到所述新增的多分区事务的时间戳以及所述协调服务器存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;所述协调服务器还用于向所述第一分区发送所述目标时间戳;所述第一分区还用于根据所述目标时间戳执行所述第一分区中的复制日志。
- 一种分布式系统的数据库复制方法,其特征在于,所述方法包括:接收备集群的第一分区中新增的多分区事务的时间戳,所述新增的多分区事务为所述第一分区自上一次完成发送多分区事务的时间戳后,收到的主集群中的对应分区发送的复制日志中记录的事务中的多分区事务;根据接收到所述新增的多分区事务的时间戳以及存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;向所述第一分区发送所述目标时间戳,由所述第一分区根据所述目标时间戳执行所述第一分区中的复制日志。
- 根据权利要求7所述的方法,其特征在于,所述根据接收到所述新增的多分区事务的时间戳以及存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳包括:判断所述备集群的每个分区的多分区事务的时间戳是否完全重合;若所述备集群的每个分区的多分区事务的时间戳不完全重合,则确定所述备集群的每个分区的多分区事务的时间戳的交集,从所述备集群的每个分区除所述交集以外的多分区事务的时间戳中,获取数值最小的时间戳作为所述第一分区的目标时间戳,所述数值最小的时间戳用于指示所述第一分区执行所述第一分区的复制日志中所述目标时间戳对应的多分区事务之前的日志记录;若所述每个分区的多分区事务时间戳完全重合,则将第一指定时间戳作为所述第一分区的目标时间戳,所述第一指定时间戳用于指示所述第一分区执行分区内复制日志中的日志记录,直到遇到下一个新增的多分区事务的日志记录时停止执行。
- 根据权利要求7所述的方法,其特征在于,所述根据接收到所述新增的多分区事务的时间戳以及存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳包括:确定所述备集群的每个分区的多分区事务的时间戳的交集,并判断所述交集是否为空集;若所述交集不为空集,则从所述交集中获取数值最大的时间戳作为所述第一分区的目标时间戳,所述数值最大的时间戳用于指示所述第一分区执行所述第一分区中所述目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若所述交集为空集,则将第二指定时间戳作为所述第一分区的目标时间戳,所述第二指定时间戳用于指示所述第一分区依次执行分区内复制日志中的日志记录,直到遇到多分区事务的日志记录时停止执行。
- 一种分布式系统的数据库复制方法,所述方法包括:向协调服务器发送第一分区中新增的多分区事务的时间戳,由所述协调服务器根据接收到所述新增的多分区事务的时间戳以及存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳,并向所述第一分区发送所述目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;根据所述第一分区的目标时间戳,执行所述第一分区内的复制日志。
- 根据权利要求10所述的方法,其特征在于,所述根据所述第一分区的目标时间戳,执行所述第一分区内的复制日志包括:若所述第一分区的目标时间戳为所述备集群的每个分区的多分区事务的时间戳的交集以外的多分区事务的时间戳中数值最小的时间戳,则执行所述第一分区的复制日志中所述目标时间戳对应的多分区事务之前的日志记录;若所述目标时间戳为所述第一指定时间戳,则执行所述第一分区内复制日志中的日志记录,直到遇到下一个新增多分区事务的日志记录时停止执行,所述第一指定时间戳指示所述备集群的每个分区的多分区事务的时间戳完全重合。
- 根据权利要求10所述的方法,其特征在于,所述根据所述第一分区的目标时间戳,执行所述第一分区内的复制日志包括:若所述第一分区的目标时间戳为所述备集群的每个分区的多分区事务的时间戳的交集中数值最大的时间戳,则执行所述第一分区中所述目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若所述第一分区的目标时间戳为所述第二指定时间戳,则依次执行所述第一分区中的日志记录,直到遇到多分区事务的日志记录时停止执行,所述第二指定时间戳指示所述备集群的每个分区的多分区事务的时间戳的交集为空集。
- 一种分布式系统的数据库复制装置,其特征在于,所述装置包括:接收模块,用于接收备集群的第一分区中新增的多分区事务的时间戳,所述新增的多分区事务为所述第一分区自上一次完成发送多分区事务的时间戳后,收到的主集群中的对应分区发送的复制日志中记录的事务中的多分区事务;获取模块,用于根据接收到所述新增的多分区事务的时间戳以及存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;发送模块,用于向所述第一分区发送所述目标时间戳,由所述第一分区根据所述目标时间戳执行所述第一分区中的复制日志。
- 根据权利要求13所述的装置,其特征在于,所述获取模块用于判断所述备集群的每个分区的多分区事务的时间戳是否完全重合;若所述备集群的每个分区的多分区事务的时间戳不完全重合,则确定所述备集群的每个分区的多分区事务的时间戳的交集,从所述备集群的每个分区除所述交集以外的多分区事务的时间戳中,获取数值最小的时间戳作为所述第一分区的目标时间戳,所述数值最小的时间戳用于指示所述第一分区执行所述第一分区的复制日志中所述目标时间戳对应的多分区事务之前的日志记录;若所述每个分区的多分区事务时间戳完全重合,则将第一指定时间戳作为所述第一分区的目标时间戳,所述第一指定时间戳用于指示所述第一分区执行分区内复制日志中的日志记录,直到遇到下一个新增的多分区事务的日志记录时停止执行。
- 根据权利要求13所述的装置,其特征在于,所述获取模块用于确定所述备集群的每个分区的多分区事务的时间戳的交集,并判断所述交集是否为空集;若所述交集不为空集,则从所述交集中获取数值最大的时间戳作为所述第一分区的目标时间戳,所述数值最大的时间戳用于指示所述第一分区执行所述第一分区中所述目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若所述交集为空集,则将第二指定时间戳作为所述第一分区的目标时间戳,所述第二指定时间戳用于指示所述第一分区依次执行分区内复制日志中的日志记录,直到遇到多分区事务的日志记录时停止执行。
- 一种分布式系统的数据库复制装置,所述装置包括:发送模块,用于向协调服务器发送第一分区中新增的多分区事务的时间戳,由所述协调服务器根据接收到所述新增的多分区事务的时间戳以及存储的所述备集群的每个分区的多分区事务的时间戳,确定所述第一分区的目标时间戳,并向所述第一分区发送所述目标时间戳,所述目标时间戳用于指示所述第一分区可以执行的多分区事务的信息;执行模块,用于根据所述第一分区的目标时间戳,执行所述第一分区内的复制日志。
- 根据权利要求16所述的装置,其特征在于,所述执行模块用于若所述第一分区的目标时间戳为所述备集群的每个分区的多分区事务的时间戳的交集以外的多分区事务的时间戳中数值最小的时间戳,则执行所述第一分区的复制日志中所述目标时间戳对应的多分区事务之前的日志记录;若所述目标时间戳为所述第一指定时间戳,则执行所述第一分区内复制日志中的日志记录,直到遇到下一个新增多分区事务的日志记录时停止执行,所述第一指定时间戳指示所述备集群的每个分区的多分区事务的时间戳完全重合。
- 根据权利要求16所述的装置,其特征在于,所述执行模块用于若所述第一分区的目标时间戳为所述备集群的每个分区的多分区事务的时间戳的交集中数值最大的时间戳,则执行所述第一分区中所述目标时间戳之后的第一个多分区事务的时间戳对应的多分区事务之前的日志记录;若所述第一分区的目标时间戳为所述第二指定时间戳,则依次执行所述第一分区中的日志记录,直到遇到多分区事务的日志记录时停止执行,所述第二指定时间戳指示所述备集群的每个分区的多分区事务的时间戳的交集为空集。
- 一种协调服务器,其特征在于,包括存储器和处理器,存储器用于存储处理器可执行指令,处理器被配置为执行上述权利要求7至9任一项所述的方法。
- 一种分布式系统的数据库复制装置,包括存储器和处理器,存储器用于存储处理器可执行指令,处理器被配置为执行上述权利要求10至12任一项所述的方法。
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16899017.4A EP3438847A4 (en) | 2016-04-22 | 2016-04-22 | METHOD AND DEVICE FOR DUPLICATING A DATABASE IN A DISTRIBUTED SYSTEM |
PCT/CN2016/080068 WO2017181430A1 (zh) | 2016-04-22 | 2016-04-22 | 分布式系统的数据库复制方法及装置 |
CN201680057292.XA CN108140035B (zh) | 2016-04-22 | 2016-04-22 | 分布式系统的数据库复制方法及装置 |
US16/165,596 US11093522B2 (en) | 2016-04-22 | 2018-10-19 | Database replication method and apparatus for distributed system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/080068 WO2017181430A1 (zh) | 2016-04-22 | 2016-04-22 | 分布式系统的数据库复制方法及装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/165,596 Continuation US11093522B2 (en) | 2016-04-22 | 2018-10-19 | Database replication method and apparatus for distributed system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017181430A1 true WO2017181430A1 (zh) | 2017-10-26 |
Family
ID=60115464
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/080068 WO2017181430A1 (zh) | 2016-04-22 | 2016-04-22 | 分布式系统的数据库复制方法及装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11093522B2 (zh) |
EP (1) | EP3438847A4 (zh) |
CN (1) | CN108140035B (zh) |
WO (1) | WO2017181430A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110018884A (zh) * | 2019-03-19 | 2019-07-16 | 阿里巴巴集团控股有限公司 | 分布式事务处理方法、协调装置、数据库及电子设备 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11224081B2 (en) * | 2018-12-05 | 2022-01-11 | Google Llc | Disengaged-mode active coordination set management |
US12114394B2 (en) | 2019-01-02 | 2024-10-08 | Google Llc | Multiple active-coordination-set aggregation for mobility management |
US11537454B2 (en) * | 2020-01-09 | 2022-12-27 | International Business Machines Corporation | Reducing write operations in middleware |
US11556370B2 (en) * | 2020-01-30 | 2023-01-17 | Walmart Apollo, Llc | Traversing a large connected component on a distributed file-based data structure |
CN112527759B (zh) * | 2021-02-09 | 2021-06-11 | 腾讯科技(深圳)有限公司 | 日志执行方法、装置、计算机设备及存储介质 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102110121A (zh) * | 2009-12-24 | 2011-06-29 | 阿里巴巴集团控股有限公司 | 一种数据处理方法及其系统 |
CN103810060A (zh) * | 2013-11-21 | 2014-05-21 | 北京奇虎科技有限公司 | 基于分布式数据库的数据备份方法及其系统 |
CN104573100A (zh) * | 2015-01-29 | 2015-04-29 | 无锡江南计算技术研究所 | 一种带自增量标识的分步式数据库同步方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100459810C (zh) * | 2006-05-30 | 2009-02-04 | 华为技术有限公司 | 使用分布式事务实现移动用户数据安全备份的方法及系统 |
US8706982B2 (en) * | 2007-12-30 | 2014-04-22 | Intel Corporation | Mechanisms for strong atomicity in a transactional memory system |
US8650155B2 (en) * | 2008-02-26 | 2014-02-11 | Oracle International Corporation | Apparatus and method for log based replication of distributed transactions using globally acknowledged commits |
US8671074B2 (en) * | 2010-04-12 | 2014-03-11 | Microsoft Corporation | Logical replication in clustered database system with adaptive cloning |
US9805108B2 (en) * | 2010-12-23 | 2017-10-31 | Mongodb, Inc. | Large distributed database clustering systems and methods |
US9323569B2 (en) * | 2014-09-10 | 2016-04-26 | Amazon Technologies, Inc. | Scalable log-based transaction management |
JP6346376B2 (ja) * | 2014-09-10 | 2018-06-20 | アマゾン・テクノロジーズ・インコーポレーテッド | 拡張縮小可能なログベーストランザクション管理 |
US10353907B1 (en) * | 2016-03-30 | 2019-07-16 | Microsoft Technology Licensing, Llc | Efficient indexing of feed updates for content feeds |
US10810268B2 (en) * | 2017-12-06 | 2020-10-20 | Futurewei Technologies, Inc. | High-throughput distributed transaction management for globally consistent sharded OLTP system and method of implementing |
US11120006B2 (en) * | 2018-06-21 | 2021-09-14 | Amazon Technologies, Inc. | Ordering transaction requests in a distributed database according to an independently assigned sequence |
US20190392047A1 (en) * | 2018-06-25 | 2019-12-26 | Amazon Technologies, Inc. | Multi-table partitions in a key-value database |
2016
- 2016-04-22 CN CN201680057292.XA patent/CN108140035B/zh active Active
- 2016-04-22 EP EP16899017.4A patent/EP3438847A4/en active Pending
- 2016-04-22 WO PCT/CN2016/080068 patent/WO2017181430A1/zh active Application Filing
2018
- 2018-10-19 US US16/165,596 patent/US11093522B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102110121A (zh) * | 2009-12-24 | 2011-06-29 | 阿里巴巴集团控股有限公司 | 一种数据处理方法及其系统 |
CN103810060A (zh) * | 2013-11-21 | 2014-05-21 | 北京奇虎科技有限公司 | 基于分布式数据库的数据备份方法及其系统 |
CN104573100A (zh) * | 2015-01-29 | 2015-04-29 | 无锡江南计算技术研究所 | 一种带自增量标识的分步式数据库同步方法 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3438847A4 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110018884A (zh) * | 2019-03-19 | 2019-07-16 | 阿里巴巴集团控股有限公司 | 分布式事务处理方法、协调装置、数据库及电子设备 |
CN110018884B (zh) * | 2019-03-19 | 2023-06-06 | 创新先进技术有限公司 | 分布式事务处理方法、协调装置、数据库及电子设备 |
Also Published As
Publication number | Publication date |
---|---|
EP3438847A4 (en) | 2019-05-01 |
US11093522B2 (en) | 2021-08-17 |
CN108140035A (zh) | 2018-06-08 |
US20190057142A1 (en) | 2019-02-21 |
EP3438847A1 (en) | 2019-02-06 |
CN108140035B (zh) | 2020-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017181430A1 (zh) | 分布式系统的数据库复制方法及装置 | |
WO2019154394A1 (zh) | 分布式数据库集群系统、数据同步方法及存储介质 | |
WO2018177107A1 (zh) | 数据迁移方法、迁移服务器及存储介质 | |
US8949828B2 (en) | Single point, scalable data synchronization for management of a virtual input/output server cluster | |
US7152076B2 (en) | System and method for efficient multi-master replication | |
EP3195117B1 (en) | Automated configuration of log-coordinated storage groups | |
US11068499B2 (en) | Method, device, and system for peer-to-peer data replication and method, device, and system for master node switching | |
CN102265277A (zh) | 数据存储系统的操作方法和装置 | |
US20140059315A1 (en) | Computer system, data management method and data management program | |
CN110990432A (zh) | 一种跨机房同步分布式缓存集群的装置和方法 | |
CN105824846B (zh) | 数据迁移方法及装置 | |
US9100443B2 (en) | Communication protocol for virtual input/output server (VIOS) cluster communication | |
CN107133231B (zh) | 一种数据获取方法和装置 | |
CN106034137A (zh) | 用于分布式系统的智能调度方法及分布式服务系统 | |
CN105721582A (zh) | 多节点文件备份系统 | |
WO2012171349A1 (zh) | 一种分布式自增计数的实现方法、装置及系统 | |
WO2018157605A1 (zh) | 一种集群文件系统中消息传输的方法及装置 | |
CN112035062B (zh) | 云计算的本地存储的迁移方法、计算机设备及存储介质 | |
CN112685499A (zh) | 一种工作业务流的流程数据同步方法、装置及设备 | |
CN117677943A (zh) | 用于混合数据处理的数据一致性机制 | |
CN115905413A (zh) | 一种基于Python协程和DataX的数据同步平台 | |
US20240211013A1 (en) | Hibernating and resuming nodes of a computing cluster | |
CN114925075B (zh) | 一种多源时空监测信息实时动态融合方法 | |
CN113448775B (zh) | 多源异构数据备份方法及装置 | |
WO2017050177A1 (zh) | 一种数据同步方法和装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2016899017 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2016899017 Country of ref document: EP Effective date: 20181101 |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16899017 Country of ref document: EP Kind code of ref document: A1 |