WO2022134876A1 - Data synchronization method and apparatus, and electronic device and storage medium - Google Patents


Info

Publication number
WO2022134876A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction
global transaction
global
sub
log
Application number
PCT/CN2021/128408
Other languages
French (fr)
Chinese (zh)
Inventor
周日明 (Zhou Riming)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing

Definitions

  • The embodiments of the present application relate to the field of databases, and in particular to a data synchronization method and apparatus, an electronic device, and a storage medium.
  • OLTP: On-Line Transaction Processing.
  • Distributed databases need to periodically offload data to analysis systems, such as data warehouses, for subsequent analysis and processing by other systems.
  • When the distributed database offloads data to the analysis system, it replays the sub-transaction logs of the global transactions generated by each shard to the downstream database.
  • A cross-shard global transaction includes multiple sub-transactions.
  • Some sub-transactions of a cross-shard global transaction may already have generated logs on some shards while other sub-transactions have not yet generated logs. In that case the global transaction may not yet be committed, or may still be rolled back.
  • If such transaction logs are nevertheless played back to the downstream database, uncommitted data is also synchronized downstream. This is an uncommitted read.
  • To address this, the embodiments of the present application provide a data synchronization method and apparatus, an electronic device, and a storage medium.
  • An embodiment of the present application provides a data synchronization method, including: acquiring a sub-transaction log to be played back according to the transaction log of a shard; detecting, according to a global transaction snapshot table updated at a preset period, whether the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction, where the global transaction snapshot table is configured to record the commit state of global transactions; and, after determining that this global transaction is a committed global transaction, playing back the sub-transaction log to the downstream database.
  • An embodiment of the present application also provides a data synchronization apparatus, comprising:
    a sub-transaction log acquisition module, configured to acquire sub-transaction logs to be played back according to the transaction logs of shards;
    a committed global transaction judgment module, configured to detect, according to the global transaction snapshot table updated at a preset period, whether the global transaction to which a sub-transaction log to be played back belongs is a committed global transaction, where the global transaction snapshot table is configured to record the commit state of global transactions; and
    a playback module, configured to play back the sub-transaction log to the downstream database after that global transaction is determined to be a committed global transaction.
  • An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and when the at least one processor executes these instructions, it performs the above data synchronization method.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program that implements the above data synchronization method when executed by a processor.
  • FIG. 1 is a schematic diagram of a data synchronization system according to a first embodiment of the present application.
  • FIG. 2 is a flowchart of updating a global transaction snapshot table according to the first embodiment of the present application.
  • FIG. 3 is a flowchart of a data synchronization method according to the first embodiment of the present application.
  • FIG. 5 is a flowchart of a data synchronization method according to a second embodiment of the present application.
  • FIG. 6 is a flowchart of a data synchronization apparatus according to a third embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
  • A data synchronization system that can implement the data synchronization method of the first embodiment includes: an upstream database, namely the distributed database 101; a data synchronization tool 102; and a downstream database 103.
  • The distributed database 101 includes a global transaction manager, computing nodes, and data shards.
  • A data shard may be referred to simply as a shard.
  • The Global Transaction Manager (GTM) is configured to:
    create and release global transaction numbers (GTIDs); and
    maintain the list of active global transactions, also known as the global transaction snapshot.
    An active global transaction is a global transaction in the distributed database that has started but has not yet committed.
  • The global transaction manager records the current active global transaction list, which contains the GTIDs of all active global transactions in the distributed database.
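The GTM's bookkeeping can be illustrated with a minimal in-memory sketch. The class and method names (`GlobalTransactionManager`, `begin`, `commit`, `rollback`, `snapshot`) are hypothetical. The sketch assumes GTIDs are allocated in increasing order. It also tracks the maximum committed GTID, because the snapshot table described later stores that value. A real GTM would be a networked, persistent service.

```python
import itertools
from typing import List, Set, Tuple


class GlobalTransactionManager:
    """Toy, single-process GTM (hypothetical API, for illustration only)."""

    def __init__(self) -> None:
        self._gtids = itertools.count(1)   # assumption: GTIDs increase monotonically
        self._active: Set[int] = set()     # started but not yet committed
        self._max_committed = 0            # largest GTID that has committed

    def begin(self) -> int:
        """Create a global transaction and return its GTID."""
        gtid = next(self._gtids)
        self._active.add(gtid)
        return gtid

    def commit(self, gtid: int) -> None:
        """Release a GTID on commit and advance the maximum committed GTID."""
        self._active.remove(gtid)
        self._max_committed = max(self._max_committed, gtid)

    def rollback(self, gtid: int) -> None:
        """Release a GTID on rollback; it is removed from the active list."""
        self._active.remove(gtid)

    def snapshot(self) -> Tuple[List[int], int]:
        """Return (active global transaction list, maximum committed GTID)."""
        return sorted(self._active), self._max_committed


gtm = GlobalTransactionManager()
t1, t2, t3 = gtm.begin(), gtm.begin(), gtm.begin()
gtm.commit(t2)
print(gtm.snapshot())  # ([1, 3], 2)
```

Querying `snapshot()` is what a computing node would do once per preset period.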
  • Computing nodes are configured to resolve and execute global transactions in the distributed database.
  • Each data shard is responsible for local storage of its data and local execution on it. It generates a local transaction log that records the row values of the shard's tables before and after each modification.
  • The distributed database 101 supports the replicated table mode.
  • A distributed database can distribute the data of a large table across multiple data shards according to certain rules. Each shard then stores a different part of the table's data.
  • Smaller tables can use the replicated table mode. In this mode, every data shard stores the complete data of the table, and the data on all shards is identical.
  • Updating a replicated table is also a global transaction. Either all shards are updated successfully together, or all shards are rolled back to the state before the update.
  • The global transaction snapshot table in this embodiment uses the replicated table mode. Every shard in the distributed database therefore holds a copy of the table, which is configured to represent the commit state of global transactions.
  • Updating the global transaction snapshot table is therefore equivalent to updating the copy of the table on every data shard of the distributed database 101.
  • The computing node propagates data from the global transaction manager (GTM) to each data shard.
  • The computing node queries the global transaction manager for the current active global transaction list and writes the list it obtained to each data shard.
  • Updating the global transaction snapshot table is itself a global transaction of the distributed database.
  • For the global transaction that updates the global transaction snapshot table, each shard generates a corresponding sub-transaction log. This log records the row values before and after the modification.
  • The distributed database 101 updates the global transaction snapshot table at a preset period, so a global snapshot point appears in the log stream of each shard once per period. A global snapshot point is the log record that this periodic update produces in the log stream of each data shard.
  • When the table is updated, each data shard generates a sub-transaction log recording the table's rows before and after the change. By parsing this log record, the data synchronization tool obtains the current global transaction snapshot table.
  • Because the table is a replicated table, the same log record appears in the log stream of every data shard. The information in it can therefore be used to coordinate the data synchronization processes.
  • The process of generating a global snapshot point every preset period is shown in FIG. 2.
  • Step 201: The computing node obtains the current active global transaction list from the global transaction manager according to the configured preset period.
  • For example, the computing node may use a configured timer of 5 seconds, i.e., a preset period of 5 seconds.
  • Step 202: The computing node writes the active global transaction list to the global transaction snapshot table on each shard.
  • Step 203: When the transaction commits, each data shard writes the update log of the global transaction snapshot table to its log replication stream.
  • The update log records the row values of the global transaction snapshot table before and after the modification.
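One round of steps 201 to 203 can be sketched as follows. `Shard`, `SnapshotRow`, `publish_snapshot`, and the `("SNAPSHOT", gtid, payload)` log-record layout are all hypothetical. The all-or-nothing commit across shards that a real replicated-table update provides is not modelled. A deployment would call `publish_snapshot` on a timer, e.g. every 5 seconds.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SnapshotRow:
    """The single row of the replicated global transaction snapshot table."""
    gtid: int                   # GTID of the global transaction that wrote this row
    gtid_list: Tuple[int, ...]  # active global transaction list at snapshot time
    max_gtid: int               # current maximum committed GTID


@dataclass
class Shard:
    name: str
    snapshot_row: Optional[SnapshotRow] = None  # this shard's copy of the table
    log_stream: List[tuple] = field(default_factory=list)


def publish_snapshot(shards: Sequence[Shard], active: Sequence[int],
                     max_committed: int, update_gtid: int) -> SnapshotRow:
    """Write the same row to every shard and log the change (no atomicity modelled)."""
    new_row = SnapshotRow(update_gtid, tuple(sorted(active)), max_committed)
    for shard in shards:
        before = shard.snapshot_row
        shard.snapshot_row = new_row  # step 202: update this shard's copy
        # Step 203: each shard logs the before/after row values (a global snapshot point).
        shard.log_stream.append(("SNAPSHOT", update_gtid, (before, new_row)))
    return new_row


shards = [Shard("shard 1"), Shard("shard 2")]
row = publish_snapshot(shards, active=[8, 7], max_committed=6, update_gtid=30)
print(shards[0].log_stream[-1][:2])  # ('SNAPSHOT', 30)
```

Every shard's log stream ends with an identical snapshot record. This is what lets independent per-shard processes agree on a common point of reference.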
  • The data synchronization tool 102 includes multiple log collection and playback processes and a distributed lock resource.
  • The log collection and playback processes correspond one-to-one to the data shards in the distributed database 101. Each process receives the transaction log of its shard, processes it, and sends the resulting SQL statements to the downstream database 103 for playback.
  • The distributed lock resource can be provided by open-source software such as ZooKeeper. The log collection and playback processes compete for this lock to become the master control node. The process that obtains master control authority coordinates the other processes during log playback.
  • The data synchronization tool 102 shown in FIG. 1 may be an independent device, or may be integrated into the distributed database 101 or the downstream database 103.
  • The data synchronization method of this embodiment includes:
    acquiring a sub-transaction log to be played back according to the transaction log of a shard;
    detecting, according to the global transaction snapshot table, whether the global transaction to which the sub-transaction log belongs is a committed global transaction, where the global transaction snapshot table is configured to record the commit state of global transactions; and
    after determining that this global transaction is committed, playing back the sub-transaction log to the downstream database.
  • Sub-transactions are played back to the downstream database only when their global transaction has been committed. This avoids uncommitted reads.
  • Playback works at shard granularity: each shard replays its sub-transaction logs according to the global transaction snapshot table. This makes it easy to run the shards concurrently and improves data synchronization efficiency.
  • Step 301: Obtain the sub-transaction logs to be played back according to the transaction log of the shard.
  • One process is created per shard in the distributed database. Each process is configured to obtain the sub-transaction logs to be played back from the transaction log of its shard.
  • Because each process parses only its own shard's transaction log, multiple processes can parse the transaction logs of the database's shards at the same time. This enables concurrency between shards.
  • Specifically, the data synchronization tool creates one log collection and playback process per shard of the distributed database. Each process obtains the transaction log of its shard from the corresponding shard node of the distributed database 101 and parses it.
  • A sub-transaction log that has been parsed from the shard's transaction log but not yet played back is called a sub-transaction log to be played back. These logs are buffered in parse order. Each is marked with the GTID of the global transaction to which it belongs.
  • The list of the GTIDs of all the sub-transaction logs to be played back is called the to-be-played-back list. Table 1 gives three examples:
    to-be-played-back list A contains GTIDs 1, 2, 3, 4, 5;
    list B contains GTIDs 4, 5, 6, 7, 8, 9; and
    list C contains GTIDs 7, 8, 9, 10, 11, 12.
  • Table 1:
      List   GTIDs of the sub-transaction logs to be played back   Sub-transaction logs to be played back
      C      7, 8, 9, 10, 11, 12                                   log 7, log 8, log 9, log 10, log 11, log 12
      B      4, 5, 6, 7, 8, 9                                      log 4, log 5, log 6, log 7, log 8, log 9
      A      1, 2, 3, 4, 5                                         log 1, log 2, log 3, log 4, log 5
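The buffering in step 301 can be sketched as a per-shard collector. The `LogCollector` class and the `(kind, gtid, payload)` record layout are hypothetical, chosen only for illustration. Ordinary sub-transaction logs are buffered in parse order, while an update of the global transaction snapshot table is surfaced as a global snapshot point.

```python
from typing import List, Optional, Tuple

# Hypothetical parsed-record layout: (kind, gtid, payload). kind == "SNAPSHOT"
# marks an update of the global transaction snapshot table (a global snapshot point).
Record = Tuple[str, int, object]


class LogCollector:
    """Per-shard buffer of sub-transaction logs to be played back."""

    def __init__(self, shard: str) -> None:
        self.shard = shard
        self.pending: List[Record] = []  # kept in parse order

    def feed(self, record: Record) -> Optional[object]:
        """Consume one parsed record; return the snapshot payload at a snapshot point."""
        kind, _gtid, payload = record
        if kind == "SNAPSHOT":
            return payload
        self.pending.append(record)
        return None

    def to_be_played_back(self) -> List[int]:
        """GTIDs of the buffered sub-transaction logs (the to-be-played-back list)."""
        return [gtid for _, gtid, _ in self.pending]


collector = LogCollector("A")
for gtid in range(1, 6):
    collector.feed(("DML", gtid, f"log {gtid}"))
print(collector.to_be_played_back())  # [1, 2, 3, 4, 5]
```

The output reproduces list A from Table 1.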
  • Step 302: Detect, according to the global transaction snapshot table updated at a preset period, whether the global transaction to which a sub-transaction log to be played back belongs is a committed global transaction.
  • Each process performs this detection according to the global transaction snapshot table.
  • The global transaction snapshot table includes an active global transaction list, which records the GTIDs of the global transactions in the active state. If the GTID of the global transaction to which a sub-transaction log to be played back belongs is in the active global transaction list, that global transaction is not a committed global transaction.
  • In this way, the GTIDs of uncommitted global transactions are identified by comparison with the active global transaction list.
  • The global transaction snapshot table further includes the GTID of the global transaction that updated the table.
  • Updating the global transaction snapshot table is itself a distributed global transaction, and the GTID stored in the table is that transaction's number.
  • The distributed database has a hidden system column that records, for each row, the GTID of the transaction that modified it. The GTID of the global transaction that updated this table is therefore stored in that system column.
  • When the log collection and playback process parses the transaction log of its shard, it also parses the sub-transaction log of the update to the global transaction snapshot table. The GTID value in that sub-transaction log is the number of the global transaction that updated the table.
  • This GTID serves as the identifier of the global transaction snapshot table.
    It uniquely identifies one version of the table, so different identifiers indicate different global transaction snapshot tables.
    It also identifies the corresponding global snapshot point.
  • Because the global transaction snapshot table is a replicated table, every log collection and playback process parses the same global transaction snapshot table.
  • Each process compares its current to-be-played-back list with the active global transaction list in that table.
  • If a GTID appears both in the current to-be-played-back list and in the global transaction snapshot (the active global transaction list), the global transaction is not committed. If a GTID appears in the current to-be-played-back list but not in the active global transaction list, the global transaction is committed.
  • The sub-transaction logs of such a committed transaction can be played back to the downstream database.
  • In FIG. 4, each circle represents the active global transaction list snapshot at a global snapshot point, and each log represents a parsed to-be-played-back list.
  • GTIDs 1, 2, and 3 appear in to-be-played-back list A but not in snapshot2, so transactions 1, 2, and 3 are globally committed. GTIDs 4 and 5 appear in list A and also in snapshot2, so 4 and 5 are not yet committed global transactions.
  • GTIDs 4, 5, and 6 appear in list B but not in snapshot3, so these global transactions were committed before snapshot3 was generated. GTIDs 7, 8, and 9 appear in list B and also in snapshot3, so 7, 8, and 9 are not committed global transactions.
  • GTIDs 7, 8, 9, and 10 appear in list C but not in snapshot4, so these global transactions were committed before snapshot4. GTIDs 11 and 12 appear in list C and also in snapshot4, so 11 and 12 are not committed global transactions.
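The active-list comparison from FIG. 4 reduces to a set-membership partition. `split_by_active_list` is a hypothetical helper. The snapshot contents used below are assumptions limited to the GTIDs that the text names as active (7, 8, 9 in snapshot3; 11, 12 in snapshot4), since the full snapshots are not given.

```python
from typing import Iterable, List, Tuple


def split_by_active_list(to_be_played_back: Iterable[int],
                         active_list: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Partition GTIDs into (committed, not committed) using only the active list."""
    active = set(active_list)
    committed: List[int] = []
    uncommitted: List[int] = []
    for gtid in to_be_played_back:
        # A GTID still in the active list belongs to an uncommitted global transaction.
        (uncommitted if gtid in active else committed).append(gtid)
    return committed, uncommitted


# List B against snapshot3 (assumed to contain only GTIDs 7, 8, 9).
print(split_by_active_list([4, 5, 6, 7, 8, 9], [7, 8, 9]))  # ([4, 5, 6], [7, 8, 9])
```

This check alone would treat any GTID created after the snapshot as committed, because such a GTID cannot appear in the active list. The maximum committed GTID is needed to rule those out.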
  • The global transaction snapshot table further includes the current maximum committed GTID. For a sub-transaction log to be played back:
    If the GTID of its global transaction is not in the active global transaction list, it is further checked whether this GTID is greater than the current maximum committed GTID.
    If it is greater, the global transaction is not a committed global transaction.
    Otherwise, it is a committed global transaction.
  • A global transaction that the active list does not mark as uncommitted is thus also compared against the current maximum committed GTID before being treated as committed. Not every global transaction outside the active list is considered committed.
  • This covers global transactions that started after the current global transaction snapshot table was generated and are therefore not recorded in it. The committed-transaction judgment becomes more precise as a result.
  • A global transaction created after the snapshot table was generated has a GTID larger than any number in that table, and in particular larger than the current maximum committed GTID.
  • Example: the current maximum committed GTID in the snapshot table is 20, and a global transaction with GTID 21 is created after the table is updated.
    That transaction is active, i.e., uncommitted.
    Because it was created after the snapshot, it does not appear in the table's active global transaction list.
  • Since 21 is greater than 20, the global transaction with GTID 21 is correctly identified as uncommitted.
  • Table 2 below shows the structure of the global transaction snapshot table, including the current maximum committed GTID.
  • Table 2:
      Field      Type   Description
      GTID       INT8   System column: the GTID of the global transaction that updated this table
      GTIDLIST   BLOB   List of active global transactions
      MAXGTID    INT8   The current maximum committed GTID
  • The table holds very little data and can consist of a single record. Keeping it in replicated table mode therefore does not consume significant hardware or software resources.
  • With the table structure shown in Table 2, the rule for deciding whether a global transaction is globally committed is as follows.
  • Denote the GTID to be judged as GTID_1:
  • If GTID_1 is found in GTIDLIST, the global transaction identified by GTID_1 is active and not globally committed.
    Otherwise, if GTID_1 is greater than MAXGTID, the current maximum committed GTID, the global transaction identified by GTID_1 is not globally committed.
    Otherwise, GTID_1 has been globally committed: it is not in GTIDLIST and is less than or equal to MAXGTID.
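The three-way rule maps directly onto a small predicate. The sketch mirrors the GTIDLIST and MAXGTID fields of Table 2 with a hypothetical `GlobalTxnSnapshot` record. It omits the GTID identifier field, which plays no part in the decision. The sample GTIDLIST `{18, 19}` is illustrative only; MAXGTID = 20 and GTID 21 follow the example in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GlobalTxnSnapshot:
    """The decision-relevant fields of one global transaction snapshot table version."""
    gtid_list: FrozenSet[int]  # GTIDLIST: active global transactions
    max_gtid: int              # MAXGTID: current maximum committed GTID


def is_globally_committed(gtid_1: int, snap: GlobalTxnSnapshot) -> bool:
    if gtid_1 in snap.gtid_list:  # still active when the snapshot was taken
        return False
    if gtid_1 > snap.max_gtid:    # created after the snapshot, so absent from GTIDLIST
        return False
    return True                   # not active and <= MAXGTID: globally committed


snap = GlobalTxnSnapshot(gtid_list=frozenset({18, 19}), max_gtid=20)
print([is_globally_committed(g, snap) for g in (17, 19, 20, 21)])
# [True, False, True, False]
```

The order of the two checks matters little for correctness, since a GTID in GTIDLIST is rejected either way. Testing membership first simply mirrors the rule as stated.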
  • The global transaction snapshot table is updated at a preset period, every 5 seconds in the example above. Each time, the computing node queries the global transaction manager for the current GTIDLIST and MAXGTID and writes them to the snapshot table. The table's GTID column, i.e., the number of the global transaction performing the update, changes at the same time.
  • The new row values of the updated global transaction snapshot table are recorded in each shard's transaction log, so an updated global transaction snapshot table is parsed from the shard's transaction log at regular intervals.
  • Take as an example the global transaction snapshot table whose GTID field is 30. Suppose each process has used this table to detect whether the global transactions of its sub-transaction logs to be played back are committed.
  • Step 303: After the global transaction to which a sub-transaction log to be played back belongs is determined to be committed, play back the sub-transaction log to the downstream database.
  • Each shard synchronously replays the sub-transactions of the committed global transactions detected according to the same global transaction snapshot table. This avoids inconsistent playback data among shards and keeps their synchronization progress to the downstream database the same.
  • The global transaction snapshot table identifier indicates one global transaction snapshot table. It can be the update time of the table or the value of the GTID field in the table, and is not limited here. This embodiment uses the value of the GTID field as the identifier.
  • After a log collection and playback process completes detection based on a global transaction snapshot table, it sends a detection completion notification containing that table's identifier.
  • The tool creates a module that receives these notifications from all log collection and playback processes. Once the notifications confirm that the processes for all shards have completed detection against the same global transaction snapshot table, the module issues a playback command to each process.
  • The playback command includes the identifier of that same global transaction snapshot table.
  • Each log collection and playback process then plays back to the downstream database the sub-transaction logs that the table indicated by that identifier marked as committed.
  • Example: once detection has been completed according to the global transaction snapshot table whose GTID is 30, the sub-transaction logs of the committed global transactions determined by that table are played back.
    Suppose one of those committed global transactions is a cross-shard transaction that changed data on two shards at the same time.
    Data shard 1 generated sub-transaction A, and data shard 2 generated sub-transaction B.
  • Process 1 synchronizes data shard 1, and parses and plays back the log of sub-transaction A.
  • Process 2 synchronizes data shard 2, and parses and plays back the log of sub-transaction B.
  • In response to the playback command, process 1 plays back the sub-transaction A log and process 2 synchronously plays back the sub-transaction B log to the downstream database.
  • The data generated by each shard is thus played back concurrently, yet sub-transaction logs of the same committed global transaction never reach the downstream database at different times. For example, the sub-transaction A log is never played back while the sub-transaction B log is still pending.
  • This solves the inconsistent-read problem and keeps the shards' synchronization progress close. It avoids dirty reads in the downstream database caused by differences in sub-transaction synchronization progress.
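The coordination in step 303 amounts to a barrier keyed by the snapshot identifier, and can be sketched as below. `PlaybackCoordinator`, `ShardProcess`, and their methods are hypothetical names. The downstream database is modelled as a shared Python list, and playback runs sequentially here, whereas real processes would replay concurrently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


class PlaybackCoordinator:
    """Releases playback for a snapshot only after every shard's process
    has finished detection against that same snapshot."""

    def __init__(self, process_ids: Iterable[str]) -> None:
        self.process_ids: Set[str] = set(process_ids)
        self.reported: Dict[int, Set[str]] = defaultdict(set)  # snapshot id -> reporters

    def on_detection_complete(self, process_id: str, snapshot_id: int) -> Optional[int]:
        self.reported[snapshot_id].add(process_id)
        if self.reported[snapshot_id] == self.process_ids:
            del self.reported[snapshot_id]
            return snapshot_id  # playback command carries this identifier
        return None


class ShardProcess:
    """Log collection and playback process for one shard (simplified)."""

    def __init__(self, name: str, downstream: List[str]) -> None:
        self.name = name
        self.downstream = downstream
        self.detected: Dict[int, List[str]] = {}  # snapshot id -> committed logs

    def detect(self, snapshot_id: int, committed_logs: List[str]) -> int:
        self.detected[snapshot_id] = committed_logs
        return snapshot_id  # detection completion notification

    def play_back(self, snapshot_id: int) -> None:
        self.downstream.extend(self.detected.pop(snapshot_id))


downstream: List[str] = []
p1, p2 = ShardProcess("process 1", downstream), ShardProcess("process 2", downstream)
coordinator = PlaybackCoordinator([p1.name, p2.name])

# Process 1 reports first: no playback command yet, so nothing reaches downstream.
cmd = coordinator.on_detection_complete(p1.name, p1.detect(30, ["sub-transaction A"]))
print(cmd, downstream)  # None []
cmd = coordinator.on_detection_complete(p2.name, p2.detect(30, ["sub-transaction B"]))
for p in (p1, p2):
    p.play_back(cmd)
print(downstream)  # ['sub-transaction A', 'sub-transaction B']
```

Keying the barrier by snapshot identifier lets a fast process detect against later snapshots without disturbing the bookkeeping for earlier ones.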
  • a method which sorts the global transactions in order, and replays multiple sub-transactions belonging to the same global transaction to the downstream together. Sorting, and monitoring the sub-transactions corresponding to the global transaction of each shard, to determine whether the global transaction has been committed, the performance overhead is high, and the synchronization efficiency is low.
  • In this application, the sub-transaction logs to be replayed are obtained from each shard's transaction log, and each shard uses the global transaction snapshot table, updated at a preset period, to detect whether the global transaction to which a sub-transaction log belongs is a committed global transaction; only then is the sub-transaction log replayed to the downstream database.
  • Replay therefore proceeds in parallel at the granularity of each shard's transaction log, and the progress difference between shards is kept within seconds. Global transactions need not be sorted, so the performance overhead is small: the shards synchronize data in parallel, synchronization efficiency improves, and only committed global transactions are synchronized to the downstream database.
  • The sub-transaction logs that belong to committed global transactions, as detected against the same global transaction snapshot table, are taken as the target sub-transaction logs and replayed, so that data read from the upstream and downstream databases stays consistent in real time. In other words, every data shard synchronizes data to the downstream database in parallel, with performance close to fully concurrent synchronization, while data synchronization across shards remains consistent.
  • The second embodiment of the present application relates to a data synchronization method.
  • The second embodiment is substantially the same as the first. The main difference is that in the first embodiment the data synchronization tool creates a command-sending module that sends the playback command to the log collection and playback processes.
  • In the second embodiment, in addition to performing steps 301 to 303, each log collection and playback process also contends for master control authority on the distributed lock resource. The process that obtains master control authority collects the detection completion notices from the processes and determines the global transaction snapshot table identifier carried in the playback command from those notices.
  • A flowchart of the data synchronization method in the second embodiment of the present application is shown in FIG. 5.
  • Step 501: at startup, contend for master control authority on the distributed lock resource. Specifically, when a log collection and playback process starts, it tries to acquire master control authority from the distributed lock; if it succeeds, it becomes the master process. Master control authority can be acquired only when no other process currently holds it.
  • The process holding master control authority receives the detection completion notices sent by the processes corresponding to the other shards.
  • A process that fails to obtain master control authority sends a detection completion notice to the process that succeeded; the notice carries the identifier of the global transaction snapshot table for which detection has been completed.
  • The process holding master control authority then notifies the processes corresponding to the other shards in the distributed database to take, as target sub-transaction logs, the sub-transaction logs that belong to committed global transactions according to the same global transaction snapshot table, and to replay them to the downstream database.
  • In other words, when master control authority is obtained, the process receives the detection completion notices sent by the processes corresponding to the other shards in the distributed database, and these notices carry the identifier of the same global transaction snapshot table.
  • The following example assumes that log collection and playback process 1 obtains master control authority and log collection and playback process m does not.
  • Step 502: if master control authority is not obtained, query the current master process. Specifically, log collection and playback process m, which did not obtain master control authority, looks up the process that holds it so that it can exchange information with that process, namely send detection completion notices to it and receive playback commands from it.
  • The playback command notifies the processes corresponding to the other shards in the distributed database to take the sub-transaction logs that belong to committed global transactions, as detected against the same global transaction snapshot table, as the target sub-transaction logs, and to replay them to the downstream database.
  • Step 503: parse the shard's transaction log. This step obtains the committed global transactions currently to be replayed, following steps 301 to 302 of the first embodiment of the present application.
  • Step 504: send a detection completion notice.
  • When log collection and playback process m has finished detecting committed global transactions against the global transaction snapshot table whose GTID is 30, it sends a detection completion notice to log collection and playback process 1.
  • The notice contains GTID 30.
  • Step 505: send a playback command.
  • The master process determines the global transaction snapshot table identifier for the playback command from the identifiers carried in the detection completion notices sent by the processes.
  • Once the master process, log collection and playback process 1, has received notices showing that every process has finished detecting committed global transactions against the global transaction snapshot table whose GTID is 30, it sends a playback command containing GTID 30.
  • Step 506: replay the target sub-transaction logs to the downstream database.
  • Master control authority can also be released. Specifically, when the process holding master control authority runs into a problem, the authority is released and the other processes can contend for it again on the distributed lock resource. This prevents a process failure from disrupting data synchronization between the upstream and downstream databases and improves robustness.
  • The master process coordinates the log collection and playback processes and receives playback progress notices from the other processes. When all processes have finished parsing up to a given log snapshot point, it commands all of them to replay to the downstream database.
  • Because the master process makes the log collection and playback processes replay each global transaction at the same time, delays in individual processes cannot cause the playback progress of the processes to drift further and further apart, and the data remains in a globally consistent state.
  • The master process itself also performs log collection and playback. If the master node fails, another process can take over as master and continue the work, which further improves robustness.
  • The division of the above methods into steps is only for clarity of description. In an implementation, steps may be combined into one or split into several; as long as the same logical relationship is preserved, all such variants fall within the protection scope of this application. Adding insignificant modifications or insignificant designs to the algorithm or process without changing its core design likewise falls within the protection scope of this application.
  • The third embodiment of the present application relates to a data synchronization apparatus.
  • A sub-transaction log acquisition module 601, configured to acquire the sub-transaction logs to be replayed from a shard's transaction log.
  • A committed global transaction detection module 602, configured to detect, according to the global transaction snapshot table updated at a preset period, whether the global transaction to which a sub-transaction log to be replayed belongs is a committed global transaction, where the global transaction snapshot table is configured to record the commit status of global transactions.
  • A playback module 603, configured to replay a sub-transaction log to the downstream database after determining that the global transaction to which it belongs is a committed global transaction.
  • The sub-transaction log acquisition module 601 is further configured to create processes in one-to-one correspondence with the shards. Each process acquires the sub-transaction logs to be replayed from its shard's transaction log, detects according to the global transaction snapshot table whether the global transaction to which each log belongs is a committed global transaction, and, once it is, replays the sub-transaction log to the downstream database.
  • The playback module 603 is further configured to confirm that the processes corresponding to all shards in the distributed database have completed detection against the same global transaction snapshot table. Replaying the sub-transaction logs to the downstream database then comprises: taking the sub-transaction logs that belong to committed global transactions, as detected against the same global transaction snapshot table, as target sub-transaction logs; and replaying the target sub-transaction logs to the downstream database.
  • The playback module 603 is further configured to notify the processes corresponding to the other shards in the distributed database to take the sub-transaction logs that belong to committed global transactions, as detected against the same global transaction snapshot table, as target sub-transaction logs and to replay them to the downstream database.
  • The playback module 603 is further configured to receive the detection completion notices sent by the processes corresponding to the other shards in the distributed database, each notice carrying the identifier of the same global transaction snapshot table.
  • The playback module 603 is further configured to contend for master control authority. If it succeeds, it receives the detection completion notices sent by the processes corresponding to the other shards; if it fails, it sends a detection completion notice to the process holding master control authority, carrying the identifier of the global transaction snapshot table for which detection has been completed.
  • The committed global transaction detection module 602 is further configured to determine that, if the global transaction number (GTID) of the global transaction to which a sub-transaction log to be replayed belongs is in the active global transaction list, that global transaction is not a committed global transaction. Here the global transaction snapshot table includes the active global transaction list, which records the numbers of the active global transactions.
  • The committed global transaction detection module 602 is further configured, if that GTID is not in the active global transaction list, to check whether it is greater than the current maximum committed global transaction number. If it is greater, the global transaction is not a committed global transaction; otherwise it is a committed global transaction. Here the global transaction snapshot table also includes the current maximum committed global transaction number.
  • This embodiment is the apparatus embodiment corresponding to the first embodiment and can be implemented in cooperation with it.
  • The technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here.
  • Conversely, the technical details mentioned in this embodiment also apply to the first embodiment.
  • Each module in this embodiment is a logical module.
  • In practice, a logical unit may be one physical unit, part of a physical unit, or a combination of multiple physical units.
  • To highlight the innovative part of the present application, this embodiment does not introduce units that are not closely related to solving the technical problem the application addresses, but this does not mean that no other units exist in this embodiment.
  • The fourth embodiment of the present application relates to an electronic device which, as shown in FIG. 7, comprises at least one processor 701 and a memory 702 communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable it to perform the data synchronization method described above.
  • The memory and the processor are connected by a bus, which may include any number of interconnected buses and bridges and links together the circuits of the one or more processors and the memory.
  • The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits; these are well known in the art and are not described further here.
  • A bus interface provides an interface between the bus and a transceiver.
  • The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a means of communicating with various other devices over a transmission medium.
  • Data processed by the processor is transmitted over a wireless medium via an antenna; the antenna also receives data and passes it to the processor.
  • The processor is responsible for managing the bus and for general processing, and may also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor when performing operations.
  • The fifth embodiment of the present application relates to a computer-readable storage medium storing a computer program.
  • When executed by a processor, the computer program implements the above method embodiments.
  • The storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
  • The sub-transaction logs to be replayed are obtained from each shard's transaction log; the global transaction snapshot table, which is updated at a preset period and records the commit status of global transactions, is used to detect whether the global transaction to which each sub-transaction log belongs is a committed global transaction; and a sub-transaction log is replayed to the downstream database only after its global transaction has been determined to be committed. This prevents uncommitted data from being synchronized to the downstream database.
  • This embodiment works at shard granularity: each shard in the distributed database uses the global transaction snapshot table to determine which sub-transactions can be replayed.
  • Each shard can therefore replay its own transaction log independently, which makes concurrency between shards straightforward and improves data synchronization efficiency.
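The commit check performed by the committed global transaction detection module 602 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the class `GlobalTxnSnapshot`, its field names, and the function `is_committed` are hypothetical, and GTIDs are assumed to be integers that can be compared with the current maximum committed global transaction number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalTxnSnapshot:
    """One version of the global transaction snapshot table (hypothetical field names)."""
    snapshot_id: int          # identifier of this snapshot-table version, e.g. 30
    max_committed_gtid: int   # current maximum committed global transaction number
    active_gtids: frozenset   # GTIDs of global transactions started but not yet committed


def is_committed(gtid: int, snap: GlobalTxnSnapshot) -> bool:
    """Decide whether global transaction `gtid` counts as committed under `snap`."""
    if gtid in snap.active_gtids:
        # Still active when the snapshot was taken: not committed.
        return False
    if gtid > snap.max_committed_gtid:
        # Newer than anything this snapshot knows about: not committed yet.
        return False
    return True


snap = GlobalTxnSnapshot(snapshot_id=30, max_committed_gtid=105,
                         active_gtids=frozenset({101, 104}))
print([g for g in (99, 101, 103, 104, 106) if is_committed(g, snap)])  # prints [99, 103]
```

The two checks mirror the two rules in the text: membership in the active list rules a transaction out first, and only then is the GTID compared with the maximum committed number.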

Abstract

A data synchronization method and apparatus, an electronic device, and a storage medium. The data synchronization method provided in the present application comprises: acquiring, from the transaction log of a shard, a sub-transaction log to be replayed (301); detecting, according to a global transaction snapshot table updated at a preset period, whether the global transaction to which the sub-transaction log belongs is a committed global transaction (302), wherein the global transaction snapshot table is configured to record the commit status of global transactions; and after determining that the global transaction to which the sub-transaction log belongs is a committed global transaction, replaying the sub-transaction log to a downstream database (303).
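One shard's log collection and playback process can be pictured as a loop over that shard's log stream. The loop buffers sub-transaction logs (step 301), re-checks them at each global snapshot point (step 302), and hands the committed ones on for replay (step 303). The sketch below is hypothetical. The record types `SubTxnLog` and `SnapshotPoint`, the function `replayable_batches`, and the choice to check only logs parsed before a snapshot point are all assumptions made for illustration. The actual replay to the downstream database is left to the caller.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple, Union


@dataclass(frozen=True)
class SubTxnLog:
    gtid: int   # global transaction this sub-transaction belongs to
    sql: str    # change to replay downstream (hypothetical representation)


@dataclass(frozen=True)
class SnapshotPoint:
    snapshot_id: int           # identifier of the snapshot-table version
    max_committed_gtid: int    # current maximum committed global transaction number
    active_gtids: frozenset    # active (uncommitted) global transactions


def committed(gtid: int, snap: SnapshotPoint) -> bool:
    return gtid not in snap.active_gtids and gtid <= snap.max_committed_gtid


def replayable_batches(
    records: Iterable[Union[SubTxnLog, SnapshotPoint]],
) -> Iterator[Tuple[int, List[SubTxnLog]]]:
    """Yield (snapshot_id, logs to replay) at every snapshot point in one shard's stream."""
    pending: List[SubTxnLog] = []          # step 301: sub-transaction logs awaiting replay
    for rec in records:
        if isinstance(rec, SnapshotPoint):
            # step 302: split buffered logs into committed and not-yet-committed
            ready = [log for log in pending if committed(log.gtid, rec)]
            pending = [log for log in pending if not committed(log.gtid, rec)]
            yield rec.snapshot_id, ready   # step 303: caller replays `ready` downstream
        else:
            pending.append(rec)


stream = [
    SubTxnLog(7, "INSERT ..."),
    SubTxnLog(9, "UPDATE ..."),
    SnapshotPoint(30, 8, frozenset()),
    SnapshotPoint(31, 9, frozenset()),
]
for sid, logs in replayable_batches(stream):
    print(sid, [log.gtid for log in logs])  # prints "30 [7]" then "31 [9]"
```

Logs whose global transaction is not yet committed stay buffered and are re-checked at the next snapshot point. They are never dropped, and they are never replayed early.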

Description

Data synchronization method and apparatus, electronic device, and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on, and claims priority to, Chinese patent application No. 202011549237.7 filed on December 24, 2020, the entire content of which is incorporated herein by reference.
TECHNICAL FIELD
The embodiments of the present application relate to the field of databases, and in particular to a data synchronization method and apparatus, an electronic device, and a storage medium.
BACKGROUND
As an online transaction processing (OLTP) system, a distributed database needs to periodically offload data to an analysis system, such as a data warehouse, for subsequent analysis by other systems. A distributed database contains multiple data shards, and synchronizing data concurrently across shards yields high performance. When the distributed database offloads data to the analysis system, it replays to the downstream database the sub-transaction logs of the global transactions generated on each shard.
However, a distributed database has cross-shard global transactions, each comprising multiple sub-transactions executed on multiple data shards. The sub-transaction of a cross-shard global transaction on one shard may already have produced a log while other sub-transactions of the same global transaction have not. In that case the global transaction may not yet be committed, or may be about to roll back. In some situations the sub-transaction logs already produced on a shard are nevertheless replayed to the downstream database, so uncommitted data is synchronized downstream. In other words, a read-uncommitted problem arises.
SUMMARY OF THE INVENTION
The embodiments of the present application provide a data synchronization method and apparatus, an electronic device, and a storage medium.
In view of this, an embodiment of the present application provides a data synchronization method, including: acquiring, from a shard's transaction log, a sub-transaction log to be replayed; detecting, according to a global transaction snapshot table updated at a preset period, whether the global transaction to which the sub-transaction log belongs is a committed global transaction, where the global transaction snapshot table is configured to record the commit status of global transactions; and after determining that the global transaction to which the sub-transaction log belongs is a committed global transaction, replaying the sub-transaction log to a downstream database.
In view of this, an embodiment of the present application further provides a data synchronization apparatus, including: a sub-transaction log acquisition module, configured to acquire, from a shard's transaction log, a sub-transaction log to be replayed; a committed global transaction determination module, configured to detect, according to a global transaction snapshot table updated at a preset period, whether the global transaction to which the sub-transaction log belongs is a committed global transaction, where the global transaction snapshot table is configured to record the commit status of global transactions; and a playback module, configured to replay the sub-transaction log to a downstream database after determining that the global transaction to which it belongs is a committed global transaction.
In view of this, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable it to perform the data synchronization method described above.
In view of this, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data synchronization method described above.
BRIEF DESCRIPTION OF THE DRAWINGS
One or more embodiments are illustrated by way of example in the figures of the accompanying drawings; these illustrations do not limit the embodiments.
FIG. 1 is a schematic diagram of a data synchronization system according to the first embodiment of the present application;
FIG. 2 is a flowchart of updating the global transaction snapshot table according to the first embodiment of the present application;
FIG. 3 is a flowchart of a data synchronization method according to the first embodiment of the present application;
FIG. 4 is a schematic diagram of transaction log parsing according to the first embodiment of the present application;
FIG. 5 is a flowchart of a data synchronization method according to the second embodiment of the present application;
FIG. 6 is a flowchart of a data synchronization apparatus according to the third embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to the fourth embodiment of the present application.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand, however, that many technical details are given in each embodiment only to help the reader understand the present application. The technical solutions claimed in the present application can be implemented even without these details, and with various changes and modifications based on the following embodiments. The division into the following embodiments is for convenience of description and does not limit the specific implementation of the present application. The embodiments may be combined with and refer to one another where they do not conflict.
The first embodiment of the present application relates to a data synchronization method. As shown in FIG. 1, a data synchronization system that can be used to implement the data synchronization method of the first embodiment includes: an upstream database, namely the distributed database 101; a data synchronization tool 102; and a downstream database 103.
The distributed database 101 includes a global transaction manager, computing nodes, and data shards. A data shard may be called simply a shard. The global transaction manager (GTM) is configured to create and release global transaction numbers (GTIDs) and to maintain the active global transaction list, also called the global transaction snapshot. An active global transaction is a global transaction in the distributed database that has started but has not committed. The GTM records the current active global transaction list, which contains the GTIDs of all global transactions currently active in the distributed database. When a global transaction starts, the GTM adds its GTID to the list; when the global transaction commits, the GTM removes its GTID. The computing nodes are configured to parse and execute global transactions in the distributed database. The data shards are responsible for local storage and local execution of data, and they generate local transaction logs that record the rows of the shard's tables before and after modification.
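The GTM bookkeeping described above (add a GTID when a global transaction starts, remove it when the transaction commits) can be modeled with a few lines of Python. This toy class is an illustrative assumption only. The name `GlobalTransactionManager`, the use of monotonically increasing integer GTIDs, and the omission of rollback, release, and persistence do not come from the application.

```python
import itertools
from typing import Set


class GlobalTransactionManager:
    """Toy GTM: allocates GTIDs and maintains the active global transaction list."""

    def __init__(self, first_gtid: int = 1) -> None:
        self._gtids = itertools.count(first_gtid)  # assumed monotonically increasing GTIDs
        self._active: Set[int] = set()

    def begin(self) -> int:
        gtid = next(self._gtids)
        self._active.add(gtid)        # recorded when the global transaction starts
        return gtid

    def commit(self, gtid: int) -> None:
        self._active.discard(gtid)    # removed when the global transaction commits

    def snapshot(self) -> frozenset:
        """The current active global transaction list (the global transaction snapshot)."""
        return frozenset(self._active)


gtm = GlobalTransactionManager()
t1, t2, t3 = gtm.begin(), gtm.begin(), gtm.begin()
gtm.commit(t2)
print(sorted(gtm.snapshot()))  # prints [1, 3]
```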
The distributed database 101 supports a replicated-table mode. Normally, a distributed database distributes the data of a large table across multiple data shards according to certain rules, with each shard storing a different part of the table. A table with a small amount of data can instead use the replicated-table mode, in which every data shard holds the complete data of the table and the data on all shards is identical. Updating a replicated table is also a global transaction: when the data in a replicated table is updated, either the update succeeds on every shard or all shards are rolled back to the state before the update. The global transaction snapshot table in this embodiment uses the replicated-table mode; that is, every shard of the distributed database holds a copy of the global transaction snapshot table, which is configured to represent the commit status of global transactions. Updating the global transaction snapshot table therefore amounts to updating the global transaction snapshot table on every data shard in the distributed database 101. During the update, the computing node writes the data held by the global transaction manager (GTM) into the global transaction snapshot table of each data shard. For example, the computing node queries the GTM for the current active global transaction list and writes that list into every data shard.
Note that updating the global transaction snapshot table is itself a global transaction of the distributed database. When the global transaction snapshot table on the shards is updated, each shard generates a sub-transaction log for the global transaction performing the update, and this log records the row values before and after the modification. The distributed database 101 updates the global snapshot table at a preset period, so a global snapshot point appears in each shard's log stream once per period. In other words, a global snapshot point is the log record generated in each data shard's log stream when the global transaction snapshot table is periodically updated. Whenever the global snapshot table is updated, each data shard produces a sub-transaction log recording the rows of the global snapshot table before and after the change, and by parsing this log record the data synchronization tool obtains the current global transaction snapshot table. Because the table is a replicated table, the same log record appears in the log stream of every data shard, and the information in that record allows the data synchronization processes to synchronize with one another. The process of generating a global snapshot point every preset period is shown in FIG. 2.
Step 201: The computing node obtains the current active global transaction list from the global transaction manager at the configured preset period.
In some instances, the computing node fetches the current active global transaction list from the global transaction manager according to a configured timer; for example, a timer of 5 seconds gives a preset period of 5 s.
Step 202: The computing node writes the active global transaction list into the global transaction snapshot table of each shard.
Step 203: When this transaction commits, each data shard writes an update log for the global transaction snapshot table to the log replication stream. The update log records the row values of the global transaction snapshot table before and after the modification.
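Steps 201 to 203 can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not the patented implementation: `GlobalTransactionManager`, `Shard`, `SnapshotRow` and `take_global_snapshot` are hypothetical names, each shard's log replication stream is modeled as a Python list, and the `maxgtid` field anticipates the maximum committed GTID column described with Table 2 below.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SnapshotRow:
    gtid: int             # GTID of the global transaction that wrote this row
    gtidlist: frozenset   # active global transactions at snapshot time
    maxgtid: int          # largest committed GTID at snapshot time


@dataclass
class Shard:
    name: str
    snapshot: Optional[SnapshotRow] = None
    log: list = field(default_factory=list)  # stand-in for the log replication stream

    def write_snapshot(self, new_row: SnapshotRow) -> None:
        # Step 203: on commit, append the before/after row images to the log stream.
        self.log.append(("SNAPSHOT_UPDATE", self.snapshot, new_row))
        self.snapshot = new_row


class GlobalTransactionManager:
    """Hypothetical GTM: hands out increasing GTIDs and tracks commit state."""

    def __init__(self) -> None:
        self.next_gtid = 1
        self.active: set = set()
        self.max_committed = 0

    def begin(self) -> int:
        gtid = self.next_gtid
        self.next_gtid += 1
        self.active.add(gtid)
        return gtid

    def commit(self, gtid: int) -> None:
        self.active.discard(gtid)
        self.max_committed = max(self.max_committed, gtid)


def take_global_snapshot(gtm: GlobalTransactionManager, shards: list) -> SnapshotRow:
    # Step 201: read the current active list (and max committed GTID) from the GTM.
    active = frozenset(gtm.active)
    maxgtid = gtm.max_committed
    # Step 202: the snapshot update is itself a global transaction on every shard.
    gtid = gtm.begin()
    row = SnapshotRow(gtid, active, maxgtid)
    for shard in shards:
        shard.write_snapshot(row)
    gtm.commit(gtid)
    return row
```

In a deployment this function would be driven by the configured timer (5 s in the example above); each call corresponds to one global snapshot point, and every shard's log receives an identical record because the table is replicated.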
The data synchronization tool 102 includes multiple log collection and playback processes and a distributed lock resource. The log collection and playback processes correspond one-to-one with the data shards of the distributed database 101: each process receives the transaction log of its data shard, processes it, and sends the resulting SQL statements to the downstream database 103 for playback. The distributed lock resource in the data synchronization tool 102 can be provided by open-source software such as ZooKeeper. The log collection and playback processes compete for the distributed lock to become the master control node, and the process that obtains master control coordinates the other processes during log playback.
It should be noted that the data synchronization tool 102 shown in FIG. 1 may be an independent device or may be integrated into the distributed database 101 or the downstream database 103.
The following description takes the data synchronization method as applied to the data synchronization tool 102 as an example; the tool may run on a server or on another electronic device. The data synchronization method of this embodiment includes: obtaining, from the transaction log of a shard, the sub-transaction logs to be played back; detecting, according to a global transaction snapshot table, whether the global transaction to which a sub-transaction log belongs is a committed global transaction, where the global transaction snapshot table is configured to record the commit state of global transactions; and, after determining that the global transaction to which the sub-transaction log belongs has been committed, playing back the sub-transaction log to the downstream database. In this embodiment, sub-transactions are played back to the downstream database only after their global transaction has committed, which avoids uncommitted reads. In addition, the method works at shard granularity: each shard determines from the global transaction snapshot table which of its sub-transaction logs can be played back, which makes it easy to process shards concurrently and improves data synchronization efficiency.
The implementation details of the data synchronization method in this embodiment are described below with reference to FIG. 3, the flowchart of the data synchronization method in the first embodiment of the present application. These details are provided only to aid understanding and are not required to implement this solution.
Step 301: Obtain the sub-transaction logs to be played back from the transaction log of the shard.
In one example, one process is created for each shard in the distributed database, and each process is configured to obtain the sub-transaction logs to be played back from the transaction log of its shard. In this implementation, each process parses the transaction log of one shard, so multiple processes created for multiple shards can parse transaction logs at the same time, which makes concurrency between shards easy to achieve.
In some instances, the data synchronization tool creates one log collection and playback process for each shard in the distributed database. Each process obtains the transaction log from the corresponding shard node of the distributed database 101 and parses it. Sub-transaction logs that have been parsed from the shard's transaction log but not yet played back are called sub-transaction logs to be played back. They are buffered in parsing order, and each one is tagged with the global transaction number (GTID) of the global transaction it belongs to. The list of GTIDs of all sub-transaction logs waiting for playback is called the pending playback list. For example, as shown in Table 1, pending playback list A contains GTIDs 1, 2, 3, 4, 5; pending playback list B contains GTIDs 4, 5, 6, 7, 8, 9; and pending playback list C contains GTIDs 7, 8, 9, 10, 11, 12.
Table 1
Pending playback list (GTIDs)     Sub-transaction logs to be played back
C: 7, 8, 9, 10, 11, 12            log 7, log 8, log 9, log 10, log 11, log 12
B: 4, 5, 6, 7, 8, 9               log 4, log 5, log 6, log 7, log 8, log 9
A: 1, 2, 3, 4, 5                  log 1, log 2, log 3, log 4, log 5
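The per-shard parsing in step 301 can be sketched as follows, assuming each shard's parsed transaction log is available as a list of `(gtid, change)` pairs (a hypothetical format). One worker per shard builds that shard's pending playback list in parse order; a thread pool stands in for the separate log collection and playback processes, and `parse_shard_log` and `parse_all_shards` are hypothetical names.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


def parse_shard_log(records):
    """Group one shard's sub-transaction logs by GTID, preserving parse order."""
    pending = OrderedDict()
    for gtid, change in records:
        pending.setdefault(gtid, []).append(change)
    return pending


def parse_all_shards(shard_logs):
    """Parse every shard's log concurrently, one worker per shard."""
    with ThreadPoolExecutor(max_workers=max(1, len(shard_logs))) as pool:
        results = pool.map(parse_shard_log, shard_logs.values())
        return dict(zip(shard_logs.keys(), results))


# The three pending playback lists of Table 1.
shard_logs = {
    "A": [(g, f"log {g}") for g in range(1, 6)],
    "B": [(g, f"log {g}") for g in range(4, 10)],
    "C": [(g, f"log {g}") for g in range(7, 13)],
}
for name, pending in parse_all_shards(shard_logs).items():
    print(name, list(pending))
# A [1, 2, 3, 4, 5]
# B [4, 5, 6, 7, 8, 9]
# C [7, 8, 9, 10, 11, 12]
```

Because no shard's parsing depends on another's, the workers need no shared state until the commit check.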
Step 302: Using the global transaction snapshot table, which is updated at a preset period, detect whether the global transaction to which each pending sub-transaction log belongs has been committed.
In one example, each process is configured to detect, according to the global transaction snapshot table, whether the global transaction to which each of its pending sub-transaction logs belongs has been committed.
In one example, the global transaction snapshot table includes an active global transaction list, which records the numbers of the global transactions that are currently active. If the GTID of the global transaction to which a pending sub-transaction log belongs is in the active global transaction list, that global transaction has not been committed. In this implementation, uncommitted global transactions are identified by comparison against the active global transaction list.
In one example, the global transaction snapshot table further includes the GTID of the global transaction that updated the table. Specifically, updating the global transaction snapshot table is itself a distributed global transaction, and its number is the GTID recorded in this field. Note that the distributed database maintains a hidden system column that records, for each row, the GTID of the global transaction that modified it; the GTID of the global transaction that updated this table is therefore a system column.
In some instances, when a log collection and playback process parses its shard's transaction log, it also parses the sub-transaction logs that update the global transaction snapshot table. The GTID of such a sub-transaction log, that is, the GTID of the global transaction that updated the table, is used as the identifier of the global transaction snapshot table. This identifier uniquely identifies a global transaction snapshot table: different identifiers indicate different global transaction snapshot tables. It also identifies the corresponding global snapshot point. Because the global transaction snapshot table is a replicated table, every log collection and playback process parses the same global transaction snapshot table. The process compares its current pending playback list with the active global transaction list (the snapshot) in the global transaction snapshot table. If a GTID appears both in the pending playback list and in the active global transaction list, that global transaction has not been committed. If a GTID appears in the pending playback list but not in the active global transaction list, the global transaction has been committed, and its sub-transaction logs can be played back to the downstream database.
A concrete example follows. In FIG. 4, each circle represents the active global transaction list (snapshot) at a global snapshot point, and "log" represents a parsed pending playback list. GTIDs 1, 2 and 3 appear in pending playback list A but not in snapshot2, indicating that these three transactions have been globally committed; GTIDs 4 and 5 appear both in list A and in snapshot2, indicating that they are not committed global transactions. Similarly, GTIDs 4, 5 and 6 appear in pending playback list B but not in snapshot3, indicating that they were globally committed before snapshot3 was generated, whereas GTIDs 7, 8 and 9 appear both in list B and in snapshot3 and are therefore not committed. GTIDs 7, 8, 9 and 10 appear in pending playback list C but not in snapshot4, indicating that they were globally committed before snapshot4, whereas GTIDs 11 and 12 appear both in list C and in snapshot4 and are therefore not committed.
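The active-list comparison described above can be sketched as a simple partition of each pending playback list. This sketch uses only the active global transaction list and ignores the maximum-committed-GTID refinement discussed next; `split_by_active_list` is a hypothetical name, and because FIG. 4 does not give the full contents of each snapshot, the usage assumes the smallest snapshots consistent with the text.

```python
def split_by_active_list(pending_gtids, active_gtids):
    """Partition pending GTIDs (in parse order) into committed / not committed,
    using only the snapshot's active global transaction list."""
    active = set(active_gtids)
    committed = [g for g in pending_gtids if g not in active]
    not_committed = [g for g in pending_gtids if g in active]
    return committed, not_committed


# Assumed snapshot contents: minimal sets consistent with the FIG. 4 description.
print(split_by_active_list([1, 2, 3, 4, 5], {4, 5}))          # list A vs snapshot2
# ([1, 2, 3], [4, 5])
print(split_by_active_list([4, 5, 6, 7, 8, 9], {7, 8, 9}))    # list B vs snapshot3
# ([4, 5, 6], [7, 8, 9])
print(split_by_active_list([7, 8, 9, 10, 11, 12], {11, 12}))  # list C vs snapshot4
# ([7, 8, 9, 10], [11, 12])
```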
In another example, the global transaction snapshot table further includes the current maximum committed GTID. If the GTID of the global transaction to which a pending sub-transaction log belongs is not in the active global transaction list, it is further compared with the current maximum committed GTID. If it is greater than the current maximum committed GTID, the global transaction has not been committed; otherwise, it has been committed.
That is, after uncommitted global transactions are identified through the active global transaction list, the GTIDs of the remaining global transactions are still compared with the current maximum committed GTID before they are classified as committed, rather than treating every global transaction outside the active list as committed. This covers the case where a global transaction in the pending playback list was created after the current global transaction snapshot table and therefore was never recorded in it, which makes the commit check more accurate. Because GTIDs are assigned in the chronological order in which global transactions are created, any global transaction created after the global transaction snapshot table has a GTID larger than every GTID recorded in that table, and in particular larger than the current maximum committed GTID.
For instance, suppose the current maximum committed GTID in the global transaction snapshot table is 20, and a global transaction with GTID 21 is created after the table is updated. Transaction 21 is active, that is, uncommitted, but because it was created after the snapshot, it does not appear in the table's active global transaction list. To avoid classifying transaction 21 as committed, its GTID is compared with the maximum committed GTID: since 21 is greater than 20, the global transaction with GTID 21 is treated as uncommitted. Table 2 below shows the structure of a global snapshot table that includes the current maximum committed GTID.
Table 2
Field       Type   Description
GTID        INT8   System column: GTID of the global transaction that updated this table
GTIDLIST    BLOB   List of active global transactions
MAXGTID     INT8   Current maximum committed GTID
The table above is small; it can hold a single record, so keeping it as a replicated table does not consume significant hardware or software resources. With the table structure shown in Table 2, the rules for deciding whether a global transaction has been globally committed are as follows. To distinguish it from the GTID column of the table, the GTID of the global transaction being checked is denoted GTID_1:
If GTID_1 is found in GTIDLIST, the global transaction identified by GTID_1 is active and has not been globally committed. Otherwise, if GTID_1 is greater than MAXGTID, the current maximum committed GTID, the global transaction identified by GTID_1 is still active and has not been globally committed. Otherwise, GTID_1 is less than or equal to MAXGTID, and the global transaction identified by GTID_1 has been globally committed.
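The three rules above can be written as a small predicate. In this sketch, `GlobalTxnSnapshot` is a hypothetical in-memory mirror of the Table 2 row and `is_globally_committed` a hypothetical helper. The usage reproduces the example in which MAXGTID is 20 and transaction 21 starts after the snapshot; the active list {18, 19} is an assumption, and the row's own `gtid` is a placeholder because the check does not use it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalTxnSnapshot:
    gtid: int            # GTID column: transaction that wrote this snapshot row
    gtidlist: frozenset  # GTIDLIST column: active global transactions
    maxgtid: int         # MAXGTID column: current maximum committed GTID


def is_globally_committed(gtid_1: int, snap: GlobalTxnSnapshot) -> bool:
    if gtid_1 in snap.gtidlist:
        return False  # rule 1: still active when the snapshot was taken
    if gtid_1 > snap.maxgtid:
        return False  # rule 2: created after the snapshot, so not known to be committed
    return True       # rule 3: not active and <= MAXGTID, so globally committed


# Hypothetical snapshot: transactions 18 and 19 active, MAXGTID = 20.
snap = GlobalTxnSnapshot(gtid=0, gtidlist=frozenset({18, 19}), maxgtid=20)
for g in (17, 19, 20, 21):
    print(g, is_globally_committed(g, snap))
# 17 True
# 19 False
# 20 True
# 21 False
```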
In addition, the global transaction snapshot table is updated at a preset period, for example every 5 seconds as in the example above. Every 5 seconds, the computing node queries the global transaction manager for the current GTIDLIST and MAXGTID and updates the global transaction snapshot table with them, and the GTID column of the table is updated to the GTID of the global transaction performing this update. The row values of the table before and after the update are recorded in each shard's transaction log, so an updated global transaction snapshot table is parsed from the shard's transaction log at regular intervals. For example, suppose a process is checking, against the snapshot table whose GTID field is 30, whether the global transactions of its pending sub-transaction logs have been committed. Some time later, a snapshot table whose GTID field is 40 is parsed from the shard's transaction log. By then, the check against snapshot table 30 has been completed, and the process next checks its pending sub-transaction logs against snapshot table 40.
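The switch from one snapshot to the next can be sketched as a per-shard loop over the parsed log stream. The record format (`("DATA", gtid)` or `("SNAPSHOT", snapshot)`), the snapshot values and all names are assumptions for illustration. At each snapshot record, pending GTIDs that the snapshot proves committed become ready for playback under that snapshot's identifier, and the rest wait for the next snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    gtid: int            # snapshot identifier (GTID of the updating transaction)
    gtidlist: frozenset  # active global transactions
    maxgtid: int         # current maximum committed GTID


def is_committed(gtid: int, snap: Snapshot) -> bool:
    return gtid not in snap.gtidlist and gtid <= snap.maxgtid


def replay_batches(stream):
    """Yield (snapshot_id, ready_gtids) for each snapshot record in the stream."""
    pending = []  # GTIDs in parse order whose commit is not yet confirmed
    for kind, value in stream:
        if kind == "DATA":
            if value not in pending:
                pending.append(value)
        elif kind == "SNAPSHOT":
            ready = [g for g in pending if is_committed(g, value)]
            pending = [g for g in pending if not is_committed(g, value)]
            yield value.gtid, ready


# Hypothetical stream containing snapshot tables 30 and 40.
stream = [
    ("DATA", 25), ("DATA", 27),
    ("SNAPSHOT", Snapshot(gtid=30, gtidlist=frozenset({27}), maxgtid=29)),
    ("DATA", 31),
    ("SNAPSHOT", Snapshot(gtid=40, gtidlist=frozenset(), maxgtid=39)),
]
for snapshot_id, ready in replay_batches(stream):
    print(snapshot_id, ready)
# 30 [25]
# 40 [27, 31]
```

Transaction 27 is still active at snapshot 30, so it is carried over and released by snapshot 40 together with transaction 31.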
Step 303: After determining that the global transaction to which a pending sub-transaction log belongs has been committed, play back that sub-transaction log to the downstream database.
In one example, it is first confirmed that the processes for all shards in the distributed database have completed detection against the same global transaction snapshot table. The pending sub-transaction logs found, against that same snapshot table, to belong to committed global transactions are taken as target sub-transaction logs, and the target sub-transaction logs are played back to the downstream database. In this implementation, all shards play back together the sub-transactions of the committed global transactions identified from the same global transaction snapshot table, which avoids inconsistent playback data across shards and keeps the synchronization progress of the shards aligned as data is synchronized to the downstream database.
It should be noted that the global transaction snapshot table identifier can be anything that indicates one global transaction snapshot table, such as the update time of the table or the value of its GTID field; no limitation is imposed here. In this embodiment, the value of the GTID field of the global transaction snapshot table is used as the identifier.
In some instances, each log collection and playback process performs detection against a given global transaction snapshot table and, once finished, sends a detection completion notification that carries the identifier of that snapshot table. The data synchronization tool creates a module configured to collect the detection completion notifications sent by the log collection and playback processes. When these notifications confirm that the processes for all shards in the distributed database have completed detection against the same global transaction snapshot table, the module issues a playback command to every log collection and playback process. The playback command carries the identifier of that snapshot table, and each process then plays back to the downstream database the sub-transaction logs that the indicated snapshot table identified as belonging to committed global transactions.
For example, once detection against the snapshot table whose GTID is 30 has been completed by all processes, the command module sends a command carrying GTID 30 to every process, and each process plays back to the downstream database the sub-transaction logs of the committed global transactions determined from snapshot table 30. Suppose one of these committed global transactions is a cross-shard transaction that changes data on data shard 1 and data shard 2 at the same time: data shard 1 produces sub-transaction A and data shard 2 produces sub-transaction B. Process 1 synchronizes data shard 1 and parses and plays back the log of sub-transaction A; process 2 synchronizes data shard 2 and parses and plays back the log of sub-transaction B. In this embodiment, every log collection and playback process responds to the same playback command, so the logs of sub-transaction A (from data shard 1) and sub-transaction B (from data shard 2) are played back to the downstream database together, and the data produced by each shard can be played back concurrently. This prevents sub-transaction logs of the same committed global transaction from reaching the downstream database at different times, for example sub-transaction A having been played back while sub-transaction B has not. In other words, it solves the inconsistent read problem and keeps the synchronization progress of the shards close, avoiding dirty reads in the downstream database caused by differences in sub-transaction synchronization progress.
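The coordination in this example can be sketched as a barrier keyed by the snapshot identifier. `ReplayCoordinator` and `ShardProcess` are hypothetical names, the downstream database is modeled as a list, and the two processes replay one after the other here for simplicity, whereas the text has them replay concurrently.

```python
class ReplayCoordinator:
    """Hypothetical command module: releases a snapshot for playback only after
    every shard process has reported finishing detection against it."""

    def __init__(self, shard_ids):
        self.shard_ids = set(shard_ids)
        self.reports = {}      # snapshot id -> shards that reported completion
        self.released = set()  # snapshot ids already sent as playback commands

    def on_detection_complete(self, shard_id, snapshot_id):
        done = self.reports.setdefault(snapshot_id, set())
        done.add(shard_id)
        if done == self.shard_ids and snapshot_id not in self.released:
            self.released.add(snapshot_id)
            return snapshot_id  # playback command carrying the snapshot identifier
        return None


class ShardProcess:
    """Hypothetical log collection and playback process for one shard."""

    def __init__(self, shard_id, downstream):
        self.shard_id = shard_id
        self.downstream = downstream  # list standing in for the downstream database
        self.ready = {}               # snapshot id -> sub-transaction logs to replay

    def finish_detection(self, snapshot_id, committed_logs):
        self.ready[snapshot_id] = list(committed_logs)

    def replay(self, snapshot_id):
        for entry in self.ready.pop(snapshot_id, []):
            self.downstream.append((self.shard_id, entry))


downstream = []
coordinator = ReplayCoordinator({1, 2})
p1, p2 = ShardProcess(1, downstream), ShardProcess(2, downstream)

p1.finish_detection(30, ["sub-transaction A"])
print(coordinator.on_detection_complete(1, 30), downstream)  # None []
p2.finish_detection(30, ["sub-transaction B"])
command = coordinator.on_detection_complete(2, 30)           # 30
for process in (p1, p2):
    process.replay(command)
print(downstream)  # [(1, 'sub-transaction A'), (2, 'sub-transaction B')]
```

Nothing reaches the downstream list until both shards have reported snapshot 30, which is the property that prevents sub-transaction A from being visible without sub-transaction B.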
One alternative approach to the inconsistent read problem sorts global transactions in order and plays back all sub-transactions belonging to the same global transaction downstream together. That approach requires ordering all global transactions and monitoring the sub-transactions of each global transaction on every shard to determine whether it has committed, which incurs high performance overhead and low synchronization efficiency. In this embodiment, sub-transaction logs to be played back are obtained from each shard's transaction log; each shard checks, against the global transaction snapshot table updated at a preset period, whether the global transaction of each sub-transaction log has been committed; and a sub-transaction log is played back to the downstream database only after its global transaction is found to be committed. Transactions can therefore be processed in parallel at the granularity of the transaction log produced by each shard, with the skew between shards kept within seconds and no need to order global transactions. The performance overhead is small, every shard can synchronize data in parallel, synchronization efficiency improves, and only committed global transactions are synchronized to the downstream database. In addition, the sub-transaction logs found against the same global transaction snapshot table to belong to committed global transactions are taken as target sub-transaction logs and played back together, so that the data read from the upstream and downstream databases stays consistent in real time. Each data shard thus synchronizes data to the downstream database in parallel, with performance close to fully concurrent synchronization, while data synchronization consistency across shards is preserved.
The second embodiment of the present application relates to a data synchronization method. The second embodiment is substantially the same as the first; the main difference is that in the first embodiment the data synchronization tool creates a command sending module that sends playback commands to the log collection and playback processes, whereas in the second embodiment each log collection and playback process, in addition to performing steps 301 to 303 above, also competes for master control through the distributed lock resource. The process that obtains master control collects the detection completion notifications of all processes and determines, from those notifications, the global transaction snapshot table identifier to include in the playback command.
FIG. 5 shows a flowchart of the data synchronization method in the second embodiment of the present application.
Step 501: At startup, compete for master control through the distributed lock resource. Specifically, when a log collection and playback process starts, it tries to acquire master control from the distributed lock; if it succeeds, it becomes the master process. Whenever master control is not held by any process, any process can try to acquire it from the distributed lock resource.
In one example, if a process succeeds in acquiring master control, it receives the detection completion notifications sent by the processes for the other shards; if it fails, it sends its own detection completion notification, which carries the identifier of the global transaction snapshot table it has finished checking against, to the process that holds master control.
In one example, when a process has acquired master control, it notifies the processes for the other shards in the distributed database to take the sub-transaction logs found, against the same global transaction snapshot table, to belong to committed global transactions as target sub-transaction logs and to play them back to the downstream database.
In one example, when a process has acquired master control, it receives the detection completion notifications sent by the processes for the other shards in the distributed database, each carrying the identifier of the same global transaction snapshot table.
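Step 501 and the role split above can be sketched with an in-process stand-in for the distributed lock. `LockServiceStandIn` and `start_process` are hypothetical; the stand-in only mimics try-to-acquire semantics with a `threading.Lock`, whereas a real deployment would use a lock recipe from a coordination service such as ZooKeeper. The sketch does not handle master failure or lock release.

```python
import threading


class LockServiceStandIn:
    """Hypothetical stand-in for a distributed lock: the first caller wins ownership."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._owner = None

    def try_acquire(self, who: str) -> bool:
        with self._mutex:
            if self._owner is None:
                self._owner = who
                return True
            return False

    def current_owner(self):
        with self._mutex:
            return self._owner


def start_process(name: str, lock: LockServiceStandIn):
    """Step 501: try to take master control; otherwise (step 502) look up the master."""
    if lock.try_acquire(name):
        return "master", name
    return "worker", lock.current_owner()


lock = LockServiceStandIn()
results = []
threads = [threading.Thread(target=lambda n=n: results.append(start_process(n, lock)))
           for n in ("process 1", "process 2", "process 3")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # one ("master", ...) entry; every other entry names that same master
```

Each worker learns the master's name from the lock service, which is what it needs in order to send detection completion notifications to the master and receive playback commands from it.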
The following example assumes that log collection and playback process 1 obtains master control authority and log collection and playback process m does not.
Step 502: If master control authority is not obtained, query the current master control process. Specifically, process m did not obtain master control authority, so it looks up which process holds it. This lets process m exchange information with the master: it sends completion notifications to the master and receives playback commands from it. A playback command instructs the processes corresponding to the other shards in the distributed database to take certain sub-transaction logs as target sub-transaction logs and play them back to the downstream database. These are the sub-transaction logs detected, according to the same global transaction snapshot table, as belonging to committed global transactions.
Step 503: Parse the transaction log of the shard. This step obtains the committed global transactions currently awaiting playback, following steps 301 to 302 of the first embodiment of the present application.
Step 504: Send a detection completion notification.
Continuing the example, suppose process m finishes detecting committed global transactions against the global transaction snapshot table whose GTID is 30. It then sends a completion notification to process 1, and the notification contains GTID 30.
Step 505: Send a playback command.
Continuing the example, the master control process chooses the snapshot table identifier for the playback command from the identifiers in the completion notifications sent by the processes. For example, process 1 is the master. Once it learns that all processes have finished detecting committed global transactions against the snapshot table whose GTID is 30, it sends a playback command containing GTID 30.
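The master-side bookkeeping for Steps 504 and 505 could look like the sketch below. It assumes two things: every process, including the master itself, reports the snapshot table identifier it has finished detecting against, and the master issues a playback command only for an identifier reported by every process. `Coordinator` and `on_completion_notice` are hypothetical names.

```python
from typing import Dict, Optional, Set


class Coordinator:
    """Master-side tracking of detection completion notifications (sketch)."""

    def __init__(self, process_ids: Set[str]) -> None:
        self.process_ids = set(process_ids)
        # snapshot table identifier -> processes that finished detection against it
        self.done: Dict[int, Set[str]] = {}

    def on_completion_notice(self, process_id: str, snapshot_id: int) -> Optional[int]:
        """Record one notification; return the identifier to put in the playback
        command once every process has reported it, otherwise None."""
        reporters = self.done.setdefault(snapshot_id, set())
        reporters.add(process_id)
        if self.process_ids <= reporters:
            del self.done[snapshot_id]
            return snapshot_id
        return None


coord = Coordinator({"process-1", "process-m"})
print(coord.on_completion_notice("process-m", 30))  # None: master not done yet
print(coord.on_completion_notice("process-1", 30))  # 30: issue playback command
```

Notifications are keyed by snapshot identifier. A fast process can therefore report 31 while a slow one is still on 30 without the two being confused, and no command is sent until the slowest process catches up.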
Step 506: Play back the target sub-transaction logs to the downstream database.
Continuing the example, each of the other processes receives the command. It then plays back to the downstream database the target sub-transaction logs of the committed global transactions it detected against the snapshot table whose GTID is 30. The master control process does the same according to the playback command it generated. Once all processes have done so, every committed global transaction detected against snapshot table 30 has been played back to the downstream database.
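On the receiving side, Step 506 can be sketched as a per-shard buffer keyed by snapshot table identifier. This is an illustration under several assumptions: `ShardReplayer` is a hypothetical name, sub-transaction logs are plain strings, and a Python list stands in for the downstream database.

```python
from typing import Dict, List


class ShardReplayer:
    """Per-shard process state between detection and playback (sketch)."""

    def __init__(self) -> None:
        # snapshot table identifier -> committed sub-transaction logs detected against it
        self.pending: Dict[int, List[str]] = {}
        self.downstream: List[str] = []  # stand-in for the downstream database

    def record_detected(self, snapshot_id: int, sub_logs: List[str]) -> None:
        # Hold detected logs until the master's playback command arrives.
        self.pending.setdefault(snapshot_id, []).extend(sub_logs)

    def on_playback_command(self, snapshot_id: int) -> int:
        # Play back only the target logs tied to the commanded identifier.
        target = self.pending.pop(snapshot_id, [])
        self.downstream.extend(target)
        return len(target)


replayer = ShardReplayer()
replayer.record_detected(30, ["gtx-5 on shard 1", "gtx-7 on shard 1"])
replayer.record_detected(31, ["gtx-9 on shard 1"])
print(replayer.on_playback_command(30))  # 2
print(replayer.downstream)
```

Logs detected against snapshot 31 stay buffered until a later command names 31. This keeps every shard's downstream state aligned to the same snapshot point.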
In one example, if the process holding master control authority fails, the authority is released, and the other processes can preempt it from the distributed lock resource again. This prevents a process failure from disrupting data synchronization between the upstream and downstream databases and improves robustness.
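A time-bounded lease is one common way to detect a failed master. This is an assumption: the text only says that master control authority is released when its holder fails, not how the failure is detected. In the sketch below, `LeaseLock` is a hypothetical name, and `now` is passed in explicitly so the behaviour is deterministic. A master that stops renewing its lease loses master control, and another process can then preempt it.

```python
from typing import Optional


class LeaseLock:
    """Master control held as a renewable lease (hypothetical failure detector)."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.owner: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, process_id: str, now: float) -> bool:
        # An expired lease counts as released, so any process may take over.
        if self.owner is None or now >= self.expires_at or self.owner == process_id:
            self.owner = process_id
            self.expires_at = now + self.ttl
            return True
        return False

    def renew(self, process_id: str, now: float) -> bool:
        # A healthy master keeps extending its lease before it expires.
        if self.owner == process_id and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False


lease = LeaseLock(ttl=10.0)
print(lease.try_acquire("process-1", now=0.0))   # True: process-1 is master
print(lease.try_acquire("process-m", now=5.0))   # False: lease still valid
print(lease.try_acquire("process-m", now=12.0))  # True: process-1 stopped renewing
```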
In other words, the master control process coordinates the log collection and playback processes. It receives their playback progress notifications, and once all processes have parsed up to a given log snapshot point, it commands all of them to play back to the downstream database. Through this notification-command mechanism, all processes play back global transactions at the same point. Without it, processing delays in some processes would make the playback progress of different processes drift further and further apart. With it, the data in the downstream database stays in a globally consistent state. The master control process also performs log collection and playback itself. If the master fails, the other processes can contend for master control again and continue working, which further improves robustness.
The division of steps in the above methods is only for clarity of description. In implementation, steps may be combined into one, or a step may be split into several; as long as the same logical relationship is preserved, the result falls within the protection scope of the present application. Adding insignificant modifications or introducing insignificant designs to the algorithm or process, without changing its core design, also falls within the protection scope of the present application.
The third embodiment of the present application relates to a data synchronization apparatus. As shown in FIG. 6, it includes three modules:
- a sub-transaction log acquisition module 601, configured to obtain sub-transaction logs to be played back from the transaction log of a shard;
- a committed global transaction detection module 602, configured to detect whether the global transaction to which a sub-transaction log to be played back belongs is a committed global transaction, according to a global transaction snapshot table that is updated at a preset period and records the commit status of global transactions;
- a playback module 603, configured to play back the sub-transaction log to the downstream database after determining that the global transaction it belongs to is committed.
The sub-transaction log acquisition module 601 is further configured to create processes in one-to-one correspondence with the shards. Each process is configured to:
- obtain the sub-transaction logs to be played back from its shard's transaction log;
- detect, according to the global transaction snapshot table, whether the global transaction to which each such sub-transaction log belongs is committed;
- after determining that it is, play the sub-transaction log back to the downstream database.
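The one-to-one mapping between shards and processes can be sketched as a fan-out in which each shard gets its own worker. To keep the example small, threads from `concurrent.futures` stand in for OS processes. `run_per_shard` and `fake_worker` are hypothetical names, and the worker body is a placeholder for the obtain, detect and replay steps.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_per_shard(shards: List[str], worker: Callable[[str], int]) -> Dict[str, int]:
    """Start exactly one worker per shard and collect each worker's result.

    Threads stand in for the per-shard log collection and playback processes.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = {shard: pool.submit(worker, shard) for shard in shards}
        return {shard: future.result() for shard, future in futures.items()}


def fake_worker(shard: str) -> int:
    # Hypothetical worker: would obtain, check and replay this shard's
    # sub-transaction logs; here it just returns a count derived from the name.
    return len(shard)


print(run_per_shard(["shard-1", "shard-22"], fake_worker))  # {'shard-1': 7, 'shard-22': 8}
```

Because no worker depends on another shard's log, the shards can proceed concurrently. Cross-shard coordination happens only at the snapshot barrier enforced by the master.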
The playback module 603 is further configured to confirm that the processes corresponding to all shards in the distributed database have completed detection based on the same global transaction snapshot table. Playing back the sub-transaction logs to the downstream database then includes:
- taking as target sub-transaction logs those sub-transaction logs to be played back that were detected, according to that same snapshot table, as belonging to committed global transactions;
- playing back the target sub-transaction logs to the downstream database.
The playback module 603 is further configured to notify the processes corresponding to the other shards in the distributed database to take certain sub-transaction logs as target sub-transaction logs and play them back to the downstream database. These are the sub-transaction logs to be played back that were detected, according to the same global transaction snapshot table, as belonging to committed global transactions.
The playback module 603 is further configured to receive the detection completion notifications sent by the processes corresponding to the other shards in the distributed database. Each notification carries the identifier of the same global transaction snapshot table.
The playback module 603 is further configured to preempt master control authority. If preemption succeeds, it receives the detection completion notifications sent by the processes corresponding to the other shards. If preemption fails, it sends a detection completion notification to the process that succeeded. That notification carries the identifier of the global transaction snapshot table against which detection has been completed.
The global transaction snapshot table includes an active global transaction list, which records the numbers of global transactions that are in the active state. The committed global transaction detection module 602 is further configured to check the global transaction number of the global transaction to which a sub-transaction log to be played back belongs. If that number is in the active global transaction list, the global transaction is not a committed global transaction.
The global transaction snapshot table also includes the current maximum committed global transaction number. If the global transaction number is not in the active global transaction list, the committed global transaction detection module 602 is further configured to compare it with the current maximum committed global transaction number:
- if the number is greater, the global transaction is not a committed global transaction;
- otherwise, the global transaction is a committed global transaction.
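The two-stage check performed by module 602, which is also recited in claims 6 and 7, can be written as a small predicate. The `GlobalTxnSnapshot` dataclass and its field names are illustrative assumptions, not a format defined by the application.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GlobalTxnSnapshot:
    snapshot_id: int
    active_gtids: FrozenSet[int]  # active global transaction list
    max_committed_gtid: int       # current maximum committed global transaction number


def is_committed(gtid: int, snap: GlobalTxnSnapshot) -> bool:
    """Decide whether the global transaction `gtid` counts as committed
    under snapshot `snap`."""
    if gtid in snap.active_gtids:
        return False  # still active when the snapshot was taken
    if gtid > snap.max_committed_gtid:
        return False  # newer than every transaction committed as of the snapshot
    return True


snap = GlobalTxnSnapshot(snapshot_id=30, active_gtids=frozenset({12, 15}), max_committed_gtid=20)
print([g for g in (11, 12, 15, 20, 21) if is_committed(g, snap)])  # [11, 20]
```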
This embodiment is the system embodiment corresponding to the first embodiment and can be implemented in cooperation with it. The technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here. Correspondingly, the technical details mentioned in this embodiment also apply to the first embodiment.
Every module in this embodiment is a logical module. In practice, a logical unit may be one physical unit, part of a physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the present application, this embodiment omits units that are not closely related to solving the technical problem the application addresses. This does not mean that no other units exist in this embodiment.
The fourth embodiment of the present application relates to an electronic device. As shown in FIG. 7, it includes at least one processor 701 and a memory 702 communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor. When the at least one processor executes these instructions, it can perform the data synchronization method described above.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and it connects the various circuits of one or more processors and the memory. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits; these are well known in the art and are not described further here. A bus interface provides the interface between the bus and a transceiver. The transceiver may be a single element or several elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna. The antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and for general processing. It may also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data that the processor uses when performing operations.
The fifth embodiment of the present application relates to a computer-readable storage medium storing a computer program. When a processor executes the program, it implements the above method embodiments.
That is, those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium. It includes several instructions that cause a device (for example, a single-chip microcomputer or a chip) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The data synchronization method proposed in the present application works in three stages:
- sub-transaction logs to be played back are obtained from the transaction logs of the shards;
- for each sub-transaction log, the method detects whether the global transaction it belongs to is committed, using a global transaction snapshot table that is updated at a preset period and records the commit status of global transactions;
- a sub-transaction log is played back to the downstream database only after its global transaction is determined to be committed.

This avoids synchronizing uncommitted data to the downstream database. In addition, this embodiment works at shard granularity: each shard in the distributed database uses the global transaction snapshot table to determine which of its sub-transaction logs can be played back. The shards can therefore be processed concurrently, which improves the efficiency of data synchronization.
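The effect of the periodically refreshed snapshot on a single shard can be sketched as repeated rounds. In each round, logs whose global transaction is committed under the current snapshot are replayed, and the rest are carried over and re-checked against the next snapshot. Several names here are hypothetical: the tuple layout `(gtid, change)`, the function `replay_round`, and the list standing in for the downstream database. The sketch also does not address ordering between carried-over logs and newly parsed ones.

```python
from typing import FrozenSet, List, Tuple

SubLog = Tuple[int, str]  # (global transaction number, change record) - hypothetical layout


def replay_round(pending: List[SubLog], active: FrozenSet[int], max_committed: int,
                 downstream: List[str]) -> List[SubLog]:
    """One periodic round on one shard: replay committed sub-transaction logs
    and return the others for re-checking against the next snapshot."""
    carried_over: List[SubLog] = []
    for gtid, change in pending:
        if gtid not in active and gtid <= max_committed:
            downstream.append(change)  # committed under this snapshot
        else:
            carried_over.append((gtid, change))  # not committed yet
    return carried_over


downstream: List[str] = []
rest = replay_round([(5, "a"), (7, "b"), (9, "c")], frozenset({7}), 8, downstream)
print(downstream, rest)  # ['a'] [(7, 'b'), (9, 'c')]
rest = replay_round(rest, frozenset(), 9, downstream)  # next snapshot
print(downstream, rest)  # ['a', 'b', 'c'] []
```

No uncommitted change ever reaches `downstream`: a log waits, at the cost of latency, until a later snapshot shows its global transaction as committed.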
Those of ordinary skill in the art will understand that the above embodiments are specific examples of implementing the present application. In practice, various changes in form and detail may be made to them without departing from the spirit and scope of the present application.

Claims (10)

  1. A data synchronization method, comprising:
    obtaining sub-transaction logs to be played back according to the transaction log of a shard;
    detecting, according to a global transaction snapshot table updated at a preset period, whether the global transaction to which a sub-transaction log to be played back belongs is a committed global transaction, wherein the global transaction snapshot table is configured to record the commit status of global transactions; and
    after determining that the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction, playing back the sub-transaction log to a downstream database.
  2. The data synchronization method according to claim 1, wherein, before the obtaining of the sub-transaction logs to be played back according to the transaction log of the shard, the method comprises:
    creating processes in one-to-one correspondence with the shards in a distributed database, the processes being configured to perform the obtaining of the sub-transaction logs to be played back according to the transaction log of the shard, the detecting, according to the global transaction snapshot table, of whether the global transaction to which a sub-transaction log to be played back belongs is a committed global transaction, and the playing back of the sub-transaction log to the downstream database after determining that the global transaction is committed;
    after the determining that the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction and before the playing back of the sub-transaction log to the downstream database, the method further comprises:
    confirming that the processes corresponding to all shards in the distributed database have completed detection based on the same global transaction snapshot table;
    the playing back of the sub-transaction log to the downstream database comprises:
    taking, as target sub-transaction logs, the sub-transaction logs detected according to the same global transaction snapshot table as belonging to committed global transactions; and
    playing back the target sub-transaction logs to the downstream database.
  3. The data synchronization method according to claim 2, wherein, after the confirming that the processes corresponding to all shards in the distributed database have completed detection based on the same global transaction snapshot table, the method further comprises:
    notifying the processes corresponding to the other shards in the distributed database to take, as target sub-transaction logs, the sub-transaction logs detected according to the same global transaction snapshot table as belonging to committed global transactions, and to play them back to the downstream database.
  4. The data synchronization method according to claim 2, wherein the confirming that the processes corresponding to all shards in the distributed database have completed detection based on the same global transaction snapshot table comprises:
    receiving detection completion notifications sent by the processes corresponding to the other shards in the distributed database, each detection completion notification carrying the identifier of the same global transaction snapshot table.
  5. The data synchronization method according to claim 4, wherein, before the receiving of the detection completion notifications sent by the processes corresponding to the other shards, the method further comprises:
    preempting master control authority;
    if the preemption of master control authority succeeds, performing the receiving of the detection completion notifications sent by the processes corresponding to the other shards; and
    if the preemption of master control authority fails, sending a detection completion notification to the process that successfully preempted master control authority, the detection completion notification carrying the identifier of the global transaction snapshot table against which detection has been completed.
  6. The data synchronization method according to any one of claims 1 to 5, wherein the global transaction snapshot table comprises an active global transaction list, the active global transaction list recording the numbers of global transactions in the active state;
    the detecting, according to the global transaction snapshot table, of whether the global transaction to which a sub-transaction log to be played back belongs is a committed global transaction comprises:
    if the global transaction number of the global transaction to which the sub-transaction log to be played back belongs is in the active global transaction list, determining that this global transaction is not a committed global transaction.
  7. [Corrected 18.11.2021 according to Rule 26]
    The data synchronization method according to claim 6, wherein the global transaction snapshot table further comprises a current maximum committed global transaction number;
    the detecting, according to the global transaction snapshot table, of whether the global transaction to which a sub-transaction log to be played back belongs is a committed global transaction further comprises:
    if the global transaction number of the global transaction to which the sub-transaction log to be played back belongs is not in the active global transaction list, determining whether that global transaction number is greater than the current maximum committed global transaction number;
    if it is greater than the current maximum committed global transaction number, determining that the global transaction to which the sub-transaction log to be played back belongs is not a committed global transaction;
    otherwise, determining that the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction.
  8. [Corrected 18.11.2021 according to Rule 26]
    A data synchronization apparatus, comprising:
    a sub-transaction log acquisition module configured to obtain sub-transaction logs to be played back according to the transaction log of a shard;
    a committed global transaction detection module configured to detect, according to a global transaction snapshot table updated at a preset period, whether the global transaction to which a sub-transaction log to be played back belongs is a committed global transaction, wherein the global transaction snapshot table is configured to record the commit status of global transactions; and
    a playback module configured to play back the sub-transaction log to a downstream database after determining that the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the data synchronization method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the data synchronization method according to any one of claims 1 to 7.
PCT/CN2021/128408 2020-12-24 2021-11-03 Data synchronization method and apparatus, and electronic device and storage medium WO2022134876A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011549237.7A CN114661816B (en) 2020-12-24 2020-12-24 Data synchronization method and device, electronic equipment and storage medium
CN202011549237.7 2020-12-24

Publications (1)

Publication Number Publication Date
WO2022134876A1 true WO2022134876A1 (en) 2022-06-30

Family

ID=82024881

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/128408 WO2022134876A1 (en) 2020-12-24 2021-11-03 Data synchronization method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114661816B (en)
WO (1) WO2022134876A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11669518B1 (en) * 2021-12-14 2023-06-06 Huawei Technologies Co., Ltd. Method and system for processing database transactions in a distributed online transaction processing (OLTP) database
CN115185787B (en) * 2022-09-06 2022-12-30 北京奥星贝斯科技有限公司 Method and device for processing transaction log

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009039118A2 (en) * 2007-09-18 2009-03-26 Microsoft Corporation Parallel nested transactions in transactional memory
US20090217274A1 (en) * 2008-02-26 2009-08-27 Goldengate Software, Inc. Apparatus and method for log based replication of distributed transactions using globally acknowledged commits
CN102073540A (en) * 2010-12-15 2011-05-25 北京新媒传信科技有限公司 Distributed affair submitting method and device thereof
CN103164219A (en) * 2013-01-08 2013-06-19 华中科技大学 Distributed transaction processing system using multi-type replica in decentralized schema
CN107797850A (en) * 2016-08-30 2018-03-13 阿里巴巴集团控股有限公司 The method, apparatus and system of distributing real time system
CN109857802A (en) * 2018-12-12 2019-06-07 深圳前海微众银行股份有限公司 Daily record data synchronous method, device, equipment and computer readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107045454B (en) * 2016-02-06 2020-06-26 华为技术有限公司 Cross-process distributed transaction control method and related system
US10810268B2 (en) * 2017-12-06 2020-10-20 Futurewei Technologies, Inc. High-throughput distributed transaction management for globally consistent sharded OLTP system and method of implementing
US10942823B2 (en) * 2018-01-29 2021-03-09 Guy Pardon Transaction processing system, recovery subsystem and method for operating a recovery subsystem
CN109783578B (en) * 2019-01-09 2022-10-21 腾讯科技(深圳)有限公司 Data reading method and device, electronic equipment and storage medium

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117131071A (en) * 2023-10-26 2023-11-28 中国证券登记结算有限责任公司 Data processing method, device, electronic equipment and computer readable medium
CN117131071B (en) * 2023-10-26 2024-01-26 中国证券登记结算有限责任公司 Data processing method, device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN114661816A (en) 2022-06-24
CN114661816B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
WO2022134876A1 (en) Data synchronization method and apparatus, and electronic device and storage medium
CN109739935B (en) Data reading method and device, electronic equipment and storage medium
US9779128B2 (en) System and method for massively parallel processing database
US9575849B2 (en) Synchronized backup and recovery of database systems
US9589041B2 (en) Client and server integration for replicating data
US9727576B2 (en) Method and system for efficient data synchronization
US7287043B2 (en) System and method for asynchronous data replication without persistence for distributed computing
US10503699B2 (en) Metadata synchronization in a distrubuted database
US7490113B2 (en) Database log capture that publishes transactions to multiple targets to handle unavailable targets by separating the publishing of subscriptions and subsequently recombining the publishing
US6662196B2 (en) Collision avoidance in bidirectional database replication
CN109710388B (en) Data reading method and device, electronic equipment and storage medium
US7996363B2 (en) Real-time apply mechanism in standby database environments
CN110196856B (en) Distributed data reading method and device
WO2021036768A1 (en) Data reading method, apparatus, computer device, and storage medium
CN106202365B (en) Method and system for database update synchronization and database cluster
CN109783578B (en) Data reading method and device, electronic equipment and storage medium
Chairunnanda et al. ConfluxDB: Multi-master replication for partitioned snapshot isolation databases
CN113391885A (en) Distributed transaction processing system
US11003550B2 (en) Methods and systems of operating a database management system DBMS in a strong consistency mode
US20230110826A1 (en) Log execution method and apparatus, computer device and storage medium
CN112800060A (en) Data processing method and device, computer readable storage medium and electronic equipment
US10970177B2 (en) Methods and systems of managing consistency and availability tradeoffs in a real-time operational DBMS
CN108038163B (en) Master and backup control center database synchronization system
CN112612647B (en) Log parallel replay method, device, equipment and storage medium
Zhou et al. FoundationDB: A Distributed Key Value Store

Legal Events

Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21908877
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
32PN EP: Public notification in the EP Bulletin as the address of the addressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 161123)