WO2022134876A1 - Data synchronization method and apparatus, and electronic device and storage medium - Google Patents

Data synchronization method and apparatus, and electronic device and storage medium

Info

Publication number
WO2022134876A1
WO2022134876A1 PCT/CN2021/128408 CN2021128408W WO2022134876A1 WO 2022134876 A1 WO2022134876 A1 WO 2022134876A1 CN 2021128408 W CN2021128408 W CN 2021128408W WO 2022134876 A1 WO2022134876 A1 WO 2022134876A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction
global transaction
global
sub
log
Prior art date
Application number
PCT/CN2021/128408
Other languages
English (en)
Chinese (zh)
Inventor
周日明
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2022134876A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2308Concurrency control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275Synchronous replication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/466Transaction processing

Definitions

  • the embodiments of the present application relate to the field of databases, and in particular, to a data synchronization method, apparatus, electronic device, and storage medium.
  • OLTP: On-Line Transaction Processing
  • distributed databases need to regularly unload data to analysis systems, such as data warehouses, for subsequent analysis and processing by other systems.
  • when the distributed database unloads data to the analysis system, it replays the sub-transaction logs of the global transactions generated on each shard to the downstream database.
  • a cross-shard global transaction includes multiple sub-transactions.
  • for a cross-shard global transaction, the sub-transaction on one shard may already have generated its log while other sub-transactions of the same global transaction have not; that is, the global transaction may not yet be committed, or may be about to be rolled back.
  • the transaction log is nevertheless played back to the downstream database, so uncommitted data is also synchronized to the downstream database; that is, an uncommitted read occurs.
  • the embodiments of the present application provide a data synchronization method, apparatus, electronic device, and storage medium.
  • an embodiment of the present application provides a data synchronization method, including: acquiring a sub-transaction log to be played back according to the transaction log of a shard; detecting, according to a global transaction snapshot table updated at a preset period, whether the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction, wherein the global transaction snapshot table is configured to record the commit status of global transactions; and, after determining that the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction, playing back the sub-transaction log to the downstream database.
  • an embodiment of the present application also provides a data synchronization apparatus, comprising: a sub-transaction log acquisition module, configured to acquire a sub-transaction log to be played back according to the transaction log of a shard; a committed global transaction judgment module, configured to detect, according to a global transaction snapshot table updated at a preset period, whether the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction, wherein the global transaction snapshot table is configured to record the commit status of global transactions; and a playback module, configured to play back the sub-transaction log to the downstream database after determining that the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction.
  • an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above-described data synchronization method.
  • an embodiment of the present application further provides a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the above-mentioned data synchronization method is implemented.
  • FIG. 1 is a schematic diagram of a data synchronization system according to a first embodiment of the present application
  • FIG. 2 is a flowchart of updating a global transaction snapshot table according to the first embodiment of the present application
  • FIG. 3 is a flowchart of a data synchronization method according to the first embodiment of the present application.
  • FIG. 5 is a flowchart of a data synchronization method according to a second embodiment of the present application.
  • FIG. 6 is a flowchart of a data synchronization apparatus according to a third embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
  • a data synchronization system that can be used to implement the data synchronization method in the first embodiment of the present application includes: an upstream database, that is, a distributed database 101 , a data synchronization tool 102 , and a downstream database 103 .
  • the distributed database 101 includes: a global transaction manager, computing nodes, and data shards.
  • A data shard may be referred to as a shard for short.
  • the Global Transaction Manager (GTM) is configured to create and release global transaction numbers (GTIDs) and to maintain a list of active global transactions, also known as the global transaction snapshot. An active global transaction is a global transaction in the distributed database that has started but has not yet committed.
  • the global transaction manager records the current active global transaction list, and the active global transaction list includes the global transaction numbers (GTIDs) of all active global transactions in the current distributed database.
  • GTIDs global transaction numbers
  • Compute nodes are configured for global transaction resolution and execution in a distributed database.
  • Data sharding is responsible for local storage and local execution of data, and generates a local transaction log, which records the data rows before and after modification of the table in the shard.
  • the distributed database 101 supports a replicated table schema.
  • a distributed database can distribute the data of a large table to multiple data shards according to certain rules, and each shard stores a part of the data of the table, which is different from each other.
  • Smaller tables can use the replicated table mode, that is, the complete data of the table is stored on each data shard, and the data on each shard is the same.
  • the replication table update is also a global transaction. When the data in the replication table is updated, each shard is updated successfully together, otherwise all shards are rolled back to the state before the update.
  • the global transaction snapshot table in this embodiment adopts the replication table mode, that is, each fragment in the distributed database has a global transaction snapshot table, and the global transaction snapshot table is configured to represent the commit state of the global transaction.
  • the update of the global transaction snapshot table is equivalent to updating the data of the global transaction snapshot table of each data fragment in the distributed database 101.
  • the computing node updates the data on the global transaction manager (GTM) to each data segment.
  • GTM global transaction manager
  • the computing node queries the global transaction manager for the current active global transaction list, and updates the currently queried active global transaction list to each data shard.
  • updating the global transaction snapshot table is also a global transaction of the distributed database.
  • when the global transaction snapshot table is updated, each shard generates a sub-transaction log corresponding to the global transaction that updates the table; the sub-transaction log records the value of the row before modification and the value of the row after modification.
  • the distributed database 101 updates the global transaction snapshot table at a preset period, so a global snapshot point is generated in the log stream of each shard every preset period. That is, a global snapshot point is the log record generated in the log stream of each data shard when the global transaction snapshot table is periodically updated.
  • each data shard When the global snapshot table is updated, each data shard generates a sub-transaction log, which records the data rows before and after the change of the global snapshot table. After the data synchronization tool parses the log record, it can obtain the current global transaction snapshot table. Since the table is a replicated table, the same log records are generated in the log stream of each data shard. Through the information in the log records, synchronization control can be performed between data synchronization processes. The process of generating global snapshot points every preset time period is shown in FIG. 2 .
  • Step 201 the computing node obtains the current active global transaction list from the global transaction manager according to the configured preset period.
  • the computing node obtains the current active global transaction list from the global transaction manager according to a configured timer, such as a configuration of 5 seconds, that is, a preset period of 5 seconds.
  • Step 202 the computing node writes the active global transaction list to the global transaction snapshot table of each shard.
  • Step 203 When the transaction commits, each data shard writes the update log of the global transaction snapshot table to the log replication stream.
  • the update log records the row values of the global transaction snapshot table before and after the modification.
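  • as an illustration only, steps 201 to 203 can be sketched in Python as follows. The names GlobalTransactionManager, Shard and update_snapshot_table, and the dictionary layout of the log record, are assumptions made for this sketch and are not taken from the application; a real computing node would call the update from its configured timer (for example every 5 seconds).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class GlobalTransactionManager:
    """Hypothetical stand-in for the GTM: issues GTIDs and tracks active ones."""

    def __init__(self) -> None:
        self._next_gtid = 1
        self.active: Set[int] = set()

    def begin(self) -> int:
        gtid = self._next_gtid
        self._next_gtid += 1
        self.active.add(gtid)
        return gtid

    def commit(self, gtid: int) -> None:
        self.active.discard(gtid)


@dataclass
class Shard:
    """A data shard holding its replica of the snapshot table and a log stream."""
    name: str
    snapshot_row: Dict = field(default_factory=dict)
    log_stream: List[Dict] = field(default_factory=list)


def update_snapshot_table(gtm: GlobalTransactionManager, shards: List[Shard]) -> int:
    # Step 201: read the current active global transaction list from the GTM.
    active_list = sorted(gtm.active)
    # The update is itself a global transaction, so it gets its own GTID.
    update_gtid = gtm.begin()
    new_row = {"GTID": update_gtid, "GTIDLIST": active_list}
    for shard in shards:
        before = dict(shard.snapshot_row)
        # Step 202: write the list to this shard's replica of the table.
        shard.snapshot_row = dict(new_row)
        # Step 203: on commit, the shard appends the before/after row values
        # to its log stream; this record is the global snapshot point.
        shard.log_stream.append(
            {"gtid": update_gtid, "before": before, "after": dict(new_row)}
        )
    gtm.commit(update_gtid)
    return update_gtid
```

  • because the table is replicated, every shard in this sketch ends up with an identical snapshot record in its log stream, which is what later allows the per-shard processes to coordinate on the same snapshot.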
  • the data synchronization tool 102 includes multiple log collection and playback processes and distributed lock resources. Each log collection and playback process corresponds to each data fragment in the distributed database 101 one-to-one, receives the transaction log of the data fragment, processes it, and transmits the processed SQL statement to the downstream database 103 for playback.
  • the distributed lock resource in the data synchronization tool 102 can be provided by open source software such as ZooKeeper. The log collection and playback processes compete for the distributed lock resource to become the master control node, and the process that obtains the master control authority coordinates the other nodes to perform log playback.
  • the data synchronization tool 102 shown in FIG. 1 may be an independent device, or may be integrated into the distributed database 101 or the downstream database 103 .
  • the data synchronization method of this embodiment includes: acquiring a sub-transaction log to be played back according to a fragmented transaction log; detecting whether the global transaction to which the sub-transaction log belongs is a submitted global transaction according to a global transaction snapshot table;
  • the global transaction snapshot table is configured to record the commit state of the global transaction; after it is determined that the global transaction to which the sub-transaction log belongs belongs to the committed global transaction, the sub-transaction log is played back to the downstream database.
  • sub-transactions will be played back to the downstream database only when the global transaction is committed, avoiding the problem of uncommitted reads.
  • moreover, with the shard as the granularity, each shard can replay its sub-transaction logs according to the global transaction snapshot table, which facilitates concurrency between shards and improves the efficiency of data synchronization.
  • Step 301 Obtain the sub-transaction log to be played back according to the transaction log of the fragment.
  • a process that corresponds one-to-one with each shard in the distributed database is created, and the process is configured to obtain the sub-transaction log to be played back according to the transaction log of the shard.
  • one process corresponds to one shard, and parses the transaction log of the corresponding shard.
  • Multiple processes can be created for multiple shards in the database to parse the transaction log at the same time, which facilitates the realization of concurrency between shards.
  • the data synchronization tool creates log collection and playback processes that correspond one-to-one with the shards in the distributed database. Each log collection and playback process obtains the transaction log of its shard from the corresponding shard node of the distributed database 101 and parses it. A sub-transaction log that has been parsed from the transaction log of the shard but not yet played back is called a sub-transaction log to be played back. The sub-transaction logs to be played back are temporarily stored in parsing order, and each is marked with the global transaction number (GTID) of the global transaction to which it belongs.
  • the GTID list made up of the global transaction numbers to which all the sub-transaction logs to be played back belong is called the to-be-played-back list. For example, as shown in Table 1, to-be-played-back list A contains GTIDs 1, 2, 3, 4, 5; list B contains GTIDs 4, 5, 6, 7, 8, 9; and list C contains GTIDs 7, 8, 9, 10, 11, 12.
  • Table 1:
      Global transaction numbers in the to-be-played-back list   Sub-transaction logs to be played back
      C: 7, 8, 9, 10, 11, 12                                      log 7, log 8, log 9, log 10, log 11, log 12
      B: 4, 5, 6, 7, 8, 9                                         log 4, log 5, log 6, log 7, log 8, log 9
      A: 1, 2, 3, 4, 5                                            log 1, log 2, log 3, log 4, log 5
  • Step 302 Detect whether the global transaction to which the sub-transaction log to be played belongs belongs to a submitted global transaction according to the global transaction snapshot table updated at a preset period.
  • the process is configured to detect, according to the global transaction snapshot table, whether the global transaction to which the sub-transaction log to be played belongs belongs to a committed global transaction.
  • the global transaction snapshot table includes: an active global transaction list; the active global transaction list records the number of the global transaction in an active state; if the global transaction number of the global transaction to which the sub-transaction log to be played back belongs is in the In the active global transaction list, the global transaction to which the sub-transaction log to be played back belongs does not belong to the submitted global transaction.
  • the number of uncommitted global transactions is determined by comparing with the list of active global transactions.
  • the global transaction snapshot table further includes: a global transaction number corresponding to the global transaction that updates the table.
  • updating the global transaction snapshot table is itself a distributed global transaction, and the number of that global transaction is the global transaction number corresponding to the global transaction that updates this table. The distributed database has a hidden system column that records the number of the global transaction that modified the row; that is, the global transaction number corresponding to the global transaction that updates this table is stored in this system column.
  • when the log collection and playback process parses the transaction log of the shard, it also parses the sub-transaction log that updates the global transaction snapshot table, and the GTID value of that sub-transaction log is the global transaction number corresponding to the global transaction that updates the table. This number is used as the identifier of the global transaction snapshot table; the identifier uniquely identifies the global transaction snapshot table, that is, different identifiers indicate different global transaction snapshot tables, and it also identifies the aforementioned global snapshot point. Since the global transaction snapshot table is a replicated table, each log collection and playback process can parse the same global transaction snapshot table. The log collection and playback process compares its current to-be-played-back list with the active global transaction list in the global transaction snapshot table.
  • if a global transaction number appears in the current to-be-played-back list and also appears in the global transaction snapshot, that is, the active global transaction list, the global transaction is not a committed transaction. If a global transaction number appears in the current to-be-played-back list but not in the active global transaction list, the global transaction is a committed transaction, and the sub-transaction log corresponding to the transaction can be played back to the downstream database.
  • the circle in FIG. 4 represents the active global transaction list snapshot corresponding to the global snapshot point, and the log represents the parsed list to be played back.
  • global transaction numbers 1, 2, and 3 appear in to-be-played-back list A but not in snapshot2, indicating that transactions 1, 2, and 3 are globally committed, while global transaction numbers 4 and 5 appear in to-be-played-back list A and also in snapshot2, indicating that 4 and 5 do not belong to committed global transactions.
  • global transaction numbers 4, 5, and 6 appear in to-be-played-back list B but not in snapshot3, indicating that these global transactions were committed before snapshot3 was generated, while global transaction numbers 7, 8, and 9 appear in to-be-played-back list B and also in snapshot3, indicating that 7, 8, and 9 do not belong to committed global transactions.
  • global transaction numbers 7, 8, 9, and 10 appear in to-be-played-back list C but not in snapshot4, indicating that these global transactions were committed before snapshot4 was generated, while global transaction numbers 11 and 12 appear in to-be-played-back list C and also in snapshot4, indicating that 11 and 12 do not belong to committed global transactions.
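  • the comparison described above amounts to a set-membership test. The following Python sketch, offered as an illustration rather than as the application's implementation, splits a to-be-played-back list using only the active global transaction list. The function name split_by_snapshot is hypothetical, and the contents of snapshot3 are assumed to be exactly the three numbers named in the text.

```python
from typing import Iterable, List, Set, Tuple


def split_by_snapshot(
    pending_gtids: Iterable[int], active_gtids: Set[int]
) -> Tuple[List[int], List[int]]:
    """Split pending GTIDs into (committed, not committed).

    A GTID that also appears in the snapshot's active list is not committed;
    a GTID absent from the active list is treated as committed.
    """
    committed: List[int] = []
    not_committed: List[int] = []
    for gtid in pending_gtids:
        if gtid in active_gtids:
            not_committed.append(gtid)
        else:
            committed.append(gtid)
    return committed, not_committed


# To-be-played-back list B from Table 1, checked against snapshot3.
list_b = [4, 5, 6, 7, 8, 9]
snapshot3_active = {7, 8, 9}  # assumed contents, based on the example in the text
committed_b, pending_b = split_by_snapshot(list_b, snapshot3_active)
```

  • with these inputs, logs 4, 5, and 6 are eligible for playback and logs 7, 8, and 9 stay in the list until a later snapshot shows them as committed.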
  • the global transaction snapshot table further includes: the current maximum committed global transaction number. If the global transaction number of the global transaction to which the sub-transaction log to be played back belongs is not in the active global transaction list, it is determined whether that global transaction number is greater than the current maximum committed global transaction number. If it is greater, the global transaction to which the sub-transaction log to be played back belongs is not a committed global transaction; otherwise, it is a committed global transaction.
  • in other words, after a global transaction is found not to be in the active global transaction list, its global transaction number still needs to be compared with the current maximum committed global transaction number to determine whether it is committed, rather than treating every global transaction outside the active global transaction list as a committed global transaction.
  • this covers the case where the global transaction to which a sub-transaction log in the to-be-played-back list belongs was generated after the current global transaction snapshot table and is therefore not recorded in the snapshot table, and it makes the judgment of committed global transactions more accurate.
  • the global transaction number of a global transaction generated after the global transaction snapshot table is larger than any number that appears in the snapshot table, that is, larger than the current maximum committed global transaction number.
  • the current maximum committed global transaction number in the global transaction snapshot table is 20, and after updating the global transaction snapshot table, a global transaction with the global transaction number 21 is generated, and the global transaction is also in an active state, That is, the uncommitted state, but because it is generated after the global transaction snapshot table, it will not be in the active global transaction list of the global transaction snapshot table.
  • the global transaction corresponding to the global transaction number 21 is an uncommitted global transaction.
  • Table 2 below is a schematic diagram of the table structure of the global snapshot table including the current maximum committed global transaction number.
  • Field      Type   Description
    GTID       INT8   System column; the global transaction number corresponding to the global transaction that updates this table
    GTIDLIST   BLOB   List of active global transactions
    MAXGTID    INT8   The current maximum committed global transaction number
  • the data in the above table is small in scale.
  • the table can have only one record, and the replication table mode will not occupy too much hardware and software resources.
  • the table structure of the global transaction snapshot table is shown in Table 2. Combined with Table 2, the rules for determining whether a global transaction is globally committed are as follows, where the global transaction number to be judged is denoted GTID_1:
  • if GTID_1 is found in GTIDLIST, the global transaction identified by GTID_1 is active and is not a globally committed transaction; otherwise, if GTID_1 is greater than MAXGTID, that is, the current maximum committed global transaction number, the global transaction identified by GTID_1 is active and not globally committed; otherwise, GTID_1 is less than or equal to MAXGTID and the global transaction identified by GTID_1 has been globally committed.
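  • the three-way rule above can be written as a short predicate. This is a minimal Python sketch; the function name is_globally_committed and the use of plain Python collections for GTIDLIST are assumptions (Table 2 stores the list as a BLOB, whose encoding the text does not specify).

```python
from typing import Collection


def is_globally_committed(gtid_1: int, gtidlist: Collection[int], maxgtid: int) -> bool:
    """Apply the rule from the text to one global transaction number.

    gtidlist: active global transaction numbers from the snapshot table.
    maxgtid:  current maximum committed global transaction number.
    """
    if gtid_1 in gtidlist:
        # Listed as active in the snapshot: not committed.
        return False
    if gtid_1 > maxgtid:
        # Started after the snapshot was taken: treated as active.
        return False
    # Not active and not newer than the snapshot: globally committed.
    return True
```

  • for instance, with MAXGTID equal to 20 as in the example above, is_globally_committed(21, [18], 20) is False even though 21 is absent from the active list; the active list [18] here is an invented value for illustration.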
  • the global transaction snapshot table is updated at a preset period. For example, in the above example it is updated every 5 seconds: every 5 seconds, the computing node queries the global transaction manager for the current GTIDLIST and MAXGTID and updates the global transaction snapshot table with the queried values.
  • the global transaction number of the global transaction that updates the snapshot table, that is, the GTID field in the global transaction snapshot table, is updated as well, and the row values of the updated global transaction snapshot table are recorded in the transaction log of each shard. Therefore, an updated global transaction snapshot table is parsed from the transaction log of the shard at regular intervals.
  • for example, suppose the global transaction snapshot table currently in use is the one whose field GTID is 30. Once the step of detecting whether the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction has been completed according to the global transaction snapshot table whose field GTID is 30, the next parsed global transaction snapshot table is used to detect whether the global transaction to which a sub-transaction log to be played back belongs is a committed global transaction.
  • Step 303 After it is determined that the global transaction to which the sub-transaction log to be played belongs belongs to the submitted global transaction, the sub-transaction log to be played back is played back to the downstream database.
  • each shard synchronously replays the sub-transaction logs of the committed global transactions detected according to the same global transaction snapshot table. This avoids inconsistent playback data among shards and keeps the data synchronization progress of the shards aligned when synchronizing data to the downstream database.
  • the global transaction snapshot table identifier is configured to indicate a global transaction snapshot table. It can be the update time of the global transaction snapshot table or the value of the field GTID in the global transaction snapshot table, and is not restricted here. In this embodiment, the value of the field GTID in the global transaction snapshot table is used as the identifier as an example.
  • each log collection and playback process performs detection based on a global transaction snapshot table and, once detection is complete, sends a detection completion notification that includes the global transaction snapshot table identifier of that table.
  • the data synchronization tool creates a module configured to obtain the detection completion notifications sent by the log collection and playback processes. When the notifications confirm that the processes corresponding to all shards in the distributed database have completed detection based on the same global transaction snapshot table, the module issues a playback command to each log collection and playback process. The playback command includes the global transaction snapshot table identifier of that same global transaction snapshot table.
  • each log collection and playback process plays back to the downstream database the sub-transaction logs that were detected, according to the global transaction snapshot table indicated by that identifier, as belonging to committed global transactions. For example, once detection of committed global transactions has been completed according to the global transaction snapshot table whose GTID is 30, the sub-transaction logs corresponding to the committed global transactions determined by that table are played back to the downstream database. Suppose a committed global transaction determined according to the snapshot table whose GTID is 30 is a cross-shard transaction: when the data is changed, data shard 1 generates sub-transaction A and data shard 2 generates sub-transaction B.
  • process 1 synchronizes data shard 1 and parses and plays back the log of sub-transaction A; process 2 synchronizes data shard 2 and parses and plays back the log of sub-transaction B.
  • in response to the playback command, process 1 replays the log of sub-transaction A and process 2 replays the log of sub-transaction B to the downstream database synchronously, so that the data generated by each shard can be played back concurrently. This avoids inconsistent playback progress among the sub-transaction logs of the same committed global transaction, for example the log of sub-transaction A having been played back to the downstream database while the log of sub-transaction B has not. In other words, the inconsistent-read problem is solved and the data synchronization progress of the shards stays close, which avoids dirty reads in the downstream database caused by differences in sub-transaction synchronization progress.
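  • the notification and command exchange can be sketched as a small coordinator. The following Python is an illustration under stated assumptions: the class name PlaybackCoordinator and its method are hypothetical, the notifications are modeled as direct method calls rather than inter-process messages, and the playback command is represented simply by returning the snapshot table identifier.

```python
from typing import Dict, Iterable, Optional, Set


class PlaybackCoordinator:
    """Collects detection completion notifications from per-shard processes."""

    def __init__(self, shard_ids: Iterable[str]) -> None:
        self._shard_ids: Set[str] = set(shard_ids)
        # snapshot table identifier -> shards that finished detection with it
        self._reported: Dict[int, Set[str]] = {}

    def on_detection_complete(self, shard_id: str, snapshot_id: int) -> Optional[int]:
        """Record one notification.

        Returns the snapshot identifier to put in the playback command once
        every shard has completed detection against that same snapshot table,
        and None while some shard is still outstanding.
        """
        done = self._reported.setdefault(snapshot_id, set())
        done.add(shard_id)
        if done == self._shard_ids:
            del self._reported[snapshot_id]
            return snapshot_id
        return None


coordinator = PlaybackCoordinator(["shard1", "shard2"])
first = coordinator.on_detection_complete("shard1", 30)   # shard2 still pending
second = coordinator.on_detection_complete("shard2", 30)  # all shards reported
```

  • holding back the command until all shards report is what keeps sub-transaction A and sub-transaction B of the same committed global transaction from reaching the downstream database at noticeably different times.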
  • in one existing method, global transactions are sorted in order, and the multiple sub-transactions belonging to the same global transaction are replayed to the downstream together. Sorting the global transactions, and monitoring the sub-transactions on each shard to determine whether the global transaction has been committed, incurs high performance overhead and results in low synchronization efficiency.
  • in this embodiment, the sub-transaction log to be played back is obtained according to the transaction log of the shard, and each shard detects, according to the global transaction snapshot table updated at a preset period, whether the global transaction to which the sub-transaction log belongs is a committed global transaction.
  • after it is determined that the global transaction is committed, the sub-transaction log is played back to the downstream database. Playback can proceed in parallel with the transaction log of each shard as the granularity, and the progress difference between shards is kept at the level of seconds. Global transactions need not be sorted, so the performance overhead is small; the shards synchronize data in parallel, which improves synchronization efficiency, and only committed global transactions are synchronized to the downstream database.
  • the sub-transaction logs detected, according to the same global transaction snapshot table, as belonging to committed global transactions are used as the target sub-transaction logs and played back, so that the data read from the upstream database and the downstream database are kept consistent in real time. That is, the data shards synchronize data to the downstream database in parallel with performance close to concurrent synchronization, while the consistency of data synchronization between shards is ensured.
  • The second embodiment of the present application relates to a data synchronization method.
  • The second embodiment is roughly the same as the first embodiment. The main difference is as follows: in the first embodiment, the data synchronization tool creates a command sending module to send the playback command to the log collection and playback processes.
  • In the second embodiment, in addition to performing steps 301 to 303 above, each log collection and playback process also needs to preempt the master control authority from the distributed lock resource. The process that obtains the master control authority receives the detection completion notification of each process, and the global transaction snapshot table identifier in the playback command is determined according to those detection completion notifications.
  • A flowchart of the data synchronization method in the second embodiment of the present application is shown in FIG. 5.
  • Step 501: At startup, preempt the master control authority from the distributed lock resource. Specifically, when a log collection and playback process starts, it tries to grab the master control authority from the distributed lock. If the master control authority is not already held, the process can seize it from the distributed lock resource, and the process that obtains it becomes the master control process.
  • If the master control authority is preempted successfully, the process performs the step of receiving the detection completion notifications sent by the processes corresponding to the other shards.
  • If preemption fails, the process sends a detection completion notification to the process that successfully preempted the master control authority, where the detection completion notification carries the identifier of the global transaction snapshot table for which detection has been completed.
  • The process with the master control authority notifies the processes corresponding to the other shards in the distributed database to use the sub-transaction logs detected, according to the same global transaction snapshot table, as belonging to committed global transactions as the target sub-transaction logs and to play them back to the downstream database.
  • In the case of successfully preempting the master control authority, the process receives the detection completion notifications sent by the processes corresponding to the other shards in the distributed database; each detection completion notification carries the identifier of the same global transaction snapshot table.
  • The following description takes as an example the case in which log collection and playback process 1 obtains the master control authority and log collection and playback process m does not.
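  • The preemption of the master control authority can be sketched as follows. The application does not specify the distributed lock service, so this sketch substitutes a hypothetical in-memory stand-in (`InMemoryLock`) for it; the class, its method names, and the process identifiers are assumptions made only for illustration.

```python
import threading
from typing import Dict, List, Optional


class InMemoryLock:
    """Hypothetical stand-in for the distributed lock resource (single-machine only)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._owner: Optional[str] = None

    def try_acquire(self, process_id: str) -> bool:
        """Take the master control authority if it is free; report whether we hold it."""
        with self._guard:
            if self._owner is None:
                self._owner = process_id
            return self._owner == process_id

    def release(self, process_id: str) -> None:
        """Give up the master control authority, but only if this process holds it."""
        with self._guard:
            if self._owner == process_id:
                self._owner = None

    def current_owner(self) -> Optional[str]:
        """Let a process that lost the race query the current master control process."""
        with self._guard:
            return self._owner


def elect_master(lock: InMemoryLock, process_ids: List[str]) -> Dict[str, str]:
    """Each process tries to take the lock at startup; the first one becomes the master."""
    return {pid: ("master" if lock.try_acquire(pid) else "follower") for pid in process_ids}


lock = InMemoryLock()
roles = elect_master(lock, ["process-1", "process-m"])

# If the master later fails and releases the authority, another process can take over.
lock.release("process-1")
took_over = lock.try_acquire("process-m")
```

  • A real deployment would replace `InMemoryLock` with an external lock service, typically with a lease or session timeout so that the authority is released even when the master process stops without calling `release`.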
  • Step 502: When the master control authority is not obtained, query the current master control process. Specifically, log collection and playback process m, which did not obtain the master control authority, queries which process holds it, so that it can exchange information with the master control process, that is, send detection completion notifications to the master control process and receive playback commands from it.
  • The playback command is configured to notify the processes corresponding to the other shards in the distributed database to use the sub-transaction logs detected, according to the same global transaction snapshot table, as belonging to committed global transactions as the target sub-transaction logs and to play them back to the downstream database.
  • Step 503: Parse the transaction log of the shard. In this step, the committed global transactions currently to be played back are obtained according to steps 301 to 302 in the first embodiment of the present application.
  • Step 504: Send a detection completion notification.
  • After log collection and playback process m has completed detection of committed global transactions according to the global transaction snapshot table whose GTID is 30, it sends a detection completion notification to log collection and playback process 1.
  • The detection completion notification includes the GTID 30.
  • Step 505: Send a playback command.
  • The log collection and playback process that holds the master control authority determines the global transaction snapshot table identifier of the playback command according to the global transaction snapshot table identifiers in the detection completion notifications sent by the processes.
  • For example, once the master control process has received notifications showing that all processes have completed detection of committed global transactions according to the global transaction snapshot table whose GTID is 30, the master control process, that is, log collection and playback process 1, sends a playback command that includes the GTID 30.
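  • Steps 504 and 505 can be sketched as a small coordinator that issues a playback command only after every process has reported the same snapshot identifier. The message classes, the `MasterCoordinator` name, and the rule of waiting for all registered processes (including the master itself) are assumptions for illustration; the application does not prescribe a message format.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class DetectionComplete:
    """Notification from a process: detection is finished for this snapshot (assumed layout)."""
    process_id: str
    snapshot_gtid: int


@dataclass(frozen=True)
class PlaybackCommand:
    """Command from the master: play back logs selected against this snapshot (assumed layout)."""
    snapshot_gtid: int


class MasterCoordinator:
    def __init__(self, process_ids: Set[str]) -> None:
        self._all = frozenset(process_ids)
        self._reported: Dict[int, Set[str]] = {}  # snapshot GTID -> processes that reported

    def on_notification(self, note: DetectionComplete) -> Optional[PlaybackCommand]:
        """Record one notification; return a command once all processes have reported."""
        reported = self._reported.setdefault(note.snapshot_gtid, set())
        reported.add(note.process_id)
        if reported >= self._all:
            del self._reported[note.snapshot_gtid]
            return PlaybackCommand(note.snapshot_gtid)
        return None


master = MasterCoordinator({"process-1", "process-m"})
first = master.on_notification(DetectionComplete("process-m", 30))   # still waiting
second = master.on_notification(DetectionComplete("process-1", 30))  # everyone reported
```

  • Holding the command back until the last notification arrives is what keeps a slow process from falling behind the others in playback progress.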
  • Step 506: Play back the target sub-transaction log to the downstream database.
  • The master control authority can also be released. Specifically, when the process holding the master control authority runs into a problem, the master control authority is released, and the other processes can again preempt it from the distributed lock resource. This avoids abnormal data synchronization between the upstream database and the downstream database caused by a process failure, and improves robustness.
  • The master control process is the coordinator of the log collection and playback processes and is responsible for receiving playback progress notifications from the other processes; when all processes have finished parsing up to a log snapshot point, it commands all processes to play back to the downstream database.
  • The master control process can coordinate the log collection and playback processes so that they play back a global transaction at the same time, which solves the problem of delays in some processes causing the playback progress of the processes to drift further and further apart, and keeps the data in a globally consistent state.
  • The master control process itself also performs log collection and playback work. If the master control node fails, another process can obtain the master control authority and continue the work, which also improves robustness.
  • The division of the above methods into steps is only for clarity of description. During implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, they all fall within the protection scope of the present application. Adding insignificant modifications to an algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm or process, also falls within the protection scope of the present application.
  • The third embodiment of the present application relates to a data synchronization apparatus, which includes the following modules.
  • A sub-transaction log acquisition module 601, configured to acquire the sub-transaction log to be played back according to the transaction log of the shard.
  • A committed global transaction detection module 602, configured to detect, according to the global transaction snapshot table updated at a preset period, whether the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction, where the global transaction snapshot table is configured to record the commit status of global transactions.
  • A playback module 603, configured to play back the sub-transaction log to be played back to the downstream database after determining that the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction.
  • The sub-transaction log acquisition module 601 is further configured to create processes in one-to-one correspondence with the shards. Each process is configured to perform the steps of obtaining the sub-transaction log to be played back according to the transaction log of the shard, detecting according to the global transaction snapshot table whether the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction, and, after determining that it is, playing back the sub-transaction log to be played back to the downstream database.
  • The playback module 603 is further configured to confirm that the processes corresponding to the shards in the distributed database have all completed detection based on the same global transaction snapshot table. Playing back the sub-transaction log to be played back to the downstream database includes: using the sub-transaction logs to be played back that were detected, according to the same global transaction snapshot table, as belonging to committed global transactions as the target sub-transaction logs; and playing back the target sub-transaction logs to the downstream database.
  • The playback module 603 is further configured to notify the processes corresponding to the other shards in the distributed database to use the sub-transaction logs to be played back that were detected, according to the same global transaction snapshot table, as belonging to committed global transactions as the target sub-transaction logs and to play them back to the downstream database.
  • The playback module 603 is further configured to receive the detection completion notifications sent by the processes corresponding to the other shards in the distributed database, where each detection completion notification carries the identifier of the same global transaction snapshot table.
  • The playback module 603 is further configured to preempt the master control authority. If the master control authority is preempted successfully, the module performs the step of receiving the detection completion notifications sent by the processes corresponding to the other shards. If preemption fails, a detection completion notification is sent to the process that successfully preempted the master control authority, and the detection completion notification carries the identifier of the global transaction snapshot table for which detection has been completed.
  • The committed global transaction detection module 602 is further configured to determine that, if the global transaction number of the global transaction to which the sub-transaction log to be played back belongs is in the active global transaction list, that global transaction is not a committed global transaction. The global transaction snapshot table includes the active global transaction list, which records the numbers of the active global transactions.
  • The committed global transaction detection module 602 is further configured to, if the global transaction number of the global transaction to which the sub-transaction log to be played back belongs is not in the active global transaction list, determine whether that global transaction number is greater than the current maximum committed global transaction number. If it is greater, the global transaction to which the sub-transaction log to be played back belongs is not a committed global transaction; otherwise, it is a committed global transaction. The global transaction snapshot table further includes the current maximum committed global transaction number.
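  • The two detection rules above can be written as a single predicate. This is a minimal sketch: the `GlobalSnapshot` class and its field names are assumptions chosen for illustration, and only the two rules themselves come from the description.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GlobalSnapshot:
    """One global transaction snapshot (hypothetical field names)."""
    snapshot_gtid: int              # identifier of this snapshot, e.g. 30
    active_numbers: FrozenSet[int]  # numbers of the active global transactions
    max_committed_number: int       # current maximum committed global transaction number


def is_committed(txn_number: int, snapshot: GlobalSnapshot) -> bool:
    """Return True if the global transaction counts as committed under this snapshot."""
    # Rule 1: a transaction listed as active is not committed.
    if txn_number in snapshot.active_numbers:
        return False
    # Rule 2: a number above the current maximum committed number is not committed.
    if txn_number > snapshot.max_committed_number:
        return False
    # Otherwise the transaction is committed.
    return True


snapshot = GlobalSnapshot(snapshot_gtid=30,
                          active_numbers=frozenset({28}),
                          max_committed_number=30)
results = {n: is_committed(n, snapshot) for n in (27, 28, 30, 31)}
```

  • With these illustrative values, transaction 28 is rejected by the first rule and transaction 31 by the second, while 27 and 30 pass both checks.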
  • This embodiment is a system example corresponding to the first embodiment, and it can be implemented in cooperation with the first embodiment.
  • The related technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here.
  • Likewise, the related technical details mentioned in this embodiment can also be applied to the first embodiment.
  • Each module involved in this embodiment is a logical module.
  • In practice, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • In addition, in order to highlight the innovative part of the present application, this embodiment does not introduce units that are not closely related to solving the technical problem raised by the present application, but this does not mean that no other units exist in this embodiment.
  • The fourth embodiment of the present application relates to an electronic device, as shown in FIG. 7, comprising at least one processor 701 and a memory 702 communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above data synchronization method.
  • The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and it connects the one or more processors and the various circuits of the memory.
  • The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein.
  • The bus interface provides the interface between the bus and the transceiver.
  • The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other devices over a transmission medium.
  • The data processed by the processor is transmitted over the wireless medium through the antenna; the antenna also receives data and passes it to the processor.
  • The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor in performing operations.
  • The fifth embodiment of the present application relates to a computer-readable storage medium storing a computer program.
  • The above method embodiments are implemented when the computer program is executed by a processor.
  • The aforementioned storage medium includes a USB flash drive (U disk), a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • In this embodiment, the sub-transaction log to be played back is obtained according to the transaction log of the shard; whether the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction is detected according to the global transaction snapshot table updated at a preset period, where the global transaction snapshot table is configured to record the commit status of global transactions; and after it is determined that the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction, the sub-transaction log to be played back is played back to the downstream database. This avoids the problem of synchronizing uncommitted data to the downstream database.
  • In addition, this embodiment uses the shard as the granularity: each shard in the distributed database determines, according to the global transaction snapshot table, the sub-transaction logs that can be played back.
  • This facilitates concurrency between shards and improves the efficiency of data synchronization.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data synchronization method and apparatus, an electronic device, and a storage medium. The data synchronization method of the present application comprises: acquiring, according to the transaction log of a shard, a sub-transaction log to be played back (301); detecting, according to a global transaction snapshot table updated at a preset period, whether the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction (302), the global transaction snapshot table being configured to record the commit status of global transactions; and, after it is determined that the global transaction to which the sub-transaction log to be played back belongs is a committed global transaction, playing back the sub-transaction log to be played back to a downstream database (303).
PCT/CN2021/128408 2020-12-24 2021-11-03 Procédé et appareil de synchronisation de données, et dispositif électronique et support de stockage WO2022134876A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011549237.7A CN114661816B (zh) 2020-12-24 2020-12-24 数据同步方法、装置、电子设备、存储介质
CN202011549237.7 2020-12-24

Publications (1)

Publication Number Publication Date
WO2022134876A1 true WO2022134876A1 (fr) 2022-06-30

Family

ID=82024881

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/128408 WO2022134876A1 (fr) 2020-12-24 2021-11-03 Procédé et appareil de synchronisation de données, et dispositif électronique et support de stockage

Country Status (2)

Country Link
CN (1) CN114661816B (fr)
WO (1) WO2022134876A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131071A (zh) * 2023-10-26 2023-11-28 中国证券登记结算有限责任公司 一种数据处理方法、装置、电子设备及计算机可读介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11669518B1 (en) * 2021-12-14 2023-06-06 Huawei Technologies Co., Ltd. Method and system for processing database transactions in a distributed online transaction processing (OLTP) database
CN115185787B (zh) * 2022-09-06 2022-12-30 北京奥星贝斯科技有限公司 处理事务日志的方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009039118A2 (fr) * 2007-09-18 2009-03-26 Microsoft Corporation Transactions imbriquées en parallèle dans une mémoire transactionnelle
US20090217274A1 (en) * 2008-02-26 2009-08-27 Goldengate Software, Inc. Apparatus and method for log based replication of distributed transactions using globally acknowledged commits
CN102073540A (zh) * 2010-12-15 2011-05-25 北京新媒传信科技有限公司 分布式事务提交方法和装置
CN103164219A (zh) * 2013-01-08 2013-06-19 华中科技大学 去中心化架构中使用多类型副本的分布式事务处理系统
CN107797850A (zh) * 2016-08-30 2018-03-13 阿里巴巴集团控股有限公司 分布式事务处理的方法、装置与系统
CN109857802A (zh) * 2018-12-12 2019-06-07 深圳前海微众银行股份有限公司 日志数据同步方法、装置、设备及计算机可读存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107045454B (zh) * 2016-02-06 2020-06-26 华为技术有限公司 跨进程分布式事务控制方法及相关系统
US10810268B2 (en) * 2017-12-06 2020-10-20 Futurewei Technologies, Inc. High-throughput distributed transaction management for globally consistent sharded OLTP system and method of implementing
US10942823B2 (en) * 2018-01-29 2021-03-09 Guy Pardon Transaction processing system, recovery subsystem and method for operating a recovery subsystem
CN109783578B (zh) * 2019-01-09 2022-10-21 腾讯科技(深圳)有限公司 数据读取方法、装置、电子设备以及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131071A (zh) * 2023-10-26 2023-11-28 中国证券登记结算有限责任公司 一种数据处理方法、装置、电子设备及计算机可读介质
CN117131071B (zh) * 2023-10-26 2024-01-26 中国证券登记结算有限责任公司 一种数据处理方法、装置、电子设备及计算机可读介质

Also Published As

Publication number Publication date
CN114661816A (zh) 2022-06-24
CN114661816B (zh) 2023-03-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21908877

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 161123)