CN111858626A - Data synchronization method and device based on parallel execution

Info

Publication number
CN111858626A
Authority
CN
China
Prior art keywords
transaction
execution
thread
current
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010499491.4A
Other languages
Chinese (zh)
Inventor
Sun Feng (孙峰)
Fu Quan (付铨)
Peng Qingsong (彭青松)
Liu Qichun (刘启春)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Dameng Database Co Ltd
Original Assignee
Wuhan Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Dameng Database Co Ltd filed Critical Wuhan Dameng Database Co Ltd
Priority to CN202010499491.4A
Publication of CN111858626A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention relates to the field of databases, and in particular to a method and a device for data synchronization based on parallel execution. The method mainly comprises the following steps: acquiring the transactions to be synchronized; creating at least two execution threads; allocating the transactions one by one, in commit order, to different idle execution threads; acquiring each operation in the current transaction of each execution thread, and constructing a row lock for each operation according to its unique identifier; judging whether the current transaction in each execution thread has a row lock conflict with a transaction that commits earlier, and if so, putting the current transaction into the wake-up queue of the execution thread where the conflicting transaction is located to wait; executing in parallel the current transactions that have no row lock conflict; and after each execution thread finishes executing its current transaction, waking up the transactions in its wake-up queue. The invention improves synchronization efficiency, reduces control complexity, and achieves efficient and simple parallel synchronization of heterogeneous databases.

Description

Data synchronization method and device based on parallel execution
[ technical field ]
The invention relates to the field of databases, in particular to a method and a device for data synchronization based on parallel execution.
[ background of the invention ]
The traditional database primary/standby mechanism achieves real-time replication of database data and is an important solution for disaster recovery backup and for ensuring data safety. However, the primary/standby mechanism requires the standby database system to be the same as the primary, so in a heterogeneous database environment effective real-time data replication cannot be achieved with the database's own primary/standby mechanism. To achieve real-time data replication between heterogeneous databases, a software-based heterogeneous database synchronization method is generally used at present: incremental data of the source database is captured at the source end and sent to the target end, where it is applied to the target database through a general database access interface to complete data replication.
During real-time database synchronization, operations must be applied in the order of the transactions recorded in the database log file; otherwise the data consistency between the source database and the target database is broken. If the data synchronization software at the target end applies transactions strictly in the order recorded in the source database log, the consistency of data replication is effectively guaranteed, but the parallelism of data synchronization is severely limited and synchronization efficiency is very low.
In view of this, how to overcome the defects in the prior art and resolve the conflict between data consistency and parallel execution during heterogeneous database synchronization is a problem to be solved in the technical field.
[ summary of the invention ]
Aiming at the above defects or improvement requirements of the prior art, the present invention provides a parallel synchronization method that resolves the parallelism conflicts otherwise introduced to guarantee data consistency during database synchronization, and achieves efficient and simple parallel synchronization of heterogeneous databases while ensuring that the synchronized data is correct.
The embodiment of the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for data synchronization based on parallel execution, specifically: acquiring the transactions to be synchronized; creating at least two execution threads, wherein each execution thread comprises a wake-up queue; allocating the transactions one by one, in commit order, to different idle execution threads, wherein the transaction allocated to each thread is the current transaction of that execution thread; acquiring each operation in the current transaction of each execution thread, and constructing a row lock for each operation according to its unique identifier; judging whether the current transaction in each execution thread has a row lock conflict with a transaction that commits earlier, and if so, putting the current transaction into the wake-up queue of the execution thread where the conflicting transaction is located to wait; executing in parallel the current transactions that have no row lock conflict; and after each execution thread finishes executing its current transaction, waking up the transactions in its wake-up queue.
Preferably, the constructing of the row lock for each operation according to the unique identifier of each operation specifically includes: creating a row lock hash table in each execution thread, taking the unique identifier of each operation as the key of the row lock hash table, and taking each operation as a record of the row lock hash table.
Preferably, the judging whether the current transaction in each execution thread has a row lock conflict with a transaction that commits earlier specifically comprises: acquiring the row lock hash table of a first execution thread as a first row lock hash table; acquiring the row lock hash table of a second execution thread as a second row lock hash table, wherein the current transaction of the second execution thread commits before that of the first thread; judging whether the same key exists in both the first row lock hash table and the second row lock hash table; if the same key exists, a row lock conflict exists between the current transaction of the first thread and the current transaction of the second thread; if no same key exists, no row lock conflict exists between the current transaction of the first thread and the current transaction of the second thread; and comparing the threads pairwise in this way to judge whether a row lock conflict exists between every two threads.
Preferably, the putting the current transaction into the wake-up queue corresponding to the transaction having the row lock conflict to wait specifically comprises: if the current transaction has row lock conflicts with a plurality of transactions, searching, among all the conflicting transactions, for the one whose commit order is before and closest to the current transaction, and putting the current transaction into the wake-up queue of the execution thread where the found transaction is located.
Preferably, the method further comprises: and if the current transaction execution of the execution thread is finished, releasing all row locks in the row lock hash table of the execution thread.
Preferably, the method further comprises an execution transaction list; after the transactions to be synchronized are acquired, they are put into the execution transaction list in commit order, so that the transactions to be synchronized can be searched and allocated conveniently.
Preferably, the judging whether the current transaction in each execution thread has a row lock conflict with a transaction that commits earlier specifically comprises: finding the position of the current transaction in the execution transaction list, obtaining the preceding transactions one by one in reverse order starting from that position, judging whether a row lock conflict exists between the current transaction and each of them, and stopping the judgment when the first conflicting transaction is found.
Preferably, the method further comprises a transaction receiving thread; the transaction receiving thread receives the acquired transactions to be synchronized, stores them into the execution transaction list, and, whenever an execution thread is idle, allocates the transactions in the execution transaction list to the execution threads in order.
Preferably, after the transaction receiving thread receives the acquired transactions to be synchronized, it classifies them according to their transaction IDs and stores only the transactions containing a commit operation into the execution transaction list.
In another aspect, the present invention provides a device for data synchronization based on parallel execution, which comprises at least one processor and a memory connected via a data bus; the memory stores instructions executable by the at least one processor, and the instructions, when executed by the processor, perform the method for data synchronization based on parallel execution according to any of claims 1-9.
Compared with the prior art, the embodiments of the invention have the following beneficial effects: synchronization efficiency is improved through multithreaded parallel synchronization, data errors are avoided by adding mutual exclusion locks to the operations that could otherwise corrupt the synchronized data, and the control of the transaction execution order is simplified through the wake-up queues, thereby achieving efficient and simple parallel synchronization of heterogeneous databases.
Furthermore, in the provided scheme the row lock hash table releases its resources as a whole after the transaction finishes executing, and the operations to be synchronized are classified by the receiving thread, which further improves the synchronization efficiency of the heterogeneous databases.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a flowchart of a method for data synchronization based on parallel execution according to an embodiment of the present invention;
FIG. 2 is a timing diagram of another method for data synchronization based on parallel execution according to an embodiment of the present invention;
FIG. 3 is a flowchart of another method for data synchronization based on parallel execution according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for data synchronization based on parallel execution according to an embodiment of the present invention;
FIG. 5 is a flowchart of another method for data synchronization based on parallel execution according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an apparatus for data synchronization based on parallel execution according to an embodiment of the present invention;
wherein the reference numbers are as follows:
21: a processor; 22: a memory.
[ detailed description of the embodiments ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The present invention describes the system architecture of a system with specific functions, so the embodiments mainly explain the functional and logical relationships of the structural modules; the specific software and hardware implementations are not limited.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other. The invention will be described in detail below with reference to the figures and examples.
Example 1:
in a database system, the operations of all transactions in the source database are recorded in the database log; if these operations are acquired and replayed in the destination database, the data in the destination database becomes consistent with the data in the source database. When the number of transactions to be synchronized is large, single-threaded sequential execution causes long synchronization delays and poor real-time performance; however, when transactions are executed in parallel, the execution order of operations in the destination database may differ from that in the source database, causing synchronization errors. Therefore, the embodiment of the invention provides a data synchronization method that can execute transactions in parallel while ensuring a consistent transaction execution order.
As shown in fig. 1, the method for data synchronization based on parallel execution provided by the embodiment of the present invention includes the following specific steps:
step 101: and acquiring the transaction to be synchronized.
When data synchronization is performed, the data in the source database needs to be synchronized to the destination database, so the source data is acquired first. In the database field, because the amount of data to be synchronized is large, an incremental synchronization mode may be adopted to reduce the total amount: on the basis of the previous synchronization, unchanged data is not synchronized again and only changed data is synchronized. The source database generates data changes by executing transactions; each transaction includes one or more database operations, and each operation generates a data change. The operations include reading data, writing data, updating data, deleting data, and so on, and in a specific implementation scenario one operation may correspond to one SQL statement. The transactions executed in the source database are numbered according to their commit order and recorded in the database log; by replaying the same transactions in the target database in the commit order recorded in the source database log, the data in the target database undergoes the same changes, achieving data synchronization. Outside the database field, "transaction" and "operation" denote the corresponding actions that generate data changes, such as creating, changing or deleting a file or a data block in a document database, a file system, a distributed system, and the like.
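For illustration only, the incremental data captured in step 101 can be modeled as transactions holding ordered operations. The following sketch is not part of the patent; all class and field names are assumptions chosen for this illustration.

# Illustrative sketch of captured log data; field names are assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Operation:
    op_type: str   # "INSERT", "UPDATE", "DELETE" or "COMMIT"
    table: str     # table touched by the operation ("" for COMMIT)
    rowid: int     # row identifier recorded with the log entry (-1 for COMMIT)
    sql: str       # equivalent SQL to replay at the destination

@dataclass
class Transaction:
    trx_id: int                  # transaction ID from the source log
    commit_lsn: int              # log sequence number; encodes the commit order
    operations: List[Operation] = field(default_factory=list)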
Step 102: at least two execution threads are created, each execution thread including a wake-up queue.
To improve the execution efficiency at the target end, the target end executes transactions in parallel during data synchronization, so several execution threads that can run in parallel need to be created, and the transactions to be executed are allocated to different execution threads. To ensure that the execution order is correct and no conflict occurs, if two transactions contain operations on the same data, the transaction that commits later must wait until the transaction that commits earlier has finished executing. Therefore, each execution thread includes a wake-up queue that stores transactions which commit after the thread's current transaction and which contain operations conflicting with it; such a transaction is put into the wake-up queue of the current transaction, waits for the current transaction to finish executing, and is then woken up once the conflict no longer exists.
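A minimal sketch of the per-thread state described above, assuming Python's standard library; the class and attribute names are illustrative, not prescribed by the patent.

# Sketch of one execution thread with its own wake-up queue (illustrative only).
import threading
from collections import deque

class ExecutionThread:
    """Illustrative per-thread state; attribute names are assumptions."""
    def __init__(self, name):
        self.name = name
        self.current = None            # current transaction; None means idle
        self.wakeup_queue = deque()    # later, conflicting transactions wait here
        self.done = threading.Event()  # signalled when the current transaction ends

    def assign(self, transaction):
        self.current = transaction
        self.done.clear()

    def finish(self):
        """Mark the current transaction as applied and hand back the waiters."""
        woken = list(self.wakeup_queue)
        self.wakeup_queue.clear()
        self.current = None
        self.done.set()
        return woken                   # the dispatcher re-schedules these transactions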
Step 103: and distributing the transactions to different idle execution threads one by one according to the submission sequence, wherein the transaction distributed to each thread is the current transaction of each execution thread.
Each thread of execution can only execute one transaction at a time, and therefore each thread of execution can only be assigned one transaction at a time, which is referred to as the current transaction of the thread of execution. The state of the execution thread when the current transaction is not allocated is idle, and the idle execution thread can accept the next transaction allocated according to the transaction submission sequence. If multiple idle threads of execution exist at the same time, a transaction is allocated to each idle thread of execution.
Step 104: each operation in the current transaction of each execution thread is acquired, and a row lock is constructed for each operation according to the unique identifier of each operation.
Conflicting operations are, for example, two updates of the same row of data in the database. To determine whether conflicting operations exist between transactions, a row lock can be constructed for each operation according to the unique identifier of the data it touches. Specifically, the value of the unique identifier can be used as the value of the row lock; if the same row lock appears in two transactions, a row lock conflict exists between them. In a database usage scenario, the ROWID can be used as the unique identifier of an operation. Although different databases implement the ROWID differently, some using physical addresses, such as ORACLE, and some using logical integers, such as SQL SERVER and DM7, they all follow the principle that the ROWID of each data row in a single table is unique. Meanwhile, each operation recorded in the database log carries the corresponding ROWID information marking the data row it affects, so when the transactions to be synchronized are obtained from the database log, the ROWID of each operation can be obtained at the same time as the unique identifier. The running mechanism of the database guarantees that data with the same ROWID is never modified by multiple transactions in parallel, so when data synchronization applies the operations carrying ROWID information to the target database, conflicting transactions can be serialized by mutual exclusion on the same ROWID while conflict-free transactions are executed in parallel; this effectively increases the parallelism of applying transactions and, where no conflict occurs, partially relaxes the transaction commit order, thereby improving synchronization performance. Outside the database field, an appropriate unique identifier can be chosen for the actual scenario, such as a file ID; a combination of several characteristic values can also be used as the unique identifier of the data, for example the table name plus the ROWID when multiple tables are synchronized at the same time.
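The following sketch illustrates one possible way to build the row locks of step 104, assuming the unique identifier is the table name combined with the ROWID; the function name and record layout are assumptions made for the illustration.

# Sketch: derive a row lock for every data-changing operation in a transaction.
def build_row_locks(operations):
    """operations: dicts like {"type": "UPDATE", "table": "T", "rowid": 1}."""
    row_locks = {}
    for op in operations:
        if op["type"] == "COMMIT":
            continue                          # COMMIT does not touch a data row
        key = (op["table"], op["rowid"])      # table name + ROWID as the lock key
        row_locks[key] = op                   # the operation is the lock's record
    return row_locks

# Example matching the scenario described later: TRX1 inserts rows 1 and 2 of table T.
trx1_locks = build_row_locks([
    {"type": "INSERT", "table": "T", "rowid": 1},
    {"type": "INSERT", "table": "T", "rowid": 2},
    {"type": "COMMIT", "table": "", "rowid": -1},
])
assert set(trx1_locks) == {("T", 1), ("T", 2)}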
Step 105: judging whether the current transaction in each execution thread has a row lock conflict with a transaction that commits earlier, and if so, putting the current transaction into the wake-up queue of the execution thread where the conflicting transaction is located to wait.
To avoid conflicts during execution, whether the current transactions of the execution threads conflict can be judged by their row locks: if the current transactions of two threads contain the same row lock, a row lock conflict exists between them, and the transaction that commits later may start executing only after the transaction that commits earlier has finished. Therefore, when a row lock conflict exists between two transactions, the later-committing transaction is put into the wake-up queue of the execution thread holding the earlier-committing transaction, and the execution thread holding the later-committing transaction is suspended, so that the later transaction is woken up once the earlier one finishes executing. By using the wake-up queue and triggering the next transaction through a wake-up message, the execution order of mutually conflicting transactions can be managed conveniently without additional execution-state monitoring and scheduling.
Step 106: the execution threads execute, in parallel, the current transactions that have no row lock conflict.
Transactions without row lock conflicts cannot cause synchronization conflicts when executed in parallel, so they are executed in parallel, making full use of system resources, reducing the total time for executing the transactions and improving data synchronization efficiency. Because the transactions are allocated to the execution threads in commit order in step 103, and the row locks and wake-up queues further constrain the execution order in steps 104 to 105, parallel execution does not make the effective execution order inconsistent with the commit order, and the data is correct after the transactions have been executed.
Step 107: after each execution thread finishes executing its current transaction, waking up the transactions in the wake-up queue of the execution thread.
After each execution thread finishes executing its current transaction, the transactions in its wake-up queue are woken up and start executing. By adding the wake-up queue and the wake-up mechanism, the sequential scheduling of transactions is completed conveniently without extra scheduling logic, which reduces the scheduling difficulty and the resources consumed by scheduling while improving scheduling efficiency and stability. If several waiting transactions exist in the wake-up queue of an execution thread, all of them are woken up; the execution threads holding the woken transactions, that is, the threads suspended in step 105, are activated, and the woken transactions are executed according to step 106. The steps of adding to the wake-up queue and waking up are executed cyclically until all transactions to be synchronized have been executed. On the other hand, if a transaction exists in the wake-up queues of several threads, it can be activated and executed only after all of those wake-up queues have woken it up.
Furthermore, after an execution thread finishes executing its current transaction, it is assigned the next unallocated transaction according to the transaction commit order, and steps 104 to 106 are repeated for that transaction until all transactions to be synchronized have been executed.
The timing diagram shown in fig. 2 illustrates steps 101-107, taking an implementation scenario as an example.
A table T (ID INT PRIMARY KEY, C1 INT) exists in both the source and target databases.
The source application performs the following operations on the table T:
operation 1: INSERT INTO T (ID, C1) VALUES (1, 1);
operation 2: INSERT INTO T (ID, C1) VALUES (2, 2);
COMMIT;
operation 3: UPDATE T SET C1 = 10 WHERE ID = 1;
COMMIT;
operation 4: UPDATE T SET C1 = 20 WHERE ID = 2;
COMMIT;
the above-mentioned serially executed operations include three COMMIT operations, which will generate three transactions in the log of the source database, and the transactions are TRX1, TRX2, and TRX3, respectively, according to the COMMIT order of the transactions. The TRX1 comprises an operation 1, an operation 2 and a COMMIT operation, and two rows of data with an ID of 1 and an ID of 2 are inserted into a T table; TRX2 includes operation 3 and COMMIT operation, which updates the row with ID 1 in the T table; TRX3 includes operation 4 and a COMMIT operation, updating the row with an ID of 2 in the T table. All three transactions cause data changes in the source database, so TRX1, TRX2, and TRX3 are obtained as transactions to be synchronized according to step 101.
Three threads of execution are created in the destination database, EXEC1, EXEC2, and EXEC3, respectively, per step 102. The number of the execution threads is specifically determined by the number of resources of the destination, and creating three execution threads in this embodiment does not mean limiting the specific number of threads.
According to step 103, the current execution threads EXEC1, EXEC2 and EXEC3 are all in idle state, and the transactions TRX1, TRX2 and TRX3 are allocated to the three execution threads one by one according to the transaction commit order. After allocation, TRX1 is the current transaction of EXEC1, TRX2 is the current transaction of EXEC2, and TRX3 is the current transaction of EXEC 3.
As the operations in the transaction are performed row by row except for the COMMIT operation, a row number ID, i.e., ROWID, is selected as the unique identifier for the operation, per step 104. Constructing a row lock for each operation according to the unique identifier ROWID of the operation: the row lock ID of operation 1 is 1, the row lock ID of operation 2 is 2, the row lock ID of operation 3 is 1, and the row lock ID of operation 4 is 2.
Per step 105, it is determined whether each transaction has a row lock conflict with the transactions that commit before it. TRX1 commits first, so no transaction conflicts with its row locks; the row lock ID 1 of operation 3 in TRX2 is the same as the row lock ID 1 of operation 1 in TRX1, so TRX2 has a row lock conflict with TRX1 and is placed in the wake-up queue of the thread EXEC1 where TRX1 is located; the row lock ID 2 of operation 4 in TRX3 is the same as the row lock ID 2 of operation 2 in TRX1, so TRX3 has a row lock conflict with TRX1 and is placed in the wake-up queue of EXEC1; the row lock ID 2 of operation 4 in TRX3 differs from the row lock ID 1 of operation 3 in TRX2, so there is no row lock conflict between TRX3 and TRX2.
Per step 106, the transaction TRX1 in EXEC1 commits first, so it can start executing. Because the transactions TRX2 in EXEC2 and TRX3 in EXEC3 commit after TRX1 and have row lock conflicts with it, they are placed in the wake-up queue of EXEC1 and must wait until TRX1 finishes executing and wakes them up before they start executing.
According to step 107, after TRX1 finishes executing, EXEC1 accepts the next allocated transaction and wakes up TRX2 and TRX3 in its wake-up queue. Because there is no row lock conflict between TRX2 and TRX3, TRX2 and TRX3 may execute in parallel.
In the above example, thanks to multithreaded concurrent execution, the total time of executing TRX1 + TRX2 + TRX3 serially in the original order is reduced to the execution time of TRX1 + TRX2 or TRX1 + TRX3, so the total execution time drops and synchronization efficiency improves. Meanwhile, the row locks serialize the conflict between operation 3 in TRX2 and operation 1 in TRX1 and the conflict between operation 4 in TRX3 and operation 2 in TRX1, ensuring that the synchronized data is correct.
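The park-and-wake protocol of steps 105 to 107 can be sketched for this example as a single-threaded simulation of the scheduling decisions only; the function name and data layout below are assumptions made for the illustration, not part of the patent.

# Simplified simulation of the park/wake scheduling decisions (steps 105-107).
def run_schedule(transactions, locks_of):
    """transactions: list in commit order; locks_of: trx -> set of row-lock keys."""
    assigned = []                         # earlier transactions already dispatched
    parked = {}                           # blocker transaction -> list of waiters
    runnable, executed = [], []
    for trx in transactions:              # step 103: dispatch in commit order
        blocker = None
        for earlier in assigned:          # step 105: check earlier transactions
            if locks_of[trx] & locks_of[earlier]:
                blocker = earlier         # keep the conflicting one closest to trx
        assigned.append(trx)
        if blocker is None:
            runnable.append(trx)          # step 106: no conflict, run in parallel
        else:
            parked.setdefault(blocker, []).append(trx)   # wait in blocker's queue
    while runnable:                       # step 107: finish and wake waiters
        done = runnable.pop(0)
        executed.append(done)
        runnable.extend(parked.pop(done, []))
    return executed

For the scenario above, run_schedule(["TRX1", "TRX2", "TRX3"], {"TRX1": {1, 2}, "TRX2": {1}, "TRX3": {2}}) returns ["TRX1", "TRX2", "TRX3"], with TRX2 and TRX3 both becoming runnable, and therefore able to run in parallel, only after TRX1 finishes.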
To facilitate the comparison of row locks, the row locks can be stored in the form of a hash table. In step 104, constructing a row lock for each operation according to its unique identifier can be implemented as follows: a row lock hash table is created in each execution thread, and each operation of the thread's current transaction is put into the row lock hash table with the operation's unique identifier as the key. In the implementation scenario of the above example, the key of the row lock hash table is the unique identifier ID of the operation. Two (key, value) pairs are stored in the row lock hash table of EXEC1, namely (ID = 1, value = operation 1) and (ID = 2, value = operation 2); one pair (ID = 1, value = operation 3) is stored in the row lock hash table of EXEC2; and one pair (ID = 2, value = operation 4) is stored in the row lock hash table of EXEC3. By comparing the keys of the row lock hash tables in different threads, whether a row lock conflict exists can be judged quickly and simply using the properties of the hash table.
Further, as shown in fig. 3, when the execution threads store row locks in row lock hash tables, the following steps can be used in step 105 to determine whether the current transaction in each execution thread has a row lock conflict with the transactions that commit before it:
step 201: and acquiring a row lock hash table of the first execution thread as a first row lock hash table.
Step 202: acquiring the row lock hash table of a second execution thread as the second row lock hash table, wherein the current transaction of the second execution thread commits before that of the first thread.
In the above example, three pairs of row lock hash tables need to be compared according to the transaction commit order: the row lock hash table of EXEC2 as the first row lock hash table against the row lock hash table of EXEC1 as the second; the row lock hash table of EXEC3 as the first against the row lock hash table of EXEC1 as the second; and the row lock hash table of EXEC3 as the first against the row lock hash table of EXEC2 as the second.
Step 203: and judging whether the same key exists in the first row lock hash table and the second row lock hash table.
Step 204: if so, a row lock conflict exists between the current transaction of the first thread and the current transaction of the second thread.
Step 205: if not, no row lock conflict exists between the current transaction of the first thread and the current transaction of the second thread.
In the scenario above, the row lock hash tables of EXEC1 and EXEC2 share the key ID = 1, so a row lock conflict exists; the row lock hash tables of EXEC1 and EXEC3 share the key ID = 2, so a row lock conflict exists; the row lock hash tables of EXEC2 and EXEC3 share no key, so no row lock conflict exists.
Step 206: comparing the threads pairwise as the first and second thread, and judging whether a row lock conflict exists between every two threads.
When performing the row lock conflict determination, the row lock hash tables of all execution threads need to be compared pairwise, one by one, in the manner described in steps 202 to 205, to ensure that all row lock conflicts are found.
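A sketch of the comparison in steps 201 to 206, using Python dictionaries as the per-thread row lock hash tables; the layout and variable names are assumptions of this illustration.

# Sketch: detect a row lock conflict by comparing hash-table keys (steps 201-206).
def has_row_lock_conflict(first_table, second_table):
    """first_table: hash table of the later transaction; second_table: earlier one."""
    return bool(first_table.keys() & second_table.keys())   # shared key => conflict

# Per-thread row lock hash tables from the example scenario.
exec1_locks = {1: "operation 1", 2: "operation 2"}   # TRX1 in EXEC1
exec2_locks = {1: "operation 3"}                     # TRX2 in EXEC2
exec3_locks = {2: "operation 4"}                     # TRX3 in EXEC3

print(has_row_lock_conflict(exec2_locks, exec1_locks))   # True: key 1 shared
print(has_row_lock_conflict(exec3_locks, exec1_locks))   # True: key 2 shared
print(has_row_lock_conflict(exec3_locks, exec2_locks))   # False: no shared key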
During synchronization, execution must follow the transaction commit order; if the current transaction has row lock conflicts with several transactions, it can only start executing after the conflicting transaction with the latest commit order has finished. Therefore, it is only necessary to find, among all the conflicting transactions, the one whose commit order is before and closest to the current transaction, and to put the current transaction into the wake-up queue of the execution thread where that transaction is located. In the above example, as shown in the timing diagram of fig. 4, if TRX2 additionally contains operation 5: UPDATE T SET C1 = 15 WHERE ID = 2; then TRX3 has row lock conflicts with both TRX1 and TRX2. According to the transaction commit order, and to keep the data correct, TRX3 must wait until TRX1 and TRX2 have finished executing, and TRX2 must wait until TRX1 has finished executing. TRX2 is then the transaction whose commit order is before TRX3 and closest to it among all the transactions conflicting with TRX3, so TRX3 only needs to be placed in the wake-up queue of the execution thread EXEC2 where TRX2 is located and woken up after TRX2 finishes executing; the transaction execution order is then correct and the synchronized data is guaranteed to be correct. Putting a waiting transaction only into the wake-up queue of the conflicting transaction whose commit order is closest to it avoids multiple unnecessary wake-ups when several row lock conflicts exist, which saves resources and reduces the complexity of thread scheduling.
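A possible sketch of this selection rule, assuming the commit order is represented by an LSN-like integer; the helper name is hypothetical.

# Sketch: among all conflicting earlier transactions, wait only on the closest one.
def choose_wait_target(current_lsn, conflicting_lsns):
    """Return the commit LSN of the transaction whose wake-up queue to join,
    or None if nothing earlier conflicts with the current transaction."""
    earlier = [lsn for lsn in conflicting_lsns if lsn < current_lsn]
    return max(earlier) if earlier else None   # closest preceding commit wins

# With operation 5 present, TRX3 (LSN 3) conflicts with TRX1 (LSN 1) and TRX2 (LSN 2):
assert choose_wait_target(3, [1, 2]) == 2      # TRX3 waits only in EXEC2's queue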
When an execution thread finishes its current transaction, the data rows protected by its row locks may be modified by later transactions, so all row locks in the thread's row lock hash table need to be released at this point; that is, all (key, value) pairs stored in the row lock hash table are cleared, and a new row lock hash table is rebuilt after the next current transaction is allocated to the execution thread.
To facilitate the management of the transactions to be synchronized, they can be organized with an execution transaction list. After the transactions to be synchronized are obtained, they are put into the execution transaction list in commit order, which makes it convenient to search for and allocate them by commit order. In the above example, the execution transaction list is: TRX1 -> TRX2 -> TRX3. In step 103, allocating the transactions one by one, in commit order, to different idle execution threads means taking the transactions out of the head of the list in the stored order and allocating them to idle execution threads in turn. Specifically, in a database usage scenario, the database transactions in the log are managed by log sequence number (LSN), and LSNs are assigned in increasing order according to the transaction commit order.
Furthermore, after the transactions to be synchronized are organized with the execution transaction list, the list also makes it convenient to search for transactions with row lock conflicts. When judging whether the current transaction in each execution thread has a row lock conflict with a transaction that commits before it, the position of the current transaction in the execution transaction list is located; starting from that position, the preceding transactions are obtained one by one in reverse order and checked for a row lock conflict, and the check stops as soon as the first conflicting transaction is found. In the above example with operation 5, when searching for a transaction conflicting with TRX3, the transactions are obtained in reverse order starting from TRX3; the first one obtained is TRX2, which has a row lock conflict with TRX3, so the check can stop. Searching for row lock conflicts by scanning the execution transaction list in reverse order finds the last conflicting transaction conveniently and quickly, reducing the search and comparison cost.
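The reverse-order search can be sketched as follows, assuming the execution transaction list is an ordinary Python list and the row locks are sets of keys; all names are illustrative.

# Sketch: reverse scan of the execution transaction list, stopping at the first hit.
def find_conflict_reverse(exec_list, locks_of, current):
    """exec_list: transactions in commit order; locks_of: trx -> set of lock keys."""
    pos = exec_list.index(current)
    for candidate in reversed(exec_list[:pos]):       # walk backwards from current
        if locks_of[current] & locks_of[candidate]:
            return candidate                          # first hit is the closest conflict
    return None                                       # no row lock conflict at all

# With operation 5 in TRX2, the reverse scan from TRX3 stops at TRX2.
exec_list = ["TRX1", "TRX2", "TRX3"]
locks_of = {"TRX1": {1, 2}, "TRX2": {1, 2}, "TRX3": {2}}
assert find_conflict_reverse(exec_list, locks_of, "TRX3") == "TRX2"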
To separate the receiving and allocation of transactions to be synchronized from their execution, and to manage the execution transaction list conveniently, a separate transaction receiving thread can be created. The transaction receiving thread receives the acquired transactions to be synchronized, stores them into the execution transaction list, and, whenever an execution thread is idle, allocates the transactions in the execution transaction list to the execution threads in order. Managing the receiving and allocation of transactions through the transaction receiving thread allows the receiving and allocation process and the execution process to proceed in parallel, which further improves the efficiency of the data synchronization process.
During data synchronization, only the transactions containing a COMMIT operation, that is, the transactions that actually commit, take effect on the data, so only these transactions need to be executed. Therefore, after the transaction receiving thread receives the acquired transactions to be synchronized, it classifies them according to their transaction IDs and stores only the committed transactions into the execution transaction list. In the above example, TRX1, TRX2 and TRX3 all contain COMMIT operations, all of them produce data changes, and therefore all of them need to be executed. Specifically, in a database usage scenario, the transaction receiving thread can group the received operations by their transaction ID and use the operation type to determine whether a transaction contains a COMMIT operation. Classifying the transactions before storing them into the execution transaction list pre-screens the transactions to be synchronized, removes the transactions that do not need to be processed, reduces the number of transactions to handle, avoids processing transactions that need not be executed, and further improves the execution efficiency of data synchronization.
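As an illustration of this pre-screening, assuming each log record carries a transaction ID and an operation type, the receiving thread could group operations by transaction ID and keep only the transactions that actually commit; the record layout and function name are assumptions.

# Sketch: group log operations by transaction ID and keep only committed transactions.
from collections import OrderedDict

def collect_committed(log_records):
    """log_records: dicts like {"trx_id": 1, "type": "INSERT"} read in log order."""
    by_trx = OrderedDict()                       # preserves first-seen order
    for rec in log_records:
        by_trx.setdefault(rec["trx_id"], []).append(rec)
    # Only transactions containing a COMMIT change data and need to be replayed.
    return [ops for ops in by_trx.values()
            if any(rec["type"] == "COMMIT" for rec in ops)]

# A transaction that never commits (trx_id 9) is dropped before it reaches the list.
records = [{"trx_id": 1, "type": "INSERT"}, {"trx_id": 9, "type": "UPDATE"},
           {"trx_id": 1, "type": "COMMIT"}]
assert len(collect_committed(records)) == 1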
With the steps provided in this embodiment, multiple execution threads are created and used together with row locks and wake-up queues, so that multiple transactions to be synchronized can be executed in parallel while keeping the data correct, improving data synchronization efficiency. Furthermore, by providing the row lock hash table, the execution transaction list and the transaction receiving thread, the conflict determination and transaction allocation processes are further optimized and the efficiency of data synchronization is further improved.
Example 2:
based on the method for data synchronization based on parallel execution provided in Embodiment 1, supplements and adjustments can be made in different application scenarios according to different usage requirements or actual situations. Where no conflict arises, one or more of the following implementations can be selected and combined with the scheme of Embodiment 1.
To further improve the efficiency of concurrent execution of the execution threads, in a specific implementation scenario of this embodiment, when the transactions are allocated one by one, in commit order, to different idle execution threads in step 103, the allocation can be further optimized according to the characteristics of the transactions and the threads: transactions that conflict with each other are identified in advance and put into the same queue, the execution time of the transactions in the different queues is estimated, and a suitable execution thread is selected for allocation according to that time. In a specific implementation scenario, as shown in fig. 5, steps 101-107 can be replaced by the following steps:
step 301: and acquiring the transactions to be synchronized, wherein each transaction comprises at least one operation.
Step 302: at least two execution threads are created, each execution thread including a wake-up queue.
Step 303: and acquiring each operation in each transaction to be synchronized, and constructing a row lock for each operation according to the unique identifier of each operation.
Step 304: judging whether each transaction to be synchronized has a row lock conflict with the transactions that commit before it, and putting the conflicting transactions into the same distribution queue in commit order, so that no transaction conflict exists between the transactions of any two distribution queues.
Step 305: the execution time of the first transaction in each distribution queue is predicted.
Step 306: when an execution thread becomes idle, the transaction with the longest estimated execution time is allocated to it first, and the next transaction of the distribution queue where that transaction is located is placed into the thread's wake-up queue.
Step 307: the execution threads execute, in parallel and in transaction commit order, the current transactions that have no row lock conflict.
Step 308: after each execution thread finishes executing its current transaction, it accepts the next allocated transaction and wakes up the transactions in its wake-up queue.
Performing the lock conflict judgment before the transactions are allocated to the execution threads allows operation execution in the execution threads and lock judgment to proceed simultaneously, saving lock-judgment time; meanwhile, estimating the execution time of the transactions and allocating the longer-running transactions to the first idle thread balances the execution time of the threads and makes full use of thread resources.
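A sketch of the allocation rule of steps 305 and 306, where the execution time estimate is taken, purely as an assumption, to be the number of operations in the transaction; the patent does not prescribe a particular estimator, and the function name is hypothetical.

# Sketch: give the longest-running head transaction to the first idle thread.
def pick_queue_for_idle_thread(distribution_queues, estimate=len):
    """distribution_queues: lists of mutually conflicting transactions in commit
    order, with no conflicts across queues. Returns the index of the queue whose
    head transaction should go to the idle thread, or None if all are empty."""
    candidates = [(i, q) for i, q in enumerate(distribution_queues) if q]
    if not candidates:
        return None
    # Prefer the queue whose head transaction has the longest estimated run time.
    return max(candidates, key=lambda item: estimate(item[1][0]))[0]

# Example: queue 0's head transaction has three operations, queue 1's head has one.
queues = [[["op1", "op2", "op3"], ["op4"]], [["op5"]]]
assert pick_queue_for_idle_thread(queues) == 0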
When the row lock hash table is used, each row lock hash table can be placed in a contiguous block of memory so that releasing the hash table simply releases the whole block, which is simpler and more efficient. Further, when a transaction is completed, the other resources occupied by the transaction need to be released in addition to the resources of the row lock hash table.
During data synchronization, the data formats of the source and destination may differ, or only part of the data may need to be synchronized. To facilitate synchronizing data of different formats, when the transaction receiving thread is used to manage the transactions to be synchronized, it can also clean the operations in a transaction, converting source-end operations into destination-end operations that produce the same data change, or removing the parts that do not need to be synchronized.
The data synchronization methods provided in the embodiments 1 and 2 are used in combination, so that system resources can be fully utilized, and the efficiency of data synchronization is further improved.
Example 3:
on the basis of the methods for data synchronization based on parallel execution provided in embodiments 1 to 2, the present invention further provides a device for data synchronization based on parallel execution, which can be used to implement the methods described above, as shown in fig. 6, which is a schematic diagram of a device architecture according to an embodiment of the present invention. The apparatus for data synchronization based on parallel execution of the present embodiment includes one or more processors 21 and a memory 22. In fig. 6, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, such as the bus connection in fig. 6.
The memory 22, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the method for data synchronization based on parallel execution in Embodiments 1 to 2. By running the non-volatile software programs, instructions and modules stored in the memory 22, the processor 21 executes the various functional applications and data processing of the device, that is, implements the method for data synchronization based on parallel execution of Embodiments 1 to 2.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the method of data synchronization based on parallel execution in embodiments 1 to 2 described above, for example, perform the respective steps shown in fig. 1 to 5 described above.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A method for data synchronization based on parallel execution, the method comprising:
acquiring a transaction needing to be synchronized;
creating at least two execution threads, wherein each execution thread comprises a wakeup queue;
distributing the transactions to different idle execution threads one by one according to the submission sequence, wherein the transaction distributed to each thread is the current transaction of each execution thread;
Acquiring each operation in the current transaction of each execution thread, and constructing a row lock for each operation according to the unique identifier of each operation;
judging whether the current transaction in each execution thread has a row lock conflict with a transaction that commits earlier, and if so, putting the current transaction into the wake-up queue of the execution thread where the conflicting transaction is located to wait;
executing in parallel, by the execution threads, the current transactions that have no row lock conflict;
after each execution thread finishes executing its current transaction, waking up the transactions in the wake-up queue of the execution thread and starting to execute them.
2. The method for data synchronization based on parallel execution according to claim 1, wherein the constructing a row lock for each operation according to its unique identifier specifically comprises: creating a row lock hash table in each execution thread, taking the unique identifier of each operation as the key of the row lock hash table, and taking each operation as a record of the row lock hash table.
3. The method according to claim 2, wherein the judging whether the current transaction in each execution thread has a row lock conflict with a transaction that commits earlier comprises:
Acquiring a row lock hash table of a first execution thread as a first row lock hash table;
acquiring the row lock hash table of a second execution thread as a second row lock hash table, wherein the current transaction of the second execution thread commits before that of the first thread;
judging whether the same key exists in the first row lock hash table and the second row lock hash table;
if the same key exists, a row lock conflict exists between the current transaction of the first thread and the current transaction of the second thread;
if no same key exists, no row lock conflict exists between the current transaction of the first thread and the current transaction of the second thread;
and comparing the threads pairwise as the first and second thread, and judging whether a row lock conflict exists between every two threads.
4. The method according to claim 1, wherein the putting the current transaction into the wake-up queue corresponding to the transaction having the row lock conflict to wait comprises:
if the current transaction has row lock conflicts with a plurality of transactions, searching, among all the conflicting transactions, for the one whose commit order is before and closest to the current transaction, and putting the current transaction into the wake-up queue of the execution thread where the found transaction is located.
5. The method for data synchronization based on parallel execution according to claim 2, wherein the method further comprises: and if the current transaction execution of the execution thread is finished, releasing all row locks in the row lock hash table of the execution thread.
6. The method for data synchronization based on parallel execution according to claim 1, wherein: the method further comprises an execution transaction list, and after the transactions to be synchronized are acquired, they are put into the execution transaction list in commit order so that the transactions to be synchronized can be searched and allocated conveniently.
7. The method for data synchronization based on parallel execution according to claim 6, wherein the judging whether the current transaction in each execution thread has a row lock conflict with a transaction that commits earlier specifically comprises: finding the position of the current transaction in the execution transaction list, obtaining the preceding transactions one by one in reverse order starting from that position, judging whether a row lock conflict exists between the current transaction and each of them, and stopping the judgment when the first conflicting transaction is found.
8. The method for data synchronization based on parallel execution according to claim 6, wherein:
The method further comprises a transaction receiving thread, wherein the transaction receiving thread receives the acquired transactions to be synchronized, stores them into the execution transaction list, and, whenever an execution thread is idle, allocates the transactions in the execution transaction list to the execution threads in order.
9. The method for data synchronization based on parallel execution according to claim 8, wherein:
after the transaction receiving thread receives the acquired transactions to be synchronized, it classifies them according to their transaction IDs and stores only the transactions corresponding to a commit operation into the execution transaction list.
10. An apparatus for data synchronization based on parallel execution, characterized in that:
comprising at least one processor and a memory, said at least one processor and memory being connected via a data bus, said memory storing instructions executable by said at least one processor, said instructions being adapted to perform a method for data synchronization based on parallel execution according to any of claims 1-9, after execution by said processor.
CN202010499491.4A 2020-06-04 2020-06-04 Data synchronization method and device based on parallel execution Pending CN111858626A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010499491.4A CN111858626A (en) 2020-06-04 2020-06-04 Data synchronization method and device based on parallel execution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010499491.4A CN111858626A (en) 2020-06-04 2020-06-04 Data synchronization method and device based on parallel execution

Publications (1)

Publication Number Publication Date
CN111858626A true CN111858626A (en) 2020-10-30

Family

ID=72985491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010499491.4A Pending CN111858626A (en) 2020-06-04 2020-06-04 Data synchronization method and device based on parallel execution

Country Status (1)

Country Link
CN (1) CN111858626A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615203A (en) * 2009-07-23 2009-12-30 中兴通讯股份有限公司 Concurrency control method and device
CN103593257A (en) * 2012-08-15 2014-02-19 阿里巴巴集团控股有限公司 Data backup method and device
CN103885986A (en) * 2012-12-21 2014-06-25 阿里巴巴集团控股有限公司 Main and auxiliary database synchronization method and device
CN106610865A (en) * 2015-10-21 2017-05-03 阿里巴巴集团控股有限公司 Data locking and unlocking method and apparatus
CN106204217A (en) * 2016-07-08 2016-12-07 腾讯科技(深圳)有限公司 The methods, devices and systems of resource numerical value transfer, the method and apparatus of resource numerical value transfer request
CN109923534A (en) * 2016-11-04 2019-06-21 易享信息技术有限公司 To the Multi version concurrency control with the data-base recording for not submitting affairs
CN108132831A (en) * 2016-12-01 2018-06-08 阿里巴巴集团控股有限公司 The processing method and processing unit of task

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220335A (en) * 2021-05-26 2021-08-06 西安热工研究院有限公司 Method for avoiding disorder of multithreading concurrent writing snapshot data
CN113220335B (en) * 2021-05-26 2023-03-14 西安热工研究院有限公司 Method for avoiding disorder of multithreading concurrent writing snapshot data

Similar Documents

Publication Publication Date Title
Ren et al. Lightweight locking for main memory database systems
Thomson et al. The case for determinism in database systems
US7707195B2 (en) Allocation locks and their use
US8661449B2 (en) Transactional computation on clusters
US9389925B2 (en) Achieving low grace period latencies despite energy efficiency
US8336051B2 (en) Systems and methods for grouped request execution
CN111190935B (en) Data reading method and device, computer equipment and storage medium
CN111338766A (en) Transaction processing method and device, computer equipment and storage medium
US20110161540A1 (en) Hardware supported high performance lock schema
CN110704112B (en) Method and apparatus for concurrently executing transactions in a blockchain
WO2011009274A1 (en) Method and apparatus of concurrency control
Chairunnanda et al. ConfluxDB: Multi-master replication for partitioned snapshot isolation databases
WO2020025049A1 (en) Data synchronization method and apparatus, database host, and storage medium
CN109783578B (en) Data reading method and device, electronic equipment and storage medium
WO2023061249A1 (en) Data processing method and system for distributed database, and device and storage medium
CN112241400A (en) Method for realizing distributed lock based on database
CN111858503B (en) Parallel execution method and data synchronization system based on log analysis synchronization
CN111858626A (en) Data synchronization method and device based on parallel execution
CN105045563B (en) A kind of method for collision management for speculating nested software transaction storage
CN115629822B (en) Concurrent transaction processing method and system based on multi-core processor
CN111858504A (en) Operation merging execution method based on log analysis synchronization and data synchronization system
Peluso et al. GMU: genuine multiversion update-serializable partial data replication
Bernstein et al. Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log.
CN111367625A (en) Thread awakening method and device, storage medium and electronic equipment
US20080250412A1 (en) Cooperative process-wide synchronization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant after: Wuhan dream database Co., Ltd

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant before: WUHAN DAMENG DATABASE Co.,Ltd.

CB02 Change of applicant information
CB03 Change of inventor or designer information

Inventor after: Sun Feng

Inventor after: Peng Qingsong

Inventor after: Liu Qichun

Inventor before: Sun Feng

Inventor before: Fu Quan

Inventor before: Peng Qingsong

Inventor before: Liu Qichun

CB03 Change of inventor or designer information