CN116627769A - Method and device for processing transaction log

Info

Publication number: CN116627769A
Application number: CN202310761690.1A
Authority: CN
Inventors: 田伟; 刘浩; 韩富晟
Original assignee: Beijing Oceanbase Technology Co Ltd
Current assignee: Beijing Oceanbase Technology Co Ltd
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2023-08-22
Also published as: CN115905402B; CN115905402A

Abstract

Embodiments of this specification provide a method and a device for processing a transaction log. The method comprises: executing a transaction against a distributed database, wherein the Structured Query Language (SQL) statements that trigger generation of the transaction include a third SQL corresponding to a change operation; changing the change operation corresponding to the third SQL into a first operation and a second operation so as to move data in a first partition table into a second partition table, wherein the first operation is the delete operation resulting from the change and the second operation is the insert operation resulting from the change; carrying a first sequence number corresponding to the first operation in a first log corresponding to the first operation, and carrying a second sequence number corresponding to the second operation in a second log corresponding to the second operation, wherein the first sequence number precedes the second sequence number; and writing the first log carrying the first sequence number and the second log carrying the second sequence number into two log streams in the distributed database respectively. The order of the operations in the transaction can thus be obtained from the logs in the log streams, so that the transaction data of the transaction is obtained correctly.

Description

Method and device for processing transaction log

Technical Field

One or more embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for processing a transaction log.

Background

A database logs each operation, such as an insert, change, or delete. To ensure atomicity and durability of data operations in a database system, the log of a transaction may be persisted into a log stream. Subsequently, the complete data change history of the database can be obtained by analyzing the log, so that data synchronization is realized. For a traditional single-machine database, such as MySQL, there is only one log stream globally, so the transaction commit history can be restored by sequentially acquiring and analyzing that log stream.

Distributed databases employing multiple log streams such as OceanBase and the like are currently emerging. In a distributed database employing multiple log streams, all logs corresponding to a transaction may be written into multiple log streams in the distributed database, and different log streams may be distributed among multiple machine nodes.

In many scenarios, the operations in a transaction have a required order, and the transaction data must be obtained with that order taken into account so that data synchronization can be performed correctly. No effective solution to this currently exists. Therefore, how to obtain the order of the operations in a transaction from the logs in the log streams, and then accurately obtain the transaction data of the transaction based on that order, is a problem to be solved.

Disclosure of Invention

One or more embodiments of the present disclosure provide a method and an apparatus for processing a transaction log, which can obtain a sequence between operations in a transaction according to a log in a log stream, so as to accurately obtain transaction data of the transaction.

According to a first aspect, a method for processing a transaction log is provided, wherein the transaction is: a database operation sequence for accessing and/or operating on data; the method is applied to a distributed database adopting multiple log streams;

one data table in the distributed database comprises a first partition table and a second partition table;

the method comprises the following steps:

executing a transaction against the distributed database; wherein, the structured query language SQL for triggering and generating the transaction comprises a third SQL corresponding to the change operation;

when executing the transaction, changing one change operation corresponding to the third SQL into a first operation and a second operation so as to move the data in the first partition table into the second partition table; wherein the first operation is a modified delete operation and the second operation is a modified insert operation;

generating a first log corresponding to the first operation and a second log corresponding to the second operation;

carrying a first sequence number corresponding to the first operation in the first log corresponding to the first operation, carrying a second sequence number corresponding to the second operation in the second log corresponding to the second operation, and setting the first sequence number before and the second sequence number after;

and writing the first log carrying the first sequence number and the second log carrying the second sequence number into two log streams in the distributed database respectively.

The method further comprises the steps of:

executing a transaction against the distributed database; wherein a plurality of SQL statements trigger generation of the transaction; each SQL corresponds to at least one operation in the transaction; and a corresponding log is generated for each operation;

for the plurality of SQL statements that trigger generation of the transaction, generating, according to the execution order among them, a statement sequence number that is unique within the transaction for each of the SQL statements;

for all operations in the transaction corresponding to each SQL, carrying the statement sequence number corresponding to that SQL in all logs corresponding to those operations;

for all operations corresponding to the SQL statements that trigger generation of the transaction, writing all logs that correspond to those operations and carry the respective statement sequence numbers into a plurality of log streams of the distributed database; wherein the logs corresponding to the operations of one SQL are written into different log streams.

Wherein executing the transaction further comprises: first scheduling execution of the first operation, and then scheduling execution of the second operation;

the setting the first sequence number before and the second sequence number after includes: the first sequence number is set according to the time of executing the first operation, and the second sequence number is set according to the time of executing the second operation.

The first partition table and the second partition table are positioned in the same machine node;

the setting the first sequence number according to the time of executing the first operation includes: taking the local time of the machine node when executing the first operation as the first sequence number;

the setting the second sequence number according to the time of executing the second operation includes: and taking the local time of the machine node when the second operation is executed as the second sequence number.

The first partition table and the second partition table are respectively positioned in different first machine nodes and second machine nodes;

the setting the first sequence number according to the time of executing the first operation includes: taking the local time of the first machine node when executing the first operation as the first sequence number;

the method further comprises the steps of: transmitting the local time of the first machine node when the first operation is executed to the second machine node;

the setting the second sequence number according to the time of executing the second operation includes:

advancing the logical time in the second machine node based on the local time of the first machine node received by the second machine node; and

taking the advanced logical time of the second machine node when the second operation is executed as the second sequence number.
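The cross-node case above can be illustrated with a minimal sketch. It assumes that each machine node keeps a monotonically increasing logical time that is never behind its physical clock; the `NodeClock` class, its method names, and the injected `now` functions are hypothetical and do not come from the specification.

```python
import time
from typing import Callable, Tuple


class NodeClock:
    """Hypothetical per-node clock (illustrative only, not from the specification)."""

    def __init__(self, now: Callable[[], int] = time.time_ns) -> None:
        self._now = now      # physical clock source of this machine node
        self._logical = 0    # logical time, never moves backwards

    def next_time(self) -> int:
        # Local time used as a sequence number: strictly greater than any
        # value this node has already issued or received.
        self._logical = max(self._logical + 1, self._now())
        return self._logical

    def advance_to(self, remote_time: int) -> None:
        # Advance ("push up") the logical time with a time received from another node.
        self._logical = max(self._logical, remote_time)


def cross_node_sequence_numbers(first_node: NodeClock, second_node: NodeClock) -> Tuple[int, int]:
    first_seq = first_node.next_time()    # delete executed on the first machine node
    second_node.advance_to(first_seq)     # local time sent to the second machine node
    second_seq = second_node.next_time()  # insert executed on the second machine node
    return first_seq, second_seq


# The second node's physical clock lags behind the first node's clock.
first_seq, second_seq = cross_node_sequence_numbers(
    NodeClock(now=lambda: 1000), NodeClock(now=lambda: 900)
)
print(first_seq, second_seq)
```

Under these assumptions the second sequence number is later than the first even though the second node's own clock is behind, which is the property the sequence numbers need.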

Wherein the SQL statements that trigger generation of the transaction include a plurality of third SQL statements, and the first operation and the second operation changed from each third SQL are used to move data in one partition table to another partition table;

the scheduling execution of the first operation and then of the second operation includes: scheduling the first machine node to execute all first operations, located in the first machine node, that are changed from all the third SQL statements; and then scheduling the second machine node to execute all second operations, located in the second machine node, that are changed from all the third SQL statements;

accordingly, the sending the local time when the first machine node performs the first operation to the second machine node includes:

obtaining all local time when the first machine node executes all changed first operations;

selecting the maximum local time from all the local times;

and sending the selected maximum local time to the second machine node.
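The batching described above can be sketched as follows. This is a minimal illustration, assuming integer time values; the function names and the way the second machine node derives consecutive values are hypothetical, not taken from the specification.

```python
from typing import List


def max_delete_time(delete_times: List[int]) -> int:
    """Select the single value the first machine node sends to the second one."""
    if not delete_times:
        raise ValueError("no changed delete operation was executed")
    return max(delete_times)


def insert_sequence_numbers(received_max: int, second_node_time: int, insert_count: int) -> List[int]:
    """Assumed behavior: advance the second node's logical time once, then
    give each changed insert operation the next larger value."""
    logical = max(second_node_time, received_max)
    return [logical + i for i in range(1, insert_count + 1)]


# Three changed delete operations executed together on the first machine node.
delete_times = [1000, 1003, 1001]
sent = max_delete_time(delete_times)                       # one value sent, not three
insert_times = insert_sequence_numbers(sent, second_node_time=990, insert_count=3)
print(sent, insert_times)
```

Because every insert value is larger than the maximum delete time, each changed insert operation is ordered after every changed delete operation, while only one time value crosses the network.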

According to a second aspect, there is provided a method of processing a transaction log, wherein the transaction is: a database operation sequence for accessing and/or operating on data; the method is applied to a distributed database adopting multiple log streams;

the method comprises the following steps:

acquiring at least two log streams from the distributed database;

obtaining a first log and a second log from any two log streams; the first log carries a first sequence number, the second log carries a second sequence number, the sequence of the first sequence number is prior, and the sequence of the second sequence number is later;

obtaining the execution sequence between a first operation corresponding to the first log and a second operation corresponding to the second log according to the first sequence number carried in the first log and the second sequence number carried in the second log; the first operation is a delete operation changed from one change operation corresponding to the third structured query language SQL, and the second operation is an insert operation changed from one change operation corresponding to the third SQL; the third SQL is used for moving the data in the first partition table to the second partition table;

and obtaining transaction data corresponding to the transaction to which the first operation and the second operation belong according to the execution sequence between the first operation and the second operation.

Wherein the sequence number includes a time value.
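The consumer side of the second aspect can be sketched as follows, assuming each log has already been parsed into a record that exposes its sequence number. The `LogRecord` fields and the `replay` helper are hypothetical, and the downstream copy is modeled as a plain dictionary keyed by primary key.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LogRecord:
    op: str               # "delete" or "insert"
    seq: int              # sequence number carried in the log (for example a time value)
    key: int              # primary key of the affected row
    value: Optional[str]  # row content for an insert, None for a delete


def replay(records: Iterable[LogRecord], table: Dict[int, str]) -> Dict[int, str]:
    """Apply the logs of one transaction in sequence-number order."""
    for rec in sorted(records, key=lambda r: r.seq):
        if rec.op == "delete":
            table.pop(rec.key, None)
        elif rec.op == "insert":
            table[rec.key] = rec.value
    return table


# The two logs live in two different log streams.
stream_1 = [LogRecord("delete", 1000, 7, None)]
stream_2 = [LogRecord("insert", 1001, 7, "2023-02-03")]

# Pulling stream_2 first no longer matters: sequence numbers restore the order.
print(replay(stream_2 + stream_1, {7: "2023-01-15"}))
```

Without the sort, applying the insert and then the delete would remove row 7 from the downstream copy, which is the synchronization error the sequence numbers are meant to prevent.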

According to a third aspect, there is provided an apparatus for processing a transaction log, wherein the transaction is: a database operation sequence for accessing and/or operating on data; the device is applied to a distributed database adopting multiple log streams; one data table in the distributed database comprises a first partition table and a second partition table;

the device comprises:

a transaction execution module configured to execute a transaction against the distributed database; wherein, the structured query language SQL for triggering and generating the transaction comprises a third SQL corresponding to the change operation; when the transaction is executed, changing the changing operation corresponding to the third SQL into a first operation and a second operation so as to move the data in the first partition table into the second partition table; wherein the first operation is a modified delete operation and the second operation is a modified insert operation;

a sequence number determining module configured to determine a first sequence number corresponding to the first operation and a second sequence number corresponding to the second operation, and set the first sequence number before and the second sequence number after;

a log generation module configured to generate a first log corresponding to the first operation and a second log corresponding to the second operation, to carry the first sequence number corresponding to the first operation in the first log, and to carry the second sequence number corresponding to the second operation in the second log;

and a log writing module configured to write the first log carrying the first sequence number and the second log carrying the second sequence number into two log streams in the distributed database respectively.

According to a fourth aspect, there is provided an apparatus for processing a transaction log, wherein the transaction is: a database operation sequence for accessing and/or operating on data; the device is applied to a distributed database adopting multiple log streams; one data table in the distributed database comprises a first partition table and a second partition table;

the device comprises:

a log stream acquisition module configured to acquire at least two log streams from the distributed database;

a log acquisition module configured to obtain a first log and a second log from any two log streams; the first log carries a first sequence number, the second log carries a second sequence number, and the first sequence number precedes the second sequence number;

an operation sequence determining module configured to obtain the execution sequence between the first operation corresponding to the first log and the second operation corresponding to the second log according to the first sequence number carried in the first log and the second sequence number carried in the second log; the first operation is a delete operation changed from one change operation corresponding to the third structured query language SQL, and the second operation is an insert operation changed from the change operation corresponding to the third SQL; the third SQL is used for moving the data in the first partition table to the second partition table;

and a synchronous processing module configured to obtain transaction data corresponding to the transaction to which the first operation and the second operation belong according to the execution sequence between the first operation and the second operation.

According to a fifth aspect, the present description provides a computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to perform the method as described above.

According to a sixth aspect, embodiments of the present specification provide a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements a method as described above.

As can be seen from the above technical solutions, the combination of one or more embodiments of the present disclosure has at least the following advantages:

1. The logs corresponding to the operations carry sequence numbers, and the order of the sequence numbers is the same as the execution order of the operations corresponding to the logs. Therefore, even if the logs of the operations of one transaction are written into different log streams, the execution order of the corresponding operations can be determined from the sequence numbers, and the data of the transaction can be assembled in that order, so that the transaction data is obtained correctly and the accuracy of data synchronization is ensured.

2. Where the plurality of SQL statements corresponding to one transaction have an order requirement, statement sequence numbers can be carried in the logs, which ensures that all operations of an SQL executed earlier in the transaction are ordered before all operations of an SQL executed later in the transaction. The transaction data can then be obtained based on the correct execution order among the SQL statements, ensuring the accuracy of data synchronization.

3. Where an order requirement exists between the delete operation and the insert operation changed from one change operation, an operation sequence number, such as a time value, can be carried in the logs, which ensures that the changed delete operation is ordered before the changed insert operation. The transaction data can then be obtained based on the correct operation execution order, ensuring the accuracy of data synchronization.

4. When the two partition tables involved are located in different machine nodes, the local time can be used to advance the logical time so as to obtain the sequence numbers carried in the logs. This ensures that, even when the changed delete operation and the changed insert operation are executed in different machine nodes, the time value (that is, the sequence number) of the changed delete operation precedes that of the changed insert operation.

5. All changed delete operations in one machine node can be executed together, so that only one maximum time value needs to be sent to the other machine node instead of the local times of all the delete operations, which saves system processing resources.

6. Each distributed transaction can be output in strict statement order, so that its external behavior is consistent with that of a single-machine database, giving good compatibility and understandability.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 illustrates an exemplary system architecture diagram to which embodiments of the present description may be applied;

FIG. 2 is a flow chart of a method of processing a transaction log provided in one embodiment of the present disclosure;

FIG. 3 is a diagram of SQL and its corresponding operations and logs according to an embodiment of the present disclosure;

FIG. 4 is a diagram of a multi-log stream in which logs are written according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of writing a log carrying a statement sequence number into a log stream in an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a log stream of a log of a delete operation carrying an operation sequence number and a log of an insert operation carrying an operation sequence number in an embodiment of the present disclosure;

FIG. 7 is a flow chart of a method of processing a transaction log according to another embodiment of the present disclosure;

FIG. 8 is a block diagram of an apparatus for processing transaction logs according to an embodiment of the present disclosure;

FIG. 9 is a block diagram of an apparatus for processing a transaction log according to another embodiment of the present disclosure.

Detailed Description

First, concepts of transactions involved in the embodiments of the present specification will be described.

A transaction refers to a sequence of database operations that access and/or manipulate data. In computer terminology, a transaction is a program execution unit that accesses and possibly alters data items in a database. It consists of all the operations performed between the beginning and the end of the transaction; these operations must all complete successfully, otherwise all changes made by the operations are undone. For example, a transfer transaction may consist of an increase in the balance of one account and a decrease in the balance of another account.

One transaction often corresponds to multiple operations, and the operations have an order. For example, in one transaction, an operation 12 of adding 100 yuan to account B can be performed only after an operation 11 of adding 100 yuan to account A is performed. The logs corresponding to different operations are likely to be written into different log streams. However, in a distributed database with multiple log streams, the order in which logs are written into the log streams is arbitrary, and the log streams share no global timing. For example, the log 12 of operation 12 is first written into log stream 12 in machine node 12, and the log 11 of operation 11 is then written into log stream 11 in machine node 11. Subsequently, during data synchronization, log 11 may be obtained from log stream 11 and log 12 from log stream 12, but the order of the operations corresponding to log 11 and log 12 cannot be determined. The synchronization result may then place operation 12 before operation 11 in this transaction, resulting in an error.

The following describes the scheme provided in the present specification with reference to the drawings.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

FIG. 1 illustrates an exemplary system architecture to which embodiments of the present description may be applied. The system mainly comprises a transaction executor, a distributed database, and a data synchronization device.

The distributed database adopts multiple log streams. Multiple machine nodes may be employed as instances running the distributed database software.

A transaction executor is a device that performs operations on a distributed database, such as insert (insert), delete (delete) and update (update) operations in the distributed database, writes all logs corresponding to each operation in a transaction into at least two log streams in the distributed database, and uses the method for processing transaction logs provided in the embodiments of the present disclosure to carry a sequence number in each log.

The data synchronization device may pull the log stream from the distributed database, and process the log stream by adopting the method for processing the transaction log provided in the embodiment of the present disclosure to obtain the transaction data in the transaction corresponding to the correct operation sequence, so as to perform data synchronization based on the dependency relationship or sequence between the operations in the transaction.

The transaction executor, the data synchronization device, and the distributed database may interact over a network, which may include various connection types, such as wired or wireless communication links or fiber optic cables.

The transaction executor or the data synchronization device may be a single server, a server group formed by a plurality of servers, or a cloud server. A cloud server, also called a cloud computing server or cloud host, is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and virtual private server (VPS) services. In addition, it may also be a computer terminal with strong computing power.

It should be appreciated that the numbers of transaction executors, distributed databases, machine nodes, and data synchronization devices in FIG. 1 are merely illustrative. There may be any number of each, as desired for implementation.

It can be seen that the embodiments of this specification involve a method for processing a transaction log performed by the transaction executor, which includes executing the operations of a transaction against the distributed database and writing the logs of the transaction into a plurality of log streams. They also involve a method for processing a transaction log performed by the data synchronization device, which includes obtaining the complete data change history of the database from the logs carried in the log streams, thereby realizing data synchronization. These are described below through different embodiments.

FIG. 2 is a flow chart of a method for processing transaction logs provided by an embodiment of the present disclosure. The execution subject of the method is the transaction executor, and the method is applied to a distributed database adopting multiple log streams. It will be appreciated that the method may be performed by any apparatus, device, platform, or device cluster having computing and processing capabilities. Referring to FIG. 2, the method includes:

Step 201: executing a transaction against the distributed database; the transaction includes a first operation and a second operation.

Step 203: determining a first sequence number and a second sequence number; the sequence between the first sequence number and the second sequence number is the same as the execution sequence between the first operation and the second operation.

Step 205: generating a first log corresponding to the first operation and a second log corresponding to the second operation.

Step 207: carrying the first sequence number in the first log and the second sequence number in the second log.

Step 209: writing the first log and the second log into two log streams in the distributed database respectively.

Therefore, in the flow shown in FIG. 2, the logs corresponding to the operations carry sequence numbers, and the order of the sequence numbers is the same as the execution order of the operations corresponding to the logs. Even if the logs of the operations of a transaction are written into different log streams, the execution order of the corresponding operations can be determined from the sequence numbers, so that the data of the transaction can be assembled in that order, the transaction data is obtained correctly, and the correctness of data synchronization is ensured.
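Steps 201 to 209 can be sketched on the transaction executor side as follows. This is a minimal illustration, assuming the sequence numbers come from a simple counter and each log stream is an in-memory list; the `Log` class and `process_transaction` function are hypothetical names.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Log:
    tx_id: str
    operation: str
    seq: Optional[int] = None  # sequence number, filled in at step 207


def process_transaction(
    tx_id: str,
    first_op: str,
    second_op: str,
    streams: Tuple[List[Log], List[Log]],
    counter: Iterator[int],
) -> Tuple[Log, Log]:
    # Step 201: the transaction consists of first_op followed by second_op.
    # Step 203: determine sequence numbers in the same order as execution.
    first_seq = next(counter)
    second_seq = next(counter)
    # Step 205: generate one log per operation.
    first_log = Log(tx_id, first_op)
    second_log = Log(tx_id, second_op)
    # Step 207: carry the sequence numbers in the logs.
    first_log.seq = first_seq
    second_log.seq = second_seq
    # Step 209: write the two logs into two different log streams.
    streams[0].append(first_log)
    streams[1].append(second_log)
    return first_log, second_log


log_streams: Tuple[List[Log], List[Log]] = ([], [])
first, second = process_transaction(
    "tx1", "operation 1", "operation 2", log_streams, itertools.count(1)
)
print(first, second)
```

The counter is only a stand-in; the scenarios below use statement sequence numbers or time values instead.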

Each step shown in fig. 2 is described separately below.

First, step 201: a transaction is executed against the distributed database, wherein the transaction includes a first operation and a second operation.

As previously described, a transaction refers to a sequence of database operations that access and/or manipulate data. Transactions are triggered by SQL (Structured Query Language). The operations corresponding to SQL may include: insertion, query, modification, or deletion of data; database schema creation and modification; and data access control. When SQL is initiated against the distributed database, generation of a transaction in the distributed database is triggered, with one SQL corresponding to one or more operations.

In one embodiment of the present disclosure, the method for processing transaction logs may be applied in two business scenarios:

Business scenario A: the SQL statements corresponding to the transaction have a required execution order, and data synchronization must follow that execution order.

Business scenario B: the distributed database enables the row movement function, and an execution order is required between the delete operation and the insert operation changed from an update operation, so data synchronization must follow the execution order between the changed delete operation and the changed insert operation.

Each business scenario is described separately below.

For business scenario A: the SQL statements corresponding to the transaction have a required execution order, and data synchronization must follow that execution order.

For example, 3 SQL statements are initiated against a distributed database: SQL1, SQL2, and SQL3, and the 3 SQL statements trigger generation of a transaction tx1 in the distributed database. In practice, the execution order of the 3 SQL statements is SQL1, SQL2, SQL3. As shown in FIG. 3, for example, SQL1 corresponds to 3 operations in transaction tx1, namely operation 1 (for example, deleting a first row of data items in the data table), operation 2 (for example, deleting a second row of data items in the data table), and operation 3 (for example, deleting a third row of data items in the data table); SQL2 corresponds to 3 operations in transaction tx1, namely operation 4, operation 5, and operation 6; and SQL3 corresponds to 3 operations in transaction tx1, namely operation 7, operation 8, and operation 9.

As shown in FIG. 3, in step 201, when the transaction tx1 is executed, a log is generated for each of operations 1 to 9. For example, the log corresponding to operation 1 is denoted ROW1, the log corresponding to operation 2 is denoted ROW2, and so on, up to ROW9 for operation 9. The logs of the transaction tx1 are typically written into multiple log streams. Referring to FIG. 4, assume that in the subsequent process the logs of the operations in the transaction tx1 are written into 3 log streams LS1, LS2, and LS3. Three rows of data, that is, the 3 logs of 3 operations, are written into log stream LS1: ROW1, ROW4, ROW7; 3 logs of 3 operations are written into log stream LS2: ROW2, ROW5, ROW8; and 3 logs of 3 operations are written into log stream LS3: ROW3, ROW6, ROW9.

Subsequently, during data synchronization, when all the data of LS1, LS2, and LS3 is aggregated for the distributed transaction tx1, the natural approach is to output the data in the order of the participants, that is, of the log streams. For example, outputting in the order LS1, LS2, LS3 yields the following sequence M: ROW1, ROW4, ROW7, ROW2, ROW5, ROW8, ROW3, ROW6, ROW9. As described above, the execution order among the SQL statements corresponding to the transaction tx1 is SQL1, SQL2, SQL3. That is, ROW1, ROW2, and ROW3 corresponding to SQL1 should be output first, then ROW4, ROW5, and ROW6 corresponding to SQL2, and finally ROW7, ROW8, and ROW9 corresponding to SQL3. If sequence M is output in the order of the row data in LS1, LS2, and LS3, the output does not match the execution order SQL1, SQL2, SQL3 of the transaction tx1, which breaks external consistency in downstream business systems.

It can be seen that data synchronization is required according to the execution sequence among the SQL's. Therefore, the log needs to carry information of the execution sequence of the SQL.

In business scenario A, the sequence number carried in the log is the sequence number of the SQL statement, and the log of each operation needs to carry a statement sequence number. Thus, in step 201, the first operation and the second operation may be any operations corresponding to different SQL statements.
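The ROW1 to ROW9 example can be reproduced with a short sketch. It assumes each log is reduced to a (name, statement sequence number) pair, with SQL1, SQL2, and SQL3 numbered 1, 2, and 3; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (log name, statement sequence number)

# Layout of FIG. 4: each log stream holds one log from each SQL.
streams: Dict[str, List[Record]] = {
    "LS1": [("ROW1", 1), ("ROW4", 2), ("ROW7", 3)],
    "LS2": [("ROW2", 1), ("ROW5", 2), ("ROW8", 3)],
    "LS3": [("ROW3", 1), ("ROW6", 2), ("ROW9", 3)],
}


def stream_order(log_streams: Dict[str, List[Record]]) -> List[str]:
    """Output by participant (log stream) order, producing sequence M."""
    return [name for ls in sorted(log_streams) for name, _ in log_streams[ls]]


def statement_order(log_streams: Dict[str, List[Record]]) -> List[str]:
    """Output by the statement sequence number carried in each log."""
    records = [rec for ls in sorted(log_streams) for rec in log_streams[ls]]
    # sorted() is stable, so logs of one SQL keep their log stream order.
    return [name for name, _ in sorted(records, key=lambda rec: rec[1])]


print(stream_order(streams))
print(statement_order(streams))
```

The first call yields sequence M; the second yields ROW1 through ROW9 grouped by SQL1, SQL2, SQL3. The order of logs within one SQL is simply whatever the stable sort preserves here, since scenario A only constrains the order between SQL statements.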

Next, for business scenario B: the distributed database enables the row movement function, and an execution order is required between the delete operation and the insert operation into which an update operation is changed, so data synchronization needs to follow the execution order between the changed delete operation and the changed insert operation.

In the distributed database, one data table may include at least two partition tables. For example, the sales data of one year are stored in 12 partition tables, each storing the sales data of one of the 12 months. Different partition tables may be stored on the same machine node or on different machine nodes. In business scenario B, when row movement is enabled, an SQL may need to perform an update operation on a row of data in one partition table, for example a row in the partition table corresponding to January, such that the row is moved from the partition table corresponding to January to the partition table corresponding to February. In practice, when the transaction is executed, the update operation corresponding to the SQL is split, i.e., changed into a delete operation and an insert operation. That is, the delete operation is first performed on the row in the partition table corresponding to January, and then the insert operation that inserts the row is performed on the partition table corresponding to February. The changed delete operation and the changed insert operation therefore have an execution order. If the transaction data is not obtained in that order during subsequent data synchronization, so that the changed insert operation is executed first and the changed delete operation afterwards, data synchronization errors often result. Therefore, the log needs to carry information on the execution order of the changed delete operation and the changed insert operation.
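Why the order between the changed delete operation and the changed insert operation matters can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the downstream system keeps a single copy of the table keyed by primary key; the helpers split_update and replay and the row layout are hypothetical.

```python
def split_update(key, new_row):
    """Change one row-moving update into a delete followed by an insert."""
    delete_op = {"op": "delete", "key": key}
    insert_op = {"op": "insert", "key": key, "row": new_row}
    return [delete_op, insert_op]


def replay(ops):
    """Replay operations on a downstream copy keyed by primary key."""
    table = {1: {"id": 1, "month": 1}}  # state before the row is moved
    for op in ops:
        if op["op"] == "delete":
            table.pop(op["key"], None)
        else:
            table[op["key"]] = op["row"]
    return table


ops = split_update(key=1, new_row={"id": 1, "month": 2})
in_order = replay(ops)                      # delete, then insert
out_of_order = replay(list(reversed(ops)))  # insert, then delete

print(in_order)
print(out_of_order)
```

Under this assumption, replaying the insert before the delete leaves the downstream copy without the row, whereas the correct order leaves the row in its new partition month.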

In business scenario B, the sequence number carried in the log is actually an operation sequence number. An operation sequence number needs to be carried in the log corresponding to the changed delete operation, and an operation sequence number needs to be carried in the log corresponding to the changed insert operation. Accordingly, in step 201, the first operation and the second operation are the changed delete operation and the changed insert operation, respectively.

Next, for steps 203 to 207: determining a first sequence number and determining a second sequence number; the sequence between the first sequence number and the second sequence number is the same as the execution sequence between the first operation and the second operation; generating a first log corresponding to the first operation and a second log corresponding to the second operation; the first log carries a first sequence number and the second log carries a second sequence number.

In step 203, the first sequence number is an execution order for characterizing the first operation, and the second sequence number is an execution order for characterizing the second operation. Therefore, in the subsequent step 207, the first sequence number is carried in the first log corresponding to the first operation, and the second sequence number is carried in the second log corresponding to the second operation.

The implementation of step 203 corresponding to business scenario A is described first.

When the method is applied to the business scene A, the first operation is any operation corresponding to the first SQL, and the second operation is any operation corresponding to the second SQL; the execution sequence between the first operation and the second operation is equal to the execution sequence between the first SQL and the second SQL.

As previously mentioned, in business scenario A a sequence number is actually the statement sequence number of the corresponding SQL. The execution order of operations corresponding to different SQLs is the same as the execution order of those SQLs. For example, the execution order of SQL1 to SQL3 is SQL1, SQL2, SQL3. Thus, the operations should be ordered as follows: operations 1 to 3, then operations 4 to 6, then operations 7 to 9.

One implementation of step 203, corresponding to business scenario A, includes:

generating a first statement sequence number that corresponds to the first SQL and is unique within the transaction, and generating a second statement sequence number that corresponds to the second SQL and is unique within the transaction; the order between the first statement sequence number and the second statement sequence number is the same as the execution order between the first SQL and the second SQL; and

taking the first statement sequence number as the first sequence number, and taking the second statement sequence number as the second sequence number.

For example, for the transaction tx1, a statement sequence number corresponding to SQL1 is generated and denoted SQL_NO1; a statement sequence number corresponding to SQL2 is generated and denoted SQL_NO2; and a statement sequence number corresponding to SQL3 is generated and denoted SQL_NO3. The order of the statement sequence numbers SQL_NO1, SQL_NO2 and SQL_NO3 is the same as the execution order of SQL1, SQL2 and SQL3.

Because the logs of the operations corresponding to SQL1 are ROW1, ROW2 and ROW3, the sequence number carried in ROW1, ROW2 and ROW3 is SQL_NO1; because the logs of the operations corresponding to SQL2 are ROW4, ROW5 and ROW6, the sequence number carried in ROW4, ROW5 and ROW6 is SQL_NO2; and because the logs of the operations corresponding to SQL3 are ROW7, ROW8 and ROW9, the sequence number carried in ROW7, ROW8 and ROW9 is SQL_NO3.
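The assignment of statement sequence numbers to logs can be sketched in Python as follows. This is a minimal illustration under assumptions: the RowLog type, the tag_logs helper and the use of consecutive integers 1, 2, 3 for SQL_NO1 to SQL_NO3 are hypothetical choices, since the specification only requires that the numbers be unique within the transaction and ordered like the SQLs.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class RowLog:
    name: str    # e.g. "ROW1"
    seq_no: int  # statement sequence number carried in the log


def tag_logs(statements):
    """Tag every log with the statement sequence number of its SQL.

    statements: list of (sql_name, [log names]) in SQL execution order.
    """
    next_no = count(1)  # assumed numbering: 1, 2, 3, ...
    tagged = []
    for _sql_name, log_names in statements:
        sql_no = next(next_no)  # one number per SQL, unique in the transaction
        tagged.extend(RowLog(name, sql_no) for name in log_names)
    return tagged


tx1 = [
    ("SQL1", ["ROW1", "ROW2", "ROW3"]),
    ("SQL2", ["ROW4", "ROW5", "ROW6"]),
    ("SQL3", ["ROW7", "ROW8", "ROW9"]),
]
logs = tag_logs(tx1)
for log in logs:
    print(log.name, log.seq_no)
```

All logs of one SQL share the same number, so the number identifies the SQL's position in the transaction regardless of which log stream a log ends up in.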

The implementation of step 203 when corresponding to business scenario B is described below.

As described above, when applied to business scenario B, a sequence number is actually an operation sequence number, namely that of the changed delete operation or that of the changed insert operation. One implementation of step 203 includes performing step 2031: setting the first sequence number to precede the second sequence number.

In one embodiment of the present disclosure, in the business scenario B, when executing a transaction, a first operation may be scheduled to be executed and then a second operation may be scheduled to be executed, so that an implementation procedure of step 2031 includes:

step 20311: setting a first sequence number according to the time of executing the first operation, for example, the first sequence number is marked as Ltime_NO1; and, the second sequence number is set according to the time of executing the second operation, for example, the second sequence number is marked as Ltime_NO2, thereby ensuring that the first sequence number is before and the second sequence number is after.

In business scenario B, the following two cases can in fact be distinguished:

Case 1: the partition movement is performed within the same machine node, i.e., the first partition table and the second partition table are located in the same machine node.

In case 1, the local time may be used as the sequence number. In this case, a specific implementation of step 20311 includes: setting the first sequence number to the local time of the machine node when the first operation is executed, and setting the second sequence number to the local time of the machine node when the second operation is executed. For example, because the first operation (i.e., the changed delete operation) is scheduled for execution before the second operation (i.e., the changed insert operation), the local time at which the first operation is executed is earlier than the local time at which the second operation is executed. Taking these two local times as the first sequence number and the second sequence number respectively therefore guarantees that the first sequence number precedes the second sequence number.
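Case 1 can be sketched in Python as follows. The LocalClock helper is hypothetical, and the rule of bumping the value when the operating system clock has not advanced is an added assumption, used here only so that two consecutive readings on the same node are never equal.

```python
import time


class LocalClock:
    """Local time source of one machine node (hypothetical helper)."""

    def __init__(self):
        self._last = 0

    def now(self):
        # Assumption: if the OS clock did not advance (or went backwards),
        # step past the previous reading so readings stay strictly increasing.
        self._last = max(time.time_ns(), self._last + 1)
        return self._last


clock = LocalClock()
ltime_no1 = clock.now()  # taken when the changed delete operation executes
ltime_no2 = clock.now()  # taken when the changed insert operation executes

print(ltime_no1 < ltime_no2)
```

Because both readings come from the same node and the delete is scheduled first, the first sequence number precedes the second.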

Case 2: the partition movement takes place in different machine nodes, i.e. the first partition table and the second partition table are located in different machine nodes.

In case 2, the local time as well as the logical time may be utilized as the sequence number. At this time, a specific implementation procedure of step 20311 includes:

step 203111: setting the first sequence number as the local time of the first machine node when the first operation is executed;

step 203113: transmitting the local time when the first machine node performs the first operation to the second machine node;

step 203115: the second machine node receives the local time of the first machine node at which the first operation was executed, and pushes up the logical time in the second machine node on the basis of the received local time; and

step 203117: setting the second sequence number to the pushed-up logical time at which the second machine node executes the second operation.

For example, when machine node 1 executes the first operation locally, the local time is time value 1: 10:00 a.m. on September 1, 2022. Time value 1 is used as the first sequence number and is sent to machine node 2. Machine node 2 pushes up its logical time to 10:01 a.m. on September 1, 2022 on the basis of time value 1, and the pushed-up logical time, 10:01 a.m. on September 1, 2022, is used as the second sequence number.
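Steps 203111 to 203117 can be sketched in Python as follows. This is a minimal illustration under assumptions: times are modelled as integer minutes since midnight, the MachineNode class is hypothetical, and the push-up rule (move one unit past the larger of the received time and the node's own logical time) is one plausible choice rather than the rule prescribed by the specification.

```python
class MachineNode:
    """One machine node with a local clock reading and a logical time."""

    def __init__(self, local_time):
        self.local_time = local_time    # integer reading, here in minutes
        self.logical_time = local_time

    def push_up(self, received_time):
        # Assumed rule: move strictly past both the received time and the
        # node's own current logical time.
        self.logical_time = max(self.logical_time, received_time) + 1
        return self.logical_time


# 10:00 and 09:58 expressed as minutes since midnight (illustrative values).
node1 = MachineNode(local_time=10 * 60)
node2 = MachineNode(local_time=9 * 60 + 58)  # node 2's clock lags behind

first_seq_no = node1.local_time          # step 203111
received = first_seq_no                  # step 203113: sent to node 2
second_seq_no = node2.push_up(received)  # steps 203115 and 203117

print(first_seq_no, second_seq_no)
```

Even though node 2's own clock is behind node 1's in this sketch, the second sequence number ends up later than the first, which is the property the two steps are meant to guarantee.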

For case 2, in one embodiment of the present specification, the SQL that triggers generation of the transaction includes a plurality of the third SQLs described above. To further improve the efficiency of processing transaction logs, the multiple delete operations executed in one machine node and the multiple insert operations executed in another machine node may each be processed together as a batch, thereby saving processing resources. In this case:

Firstly, scheduling a first machine node to execute all first operations changed by all third SQL in the node;

thereafter, the above step 203111 is executed, and the specific implementation process includes: and taking the local time of the first machine node when executing each first operation as a first serial number corresponding to the first operation.

Thereafter, the above step 203113 is executed, and the specific implementation process includes: obtaining all local time when the first machine node executes all changed first operations; selecting the maximum local time from all the local times; and sending the selected maximum local time to the second machine node.

Then, the second machine node is scheduled to execute all second operations changed by all third SQL in the node;

thereafter, the above step 203115 is executed, and the specific implementation process includes: pushing up the logical time on the basis of the maximum local time of the first machine node received by the second machine node, and gradually increasing the second sequence number corresponding to each second operation on the basis of the pushed-up logical time.

For example, suppose machine node 1 needs to execute 100 delete operations changed from update operations, and machine node 2 needs to execute the 100 corresponding insert operations. Machine node 1 is first scheduled to execute the 100 changed delete operations. Each delete operation has a local time when executed, and these 100 local times are used as the sequence numbers carried in the logs of the corresponding delete operations. The maximum value is selected from the 100 local times of machine node 1 and sent to machine node 2. Machine node 2 is then scheduled to execute the 100 changed insert operations. It pushes up its local logical time on the basis of the received maximum value and continues to push up the logical time step by step for each changed insert operation, and the 100 gradually increasing logical times are used as the sequence numbers carried in the logs of the corresponding insert operations. It can be seen that the 100 local times at which machine node 1 executes the 100 delete operations need not all be sent to machine node 2. For a batch of changed delete operations, only one message carrying only the maximum local time is sent, which greatly saves the processing resources of the system.
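The batched variant can be sketched in Python as follows. The function assign_batch, the integer time values and the step of one unit per insert operation are illustrative assumptions; the specification only requires that the logical times increase gradually from the received maximum.

```python
def assign_batch(delete_local_times, insert_count, node2_logical_time=0):
    """Assign sequence numbers for a batch of row-moving updates.

    delete_local_times: local times on node 1, one per changed delete.
    insert_count: number of changed inserts to execute on node 2.
    """
    # Node 1: each delete log carries its own local time.
    first_seq_nos = list(delete_local_times)

    # Only this single value is sent from node 1 to node 2.
    max_local_time = max(first_seq_nos)

    # Node 2: push the logical time up past the received maximum, then keep
    # increasing it by one unit (assumed step) for every insert.
    base = max(node2_logical_time, max_local_time)
    second_seq_nos = [base + i for i in range(1, insert_count + 1)]
    return first_seq_nos, max_local_time, second_seq_nos


deletes = list(range(1000, 1100))  # 100 illustrative local times
first, sent, second = assign_batch(deletes, insert_count=100)

print(sent, second[0], second[-1])
```

Every insert sequence number in this sketch is larger than every delete sequence number, although only one value crossed from node 1 to node 2.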

Step 209 is next performed: and writing the first log and the second log into two log streams in the distributed database respectively.

For example, for the above-mentioned business scenario a, the process from step 203 to step 209 may be shown in fig. 5.

For example, for business scenario B above, the process of steps 203 to 209 is shown in fig. 6. For example, SQL1 is a statement involving partition movement, the log corresponding to the changed delete operation is ROW1, and the log corresponding to the changed insert operation is ROW2. The local time, denoted Ltime_NO1, is carried in ROW1 as the first sequence number. The logical time obtained by pushing up on the basis of Ltime_NO1, denoted Ltime_NO2, is carried in ROW2 as the second sequence number.

The above describes a method of processing a transaction log executed in a transaction executor.

The following describes a method of processing a transaction log performed in the data synchronization apparatus.

FIG. 7 is a flow chart of a method for processing transaction logs provided by an embodiment of the present disclosure. The execution subject of the method is the data synchronization device shown in fig. 1, and is applied to a distributed database adopting multiple log streams. It will be appreciated that the method may be performed by any apparatus, device, platform, cluster of devices, having computing, processing capabilities. Referring to fig. 7, the method includes:

Step 701: at least two log streams are obtained from a distributed database.

Step 703: obtaining a first log and a second log from any two log streams; the first log carries a first sequence number, and the second log carries a second sequence number.

Step 705: obtaining the execution sequence between a first operation corresponding to the first log and a second operation corresponding to the second log according to the first sequence number carried in the first log and the second sequence number carried in the second log; the sequence between the first sequence number and the second sequence number is the same as the execution sequence between the first operation and the second operation.

Step 707: and obtaining transaction data corresponding to the transactions to which the first operation and the second operation belong according to the execution sequence between the first operation and the second operation.

For an understanding of the process shown in fig. 7, reference may be made to the description given above in connection with figs. 2 to 6.

When the method of the embodiments of the present disclosure is applied to business scenario A, the sequence number carried in the log includes the statement sequence number of the SQL. Referring to fig. 5, based on the order of the statement sequence numbers carried in the logs, it is determined that the 3 operations corresponding to the logs ROW1, ROW2 and ROW3 carrying SQL_NO1 are executed first, the 3 operations corresponding to the logs ROW4, ROW5 and ROW6 carrying SQL_NO2 are executed next, and the 3 operations corresponding to the logs ROW7, ROW8 and ROW9 carrying SQL_NO3 are executed last. Thus, in the process shown in fig. 7, the execution order of any two operations corresponding to any two different SQLs can be determined, so that the transaction data of the transaction tx1 is obtained in the correct execution order.
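On the data synchronization side, recovering the execution order for business scenario A can be sketched in Python as follows. The tuple layout, the integer statement sequence numbers and the helper order_by_sequence_number are illustrative assumptions.

```python
# (log name, statement sequence number) pairs as read from each log stream.
ls1 = [("ROW1", 1), ("ROW4", 2), ("ROW7", 3)]
ls2 = [("ROW2", 1), ("ROW5", 2), ("ROW8", 3)]
ls3 = [("ROW3", 1), ("ROW6", 2), ("ROW9", 3)]


def order_by_sequence_number(*streams):
    """Aggregate logs from all streams and order them by sequence number."""
    all_logs = [log for stream in streams for log in stream]
    # sorted() is stable, so logs sharing a statement sequence number keep
    # the relative order in which they were read.
    return [name for name, _seq in sorted(all_logs, key=lambda log: log[1])]


ordered = order_by_sequence_number(ls1, ls2, ls3)
print(ordered)
```

The sketch orders logs only across different SQLs; it makes no claim about the relative order of operations that belong to the same SQL.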

When the method of the embodiments of the present disclosure is applied to business scenario B, the sequence number carried in the log includes the operation sequence number, which may specifically be a time value. Referring to fig. 6, based on the order of the time values carried in the logs ROW1 and ROW2, it is determined that operation 1 (the changed delete operation) corresponding to the log ROW1 carrying Ltime_NO1 is executed first, and operation 2 (the changed insert operation) corresponding to the log ROW2 carrying Ltime_NO2 is executed afterwards. Thus, in the process shown in fig. 7, the execution order between operations 1 and 2 can be determined, so that the transaction data of the transaction tx1 is obtained in the correct execution order.
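For business scenario B, step 705 reduces to comparing the time values carried in the two logs. A minimal Python sketch follows; the dictionary layout, the integer time values and the helper execution_order are hypothetical.

```python
def execution_order(first_log, second_log):
    """Return the two logs ordered by the time value each one carries."""
    return sorted([first_log, second_log], key=lambda log: log["seq_no"])


# The insert log happens to be read first from its log stream.
row2 = {"name": "ROW2", "op": "insert", "seq_no": 601}  # carries Ltime_NO2
row1 = {"name": "ROW1", "op": "delete", "seq_no": 600}  # carries Ltime_NO1

ordered = execution_order(row2, row1)
print([log["name"] for log in ordered])
```

Regardless of which log stream is read first, the changed delete operation is placed before the changed insert operation.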

In one embodiment of the present specification, an apparatus for processing transaction logs is presented, see fig. 8, for a transaction executor for a distributed database employing multiple log streams, comprising:

a transaction execution module 801 configured to execute a transaction against the distributed database; wherein the transaction comprises a first operation and a second operation;

a sequence number determination module 802 configured to determine a first sequence number and determine a second sequence number; the sequence between the first sequence number and the second sequence number is the same as the execution sequence between the first operation and the second operation;

The log generating module 803 is configured to generate a first log corresponding to the first operation and a second log corresponding to the second operation;

the log writing module 804 is configured to carry a first sequence number in the first log and a second sequence number in the second log; and writing the first log and the second log into two log streams in the distributed database respectively.

In one embodiment of the apparatus of the present specification shown in fig. 8, the Structured Query Language (SQL) triggering generation of the transaction includes a first SQL and a second SQL; each SQL corresponds to at least one operation in the transaction;

the first operation is any operation corresponding to the first SQL, and the second operation is any operation corresponding to the second SQL; the execution sequence between the first operation and the second operation is equal to the execution sequence between the first SQL and the second SQL;

a sequence number determination module 802 configured to perform:

generating a first statement sequence number that corresponds to the first SQL and is unique within the transaction, and generating a second statement sequence number that corresponds to the second SQL and is unique within the transaction; the order between the first statement sequence number and the second statement sequence number is the same as the execution order between the first SQL and the second SQL; and

taking the first statement sequence number as the first sequence number, and taking the second statement sequence number as the second sequence number.

In one embodiment of the apparatus of the present specification shown in figure 8,

the Structured Query Language (SQL) triggering generation of the transaction comprises a third SQL corresponding to an update operation;

the transaction execution module 801 is configured to: when the transaction is executed, change the update operation corresponding to the third SQL into the first operation and the second operation so as to move the data in the first partition table into the second partition table; wherein the first operation is a changed delete operation and the second operation is a changed insert operation;

a sequence number determination module 802 configured to perform: the first sequence number is set before and the second sequence number is set after.

the transaction execution module 801 is configured to: when executing the transaction, scheduling and executing a first operation, and then scheduling and executing a second operation;

a sequence number determination module 802 configured to perform: the first sequence number is set according to the time of executing the first operation, and the second sequence number is set according to the time of executing the second operation.

In one embodiment of the apparatus of the present specification shown in fig. 8, the first partition table and the second partition table are located in the same machine node;

a sequence number determination module 802 configured to perform: taking the local time of the machine node when executing the first operation as the first sequence number; and taking the local time of the machine node when the second operation is executed as the second sequence number.

In one embodiment of the apparatus of the present specification shown in fig. 8, the first partition table and the second partition table are located in different first machine nodes and second machine nodes, respectively;

a sequence number determination module 802 configured to perform: taking the local time of the first machine node when executing the first operation as the first sequence number; transmitting the local time when the first machine node performs the first operation to the second machine node; causing the second machine node to push up a logical time in the second machine node based on the received local time in the first machine node; and taking the logic time when the second machine node executes the second operation as the second sequence number.

In one embodiment of the apparatus of the present specification shown in fig. 8, the third SQL is included in the SQL that triggers the generation of each transaction; the first operation and the second operation changed according to each third SQL are used for moving data in one partition table to another partition table;

The transaction execution module 801 is configured to execute: scheduling the first machine node to execute all first operations changed by all third SQL in the first machine node; then, the second machine node is scheduled to execute all second operations changed by all third SQL in the node;

the sequence number determination module 802 is configured to perform:

selecting the maximum local time from all the local times;

and sending the selected maximum local time to the second machine node.

In one embodiment of the present specification, referring to fig. 9, an apparatus for processing transaction logs is provided, which is applied to a data synchronization device, and is suitable for a distributed database employing multiple log streams. The device comprises:

the log stream obtaining module 901 is configured to obtain at least two log streams from the distributed database;

the log obtaining module 902 is configured to obtain a first log and a second log from any two log streams; the first log carries a first sequence number, and the second log carries a second sequence number;

the operation sequence determining module 903 is configured to obtain an execution sequence between a first operation corresponding to the first log and a second operation corresponding to the second log according to the first sequence number carried in the first log and the second sequence number carried in the second log; the sequence between the first sequence number and the second sequence number is the same as the execution sequence between the first operation and the second operation;

The synchronization processing module 904 is configured to obtain transaction data corresponding to the transactions to which the first operation and the second operation belong according to an execution sequence between the first operation and the second operation.

In one embodiment of the apparatus of the present specification shown in fig. 9, the sequence numbers include: statement sequence number of SQL; the first operation corresponds to a different SQL than the second operation.

In one embodiment of the apparatus of the present specification shown in fig. 9, the sequence numbers include: time value.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.

Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The present description also provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the steps of the method of any of the preceding method embodiments.

From the foregoing description of embodiments, it will be apparent to those skilled in the art that the present embodiments may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be embodied in essence or what contributes to the prior art in the form of a computer program product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present specification.

The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the foregoing is by way of illustration and description only, and is not intended to limit the scope of the invention.

Claims

1. A method of processing a transaction log, wherein the transaction is: a database operation sequence for accessing and/or operating on data; the method is applied to a distributed database adopting multiple log streams;

the method comprises the following steps:

2. The method of claim 1, the method further comprising:

3. The method of claim 1, wherein,

In executing the transaction, further comprising: scheduling and executing a first operation, and scheduling and executing a second operation;

4. The method of claim 3, wherein the first partition table and the second partition table are located in the same machine node;

5. A method according to claim 3, wherein the first and second partition tables are located in different first and second machine nodes, respectively;

6. The method of claim 5, wherein each SQL that triggers generation of each transaction includes a plurality of third SQLs, and the first operation and the second operation that are changed according to each third SQL are used to move data in one partition table to another partition table;

selecting the maximum local time from all the local times;

and sending the selected maximum local time to the second machine node.

7. A method of processing a transaction log, wherein the transaction is: a database operation sequence for accessing and/or operating on data; the method is applied to a distributed database adopting multiple log streams;

the method comprises the following steps:

acquiring at least two log streams from the distributed database;

8. A means for processing a transaction log, wherein the transaction is: a database operation sequence for accessing and/or operating on data; the device is applied to a distributed database adopting multiple log streams; one data table in the distributed database comprises a first partition table and a second partition table;

the device comprises:

9. A means for processing a transaction log, wherein the transaction is: a database operation sequence for accessing and/or operating on data; the device is applied to a distributed database adopting multiple log streams; one data table in the distributed database comprises a first partition table and a second partition table;

the device comprises:

10. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1 to 7.