CN116501539A

CN116501539A - Data processing method and related device

Info

Publication number: CN116501539A
Application number: CN202210061367.9A
Authority: CN
Inventors: 柴云鹏; 任波; 骆远辉; 黄人煌; 王元桢
Original assignee: Renmin University of China; Huawei Cloud Computing Technologies Co Ltd
Current assignee: Renmin University of China; Huawei Cloud Computing Technologies Co Ltd
Priority date: 2022-01-19
Filing date: 2022-01-19
Publication date: 2023-07-28

Abstract

The application discloses a data processing method and a related device. A first device acquires a transaction table that indicates the state of each transaction and the storage address of its log chain; each transaction corresponds to one log chain, the log chain comprises at least one chain node, and each chain node indicates the storage address of one tuple of the transaction. The first device obtains the log chain of a transaction to be recovered according to the transaction table, where a transaction in the undo state in the transaction table is a transaction to be recovered. The first device determines the storage address of each tuple of the transaction to be recovered according to that log chain, so as to acquire the tuples, and then deletes the log chain and the tuples of the transaction to be recovered. In this method, the transaction to be recovered can be found through the transaction table and its tuples can be located through its log chain, so the first device does not need to perform a full table scan, which improves the efficiency of data recovery and data processing.

Description

Data processing method and related device

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a data processing method and a related device.

Background

Write-behind logging (WBL) is a logging and recovery protocol designed for non-volatile memory (NVM). Its key idea is to record which parts of the database have changed, not how the data has changed. WBL requires that the data modified by a transaction be persisted before the log is persisted; the log tracks the data modified by the transaction so that uncommitted transactions can be undone.

Zen's log-free policy is one implementation of WBL. In the Zen mechanism, log information is written into each tuple of the transaction, which reduces write operations to the log. However, when performing data recovery, Zen needs to scan the full table data in the database twice to identify the uncommitted transactions whose data must be rolled back.

Because the database stores a large amount of data, scanning the full table twice incurs a large system overhead and makes data recovery take too long.

Disclosure of Invention

The embodiment of the application provides a data processing method and a related device, which are used for improving the efficiency of data processing.

In a first aspect, an embodiment of the present application provides a data processing method. A first device obtains a transaction table that indicates the state of each transaction and the storage address of its log chain, where each transaction corresponds to one log chain, the log chain includes at least one chain node, and each chain node indicates the storage address of one tuple of the transaction. The first device obtains the log chain of a transaction to be recovered according to the transaction table, where a transaction in the undo state in the transaction table is a transaction to be recovered. The first device determines the storage address of each tuple of the transaction to be recovered according to that log chain, so as to acquire the tuples. The first device deletes the log chain and the tuples of the transaction to be recovered.

In this method, the transaction to be recovered can be found through the transaction table, and its tuples can be located through its log chain, so the first device does not need to perform a full table scan, which improves the efficiency of data recovery and data processing. In addition, the log chain in the present application does not carry the tuple data of the transaction, which avoids the overhead of extra write operations when the log chain is used. Compared with the write-ahead logging (WAL) mechanism, in which the log file carries the transaction tuples, this offers a significant performance advantage.

Based on the first aspect, in an optional implementation manner, the log chain includes a plurality of chain nodes connected in series, and the chain nodes are associated with one another.

Based on the first aspect, in an optional implementation manner, the storage address of the log chain is the storage address of the head node of the log chain.

Based on the first aspect, in an optional implementation manner, the first device obtains a log chain of the transaction to be recovered according to the transaction table, including:

the first device determines that the transaction in the undo state is a transaction to be recovered according to the transaction table;

The first device determines a storage address of a log chain of the transaction to be recovered according to the transaction table;

and the first device acquires the log chain of the transaction to be recovered according to the storage address of the log chain of the transaction to be recovered.
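
The three steps above, followed by the deletion step of the first aspect, can be sketched in Python. This is a minimal illustration under stated assumptions, not the claimed implementation: dictionaries stand in for persistent storage, the names `recover`, `txn_table`, `chain_store`, and `tuple_store` and the state strings `"undo"` and `"committed"` are hypothetical, and removing the transaction table entry after rollback is an added assumption.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, int]  # (table name, position), hypothetical addressing


def recover(txn_table: Dict[str, dict],
            chain_store: Dict[int, Tuple[Location, Optional[int]]],
            tuple_store: Dict[Location, str]) -> List[str]:
    """Roll back every transaction in the undo state; return their ids."""
    rolled_back = []
    for txn_id, entry in list(txn_table.items()):
        if entry["state"] != "undo":
            continue  # committed transactions are left untouched
        addr = entry["head"]  # storage address of the head chain node
        while addr is not None:
            location, next_addr = chain_store.pop(addr)  # delete the chain node
            tuple_store.pop(location, None)  # delete the tuple if it was written
            addr = next_addr
        del txn_table[txn_id]  # assumption: the table entry is dropped as well
        rolled_back.append(txn_id)
    return rolled_back


# T1 is committed; T2 was interrupted after only one of its tuples was written.
txn_table = {
    "T1": {"state": "committed", "head": 100},
    "T2": {"state": "undo", "head": 200},
}
chain_store = {
    100: (("A", 1), None),
    200: (("A", 2), 201),
    201: (("B", 1), None),
}
tuple_store = {("A", 1): "t1-row", ("A", 2): "t2-row"}

rolled_back = recover(txn_table, chain_store, tuple_store)
```

Only the entries reachable from the head address of an undo-state transaction are visited, so the work is proportional to the size of the interrupted transactions rather than to the size of the tables.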

Based on the first aspect, in an optional implementation manner, the method is applied to a primary/standby database system, the first device is the primary database, a second device is the standby database, the transaction table further indicates the commit time of each transaction, and the method further includes:

the first device determines an incremental transaction according to the transaction table, wherein the commit time of the incremental transaction is after the latest backup time of the primary/standby database system;

the first device acquires the log chain of the incremental transaction according to the transaction table;

the first device determines the storage address of each tuple of the incremental transaction according to the log chain of the incremental transaction, so as to obtain the tuples of the incremental transaction;

the first device synchronizes the log chain and the tuples of the incremental transaction to the second device.

Based on the first aspect, in an optional implementation manner, the first device obtaining the log chain of the incremental transaction according to the transaction table includes:

the first device determines the storage address of the log chain of the incremental transaction according to the transaction table;

and the first device acquires the log chain of the incremental transaction according to the storage address of the log chain of the incremental transaction.
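
The incremental synchronization described above can be sketched as follows. This is a hedged Python illustration, not the claimed implementation: dictionaries stand in for storage on both devices, the function names `collect_incremental` and `apply_on_standby` are hypothetical, integer commit times are used for simplicity, and the standby database is assumed to reuse the same chain node addresses as the primary database.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, int]  # (table name, position), hypothetical addressing
ChainStore = Dict[int, Tuple[Location, Optional[int]]]  # address -> (location, next)


def collect_incremental(txn_table: Dict[str, dict],
                        chain_store: ChainStore,
                        tuple_store: Dict[Location, str],
                        last_sync_time: int) -> List[tuple]:
    """Gather the log chain and tuples of transactions committed after last_sync_time."""
    batch = []
    for txn_id, entry in txn_table.items():
        commit_time = entry["commit_time"]
        if commit_time is None or commit_time <= last_sync_time:
            continue  # uncommitted, or already synchronized
        nodes, rows = [], {}
        addr = entry["head"]
        while addr is not None:  # follow the chain from its head node
            location, next_addr = chain_store[addr]
            nodes.append((addr, location, next_addr))
            rows[location] = tuple_store[location]
            addr = next_addr
        batch.append((txn_id, nodes, rows))
    return batch


def apply_on_standby(batch: List[tuple],
                     standby_chains: ChainStore,
                     standby_tuples: Dict[Location, str]) -> None:
    """Install the synchronized chain nodes and tuples on the second device."""
    for _txn_id, nodes, rows in batch:
        for addr, location, next_addr in nodes:
            standby_chains[addr] = (location, next_addr)
        standby_tuples.update(rows)


# Only T2 committed after the last synchronization at time 10.
primary_table = {
    "T1": {"commit_time": 8, "head": 100},
    "T2": {"commit_time": 12, "head": 200},
}
primary_chains = {100: (("A", 1), None), 200: (("A", 2), 201), 201: (("B", 1), None)}
primary_tuples = {("A", 1): "row-1", ("A", 2): "row-2", ("B", 1): "row-3"}

batch = collect_incremental(primary_table, primary_chains, primary_tuples, last_sync_time=10)
standby_chains, standby_tuples = {}, {}
apply_on_standby(batch, standby_chains, standby_tuples)
```

Because the chain nodes hold only addresses, each tuple is shipped once together with a small pointer record, rather than once in a data file and again inside a log record.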

In a second aspect, an embodiment of the present application provides a data processing method, where the method is applied to a primary/standby database system, a first device is the primary database, and a second device is the standby database, and the method includes:

the first device obtains a transaction table, wherein the transaction table indicates the commit time of each transaction and the storage address of its log chain, each transaction corresponds to one log chain, the log chain includes at least one chain node, and each chain node indicates the storage address of one tuple of the transaction;

the first device determines an incremental transaction according to the transaction table, wherein the commit time of the incremental transaction is after the latest data synchronization time of the primary/standby database system;

Based on the second aspect, in an optional implementation manner, the log chain includes a plurality of chain nodes connected in series, and the chain nodes are associated with one another.

Based on the second aspect, in an optional implementation manner, the storage address of the log chain is the storage address of the head node of the log chain.

Based on the second aspect, in an optional implementation manner, the first device obtains a log chain of incremental transactions according to a transaction table, including:

In a third aspect, an embodiment of the present application provides a data processing apparatus, including:

an acquisition unit, configured to acquire a transaction table, wherein the transaction table indicates the state of each transaction and the storage address of its log chain, each transaction corresponds to one log chain, the log chain includes at least one chain node, and each chain node indicates the storage address of one tuple of the transaction;

the acquisition unit is further used for acquiring a log chain of the transaction to be recovered according to the transaction table, wherein the transaction in the undo state in the transaction table is the transaction to be recovered;

the acquisition unit is further used for determining the storage address of the tuple of the transaction to be recovered according to the log chain of the transaction to be recovered so as to acquire the tuple of the transaction to be recovered;

and a deleting unit, configured to delete the log chain and the tuples of the transaction to be recovered.

Based on the third aspect, in an optional implementation manner, the log chain includes a plurality of chain nodes connected in series, and the chain nodes are associated with one another.

Based on the third aspect, in an optional implementation manner, the storage address of the log chain is the storage address of the head node of the log chain.

Based on the third aspect, in an optional implementation manner, the acquiring unit is specifically configured to:

determining the transaction in the undo state as a transaction to be recovered according to the transaction table;

determining a storage address of a log chain of the transaction to be recovered according to the transaction table;

and acquiring the log chain of the transaction to be recovered according to the storage address of the log chain of the transaction to be recovered.

Based on the third aspect, in an optional implementation manner, the data processing apparatus is applied to a primary/standby database system, the data processing apparatus is the primary database, a second device is the standby database, the transaction table further indicates the commit time of each transaction, and the data processing apparatus further includes a determining unit and a synchronization unit, wherein:

the determining unit is configured to determine an incremental transaction according to the transaction table, wherein the commit time of the incremental transaction is after the latest backup time of the primary/standby database system;

the acquisition unit is further configured to acquire the log chain of the incremental transaction according to the transaction table;

the acquisition unit is further configured to determine the storage address of each tuple of the incremental transaction according to the log chain of the incremental transaction, so as to acquire the tuples of the incremental transaction;

and the synchronization unit is configured to synchronize the log chain and the tuples of the incremental transaction to the second device.

determining a storage address of a log chain of the increment transaction according to the transaction table;

and acquiring the log chain of the increment transaction according to the storage address of the log chain of the increment transaction.

In a fourth aspect, an embodiment of the present application provides a data processing apparatus, where the data processing apparatus is applied to a primary/standby database system, the data processing apparatus is the primary database, and a second device is the standby database, and the data processing apparatus includes:

an acquisition unit, configured to acquire a transaction table, wherein the transaction table indicates the commit time of each transaction and the storage address of its log chain, each transaction corresponds to one log chain, the log chain includes at least one chain node, and each chain node indicates the storage address of one tuple of the transaction;

a determining unit, configured to determine an incremental transaction according to the transaction table, wherein the commit time of the incremental transaction is after the latest data synchronization time of the primary/standby database system;

the determining unit is further configured to determine the storage address of each tuple of the incremental transaction according to the log chain of the incremental transaction, so as to acquire the tuples of the incremental transaction;

Based on the fourth aspect, in an optional implementation manner, the log chain includes a plurality of chain nodes connected in series, and the chain nodes are associated with one another.

Based on the fourth aspect, in an optional implementation manner, the storage address of the log chain is the storage address of the head node of the log chain.

Based on the fourth aspect, in an optional implementation manner, the acquiring unit is specifically configured to:

In a fifth aspect, embodiments of the present invention provide a computer device comprising a memory, a communication interface, and a processor coupled to the memory and the communication interface; the memory is used for storing instructions, the processor is used for executing the instructions, and the communication interface is used for communicating with other devices under the control of the processor; wherein the processor, when executing the instructions, performs the method of data processing as described in any of the above aspects.

In a sixth aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, which when run on a computer, causes the computer to perform the method of data processing according to any one of the above aspects.

In a seventh aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions which, when run on a computer, cause the computer to perform the method of data processing of any of the above aspects.

From the above technical solutions, the embodiments of the present application have the following advantages:

The application discloses a data processing method and a related device. A first device acquires a transaction table that indicates the state of each transaction and the storage address of its log chain; each transaction corresponds to one log chain, the log chain comprises at least one chain node, and each chain node indicates the storage address of one tuple of the transaction. The first device obtains the log chain of a transaction to be recovered according to the transaction table, where a transaction in the undo state in the transaction table is a transaction to be recovered. The first device determines the storage address of each tuple of the transaction to be recovered according to that log chain, so as to acquire the tuples, and then deletes the log chain and the tuples of the transaction to be recovered. In this method, the transaction to be recovered can be found through the transaction table and its tuples can be located through its log chain, so the first device does not need to perform a full table scan, which improves the efficiency of data recovery and data processing. In addition, the log chain in the present application does not carry the tuple data of the transaction, which avoids the overhead of extra write operations when the log chain is used. Compared with the write-ahead logging (WAL) mechanism, in which the log file carries the transaction tuples, this offers a significant performance advantage.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.

FIG. 1 is a schematic diagram of write-ahead logging and write-behind logging in a database system;

FIG. 2 is a schematic diagram of an architecture based on a Zen system;

FIG. 3 is a schematic diagram of a logging mechanism provided herein;

FIG. 4 is a flow chart of a data processing method in the present application;

FIG. 5 is a schematic diagram of a primary and backup database system according to the present application;

FIG. 6 is a schematic diagram of the structure of a log chain in the present application;

FIG. 7 is a schematic diagram of a scenario of data recovery in an embodiment of the present application;

FIG. 8 is a schematic diagram of a data synchronization flow in the present application;

FIG. 9 is a schematic diagram of a scenario of data synchronization of a primary and a secondary database system according to an embodiment of the present application;

FIG. 10 is a schematic diagram of another scenario of data synchronization of a primary and a secondary database system according to an embodiment of the present application;

FIG. 11 is a schematic diagram of a log chain mechanism applied to a data recovery scenario and a data synchronization scenario in an embodiment of the present application;

FIG. 12 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;

FIG. 13 is a schematic diagram of another data processing apparatus according to an embodiment of the present disclosure;

FIG. 14 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

Embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. The terminology used in the description of the embodiments is for the purpose of describing particular embodiments only and is not intended to limit the invention. As one of ordinary skill in the art will appreciate, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.

In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of" or a similar expression means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Database security involves many aspects, and the loss of or tampering with data in a database can cause immeasurable damage, so database security is particularly important. At present, the data in a database is mainly analyzed through the database's logging, and corresponding measures such as data recovery or data synchronization are then taken based on the analysis results.

Next, several common logging mechanisms are described.

A: Write-ahead logging (WAL) is a common means in database systems of ensuring the atomicity and durability of data operations. In computer science, WAL is a family of techniques used in relational database systems to provide atomicity and durability. In a database system that uses WAL, all modifications are written to a log before they are committed.

The log file typically includes redo information and undo information. Its purpose can be illustrated by an example. Suppose the machine loses power while a program is performing certain operations. On restart, the program may need to know whether the operation in progress at that time succeeded, partially succeeded, or failed. If WAL is used, the program can examine the log file and compare the operations that were planned at the time of the power loss with the operations that were actually performed. Based on this comparison, the program can decide whether to undo the operations already performed, complete the unfinished operations, or leave things as they are.

However, one problem with WAL is that the log file contains both log information and data information: the same piece of data is stored in both the database file and the log file, so the data is written twice, which causes write amplification. In a primary/standby database system in particular, the WAL log incurs significant write overhead when the primary database device performs data synchronization.

B: Write-behind logging (WBL) is a logging and recovery protocol designed for non-volatile memory (NVM) databases. The advantages of NVM are byte addressability, performance close to that of memory, a small gap between sequential and random access performance, and retention of stored data when power is removed.

Referring to FIG. 1, FIG. 1 is a schematic diagram of write-ahead logging and write-behind logging in a database system. As shown in FIG. 1, in the WAL mechanism, the modifications that each transaction makes to the database are written sequentially to the WAL log. Before the data of a transaction is persisted, the corresponding log must be persisted, and both the new value and the old value are recorded, so that on the commit critical path only the log needs to be persisted, using sequential writes. The WBL mechanism differs from the WAL mechanism in that its key idea is to record which parts of the database have changed, not how the data has changed. WBL requires that the data modified by a transaction be persisted before the log is persisted; the log tracks the data modified by the transaction so that uncommitted transactions can be undone.

C: The Zen system is a high-throughput, log-free online transaction processing (OLTP) engine for NVM. Zen's log-free policy is one implementation of WBL. In the Zen mechanism, log information is written into each tuple of the transaction. Referring to FIG. 2, FIG. 2 is a schematic diagram of a Zen-based architecture. As shown in FIG. 2, when a transaction commits, its tuples in memory are written to NVM, and the (dirty-bit, LP) value of the last tuple written by the transaction is updated to indicate that the data of this transaction has been fully persisted. During data recovery, the full table is scanned to check all tuples; if the LP bit is not set to 1 in any of the tuples corresponding to a transaction, the transaction was not successfully persisted. While scanning each region, Zen uses a ts-commit variable to hold the maximum timestamp encountered so far, updating it each time a tuple with an LP value of 1 is encountered. If the timestamp of a tuple is less than or equal to ts-commit during the scan, Zen considers the associated transaction committed. If the timestamp of a tuple is greater than ts-commit, the tuple is placed in a queue to wait for a second scan; after the entire region has been scanned and the largest ts-commit has been obtained, the queue waiting for the second scan is processed.
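
The scan described above can be sketched in Python to show why every tuple in a region must be visited. The sketch follows only the description in this section and is a simplification, not Zen's actual implementation; the names `ScannedTuple` and `zen_style_recovery_scan` and the integer timestamps are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ScannedTuple(NamedTuple):
    key: str
    ts: int   # timestamp of the transaction that wrote the tuple
    lp: bool  # True on the last tuple persisted by its transaction


def zen_style_recovery_scan(region: List[ScannedTuple]) -> Tuple[int, List[str], List[str]]:
    """Return (ts_commit, committed keys, uncommitted keys) for one region."""
    ts_commit = 0
    committed: List[str] = []
    deferred: List[ScannedTuple] = []
    # First pass: every tuple in the region is visited.
    for t in region:
        if t.lp and t.ts > ts_commit:
            ts_commit = t.ts  # largest timestamp seen on an LP-marked tuple
        if t.ts <= ts_commit:
            committed.append(t.key)
        else:
            deferred.append(t)  # cannot be decided yet; wait for the second pass
    # Second pass: re-check the deferred tuples against the final ts_commit.
    uncommitted: List[str] = []
    for t in deferred:
        if t.ts <= ts_commit:
            committed.append(t.key)
        else:
            uncommitted.append(t.key)
    return ts_commit, committed, uncommitted


region = [
    ScannedTuple("a", ts=2, lp=False),
    ScannedTuple("b", ts=1, lp=True),
    ScannedTuple("c", ts=2, lp=True),
    ScannedTuple("d", ts=3, lp=False),
]
ts_commit, committed, uncommitted = zen_style_recovery_scan(region)
```

In this example tuple `a` is deferred during the first pass and accepted only in the second, while `d` is identified as belonging to an uncommitted transaction; both passes depend on having read the whole region.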

In summary, each of the above three log mechanisms has drawbacks. In the WAL mechanism, the log file contains both log information and data information, and the same data is stored in both the database file and the log file, which adds redundant overhead to write operations. In the WBL mechanism, after data recovery has been performed, every access must determine whether the accessed data is dirty, which adds overhead. Zen, when performing data recovery, needs to scan the full table data in the database twice to identify the uncommitted transactions whose data must be rolled back. Because the database stores a large amount of data, scanning the full table twice incurs a large system overhead and makes data recovery take too long.

In view of this, the embodiments of the present application provide a log mechanism, applied to the data processing method and related device of the present application, to improve the efficiency of data recovery. Referring to FIG. 3, FIG. 3 is a schematic diagram of the log mechanism provided in the present application. As shown in FIG. 3, in the Zen mechanism, log information is written into each tuple of the transaction. In the log mechanism provided by the present application, log information is expressed in the form of a log chain, and each transaction corresponds to one log chain. The tuples of the transaction are not fused with the log chain; in other words, the tuples and the log chain are stored at different storage addresses. Specifically, each log chain includes at least one chain node, and each tuple of a transaction has one corresponding chain node; that is, in the log chain of a transaction, each chain node indicates the storage address of its corresponding tuple. For example, in FIG. 3, transaction 1 has 3 tuples (data 1, data 2, and data 3), and the log chain corresponding to transaction 1 includes 3 chain nodes (chain node 1, chain node 2, and chain node 3). Chain node 1 indicates that data 1 is stored at position 1 in table 1, chain node 2 indicates that data 2 is stored at position 2 in table 1, and chain node 3 indicates that data 3 is stored at position 1 in table 2.
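
The log chain of transaction 1 in FIG. 3 can be modeled as a singly linked list in which each node stores only a tuple location. The following Python sketch is illustrative only; the class `ChainNode`, the helpers `build_chain` and `walk`, and the representation of a storage address as a (table, position) pair are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class ChainNode:
    """One chain node: records where a tuple is stored, not the tuple itself."""
    table: str
    position: int
    next: Optional["ChainNode"] = None  # serial link to the next chain node


def build_chain(locations: List[Tuple[str, int]]) -> Optional[ChainNode]:
    """Create one chain node per tuple location and return the head node."""
    head: Optional[ChainNode] = None
    for table, position in reversed(locations):
        head = ChainNode(table, position, head)
    return head


def walk(head: Optional[ChainNode]) -> Iterator[Tuple[str, int]]:
    """Yield the tuple location recorded by each chain node, in chain order."""
    node = head
    while node is not None:
        yield (node.table, node.position)
        node = node.next


# Transaction 1 in FIG. 3: data 1 and data 2 in table 1, data 3 in table 2.
chain_1 = build_chain([("table 1", 1), ("table 1", 2), ("table 2", 1)])
locations_1 = list(walk(chain_1))
```

Starting from the head node is enough to enumerate every tuple location of the transaction, and no tuple data is copied into the chain.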

In the present application, the storage location of the log chain is not limited. For example, an independent table may be configured to store all the log chains in the database together, so that the log chains of all transactions can be managed centrally; alternatively, a log chain may be stored in the tables where its tuples are located, which is not limited herein.

The data processing method can be applied to the data recovery flow of a database, and can also be applied to the data synchronization flow in a primary/standby database system. The data recovery flow in the present application is described first. Referring to FIG. 4, FIG. 4 is a flow chart of a data processing method in the present application. As shown in FIG. 4, the data processing method in the embodiment of the present application includes:

101. the first device obtains a transaction table.

Because NVM has characteristics such as a 256 B read-write granularity, asymmetric read and write performance, and a specific optimal degree of concurrency, the data processing method of the present application is preferably applied to an NVM database.

It should be noted that the first device in the present application may be a database on an independent physical server, a database on a server cluster formed by a plurality of physical servers, a distributed system database, or the primary database in a primary/standby database system. It may also be a cloud server database that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data or artificial intelligence platforms, which is not limited herein.

Taking the application of the data processing method to a primary/standby database system as an example, please refer to FIG. 5, which is a schematic diagram of the architecture of the primary/standby database system in the present application. As shown in FIG. 5, the primary database (equivalent to a database on an independent physical server) includes a free space management module, a transaction table, log chains, and table data. In the log mechanism provided in the present application, log information is expressed in the form of a "log chain", and each transaction corresponds to one log chain. The tuples of the transaction are not fused with the log chain; in other words, the tuples and the log chain are stored at different storage addresses. Each log chain includes at least one chain node, and each tuple of a transaction has one corresponding chain node; that is, in the log chain of a transaction, each chain node indicates the storage address of its corresponding tuple. When a transaction has a plurality of tuples, its log chain contains the same number of chain nodes, and the chain nodes in the same log chain are associated in series. Therefore, once any chain node in a log chain has been acquired, the other chain nodes associated with it in the same log chain can be traced, which improves the management efficiency of the log chain.

In the first device, when a transaction is created, a tuple is created to record the basic information (data) of the transaction. When a transaction is committed, each chain node of the log chain corresponding to the transaction is written first, and then each tuple is written. After all chain nodes of the transaction's log chain and all tuples of the transaction have been persisted, the commit process of the transaction is complete.
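
The commit ordering described above can be sketched as follows. This Python sketch is an illustration under assumptions, not the method of the application: dictionaries stand in for persistent storage, the function `commit` and its `crash_after` parameter (a test hook that imitates a power loss) are hypothetical, and registering the transaction in the undo state before any write is an added assumption.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, int]  # (table name, position), hypothetical addressing


def commit(txn_id: str,
           writes: List[Tuple[Location, str]],
           txn_table: Dict[str, dict],
           chain_store: Dict[int, Tuple[Location, Optional[int]]],
           tuple_store: Dict[Location, str],
           base_addr: int,
           crash_after: Optional[int] = None) -> bool:
    """Write chain nodes, then tuples, then mark the transaction committed."""
    # Assumed step: record the transaction in the undo state with its head address.
    txn_table[txn_id] = {"state": "undo", "head": base_addr if writes else None}
    # 1. Write every chain node of the log chain first.
    for i, (location, _data) in enumerate(writes):
        next_addr = base_addr + i + 1 if i + 1 < len(writes) else None
        chain_store[base_addr + i] = (location, next_addr)
    # 2. Then write the tuples themselves.
    for i, (location, data) in enumerate(writes):
        if crash_after is not None and i >= crash_after:
            return False  # simulated power loss: the state stays "undo"
        tuple_store[location] = data
    # 3. All chain nodes and tuples are persistent: the commit is complete.
    txn_table[txn_id]["state"] = "committed"
    return True


txn_table, chain_store, tuple_store = {}, {}, {}
ok_1 = commit("T1", [(("A", 1), "x"), (("B", 1), "y")],
              txn_table, chain_store, tuple_store, base_addr=100)
ok_2 = commit("T2", [(("A", 2), "p"), (("B", 2), "q")],
              txn_table, chain_store, tuple_store, base_addr=200, crash_after=1)
```

One plausible reading of the ordering is that, because the chain nodes are in place before any tuple is written, every tuple that did reach storage before an interruption can still be located from the transaction table.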

Since a large number of transactions are stored in the database (the first device), a transaction table is configured for the log chains so that the log chains of the transactions can be managed and queried centrally. The transaction table stores the states of all transactions in the first device and the storage addresses of the log chains corresponding to those transactions. By obtaining the state of a transaction from the transaction table, the first device can determine whether the transaction has been committed, whether it needs to be rolled back, or whether it needs data synchronization; by obtaining the storage address of a log chain, the first device can determine where the log chain of the transaction is stored. Further, the transaction table may also indicate the commit time of each transaction, and so on. Because a log chain includes several chain nodes, in this application the storage address of the head node of each log chain may be used as the storage address of the log chain. All chain nodes in the same log chain are serially associated, so every chain node in the log chain can be reached once the storage address of the head node is obtained. Therefore, the transaction table does not need to record the storage address of every chain node in the log chain, which saves storage resources and improves the management efficiency of the log chain.

As can be seen from the above, the number of transactions stored in the database (the first device) is huge, and each transaction corresponds to one log chain. Therefore, in the present application, the log chains of the transactions may further be connected in series, with the head node of each log chain connected to the tail of the previous log chain, so that the log chains stored in the first device are clearly arranged and convenient to manage. For ease of understanding, refer to fig. 6, which is a schematic diagram of the log chain structure in the present application. As shown in fig. 6, the transaction table of the first device indicates that two transactions, 0x001 and 0x002, have completed commit (are in the committed state), that the storage address of the head node of the log chain of transaction 0x001 is 0x001, and that the storage address of the head node of the log chain of transaction 0x002 is 0x002. The head nodes of the log chains of transactions 0x001 and 0x002 can be obtained through the head node storage addresses indicated by the transaction table. Further, the other chain nodes serially associated with a head node can be obtained through that head node, which yields the complete log chain, and the storage position of each tuple of the transaction is then determined through each chain node in the log chain. In fig. 6, the log chain of transaction 0x001 indicates that the tuples of the transaction are stored at positions 1a and 1b in Table A, position 2a in Table B, and position 3a in Table C, while the log chain of transaction 0x002 indicates that the tuples of the transaction are stored at position 1c in Table A and position 2b in Table B.
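The relationship between the transaction table and the log chains in fig. 6 can be sketched as follows. This is an illustration under stated assumptions: `TransactionEntry`, `ChainNode`, `load_log_chain`, and the state label "committed" are hypothetical names, a dictionary stands in for persistent storage, and the storage addresses of the non-head chain nodes (0x101 and so on) are invented because fig. 6 only gives the head node addresses. For simplicity each chain ends with `None`; the series connection between consecutive log chains is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Address = Tuple[str, str]  # hypothetical (table, position) address


@dataclass
class ChainNode:
    tuple_address: Address
    next_address: Optional[int] = None  # storage address of the next chain node


@dataclass
class TransactionEntry:
    state: str                           # e.g. "committed" or "undo" (labels assumed)
    head_address: int                    # only the head node's storage address is recorded
    commit_time: Optional[float] = None  # optional, as the table may also hold commit times


def load_log_chain(entry: TransactionEntry,
                   node_store: Dict[int, ChainNode]) -> List[Address]:
    """Follow the chain from the head node and collect every tuple address."""
    addresses: List[Address] = []
    address: Optional[int] = entry.head_address
    while address is not None:
        node = node_store[address]
        addresses.append(node.tuple_address)
        address = node.next_address
    return addresses


# Data modeled on fig. 6; non-head node addresses are made up for the sketch.
node_store = {
    0x001: ChainNode(("A", "1a"), 0x101),
    0x101: ChainNode(("A", "1b"), 0x102),
    0x102: ChainNode(("B", "2a"), 0x103),
    0x103: ChainNode(("C", "3a")),
    0x002: ChainNode(("A", "1c"), 0x201),
    0x201: ChainNode(("B", "2b")),
}
transaction_table = {
    0x001: TransactionEntry("committed", 0x001),
    0x002: TransactionEntry("committed", 0x002),
}
```

The table holds one address per transaction regardless of how many tuples the transaction wrote, which is the storage saving described above.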

The free space management module manages the log chains. After a transaction is rolled back, its log chain is deleted, and the free space is reclaimed by moving the head node of that log chain to the tail of the free chain in which the log chain is located. The table data is used to store the tuples of the transactions.
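One way to picture the reclamation step is a free chain that accepts a deleted log chain at its tail. The source does not specify how the free chain is organized, so the `FreeChain` class, its `reclaim` and `size` methods, and the clearing of stale tuple addresses are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChainNode:
    tuple_address: Optional[str]
    next: Optional["ChainNode"] = None


class FreeChain:
    """Hypothetical free chain: reclaimed chain nodes are linked at its tail."""

    def __init__(self) -> None:
        self.head: Optional[ChainNode] = None
        self.tail: Optional[ChainNode] = None

    def reclaim(self, chain_head: ChainNode) -> None:
        """Attach the head node of a deleted log chain to the tail of the free chain."""
        if self.tail is None:
            self.head = chain_head
        else:
            self.tail.next = chain_head
        # Walk to the last node of the reclaimed chain, clearing stale addresses.
        node = chain_head
        node.tuple_address = None
        while node.next is not None:
            node = node.next
            node.tuple_address = None
        self.tail = node

    def size(self) -> int:
        """Count the chain nodes currently available for reuse."""
        count, node = 0, self.head
        while node is not None:
            count += 1
            node = node.next
        return count
```

Linking at the head node means the whole chain is handed over with a single pointer update; the walk in this sketch only exists to clear addresses and find the new tail.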

102. The first device obtains a log chain of the transaction to be recovered according to the transaction table.

When a downtime failure occurs in the first device, a transaction that is in the middle of persistence often cannot finish persisting all of its tuples. To guarantee the atomicity and durability of transactions, such interrupted transactions that failed to persist all of their tuples, i.e., transactions in the undo state, need to be rolled back. Since the states of all transactions are recorded in the transaction table, after the first device obtains the transaction table, it can select the transactions in the undo state in the transaction table as the transactions to be recovered. The first device looks up the storage addresses of the log chains corresponding to the transactions to be recovered in the transaction table, and then obtains those log chains according to their storage addresses.

103. The first device determines a storage address of a tuple of the transaction to be recovered according to a log chain of the transaction to be recovered, so as to acquire the tuple of the transaction to be recovered.

In this application, each chain node in a log chain indicates the storage address of one tuple of the transaction. Thus, after the first device obtains the log chain of the transaction to be recovered, it obtains all tuples of the transaction to be recovered according to the indications of all chain nodes in that log chain.

104. The first device deletes the log chain and tuples of the transaction to be recovered.

The first device deletes the log chain and the tuples of the transaction to be recovered, which completes the current data recovery process.

For ease of understanding, refer to fig. 7, which is a schematic diagram of a data recovery scenario in an embodiment of the present application. As shown in fig. 7, in the data recovery scenario after downtime, the transaction table indicates that the state of the transaction numbered 0x004 is "active", i.e., the transaction is identified as being in the undo state. From the indication of the log chain of transaction 0x004, it can be determined that the tuple at position 1d in table 1 failed to complete persistence, so the log chain and all tuples of the transaction need to be deleted to complete the transaction rollback.
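Steps 101 to 104 can be sketched as a single recovery pass. This is a minimal illustration rather than the implementation of the present application: the names `recover`, `TransactionEntry`, and `ChainNode`, the state label "undo", and the dictionaries standing in for storage are assumptions. Removing the transaction table entry after rollback is also an assumption; the source only states that the log chain and tuples are deleted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Address = Tuple[str, str]  # hypothetical (table, position) address


@dataclass
class ChainNode:
    tuple_address: Address
    next_address: Optional[int] = None  # storage address of the next chain node


@dataclass
class TransactionEntry:
    state: str          # "undo" marks a transaction to be recovered (label assumed)
    head_address: int   # storage address of the head node of the log chain


def recover(transaction_table: Dict[int, TransactionEntry],
            node_store: Dict[int, ChainNode],
            table_data: Dict[Address, str]) -> List[int]:
    """Roll back every undo-state transaction without scanning the tables."""
    rolled_back: List[int] = []
    for tx_id, entry in list(transaction_table.items()):
        if entry.state != "undo":
            continue  # committed transactions are left untouched
        address: Optional[int] = entry.head_address
        while address is not None:
            node = node_store.pop(address)             # delete the chain node
            table_data.pop(node.tuple_address, None)   # delete the tuple if it was persisted
            address = node.next_address
        del transaction_table[tx_id]  # assumption: drop the entry once rolled back
        rolled_back.append(tx_id)
    return rolled_back
```

The work done is proportional to the number of chain nodes of the undo-state transactions, not to the size of the tables, which is the point of avoiding a full table scan.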

In this method, the transaction to be recovered can be found through the transaction table, and the tuples of the transaction to be recovered are obtained according to its log chain, so the first device does not need to perform a full table scan, which improves the efficiency of data recovery and data processing. In addition, the log chain in the present application does not carry the tuple information of the transaction, so the overhead of extra write operations when using the log chain is avoided. Compared with the WAL mechanism, in which the log file carries the tuples of the transaction, the log chain mechanism has a significant performance advantage.

Next, the data synchronization flow in the present application is described. In a primary and standby database system, the primary database device is responsible for read and write operations, while the standby database device is responsible for read operations only. Thus, after the primary database device completes the write operations of a transaction, the data needs to be synchronized to the standby database device. The log chain mechanism in the present application can also be applied to the data synchronization flow of the primary and standby database system. The following description takes the first device as the primary database device and the second device as the standby database device as an example. Refer to fig. 8, which is a schematic diagram of a data synchronization process in the data processing method of the present application. As shown in fig. 8, the data synchronization method in the embodiment of the present application includes:

201. The first device obtains a transaction table.

Step 201 is similar to step 101 shown in fig. 4, and detailed description thereof will not be repeated here. It should be noted that, in order to facilitate the determination of the incremental transaction later, in the flow of data synchronization, the commit time of each transaction should be recorded in the transaction table.

202. The first device determines an incremental transaction from the transaction table.

In practical applications, the primary and standby database system generally needs to perform data synchronization periodically; that is, the tuples of the transactions newly added in the first device are periodically synchronized to the standby database. Transactions that were added after the most recent data synchronization, and that have therefore not yet been synchronized, are the incremental transactions in the present application. Because the transaction table records the commit time of each transaction, the first device can determine, according to the transaction table, each transaction whose commit time is after the latest data synchronization time of the primary and standby database system to be an incremental transaction.
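Selecting incremental transactions by commit time can be sketched as a filter over the transaction table. The names `TransactionEntry` and `incremental_transactions` are hypothetical, as is the numeric timestamp format; checking for a "committed" state and ordering the result by commit time are additional assumptions, since the source only requires the commit time to be after the latest synchronization time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TransactionEntry:
    state: str                     # "committed" label is assumed
    head_address: int              # storage address of the head node of the log chain
    commit_time: Optional[float]   # None while the transaction has not committed


def incremental_transactions(transaction_table: Dict[int, TransactionEntry],
                             last_sync_time: float) -> List[int]:
    """Return ids of transactions committed after the latest synchronization."""
    selected = [
        tx_id
        for tx_id, entry in transaction_table.items()
        if entry.state == "committed"
        and entry.commit_time is not None
        and entry.commit_time > last_sync_time
    ]
    # Ordering by commit time is a choice of this sketch, not stated in the source.
    return sorted(selected, key=lambda tx_id: transaction_table[tx_id].commit_time)
```

For example, with a latest synchronization time of 15 and commit times of 10, 20, and 30, only the transactions committed at 20 and 30 are selected.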

203. The first device obtains a log chain of incremental transactions from the transaction table.

Step 203 is similar to step 102 shown in fig. 4, and detailed description thereof will not be repeated here.

204. The first device determines a storage address of a tuple of the incremental transaction according to a log chain of the incremental transaction, so as to obtain the tuple of the incremental transaction.

Step 204 is similar to step 103 shown in fig. 4, and detailed description thereof will not be repeated here.

205. The first device synchronizes a log chain and tuples of incremental transactions to the second device.

Refer to fig. 9, which is a schematic diagram of a data synchronization scenario of a primary and standby database system according to an embodiment of the present application. As shown in fig. 9, the first device sends the log chain and the tuples of the incremental transaction to the second device, and the second device persists the log chain and the tuples of the incremental transaction locally, thereby completing the data synchronization flow.

Further, refer to fig. 10, which is another schematic diagram of a data synchronization scenario of a primary and standby database system in an embodiment of the present application. As shown in fig. 10, the present application does not limit the number of database devices participating in data synchronization. The primary database device may synchronize the tuples and the log chain of the incremental transaction to a plurality of standby database devices; that is, the second device may refer to a plurality of standby database devices.
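Step 205, including the multi-standby case of fig. 10, can be sketched as follows. The `Database` class, the `synchronize` function, and the in-memory dictionaries are hypothetical stand-ins for real devices and network transfer; the sketch also assumes that a standby stores chain nodes at the same storage addresses as the primary, which the source does not state.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, List, Optional, Tuple

Address = Tuple[str, str]  # hypothetical (table, position) address


@dataclass
class ChainNode:
    tuple_address: Address
    next_address: Optional[int] = None  # storage address of the next chain node


@dataclass
class Database:
    """Hypothetical stand-in for a primary or standby database device."""
    chain_heads: Dict[int, int] = field(default_factory=dict)       # tx id -> head node address
    node_store: Dict[int, ChainNode] = field(default_factory=dict)  # log chains
    table_data: Dict[Address, str] = field(default_factory=dict)    # tuples


def synchronize(primary: Database, incremental_ids: Iterable[int],
                standbys: List[Database]) -> int:
    """Copy the log chain and tuples of each incremental transaction to every standby."""
    sent = 0
    for tx_id in incremental_ids:
        head = primary.chain_heads[tx_id]
        address: Optional[int] = head
        while address is not None:
            node = primary.node_store[address]
            for standby in standbys:
                standby.node_store[address] = replace(node)  # persist a copy of the chain node
                standby.table_data[node.tuple_address] = primary.table_data[node.tuple_address]
            sent += 1
            address = node.next_address
        for standby in standbys:
            standby.chain_heads[tx_id] = head
    return sent
```

Only the tuples reachable from the incremental transactions' log chains are read on the primary, so tuples of previously synchronized transactions are never touched.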

In the WAL mechanism, the log file contains the tuples of the transaction, so during data synchronization both the log file and the tuples of the transaction are persisted; that is, the tuples of the transaction are written twice, and the cost of data synchronization is very high. In the WBL mechanism, the WAL is regenerated for synchronization by scanning the table; because the positions at which transactions write data are random, organizing the WAL log requires scanning the full table, which is costly and performs poorly. In the Zen mechanism, the logs are additionally organized in memory, which hardly affects performance, but after the database goes down the in-memory logs are lost, and re-synchronizing the primary and standby machines is very costly. In this method, the incremental transaction can be found through the transaction table, and the tuples of the incremental transaction are obtained according to its log chain, so the first device does not need to perform a full table scan, which improves the efficiency of data synchronization and data processing. In addition, the log chain in the present application does not carry the tuple information of the transaction, so the overhead of extra write operations when the standby database persists the log chain is avoided. Compared with the WAL mechanism, in which the log file carries the tuples of the transaction, the log chain mechanism has a significant performance advantage.

In practical application, the data recovery method and the data synchronization method can be independent of each other or can be matched with each other. That is, the database device may perform data recovery using only steps 101 to 104 shown in fig. 4, or may perform data synchronization using only steps 201 to 205 shown in fig. 8, or may integrate the above-described data recovery method and data synchronization method in the database system. For ease of understanding, referring to fig. 11, fig. 11 is a schematic diagram of an application of the log chain mechanism to a data recovery scenario and a data synchronization scenario in an embodiment of the present application. As shown in fig. 11, in the data recovery scenario and the data synchronization scenario, the same transaction table and log chain may be shared, that is, the transaction table and the log chain of the transaction may be used for data recovery or data synchronization.

In order to better implement the above-described aspects of the embodiments of the present application, the following also provides related devices for implementing the above-described aspects. Specifically, referring to fig. 12, fig. 12 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, where the data processing apparatus includes:

an obtaining unit 301, configured to obtain a transaction table, where the transaction table is used to indicate a state of a transaction and a storage address of a log chain, each transaction corresponds to one log chain, the log chain includes at least one chain node, and each chain node indicates a storage address of a tuple in the transaction;

The obtaining unit 301 is further configured to obtain a log chain of the transaction to be recovered according to a transaction table, where the transaction in the undo state in the transaction table is the transaction to be recovered;

the obtaining unit 301 is further configured to determine a storage address of a tuple of the transaction to be recovered according to the log chain of the transaction to be recovered, so as to obtain the tuple of the transaction to be recovered;

a deleting unit 302, configured to delete the log chain and the tuple of the transaction to be restored.

In one possible design, the log chain includes a plurality of serial chain nodes, with an association between the plurality of serial chain nodes.

In one possible design, the storage address of the log chain is the storage address of the head node of the log chain.

In one possible design, the acquisition unit 301 is specifically configured to:

In a possible design, the data processing apparatus is applied to a primary and standby database system, the data processing apparatus is a primary database, the second device is a standby database, the transaction table is further used to indicate the commit time of the transaction, and the data processing apparatus further comprises a determining unit 303 and a synchronizing unit 304,

A determining unit 303, configured to determine an incremental transaction according to the transaction table, where a commit time of the incremental transaction is after a latest backup time of the primary and backup database systems;

the acquiring unit 301 is further configured to acquire a log chain of the incremental transaction according to the transaction table;

the obtaining unit 301 is further configured to determine a storage address of a tuple of the incremental transaction according to the log chain of the incremental transaction, so as to obtain the tuple of the incremental transaction;

a synchronizing unit 304, configured to synchronize the log chain and the tuples of the incremental transaction to the second device.

In one possible design, the acquisition unit 301 is specifically configured to:

Referring to fig. 13, fig. 13 is a schematic structural diagram of another data processing apparatus provided in an embodiment of the present application, where the data processing apparatus is applied to a primary and backup database system, the data processing apparatus is a primary database, the second device is a backup database, and the data processing apparatus includes:

an obtaining unit 401, configured to obtain a transaction table, where the transaction table is used to indicate a commit time of a transaction and a storage address of a log chain, each transaction corresponds to one log chain, the log chain includes at least one chain node, and each chain node indicates a storage address of a tuple in the transaction;

A determining unit 402, configured to determine an incremental transaction according to the transaction table, where a commit time of the incremental transaction is after a time of the latest data synchronization of the primary and backup database systems;

the acquiring unit 401 is further configured to acquire a log chain of the incremental transaction according to the transaction table;

a determining unit 402, configured to determine a storage address of a tuple of the incremental transaction according to the log chain of the incremental transaction, so as to obtain the tuple of the incremental transaction;

a synchronizing unit 403, configured to synchronize the log chain and the tuples of the incremental transaction to the second device.

Based on the fourth aspect, in an alternative embodiment, the obtaining unit 401 is specifically configured to:

The embodiments of the present application further provide a computer device, please refer to fig. 14, fig. 14 is a schematic structural diagram of the computer device provided in the embodiments of the present application, on which the data processing apparatus described in the corresponding embodiments of fig. 12 or fig. 13 may be disposed, for implementing the method of the corresponding embodiments of fig. 4 or fig. 8. Wherein memory 532 and storage medium 530 may be transitory or persistent. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations in a computer device. Still further, the central processor 522 may be arranged to communicate with a storage medium 530 to execute a series of instruction operations in the storage medium 530 on the computer device 500.

The computer device can also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

Also provided in embodiments of the present application is a computer program product comprising a program product which, when run on a computer, causes the computer to perform the method as described in the embodiments of fig. 4 or 8 described above.

There is also provided in an embodiment of the present application a computer-readable storage medium having stored therein a program for performing signal processing, which when run on a computer, causes the computer to perform the method as described in the embodiment shown in fig. 4 or 8.

The data processing apparatus provided in this embodiment of the present application may specifically be a chip, where the chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins, or circuitry. The processing unit may execute the computer-executable instructions stored in a storage unit to cause the chip to perform the method described in the embodiment shown in fig. 4 or fig. 8. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip on the wireless access device side, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).

It should be further noted that the above described embodiments of the apparatus are only schematic, where the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection therebetween, and can be specifically implemented as one or more communication buses or signal lines.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general purpose hardware, or of course may be implemented by dedicated hardware including application specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions can be varied, such as analog circuits, digital circuits, or dedicated circuits. However, a software program implementation is a preferred embodiment in many cases for the present application. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a readable storage medium, such as a floppy disk, a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk of a computer, etc., including several instructions for causing a computer device (which may be a personal computer, a training device, or a network device, etc.) to perform the method described in the embodiments of the present application.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

Claims

1. A method of data processing, comprising:

a first device obtains a transaction table, wherein the transaction table is used for indicating the state of a transaction and the storage address of a log chain, each transaction corresponds to one log chain, the log chain comprises at least one chain node, and each chain node indicates the storage address of one tuple in the transaction;

the first device obtains a log chain of the transaction to be recovered according to the transaction table, wherein the transaction in the undo state in the transaction table is the transaction to be recovered;

the first device determines a storage address of a tuple of the to-be-recovered transaction according to a log chain of the to-be-recovered transaction so as to acquire the tuple of the to-be-recovered transaction;

the first device deletes the log chain and tuples of the transaction to be recovered.

2. The method of claim 1, wherein the log chain comprises a plurality of serial chain nodes, and wherein an association exists between the plurality of serial chain nodes.

3. The method of claim 2, wherein the storage address of the log chain is a storage address of a head node of the log chain.

4. A method according to claim 1, 2 or 3, wherein the first device obtaining a log chain of transactions to be restored from the transaction table comprises:

the first device determines the storage address of the log chain of the transaction to be recovered according to the transaction table;

5. The method according to any one of claims 1 to 4, wherein the method is applied to a primary and backup database system, the first device is a primary database, the second device is a backup database, the transaction table is further used to indicate a commit time of a transaction, and the method further comprises:

the first device obtains a log chain of the incremental transaction according to the transaction table;

the first device determines a storage address of a tuple of the incremental transaction according to the log chain of the incremental transaction so as to acquire the tuple of the incremental transaction;

the first device synchronizes the log chain and tuples of the incremental transaction to the second device.

6. The method of claim 5, wherein the first device obtaining the log chain of incremental transactions from the transaction table comprises:

the first device determines the storage address of the log chain of the increment transaction according to the transaction table;

7. A method of data processing, the method being applied to a primary and backup database system, a first device being a primary database and a second device being a backup database, the method comprising:

the first device acquires a transaction table, wherein the transaction table is used for indicating the commit time of the transaction and the storage address of a log chain, each transaction corresponds to one log chain, the log chain comprises at least one chain node, and each chain node indicates the storage address of one tuple in the transaction;

the first device determines an incremental transaction according to the transaction table, wherein the commit time of the incremental transaction is after the latest data synchronization time of the primary and backup database system;

8. The method of claim 7, wherein the log chain comprises a plurality of serial chain nodes, and wherein an association exists between the plurality of serial chain nodes.

9. The method of claim 8, wherein the storage address of the log chain is a storage address of a head node of the log chain.

10. The method of claim 7, 8 or 9, wherein the first device obtaining a log chain of the incremental transaction from the transaction table comprises:

11. A data processing apparatus, comprising:

an obtaining unit, configured to obtain a transaction table, where the transaction table is used to indicate a state of a transaction and a storage address of a log chain, each transaction corresponds to one log chain, the log chain includes at least one chain node, and each chain node indicates a storage address of a tuple in the transaction;

The acquiring unit is further configured to acquire a log chain of a transaction to be recovered according to the transaction table, where the transaction in the undo state in the transaction table is the transaction to be recovered;

the acquisition unit is further configured to determine a storage address of a tuple of the transaction to be recovered according to the log chain of the transaction to be recovered, so as to acquire the tuple of the transaction to be recovered;

12. The data processing apparatus of claim 11, wherein the log chain comprises a plurality of serial chain nodes, and wherein an association exists between the plurality of serial chain nodes.

13. The data processing apparatus of claim 12, wherein the storage address of the log chain is a storage address of a head node of the log chain.

14. The data processing apparatus according to claim 11, 12 or 13, wherein the acquisition unit is specifically configured to:

15. The data processing apparatus according to any one of claims 11 to 14, wherein the data processing apparatus is applied to a primary and backup database system, the data processing apparatus is a primary database, the second device is a backup database, the transaction table is further used to indicate a commit time of a transaction, the data processing apparatus further comprises a determining unit and a synchronizing unit,

the determining unit is used for determining an incremental transaction according to the transaction table, wherein the commit time of the incremental transaction is after the latest backup time of the primary and backup database systems;

the acquisition unit is further used for acquiring a log chain of the incremental transaction according to the transaction table;

the acquisition unit is further used for determining a storage address of a tuple of the incremental transaction according to the log chain of the incremental transaction so as to acquire the tuple of the incremental transaction;

the synchronization unit is configured to synchronize a log chain and a tuple of the incremental transaction to the second device.

16. The data processing apparatus according to claim 15, wherein the acquisition unit is specifically configured to:

17. A data processing apparatus, wherein the data processing apparatus is applied to a primary and backup database system, the data processing apparatus is a primary database, and a second device is a backup database, the data processing apparatus comprising:

an obtaining unit, configured to obtain a transaction table, where the transaction table is used to indicate a commit time of a transaction and a storage address of a log chain, each transaction corresponds to one log chain, the log chain includes at least one chain node, and each chain node indicates a storage address of a tuple in the transaction;

a determining unit, configured to determine an incremental transaction according to a transaction table, where a commit time of the incremental transaction is after a time of the latest data synchronization of the primary and backup database systems;

the determining unit is further configured to determine a storage address of a tuple of the incremental transaction according to a log chain of the incremental transaction, so as to obtain the tuple of the incremental transaction;

18. The data processing apparatus of claim 17, wherein the log chain comprises a plurality of serial chain nodes, and wherein an association exists between the plurality of serial chain nodes.

19. The data processing apparatus of claim 18, wherein the storage address of the log chain is a storage address of a head node of the log chain.

20. The data processing apparatus according to claim 17, 18 or 19, wherein the acquisition unit is specifically configured to:

21. A computer device comprising a processor and a memory, the processor being coupled to the memory,

the memory is used for storing programs;

the processor is configured to execute a program in the memory, to cause the computer device to perform the method according to any one of claims 1 to 6, or to cause the computer device to perform the method according to any one of claims 7 to 10.

22. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 6 or which, when executed by a processor, implements the method according to any one of claims 7 to 10.

23. A computer program product having computer readable instructions stored therein, which when executed by a processor, implement the method of any of claims 1 to 6 or which when executed by a processor, implement the method of any of claims 7 to 10.