WO2018157602A1 - Method and apparatus for synchronizing an active transaction table - Google Patents
- Publication number
- WO2018157602A1 (PCT/CN2017/105561)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- transaction
- log
- transaction table
- node
- active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/273—Asynchronous replication or reconciliation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
Definitions
- The present application relates to the field of database technologies, and in particular to a method and an apparatus for synchronizing an active transaction table.
- A transaction is a user-defined sequence of database operations in a database system. The operations are either all executed or none executed; together they form an indivisible unit of work.
- The active transaction table (Active Transaction List) records the transaction numbers of transactions that have not yet been committed.
- The active transaction table needs to be synchronized between the nodes of a distributed database cluster to ensure the consistency of the database.
- In existing approaches, the node executing transactions sends its own active transaction table to the other nodes in the database cluster at group commit, so that those nodes synchronize their active transaction tables with the active transaction table of the node executing the transactions.
- However, a node usually runs a large number of concurrent transactions, which makes its active transaction table very large.
- Synchronizing the active transaction table over the network therefore consumes a large amount of transmission resources and takes a long time, so synchronization of the active transaction table between nodes suffers a large delay, which reduces the efficiency of the database system.
- The present application provides a method and an apparatus for synchronizing an active transaction table, to solve the problem that synchronizing the active transaction table between nodes transfers a large amount of data and takes a long time.
- In a first aspect, the present application provides a method for synchronizing an active transaction table. The method can be applied to a cluster database system with a one-primary, multiple-standby architecture that includes a primary node and multiple standby nodes, and also to a cluster database system with a multi-write architecture that includes a coordinator node and multiple data nodes.
- The method may be performed by the primary node in a cluster database system of the one-primary, multiple-standby architecture, or by the coordinator node in a cluster database system of the multi-write architecture.
- The first node (the primary node or the coordinator node mentioned above) records, in a transaction table delta log buffer, the transaction table delta log generated since the active transaction table was last synchronized. The transaction table delta log represents the changes to the transactions recorded in the active transaction table of the first node. It includes new transaction logs, each indicating that a transaction was added to the active transaction table, and commit transaction logs, each indicating that a transaction was deleted from the active transaction table.
- the active transaction table is used to record transactions that have not yet been committed.
- The first node sends the transaction table delta log recorded in the transaction table delta log buffer to a second node, that is, a node in the database cluster other than the first node. There may be one or more second nodes.
- When the first node is the primary node, the second node is a standby node; when the first node is the coordinator node, the second node is a data node.
- After receiving the transaction table delta log sent by the first node, the second node updates its locally saved active transaction table according to the transaction table delta log.
- In the above method, the first node sends the transaction table delta log to the second node at group commit, and the second node can use the transaction table delta log to update its own active transaction table so that it is consistent with the active transaction table of the first node. Because the number of delta log entries at group commit is usually much smaller than the number of active transactions in the first node's active transaction table, the transaction table delta log is much smaller than the entire active transaction table. Synchronizing the active transaction table by transmitting the delta log therefore reduces the transmission resources occupied, shortens the transmission time, and reduces the delay of active transaction table synchronization.
- The first node configures the transaction table delta log buffer to be protected by the redo log lock. While the first node holds the redo log lock that it acquired to write the redo log for a new transaction, it also uses the redo log lock to lock the transaction table delta log buffer and writes the new transaction log into that buffer.
- Writing the new transaction log therefore incurs no additional lock overhead.
- Likewise, while the first node holds the redo log lock that it acquired to write the redo log for a committing transaction, it can use the redo log lock to lock the transaction table delta log buffer and write the commit transaction log into that buffer.
- Writing the commit transaction log into the transaction table delta log buffer therefore also incurs no additional lock overhead.
- Because the existing redo log lock is reused and no additional lock overhead is generated, lock contention while recording the transaction table delta log is effectively reduced and transaction throughput improves.
- When the transaction table delta log buffer is protected by the redo log lock, the first node first acquires the redo log lock, which locks the transaction table delta log buffer, and copies the transaction table delta log from that buffer to a buffer in memory that has no lock protection. The first node then sends the transaction table delta log from the unprotected buffer to the second node; this sending step does not hold the redo log lock.
- This effectively reduces how long the redo log lock is held while the transaction table delta log is sent, and improves transaction throughput.
- After copying the transaction table delta log from the transaction table delta log buffer to the unprotected buffer under the redo log lock, the first node resets the transaction table delta log buffer. The first node can then promptly record the delta logs of new transactions and committing transactions in the buffer, which improves transaction throughput and transaction processing efficiency.
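The copy-then-send pattern described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `redo_log_lock`, `delta_buffer`, `snapshot_delta_log`, and `send_to_standbys` are hypothetical names, a Python `threading.Lock` stands in for the redo log lock, and plain lists stand in for the network transport.

```python
import threading

redo_log_lock = threading.Lock()   # stands in for the redo log lock (assumption)
delta_buffer = [("new", 1), ("new", 2), ("commit", 1)]  # protected by redo_log_lock


def snapshot_delta_log():
    """Copy the delta log out while holding the lock, then reset the buffer."""
    with redo_log_lock:
        unlocked_copy = list(delta_buffer)  # buffer with no lock protection
        delta_buffer.clear()                # reset so new entries can be recorded
    return unlocked_copy


def send_to_standbys(entries, standby_inboxes):
    """Hypothetical transport: each standby is modeled as a list used as an inbox."""
    for inbox in standby_inboxes:
        inbox.append(list(entries))


standby_inboxes = [[], []]
pending = snapshot_delta_log()
send_to_standbys(pending, standby_inboxes)  # runs without holding redo_log_lock
```

Only the short copy and reset run under the lock; the potentially slow send works on the private copy, so transactions that need the redo log lock are not blocked by network I/O.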
- Before the first node sends the transaction table delta log to the at least one second node, it deletes from the transaction table delta log the new transaction log and the commit transaction log that are recorded for the same transaction.
- Deleting the new transaction log and the commit transaction log recorded for the same transaction does not affect synchronization of the active transaction table between nodes, and it can significantly reduce the amount of log data transmitted, reduce the consumption of transmission resources, and shorten the time taken to synchronize the transaction table.
- In one case, the first node may skip searching for new transaction logs and commit transaction logs recorded for the same transaction, to reduce the consumption of computing resources.
- In another case, the first node searches for new transaction logs and commit transaction logs recorded for the same transaction and deletes them, to reduce the amount of log data transmitted.
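As an illustration of the pruning step, the sketch below removes the new/commit pair of every transaction that both started and committed within the same delta log. The `(kind, txn_id)` tuple representation and the function name `prune_delta_log` are assumptions made for this example.

```python
def prune_delta_log(entries):
    """Drop the new/commit pair of any transaction that started and committed
    within the same delta log; keep all other entries in their original order."""
    started = {txn for kind, txn in entries if kind == "new"}
    committed = {txn for kind, txn in entries if kind == "commit"}
    short_lived = started & committed
    return [(kind, txn) for kind, txn in entries if txn not in short_lived]


log = [("new", 7), ("new", 8), ("commit", 7), ("commit", 5)]
pruned = prune_delta_log(log)
```

Transaction 7 never needs to appear in the receiver's active transaction table, so both of its entries can be dropped; the commit of transaction 5 is kept because transaction 5 was added in an earlier synchronization.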
- After a second node joins the database cluster in which the first node is located, the first node sends its current active transaction table to the second node, and the second node saves it.
- This initializes the active transaction table of the second node.
- In a second aspect, the present application provides a method for synchronizing an active transaction table. The method can be applied to a cluster database system with a one-primary, multiple-standby architecture that includes a primary node and multiple standby nodes, and also to a cluster database system with a multi-write architecture that includes a coordinator node and multiple data nodes.
- The method may be performed by a standby node in a cluster database system of the one-primary, multiple-standby architecture, or by a data node in a cluster database system of the multi-write architecture.
- The second node receives the transaction table delta log, generated since the active transaction table was last synchronized, that is sent by the first node (when the second node is a standby node, the first node is the primary node; when the second node is a data node, the first node is the coordinator node). The transaction table delta log represents the changes to the transactions recorded in the active transaction table of the first node. It includes new transaction logs, each indicating that a transaction was added to the active transaction table, and commit transaction logs, each indicating that a transaction was deleted from the active transaction table. The active transaction table records transactions that have not yet been committed. The second node updates its local active transaction table according to the transaction table delta log.
- In the above method, the first node sends the transaction table delta log to the second node at group commit, and the second node can use the transaction table delta log to update its own active transaction table so that it is consistent with the active transaction table of the first node. Because the number of delta log entries at group commit is usually much smaller than the number of active transactions in the first node's active transaction table, the transaction table delta log is much smaller than the entire active transaction table. Synchronizing the active transaction table by transmitting the delta log therefore reduces the transmission resources occupied, shortens the transmission time, and reduces the delay of active transaction table synchronization.
- When the second node updates the local active transaction table according to the transaction table delta log: if the delta log includes a log adding a first transaction and does not include a log committing the first transaction, the second node adds the first transaction to the active transaction table; if the active transaction table includes a second transaction and the delta log includes a log committing the second transaction, the second node deletes the second transaction from the active transaction table.
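The update rule on the second node can be sketched as below. The active transaction table is modeled as a Python set of transaction numbers and the delta log as a list of `(kind, txn_id)` tuples; both representations and the name `apply_delta_log` are assumptions for illustration.

```python
def apply_delta_log(active_table, entries):
    """Update a receiving node's active transaction table (a set of transaction
    numbers) from delta log entries of the form (kind, txn_id)."""
    committed = {txn for kind, txn in entries if kind == "commit"}
    for kind, txn in entries:
        # A transaction added but not committed in this log is still active.
        if kind == "new" and txn not in committed:
            active_table.add(txn)
    # A transaction committed in this log is no longer active.
    active_table.difference_update(committed)
    return active_table


standby_table = {5, 6}
delta_log = [("new", 7), ("new", 8), ("commit", 7), ("commit", 5)]
apply_delta_log(standby_table, delta_log)
```

Here transaction 8 is added, transaction 5 is deleted, transaction 7 is never added because it was committed within the same log, and transaction 6 is untouched.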
- The present application provides an apparatus for synchronizing an active transaction table, configured to perform the method of the first aspect or any possible implementation of the first aspect.
- Specifically, the apparatus comprises means for performing the method of the first aspect or any possible implementation of the first aspect.
- The present application provides an apparatus for synchronizing an active transaction table, configured to perform the method of the second aspect or any possible implementation of the second aspect.
- Specifically, the apparatus comprises means for performing the method of the second aspect or any possible implementation of the second aspect.
- The present application provides an apparatus for synchronizing an active transaction table, configured to perform the method of the first aspect or any possible implementation of the first aspect.
- The apparatus comprises a processor, a memory, and a communication interface. The memory includes a transaction table delta log buffer, which is used to record the transaction table delta log. The communication interface is used to send the transaction table delta log to the second node. The processor is connected to the memory and to the communication interface, and performs the method of the first aspect or any possible implementation of the first aspect through the memory and the communication interface.
- The present application provides an apparatus for synchronizing an active transaction table, configured to perform the method of the second aspect or any possible implementation of the second aspect.
- The apparatus comprises a processor, a memory, and a communication interface. The memory is used to store an active transaction table. The communication interface is used to receive the transaction table delta log sent by the first node. The processor is connected to the memory and to the communication interface, and performs the method of the second aspect or any possible implementation of the second aspect through the memory and the communication interface.
- The present application further provides a computer-readable storage medium storing computer software instructions for performing the functions of the first aspect or any design of the first aspect, including a program designed to perform the method of the first aspect or any design of the first aspect.
- The present application further provides a computer-readable storage medium storing computer software instructions for performing the functions of the second aspect or any design of the second aspect, including a program designed to perform the method of the second aspect or any design of the second aspect.
- FIG. 1 is a schematic diagram of the architecture of a database system.
- FIG. 2a is a schematic diagram of a cluster database system of a shared-disk architecture.
- FIG. 2b is a schematic diagram of a cluster database system of a shared-nothing architecture.
- FIG. 2c is a schematic diagram of a cluster database system of a multi-write architecture.
- FIG. 3 is a schematic flowchart of a method for synchronizing an active transaction table provided by the present application.
- FIG. 4a is another schematic flowchart of a method for synchronizing an active transaction table provided by the present application.
- FIG. 4b is still another schematic flowchart of a method for synchronizing an active transaction table provided by the present application.
- FIG. 5 is a schematic diagram of the structure of a transaction table delta log in the present application.
- FIG. 6a to FIG. 6e are schematic diagrams of a process of synchronizing an active transaction table in the present application.
- FIG. 7 is a schematic structural diagram of an apparatus for synchronizing an active transaction table provided by the present application.
- FIG. 8 is another schematic structural diagram of an apparatus for synchronizing an active transaction table provided by the present application.
- FIG. 9 is still another schematic structural diagram of an apparatus for synchronizing an active transaction table provided by the present application.
- The present application provides a method and an apparatus for synchronizing an active transaction table, to solve the prior-art problem that synchronizing the active transaction table between nodes takes a long time.
- The method and the apparatus are based on the same inventive concept. Because they solve the problem on similar principles, the implementations of the apparatus and the method can be referred to each other, and repeated descriptions are omitted.
- "A plurality" in the present application means two or more.
- The terms "first", "second", and the like are used only to distinguish between descriptions, and are not to be understood as indicating or implying relative importance or order.
- the architecture of the database system is as shown in FIG. 1.
- the database system includes a database 11 and a database management system (DBMS) 12.
- The database 11 is an organized collection of data stored long term in a data store, that is, a collection of related data that is organized, stored, and used according to a certain data model.
- The database 11 may include one or more pieces of table data.
- the DBMS 12 is used to establish, use, and maintain the database 11, and to perform unified management and control of the database 11 to ensure the security and integrity of the database 11.
- the user can access the data in the database 11 through the DBMS 12, and the database administrator also performs database maintenance through the DBMS 12.
- DBMS 12 provides a variety of functions that allow multiple applications and user devices to use different methods to create, modify, and query databases at the same time or at different times. Applications and user devices can be collectively referred to as clients.
- The functions provided by the DBMS 12 can include the following:
- (1) Data definition function: the DBMS 12 provides a data definition language (DDL) to define the database structure; the DDL describes the database framework, and the definitions can be saved in a data dictionary;
- (2) Data access function: the DBMS 12 provides a data manipulation language (DML) for basic access operations on database data, such as retrieval, insertion, modification, and deletion;
- (3) Database operation management function: the DBMS 12 provides data control functions, that is, data security, integrity, and concurrency control, to effectively control and manage database operation and ensure that data is correct and valid;
- (4) Database establishment and maintenance functions: these include loading initial data into the database, database dump, recovery, and reorganization, and system performance monitoring and analysis;
- (5) Database transmission: the DBMS 12 provides transmission of processed data to enable communication between the client and the DBMS 12, usually in coordination with the operating system.
- FIG. 2a is a schematic diagram of a cluster database system that adopts a shared-storage architecture. The system includes multiple nodes (such as the primary node and standby nodes 1-3 in FIG. 2a), and a database management system is deployed on each node to provide users with services such as querying and modifying the database.
- The multiple database management systems store shared data in a shared data store, and read and write the data in the data store through a switch.
- The shared data store can be a shared disk array.
- A node in a cluster database system may be a physical machine, such as a database server.
- The database server may include multiple processors, all of which share resources such as a bus, memory, and I/O system.
- The functions of the database management system may be implemented by one or more processors executing programs in memory.
- a node in a clustered database system can also be a virtual machine running on an abstract hardware resource. If the node is a physical machine, the switch is a Storage Area Network (SAN) switch, an Ethernet switch, a fiber switch, or other physical switching device. If the node is a virtual machine, the switch is a virtual switch.
- FIG. 2b is a schematic diagram of a cluster database system that adopts a shared-nothing architecture.
- Each node has its own hardware resources (such as data storage), operating system, and database.
- The nodes communicate through the network. In this system, data is distributed to the nodes according to the database model and application characteristics.
- the query task will be divided into several parts, executed in parallel on all nodes, and coordinated with each other to provide database services as a whole. All communication functions are in the same way.
- the nodes can be either physical or virtual machines.
- In the cluster database systems shown in FIG. 2a and FIG. 2b, one node can serve as the primary node and perform update operations on the database, such as inserting, modifying, and deleting data.
- The nodes other than the primary node in the cluster database system (such as standby nodes 1-3) serve as standby nodes and perform read operations on the data in the database.
- Such a system is also called a one-primary, multiple-standby database system.
- FIG. 2c is a schematic diagram of a cluster database system that adopts a multi-write architecture. The system includes a coordinator node (Coordinator Node) and data nodes (Data Node), which share a disk. The data nodes implement data access functions, and the coordinator node manages global lock resources and assigns transaction numbers to the data nodes.
- the data store of the database system includes, but is not limited to, a solid state drive (SSD), a disk array, or other type of non-transitory computer readable medium.
- Although the database is not shown in FIGS. 2a-2c, it should be understood that the database is stored in a data store.
- A database system may include fewer or more components than those shown in FIGS. 2a-2c, or components different from those shown. FIGS. 2a-2c show only the components most relevant to the implementations disclosed in the embodiments of the present invention.
- a cluster database system can include any number of nodes.
- the database management system functions of each node may be implemented by appropriate combinations of software, hardware, and/or firmware running on each node, respectively.
- According to the teachings of the embodiments of the present invention, the database system may be any one of the cluster database system of the shared-storage architecture shown in FIG. 2a, the cluster database system of the shared-nothing architecture shown in FIG. 2b, and the cluster database system of the multi-write architecture shown in FIG. 2c, or another type of database system.
- When the DBMS 12 starts a new transaction, it adds the transaction number of the new transaction to the active transaction table; when a transaction is committed, the DBMS 12 deletes the transaction number of the committed transaction from the active transaction table. The active transaction table therefore records only the currently active transactions, that is, transactions that have not yet been committed. In addition, when starting and committing transactions, the DBMS 12 writes redo logs to record the changes made to the database.
- When the DBMS 12 commits a transaction, it only needs to ensure that the redo logs of the transaction are written to disk. This avoids random writes of data pages, replaces them with sequential writes of the redo log, and still guarantees the durability of the transaction.
- To improve the performance of the database system and reduce frequent disk input/output (I/O) operations, the DBMS 12 merges the redo log writes of multiple transactions into a single disk write. Writing the redo logs of multiple transactions to disk at one time is called a group commit.
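To make the group commit idea concrete, the following sketch batches the redo records of several transactions and writes them out in one step. The class `RedoLog`, its fields, and the use of a list as the "disk" are all assumptions for illustration; a real DBMS would write to a file and force it to stable storage.

```python
class RedoLog:
    """Toy redo log: records accumulate in memory and reach 'disk' in batches."""

    def __init__(self):
        self.buffer = []       # in-memory redo log buffer
        self.disk = []         # stands in for the on-disk redo log (assumption)
        self.flush_count = 0   # number of disk writes performed

    def append(self, record):
        self.buffer.append(record)

    def group_commit(self):
        """Write all buffered redo records to disk in one sequential write."""
        if self.buffer:
            self.disk.extend(self.buffer)
            self.buffer.clear()
            self.flush_count += 1


redo = RedoLog()
for txn_id in (1, 2, 3):
    redo.append(f"commit txn {txn_id}")
redo.group_commit()
```

Three transactions are made durable with a single write instead of three, which is the I/O saving that group commit is after.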
- the DBMS 12 may be located in the database server.
- The database server may be a primary node or a standby node as described for FIG. 2a or FIG. 2b, where the primary node implements update operations on data and the standby node implements read operations on data.
- the DBMS 12 can also be applied to the collaboration node or the data node shown in FIG. 2c.
- When a data node starts a new transaction, it requests a transaction number and the redo log lock from the coordinator node. In response, the coordinator node allocates the transaction number and the redo log lock to the data node, and adds the transaction number of the new transaction to the active transaction table that it maintains.
- When a data node commits a transaction, the coordinator node deletes the committed transaction from the active transaction table that it maintains.
- When a data node performs a group commit, it requests the redo log lock from the coordinator node and, after obtaining it, writes the redo logs of the multiple committed transactions to disk.
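The coordinator node's bookkeeping in the multi-write architecture might look like the sketch below. It covers only transaction number allocation and the active transaction table; granting the redo log lock is omitted. The class and method names are hypothetical, and sequential numbering starting at 1 is an assumption, since the text does not say how transaction numbers are chosen.

```python
import itertools


class CoordinatorNode:
    """Allocates transaction numbers and tracks uncommitted transactions."""

    def __init__(self):
        self._txn_numbers = itertools.count(1)  # assumed: sequential from 1
        self.active_table = set()               # active transaction table

    def begin_transaction(self):
        """Called on behalf of a data node that starts a new transaction."""
        txn_id = next(self._txn_numbers)
        self.active_table.add(txn_id)
        return txn_id

    def commit_transaction(self, txn_id):
        """Called when a data node commits the given transaction."""
        self.active_table.discard(txn_id)


coordinator = CoordinatorNode()
t1 = coordinator.begin_transaction()
t2 = coordinator.begin_transaction()
coordinator.commit_transaction(t1)
```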
- FIG. 3 is a schematic flowchart of a method for synchronizing an active transaction table, including:
- Step 201: After the standby node joins the database cluster in which the primary node is located, the primary node sends its current active transaction table to the standby node.
- Step 202: The standby node saves the active transaction table.
- Steps 201 to 202 describe one way to initialize the active transaction table of the standby node. Note that steps 201 to 202 are included for completeness and are not required to implement the embodiments of the present invention. For example, when the database cluster is initialized there are no active transactions on the primary node, its active transaction table is empty, and the primary node need not send its active transaction table to the standby node.
- Step 203: The primary node records, in the transaction table delta log buffer, the transaction table delta log generated since the active transaction table was last synchronized.
- The active transaction table records transactions that have not yet been committed.
- The transaction table delta log represents the changes to the transactions recorded in the active transaction table of the primary node. It includes new transaction logs, each indicating that a transaction was added to the active transaction table, and commit transaction logs, each indicating that a transaction was deleted from the active transaction table.
- the new transaction log can include a field characterizing the transaction status type as new (or open) and the transaction number of the transaction.
- the commit transaction log can include a field characterizing the transaction status type as a commit and the transaction number of the transaction.
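One possible byte layout for such a log entry is sketched below using Python's `struct` module. The layout (a 1-byte status type followed by an 8-byte little-endian transaction number) and the constant values are assumptions chosen for illustration; the text only states that each entry carries a status type field and a transaction number.

```python
import struct

ENTRY_FORMAT = "<BQ"            # assumed layout: 1-byte status, 8-byte txn number
STATUS_NEW, STATUS_COMMIT = 0, 1  # assumed encodings of the status type field


def encode_entry(status, txn_id):
    """Pack one delta log entry into bytes."""
    return struct.pack(ENTRY_FORMAT, status, txn_id)


def decode_entry(data):
    """Unpack bytes produced by encode_entry into (status, txn_id)."""
    return struct.unpack(ENTRY_FORMAT, data)


encoded = encode_entry(STATUS_COMMIT, 42)
decoded = decode_entry(encoded)
```

With a fixed-size layout like this one, each entry occupies 9 bytes, so the size of a delta log grows with the number of changes since the last synchronization rather than with the number of active transactions.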
- Step 204: At group commit, when the transactions recorded in the commit transaction logs are committed, the primary node sends the transaction table delta log to the standby node.
- There may be one or more standby nodes.
- When there are multiple standby nodes, the primary node may send its transaction table delta log to all of them when it performs the group commit.
- Step 205: The standby node updates its locally saved active transaction table according to the received transaction table delta log.
- The standby node updates the active transaction table as follows:
- If the transaction table delta log includes the log committing transaction a and the active transaction table includes transaction a, the standby node deletes transaction a from the active transaction table. If the delta log includes the log adding transaction b and does not include the log committing transaction b, transaction b is determined to be active, and the standby node adds transaction b to the active transaction table. If the delta log includes both the log adding transaction c and the log committing transaction c, transaction c is determined to be inactive and is not added to the active transaction table.
- Before this update, the active transaction table on the standby node is already synchronized with the active transaction table on the primary node as of the last synchronization, either through steps 201 to 202 or because the standby node updated its active transaction table according to a transaction table delta log previously sent by the primary node.
- Therefore, when the primary node sends the transaction table delta log to the standby node at group commit, the standby node can use it to update its active transaction table so that it is consistent with the active transaction table of the primary node.
- Because the transaction table delta log is much smaller than the entire active transaction table, synchronizing the active transaction table by transmitting the delta log reduces the transmission resources occupied, shortens the transmission time, and reduces the delay of active transaction table synchronization.
- The method further includes:
- Step 206: The primary node resets the transaction table delta log buffer.
- Resetting empties the transaction table delta log buffer after each synchronization of the active transaction table. The buffer then no longer holds the delta log recorded before this group commit (call it the first group commit) and stores only the delta log newly recorded after it.
- At the next group commit (call it the second group commit), the primary node sends the standby node a transaction table delta log that represents the changes to the active transaction table since the first group commit, so that the standby node can update its local active transaction table accordingly.
- The active transaction table of the standby node is thereby synchronized with the active transaction table of the primary node as of the second group commit. With this scheme, after each group commit the primary node keeps its active transaction table synchronized with the other nodes in the database cluster.
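Steps 201 to 206 can be walked through end to end with a small simulation of two consecutive group commits. The `Primary` class, the `(kind, txn_id)` entry format, and the in-process hand-off in place of network transmission are assumptions for this sketch.

```python
class Primary:
    """Toy primary node: an active transaction table plus a delta log buffer."""

    def __init__(self):
        self.active = set()
        self.delta = []

    def begin(self, txn):
        self.active.add(txn)
        self.delta.append(("new", txn))

    def commit(self, txn):
        self.active.discard(txn)
        self.delta.append(("commit", txn))

    def group_commit(self):
        """Hand over the delta log to ship (step 204) and reset the buffer (step 206)."""
        shipped, self.delta = self.delta, []
        return shipped


def apply_delta(table, entries):
    """Step 205: replay the entries in order on the standby's table."""
    for kind, txn in entries:
        if kind == "new":
            table.add(txn)
        else:
            table.discard(txn)


primary = Primary()
standby_table = set(primary.active)   # steps 201-202: initial copy (empty here)

primary.begin(1)
primary.begin(2)
primary.commit(1)
first = primary.group_commit()        # first group commit
apply_delta(standby_table, first)
after_first = set(standby_table)

primary.begin(3)
primary.commit(2)
second = primary.group_commit()       # second group commit
apply_delta(standby_table, second)
```

Because the buffer is reset at each group commit, the second shipment contains only the two changes made after the first one, yet the standby's table still matches the primary's after each round.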
- Optionally, the transaction table delta log buffer is protected by the redo log lock. The primary node writes a new-transaction log to the transaction table delta log buffer as follows: it obtains the redo log lock, locks the redo log buffer and writes the redo log generated by the new transaction, and, under the same lock, locks the transaction table delta log buffer and writes the new-transaction log.
- The primary node writes a commit-transaction log to the transaction table delta log buffer in the same way: it obtains the redo log lock, locks the redo log buffer and writes the redo log generated by the committing transaction, and, under the same lock, locks the transaction table delta log buffer and writes the commit-transaction log.
- When the primary node adds a transaction or commits a transaction, it already has to hold the redo log lock in order to write the redo log to the redo log buffer.
- Because the transaction table delta log buffer is configured to be protected by the redo log lock, the primary node can use the redo log lock it obtained for writing the new transaction's redo log to also write the new-transaction log to the transaction table delta log buffer. Writing the new-transaction log therefore incurs no additional lock overhead.
- Likewise, the primary node can write the commit-transaction log to the transaction table delta log buffer under the redo log lock it obtained for the committing transaction, so writing the commit-transaction log incurs no additional lock overhead either.
- In this scheme the primary node reuses the existing redo log lock when recording the transaction table delta log. No additional lock overhead is introduced, lock contention while recording the delta log is reduced, and transaction throughput improves.
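The lock sharing described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `redo_log_lock`, `redo_log_buffer`, `delta_log_buffer`, `begin_transaction` and `commit_transaction` are hypothetical, and the redo records are placeholders.

```python
import threading

redo_log_lock = threading.Lock()  # the existing redo log lock (hypothetical name)
redo_log_buffer = []              # placeholder redo records
delta_log_buffer = []             # transaction table delta log records


def begin_transaction(txn_id):
    """Add a transaction: one lock acquisition covers both buffers."""
    with redo_log_lock:
        redo_log_buffer.append(("begin", txn_id))   # redo log of the new transaction
        delta_log_buffer.append(("new", txn_id))    # new-transaction log


def commit_transaction(txn_id):
    """Commit a transaction under the same single lock."""
    with redo_log_lock:
        redo_log_buffer.append(("commit", txn_id))  # redo log of the commit
        delta_log_buffer.append(("commit", txn_id)) # commit-transaction log


begin_transaction(7)
begin_transaction(8)
commit_transaction(7)
```

Since the delta log write happens inside the critical section that the redo log write needs anyway, no second lock is acquired.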
- Optionally, the primary node sends the transaction table delta log to the at least one standby node as follows:
- The primary node applies for the redo log lock. After obtaining it, the primary node locks the transaction table delta log buffer and copies the transaction table delta log in that buffer to a buffer without lock protection. The primary node then sends the transaction table delta log in the lock-free buffer to the at least one standby node.
- In other words, the primary node first uses the redo log lock to copy the transaction table delta log from the transaction table delta log buffer to a lock-free buffer in memory, and then sends the delta log from the lock-free buffer to the standby node. Sending from the lock-free buffer does not hold the redo log lock.
- This scheme therefore reduces the time the redo log lock is held while the transaction table delta log is sent, and improves transaction throughput.
- Optionally, the primary node resets the transaction table delta log buffer after copying its contents to the lock-free buffer under the redo log lock. The primary node can then promptly record, in the transaction table delta log buffer, the delta log of newly added and newly committed transactions, which improves transaction throughput and processing efficiency.
- After the reset, the redo log lock can be released, so that other transactions can obtain the redo log lock in time, which further improves transaction throughput.
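A minimal Python sketch of the copy, reset and send sequence, under stated assumptions: `send_to_standby` is a hypothetical stand-in for the network transport, and the list `sent` only records what would have been transmitted.

```python
import threading

redo_log_lock = threading.Lock()
delta_log_buffer = [("new", 7), ("new", 8), ("commit", 7)]
sent = []  # stands in for the network; a real node would use its own transport


def send_to_standby(batch):
    """Hypothetical transport: record the batch instead of sending it."""
    sent.append(batch)


def sync_on_group_commit():
    # Hold the redo log lock only long enough to copy and reset the buffer.
    with redo_log_lock:
        unprotected_copy = list(delta_log_buffer)  # buffer without lock protection
        delta_log_buffer.clear()                   # reset (step 206)
    # The lock is released here, so sending does not block other transactions.
    send_to_standby(unprotected_copy)


sync_on_group_commit()
```

The design point is that the comparatively slow network send happens outside the critical section; only the in-memory copy is serialized with redo log writes.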
- Optionally, the method further includes:
- Step 207: The primary node deletes, from the transaction table delta log, the new-transaction log and the commit-transaction log recorded for the same transaction.
- Suppose the transaction table delta log buffer contains both the new-transaction log of a transaction a and the commit-transaction log of transaction a.
- The primary node added transaction a after the last group commit, which means that transaction a had not been created at the time of the last group commit, so transaction a does not exist in the active transaction table of the standby node.
- The primary node also committed transaction a before this group commit, which means that transaction a is no longer an active transaction and will not appear in the active transaction table.
- Transaction a therefore exists in the standby node's active transaction table neither before nor after the update.
- It follows that transaction a has no effect on the active transaction table of the standby node.
- Step 207 can include, but is not limited to, the following embodiments:
- The primary node can, at any time during the group commit, search the transaction table delta log buffer for the new-transaction log and the commit-transaction log recorded for the same transaction and delete them.
- At group commit, before sending the transaction table delta log from the transaction table delta log buffer to the standby node, the primary node searches for the new-transaction log and the commit-transaction log recorded for the same transaction, deletes them, and then sends the remaining delta log to the standby node.
- Before copying the transaction table delta log from the transaction table delta log buffer to the lock-free buffer mentioned in the previous embodiment under the redo log lock, the primary node searches for the new-transaction log and the commit-transaction log recorded for the same transaction, deletes them, and then copies the remaining delta log to the lock-free buffer.
- Before sending the transaction table delta log from the lock-free buffer to the standby node, the primary node searches for the new-transaction log and the commit-transaction log recorded for the same transaction, deletes them, and then sends the remaining delta log from the lock-free buffer to the standby node.
- Deleting the new-transaction log and the commit-transaction log recorded for the same transaction does not affect the synchronization of the active transaction table between nodes. It can significantly reduce the amount of log data transmitted, the transmission resources consumed, and the time spent on transaction table synchronization.
- Optionally, the method further includes:
- Step 208: The primary node determines whether the total size of the transaction table delta log is greater than a preset threshold. If it is, step 207 is performed, followed by step 204; if it is not, step 204 is performed directly.
- Searching for the new-transaction log and the commit-transaction log recorded for the same transaction takes time. If the total size of the transaction table delta log is not greater than the preset threshold, that is, it is already small, the primary node can skip the search and save computing resources. Only when the total size is greater than the preset threshold, that is, it is large, does the primary node perform step 207 and delete the new-transaction log and the commit-transaction log recorded for the same transaction to reduce the amount of log data transmitted.
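Steps 207 and 208 can be sketched together in Python. The sketch assumes that a transaction number appears at most once as "new" and once as "commit" within a batch, that the threshold is measured in bytes, and that every record has the same size; `RECORD_SIZE_BYTES`, `drop_cancelled_pairs` and `prepare_batch` are hypothetical names.

```python
RECORD_SIZE_BYTES = 3  # hypothetical fixed size of one delta log record


def drop_cancelled_pairs(records):
    """Step 207: drop both records of any transaction added and committed in this batch."""
    added = {txn for kind, txn in records if kind == "new"}
    committed = {txn for kind, txn in records if kind == "commit"}
    cancelled = added & committed
    return [(kind, txn) for kind, txn in records if txn not in cancelled]


def prepare_batch(records, threshold_bytes):
    """Step 208: only pay for the scan when the batch exceeds the threshold."""
    if len(records) * RECORD_SIZE_BYTES > threshold_bytes:
        return drop_cancelled_pairs(records)
    return list(records)


batch = [("new", "a"), ("new", "b"), ("commit", "a"), ("commit", "z")]
small_threshold = prepare_batch(batch, threshold_bytes=8)     # 12 bytes > 8: pruned
large_threshold = prepare_batch(batch, threshold_bytes=1024)  # sent unchanged
```

With the small threshold both records of transaction a are removed and only the records for b and z are sent; with the large threshold the batch is sent as recorded.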
- Steps 201 to 208 can also be applied to the multi-write cluster database system shown in FIG. 2c.
- In that case, steps 201 to 204 and steps 206 to 208 are performed by the coordinator node in the multi-write cluster database system, and step 205 is performed by the data nodes.
- Step 203 is performed as follows. When a data node creates a transaction, it applies to the coordinator node for a transaction number and the redo log lock. The coordinator node allocates the transaction number and the redo log lock to the data node, adds the transaction number of the new transaction to its locally saved global active transaction table, and records a new-transaction log for the transaction in the transaction table delta log buffer. When a data node commits a transaction, it applies to the coordinator node for the redo log lock. After granting the redo log lock to the data node, the coordinator node deletes the transaction number of the committed transaction from the locally saved global active transaction table and records a commit-transaction log for the transaction in the transaction table delta log buffer.
- Step 204 is performed as follows. When a data node performs a group commit, it applies to the coordinator node for the redo log lock. After granting the redo log lock to the data node, the coordinator node sends the transaction table delta log to all data nodes, so that each data node, according to the transaction table delta log, keeps its locally saved active transaction table consistent with the global active transaction table maintained by the coordinator node.
- The implementation of steps 205 to 208 is the same as in the primary/standby cluster database system shown in FIG. 2a or 2b and is not repeated here.
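The multi-write flow of steps 203 and 204 can be sketched in Python as below. The classes `Coordinator` and `DataNode` and their methods are hypothetical, the redo log lock hand-off and the network are left out, and method calls stand in for the requests that data nodes would send to the coordinator node.

```python
class DataNode:
    def __init__(self):
        self.active = set()  # locally saved active transaction table

    def apply_delta(self, delta_log):
        """Step 205: replay the delta log against the local table."""
        for kind, txn_id in delta_log:
            if kind == "new":
                self.active.add(txn_id)
            else:
                self.active.discard(txn_id)


class Coordinator:
    def __init__(self, data_nodes):
        self.data_nodes = data_nodes
        self.next_txn_id = 1
        self.global_active = set()  # global active transaction table
        self.delta_log = []         # transaction table delta log buffer

    def begin(self):
        """A data node requests a transaction number (step 203)."""
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        self.global_active.add(txn_id)
        self.delta_log.append(("new", txn_id))
        return txn_id

    def commit(self, txn_id):
        """A data node commits a transaction (step 203)."""
        self.global_active.discard(txn_id)
        self.delta_log.append(("commit", txn_id))

    def group_commit(self):
        """Send the delta log to all data nodes and reset the buffer (step 204)."""
        batch, self.delta_log = self.delta_log, []
        for node in self.data_nodes:
            node.apply_delta(batch)


nodes = [DataNode(), DataNode()]
coordinator = Coordinator(nodes)
t1 = coordinator.begin()
t2 = coordinator.begin()
coordinator.commit(t1)
coordinator.group_commit()
```

After the group commit every data node's local table equals the coordinator's global table, which here contains only the second transaction.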
- Figure 5 shows a possible structure of the transaction table delta log in this application.
- The transaction table delta logs in the transaction table delta log buffer form a group of transaction table delta logs. A group includes metadata (Metadata) and one or more transaction table delta logs.
- The metadata of a group of transaction table delta logs includes a log header (Log Header) and a base transaction number (Base Transaction ID).
- The log header can occupy 1 byte and holds the number of the group of transaction table delta logs. The base transaction number can occupy 8 bytes and holds the transaction number of the first new transaction in the group.
- Each transaction table delta log includes a record header (Record Header) that indicates the type of transaction change and an array index (Array Index) of the transaction in the transaction table.
- The record header can occupy 1 bit: "0" indicates a committed transaction and "1" indicates a new transaction. The array index can occupy (7 + 8 * N) bits, where N is the maximum concurrent transaction number of the database system, and indicates the offset of the transaction from the first transaction in the transaction table.
- A new-transaction log may also include a transaction number delta field occupying 8 * N bits, which holds the transaction number of the transaction minus the base transaction number.
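The field widths above allow a rough size estimate for one group of delta logs. The Python sketch below only adds up the stated widths; it treats N as a given width parameter, rounds the records up to whole bytes, and uses the hypothetical names `record_bits` and `group_bytes`. It is not a serializer for the format in Figure 5.

```python
METADATA_BYTES = 1 + 8  # log header (1 byte) + base transaction number (8 bytes)


def record_bits(is_new, n):
    """Bits used by one delta log record for width parameter n."""
    bits = 1 + (7 + 8 * n)  # record header + array index
    if is_new:
        bits += 8 * n       # transaction number delta, new-transaction logs only
    return bits


def group_bytes(num_new, num_commit, n):
    """Total bytes of one group: metadata plus all records, rounded up."""
    total_bits = (num_new * record_bits(True, n)
                  + num_commit * record_bits(False, n))
    return METADATA_BYTES + (total_bits + 7) // 8


size = group_bytes(num_new=2, num_commit=2, n=1)
```

With n = 1 a commit record takes 16 bits and a new-transaction record 24 bits, so a group with two of each takes 9 + 10 = 19 bytes.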
- Figure 6a shows how the active transaction table of the primary node changes as the primary node creates and commits transactions.
- Each box in the figure represents an active transaction, and the number in the box is the transaction number of that active transaction.
- The left side of Figure 6a shows the original active transaction table of the primary node (that is, the active transaction table after the last synchronization), and the right side of Figure 6a shows the active transaction table of the primary node at this group commit.
- Figure 6b shows the transaction table delta log recorded by the primary node between the two synchronizations of the active transaction table.
- Figure 6c shows the transaction table delta log that remains after the new-transaction log and the commit-transaction log recorded for the same transaction are deleted, as described in step 207, from the transaction table delta log shown in Figure 6b.
- Specifically, the log for adding transaction 2201 and the log for committing transaction 2201 are deleted.
- The transaction table delta log in Fig. 6c is represented using the structure shown in Fig. 5.
- For example, the offset of transaction 1805 from the first transaction 1210 of the transaction table is 196.
- The meanings of the fields in FIG. 6c are as follows. Metadata field: the number of the group of transaction table delta logs is 2, and the base transaction number is 2200. The "0" and "1" fields following the metadata indicate a commit-transaction log whose committed transaction has an offset of 1, which identifies the committed transaction as transaction 1211 in the active transaction table on the left side of Figure 6a. The following "1", "4" and "+0" fields indicate a new-transaction log.
- The position of this new transaction in the transaction table is offset from transaction 1210 by 4, relative to the active transaction table on the left side of Figure 6a.
- The transaction number of this new transaction is the base transaction number 2200 plus 0, that is, 2200.
- The subsequent fields "1", "197" and "+5" indicate another new-transaction log.
- The position of this new transaction in the transaction table is offset from transaction 1210 by 197, relative to the active transaction table on the left side of Figure 6a.
- The transaction number of this new transaction is the base transaction number 2200 plus 5, that is, 2205.
- The subsequent fields "0" and "196" indicate a commit-transaction log whose committed transaction has an offset of 196, which identifies the committed transaction as transaction 1805 in the active transaction table on the left side of Figure 6a.
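The field walk-through above can be replayed in Python. This sketch models the transaction table as a mapping from array index to transaction number and includes only the three slots named in the text (1210, 1211 and 1805); the remaining contents of Figure 6a are not reproduced, and slots 4 and 197 are assumed to be free before the update. The name `apply_delta` is hypothetical.

```python
BASE_TXN_ID = 2200  # base transaction number from the metadata field

# (record header, array index, transaction number delta); header 0 = commit, 1 = new
records = [(0, 1, None), (1, 4, 0), (1, 197, 5), (0, 196, None)]

# Only the slots named in the text are modelled; all other slots are omitted.
table = {0: 1210, 1: 1211, 196: 1805}


def apply_delta(table, records, base):
    """Apply each record to the table, addressing transactions by array index."""
    for header, index, delta in records:
        if header == 1:
            table[index] = base + delta  # new transaction: number = base + delta
        else:
            table.pop(index, None)       # committed transaction leaves the table
    return table


apply_delta(table, records, BASE_TXN_ID)
```

After the update the modelled slots hold 1210 at offset 0, 2200 at offset 4 and 2205 at offset 197, while 1211 and 1805 are gone.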
- Figure 6d shows the primary node sending the transaction table delta log shown in Figure 6c to a plurality of standby nodes in the database cluster.
- FIG. 6e is a schematic diagram of a standby node updating its saved active transaction table according to the transaction table delta log shown in FIG. 6c. The original active transaction table of the standby node, shown in the upper left corner of FIG. 6e, is identical to the original active transaction table of the primary node shown on the left side of FIG. 6a, and the updated active transaction table of the standby node, shown on the right side of FIG. 6e, is consistent with the active transaction table of the primary node at group commit shown on the right side of FIG. 6a.
- In this way, the active transaction tables of the primary node and the standby node are synchronized after each group commit.
- FIG. 7 is a schematic diagram of an apparatus 300 for synchronizing an active transaction table according to the present application. The apparatus implements the functions of the primary node or the coordinator node in the method for synchronizing an active transaction table in the foregoing embodiments.
- The apparatus 300 includes:
- a logging module 301, configured to record, in the transaction table delta log buffer, the transaction table delta log generated since the active transaction table was last synchronized, where the transaction table delta log indicates changes to the transactions recorded in the active transaction table of the apparatus and includes a new-transaction log indicating that a transaction was added to the active transaction table and a commit-transaction log indicating that a transaction was deleted from the active transaction table, the active transaction table being used to record transactions that have not yet been committed; and
- a sending module 302, configured to send the transaction table delta log to at least one second node when a group commit is performed on the transactions recorded by the commit-transaction log, so that the at least one second node updates the active transaction table it saves according to the received transaction table delta log.
- Optionally, the transaction table delta log buffer is protected by the redo log lock, and the logging module 301 is configured to: obtain the redo log lock, lock the transaction table delta log buffer, and add new-transaction logs and commit-transaction logs to the transaction table delta log buffer.
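- Optionally, the sending module 302 is configured to: obtain the redo log lock, lock the transaction table delta log buffer, copy the transaction table delta log in the transaction table delta log buffer to a buffer without lock protection, and send the transaction table delta log in the lock-free buffer to the at least one second node.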
- Optionally, the apparatus 300 further includes:
- a reset module 303, configured to reset the transaction table delta log buffer after the sending module copies the transaction table delta log in the transaction table delta log buffer to the lock-free buffer.
- Optionally, the apparatus 300 further includes:
- a first deleting module 304, configured to delete, from the transaction table delta log, the new-transaction log and the commit-transaction log recorded for the same transaction before the sending module sends the transaction table delta log to the at least one second node.
- Optionally, the apparatus 300 further includes:
- a second deleting module 305, configured to determine, before the sending module sends the transaction table delta log to the at least one second node, whether the total size of the transaction table delta log is greater than a preset threshold and, if it is, to delete from the transaction table delta log the new-transaction log and the commit-transaction log recorded for the same transaction.
- Optionally, the sending module 302 is further configured to send the active transaction table of the apparatus to the second node after the second node joins the database cluster in which the apparatus is located.
- The functional modules in the embodiments of the present application may be integrated into one processor, may each exist alone physically, or two or more modules may be integrated into one module.
- The integrated module can be implemented in the form of hardware or in the form of a software functional module.
- The device 400 for synchronizing the active transaction table may include a processor 401. The hardware entity corresponding to the logging module 301, the reset module 303, the first deleting module 304, and the second deleting module 305 may be the processor 401.
- The processor 401 can be a central processing unit (CPU), a digital processing module, or the like.
- The device 400 for synchronizing the active transaction table may further include a communication interface 402.
- The hardware entity corresponding to the sending module 302 may be the communication interface 402.
- The device sends the transaction table delta log to other nodes in the database cluster through the communication interface 402.
- The device 400 for synchronizing the active transaction table further includes a memory 403 for storing the program executed by the processor 401.
- The memory 403 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
- The memory 403 may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- The processor 401 is configured to execute the program code stored in the memory 403, specifically to perform the method described in any of the embodiments shown in FIG. 3, FIG. 4a, and FIG. 4b.
- For details, refer to the method described in the embodiments shown in FIG. 3, FIG. 4a and FIG. 4b; they are not repeated here.
- The specific connection medium between the communication interface 402, the processor 401, and the memory 403 is not limited in the embodiments of the present application.
- In FIG. 8, the memory 403, the processor 401, and the communication interface 402 are connected by a bus 404.
- The bus is indicated by a thick line in FIG. 8. The connections between other components are only schematic and are not limiting.
- The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 8, but this does not mean that there is only one bus or one type of bus.
- An embodiment of the present invention further provides a computer-readable storage medium for storing the computer software instructions to be executed by the processor 401 described above, including the program that the processor is required to execute.
- FIG. 9 is a schematic diagram of an apparatus 500 for synchronizing an active transaction table. The apparatus implements the functions of the standby node or the data node in the method for synchronizing an active transaction table in the foregoing embodiments.
- The apparatus 500 includes:
- a receiving module 501, configured to receive the transaction table delta log, sent by the first node, that was generated since the active transaction table was last synchronized, where the transaction table delta log indicates changes to the transactions recorded in the active transaction table of the first node and includes a new-transaction log indicating that a transaction was added to the active transaction table and a commit-transaction log indicating that a transaction was deleted from the active transaction table, the active transaction table being used to record transactions that have not yet been committed; and
- an update module 502, configured to update the local active transaction table according to the transaction table delta log.
- Optionally, the update module 502 is specifically configured to:
- if the transaction table delta log includes a log for adding a first transaction and does not include a log for committing the first transaction, add the first transaction to the active transaction table;
- if the active transaction table includes a second transaction and the transaction table delta log includes a log for committing the second transaction, delete the second transaction from the active transaction table.
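The two rules of the update module can be sketched in Python with the active transaction table modelled as a set of transaction numbers. The function name `update_active_table` and the letter-valued transaction numbers are illustrative only.

```python
def update_active_table(active, delta_log):
    """Return the updated active transaction table for one received delta log."""
    added = {txn for kind, txn in delta_log if kind == "new"}
    committed = {txn for kind, txn in delta_log if kind == "commit"}
    # Rule 2: transactions with a commit log are deleted from the table.
    # Rule 1: transactions added and not committed in the same log are inserted.
    return (active - committed) | (added - committed)


standby_table = {"a", "x"}
delta_log = [("commit", "a"), ("new", "b"), ("new", "c"), ("commit", "c")]
standby_table = update_active_table(standby_table, delta_log)
```

Transaction a is removed, transaction b is added, and transaction c, which was both added and committed within the same delta log, never enters the table.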
- The functional modules in the embodiments of the present application may be integrated into one processor, may each exist alone physically, or two or more modules may be integrated into one module.
- The integrated module can be implemented in the form of hardware or in the form of a software functional module.
- The device for synchronizing the active transaction table may include a processor, and the hardware entity corresponding to the update module 502 may be the processor.
- The device for synchronizing the active transaction table may further include a communication interface.
- The hardware entity corresponding to the receiving module 501 may be the communication interface, through which the device receives the transaction table delta log sent by the primary node or the coordinator node.
- The apparatus 500 for synchronizing the active transaction table further includes a memory for storing the program executed by the processor. The implementation of the processor, the communication interface, and the memory is described in the embodiment shown in FIG. 8 and is not repeated here.
- An embodiment of the invention further provides a computer-readable storage medium for storing the computer software instructions to be executed by the above processor, including the program that the processor is required to execute.
- Embodiments of the present application can be provided as a method, a system, or a computer program product.
- The present application can therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
- The application can also take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
- The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus. The instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
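A method and apparatus for synchronizing an active transaction table, intended to solve the prior-art problem that synchronizing active transaction tables between nodes involves a large amount of data transmission and a long transmission time. The method includes: a first node records, in a transaction table delta log buffer, the transaction table delta log generated since the active transaction table was last synchronized, where the transaction table delta log indicates changes to the transactions recorded in the first node's active transaction table and includes new-transaction logs, which indicate that a transaction was added to the active transaction table, and commit-transaction logs, which indicate that a transaction was deleted from the active transaction table; and, when a group commit is performed on the transactions recorded by the commit-transaction logs, the first node sends the transaction table delta log to at least one second node, so that the at least one second node updates the active transaction table it saves according to the received transaction table delta log.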
Description
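This application claims priority to Chinese Patent Application No. 201710115023.0, filed with the Chinese Patent Office on February 28, 2017 and entitled "Method and apparatus for synchronizing an active transaction table", which is incorporated herein by reference in its entirety.
This application relates to the field of database technologies, and in particular to a method and apparatus for synchronizing an active transaction table.
A transaction (Transaction) is a user-defined sequence of database operations in a database system. The operations are either all executed or none executed, and they form an indivisible unit of work. An active transaction table (Active Transaction List) records the transaction numbers of transactions that have not yet been committed. Nodes in a distributed database cluster need to synchronize their active transaction tables to keep the database consistent.
In the prior art, when a node that executes transactions performs a group commit, it sends its own active transaction table to the other nodes in the database cluster, so that those nodes synchronize their active transaction tables with that of the executing node.
However, a node usually has a large number of concurrent transactions, which makes its active transaction table very large. Synchronizing the active transaction table over the network not only consumes substantial transmission resources but also takes a long time, so synchronization of the active transaction table between nodes suffers considerable latency and the efficiency of the database system drops.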
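Summary of the Invention
This application provides a method and apparatus for synchronizing an active transaction table, to solve the prior-art problem that synchronizing active transaction tables between nodes involves a large amount of data transmission and a long transmission time.
According to a first aspect, this application provides a method for synchronizing an active transaction table. The method can be applied to a cluster database system with a one-primary, multiple-standby architecture that includes one primary node and multiple standby nodes, or to a cluster database system with a multi-write architecture that includes one coordinator node and multiple data nodes. The method can be performed by the primary node in the former or by the coordinator node in the latter. In the method, a first node (the primary node or the coordinator node) records, in a transaction table delta log buffer, the transaction table delta log generated since the active transaction table was last synchronized. The transaction table delta log indicates changes to the transactions recorded in the first node's active transaction table and includes new-transaction logs, which indicate that a transaction was added to the active transaction table, and commit-transaction logs, which indicate that a transaction was deleted from the active transaction table. The active transaction table records transactions that have not yet been committed. Then, when a group commit is performed on the transactions recorded by the commit-transaction logs, the first node sends the transaction table delta log recorded in the buffer to the second nodes, that is, the nodes in the database cluster other than itself. There may be one or more second nodes. When the first node is the primary node, the second nodes are standby nodes; when the first node is the coordinator node, the second nodes are data nodes. After receiving the transaction table delta log sent by the first node, a second node updates its locally saved active transaction table according to the delta log.
In the foregoing technical solution, the first node sends the second node the transaction table delta log as of the group commit, and the second node can use it to update its own active transaction table to match that of the first node. Because the number of transaction table delta logs at a group commit is usually far smaller than the number of active transactions in the first node's active transaction table, the delta log is much smaller than the entire active transaction table. Synchronizing the active transaction table by transmitting the delta log therefore uses fewer transmission resources, shortens the transmission time, and reduces the synchronization latency.
In an optional implementation of the first aspect, the first node configures the transaction table delta log buffer to be protected by the redo log lock. While obtaining the redo log lock and writing the redo log for a new transaction, the first node can also use that redo log lock to lock the transaction table delta log buffer and write the new-transaction log to it at the same time, so writing the new-transaction log incurs no additional lock overhead. While obtaining the redo log lock and writing the redo log for a committing transaction, the first node can likewise use that redo log lock to lock the transaction table delta log buffer and write the commit-transaction log to it, so writing the commit-transaction log incurs no additional lock overhead either. In this solution the first node reuses the existing redo log lock when recording the transaction table delta log. No additional lock overhead is introduced, lock contention while recording the delta log is reduced, and transaction throughput improves.
In an optional implementation of the first aspect, when the transaction table delta log buffer is protected by the redo log lock, the first node first obtains the redo log lock, locks the transaction table delta log buffer, and copies the transaction table delta log in it to a lock-free buffer in memory. The first node then sends the transaction table delta log in the lock-free buffer to the second node. Sending the delta log from the lock-free buffer to the second node does not hold the redo log lock. This solution reduces the time the redo log lock is held while the delta log is sent and improves transaction throughput.
In an optional implementation of the first aspect, the first node resets the transaction table delta log buffer after copying its contents to the lock-free buffer under the redo log lock. The first node can then promptly record, in the transaction table delta log buffer, the delta logs of transactions added and committed after the group commit, which improves transaction throughput and processing efficiency.
In an optional implementation of the first aspect, before the first node sends the transaction table delta log to the at least one second node, the new-transaction log and the commit-transaction log recorded for the same transaction are deleted from the transaction table delta log. Deleting them does not affect the synchronization of the active transaction table between nodes, and it can significantly reduce the amount of log data transmitted, the transmission resources consumed, and the time spent on transaction table synchronization.
In an optional implementation of the first aspect, if the total size of the transaction table delta log is not greater than a preset threshold, that is, it is small, the first node can skip the search for the new-transaction log and the commit-transaction log recorded for the same transaction and save computing resources. Only when the total size is greater than the preset threshold, that is, it is large, does the first node search for and delete the new-transaction log and the commit-transaction log recorded for the same transaction to reduce the amount of log data transmitted.
In an optional implementation of the first aspect, after a second node joins the database cluster in which the first node is located, the first node sends its current active transaction table to the second node, and the second node saves that active transaction table, which initializes the second node's active transaction table.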
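According to a second aspect, this application provides a method for synchronizing an active transaction table. The method can be applied to a cluster database system with a one-primary, multiple-standby architecture that includes one primary node and multiple standby nodes, or to a cluster database system with a multi-write architecture that includes one coordinator node and multiple data nodes. The method can be performed by a standby node in the former or by a data node in the latter. In the method, a second node (the standby node or the data node) receives the transaction table delta log, sent by a first node, that was generated since the active transaction table was last synchronized. When the second node is a standby node, the first node is the primary node; when the second node is a data node, the first node is the coordinator node. The transaction table delta log indicates changes to the transactions recorded in the first node's active transaction table and includes new-transaction logs, which indicate that a transaction was added to the active transaction table, and commit-transaction logs, which indicate that a transaction was deleted from the active transaction table. The active transaction table records transactions that have not yet been committed. The second node updates its local active transaction table according to the transaction table delta log.
In the foregoing technical solution, the first node sends the second node the transaction table delta log as of the group commit, and the second node can use it to update its own active transaction table to match that of the first node. Because the number of transaction table delta logs at a group commit is usually far smaller than the number of active transactions in the first node's active transaction table, the delta log is much smaller than the entire active transaction table. Synchronizing the active transaction table by transmitting the delta log therefore uses fewer transmission resources, shortens the transmission time, and reduces the synchronization latency.
In an optional implementation of the second aspect, when the second node updates its local active transaction table according to the transaction table delta log: if the delta log includes a log for adding a first transaction and does not include a log for committing the first transaction, the second node adds the first transaction to the active transaction table; if the active transaction table includes a second transaction and the delta log includes a log for committing the second transaction, the second node deletes the second transaction from the active transaction table.
According to a third aspect, this application provides an apparatus for synchronizing an active transaction table, configured to perform the method in the first aspect or any possible implementation of the first aspect. Specifically, the apparatus includes modules for performing that method.
According to a fourth aspect, this application provides an apparatus for synchronizing an active transaction table, configured to perform the method in the second aspect or any possible implementation of the second aspect. Specifically, the apparatus includes modules for performing that method.
According to a fifth aspect, this application provides a device for synchronizing an active transaction table, configured to perform the method in the first aspect or any possible implementation of the first aspect. The device includes a processor, a memory, and a communication interface. The memory includes a transaction table delta log buffer used to record the transaction table delta log. The communication interface is used to send the transaction table delta log to the second node. The processor is communicatively connected to the memory and the communication interface and is configured to perform, through them, the method in the first aspect or any possible implementation of the first aspect.
According to a sixth aspect, this application provides a device for synchronizing an active transaction table, configured to perform the method in the second aspect or any possible implementation of the second aspect. The device includes a processor, a memory, and a communication interface. The memory is used to store the active transaction table. The communication interface is used to receive the transaction table delta log sent by the first node. The processor is communicatively connected to the memory and the communication interface and is configured to perform, through them, the method in the second aspect or any possible implementation of the second aspect.
According to a seventh aspect, this application further provides a computer-readable storage medium for storing computer software instructions used to perform the functions of the first aspect or any design of the first aspect, including a program designed to perform the method of the first aspect or any design of the first aspect.
According to an eighth aspect, this application further provides a computer-readable storage medium for storing computer software instructions used to perform the functions of the second aspect or any design of the second aspect, including a program designed to perform the method of the second aspect or any design of the second aspect.
On the basis of the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.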
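FIG. 1 is a schematic architectural diagram of a database system;
FIG. 2a is a schematic diagram of a cluster database system with a shared-disk architecture;
FIG. 2b is a schematic diagram of a cluster database system with a shared-nothing architecture;
FIG. 2c is a schematic diagram of a cluster database system with a multi-write architecture;
FIG. 3 is a schematic flowchart of the method for synchronizing an active transaction table provided in this application;
FIG. 4a is another schematic flowchart of the method for synchronizing an active transaction table provided in this application;
FIG. 4b is yet another schematic flowchart of the method for synchronizing an active transaction table provided in this application;
FIG. 5 is a schematic diagram of the structure of the transaction table delta log in this application;
FIG. 6a to FIG. 6e are schematic diagrams of the process of synchronizing an active transaction table in this application;
FIG. 7 is a schematic structural diagram of the apparatus for synchronizing an active transaction table provided in this application;
FIG. 8 is another schematic structural diagram of the apparatus for synchronizing an active transaction table provided in this application;
FIG. 9 is a schematic structural diagram of another apparatus for synchronizing an active transaction table provided in this application.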
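To make the objectives, technical solutions, and advantages of this application clearer, the following describes this application in further detail with reference to the accompanying drawings.
This application provides a method and apparatus for synchronizing an active transaction table, to solve the prior-art problem that synchronizing active transaction tables between nodes takes a long time. The method and the apparatus are based on the same inventive concept. Because they solve the problem on similar principles, the implementations of the apparatus and the method can refer to each other, and repeated content is not described again.
In this application, "multiple" means two or more. In addition, in the description of this application, terms such as "first" and "second" are used only to distinguish between descriptions and are not to be understood as indicating or implying relative importance or order.
FIG. 1 shows the architecture of a database system. The database system includes a database 11 and a database management system (Database Management System, DBMS) 12.
The database 11 is an organized collection of data stored long-term in a data store (Data Store), that is, a collection of related data organized, stored, and used according to a certain data model. For example, the database 11 can include one or more tables of data.
The DBMS 12 is used to create, use, and maintain the database 11 and to manage and control the database 11 in a unified way, so as to ensure the security and integrity of the database 11. Users can access data in the database 11 through the DBMS 12, and database administrators also maintain the database through the DBMS 12. The DBMS 12 provides multiple functions that allow multiple applications and user devices to create, modify, and query the database in different ways, at the same time or at different times. Applications and user devices can be collectively called clients. The functions provided by the DBMS 12 can include the following: (1) data definition: the DBMS 12 provides a data definition language (Data Definition Language, DDL) to define the database structure; the DDL describes the database framework and can be stored in the data dictionary; (2) data access: the DBMS 12 provides a data manipulation language (Data Manipulation Language, DML) for basic access operations on database data, such as retrieval, insertion, modification, and deletion; (3) database operation management: the DBMS 12 provides data control functions, that is, it effectively controls and manages database operation through data security, integrity, and concurrency control to ensure that the data is correct and valid; (4) database creation and maintenance, including loading of initial data, dumping, recovery, and reorganization of the database, and system performance monitoring and analysis; (5) database transmission: the DBMS 12 handles the transmission of data to implement communication between clients and the DBMS 12, usually in coordination with the operating system.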
具体地,图2a为采用共享磁盘(Shared-storage)架构的集群数据库系统示意图,包括多个节点(如图2a中的主节点以及备节点1-3),每个节点部署有数据库管理系统,分别为用户提供数据库的查询和修改等服务,多个数据库管理系统存储有共享的数据在共享数据存储器中,并且通过交换机对数据存储器中的数据执行读写操作。共享数据存储器可以为共享磁盘阵列。集群数据库系统中的节点可以为物理机,比如数据库服务器,该数据库服务器可以包括多个处理器,所有的处理器共享资源,如总线,内存和I/O系统等,数据库管理系统的功能可由一个或多个处理器执行内存中的程序来实现。集群数据库系统中的节点也可以为运行在抽象硬件资源上的虚拟机。若节点为物理机,则交换机为存储区网络(Storage Area Network,SAN)交换机、以太网交换机,光纤交换机或其它物理交换设备。若节点为虚拟机,则交换机为虚拟交换机。
图2b为采用无共享(Shared-nothing)架构的集群数据库系统示意图,每个节点具有各自独享的硬件资源(如数据存储器)、操作系统和数据库,节点之间通过网络来通信。该体系下,数据将根据数据库模型和应用特点被分配到各个节点上,查询任务将被分割成若干部分,在所有节点上并行执行,彼此协同计算,作为整体提供数据库服务,所有通信功能都在一个高宽带网络互联体系上实现。如同图2a所描述的共享磁盘架构的集群数据库系统一样,这里的节点既可以是物理机,也可以是虚拟机。
在图2a或者图2b所述的集群数据库系统中,可以由一个节点作为主节点,用于实现对数据库的更新操作,如插入、修改和删除数据。集群数据库系统中除主节点之外的节点(如节点1-3)作为备节点,用于实现对数据库中数据的读取操作,这种系统又称为一主多备的数据库系统。
图2c为采用多写架构的集群数据库系统示意图,该系统包括共享协作节点(Coordinator Node)以及数据节点(Date Node),协作节点以及数据节点共享磁盘,数据节点用于实现数据存取功能,协作节点用于管理全局的锁资源以及为数据节点分配事务号等。
在本发明所有实施例中,数据库系统的数据存储器(Data Store)包括但不限于固态硬盘(SSD)、磁盘阵列或其他类型的非瞬态计算机可读介质。图2a-2c中虽未示出数据库,应理解,数据库存储在数据存储器中。所属领域的技术人员可以理解一个数据库系统可能包括比图2a-2c中所示的部件更少或更多的组件,或者包括与图2a-2c中所示组件不同的组件,图2a-2c仅仅示出了与本发明实施例所公开的实现方式更加相关的组件。例如,虽然图2a至2c中已经描述了有限个数的节点,但所属领域的技术人员可理解成一个集群数据库系统可包含任何数量的节点。各节点的数据库管理系统功能可分别由运行在各节点上的软件、硬件和/或固件的适当组合来实现。
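Based on the teachings of the embodiments of the present invention, a person skilled in the art will clearly understand that the method of the embodiments applies to a database management system, and that the database management system may be used in any of the Shared-storage cluster database system shown in FIG. 2a, the Shared-nothing cluster database system shown in FIG. 2b, and the multi-write cluster database system shown in FIG. 2c, or in another type of database system.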
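Further, referring to FIG. 1, when the DBMS 12 starts a new transaction, it adds the transaction number of the new transaction to the active transaction table, and when the DBMS 12 commits a transaction, it deletes the transaction number of the committed transaction from the active transaction table. As a result, all currently active transactions (that is, transactions that have not yet been committed) are recorded in the active transaction table, and the active transaction table records only currently active transactions. In addition, both when starting a new transaction and when committing a transaction, the DBMS 12 records a redo log (redo log) that captures the changes the transaction makes to the database. When committing a transaction, to avoid random writes of disk pages, the DBMS 12 only needs to ensure that the transaction's redo log has been written to disk. Sequential writes of the redo log thus replace random page writes while still guaranteeing the durability of the transaction, which improves the performance of the database system. Further, to reduce the frequency of disk input/output (Input/Output, I/O) operations, the DBMS 12 merges the writing of the redo logs of multiple transactions to disk; the action in which the DBMS 12 writes multiple redo logs to disk at one time is called a group commit.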
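The bookkeeping described above can be illustrated with a minimal Python sketch. It assumes an in-memory set for the active transaction table and a plain list standing in for the on-disk redo log; the class and method names (`PrimarySketch`, `begin`, `commit`, `group_commit`) are hypothetical and are not taken from this application.

```python
class PrimarySketch:
    """Toy model: active transaction table plus group commit of redo logs."""

    def __init__(self):
        self.active_table = set()   # transaction numbers not yet committed
        self.redo_buffer = []       # redo records waiting to be flushed
        self.redo_on_disk = []      # stand-in for the sequential redo log file
        self.flush_count = 0        # number of simulated disk writes

    def begin(self, txn_id):
        # A new transaction enters the active transaction table.
        self.active_table.add(txn_id)
        self.redo_buffer.append(("begin", txn_id))

    def commit(self, txn_id):
        # A committed transaction leaves the active transaction table.
        self.active_table.discard(txn_id)
        self.redo_buffer.append(("commit", txn_id))

    def group_commit(self):
        # One sequential write covers the redo records of several transactions.
        self.redo_on_disk.extend(self.redo_buffer)
        self.redo_buffer.clear()
        self.flush_count += 1
```

In this sketch, five redo records produced by three transactions reach the simulated disk in a single flush, which is the saving that group commit is meant to provide.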
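The DBMS 12 may reside in a database server. For example, the database server may be the primary node or a standby node described in FIG. 2a or FIG. 2b, where the primary node performs update operations on data and the standby nodes perform read operations on data. The DBMS 12 may also be applied to the coordinator node or a data node shown in FIG. 2c. In that case, when a data node creates a transaction, it requests a transaction number and the redo log lock from the coordinator node; in response, the coordinator node allocates a transaction number and the redo log lock to the data node and adds the transaction number of the new transaction to the active transaction table that it maintains. When a data node commits a transaction, the coordinator node deletes the committed transaction from the active transaction table that it maintains. When a data node performs a group commit, it requests the redo log lock from the coordinator node and, after obtaining the lock, writes the redo logs of the multiple committed transactions to disk.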
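The following uses a one-primary, multiple-standby cluster database system as an example to describe the method for synchronizing an active transaction table provided in the embodiments of the present invention. FIG. 3 is a schematic flowchart of the method, which includes the following steps.
Step 201: After a standby node joins the database cluster in which the primary node resides, the primary node sends its current active transaction table to the standby node.
Step 202: The standby node saves the active transaction table.
Steps 201 and 202 describe one way of initializing the active transaction table of a standby node. It should be noted that steps 201 and 202 are described for completeness and are not required to implement the embodiments of the present invention. For example, when the database cluster is initialized, there are no active transactions on the primary node and its active transaction table is empty, so the primary node need not send its active transaction table to the standby node.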
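Step 203: The primary node records, in a transaction table incremental log buffer, the transaction table incremental logs generated since the active transaction table was last synchronized.
The active transaction table records transactions that have not yet been committed. The transaction table incremental logs indicate changes to the transactions recorded in the active transaction table of the first node (here, the primary node). They include new-transaction logs, which indicate that a transaction has been added to the active transaction table, and commit-transaction logs, which indicate that a transaction has been deleted from the active transaction table. A new-transaction log may include a field indicating that the transaction state type is new (or started) together with the transaction number of the transaction, and a commit-transaction log may include a field indicating that the transaction state type is committed together with the transaction number of the transaction. Whenever the primary node starts a transaction or commits a transaction, it records the corresponding transaction table incremental log in the transaction table incremental log buffer.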
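As an illustration of the record layout just described, the following Python sketch models an incremental log record as a state-type field plus a transaction number, and appends one record per start or commit. The names `IncrementalLogRecord` and `IncrementalLogRecorder`, and the string values used for the state type, are assumptions made for this sketch only.

```python
from dataclasses import dataclass

NEW, COMMIT = "new", "commit"   # assumed encodings of the state-type field


@dataclass(frozen=True)
class IncrementalLogRecord:
    kind: str     # NEW or COMMIT
    txn_id: int   # transaction number


class IncrementalLogRecorder:
    def __init__(self):
        self.active_table = set()
        self.delta_buffer = []   # records since the last synchronization

    def begin(self, txn_id):
        self.active_table.add(txn_id)
        self.delta_buffer.append(IncrementalLogRecord(NEW, txn_id))

    def commit(self, txn_id):
        self.active_table.discard(txn_id)
        self.delta_buffer.append(IncrementalLogRecord(COMMIT, txn_id))
```

The buffer therefore holds an ordered history of changes to the table rather than a copy of the table itself.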
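Step 204: When performing a group commit of the transactions recorded in the commit-transaction logs, the primary node sends the transaction table incremental logs to the standby node.
There may be one or more standby nodes, and the primary node may send to multiple standby nodes the transaction table incremental logs that it holds at the time of the group commit.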
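Step 205: The standby node updates its locally saved active transaction table according to the received transaction table incremental logs.
The standby node updates the active transaction table as follows:
If the transaction table incremental logs include a log committing transaction a, and the active transaction table originally saved on the standby node includes transaction a, the standby node determines that transaction a has been committed and deletes transaction a from the active transaction table. If the incremental logs include a log adding transaction b but no log committing transaction b, the standby node determines that transaction b is active and adds transaction b to the active transaction table. If the incremental logs include both a log adding transaction c and a log committing transaction c, the standby node determines that transaction c is not active and does not add transaction c to the active transaction table.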
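The three update rules for transactions a, b, and c can be sketched as a single function. This is a minimal illustration that assumes the active transaction table is a Python set and that each incremental log is a `(kind, txn_id)` tuple; the function name `apply_incremental_log` is hypothetical.

```python
def apply_incremental_log(active_table, records):
    """Update a standby's active transaction table in place.

    active_table: set of transaction numbers currently considered active.
    records: iterable of (kind, txn_id), where kind is "new" or "commit".
    """
    records = list(records)
    committed = {txn for kind, txn in records if kind == "commit"}
    for kind, txn in records:
        # Case b: added and not committed in this batch, so it is active.
        # Case c: added and committed in the same batch, so it is skipped.
        if kind == "new" and txn not in committed:
            active_table.add(txn)
    # Case a: a previously active transaction has now been committed.
    active_table.difference_update(committed)
    return active_table
```

For example, starting from a table containing transactions 10 and 20, a batch that commits 10, adds 30, and both adds and commits 40 leaves the table containing 20 and 30.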
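In the foregoing technical solution, before the primary node sends the transaction table incremental logs to the standby node, the active transaction table on the standby node is already synchronized with the active transaction table on the primary node. This may have been achieved through steps 201 and 202 above, or through the standby node updating its active transaction table according to the incremental logs sent by the primary node at the previous group commit. Therefore, when the primary node sends the incremental logs at a group commit, the standby node can use them to bring its own active transaction table into agreement with the primary node's. Because the number of transaction table incremental logs at a group commit is usually far smaller than the number of active transactions in the primary node's active transaction table, the incremental logs are much smaller than the entire active transaction table. Synchronizing the active transaction table by transmitting incremental logs therefore consumes fewer transmission resources, takes less time to transmit, and reduces the latency of active transaction table synchronization.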
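In an optional implementation, after step 204 the method further includes:
Step 206: The primary node resets the transaction table incremental log buffer.
After sending the transaction table incremental logs to the standby node, the primary node resets the transaction table incremental log buffer, so that the buffer is emptied after each synchronization of the transaction table. The buffer then no longer holds incremental logs recorded before the group commit (call it the first group commit) and instead stores only incremental logs newly recorded after it. At the next group commit (call it the second group commit), the primary node can thus send, according to step 204, incremental logs that represent the changes made to the active transaction table since the first group commit, and the standby node can use them to update its local active transaction table and synchronize it with the primary node's active transaction table as of the second group commit. With this solution, after every group commit of the primary node, the active transaction tables of the primary node and the other nodes in the database cluster remain synchronized.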
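In an optional implementation, the transaction table incremental log buffer is protected by the redo log lock. The primary node writes a new-transaction log to the transaction table incremental log buffer as follows: the primary node obtains the redo log lock, locks the redo log buffer, and writes the redo log generated by creating the transaction to the redo log buffer; it also locks the transaction table incremental log buffer and writes the new-transaction log to it.
The primary node writes a commit-transaction log to the transaction table incremental log buffer as follows: the primary node obtains the redo log lock, locks the redo log buffer, and writes the redo log generated by committing the transaction to the redo log buffer; it also locks the transaction table incremental log buffer and writes the commit-transaction log to it.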
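Both when starting a transaction and when committing a transaction, the primary node must hold the redo log lock in order to write the corresponding redo log to the redo log buffer. In the foregoing technical solution of this application, the transaction table incremental log buffer is configured to be protected by the redo log lock. While the primary node holds the redo log lock to write the redo log for a new transaction, it can use the same lock to also write the new-transaction log to the transaction table incremental log buffer, so writing the new-transaction log incurs no additional lock overhead. Likewise, while the primary node holds the redo log lock to write the redo log for a committed transaction, it can use the same lock to also write the commit-transaction log to the transaction table incremental log buffer, so writing the commit-transaction log incurs no additional lock overhead either.
Therefore, in the technical solution provided in this application, the primary node reuses the existing redo log lock when recording transaction table incremental logs and incurs no additional lock overhead. This effectively alleviates lock contention when recording incremental logs and improves transaction throughput.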
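The lock-sharing idea can be sketched with Python's `threading.Lock`: a single acquisition of the redo log lock covers the write to both buffers, so the incremental log adds no second lock. The class `SharedLockBuffers`, its counter, and the placeholder redo payload are hypothetical and serve only to make the point observable.

```python
import threading


class SharedLockBuffers:
    """Redo log buffer and incremental log buffer guarded by one lock."""

    def __init__(self):
        self.redo_lock = threading.Lock()
        self.redo_buffer = []
        self.delta_buffer = []
        self.lock_acquisitions = 0   # instrumentation for this sketch only

    def _write(self, kind, txn_id):
        # One acquisition of the redo log lock protects both appends.
        with self.redo_lock:
            self.lock_acquisitions += 1
            self.redo_buffer.append((kind, txn_id, b"redo-payload"))
            self.delta_buffer.append((kind, txn_id))

    def begin(self, txn_id):
        self._write("new", txn_id)

    def commit(self, txn_id):
        self._write("commit", txn_id)
```

Because both appends happen inside the same critical section, the two buffers always list the same events in the same order, even when several threads start and commit transactions concurrently.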
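In an optional implementation, when the transaction table incremental log buffer is protected by the redo log lock, the primary node sends the transaction table incremental logs to at least one standby node as follows:
The primary node requests the redo log lock. After obtaining the lock, it locks the transaction table incremental log buffer and copies the incremental logs in that buffer to a buffer that is not protected by a lock. The primary node then sends the incremental logs in the unprotected buffer to the at least one standby node.
Sending the incremental logs to the standby node directly from the transaction table incremental log buffer would require holding the redo log lock for the duration of the send. To reduce the time the redo log lock is held, the primary node first uses the redo log lock to copy the incremental logs from the transaction table incremental log buffer to an unprotected buffer in memory, and then sends the incremental logs in that unprotected buffer to the standby node. Sending from the unprotected buffer does not hold the redo log lock. This solution therefore effectively reduces the time the redo log lock is held while sending incremental logs and improves transaction throughput.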
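In an optional implementation, after using the redo log lock to copy the incremental logs from the transaction table incremental log buffer to the unprotected buffer, the primary node resets the transaction table incremental log buffer. The primary node can then promptly record in the buffer the incremental logs of transactions started and committed after the group commit, which improves transaction throughput and the efficiency of transaction processing.
In an optional implementation, after using the redo log lock to copy the incremental logs from the transaction table incremental log buffer to the unprotected buffer, the primary node may release the redo log lock, so that other transactions can obtain the redo log lock promptly, which improves transaction throughput.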
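The copy, reset, and release sequence can be sketched as follows. The sketch assumes a `send` callback standing in for the network transport, which this application does not specify; `IncrementalLogSender` and its method names are likewise hypothetical.

```python
import threading


class IncrementalLogSender:
    def __init__(self, send):
        self.redo_lock = threading.Lock()
        self.delta_buffer = []   # protected by redo_lock
        self.send = send         # hypothetical transport: send(node, records)

    def record(self, kind, txn_id):
        with self.redo_lock:
            self.delta_buffer.append((kind, txn_id))

    def sync(self, standbys):
        with self.redo_lock:
            unprotected = list(self.delta_buffer)   # copy under the lock
            self.delta_buffer.clear()               # reset for the next round
        # The lock is released here, so slow network sends do not block
        # transactions that need to write redo or incremental logs.
        for node in standbys:
            self.send(node, unprotected)
        return unprotected
```

The critical section contains only an in-memory copy and a clear, so its length depends on the size of the batch rather than on network latency or the number of standby nodes.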
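In an optional implementation, referring to FIG. 4a, before step 204 the method further includes:
Step 207: The primary node deletes, from the transaction table incremental logs, the new-transaction log and the commit-transaction log recorded for the same transaction.
Take the case where the transaction table incremental log buffer contains both an incremental log adding transaction a and an incremental log committing transaction a. The fact that the primary node added transaction a before the group commit shows that transaction a had not yet been created at the previous group commit, so the active transaction table on the standby node does not contain transaction a. The fact that the primary node committed transaction a before the group commit shows that transaction a has been committed, is no longer an active transaction, and will not appear in the active transaction table. Therefore, if there is both an incremental log adding transaction a and an incremental log committing transaction a, transaction a exists neither in the standby node's active transaction table before the update nor in it after the update, and transaction a has no effect on the standby node's active transaction table.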
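Step 207 may be implemented in, but is not limited to, the following ways:
First, at any time during the group commit, the primary node may search the transaction table incremental log buffer for the new-transaction log and the commit-transaction log recorded for the same transaction, and then delete them.
Second, at the group commit, before sending the incremental logs from the transaction table incremental log buffer to the standby node, the primary node first searches for the new-transaction log and the commit-transaction log recorded for the same transaction, deletes them, and then sends the remaining incremental logs to the standby node.
Third, before using the redo log lock to copy the incremental logs from the transaction table incremental log buffer to the unprotected buffer mentioned in the foregoing embodiment, the primary node first searches for the new-transaction log and the commit-transaction log recorded for the same transaction, deletes them, and then copies the incremental logs to the unprotected buffer.
Fourth, before sending the incremental logs from the unprotected buffer to the standby node, the primary node first searches for the new-transaction log and the commit-transaction log recorded for the same transaction, deletes them, and then sends the remaining incremental logs from the unprotected buffer to the standby node.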
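In the foregoing technical solution, deleting the new-transaction log and the commit-transaction log recorded for the same transaction does not affect the synchronization of the active transaction table between nodes, and it can significantly reduce the volume of logs transmitted, reduce the consumption of transmission resources, and shorten the time needed to synchronize the transaction table.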
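In another optional implementation, referring further to FIG. 4b, before step 204 the method further includes:
Step 208: The primary node determines whether the total size of the transaction table incremental logs is greater than a preset threshold. If it is, the primary node performs step 207 and then performs step 204; if the total size of the incremental logs is not greater than the preset threshold, the primary node performs step 204.
Searching for the new-transaction log and the commit-transaction log recorded for the same transaction takes time. If the total size of the incremental logs is not greater than the preset threshold, that is, if it is small, the primary node may skip the search and thereby save computing resources. Conversely, only when the total size of the incremental logs is greater than the preset threshold, that is, when it is large, does the primary node perform step 207 and delete the new-transaction log and the commit-transaction log recorded for the same transaction, reducing the volume of logs transmitted.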
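Steps 207 and 208 can be sketched together in Python. The sketch assumes that each incremental log is a `(kind, txn_id)` tuple, that a transaction number appears at most once per kind within a batch, and that the number of records serves as a proxy for the total size; the function names and the `threshold` parameter are hypothetical.

```python
def drop_cancelled_pairs(records):
    """Step 207: remove new/commit pairs that refer to the same transaction."""
    new_ids = {txn for kind, txn in records if kind == "new"}
    commit_ids = {txn for kind, txn in records if kind == "commit"}
    cancelled = new_ids & commit_ids   # started and committed in this batch
    return [(kind, txn) for kind, txn in records if txn not in cancelled]


def prepare_for_send(records, threshold):
    """Step 208: only pay for the search when the batch is large."""
    if len(records) > threshold:
        return drop_cancelled_pairs(records)
    return list(records)
```

With a batch that adds and commits transaction 2201, the pair is dropped when the batch exceeds the threshold and is sent unchanged otherwise; in both cases the standby node ends up with the same active transaction table.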
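It should be noted that the method described in steps 201 to 208 may also be applied to the multi-write cluster database system shown in FIG. 2c. In a multi-write cluster database system, steps 201 to 204 and steps 206 to 208 are performed by the coordinator node, and step 205 is performed by the data nodes.
Step 203 is performed as follows: when a data node creates a transaction, it requests a transaction number and the redo log lock from the coordinator node. After allocating the transaction number and the redo log lock to the data node, the coordinator node adds the transaction number of the new transaction to the global active transaction table that it maintains locally and records a log for the creation of the transaction in the transaction table incremental log buffer. When a data node commits a transaction, it requests the redo log lock from the coordinator node. After allocating the redo log lock to the data node, the coordinator node deletes the transaction number of the committed transaction from the global active transaction table that it maintains locally and records a log for the commit of the transaction in the transaction table incremental log buffer.
Step 204 is performed as follows: at a group commit, a data node requests the redo log lock from the coordinator node. After allocating the redo log lock to the data node, the coordinator node sends the transaction table incremental logs to all data nodes, so that each data node uses the incremental logs to keep its locally saved active transaction table consistent with the global active transaction table maintained by the coordinator node.
In the multi-write cluster database system shown in FIG. 2c, steps 205 to 208 are implemented in the same way as in the primary/standby cluster database system shown in FIG. 2a or FIG. 2b, and the details are not repeated here.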
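FIG. 5 shows one possible implementation of the structure of the transaction table incremental logs in this application. The multiple incremental logs in the transaction table incremental log buffer are referred to as a group of transaction table incremental logs. A group includes metadata (Metadata) and one or more transaction table incremental logs. The metadata of a group includes a log header (Log Header) and a base transaction number (Base Transaction ID). The log header may occupy 1 byte and is the number of the group. The base transaction number may occupy 8 bytes and indicates the first transaction number of the new transactions in the group; for example, if the largest transaction number in the original transaction table is 2119, the base transaction number in the incremental logs is 2200. Each incremental log includes a record header (Record Header), which indicates the type of change to the transaction, and the index (Array Index) of the transaction within the transaction table. For example, the record header field may occupy 1 bit, with a record header of "0" indicating a committed transaction and a record header of "1" indicating a new transaction. The index may occupy (7+8*N) bits, where N is the maximum number of concurrent transactions of the database system, and represents the offset of the transaction from the first transaction in the transaction table. A new-transaction log may additionally include a transaction number increment field occupying 8*N bits, which holds the difference obtained by subtracting the base transaction number from the transaction number of the transaction.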
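To make the field widths concrete, the following Python sketch packs and unpacks a group of incremental logs using the sizes stated above: a 1-byte log header, an 8-byte base transaction number, a 1-bit record header, a (7+8*N)-bit index, and an 8*N-bit increment for new transactions. Several choices are assumptions of the sketch rather than statements of this application: N is treated simply as a small integer width parameter `n`, fields are big-endian, and the record header occupies the most significant bit of each record. The function names are hypothetical.

```python
import struct


def encode_group(group_no, base_txn_id, records, n=1):
    """records: list of (kind, index, txn_id); txn_id is None for commits."""
    index_bits = 7 + 8 * n
    out = bytearray(struct.pack(">BQ", group_no, base_txn_id))   # metadata
    for kind, index, txn_id in records:
        if not 0 <= index < (1 << index_bits):
            raise ValueError("index does not fit in the index field")
        flag = 1 if kind == "new" else 0
        # Record header bit followed by the index: n + 1 bytes in total.
        out += ((flag << index_bits) | index).to_bytes(n + 1, "big")
        if flag:
            # Increment relative to the base transaction number: n bytes.
            out += (txn_id - base_txn_id).to_bytes(n, "big")
    return bytes(out)


def decode_group(data, n=1):
    index_bits = 7 + 8 * n
    group_no, base_txn_id = struct.unpack_from(">BQ", data, 0)
    pos, records = 9, []   # the metadata occupies 1 + 8 bytes
    while pos < len(data):
        word = int.from_bytes(data[pos:pos + n + 1], "big")
        pos += n + 1
        index = word & ((1 << index_bits) - 1)
        if word >> index_bits:   # record header "1": new transaction
            delta = int.from_bytes(data[pos:pos + n], "big")
            pos += n
            records.append(("new", index, base_txn_id + delta))
        else:                    # record header "0": committed transaction
            records.append(("commit", index, None))
    return group_no, base_txn_id, records
```

With `n=1`, a commit record takes 2 bytes and a new-transaction record takes 3 bytes, compared with 8 bytes for a full transaction number alone, which illustrates why storing an index plus a small increment keeps the group compact.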
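FIG. 6a shows how the primary node's active transaction table changes as the primary node creates and commits transactions. Each box in the figure represents an active transaction, and the number in the box is the number of that active transaction. The left side of FIG. 6a shows the primary node's original active transaction table (or the active transaction table after the previous synchronization), and the right side of FIG. 6a shows the primary node's active transaction table at the current group commit, where "0" indicates that the position is vacant and diagonal hatching marks the positions in the transaction table that have changed.
FIG. 6b shows the transaction table incremental logs recorded by the primary node between two synchronizations of the active transaction table.
FIG. 6c shows the transaction table incremental logs obtained after the deletion described in step 207, which removes the new-transaction log and the commit-transaction log of the same transaction, is applied to the incremental logs shown in FIG. 6b, for example by deleting the log adding transaction 2201 and the log committing transaction 2201 in FIG. 6b. The incremental logs in FIG. 6c are represented using the structure shown in FIG. 5.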
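For example, for transaction 1805 in the active transaction table on the left side of FIG. 6a, assume that its offset from the first transaction in the table, transaction 1210, is 196. With reference to the structure of the transaction table incremental logs shown in FIG. 5, the fields in FIG. 6c have the following meanings. Metadata fields: the number of this group of incremental logs is 2, and the base transaction number is 2200. The fields "0" "1" that follow the metadata represent a commit-transaction log; the offset of the committed transaction is 1, so the committed transaction is transaction 1211 in the active transaction table on the left side of FIG. 6a. The next fields, "1" "4" "+0", represent a new-transaction log; the new transaction is placed at offset 4 from transaction 1210, that is, at the last position of the first row of the active transaction table on the left side of FIG. 6a, and its transaction number is the base transaction number 2200 plus 0, namely 2200. The next fields, "1" "197" "+5", represent a new-transaction log; the new transaction is placed at offset 197 from transaction 1210, that is, at the first position after transaction 1805 in the active transaction table on the left side of FIG. 6a, and its transaction number is the base transaction number 2200 plus 5, namely 2205. The next fields, "0" "196", represent a commit-transaction log; the offset of the committed transaction is 196, so the committed transaction is transaction 1805 in the active transaction table on the left side of FIG. 6a.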
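The worked example can be replayed with a short Python sketch that treats the active transaction table as an array of slots, with 0 marking a vacant position as in FIG. 6a. Only the slots named in the text (offsets 0, 1, and 196) are populated; the contents of the other slots in the figure are not known here and are left vacant as an assumption. The function name is hypothetical.

```python
EMPTY = 0   # FIG. 6a uses "0" for a vacant position


def apply_positional_records(table, base_txn_id, records):
    """records: (0, index) for a commit, (1, index, increment) for a new one."""
    for flag, index, *rest in records:
        if flag == 1:
            table[index] = base_txn_id + rest[0]   # place the new transaction
        else:
            table[index] = EMPTY                   # vacate the committed slot
    return table


table = [EMPTY] * 200
table[0], table[1], table[196] = 1210, 1211, 1805

# The four records of FIG. 6c, in order, with base transaction number 2200.
fig_6c = [(0, 1), (1, 4, 0), (1, 197, 5), (0, 196)]
apply_positional_records(table, 2200, fig_6c)
```

After the call, offsets 1 and 196 are vacant and offsets 4 and 197 hold transactions 2200 and 2205, matching the description of the fields above.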
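FIG. 6d shows the primary node sending the transaction table incremental logs shown in FIG. 6c to multiple standby nodes in the database cluster.
FIG. 6e is a schematic diagram of any one standby node updating its saved active transaction table according to the incremental logs shown in FIG. 6c. The standby node's original active transaction table, shown at the upper left of FIG. 6e, is identical to the primary node's original active transaction table shown on the left side of FIG. 6a, and the standby node's updated active transaction table, shown on the right side of FIG. 6e, is identical to the primary node's active transaction table at the group commit shown on the right side of FIG. 6a.
It can be seen that the foregoing technical solution of the embodiments of the present invention keeps the active transaction tables of the primary node and the standby node synchronized after every group commit.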
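FIG. 7 shows an apparatus 300 for synchronizing an active transaction table provided in this application. The apparatus implements the functions of the primary node or the coordinator node in the method for synchronizing an active transaction table in the foregoing embodiments of this application. The apparatus 300 includes:
a recording module 301, configured to record, in a transaction table incremental log buffer, the transaction table incremental logs generated since the active transaction table was last synchronized, where the transaction table incremental logs indicate changes to the transactions recorded in the active transaction table of the apparatus and include new-transaction logs, which indicate that a transaction has been added to the active transaction table, and commit-transaction logs, which indicate that a transaction has been deleted from the active transaction table, and the active transaction table records transactions that have not yet been committed; and
a sending module 302, configured to send the transaction table incremental logs to at least one second node when a group commit is performed on the transactions recorded in the commit-transaction logs, so that the at least one second node updates the active transaction table that it saves according to the received transaction table incremental logs.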
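In an optional implementation, the transaction table incremental log buffer is protected by the redo log lock, and the recording module 301 is configured to: obtain the redo log lock, lock the transaction table incremental log buffer, and record the new-transaction logs and the commit-transaction logs in the transaction table incremental log buffer.
In an optional implementation, the sending module 302 is configured to:
obtain the redo log lock, lock the transaction table incremental log buffer, copy the transaction table incremental logs in the transaction table incremental log buffer to a buffer that is not protected by a lock, and send the incremental logs in the unprotected buffer to the at least one second node.
In an optional implementation, the apparatus 300 further includes:
a reset module 303, configured to reset the transaction table incremental log buffer after the sending module copies the transaction table incremental logs in the transaction table incremental log buffer to the unprotected buffer.
In an optional implementation, the apparatus 300 further includes:
a first deletion module 304, configured to delete, from the transaction table incremental logs, the new-transaction log and the commit-transaction log recorded for the same transaction before the sending module sends the transaction table incremental logs to the at least one second node.
In an optional implementation, the apparatus 300 further includes:
a second deletion module 305, configured to: before the sending module sends the transaction table incremental logs to the at least one second node, determine whether the total size of the transaction table incremental logs is greater than a preset threshold; and if the total size is greater than the preset threshold, delete, from the transaction table incremental logs, the new-transaction log and the commit-transaction log recorded for the same transaction.
In an optional implementation, the sending module 302 is further configured to send the active transaction table of the apparatus to the second node when the second node joins the database cluster in which the apparatus resides.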
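The division of the apparatus 300 into modules in the embodiments of this application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processor, may each exist physically on their own, or two or more modules may be integrated into one module. An integrated module may be implemented in hardware or as a software functional module.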
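When the integrated module is implemented in hardware, as shown in FIG. 8, an apparatus 400 for synchronizing an active transaction table may include a processor 401, and the physical hardware corresponding to the recording module 301, the reset module 303, the first deletion module 304, and the second deletion module 305 may be the processor 401. The processor 401 may be a central processing unit (central processing unit, CPU), a digital processing module, or the like. The apparatus 400 may further include a communication interface 402, and the physical hardware corresponding to the sending module 302 may be the communication interface 402; the apparatus sends the transaction table incremental logs to the other nodes in its database cluster through the communication interface 402. The apparatus 400 further includes a memory 403, configured to store the program executed by the processor 401. The memory 403 may be a non-volatile memory, such as a hard disk drive (hard disk drive, HDD) or a solid-state drive (solid-state drive, SSD), or a volatile memory (volatile memory), such as a random-access memory (random-access memory, RAM). The memory 403 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The processor 401 is configured to execute the program code stored in the memory 403, and specifically to perform the method described in any of the embodiments shown in FIG. 3, FIG. 4a, and FIG. 4b. For details, refer to the method described in those embodiments; they are not repeated here.
The embodiments of this application do not limit the specific medium that connects the communication interface 402, the processor 401, and the memory 403. In FIG. 8, the memory 403, the processor 401, and the communication interface 402 are connected by a bus 404, which is drawn as a thick line; the manner in which other components are connected is shown only schematically and is not limiting. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 8, but this does not mean that there is only one bus or only one type of bus.
An embodiment of the present invention further provides a computer-readable storage medium, configured to store the computer software instructions to be executed by the processor 401, including the program to be executed by the processor.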
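FIG. 9 shows an apparatus 500 for synchronizing an active transaction table provided in an embodiment of the present invention. The apparatus implements the functions of the standby node or the data node in the method for synchronizing an active transaction table in the foregoing embodiments of this application. The apparatus 500 includes:
a receiving module 501, configured to receive the transaction table incremental logs that are sent by a first node and that were generated since the active transaction table was last synchronized, where the transaction table incremental logs indicate changes to the transactions recorded in the active transaction table of the first node and include new-transaction logs, which indicate that a transaction has been added to the active transaction table, and commit-transaction logs, which indicate that a transaction has been deleted from the active transaction table, and the active transaction table records transactions that have not yet been committed; and
an update module 502, configured to update the local active transaction table according to the transaction table incremental logs.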
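In an optional implementation, the update module 502 is specifically configured to:
if the transaction table incremental logs include a log adding a first transaction and do not include a log committing the first transaction, add the first transaction to the active transaction table; and/or
if the active transaction table includes a second transaction and the transaction table incremental logs include a log committing the second transaction, delete the second transaction from the active transaction table.
The division of the apparatus 500 into modules in the embodiments of this application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processor, may each exist physically on their own, or two or more modules may be integrated into one module. An integrated module may be implemented in hardware or as a software functional module.
When the integrated module is implemented in hardware, the apparatus for synchronizing an active transaction table may include a processor, and the physical hardware corresponding to the update module 502 may be the processor. The apparatus may further include a communication interface, and the physical hardware corresponding to the receiving module 501 may be the communication interface; the apparatus receives, through the communication interface, the transaction table incremental logs sent by the primary node or the coordinator node. The apparatus 500 further includes a memory, configured to store the program executed by the processor. The implementation of the processor, the communication interface, and the memory has been described in the embodiment shown in FIG. 8 and is not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium, configured to store the computer software instructions to be executed by the foregoing processor, including the program to be executed by the processor.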
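A person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to this application. It should be understood that computer program instructions may implement each procedure and/or block in the flowcharts and/or block diagrams, and combinations of procedures and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, a person skilled in the art can make various changes and variations to this application without departing from its spirit and scope. If these modifications and variations of this application fall within the scope of the claims of this application and their technical equivalents, this application is intended to cover them as well.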
Claims (18)
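- A method for synchronizing an active transaction table, comprising: recording, by a first node in a transaction table incremental log buffer, the transaction table incremental logs generated since the active transaction table was last synchronized, wherein the transaction table incremental logs indicate changes to the transactions recorded in the active transaction table of the first node and comprise a new-transaction log indicating that a transaction has been added to the active transaction table and a commit-transaction log indicating that a transaction has been deleted from the active transaction table, and the active transaction table records transactions that have not yet been committed; and when a group commit is performed on the transactions recorded in the commit-transaction log, sending, by the first node, the transaction table incremental logs to at least one second node, so that the at least one second node updates, according to the received transaction table incremental logs, the active transaction table saved by the at least one second node.
- The method according to claim 1, wherein the transaction table incremental log buffer is protected by a redo log lock, and the recording, by the first node, of the transaction table incremental logs in the transaction table incremental log buffer comprises: obtaining, by the first node, the redo log lock, locking the transaction table incremental log buffer, and recording the new-transaction log and the commit-transaction log in the transaction table incremental log buffer.
- The method according to claim 2, wherein the sending, by the first node, of the transaction table incremental logs to at least one second node comprises: obtaining, by the first node, the redo log lock, locking the transaction table incremental log buffer, copying the transaction table incremental logs in the transaction table incremental log buffer to a buffer that is not protected by a lock, and sending the transaction table incremental logs in the unprotected buffer to the at least one second node.
- The method according to claim 3, further comprising, after the first node copies the transaction table incremental logs in the transaction table incremental log buffer to the unprotected buffer: resetting, by the first node, the transaction table incremental log buffer.
- The method according to any one of claims 1 to 4, further comprising, before the first node sends the transaction table incremental logs to the at least one second node: deleting, by the first node from the transaction table incremental logs, the new-transaction log and the commit-transaction log recorded for the same transaction.
- The method according to any one of claims 1 to 4, further comprising, before the first node sends the transaction table incremental logs to the at least one second node: determining, by the first node, whether the total size of the transaction table incremental logs is greater than a preset threshold; and if the total size of the transaction table incremental logs is greater than the preset threshold, deleting, by the first node from the transaction table incremental logs, the new-transaction log and the commit-transaction log recorded for the same transaction.
- The method according to any one of claims 1 to 6, further comprising, before the first node sends the transaction table incremental logs to the at least one second node: when the second node joins the database cluster in which the first node resides, sending, by the first node, the active transaction table of the first node to the second node.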
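- A method for synchronizing an active transaction table, comprising: receiving, by a second node, the transaction table incremental logs that are sent by a first node and that were generated since the active transaction table was last synchronized, wherein the transaction table incremental logs indicate changes to the transactions recorded in the active transaction table of the first node and comprise a new-transaction log indicating that a transaction has been added to the active transaction table and a commit-transaction log indicating that a transaction has been deleted from the active transaction table, and the active transaction table records transactions that have not yet been committed; and updating, by the second node, a local active transaction table according to the transaction table incremental logs.
- The method according to claim 8, wherein the updating, by the second node, of the local active transaction table according to the transaction table incremental logs comprises: if the transaction table incremental logs comprise a log adding a first transaction and do not comprise a log committing the first transaction, adding, by the second node, the first transaction to the active transaction table; and/or if the active transaction table comprises a second transaction and the transaction table incremental logs comprise a log committing the second transaction, deleting, by the second node, the second transaction from the active transaction table.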
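- An apparatus for synchronizing an active transaction table, comprising: a recording module, configured to record, in a transaction table incremental log buffer, the transaction table incremental logs generated since the active transaction table was last synchronized, wherein the transaction table incremental logs indicate changes to the transactions recorded in the active transaction table of the apparatus and comprise a new-transaction log indicating that a transaction has been added to the active transaction table and a commit-transaction log indicating that a transaction has been deleted from the active transaction table, and the active transaction table records transactions that have not yet been committed; and a sending module, configured to send the transaction table incremental logs to at least one second node when a group commit is performed on the transactions recorded in the commit-transaction log, so that the at least one second node updates, according to the received transaction table incremental logs, the active transaction table saved by the at least one second node.
- The apparatus according to claim 10, wherein the transaction table incremental log buffer is protected by a redo log lock, and the recording module is configured to: obtain the redo log lock, lock the transaction table incremental log buffer, and record the new-transaction log and the commit-transaction log in the transaction table incremental log buffer.
- The apparatus according to claim 11, wherein the sending module is configured to: obtain the redo log lock, lock the transaction table incremental log buffer, copy the transaction table incremental logs in the transaction table incremental log buffer to a buffer that is not protected by a lock, and send the transaction table incremental logs in the unprotected buffer to the at least one second node.
- The apparatus according to claim 12, further comprising: a reset module, configured to reset the transaction table incremental log buffer after the sending module copies the transaction table incremental logs in the transaction table incremental log buffer to the unprotected buffer.
- The apparatus according to any one of claims 10 to 13, further comprising: a first deletion module, configured to delete, from the transaction table incremental logs, the new-transaction log and the commit-transaction log recorded for the same transaction before the sending module sends the transaction table incremental logs to the at least one second node.
- The apparatus according to any one of claims 10 to 13, further comprising: a second deletion module, configured to: before the sending module sends the transaction table incremental logs to the at least one second node, determine whether the total size of the transaction table incremental logs is greater than a preset threshold; and if the total size of the transaction table incremental logs is greater than the preset threshold, delete, from the transaction table incremental logs, the new-transaction log and the commit-transaction log recorded for the same transaction.
- The apparatus according to any one of claims 10 to 15, wherein the sending module is further configured to: when the second node joins the database cluster in which the apparatus resides, send the active transaction table of the apparatus to the second node.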
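- An apparatus for synchronizing an active transaction table, comprising: a receiving module, configured to receive the transaction table incremental logs that are sent by a first node and that were generated since the active transaction table was last synchronized, wherein the transaction table incremental logs indicate changes to the transactions recorded in the active transaction table of the first node and comprise a new-transaction log indicating that a transaction has been added to the active transaction table and a commit-transaction log indicating that a transaction has been deleted from the active transaction table, and the active transaction table records transactions that have not yet been committed; and an update module, configured to update a local active transaction table according to the transaction table incremental logs.
- The apparatus according to claim 17, wherein the update module is specifically configured to: if the transaction table incremental logs comprise a log adding a first transaction and do not comprise a log committing the first transaction, add the first transaction to the active transaction table; and/or if the active transaction table comprises a second transaction and the transaction table incremental logs comprise a log committing the second transaction, delete the second transaction from the active transaction table.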
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17898997.6A EP3575968A4 (en) | 2017-02-28 | 2017-10-10 | METHOD AND DEVICE FOR SYNCHRONIZING ACTIVE TRANSACTION LISTS |
US16/552,833 US11442961B2 (en) | 2017-02-28 | 2019-08-27 | Active transaction list synchronization method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710115023.0 | 2017-02-28 | ||
CN201710115023.0A CN108509462B (zh) | 2017-02-28 | 2017-02-28 | 一种同步活动事务表的方法及装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/552,833 Continuation US11442961B2 (en) | 2017-02-28 | 2019-08-27 | Active transaction list synchronization method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018157602A1 true WO2018157602A1 (zh) | 2018-09-07 |
Family
ID=63369621
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/105561 WO2018157602A1 (zh) | 2017-02-28 | 2017-10-10 | 一种同步活动事务表的方法及装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11442961B2 (zh) |
EP (1) | EP3575968A4 (zh) |
CN (1) | CN108509462B (zh) |
WO (1) | WO2018157602A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112035497A (zh) * | 2020-08-31 | 2020-12-04 | 北京百度网讯科技有限公司 | 用于清理已提交的事务信息的方法和装置 |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11347774B2 (en) * | 2017-08-01 | 2022-05-31 | Salesforce.Com, Inc. | High availability database through distributed store |
CN109614444B (zh) * | 2018-11-12 | 2023-05-16 | 武汉达梦数据库股份有限公司 | 一种数据同步时的数据初始化方法 |
US11249983B2 (en) * | 2019-04-02 | 2022-02-15 | International Business Machines Corporation | Transaction change data forwarding |
US11449488B2 (en) * | 2019-12-30 | 2022-09-20 | Shipbook Ltd. | System and method for processing logs |
CN113961639A (zh) * | 2020-06-22 | 2022-01-21 | 金篆信科有限责任公司 | 一种分布式事务处理方法、终端及计算机可读存储介质 |
CN112445799A (zh) * | 2020-11-19 | 2021-03-05 | 北京思特奇信息技术股份有限公司 | 一种单源多节点的数据同步方法和系统 |
US11436110B2 (en) * | 2021-02-11 | 2022-09-06 | Huawei Technologies Co., Ltd. | Distributed database remote backup |
CN114003622B (zh) * | 2021-12-30 | 2022-04-08 | 天津南大通用数据技术股份有限公司 | 一种事务型数据库之间巨大事务增量同步方法 |
US20240004897A1 (en) * | 2022-06-30 | 2024-01-04 | Amazon Technologies, Inc. | Hybrid transactional and analytical processing architecture for optimization of real-time analytical querying |
US12007983B2 (en) | 2022-06-30 | 2024-06-11 | Amazon Technologies, Inc. | Optimization of application of transactional information for a hybrid transactional and analytical processing architecture |
US12093239B2 (en) | 2022-06-30 | 2024-09-17 | Amazon Technologies, Inc. | Handshake protocol for efficient exchange of transactional information for a hybrid transactional and analytical processing architecture |
CN115629910B (zh) * | 2022-10-19 | 2023-08-15 | 星环信息科技(上海)股份有限公司 | 一种事务恢复方法、装置、数据库节点及介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101488134A (zh) * | 2008-01-16 | 2009-07-22 | 诺基亚西门子通信有限责任两合公司 | 数据库系统的复制环境内的高性能修改事务的方法及系统 |
CN104967658A (zh) * | 2015-05-08 | 2015-10-07 | 成都品果科技有限公司 | 一种多终端设备上的数据同步方法 |
US20160147778A1 (en) * | 2014-11-25 | 2016-05-26 | Ivan Schreter | Applying a database transaction log record directly to a database table container |
CN105975579A (zh) * | 2016-05-05 | 2016-09-28 | 北京思特奇信息技术股份有限公司 | 一种内存数据库的主备复制方法及内存数据库系统 |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5832516A (en) * | 1997-01-21 | 1998-11-03 | Oracle Corporation | Caching data in recoverable objects |
US6732137B1 (en) * | 1999-05-12 | 2004-05-04 | International Business Machines Corporation | Performance optimization for data sharing across batch sequential processes and on-line transaction processes |
US7376675B2 (en) * | 2005-02-18 | 2008-05-20 | International Business Machines Corporation | Simulating multi-user activity while maintaining original linear request order for asynchronous transactional events |
CN101251814B (zh) * | 2008-02-04 | 2010-04-07 | 浙江大学 | 一种在操作系统中实现可信恢复系统的方法 |
US8650155B2 (en) * | 2008-02-26 | 2014-02-11 | Oracle International Corporation | Apparatus and method for log based replication of distributed transactions using globally acknowledged commits |
CN101692226B (zh) * | 2009-09-25 | 2012-07-04 | 中国人民解放军国防科学技术大学 | 海量归档流数据存储方法 |
US8719515B2 (en) * | 2010-06-21 | 2014-05-06 | Microsoft Corporation | Composition of locks in software transactional memory |
US8341134B2 (en) * | 2010-12-10 | 2012-12-25 | International Business Machines Corporation | Asynchronous deletion of a range of messages processed by a parallel database replication apply process |
CN103377100B (zh) * | 2012-04-26 | 2016-12-14 | 华为技术有限公司 | 一种数据备份方法、网络节点及系统 |
CN102891849B (zh) * | 2012-09-25 | 2015-07-22 | 北京星网锐捷网络技术有限公司 | 业务数据同步方法、恢复方法及装置和网络设备 |
US9519591B2 (en) | 2013-06-22 | 2016-12-13 | Microsoft Technology Licensing, Llc | Latch-free, log-structured storage for multiple access methods |
US9280591B1 (en) * | 2013-09-20 | 2016-03-08 | Amazon Technologies, Inc. | Efficient replication of system transactions for read-only nodes of a distributed database |
CN103729442B (zh) | 2013-12-30 | 2017-11-24 | 华为技术有限公司 | 记录事务日志的方法和数据库引擎 |
CN103942252B (zh) * | 2014-03-17 | 2017-11-28 | 华为技术有限公司 | 一种恢复数据的方法及系统 |
US9779128B2 (en) * | 2014-04-10 | 2017-10-03 | Futurewei Technologies, Inc. | System and method for massively parallel processing database |
US9665280B2 (en) * | 2014-09-30 | 2017-05-30 | International Business Machines Corporation | Cache coherency verification using ordered lists |
US9864774B2 (en) * | 2015-06-23 | 2018-01-09 | International Business Machines Corporation | Granular buffering of metadata changes for journaling file systems |
-
2017
- 2017-02-28 CN CN201710115023.0A patent/CN108509462B/zh active Active
- 2017-10-10 EP EP17898997.6A patent/EP3575968A4/en active Pending
- 2017-10-10 WO PCT/CN2017/105561 patent/WO2018157602A1/zh unknown
-
2019
- 2019-08-27 US US16/552,833 patent/US11442961B2/en active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3575968A4 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112035497A (zh) * | 2020-08-31 | 2020-12-04 | 北京百度网讯科技有限公司 | 用于清理已提交的事务信息的方法和装置 |
CN112035497B (zh) * | 2020-08-31 | 2023-08-04 | 北京百度网讯科技有限公司 | 用于清理已提交的事务信息的方法和装置 |
Also Published As
Publication number | Publication date |
---|---|
CN108509462A (zh) | 2018-09-07 |
EP3575968A4 (en) | 2020-04-29 |
CN108509462B (zh) | 2021-01-29 |
EP3575968A1 (en) | 2019-12-04 |
US20190384775A1 (en) | 2019-12-19 |
US11442961B2 (en) | 2022-09-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17898997 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2017898997 Country of ref document: EP Effective date: 20190829 |