CA3125546A1 - Distributed logical timestamp-based management method and system for distributed transaction - Google Patents


Info

Publication number
CA3125546A1
CA3125546A1
Authority
CA
Canada
Prior art keywords
transaction
node
distributed
tolerance
llt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3125546A
Other languages
French (fr)
Inventor
Jianhui Xu
Yuanxi CHEN
Guoming HE
Tao Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sequoiadb Corp
Original Assignee
Sequoiadb Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sequoiadb Corp filed Critical Sequoiadb Corp
Publication of CA3125546A1 publication Critical patent/CA3125546A1/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275Synchronous replication

Abstract

Disclosed are a distributed transaction management method and system based on a distributed logical timestamp. The method comprises: setting a transaction start time of a transaction; if the difference between the local logical time of a data node and the local logical time of a coordination node exceeds a preset threshold, calibrating the local logical time of the data node, enabling the coordination node to roll back the transaction, and retrying the transaction after the time of the coordination node is calibrated; if the difference between a transaction pre-commit time and the transaction start time is greater than a transaction tolerance of the transaction, sending the transaction pre-commit time and a pre-commit message of the transaction to all data nodes that participate in the transaction; and selecting, according to the difference between the timestamps of two different transactions, one data node as an arbitration node to arbitrate the timestamp order of the two different transactions. By means of the present invention, network overheads can be reduced while distributed storage and processing requirements are met, thereby effectively improving the overall performance of a system.

Description

DISTRIBUTED LOGICAL TIMESTAMP-BASED MANAGEMENT METHOD AND SYSTEM FOR DISTRIBUTED TRANSACTION
TECHNICAL FIELD
[0001] The present disclosure relates to the field of database technologies, and in particular, to a distributed logical timestamp-based management method and system for a distributed transaction.
BACKGROUND
[0002] A distributed database management system plays an extremely important role in actual application of a distributed system. The distributed database management system has all characteristics of the distributed system, and also imposes higher requirements for data storage and processing. Database management systems have been widely used since the 1980s. At the beginning, each database was deployed on a single computer, and all transactions were processed in the standalone database. As data and businesses grew rapidly, the standalone database became unable to satisfy requirements for data storage and processing capabilities, and a distributed database system has become the preferred architecture for deploying all kinds of applications. Distributed storage, distributed computing, and low-delay response under high concurrency in a distributed database are of great significance to other distributed systems.
[0003] In a distributed database system, a globally unique identifier (ID) usually needs to be maintained to distinguish between concurrent transactions and to identify generated or changed data. Usually, such an ID must be free from single-point failures, be in chronological order or contain a time component, support control of data fragmentation, and not be too long.
[0004] Such a globally unique ID is usually implemented based on transaction initiation time, a timestamp, and other identification bits. The timestamp is generally generated based on a hardware clock or based on a logical clock implemented in software. When the timestamp is generated based on a hardware clock, the characteristics of atomic clocks and the hardware ensure that the mutual error between clock devices over a long time is negligible. In this way, each computing node in a cluster can directly query a local clock and use it as a universal timestamp. However, this manner is costly and cannot be widely applied. In actual application, the manner in which the timestamp is generated based on a logical clock implemented in software is used more widely.
In an existing implementation, one global transaction manager (GTM) or one small GTM cluster is deployed in a computer cluster of the distributed system to generate and distribute a unified global ID
including a timestamp. All other computers in the cluster query the GTM
synchronously to obtain an ID. The ID increases progressively to ensure that a unique ID is returned for each query. However, in this implementation, when the system cluster is large and there are many concurrent operations, the GTM
will be heavily loaded, and network overheads will be very large, lowering overall system performance.
In addition, it is very complex to use the small GTM cluster to avoid a single-point failure, and reliability cannot be completely guaranteed.
SUMMARY
[0005] Embodiments of the present disclosure provide a distributed logical timestamp-based management method and system for a distributed transaction, to avoid forcing all related nodes to obtain a unique ID from a GTM. This can reduce network overheads while satisfying requirements for distributed storage and processing, thereby improving overall system performance.
[0006] To resolve the foregoing technical problems, an embodiment of the present disclosure provides a distributed logical timestamp-based management method for a distributed transaction. A
distributed transaction database includes a coordination node, a catalog node, and a data node, and a local logical timestamp (LLT) of each node is synchronously calibrated based on a preset universal logical timestamp (ULT).
[0007] The distributed logical timestamp-based management method for a distributed transaction includes:
[0008] when a transaction begins, setting a transaction begin timestamp (TBT) of the transaction as an LLT of the coordination node;
[0009] when the data node first receives a message sent by the coordination node, determining whether a difference between an LLT of the data node and the LLT of the coordination node exceeds a preset first error threshold; and if the difference between the LLT of the data node and the LLT of the coordination node exceeds the preset first error threshold, synchronously calibrating the LLT of the data node, controlling the data node to return an error message to the coordination node to enable the coordination node to roll back the transaction, and retrying the transaction after synchronously calibrating the LLT of the coordination node;
[0010] when the transaction is pre-committed, setting a transaction pre-commit timestamp (TPCT) of the transaction as a current LLT of the coordination node; determining whether a difference between the TPCT and the TBT is greater than a transaction tolerance of the transaction; and if the difference between the TPCT and the TBT is not greater than the transaction tolerance of the transaction, suspending the current operation; or if the difference between the TPCT and the TBT is greater than the transaction tolerance of the transaction, sending the TPCT and a transaction pre-committing message to all data nodes participating in the transaction; and
[0011] when two different transactions access same data, determining whether a difference between timestamps of the two different transactions is less than a preset second error threshold; and if the difference between the timestamps of the two different transactions is less than the preset second error threshold, selecting one data node from target data nodes as an arbitration node based on a preset algorithm to arbitrate a sequence of the timestamps of the two different transactions.
[0012] Further, the preset first error threshold is twice a ULT tolerance.
[0013] Further, the second error threshold is set as follows: finding a maximum transaction tolerance in different transactions accessing same data, and setting the second error threshold to be twice the maximum transaction tolerance.
[0014] Further, the ULT tolerance is dynamically changed based on a system status, and the method includes:
[0015] when the coordination node performs synchronous time calibration with the catalog node, obtaining and recording a current ULT tolerance of a system; and
[0016] when the transaction obtains the TBT, storing the current ULT tolerance of the system in a metadata control block of the transaction.
[0017] Further, the method includes:
[0018] when the transaction modifies, deletes, or inserts data, using an ID of the transaction as a version to mark the data.
[0019] Further, the ID of the transaction is constituted by the TBT of the transaction and an ID of the coordination node participating in the transaction.
[0020] To resolve the same technical problems, the present disclosure further provides a distributed logical timestamp-based management system for a distributed transaction. A
distributed transaction database includes a coordination node, a catalog node, and a data node, and an LLT of each node is synchronously calibrated based on a preset ULT.
[0021] The distributed logical timestamp-based management system for a distributed transaction includes:
[0022] a transaction time management module, configured to: when a transaction begins, set a TBT
of the transaction as an LLT of the coordination node;
[0023] a transaction access management module, configured to: when the data node first receives a message sent by the coordination node, determine whether a difference between an LLT of the data node and the LLT of the coordination node exceeds a preset first error threshold; and if the difference between the LLT of the data node and the LLT of the coordination node exceeds the preset first error threshold, synchronously calibrate the LLT of the data node, control the data node to return an error message to the coordination node to enable the coordination node to roll back the transaction, and retry the transaction after synchronously calibrating the LLT of the coordination node;

[0024] a transaction pre-committing management module, configured to: when the transaction is pre-committed, set a TPCT of the transaction as a current LLT of the coordination node; determine whether a difference between the TPCT and the TBT is greater than a transaction tolerance of the transaction;
and if the difference between the TPCT and the TBT is not greater than the transaction tolerance of the transaction, suspend the current operation; or if the difference between the TPCT and the TBT is greater than the transaction tolerance of the transaction, send the TPCT and a transaction pre-committing message to all data nodes participating in the transaction; and
[0025] a distributed arbitration module, configured to: when two different transactions access same data, determine whether a difference between timestamps of the two different transactions is less than a preset second error threshold; and if the difference between the timestamps of the two different transactions is less than the preset second error threshold, select one data node from target data nodes as an arbitration node based on a preset algorithm to arbitrate a sequence of the timestamps of the two different transactions.
[0026] Further, the preset first error threshold is twice a ULT tolerance.
[0027] Further, the second error threshold is set as follows: finding a maximum transaction tolerance in different transactions accessing same data, and setting the second error threshold to be twice the maximum transaction tolerance.
[0028] Further, the ULT tolerance is dynamically changed based on a system status, and the system includes:
[0029] a ULT tolerance recording module, configured to: when the coordination node performs synchronous time calibration with the catalog node, obtain and record a current ULT tolerance of the system; and
[0030] a ULT tolerance update module, configured to: when the transaction obtains the TBT, store the current ULT tolerance of the system in a metadata control block of the transaction.
[0031] According to the present disclosure, based on a fully distributed logical clock mechanism, there is no need to force all related nodes to obtain a unique ID from a GTM.
This can reduce network overheads while satisfying requirements for distributed storage and processing, thereby effectively improving overall system performance.
BRIEF DESCRIPTION OF THE DRAWINGS
[0032] FIG. 1 is a schematic diagram of a system architecture of a distributed database according to an embodiment of the present disclosure;
[0033] FIG. 2 is a schematic diagram of performing synchronous time calibration by various nodes according to an embodiment of the present disclosure;

[0034] FIG. 3 is a schematic flowchart of a distributed logical timestamp-based management method for a distributed transaction according to an embodiment of the present disclosure;
[0035] FIG. 4 is another schematic flowchart of a distributed logical timestamp-based management method for a distributed transaction according to an embodiment of the present disclosure; and
[0036] FIG. 5 is a schematic structural diagram of a distributed logical timestamp-based management system for a distributed transaction according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0037] The following clearly and completely describes the technical solutions in the embodiments of the present disclosure with reference to the accompanying drawings in the embodiments of the present disclosure. Apparently, the described embodiments are merely some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative efforts shall fall within the protection scope of the present disclosure.
[0038] Refer to FIG. 1. An embodiment of the present disclosure provides a distributed logical timestamp-based management method for a distributed transaction. A distributed transaction database includes a coordination node, a catalog node, and a data node, and an LLT of each node is synchronously calibrated based on a preset ULT.
[0039] In a system shown in FIG. 1, a distributed database (the distributed transaction database) is used as an example. An architecture in which storage and computing are separated is used for the distributed database. There are three different types of nodes in the system, and the nodes can be scaled horizontally. A coordination node is responsible for delivering a request to the desired data node. A data node is responsible for data access and storage. A catalog node stores system metadata and partition-related information. To ensure high reliability of the system, each master data node has a plurality of slave data nodes, and the master data node synchronizes data to the slave nodes based on a synchronization log. Similarly, there are also master and slave catalog nodes. The coordination node only performs intermediate computing and processing, and does not maintain state. Therefore, there is no slave coordination node.
[0040] For brief description, some terms in this specification are described by abbreviations. It should be noted that an abbreviation corresponding to a term has a same meaning as the term. The terms are explained as follows:
[0041] LLT: local logical timestamp. It indicates an LLT maintained by each node (minimum unit: microsecond).
[0042] ULT: universal logical timestamp (minimum unit: microsecond).
[0043] LRT: local real timestamp. It indicates local coordinated universal time (UTC) of a node.
[0044] ULT tolerance: universal logical timestamp tolerance. It can be dynamically adjusted based on a system status. An initial default value may be set to 1 ms.
[0045] ULT SyncInterval: universal logical timestamp synchronization interval. It indicates the interval at which ULTs are synchronized.
[0046] TBT: transaction begin timestamp. It indicates time at which a transaction begins.
[0047] TPCT: transaction pre-commit timestamp. It indicates time at which a transaction is pre-committed.
[0048] TCT: transaction commit timestamp. It indicates time at which a transaction is committed.
[0049] Refer to FIG. 2. A mechanism of calculating and synchronizing a logical timestamp in the present disclosure is as follows:
[0050] ULT allocation may be performed by an independent node. However, the catalog node in the system is lightly loaded. Therefore, in a preferred solution, a function of generating and maintaining the ULT can be integrated into the catalog node. When the entire system is started, the ULT may be set as an LRT of the catalog node. After that, the ULT is generated by a master catalog node through accumulation based on a difference between central processing unit (CPU) ticks. In this way, the ULT
approximates real time, but does not depend on precise machine time. On the catalog node, the ULT
increases progressively, with a minimum accuracy of microseconds (µs). The master catalog node writes the ULT to disk at each ULT SyncInterval for persistence, and copies the ULT to a slave catalog node based on a log. Each time the master catalog node is restarted or switched, the new master catalog node compares (ULT + ULT SyncInterval) from the log with the current LRT, and uses the larger one as the new ULT.
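For illustration, the following Python sketch shows one way the ULT accumulation and restart rule described above could be realized. It is a simplified sketch under stated assumptions, not the patented implementation: the class name is ours, a monotonic tick counter stands in for CPU ticks, and the persistence interface is assumed.

```python
import time

class CatalogULT:
    """Illustrative sketch of a master catalog node maintaining the
    universal logical timestamp (ULT) in microseconds; the tick source
    and persistence details are assumptions."""

    def __init__(self, persisted_ult_us=None, sync_interval_us=60_000_000):
        lrt_us = int(time.time() * 1_000_000)  # local real timestamp (LRT)
        if persisted_ult_us is None:
            # First system start: seed the ULT with the catalog node's LRT.
            self.ult_us = lrt_us
        else:
            # Restart or master switch: take the larger of (persisted ULT +
            # ULT SyncInterval) and the current LRT, so the ULT never moves
            # backward.
            self.ult_us = max(persisted_ult_us + sync_interval_us, lrt_us)
        self._last_tick_ns = time.monotonic_ns()  # stand-in for CPU ticks

    def now_us(self):
        # Accumulate elapsed ticks so the ULT approximates real time while
        # increasing monotonically, independent of wall-clock adjustments.
        tick_ns = time.monotonic_ns()
        self.ult_us += (tick_ns - self._last_tick_ns) // 1_000
        self._last_tick_ns = tick_ns
        return self.ult_us
```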
[0051] When being started, each coordination node and data node obtain the ULT
from the catalog node as their own LLTs. A slave data node directly performs synchronization with a master data node to obtain and set an LLT of the slave data node. In a running process of the system, each coordination node and data node regularly perform synchronous time calibration with the catalog node based on a Network Time Protocol (NTP) algorithm, to adjust their own LLTs. The ULT
SyncInterval is configurable, and may be set to 60 seconds by default. Specific synchronization steps are as follows:
[0052] An NTP client sends a delay calculation request to an NTP server (a node that needs to perform time synchronization is defined as the NTP client, and a time synchronization source (for example, the master catalog node) is defined as the NTP server).
[0053] A time delay can be calculated based on one request and a corresponding response.
[0054] t1 is an LLT at which the NTP client sends the request.
[0055] t2 is an LLT at which the NTP server receives the request.

[0056] t3 is an LLT at which the NTP server sends the response.
[0057] t4 is an LLT at which the NTP client receives the response.
[0058] The network delay is equal to (t2 - t1) + (t4 - t3).
[0059] The time delay is equal to [(t2 - t1) + (t3 - t4)]/2.
[0060] Given a ULT tolerance (1 ms by default), the client LLT is adjusted as follows:
[0061] If the time delay is greater than 0, the updated client LLT is set to the original client LLT + the time delay.
[0062] If the time delay is less than 0, the client is forcibly suspended for the absolute value of the time delay.
[0063] If the network delay is greater than the ULT tolerance, time synchronization is re-initiated.
[0065] If the network delay is greater than the ULT tolerance five consecutive times, the ULT tolerance is set to the ULT tolerance multiplied by a change factor.
[0066] If there is no additional time calibration request within a period of time, the ULT tolerance is set to the ULT tolerance divided by the change factor. The change factor may be set to 1.2.
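As a hedged illustration of the calibration rules above, the sketch below computes the network delay and time delay from t1–t4 and applies the tolerance adjustments. The function name, return convention, and retry bookkeeping are our assumptions, not the patent's interface.

```python
import time

def calibrate_llt(client_llt_us, t1, t2, t3, t4, tolerance_us,
                  noisy_rounds, change_factor=1.2):
    """One NTP-style calibration round per the formulas above. All
    timestamps are LLTs in microseconds (t1/t4 sampled on the client,
    t2/t3 on the server).
    Returns (new_client_llt, new_tolerance, retry_needed, noisy_rounds)."""
    network_delay = (t2 - t1) + (t4 - t3)
    time_delay = ((t2 - t1) + (t3 - t4)) / 2  # signed clock offset

    if network_delay > tolerance_us:
        # Too noisy to trust: re-initiate synchronization; after five
        # consecutive noisy rounds, widen the tolerance by the change factor.
        noisy_rounds += 1
        if noisy_rounds >= 5:
            tolerance_us *= change_factor
            noisy_rounds = 0
        return client_llt_us, tolerance_us, True, noisy_rounds

    if time_delay > 0:
        client_llt_us += time_delay    # jump the client LLT forward
    elif time_delay < 0:
        # Suspend rather than moving the LLT backward.
        time.sleep(-time_delay / 1e6)
    return client_llt_us, tolerance_us, False, 0
```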
[0067] After each coordination node/data node performs time calibration with the catalog node by the foregoing mechanism, it can be ensured that the LLT error between any coordination nodes/data nodes does not exceed 2*ULT tolerance.
[0068] In this embodiment of the present disclosure, logical timestamps are used to determine the sequence of distributed transactions and data visibility. In the distributed database system, each transaction selects one coordination node to access the database. The selected coordination node may be specified by an application, may be any coordination node, or may be allocated through load balancing.
In addition, a quantity of coordination nodes may be dynamically adjusted based on system load. A
two-phase commit mechanism (XA) is used to maintain consistency of a distributed transaction on various nodes. The following will describe an entire transaction access process based on the mechanism.
[0069] As shown in FIG. 3 and FIG. 4, the distributed logical timestamp-based management method for a distributed transaction includes the following steps.
[0070] Step S1: When a transaction begins, set a TBT of the transaction as an LLT of the coordination node. Further, an ID of the transaction is constituted by the TBT of the transaction and an ID of the coordination node participating in the transaction.
[0071] Further, the ULT tolerance of the system is dynamically changed based on a system status.
When the coordination node performs synchronous time calibration with the catalog node, a current ULT tolerance of the system is obtained and recorded.
[0072] When the transaction obtains the TBT, the current ULT tolerance of the system is stored in a metadata control block of the transaction.
[0073] In this embodiment of the present disclosure, when the transaction begins, an LLT is obtained from the coordination node, and is set as the TBT. The corresponding transaction ID (TID) may be constituted by the TBT and the ID of the coordination node. If 50 bits are used to represent the TBT
accurate to microseconds and 14 bits are used to identify the ID of the coordination node, the system can support use of 16K nodes for 35 years, and the 64-bit ID will not be repeated. To support a larger system cluster or longer time, more bits can be used to identify the TBT and the node ID. The 64-bit ID is used as an example herein.
TBTime (Transaction Begin Time): 50 bits
NodeID: 14 bits
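A minimal sketch of this 64-bit TID layout follows, assuming the TBT occupies the high 50 bits so that numeric comparison of TIDs orders transactions by begin time; the helper names are ours.

```python
TBT_BITS, NODE_BITS = 50, 14  # widths from the layout above

def make_tid(tbt_us: int, coord_node_id: int) -> int:
    """Pack a 64-bit TID: 50-bit begin timestamp (microseconds) in the
    high bits, 14-bit coordination-node ID in the low bits."""
    assert tbt_us < (1 << TBT_BITS) and coord_node_id < (1 << NODE_BITS)
    return (tbt_us << NODE_BITS) | coord_node_id

def split_tid(tid: int) -> tuple[int, int]:
    """Recover (TBT, coordination-node ID) from a packed TID."""
    return tid >> NODE_BITS, tid & ((1 << NODE_BITS) - 1)

# Because the TBT sits in the high bits, comparing two TIDs numerically
# orders the corresponding transactions by their begin timestamps.
```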
[0074] The ULT tolerance of the system is dynamically changed based on the system status (including the network delay). Therefore, the ULT tolerance in effect when each transaction is initiated may differ between transactions. Hence, when the coordination node performs time calibration with the catalog node, the system obtains the current ULT tolerance of the system, delivers the ULT tolerance when the transaction obtains the TBT, and stores the ULT tolerance in the metadata control block related to the transaction.
[0075] Step S2: When the data node first receives a message sent by the coordination node, determine whether a difference between an LLT of the data node and the LLT of the coordination node exceeds a preset first error threshold; and if the difference between the LLT of the data node and the LLT of the coordination node exceeds the preset first error threshold, synchronously calibrate the LLT of the data node, control the data node to return an error message to the coordination node to enable the coordination node to roll back the transaction, and retry the transaction after synchronously calibrating the LLT of the coordination node. Further, the preset first error threshold is twice the ULT tolerance of the system. Further, when the transaction modifies, deletes, or inserts data, the ID of the transaction is used as a version to mark the data.
[0076] It should be noted that the first message sent by the coordination node to the data node includes the TBT and the LLT of the coordination node. After receiving the message, the data node compares its own LLT and the LLT of the coordination node in the message. If the difference between the two LLTs does not exceed 2*ULT tolerance, the data node performs an operation normally. If the difference between the two LLTs exceeds 2*ULT tolerance, the data node performs time calibration with the catalog node, returns an error, requests the coordination node to first roll back the transaction, and then retries the transaction after performing time calibration with the catalog node. When the transaction modifies/deletes/inserts the data, the ID of the transaction is used as the version to mark the data.
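The first-message check in Step S2 can be summarized by the following sketch, assuming the coordination node's first message carries its LLT and that the threshold is 2*ULT tolerance as stated; the function name and return codes are illustrative.

```python
def on_first_message(data_node_llt_us, coord_llt_us, ult_tolerance_us):
    """Sketch of the Step S2 check performed on the data node."""
    if abs(data_node_llt_us - coord_llt_us) <= 2 * ult_tolerance_us:
        return "PROCEED"  # within 2 * ULT tolerance: operate normally
    # Otherwise the data node recalibrates with the catalog node and returns
    # an error, so the coordination node rolls back the transaction,
    # recalibrates its own LLT, and retries.
    return "ERROR_ROLLBACK_AND_RETRY"
```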

[0077] Step S3: When the transaction is pre-committed, set a TPCT of the transaction as a current LLT of the coordination node; determine whether a difference between the TPCT
and the TBT is greater than a transaction tolerance of the transaction; and if the difference between the TPCT and the TBT is not greater than the transaction tolerance of the transaction, suspend the current operation; or if the difference between the TPCT and the TBT is greater than the transaction tolerance of the transaction, send the TPCT and a transaction pre-committing message to all data nodes participating in the transaction.
[0078] In this embodiment of the present disclosure, when the transaction is pre-committed, the current LLT of the coordination node is used as the TPCT of the transaction, and the current LLT of the coordination node, along with the pre-committing message, is sent to all the data nodes participating in the transaction. The system requires that TPCT > TBT + ULT
tolerance. Otherwise, the current operation is suspended until the condition is satisfied. The sequence of different transactions on the data node is determined based on the TIDs of the transactions. A typical application scenario is the determination of data visibility in a multi-version data scenario.
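A sketch of the pre-commit rule TPCT > TBT + ULT tolerance described above; `read_coord_llt_us` is an assumed callable returning the coordination node's current LLT in microseconds, and the wait loop is one possible way to suspend until the condition holds.

```python
import time

def take_pre_commit_timestamp(read_coord_llt_us, tbt_us, ult_tolerance_us):
    """Sketch of Step S3: obtain a TPCT strictly greater than
    TBT + ULT tolerance, suspending until the condition is satisfied."""
    while True:
        tpct_us = read_coord_llt_us()
        if tpct_us > tbt_us + ult_tolerance_us:
            # Sent with the pre-committing message to all data nodes
            # participating in the transaction.
            return tpct_us
        # Condition not yet satisfied: suspend briefly, then re-read the LLT.
        time.sleep((tbt_us + ult_tolerance_us - tpct_us + 1) / 1e6)
```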
[0079] Step S4: When two different transactions access same data, determine whether a difference between timestamps of the two different transactions is less than a preset second error threshold; and if the difference between the timestamps of the two different transactions is less than the preset second error threshold, select one data node from target data nodes as an arbitration node based on a preset algorithm to arbitrate a sequence of the timestamps of the two different transactions. Further, the second error threshold is set as follows: finding a maximum transaction tolerance in different transactions accessing same data, and setting the second error threshold to be twice the maximum transaction tolerance.
[0080] It should be noted that, in a distributed scenario, transactions are initiated from a plurality of different coordination nodes, and obtain their own timestamps separately.
Therefore, in a high-concurrency scenario, these timestamps may be very close. In addition, the coordination nodes initiating the transactions perform time calibration with the catalog node at different times. Therefore, the ULT tolerances obtained by the transactions may be different. The system needs to record the corresponding ULT tolerance when each transaction begins. It is assumed that two transactions obtain a ULT tolerance A and a ULT tolerance B respectively. When the difference between the timestamps of the two transactions is less than 2*max(ULT tolerance A, ULT tolerance B), the sequence of the timestamps of the two transactions is regarded as being within the error range; therefore, the sequence cannot be determined directly, and arbitration needs to be performed. It should be noted that arbitration is possibly needed only when different transactions need to access the same data.
[0081] An algorithm (for example, a hash algorithm) may be used to select one data node from the data nodes participating in both of the two transactions as an arbitration node to determine the sequence of the two transactions. For example, the IDs of the nodes participating in a transaction A are (1, 2, 3, 4), and the IDs of the nodes participating in a transaction B are (3, 4, 5, 6). In this case, the arbitration node is Func(3, 4).
In this way, the same node performs arbitration for subsequent comparison between the transaction A
and the transaction B. A result of the first arbitration is used to determine a sequence of the two timestamps, and the result is reserved for subsequent queries. The arbitration result may be cleared after both the transaction A and the transaction B end or when the oldest transaction of the system is generated after max(TBT A, TBT B). The arbitration algorithm is not limited, and may be selected randomly or by comparing node IDs or priorities of transaction types.
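The following sketch shows the closeness test using 2*max(ULT tolerance A, ULT tolerance B) together with one possible deterministic Func for choosing the arbitration node from the shared data nodes; the specific selection rule is our assumption, since the text leaves the algorithm open (random, node-ID comparison, or transaction-type priority would all work).

```python
def needs_arbitration(ts_a_us, ts_b_us, tol_a_us, tol_b_us):
    """True when two transaction timestamps fall within the second error
    threshold, i.e. are too close to order directly."""
    return abs(ts_a_us - ts_b_us) < 2 * max(tol_a_us, tol_b_us)

def pick_arbitrator(nodes_a, nodes_b):
    """One possible deterministic Func over the shared data nodes; any
    function both transactions compute identically would do."""
    common = sorted(set(nodes_a) & set(nodes_b))
    if not common:
        return None  # no shared data node, so no conflict to arbitrate
    return common[sum(common) % len(common)]

# Example from the text: transactions on nodes (1, 2, 3, 4) and (3, 4, 5, 6)
# share {3, 4}; (3 + 4) % 2 == 1, so node 4 arbitrates for this pair, and
# both transactions derive the same arbitrator independently.
```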
[0082] The arbitration result needs to be synchronized to a slave node of the arbitration node in a specified manner, to ensure that when a single-node fault occurs, the slave node can still serve as the arbitration node after becoming a master node, and return the same arbitration result. For the synchronization manner, refer to an existing master/slave synchronization manner.
[0083] Arbitration occurs with low probability, and is performed only when two timestamps are very close and common data is accessed. Arbitration is performed by the nodes participating in the transactions, so the arbitration work is fully distributed across transactions and nodes, and it causes neither a single-point failure nor a performance bottleneck. Therefore, arbitration does not affect overall system performance.
[0084] It should be noted that, in different scenarios, the same mechanism may be used to perform arbitration as required to satisfy system requirements. In the foregoing example, different timestamps are compared to determine a sequence of starting or committing corresponding transactions, or a TBT
of one transaction and a TCT of another transaction are compared to determine data visibility. In actual application, key values of other types may be compared in a distributed manner to determine the sequence.
[0085] It should be noted that, for the sake of simplicity, the foregoing method or process embodiments are described as a series of action combinations, but a person skilled in the art will recognize that the embodiments of the present disclosure are not limited by the sequence of actions described, and certain steps may be performed in another order or at the same time according to the embodiments of the present disclosure. In addition, a person skilled in the art should understand that the embodiments described in this specification are preferred embodiments, and the related actions are not necessarily required by the embodiments of the present disclosure.
[0086] It may be understood that the present disclosure provides one logical timestamp maintained in the entire system, and the logical timestamp is used in the distributed database to determine the sequence of starting a transaction and data visibility. When the timestamps of transactions are close and their sequence cannot be directly determined, the system performs arbitration by a distributed arbitration mechanism. In this way, a management mechanism without a single bottleneck is implemented for distributed transactions in the distributed database.
[0087] Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects.
[0088] Compared with implementation of a universal timestamp by hardware, implementation by the system has advantages of low costs, easy implementation, and easy popularization. Compared with implementation of a distributed logical timestamp by other software, implementation by the system has the following advantages.
[0089] 1. The logical timestamp is generated by a node participating in a transaction instead of by a globally unified timestamp management node. This reduces corresponding network interactions, avoids a single-node performance bottleneck and a single-point failure, and can provide high-frequency and high-concurrency transaction services.
[0090] 2. A universal fault tolerance parameter is dynamically adaptive, and can adapt to different distributed environments and scenarios.
[0091] 3. In an entire transaction process, fault-tolerance-triggered waiting is needed only for the time calibration performed by the coordination node and the data node when the transaction begins, and for the required interval between the beginning and the committing of the transaction. Therefore, waiting triggered by fault tolerance due to concurrent transactions does not occur. The time calibration has a very small occurrence probability, and is performed only when the global system environment changes significantly (for example, when a network or machine fault occurs). The interval does not affect overall system throughput in normal transaction logic, and only imposes a certain delay requirement (in milliseconds), so that the demands of most applications can be satisfied.
[0092] 4. Different timestamp lengths and node ID lengths are configured to easily support clusters with different scales.
[0093] The distributed arbitration mechanism is introduced to eliminate the blocking or waiting caused when the timestamps of a plurality of transactions are close in an existing software-based solution (for details, refer to Google Spanner). The arbitration mechanism selects an arbitrator based on the nodes participating in a transaction, and is free from a single-point performance bottleneck and a single-point failure.
[0094] Refer to FIG. 5. To resolve the same technical problems, the present disclosure further provides a distributed logical timestamp-based management system for a distributed transaction. A distributed transaction database includes a coordination node, a catalog node, and a data node, and an LLT of each node is synchronously calibrated based on a preset ULT.
[0095] The distributed logical timestamp-based management system for a distributed transaction includes:
[0096] a transaction time management module 1, configured to: when a transaction begins, set a TBT
of the transaction as an LLT of the coordination node;
[0097] a transaction access management module 2, configured to: when the data node first receives a message sent by the coordination node, determine whether a difference between an LLT of the data node and the LLT of the coordination node exceeds a preset first error threshold; and if the difference between the LLT of the data node and the LLT of the coordination node exceeds the preset first error threshold, synchronously calibrate the LLT of the data node, control the data node to return an error message to the coordination node to enable the coordination node to roll back the transaction, and retry the transaction after synchronously calibrating the LLT of the coordination node;
[0098] a transaction pre-committing management module 3, configured to: when the transaction is pre-committed, set a TPCT of the transaction as a current LLT of the coordination node; determine whether a difference between the TPCT and the TBT is greater than a transaction tolerance of the transaction; and if the difference between the TPCT and the TBT is not greater than the transaction tolerance of the transaction, suspend the current operation; or if the difference between the TPCT and the TBT is greater than the transaction tolerance of the transaction, send the TPCT and a transaction pre-committing message to all data nodes participating in the transaction; and
[0099] a distributed arbitration module 4, configured to: when two different transactions access same data, determine whether a difference between timestamps of the two different transactions is less than a preset second error threshold; and if the difference between the timestamps of the two different transactions is less than the preset second error threshold, select one data node from target data nodes as an arbitration node based on a preset algorithm to arbitrate a sequence of the timestamps of the two different transactions.
[0100] Further, the preset first error threshold is twice a ULT tolerance.
[0101] Further, the second error threshold is set as follows: finding a maximum transaction tolerance in different transactions accessing same data, and setting the second error threshold to be twice the maximum transaction tolerance.
[0102] Further, the ULT tolerance is dynamically changed based on a system status, and the system includes:
[0103] a ULT tolerance recording module, configured to: when the coordination node performs synchronous time calibration with the catalog node, obtain and record a current ULT tolerance of the system; and
[0104] a ULT tolerance update module, configured to: when the transaction obtains the TBT, store the current ULT tolerance of the system in a metadata control block of the transaction.
[0105] It may be understood that the foregoing system embodiment corresponds to the method embodiment of the present disclosure. The distributed logical timestamp-based management system for a distributed transaction in the embodiments of the present disclosure can implement the distributed logical timestamp-based management method for a distributed transaction in any one of the method embodiments of the present disclosure.
[0106] The descriptions above are preferred implementations of the present disclosure, and it should be noted that for a person of ordinary skill in the art, various improvements and modifications can be made without departing from the principles of the present disclosure. These improvements and modifications should also be regarded as falling into the protection scope of the present disclosure.


Claims (10)

WHAT IS CLAIMED IS:
1. A distributed logical timestamp-based management method for a distributed transaction, wherein a distributed transaction database comprises a coordination node, a catalog node, and a data node, and a local logical timestamp (LLT) of each node is synchronously calibrated based on a preset universal logical timestamp (ULT); and the distributed logical timestamp-based management method for a distributed transaction comprises:
when a transaction begins, setting a transaction begin timestamp (TBT) of the transaction as an LLT of the coordination node;
when the data node first receives a message sent by the coordination node, determining whether a difference between an LLT of the data node and the LLT of the coordination node exceeds a preset first error threshold; and if the difference between the LLT of the data node and the LLT of the coordination node exceeds the preset first error threshold, synchronously calibrating the LLT of the data node, controlling the data node to return an error message to the coordination node to enable the coordination node to roll back the transaction, and retrying the transaction after synchronously calibrating the LLT
of the coordination node;
when the transaction is pre-committed, setting a transaction pre-commit timestamp (TPCT) of the transaction as a current LLT of the coordination node; determining whether a difference between the TPCT and the TBT is greater than a transaction tolerance of the transaction;
and if the difference between the TPCT and the TBT is not greater than the transaction tolerance of the transaction, suspending the current operation; or if the difference between the TPCT and the TBT is greater than the transaction tolerance of the transaction, sending the TPCT and a transaction pre-committing message to all data nodes participating in the transaction; and when two different transactions access same data, determining whether a difference between timestamps of the two different transactions is less than a preset second error threshold; and if the difference between the timestamps of the two different transactions is less than the preset second error threshold, selecting one data node from target data nodes as an arbitration node based on a preset algorithm to arbitrate a sequence of the timestamps of the two different transactions.
2. The distributed logical timestamp-based management method for a distributed transaction according to claim 1, wherein the preset first error threshold is twice a ULT
tolerance.
3. The distributed logical timestamp-based management method for a distributed transaction according to claim 1, wherein the second error threshold is set as follows: finding a maximum transaction tolerance in different transactions accessing same data, and setting the second error threshold to be twice the maximum transaction tolerance.
4. The distributed logical timestamp-based management method for a distributed transaction according to claim 1, wherein a ULT tolerance is dynamically changed based on a system status, and the method further comprises:
when the coordination node performs synchronous time calibration with the catalog node, obtaining and recording a current ULT tolerance of a system; and when the transaction obtains the TBT, storing the current ULT tolerance of the system in a metadata control block of the transaction.
5. The distributed logical timestamp-based management method for a distributed transaction according to claim 1, further comprising:
when the transaction modifies, deletes, or inserts data, using an identifier (ID) of the transaction as a version to mark the data.
6. The distributed logical timestamp-based management method for a distributed transaction according to claim 1, wherein an ID of the transaction is constituted by the TBT of the transaction and an ID of the coordination node participating in the transaction.
7. A distributed logical timestamp-based management system for a distributed transaction, wherein a distributed transaction database comprises a coordination node, a catalog node, and a data node, and an LLT of each node is synchronously calibrated based on a preset ULT; and the distributed logical timestamp-based management system for a distributed transaction comprises:
a transaction time management module, configured to: when a transaction begins, set a TBT of the transaction as an LLT of the coordination node;
a transaction access management module, configured to: when the data node first receives a message sent by the coordination node, determine whether a difference between an LLT of the data node and the LLT of the coordination node exceeds a preset first error threshold; and if the difference between the LLT of the data node and the LLT of the coordination node exceeds the preset first error threshold, synchronously calibrate the LLT of the data node, control the data node to return an error message to the coordination node to enable the coordination node to roll back the transaction, and retry the transaction after synchronously calibrating the LLT of the coordination node;
a transaction pre-committing management module, configured to: when the transaction is pre-committed, set a TPCT of the transaction as a current LLT of the coordination node; determine whether a difference between the TPCT and the TBT is greater than a transaction tolerance of the transaction;
and if the difference between the TPCT and the TBT is not greater than the transaction tolerance of the transaction, suspend the current operation; or if the difference between the TPCT and the TBT is greater than the transaction tolerance of the transaction, send the TPCT and a transaction pre-committing message to all data nodes participating in the transaction; and a distributed arbitration module, configured to: when two different transactions access same data, determine whether a difference between timestamps of the two different transactions is less than a preset second error threshold; and if the difference between the timestamps of the two different transactions is less than the preset second error threshold, select one data node from target data nodes as an arbitration node based on a preset algorithm to arbitrate a sequence of the timestamps of the two different transactions.
8. The distributed logical timestamp-based management system for a distributed transaction according to claim 7, wherein the preset first error threshold is twice a ULT
tolerance.
9. The distributed logical timestamp-based management system for a distributed transaction according to claim 7, wherein the second error threshold is set as follows:
finding a maximum transaction tolerance in different transactions accessing same data, and setting the second error threshold to be twice the maximum transaction tolerance.
10. The distributed logical timestamp-based management system for a distributed transaction according to claim 7, wherein a ULT tolerance is dynamically changed based on a system status, and the system further comprises:
a ULT tolerance recording module, configured to: when the coordination node performs synchronous time calibration with the catalog node, obtain and record a current ULT tolerance of the system; and a ULT tolerance update module, configured to: when the transaction obtains the TBT, store the current ULT tolerance of the system in a metadata control block of the transaction.

CA3125546A 2019-10-23 2020-09-11 Distributed logical timestamp-based management method and system for distributed transaction Pending CA3125546A1 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
CN201911014865.2 2019-10-23
CN201911014865.2A CN110795506A (en) 2019-10-23 2019-10-23 Distributed database management method and device based on distributed logic timestamp
CN201911291498.0A CN111061810B (en) 2019-10-23 2019-12-13 Distributed transaction management method and system based on distributed logic timestamp
CN201911291498.0 2019-12-13
PCT/CN2020/114654 WO2021077934A1 (en) 2019-10-23 2020-09-11 Distributed transaction management method and system based on distributed logic timestamp

Publications (1)

Publication Number Publication Date
CA3125546A1 true CA3125546A1 (en) 2021-04-29

Family

ID=69441083

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3125546A Pending CA3125546A1 (en) 2019-10-23 2020-09-11 Distributed logical timestamp-based management method and system for distributed transaction

Country Status (3)

Country Link
CN (2) CN110795506A (en)
CA (1) CA3125546A1 (en)
WO (1) WO2021077934A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795506A (en) * 2019-10-23 2020-02-14 广州巨杉软件开发有限公司 Distributed database management method and device based on distributed logic timestamp
CN111708796A (en) * 2020-06-23 2020-09-25 浪潮云信息技术股份公司 Data consistency method based on time stamp
CN113934737A (en) * 2020-06-29 2022-01-14 华为技术有限公司 Database system, and method and device for managing transactions
CN112182103A (en) * 2020-09-24 2021-01-05 广州巨杉软件开发有限公司 Distributed database and method for realizing cross-node transaction strong consistency
CN113238892B (en) * 2021-05-10 2022-01-04 深圳巨杉数据库软件有限公司 Time point recovery method and device for global consistency of distributed system
CN114095086B (en) * 2021-11-13 2023-05-19 南京智汇电力技术有限公司 Regional time setting method suitable for optical fiber communication

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101753609B (en) * 2008-12-15 2012-09-19 中国移动通信集团公司 Version control method, nodes and system of distributed system
US8356007B2 (en) * 2010-10-20 2013-01-15 Microsoft Corporation Distributed transaction management for database systems with multiversioning
CN103914452B (en) * 2012-12-30 2017-03-29 航天信息股份有限公司 Time error correction scheme and system in a kind of distributed file system
WO2019000398A1 (en) * 2017-06-30 2019-01-03 深圳市大疆创新科技有限公司 Method for scheduling transaction, and processor, distributed system, and unmanned aerial vehicle
CN110209726B (en) * 2018-02-12 2023-10-20 金篆信科有限责任公司 Distributed database cluster system, data synchronization method and storage medium
CN108984277B (en) * 2018-04-02 2019-08-30 北京百度网讯科技有限公司 Distributed database transaction processing method and processing device based on GPS atomic clock
CN110795506A (en) * 2019-10-23 2020-02-14 广州巨杉软件开发有限公司 Distributed database management method and device based on distributed logic timestamp

Also Published As

Publication number Publication date
CN111061810A (en) 2020-04-24
CN111061810B (en) 2020-08-21
WO2021077934A1 (en) 2021-04-29
CN110795506A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
CA3125546A1 (en) Distributed logical timestamp-based management method and system for distributed transaction
CN107888657B (en) Low latency distributed storage system
US7340630B2 (en) Multiprocessor system with interactive synchronization of local clocks
US8538923B2 (en) Method, node and system for controlling version in distributed system
US9158779B2 (en) Multi-node replication systems, devices and methods
US11436218B2 (en) Transaction processing for a database distributed across availability zones
US10936545B1 (en) Automatic detection and backup of primary database instance in database cluster
Du et al. Clock-RSM: Low-latency inter-datacenter state machine replication using loosely synchronized physical clocks
US9100330B1 (en) Introduction of read delay or write delay in servers of a geographically distributed data processing system so that clients read up-to-date data
US10489378B2 (en) Detection and resolution of conflicts in data synchronization
JP2018505496A (en) Method, apparatus and system for synchronizing data
US20230106118A1 (en) Distributed processing of transactions in a network using timestamps
US11556375B2 (en) Reducing commit wait in a distributed multiversion database by reading the clock earlier
JP2010061559A (en) Information processing system, and data update method and data update program
JP2007503050A5 (en)
WO2019057193A1 (en) Data deletion method and distributed storage system
Didona et al. Okapi: Causally consistent geo-replication made faster, cheaper and more available
WO2022170938A1 (en) Distributed database remote backup
CN113268472A (en) Distributed data storage system and method
CN110602158A (en) Global unique ID generation system and method, ID generator and storage medium
CN111552701A (en) Method for determining data consistency in distributed cluster and distributed data system
US11907754B2 (en) System to trigger time-dependent action
WO2022068220A1 (en) Data processing method and apparatus, computer device, and storage medium
CN112948178A (en) Data processing method, device, system, equipment and medium
WO2021022396A1 (en) Transaction processing for database distributed across regions

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20210629
