CN110795506A - Distributed database management method and device based on distributed logic timestamp - Google Patents

Distributed database management method and device based on distributed logic timestamp

Info

Publication number
CN110795506A
Authority
CN
China
Prior art keywords
transaction
node
time
data
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911014865.2A
Other languages
Chinese (zh)
Inventor
许建辉
陈元熹
何国明
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Big Tree Software Development Co Ltd
Original Assignee
Guangzhou Big Tree Software Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Big Tree Software Development Co Ltd filed Critical Guangzhou Big Tree Software Development Co Ltd
Priority to CN201911014865.2A priority Critical patent/CN110795506A/en
Priority to CN201911291498.0A priority patent/CN111061810B/en
Publication of CN110795506A publication Critical patent/CN110795506A/en
Priority to PCT/CN2020/114654 priority patent/WO2021077934A1/en
Priority to CA3125546A priority patent/CA3125546A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 16/275 Synchronous replication

Abstract

The invention discloses a distributed database management method and device based on distributed logical timestamps. The method comprises the following steps: when a transaction starts, setting the transaction start time of the transaction; if the difference between the local logical time of a data node and that of the coordinating node exceeds a preset threshold, recalibrating the local logical time of the data node, rolling back the transaction at the coordinating node, and retrying the transaction after the coordinating node's time has been recalibrated; if the difference between the transaction pre-commit time and the transaction start time is greater than the transaction's tolerance error, sending the pre-commit time together with the pre-commit message to all data nodes participating in the transaction; and, when the timestamps of two different transactions fall within a given range of each other, selecting one data node as an arbitration node to arbitrate the order of the two timestamps. The invention meets the requirements of distributed storage and processing while reducing network overhead, thereby effectively improving the overall performance of the system.

Description

Distributed database management method and device based on distributed logic timestamp
Technical Field
The invention relates to the technical field of databases, in particular to a distributed database management method and device based on distributed logic timestamps.
Background
A distributed database management system is an important part of practical distributed applications: it inherits all the characteristics of a distributed system while placing higher demands on data storage and processing. Database management systems have been widely used since the 1980s. Initially, a database was deployed on a single computer and all transactions were processed in that one stand-alone database. With the rapid growth of data volume and business traffic, a single machine can no longer meet the requirements of data storage and processing capacity, and distributed database systems have become the preferred architecture for deploying all kinds of applications. Solving distributed storage, distributed computation and low-latency response under high concurrency in a distributed database is also of great reference value for other distributed systems.
In a distributed data system, it is often necessary to maintain a globally unique ID to distinguish concurrent transactions and to identify data that is generated or changed. This ID typically has the following characteristics: it must not depend on a single point of failure; it should be time-ordered, or contain a timestamp, so that data shards can be ordered; and it should not be too long.
Implementing such a globally unique ID typically relies on a timestamp of the transaction start combined with some additional identification bits. Timestamps are generally generated in one of two ways: hardware-based clocks or software-implemented logical clocks. The former exploits atomic and hardware characteristics to keep the mutual error between clock devices negligibly small over long periods, so every computing node in the cluster can directly query its local clock and use it as a global timestamp. Its disadvantage is that the cost is very high, which prevents widespread adoption. In practice, software-implemented logical clocks are used far more often. Most existing implementations deploy a single node or a small cluster (the global transaction manager, GTM) in the distributed system to generate and distribute a uniform, timestamp-bearing global ID; all other computers in the cluster synchronously query the GTM to obtain an ID. The ID is monotonically increasing, which guarantees that each query returns a unique value. The disadvantage of this approach is that when the cluster is large and concurrency is high, the GTM becomes heavily loaded and the network overhead is substantial, degrading overall system performance. Meanwhile, implementing the GTM as a small cluster to avoid a single point of failure is itself very complicated and cannot fully guarantee reliability.
Disclosure of Invention
The embodiments of the invention provide a distributed database management method and device based on distributed logical timestamps, which avoid forcing all participating nodes to obtain unique IDs from a GTM, thereby meeting the requirements of distributed storage and processing while reducing network overhead and improving overall system performance.
In order to solve the above technical problem, an embodiment of the present invention provides a distributed database management method based on distributed logical timestamps, where the distributed database includes coordinating nodes, cataloging nodes and data nodes, and the local logical time of each node is synchronously calibrated against a preset global logical time;
the distributed database management method based on distributed logical timestamps comprises the following steps:
when a transaction starts, setting the transaction start time of the transaction to the local logical time of the coordinating node;
when a data node receives a message sent by the coordinating node for the first time, judging whether the difference between the local logical time of the data node and the local logical time of the coordinating node exceeds a preset first error threshold; if so, resynchronizing the local logical time of the data node and controlling the data node to return an error message to the coordinating node, so that the coordinating node rolls back the transaction and retries it after its own local logical time has been resynchronized;
when the transaction is pre-committed, setting the transaction pre-commit time of the transaction to the current local logical time of the coordinating node, and judging whether the difference between the transaction pre-commit time and the transaction start time is greater than the transaction tolerance error of the transaction; if not, suspending execution until the condition is met; if so, sending the transaction pre-commit time together with the pre-commit message of the transaction to all data nodes participating in the transaction;
when two different transactions access the same data, judging whether the difference between the timestamps of the two transactions is smaller than a preset second error threshold; and if so, selecting one data node from the target data nodes according to a preset algorithm as an arbitration node to arbitrate the order of the two transactions' timestamps.
Further, the preset first error threshold is twice the global tolerance error.
Further, the second error threshold is set in the following manner: finding the largest transaction tolerance error among different transactions accessing the same data, and setting the second error threshold to be twice the largest transaction tolerance error.
Further, the global tolerance error is dynamically changed according to the system condition; the method further comprises the following steps:
when the coordinating node and the cataloging node perform synchronous calibration, acquiring and recording the current global tolerance error of the system;
and when the transaction acquires the transaction start time, storing the current system global tolerance error into the metadata control block of the transaction.
Further, the method further comprises:
when a transaction modifies, deletes or inserts data, the identification ID of the transaction is used as a version to mark the data that was altered.
Further, the identification ID of a transaction consists of the transaction start time of the transaction and the node number of the coordinating node participating in the transaction.
In order to solve the same technical problem, the invention also provides a distributed database management apparatus based on distributed logical timestamps, where the distributed database comprises coordinating nodes, cataloging nodes and data nodes, and the local logical time of each node is synchronously calibrated against a preset global logical time;
the distributed database management apparatus based on distributed logical timestamps includes:
the transaction time management module is used for setting, when a transaction starts, the transaction start time of the transaction to the local logical time of the coordinating node;
the transaction access management module is used for judging, when the data node receives a message sent by the coordinating node for the first time, whether the difference between the local logical time of the data node and the local logical time of the coordinating node exceeds a preset first error threshold; and if so, for resynchronizing the local logical time of the data node and controlling the data node to return an error message to the coordinating node, so that the coordinating node rolls back the transaction and retries it after its own local logical time has been resynchronized;
the transaction pre-commit management module is used for setting, when the transaction is pre-committed, the transaction pre-commit time of the transaction to the current local logical time of the coordinating node, and judging whether the difference between the transaction pre-commit time and the transaction start time is greater than the transaction tolerance error of the transaction; if not, suspending execution until the condition is met; if so, sending the transaction pre-commit time together with the pre-commit message of the transaction to all data nodes participating in the transaction;
the distributed arbitration module is used for judging, when two different transactions access the same data, whether the difference between the timestamps of the two transactions is smaller than a preset second error threshold; and if so, for selecting one data node from the target data nodes according to a preset algorithm as an arbitration node to arbitrate the order of the two transactions' timestamps.
Further, the preset first error threshold is twice the global tolerance error.
Further, the second error threshold is set in the following manner: finding the largest transaction tolerance error among different transactions accessing the same data, and setting the second error threshold to be twice the largest transaction tolerance error.
Further, the global tolerance error is dynamically changed according to the system condition; the device further comprises:
the global tolerance error recording module is used for acquiring and recording the current global tolerance error of the system when the coordinating node and the cataloging node perform synchronous calibration;
and the global tolerance error updating module is used for storing the current system global tolerance error into the metadata control block of the transaction when the transaction obtains the transaction starting time.
By implementing the invention, a fully distributed logical clock mechanism avoids the step of forcing all participating nodes to obtain a unique ID from a GTM, thereby meeting the requirements of distributed storage and processing while reducing network overhead and effectively improving the overall performance of the system.
Drawings
FIG. 1 is a system architecture diagram of a distributed database provided by an embodiment of the present invention;
fig. 2 is a schematic diagram of each node performing synchronous time calibration according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a distributed database management method based on distributed logical timestamps according to an embodiment of the present invention;
FIG. 4 is another flow chart diagram of a distributed database management method based on distributed logical timestamps according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a distributed database management apparatus based on distributed logical timestamps according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of the present invention provides a distributed database management method based on distributed logical timestamps, where the distributed database includes coordinating nodes, cataloging nodes and data nodes, and the local logical time of each node is synchronized against a preset global logical time.
referring to fig. 1, in the system shown in the figure, a distributed database is taken as an example, a framework with separate storage and computation is adopted, and three different types of nodes exist in the system, which can be respectively expanded horizontally. The coordination node is responsible for distributing requests to data nodes needing to participate, the data nodes are responsible for accessing and storing data, and the cataloging node stores system metadata and partition related information. In order to ensure high reliability of the system, each master data node has a plurality of slave data nodes, and the master data node synchronizes data to the slave nodes through synchronization logs. Similarly, the cataloging node also has a master node and a slave node. The coordination node only has an intermediate operation processing process, does not keep the state and has no slave node.
For brevity, some terms are referred to below by the following abbreviations; each abbreviation has exactly the meaning of the term it stands for:
LLT (Local Logical Timestamp): the local logical time maintained by each node (minimum unit: microsecond).
ULT (Universal Logical Timestamp): the global logical time (minimum unit: microsecond).
LRT (Local Real Timestamp): the node-local UTC time.
ULT Tolerance (Universal Logical Timestamp Tolerance): the global tolerance error; it may be dynamically adjusted based on system conditions, with an initial default value of 1 ms.
ULT SyncInterval (Universal Logical Timestamp Synchronization Interval): the global synchronization time interval.
TBT (Transaction Begin Timestamp): the start time of a transaction.
TPCT (Transaction Pre-Commit Timestamp): the pre-commit time of a transaction.
TCT (Transaction Commit Timestamp): the commit time of a transaction.
Referring to fig. 2, the logical time calculation and synchronization mechanism provided by the present invention is as follows:
ULT allocation could be placed on an independent node. However, since the load on the cataloging nodes in the system is relatively light, the preferred solution integrates the generation and maintenance of the ULT into the cataloging nodes. At system-wide startup the ULT may be set to the LRT of the cataloging node. Thereafter the cataloging master node generates the ULT by accumulating CPU tick differences, so the ULT stays close to real time without depending on precise machine time. The ULT is monotonically increasing on the cataloging nodes, with a minimum precision of microseconds (us). Every ULT SyncInterval the cataloging master node writes the ULT to disk for persistence and replicates it to the cataloging standby nodes through the log. Whenever the cataloging master node restarts or fails over, the new master compares (ULT + ULT SyncInterval) from the log with its current LRT and takes the larger value as the new ULT.
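The following minimal sketch illustrates this ULT maintenance rule. It is an illustration only, written in Python with assumed names (CatalogULT, _persist and sync_interval_us are not from the patent); it shows the seed-from-LRT startup, the CPU-tick accumulation, the periodic persistence, and the max(logged ULT + SyncInterval, LRT) recovery rule described above:

    import time

    class CatalogULT:
        # Sketch of the cataloging master's global logical time (ULT).
        def __init__(self, sync_interval_us=60_000_000, persisted_ult=None):
            lrt_us = int(time.time() * 1_000_000)      # node-local UTC time (LRT), microseconds
            if persisted_ult is None:
                self.ult = lrt_us                      # system-wide startup: seed from LRT
            else:
                # restart/failover: take the larger of (logged ULT + SyncInterval) and LRT
                self.ult = max(persisted_ult + sync_interval_us, lrt_us)
            self.sync_interval_us = sync_interval_us
            self._last_tick = time.monotonic_ns()      # stand-in for CPU ticks; never goes backwards
            self._last_persist = self.ult

        def now(self):
            # Advance the ULT by accumulated tick differences: close to real
            # time, monotonically increasing, independent of wall-clock jumps.
            tick = time.monotonic_ns()
            self.ult += (tick - self._last_tick) // 1_000
            self._last_tick = tick
            if self.ult - self._last_persist >= self.sync_interval_us:
                self._persist()                        # also replicated to the standby via the log
            return self.ult

        def _persist(self):
            self._last_persist = self.ult              # placeholder for the disk write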
Each coordinating node and each data node obtains the ULT from the cataloging node at startup and uses it as its own LLT. The standby node of a data node obtains and sets its LLT directly in synchronization with its master. While the system is running, each coordinating node and each data node periodically synchronizes with the cataloging node using an NTP-style algorithm and adjusts its LLT. The synchronization interval (ULT SyncInterval) is configurable and may default to 60 seconds. The synchronization proceeds in the following specific steps (a code sketch of one round follows the list):
The NTP Client sends a request to the NTP Server in order to measure the clock offset (the node whose time needs synchronizing is defined as the NTP Client; the synchronization time source, e.g. the cataloging master node, is the NTP Server).
The time difference can be calculated from a single request round trip, where:
t1 is the LLT when the NTP Client sends the request;
t2 is the LLT when the NTP Server receives the request;
t3 is the LLT when the NTP Server sends the reply;
t4 is the LLT when the NTP Client receives the reply;
Network delay: Network Delay = (t2 - t1) + (t4 - t3);
Time difference: Time Offset = [(t2 - t1) + (t3 - t4)] / 2;
The time-difference threshold is ULT Tolerance (default: 1 ms):
if Time Offset > 0, set the client's LLT += Time Offset;
if Time Offset < 0, the client is forced to pause for |Time Offset| (the LLT must never move backwards, so the client waits until its LLT catches up instead of setting it back);
if Network Delay > ULT Tolerance, re-initiate the time synchronization;
if the delay exceeds ULT Tolerance 5 times in a row, set ULT Tolerance = ULT Tolerance × change factor;
if no additional re-timing request occurs within a period of time, set ULT Tolerance = ULT Tolerance / change factor; the change factor may be set to 1.2.
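A sketch of one synchronization round, under assumed interfaces (client.llt(), client.advance_llt(), client.sleep_us() and server.llt() are illustrative; in the real system t2/t3 are read on the server side of a network round trip). The tolerance-adaptation rules above are omitted for brevity:

    def ntp_sync_step(client, server, ult_tolerance_us):
        t1 = client.llt()                       # client sends the request
        t2 = server.llt()                       # server receives it
        t3 = server.llt()                       # server sends the reply
        t4 = client.llt()                       # client receives it
        network_delay = (t2 - t1) + (t4 - t3)
        time_offset = ((t2 - t1) + (t3 - t4)) // 2
        if network_delay > ult_tolerance_us:
            return None                         # too noisy: re-initiate the synchronization
        if time_offset > 0:
            client.advance_llt(time_offset)     # client is behind: jump forward
        elif time_offset < 0:
            client.sleep_us(-time_offset)       # client is ahead: LLT never moves backwards, so wait
        return time_offset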
through the mechanism, after each coordination node/data is time-corrected with the cataloging node, the error of logic time (LLT) between each coordination node/data node can be ensured not to exceed 2 × ULT Tolerance.
In the embodiment of the invention, logical time is used within distributed transactions to judge the order of transactions and the visibility of data. In the distributed database system, each transaction selects one coordinating node through which it begins accessing the database; the choice may be specified by the application, may be any one of the coordinating nodes, or may be assigned through load balancing. Meanwhile, the number of coordinating nodes can be adjusted dynamically according to the system load. Distributed transactions use a two-phase commit mechanism (XA) to keep the nodes consistent. Based on this mechanism, the flow of a complete transaction access is described as follows:
referring to fig. 3-4, the distributed database management method based on distributed logical timestamps includes:
step S1, when the transaction starts, the transaction start time of the transaction is set as the local logic time of the coordinating node; further, the identification ID of a transaction consists of the transaction start time of the transaction and the node numbers of the coordinating nodes participating in the transaction.
Furthermore, the global tolerance error of the system changes dynamically with system conditions. When the coordinating node and the cataloging node perform synchronous calibration, the current global tolerance error of the system is acquired and recorded;
and when the transaction acquires its transaction start time, the current global tolerance error of the system is stored into the transaction's metadata control block.
In the embodiment of the invention, when the transaction starts, the coordinating node takes its LLT and sets it as the transaction start time (TBT). The ID of the corresponding transaction (TID) may consist of the TBT and the node number of the coordinating node. If 50 bits are used to represent microsecond-accurate time and 14 bits identify the coordinating node number, a 64-bit ID allows the system to support 16K nodes for 35 years without duplication. If larger system clusters or longer times need to be supported, more bits may be used to identify the time and the node number. A 64-bit ID is taken as the example here:
[Table: example 64-bit TID layout. High 50 bits: microsecond-precision transaction begin timestamp (TBT); low 14 bits: coordinating node number.]
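For illustration, a sketch of packing and unpacking such a 64-bit TID (the bit widths follow the example above; the function names are illustrative, not from the patent):

    TBT_BITS, NODE_BITS = 50, 14        # example layout: 50-bit microsecond TBT + 14-bit node number

    def make_tid(tbt_us, coord_node_id):
        # High 50 bits: transaction begin timestamp; low 14 bits: coordinating node number.
        assert tbt_us < (1 << TBT_BITS) and coord_node_id < (1 << NODE_BITS)
        return (tbt_us << NODE_BITS) | coord_node_id

    def split_tid(tid):
        return tid >> NODE_BITS, tid & ((1 << NODE_BITS) - 1)

Because the timestamp occupies the high bits, comparing two TIDs numerically orders transactions by start time first, with the node number breaking ties.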
Since the system's ULT Tolerance changes dynamically with system conditions (including network delay), the ULT Tolerance in effect may differ each time a transaction is initiated. Therefore the system acquires the current ULT Tolerance when the coordinating node synchronizes with the cataloging node, issues that tolerance value when the transaction acquires its TBT, and stores it in the metadata control block associated with the transaction.
Step S2, when the data node receives a message sent by the coordinating node for the first time, it is judged whether the difference between the local logical time of the data node and the local logical time of the coordinating node exceeds a preset first error threshold; if so, the local logical time of the data node is resynchronized, and the data node is controlled to return an error message to the coordinating node, so that the coordinating node rolls back the transaction and retries it after its own local logical time has been resynchronized. Further, the preset first error threshold is twice the global tolerance error of the system. Further, when a transaction modifies, deletes or inserts data, the identification ID of the transaction is used as a version to mark the changed data.
It should be noted that the first message between the coordinating node and a data node contains the TBT and the coordinating node's LLT. After receiving it, the data node compares the coordinating node's LLT in the message with its own LLT: if the difference does not exceed 2 × ULT Tolerance, the data node executes the operation normally; otherwise the data node corrects its time against the cataloging node and at the same time returns an error, requiring the coordinating node to first roll back the transaction and then retry it after correcting its own time against the cataloging node. When a transaction modifies/deletes/inserts data, the changed data is tagged with the identification ID of the transaction as a version.
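The check on the first message can be sketched as follows (field and method names such as msg.coord_llt and resync_with_catalog are assumptions for illustration; the patent fixes only the 2 × ULT Tolerance comparison rule):

    def on_first_message(data_node, msg, ult_tolerance_us):
        if abs(data_node.llt() - msg.coord_llt) <= 2 * ult_tolerance_us:
            return data_node.execute(msg)        # clocks agree closely enough: proceed normally
        data_node.resync_with_catalog()          # correct the data node's own time first
        return "ERROR_CLOCK_SKEW"                # coordinator must roll back, resync, then retry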
Step S3, when the transaction is pre-committed, the transaction pre-commit time of the transaction is set to the current local logical time of the coordinating node, and it is judged whether the difference between the transaction pre-commit time and the transaction start time is greater than the transaction tolerance error of the transaction; if not, execution is suspended until the condition is met; if so, the transaction pre-commit time is sent, together with the pre-commit message of the transaction, to all data nodes participating in the transaction;
In the embodiment of the invention, when the transaction is pre-committed, the coordinating node's current LLT is taken as the transaction's TPCT and is sent to all participating data nodes together with the pre-commit message. The system requires that TPCT > TBT + ULT Tolerance; otherwise execution is suspended until the condition is met. Different transactions determine their order on a data node according to their transaction TIDs; the most typical usage scenario is determining the visibility of data under multi-versioning.
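The timing rule at pre-commit can be sketched as follows (coord.llt(), coord.sleep_us() and the txn fields are assumed helpers); waiting until TPCT > TBT + ULT Tolerance ensures the two timestamps of one transaction can never be confused within the clock-error range:

    def pre_commit(coord, txn):
        deadline = txn.tbt + txn.ult_tolerance_us
        while coord.llt() <= deadline:           # suspend execution until the condition is met
            coord.sleep_us(deadline - coord.llt() + 1)
        txn.tpct = coord.llt()
        for node in txn.participants:            # phase 1 of the two-phase commit
            node.send_pre_commit(txn.tid, txn.tpct)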
Step S4, when two different transactions access the same data, it is judged whether the difference between the timestamps of the two transactions is smaller than a preset second error threshold; if so, one data node is selected from the target data nodes according to a preset algorithm as an arbitration node to arbitrate the order of the two transactions' timestamps. Further, the second error threshold is set as follows: find the largest transaction tolerance error among the different transactions accessing the same data, and set the second error threshold to twice that largest transaction tolerance error.
It should be noted that, in a distributed scenario, transactions are initiated from many different coordinating nodes, each of which obtains its own timestamps independently, and under high concurrency these timestamps may be very close to one another. Meanwhile, because each transaction's coordinating node corrected its time against the cataloging node at a different moment, the ULT Tolerance values they obtained may differ, so the system must record the ULT Tolerance in effect at the start of each transaction. Suppose two transactions obtained ULT Tolerance A and ULT Tolerance B respectively. When the difference between the timestamps of the two transactions is smaller than 2 × max(ULT Tolerance A, ULT Tolerance B), the order of the two timestamps falls within the error range, so it cannot be judged directly and arbitration is needed. Note that arbitration may be required only when different transactions need to access the same data; otherwise each simply executes on its own.
A fixed algorithm (such as a hash algorithm) can be used to select one data node, from the data nodes in which both transactions participate, to act as the arbitration node that judges the transaction order. For example, if the participating nodes of transaction A are (1, 2, 3, 4) and those of transaction B are (3, 4, 5, 6), the arbitration node is Func(3, 4), chosen from the common nodes, so that all subsequent comparisons involving transactions A and B arbitrate at the same node. The result produced by the first arbitration query is taken as the judgment of the order of the two timestamps and is retained for subsequent queries. The arbitration result may be cleared after both A and B have ended, or after the oldest transaction in the system is newer than max(TBT A, TBT B). The arbitration algorithm itself is not restricted: it may be random, may compare node IDs, or may compare transaction types and decide by priority.
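A sketch of deterministic arbitrator selection over the common participants (the hash choice is one possible fixed algorithm; the patent leaves the rule open):

    import hashlib

    def pick_arbitrator(participants_a, participants_b):
        common = sorted(set(participants_a) & set(participants_b))
        if not common:
            return None                          # no shared data nodes: no arbitration needed
        key = ",".join(map(str, common)).encode()
        return common[int(hashlib.sha256(key).hexdigest(), 16) % len(common)]

    # e.g. transaction A on nodes (1, 2, 3, 4) and B on (3, 4, 5, 6):
    # the arbitrator is always the same node drawn from {3, 4}.

Because the selection depends only on the set of common participating nodes, every subsequent comparison of the same transaction pair reaches the same arbitration node.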
The arbitration result needs to be synchronized to the standby node of the arbitration node in some manner, so that when the single node fails, the standby node can still act as the arbitration node after being promoted to master and return the same arbitration result. The synchronization may reuse the existing master/slave synchronization mechanism.
Arbitration occurs with low probability: only when two timestamps are very close and common data is accessed. Arbitration is computed on the nodes participating in the transactions, so it is fully distributed, with no single point of failure or performance bottleneck, and it does not affect the overall performance of the system.
It should be noted that the same mechanism can be used for arbitration as needed in a variety of scenarios. In the example above, comparing different timestamps determines the order in which transactions started; the same comparison can also determine the order in which transactions commit, or determine the visibility of data by comparing one transaction's start time against another transaction's commit timestamp. In practical applications, other kinds of key values can likewise be ordered in this distributed manner.
It should be noted that, for simplicity, the above method or flow embodiments are described as a series of acts or combinations of acts; those skilled in the art should understand, however, that the present invention is not limited by the described order of acts, since some steps may be performed in other orders or simultaneously. Further, those skilled in the art will appreciate that the embodiments described in the specification are exemplary, and not every one of them is necessarily required by the invention.
It will be appreciated that the present invention provides a logical timestamp maintained jointly across the whole system, which is used in the distributed database to determine the order in which transactions start and the visibility of data. When the timestamps of transactions are so close that their order cannot be judged directly, the system determines the order through a distributed arbitration mechanism. A distributed transaction management mechanism without a single bottleneck is thereby realized in the distributed database.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
compared with the realization of the hardware global timestamp, the system has the advantages of low cost, easy implementation and easy popularization. Compared with the existing other software, the distributed logic timestamp implementation has the following advantages:
1. The logical timestamp is generated by the nodes participating in the transaction, without depending on a globally unified timestamp management node. This removes the corresponding network exchanges and avoids the performance bottleneck and single point of failure of a single node, so high-frequency, highly concurrent transaction service can be provided.
2. The global fault-tolerance parameter is dynamically self-adaptive and can adapt to different distributed environments and scenarios.
3. In the whole transaction flow, waiting on the fault-tolerance time parameter is limited to the time correction between the coordinating node and the data nodes at the beginning of the transaction, and to the interval between the beginning of the transaction and its commit; no fault-tolerance waiting arises from concurrent transactions. The former occurs only when the global system environment (network, machine faults) changes significantly, which has a very small probability; the latter does not affect the overall throughput of the system under normal business logic and only places a certain requirement on latency, on the order of milliseconds, which satisfies most applications.
4. By configuring different timestamp lengths and node-number lengths, clusters of different scales can easily be supported.
5. The introduction of the distributed arbitration mechanism eliminates the blocking or waiting found in existing software schemes when the timestamps of multiple transactions are close (see Google Spanner). The arbitration mechanism selects an arbitrator based on the nodes participating in the transactions, with no single-point performance bottleneck and no single point of failure.
Referring to fig. 5, in order to solve the same technical problem, the present invention further provides a distributed database management apparatus based on distributed logical timestamps, where the distributed database includes coordinating nodes, cataloging nodes and data nodes, and the local logical time of each node is synchronized against a preset global logical time;
the distributed database management apparatus based on distributed logical timestamps includes:
a transaction time management module 1, configured to set, when a transaction starts, a transaction start time of the transaction as a local logic time of the coordinating node;
the transaction access management module 2 is configured to, when the data node receives the message sent by the coordinating node for the first time, determine whether a difference value between the local logic time of the data node and the local logic time of the coordinating node exceeds a preset first error threshold; if yes, synchronously timing the local logic time of the data node, and simultaneously controlling the data node to return an error message to the coordination node so that the coordination node rolls back the transaction, and retries the transaction after synchronously timing the local logic time of the coordination node;
the transaction pre-commit management module 3 is configured to set, when a transaction is pre-committed, the transaction pre-commit time of the transaction as the current local logic time of the coordinating node, and determine whether a difference value between the transaction pre-commit time and the transaction start time is greater than a transaction tolerance error of the transaction; if not, suspending the execution; if yes, sending the transaction pre-submission time and the pre-submission message of the transaction to all data nodes participating in the transaction;
the distributed arbitration module 4 is configured to, when two different transactions access the same data, determine whether a phase difference value of timestamps of the two different transactions is smaller than a preset second error threshold; and if so, selecting one data node from the target data nodes according to a preset algorithm as an arbitration node to arbitrate the time stamp sequence of the two different transactions.
Further, the preset first error threshold is twice the global tolerance error.
Further, the second error threshold is set in the following manner: finding the largest transaction tolerance error among different transactions accessing the same data, and setting the second error threshold to be twice the largest transaction tolerance error.
Further, the global tolerance error is dynamically changed according to the system condition; the device further comprises:
the global tolerance error recording module is used for acquiring and recording the current global tolerance error of the system when the coordinating node and the cataloging node perform synchronous calibration;
and the global tolerance error updating module is used for storing the current system global tolerance error into the metadata control block of the transaction when the transaction obtains the transaction starting time.
It can be understood that the foregoing apparatus item embodiments correspond to the method item embodiments of the present invention, and the distributed database management apparatus based on distributed logical timestamps provided in the embodiments of the present invention can implement the distributed database management method based on distributed logical timestamps provided in any method item embodiment of the present invention.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A distributed database management method based on distributed logical timestamps, characterized in that the distributed database comprises coordinating nodes, cataloging nodes and data nodes, and the local logical time of each node is synchronously calibrated against a preset global logical time;
the distributed database management method based on distributed logical timestamps comprises the following steps:
when a transaction starts, setting the transaction start time of the transaction to the local logical time of the coordinating node;
when a data node receives a message sent by the coordinating node for the first time, judging whether the difference between the local logical time of the data node and the local logical time of the coordinating node exceeds a preset first error threshold; if so, resynchronizing the local logical time of the data node and controlling the data node to return an error message to the coordinating node, so that the coordinating node rolls back the transaction and retries it after its own local logical time has been resynchronized;
when the transaction is pre-committed, setting the transaction pre-commit time of the transaction to the current local logical time of the coordinating node, and judging whether the difference between the transaction pre-commit time and the transaction start time is greater than the transaction tolerance error of the transaction; if not, suspending execution until the condition is met; if so, sending the transaction pre-commit time together with the pre-commit message of the transaction to all data nodes participating in the transaction;
when two different transactions access the same data, judging whether the difference between the timestamps of the two transactions is smaller than a preset second error threshold; and if so, selecting one data node from the target data nodes according to a preset algorithm as an arbitration node to arbitrate the order of the two transactions' timestamps.
2. The distributed logical timestamp based distributed database management method of claim 1, wherein the preset first error threshold is twice the global tolerance error.
3. The distributed logical timestamp based distributed database management method of claim 1, wherein the second error threshold is set by: finding the largest transaction tolerance error among different transactions accessing the same data, and setting the second error threshold to be twice the largest transaction tolerance error.
4. The distributed logical timestamp based distributed database management method of claim 1, wherein the global tolerance error is dynamically altered according to system conditions; the method further comprises the following steps:
when the coordinating node and the cataloging node perform synchronous calibration, acquiring and recording the current global tolerance error of the system;
and when the transaction acquires the transaction start time, storing the current system global tolerance error into the metadata control block of the transaction.
5. The distributed logical timestamp based distributed database management method of claim 1, further comprising:
when a transaction modifies, deletes or inserts data, the identification ID of the transaction is used as a version to mark the data that was altered.
6. The distributed logical timestamp based distributed database management method of claim 1, wherein the identification ID of a transaction consists of a transaction start time of the transaction and a node number of a coordinating node participating in the transaction.
7. A distributed database management apparatus based on distributed logical timestamps, characterized in that the distributed database comprises coordinating nodes, cataloging nodes and data nodes, and the local logical time of each node is synchronously calibrated against a preset global logical time;
the distributed database management apparatus based on distributed logical timestamps includes:
the transaction time management module is used for setting, when a transaction starts, the transaction start time of the transaction to the local logical time of the coordinating node;
the transaction access management module is used for judging, when the data node receives a message sent by the coordinating node for the first time, whether the difference between the local logical time of the data node and the local logical time of the coordinating node exceeds a preset first error threshold; and if so, for resynchronizing the local logical time of the data node and controlling the data node to return an error message to the coordinating node, so that the coordinating node rolls back the transaction and retries it after its own local logical time has been resynchronized;
the transaction pre-commit management module is used for setting, when the transaction is pre-committed, the transaction pre-commit time of the transaction to the current local logical time of the coordinating node, and judging whether the difference between the transaction pre-commit time and the transaction start time is greater than the transaction tolerance error of the transaction; if not, suspending execution until the condition is met; if so, sending the transaction pre-commit time together with the pre-commit message of the transaction to all data nodes participating in the transaction;
the distributed arbitration module is used for judging, when two different transactions access the same data, whether the difference between the timestamps of the two transactions is smaller than a preset second error threshold; and if so, for selecting one data node from the target data nodes according to a preset algorithm as an arbitration node to arbitrate the order of the two transactions' timestamps.
8. The distributed logical timestamp based distributed database management apparatus of claim 7, wherein the preset first error threshold is twice the global tolerance error.
9. The distributed logical timestamp based distributed database management apparatus of claim 7, wherein the second error threshold is set in a manner that: finding the largest transaction tolerance error among different transactions accessing the same data, and setting the second error threshold to be twice the largest transaction tolerance error.
10. The distributed logical timestamp based distributed database management apparatus of claim 7, wherein the global tolerance error is dynamically altered according to system conditions; the device further comprises:
the global tolerance error recording module is used for acquiring and recording the current global tolerance error of the system when the coordinating node and the cataloging node perform synchronous calibration;
and the global tolerance error updating module is used for storing the current system global tolerance error into the metadata control block of the transaction when the transaction obtains the transaction starting time.
CN201911014865.2A 2019-10-23 2019-10-23 Distributed database management method and device based on distributed logic timestamp Pending CN110795506A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201911014865.2A CN110795506A (en) 2019-10-23 2019-10-23 Distributed database management method and device based on distributed logic timestamp
CN201911291498.0A CN111061810B (en) 2019-10-23 2019-12-13 Distributed transaction management method and system based on distributed logic timestamp
PCT/CN2020/114654 WO2021077934A1 (en) 2019-10-23 2020-09-11 Distributed transaction management method and system based on distributed logic timestamp
CA3125546A CA3125546A1 (en) 2019-10-23 2020-09-11 Distributed logical timestamp-based management method and system for distributed transaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911014865.2A CN110795506A (en) 2019-10-23 2019-10-23 Distributed database management method and device based on distributed logic timestamp

Publications (1)

Publication Number Publication Date
CN110795506A true CN110795506A (en) 2020-02-14

Family

ID=69441083

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201911014865.2A Pending CN110795506A (en) 2019-10-23 2019-10-23 Distributed database management method and device based on distributed logic timestamp
CN201911291498.0A Active CN111061810B (en) 2019-10-23 2019-12-13 Distributed transaction management method and system based on distributed logic timestamp

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201911291498.0A Active CN111061810B (en) 2019-10-23 2019-12-13 Distributed transaction management method and system based on distributed logic timestamp

Country Status (3)

Country Link
CN (2) CN110795506A (en)
CA (1) CA3125546A1 (en)
WO (1) WO2021077934A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112182103A (en) * 2020-09-24 2021-01-05 广州巨杉软件开发有限公司 Distributed database and method for realizing cross-node transaction strong consistency
CN113238892A (en) * 2021-05-10 2021-08-10 深圳巨杉数据库软件有限公司 Time point recovery method and device for global consistency of distributed system
WO2022001629A1 (en) * 2020-06-29 2022-01-06 华为技术有限公司 Database system, and method and apparatus for managing transactions
CN114095086A (en) * 2021-11-13 2022-02-25 南京智汇电力技术有限公司 Regional time synchronization method suitable for optical fiber communication

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795506A (en) * 2019-10-23 2020-02-14 广州巨杉软件开发有限公司 Distributed database management method and device based on distributed logic timestamp
CN111708796A (en) * 2020-06-23 2020-09-25 浪潮云信息技术股份公司 Data consistency method based on time stamp

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101753609B (en) * 2008-12-15 2012-09-19 中国移动通信集团公司 Version control method, nodes and system of distributed system
US8356007B2 (en) * 2010-10-20 2013-01-15 Microsoft Corporation Distributed transaction management for database systems with multiversioning
CN103914452B (en) * 2012-12-30 2017-03-29 航天信息股份有限公司 Time error correction scheme and system in a kind of distributed file system
WO2019000398A1 (en) * 2017-06-30 2019-01-03 深圳市大疆创新科技有限公司 Method for scheduling transaction, and processor, distributed system, and unmanned aerial vehicle
CN110209726B (en) * 2018-02-12 2023-10-20 金篆信科有限责任公司 Distributed database cluster system, data synchronization method and storage medium
CN108984277B (en) * 2018-04-02 2019-08-30 北京百度网讯科技有限公司 Distributed database transaction processing method and processing device based on GPS atomic clock
CN110795506A (en) * 2019-10-23 2020-02-14 广州巨杉软件开发有限公司 Distributed database management method and device based on distributed logic timestamp


Also Published As

Publication number Publication date
CA3125546A1 (en) 2021-04-29
CN111061810B (en) 2020-08-21
CN111061810A (en) 2020-04-24
WO2021077934A1 (en) 2021-04-29

Similar Documents

Publication Publication Date Title
CN111061810B (en) Distributed transaction management method and system based on distributed logic timestamp
US20220012264A1 (en) Pipelining Paxos State Machines
US6078930A (en) Multi-node fault-tolerant timestamp generation
CN110196760B (en) Method and device for realizing consistency of distributed transactions
US8301600B1 (en) Failover recovery in a distributed data store
US7076508B2 (en) Method, system, and program for merging log entries from multiple recovery log files
CN108121782B (en) Distribution method of query request, database middleware system and electronic equipment
US20050033947A1 (en) Multiprocessor system with interactive synchronization of local clocks
US20170168756A1 (en) Storage transactions
US11436218B2 (en) Transaction processing for a database distributed across availability zones
CN113987064A (en) Data processing method, system and equipment
US11748215B2 (en) Log management method, server, and database system
US20220253363A1 (en) Distributed database remote backup
WO2022111188A1 (en) Transaction processing method, system, apparatus, device, storage medium, and program product
Demirbas et al. Beyond truetime: Using augmentedtime for improving spanner
US20210263919A1 (en) Centralized Storage for Search Servers
CN112417043A (en) Data processing system and method
CN111552701A (en) Method for determining data consistency in distributed cluster and distributed data system
WO2022002044A1 (en) Method and apparatus for processing distributed database, and network device and computer-readable storage medium
CN114265900A (en) Data processing method and device, electronic equipment and storage medium
Li et al. Enhancing throughput of partially replicated state machines via multi-partition operation scheduling
CN114490570A (en) Production data synchronization method and device, data synchronization system and server
CN113253924A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN108108231A (en) Generation, processing method, device, system and the electronic equipment of user's request
CN114254047A (en) Data synchronization method, system, terminal and storage medium of distributed database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200214