CN112214649B - Distributed transaction solution system of temporal graph database - Google Patents


Info

Publication number
CN112214649B
Authority
CN
China
Prior art keywords
transaction
version
data
coordinator
stage
Legal status
Active
Application number
CN202011130789.4A
Other languages
Chinese (zh)
Other versions
CN112214649A (en)
Inventor
蒋金凯
林学练
宋景和
马帅
Current Assignee
Beihang University
Original Assignee
Beihang University
Application filed by Beihang University
Priority to CN202011130789.4A
Publication of CN112214649A
Application granted
Publication of CN112214649B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a distributed transaction solution system for a temporal graph database, in the field of artificial intelligence. The system comprises a client, a master (central node), a Coordinator, and Participants. It extends temporal graph database transactions to a distributed setting through an improved two-phase commit, and designs a concurrency control mechanism and a failure recovery mechanism for the temporal graph database in a distributed environment. The improved two-phase commit flow and the in-memory MVCC design speed up operations on temporal graph data in a distributed environment and outperform TiDB, an existing distributed database that supports transactions; for temporal graph data storage they improve read-write concurrency and shorten transaction execution time.

Description

Distributed transaction solution system of temporal graph database
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a distributed transaction solution system for a temporal graph database.
Background
The development of the Internet of Things means that data is generated continuously, and because this data carries time attributes, those temporal attributes can be used for analysis and applications. Temporal graph data has the following main characteristics: the graph structure changes little, temporal attributes are generated continuously, the data volume is huge, and concurrency is high. Many solutions have emerged for storing temporal graph data, but storing it correctly in a distributed database raises an unavoidable problem: distributed transactions.
A database transaction is a set of operations that forms a single logical unit of work; it prevents anomalies such as only part of the operations succeeding, or interference when several users operate on the database at the same time. Transactions have the four ACID properties (atomicity, consistency, isolation, and durability), which are mainly realized through concurrency control and log-based recovery. Traditional concurrency control is mainly implemented with locks, timestamps, validation checks, or multi-version concurrency control. Log-based recovery mainly uses undo logs, redo logs, and checkpoints; combining the three allows recovery from both transaction failures and system failures.
Transactions can be classified as flat transactions, transactions with savepoints, distributed transactions, chained transactions, and so on. Each type suits different scenarios. Flat transactions are the most common in current systems: all operations are at the same level, starting with BEGIN WORK and ending with COMMIT WORK or ROLLBACK WORK, and the operations in between are atomic, either all executed or all rolled back. Existing TGraph transactions are single-machine transactions and only suit a single-machine database. For temporal graph data, a single-machine database has many limitations. First, its concurrency is poor and it cannot support high-concurrency workloads. Second, a single TGraph machine has an upper limit on storage capacity, while temporal graph data is written continuously with no theoretical upper bound, so a single node can hardly store very large temporal graph data. Finally, if the single TGraph machine fails, the whole system stops serving requests, yet because temporal graph data is written continuously, the system must support read-write service stably over long periods. Temporal graph databases are therefore better suited to a distributed environment. TGraph single-machine transactions only target data stored on one machine and do not suit a distributed environment, and the two-phase locking they use for concurrency control performs very poorly there. The transaction type addressed by this patent is the distributed transaction, which divides a top-level transaction into several sub-transactions according to the topology; the sub-transactions are usually flat transactions.
The classic design models for distributed transactions are two-phase commit, three-phase commit, compensating transactions, message queues, and so on. Two-phase and three-phase commit split a transaction into two and three phases respectively; this guarantees atomicity, but transaction performance is very poor. Compensating transactions and message queues instead implement the two phases at the business (service) level, so lock granularity and similar choices can be tailored to the business. The two differ in that message queues are asynchronous while compensating transactions are synchronous.
TiDB is representative of existing distributed transaction implementations. It commits transactions with two-phase commit, and for concurrency control each transaction obtains a timestamp from a central node twice, using them as its start and commit version numbers, and performs concurrency control based on these version numbers. This approach has two drawbacks. First, the whole transaction flow involves many network interactions, and more network interactions bring more uncertainty into the system. The common remedy is to merge small transactions into one large transaction, but in many scenarios small transactions cannot be merged. Second, TiDB has no underlying storage designed specifically for temporal graph data, and the design of its TiKV storage module is not suited to storing graph data or temporal data.
The prior art therefore needs improvement in the following directions:
1) Extending single-machine temporal graph database operations to a distributed environment.
All data in current single-machine temporal graph database systems is stored on one server, so every operation is designed for a single machine. In a distributed system, data is spread across multiple nodes and the operations of one transaction may need several servers to cooperate, which single-machine temporal graph database transactions cannot handle. One design goal is therefore to let temporal graph database transactions run distributed operations and span storage nodes.
2) Maintaining the ACID properties of temporal graph database transactions in a distributed environment.
In a single-machine environment, the temporal graph database uses two-phase locking for concurrency control to maintain the ACID properties of transactions. In a distributed environment, however, acquiring and releasing two-phase locks incurs large latency and greatly increases the time a transaction holds its locks, which reduces concurrency. The single-machine two-phase locking scheme therefore cannot be reused to guarantee ACID in a distributed environment, and a new concurrency control scheme must be designed to maintain the ACID properties of distributed temporal graph database transactions.
Disclosure of Invention
The distributed transaction solution system for a temporal graph database comprises a client, a master, a Coordinator, and Participants. The master is the central node; it maintains heartbeats and provides globally unique version numbers. The Coordinator is the storage node selected to direct the other storage nodes in a unified commit during the second phase. The Participants are the other storage nodes. The system extends temporal graph database transactions to a distributed setting through an improved two-phase commit, and designs a concurrency control mechanism and a failure recovery mechanism for the temporal graph database in a distributed environment;
specifically, the improved two-phase commit works as follows. At the user-interface level the system provides a temporal graph read-write interface; operations are sent to the corresponding storage nodes over the network, and each storage node calls the underlying temporal graph database storage module interface to execute them. Distributed transactions of the temporal graph database are uniformly committed or rolled back using the improved two-phase commit. A transaction is initiated by the client. In the first phase, the client initiating the transaction first obtains a start version number from the master and then sends the transaction to every storage node participating in it; in the first phase each storage node performs consistency verification and writes its log. When a storage node completes the first phase, it returns the operation result to both the client and the Coordinator, and the transaction then enters the second phase. If all storage nodes return success, the client reports the transaction as successful, and the subsequent unified commit is directed by the Coordinator (when committing, the Coordinator first obtains an end version number from the master and then sends commit requests to notify the other Participants). If any storage node reports that the transaction failed, the Coordinator coordinates a unified rollback;
the concurrency control mechanism comprises a multi-version concurrency control process and an in-memory MVCC mechanism that resolves read-write concurrency. In multi-version concurrency control, the database maintains multiple versions of a datum, each version carrying additional version information, and maintains a version chain; each transaction carries a version number, and transactions negotiate concurrency through their version numbers; as version numbers increase, small version numbers gradually exit concurrency control and are kept as historical versions. The in-memory MVCC mechanism maintains multiple versions of data objects only in memory and keeps only the maximum version on disk; memory clears version data once the system determines it is no longer needed for concurrency control. Each record maintained in memory has the fields TGraph key information, Start version, End version, and Value: the TGraph key information includes the node id or edge id and the attribute name; Start version and End version are the version numbers obtained from the central node when the transaction starts and ends, respectively; and Value is the attribute value. The in-memory MVCC mechanism consists of a write mechanism and a read mechanism. In the write mechanism, a version number is obtained once at the start of the first phase and once at commit in the second phase, serving as the transaction's start and end version numbers; on write, the data is checked, and if it is the latest version, the versioned data is written into memory and the disk data is updated at the same time, otherwise the disk need not be updated. In the read mechanism, if the data exists only on disk, the disk data is read; if multiple versions exist in memory, the version selected is the one whose end version number is smaller than the current transaction's operation version number and whose start version number is smaller than and closest to that version number;
the failure recovery mechanism uses a central log: the Coordinator stores the complete redo log of the whole transaction centrally, coordinates and directs cluster nodes to recover when they fail, and its central log keeps data consistent across node failures. The transaction state in the redo log is one of first done, success, and rollback. A transaction is written to the log when the first phase completes, so the state written by the Coordinator defaults to first done. After completing the first phase, the transaction enters the second phase for a unified rollback or commit: if all machines report success, the transaction is committed, and once the commit completes the Coordinator updates the log state to success; if the transaction is rolled back in the second phase, the log state is updated to rollback. The master maintains heartbeats with the cluster. A machine that suffers a transient failure and restarts does not affect the operation of the whole cluster; if the failure is not resolved before a timeout, the master instructs all machines to stop, and all transactions related to the failed node that are initiated during the downtime fail. Running transactions interrupted by a machine failure are recovered after the machine restarts. System failures are handled with the redo log, for which a checkpoint is set: the checkpoint corresponds to a transaction version number and is updated each time the in-memory versions are cleaned, all transactions before the checkpoint are completed, and on restart the system starts from the checkpoint, reads the transaction log in version-number order, and recovers.
The disk does not store Start version or End version information.
The system considers version data no longer needed for concurrency control when the transaction's end version number plus 1 is lower than the minimum running version number recorded by the master, or when a timeout occurs.
Version control for nodes, edges, and static attributes is updated in units of nodes and edges, while temporal attributes use a finer version-control granularity than nodes, edges, and static attributes. A static attribute is one whose value rarely changes; a temporal attribute is one whose value changes frequently.
The redo log handles system failures as follows. If the failed machine acts as the Coordinator, after restarting it checks the log and redoes every transaction in the first done state, re-executing it for recovery. If the failed machine acts as a Participant and restarts quickly, the cluster is not shut down; the Coordinator, finding that the target machine cannot be reached, keeps retrying until the machine recovers. When retrying, the Coordinator reads the redo log, resends the operations to the Participant for execution, and completes the unfinished work.
With two-phase commit that switches coordinators, the client acts as coordinator in the first phase and the Coordinator takes over in the second phase. Because temporal graph data is written frequently, this form of two-phase commit is better suited to temporal graph storage, since it reduces client waiting time.
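The redo-log states and the Coordinator-side recovery path described above can be sketched as a single-process illustration. This is not the patent's implementation: the names `CoordinatorLog`, `record_first_phase`, `finish`, and `recover`, the in-memory dictionary standing in for a durable log, and the choice to mark a redone transaction as success are all assumptions. The checkpoint is modeled as a version number at or below which every transaction is finished, so recovery scans only newer entries.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class TxnState(Enum):
    FIRST_DONE = "first done"   # written when phase one completes
    SUCCESS = "success"         # second-phase commit finished
    ROLLBACK = "rollback"       # second-phase rollback finished


@dataclass
class RedoEntry:
    version: int                # transaction version number (used for checkpointing)
    operations: List[str]       # hypothetical: operations needed to redo the transaction
    state: TxnState = TxnState.FIRST_DONE


@dataclass
class CoordinatorLog:
    # Stand-in for the Coordinator's durable central redo log (assumption).
    entries: Dict[int, RedoEntry] = field(default_factory=dict)
    checkpoint: int = 0         # every transaction at or below this version is finished

    def record_first_phase(self, version: int, operations: List[str]) -> None:
        self.entries[version] = RedoEntry(version, operations)

    def finish(self, version: int, committed: bool) -> None:
        self.entries[version].state = TxnState.SUCCESS if committed else TxnState.ROLLBACK

    def recover(self, redo: Callable[[List[str]], None]) -> None:
        """After a Coordinator restart: scan forward from the checkpoint in
        version order and redo every transaction still marked first done."""
        for version in sorted(v for v in self.entries if v > self.checkpoint):
            entry = self.entries[version]
            if entry.state is TxnState.FIRST_DONE:
                redo(entry.operations)
                entry.state = TxnState.SUCCESS  # assumption: a redone txn counts as committed
```

The Participant case (the Coordinator retrying a restarted node) would reuse the same entries: the Coordinator reads `operations` from the log and resends them until the node answers.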
The technical effects to be realized by the invention are as follows:
By improving the two-phase commit flow and designing the in-memory MVCC, the invention speeds up operations on temporal graph data in a distributed environment and outperforms TiDB, an existing distributed database that supports transactions; for temporal graph data storage it improves read-write concurrency and shortens transaction execution time.
Drawings
FIG. 1 is a flow diagram of a prior-art TiDB transaction;
FIG. 2 shows the execution flow of a transaction;
FIG. 3 shows the version number acquisition and rollback steps in the execution flow of a transaction;
FIG. 4 shows the flow of clearing in-memory versions;
FIG. 5 shows an example of multi-version data reading;
FIG. 6 shows the total latency of 100 update transactions;
FIG. 7 shows the average latency of 100 update transactions;
FIG. 8 compares read-transaction TPS under different levels of concurrency;
FIG. 9 compares write-transaction TPS under different levels of concurrency.
Detailed Description
The following is a preferred embodiment of the present invention, further described with reference to the accompanying drawings; the present invention is not limited to this embodiment.
All data in current single-machine temporal graph database systems is stored on one server, so every operation is designed for a single machine. In a distributed system, data is spread across multiple nodes and the operations of one transaction may need several servers to cooperate, which single-machine temporal graph database transactions cannot handle. One design goal is therefore to let temporal graph database transactions run distributed operations and span storage nodes.
In a single-machine environment, the temporal graph database uses two-phase locking for concurrency control to maintain the ACID properties of transactions. In a distributed environment, however, acquiring and releasing two-phase locks incurs large latency and greatly increases the time a transaction holds its locks, which reduces concurrency. The single-machine two-phase locking scheme therefore cannot be reused to guarantee ACID in a distributed environment, and a new concurrency control scheme must be designed to maintain the ACID properties of distributed temporal graph database transactions.
Based on the above, the invention provides a temporal graph database distributed transaction solution system.
Because single-machine temporal graph database transactions target only data stored on a single machine, all operations run on that machine without considering partitioned data storage. In a distributed environment, a transaction usually spans several storage nodes, different operations may need to be executed by different storage nodes, and issues such as routing and communication arise.
Common approaches to distributed transactions include two-phase commit, three-phase commit, compensation-based transactions, message queues, and the like. Although they differ in implementation and detail, they can all be regarded as variants of two-phase commit. Two-phase commit divides a distributed transaction into two phases: in the first phase the client asks each participating node whether it can complete its task; after receiving all node responses it enters the second phase and notifies the nodes of a global commit or rollback.
The biggest problem with two-phase commit is the client's long waiting time. Analyzing two-phase commit, however, shows that the client can already determine in the first phase whether the transaction can be committed; the second phase consists mainly of the commit or rollback, which is carried out by the servers, and the client waits for it only as a safeguard against unexpected failures.
The invention therefore designs an extended TGraph distributed transaction method based on an improved two-phase commit.
At the user-interface level, distributed transactions of the temporal graph database provide the same temporal graph read-write interface as single-machine temporal graph database transactions. Operations are sent to the corresponding storage nodes over the network, and each storage node calls the same underlying temporal graph database storage module interface to execute them, completing the operations of the whole transaction.
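The classic flow just described can be sketched as a single-process simulation. The `Participant` class, its `can_commit` flag, and the state names are hypothetical; real participants would make their vote durable before answering. The point to notice is that `two_phase_commit` returns to its caller only after the second phase has reached every node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """A storage node in classic 2PC (illustrative names)."""
    name: str
    can_commit: bool = True   # hypothetical outcome of this node's local checks
    state: str = "init"       # init -> prepared/aborted -> committed/rolled_back

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be completed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Classic 2PC driven by the client, which waits through both phases."""
    # Phase 1: ask every participant (no short-circuit) whether it can complete.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: broadcast the global decision; only then does the client return.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.rollback()
    return decision
```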
Distributed transactions of the temporal graph database divide server nodes into three categories:
1) master: the central node, which maintains heartbeats and provides globally unique version numbers.
2) Coordinator: the storage node selected in the second phase to direct the other storage nodes in a unified commit.
3) Participant: the storage nodes other than the Coordinator that take part in transaction operations.
Distributed transactions of the temporal graph database use an improved two-phase commit to commit or roll back transactions uniformly. The transaction is initiated by the client. In the first phase, the client initiating the transaction first obtains a start version number from the master and then sends the transaction to each storage node participating in it; in the first phase each storage node performs consistency verification and writes its log. After finishing the first phase, a storage node returns its operation result to both the client and the Coordinator. Once the results are returned, the transaction enters the second phase: if all nodes return success, the client reports the transaction as successful, and the subsequent unified commit is directed by the Coordinator (when committing, the Coordinator first obtains an end version number from the master and then sends commit requests to notify the other Participants). If any node reports failure, the Coordinator coordinates a unified rollback. The distributed transaction must guarantee that once all nodes return success, the transaction is committed successfully; this guarantee relies on the failure handling described later.
By handing coordination over to the Coordinator, the client waits for only one phase, which greatly reduces the number of client network round trips and the client's waiting time.
In addition, a concurrency control mechanism is designed for the temporal graph database in a distributed environment.
Concurrency control maintains the execution order of transactions and ensures correct reads and writes. Common implementations are based on locks, timestamps, validation checks, or multi-version concurrency control. The concurrency control design for the temporal graph database in a distributed environment takes the characteristics of temporal graph data into account so that transaction ACID properties can be maintained in a distributed environment.
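A minimal sketch of the improved flow, under stated assumptions: everything runs in one process, the first storage node plays the Coordinator, `passes_checks` stands in for consistency verification, and all names are hypothetical. Unlike classic two-phase commit, the function hands the client its answer as soon as phase one finishes and returns the second phase as a separate callable for the Coordinator; the master issues the start version to the client and the end version to the Coordinator.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class Master:
    """Central node: hands out globally unique, increasing version numbers."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_version(self) -> int:
        return next(self._counter)


@dataclass
class StorageNode:
    name: str
    passes_checks: bool = True                 # hypothetical consistency-check outcome
    log: List[tuple] = field(default_factory=list)
    state: str = "init"

    def phase_one(self, txn_id: str, start_version: int) -> bool:
        # First phase on a storage node: consistency check, then log write.
        if self.passes_checks:
            self.log.append(("prepared", txn_id, start_version))
        return self.passes_checks

    def commit(self, end_version: int) -> None:
        self.state = f"committed@{end_version}"

    def rollback(self) -> None:
        self.state = "rolled_back"


def improved_two_phase_commit(
    master: Master, nodes: List[StorageNode], txn_id: str
) -> Tuple[bool, Callable[[], None]]:
    """Return the client's answer right after phase one, plus the second
    phase, which the Coordinator runs without the client waiting on it."""
    start_version = master.next_version()             # client -> master
    results = [n.phase_one(txn_id, start_version) for n in nodes]
    decision = all(results)                            # seen by client and Coordinator
    coordinator = nodes[0]                             # assumption: first node coordinates

    def second_phase() -> None:
        if decision:
            end_version = master.next_version()        # Coordinator -> master
            coordinator.log.append(("commit", txn_id, end_version))
            for n in nodes:                            # Coordinator notifies Participants
                n.commit(end_version)
        else:
            for n in nodes:
                n.rollback()

    return decision, second_phase
```

In a real deployment `second_phase` would run on the Coordinator node asynchronously; calling it inline in a test simply makes the ordering visible: the client's answer exists before any node has committed.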
Single-machine temporal graph database:
In a single-machine temporal graph database system, transactions are concurrency-controlled with two-phase locking: a transaction must acquire the locks on all the data it operates on when it starts, can proceed only after acquiring them all, and releases them together after it finishes executing. Two-phase locking works well on a single machine. In a distributed environment, however, two-phase locking would greatly increase the time transactions hold locks and the waiting time of subsequent transactions that want the same data, ultimately causing system concurrency to drop sharply. Two-phase locking is therefore unsuitable in a distributed environment, and the concurrency control of the single-machine temporal graph database must be improved by exploiting the characteristics of temporal graph data so that concurrency control also performs well in a distributed environment.
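For contrast with the distributed design, the single-machine locking discipline described above (acquire every lock up front, run, release everything at the end) can be sketched with Python's standard `threading` and `contextlib` modules. `LockTable` and `run_with_two_phase_locking` are hypothetical names, and acquiring locks in sorted key order is an added assumption to avoid deadlock, not something the patent states.

```python
import threading
from contextlib import ExitStack
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class LockTable:
    """One lock per data item on a single machine (illustrative)."""

    def __init__(self) -> None:
        self._locks = {}                 # key -> threading.Lock
        self._guard = threading.Lock()   # protects the dictionary itself

    def lock_for(self, key: str):
        with self._guard:
            if key not in self._locks:
                self._locks[key] = threading.Lock()
            return self._locks[key]


def run_with_two_phase_locking(
    table: LockTable, keys: Iterable[str], body: Callable[[], T]
) -> T:
    with ExitStack() as stack:
        # Growing phase: take every lock the transaction needs before running it.
        # Sorted order (assumption) gives all transactions the same acquisition order.
        for key in sorted(set(keys)):
            stack.enter_context(table.lock_for(key))
        result = body()
    # Shrinking phase: leaving the ExitStack released all locks together.
    return result
```

The longer `body()` runs, the longer every lock is held; in a distributed setting `body()` would include network round trips, which is exactly the cost the text identifies.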
Multi-version concurrency control (MVCC):
Multi-version concurrency control is very common in current database systems and is used, for example, by InnoDB and TiDB. Under MVCC, the database maintains multiple versions of a datum, each version carrying additional version information that forms a version chain. A write directly inserts a new version, and a read request determines which version to read from its own version number.
Under MVCC, each transaction carries a version number, and transactions negotiate concurrency through their version numbers. As version numbers grow, small version numbers gradually exit concurrency control and are kept only as historical versions, so unless there is a special requirement the stored multi-version information can be cleaned up rather than kept long-term. The temporal graph data handled by the temporal graph database usually has only one version, and storage does not account for multi-version data, so if multi-version concurrency control is adopted, the multi-version data can be cleaned periodically. In addition, the graph structure of temporal graph data changes little while temporal attributes are written frequently, so different version granularities can be designed to reduce overhead and improve speed.
The in-memory MVCC mechanism:
The in-memory MVCC design combines the storage characteristics of temporal graph data with those of MVCC concurrency control: multiple versions of data objects are maintained only in memory, and only the maximum version is kept on disk. Memory is periodically cleaned of useless multi-version data; since the multi-version data serves only concurrency control, it can be cleared once the system determines that the version data can no longer be needed for it.
Each record maintained in memory has the following format:
TGraph key information | Start version | End version | Value
Here the TGraph key information includes the node (or edge) id, the attribute name, and so on. Start version and End version are the version numbers obtained from the central node at the start and end of the transaction, respectively. Value is the actual attribute value. Data is kept in memory as multiple versions in this format, and the data with the maximum version number can be synchronously written to disk; the disk copy does not contain version information such as Start version and End version.
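The record format above could be modeled roughly as follows. The fields mirror the record layout, while `TGraphKey`, `MemVersion`, and `to_disk_record` are hypothetical names; the sketch also assumes that "maximum version" means the largest End version (commit version number). `to_disk_record` shows what survives on disk: the key and the value only.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Tuple


@dataclass(frozen=True)
class TGraphKey:
    entity_id: int      # node id or edge id
    attribute: str      # attribute name


@dataclass(frozen=True)
class MemVersion:
    key: TGraphKey
    start_version: int  # from the master when the writing transaction started
    end_version: int    # from the master when the writing transaction ended
    value: Any          # the actual attribute value


def to_disk_record(versions: Iterable[MemVersion]) -> Tuple[TGraphKey, Any]:
    """Reduce the in-memory versions of one key to what the disk stores:
    only the newest value, with no start/end version numbers."""
    newest = max(versions, key=lambda v: v.end_version)  # assumption: ordered by end version
    return newest.key, newest.value
```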
Multi-version data maintained in memory can be cleared once it is no longer needed for concurrency control. It is considered no longer needed in the following cases:
The transaction's end version number plus 1 is lower than the minimum running version number recorded by the master. This indicates that the transaction finished before any running transaction started, so it can no longer conflict with subsequent transactions and no longer needs concurrency control.
A timeout occurs. The probability of conflict between transactions decreases as the difference between their start times grows, so after a certain time a version can no longer conflict with other transactions.
Example: in the example shown in FIG. 4, transactions 7 and 8 are currently running, so the minimum running version number is 7. Part of the data held in memory is shown on the right side of FIG. 4; the three entries marked in red are those whose transaction end version number plus 1 is less than 7, so they can be cleaned.
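The two cleanup conditions can be expressed as a small predicate. This sketch assumes each in-memory version records the wall-clock time at which it was created (`created_at`) and that the timeout is a configurable number of seconds; both are hypothetical, since the patent does not fix how the timeout is measured. The version numbers in the usage lines are invented to mirror the shape of the FIG. 4 example (running transactions 7 and 8, three removable entries), not copied from the figure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Entry:
    name: str
    end_version: int
    created_at: float   # hypothetical: seconds at which the version entered memory


def can_discard(entry: Entry, min_running_version: float, now: float, timeout: float) -> bool:
    # Rule 1: the writing transaction ended before every running transaction began.
    ended_before_all_running = entry.end_version + 1 < min_running_version
    # Rule 2: the version is old enough that a conflict is considered impossible.
    timed_out = now - entry.created_at > timeout
    return ended_before_all_running or timed_out


def clean(entries: List[Entry], running_versions: Iterable[int], now: float, timeout: float) -> List[Entry]:
    # With nothing running, no transaction can conflict with any stored version.
    min_running = min(running_versions, default=float("inf"))
    return [e for e in entries if not can_discard(e, min_running, now, timeout)]


entries = [Entry("a", 2, 0.0), Entry("b", 4, 0.0), Entry("c", 5, 0.0),
           Entry("d", 6, 0.0), Entry("e", 9, 0.0)]
kept = clean(entries, running_versions={7, 8}, now=1.0, timeout=60.0)
print([e.name for e in kept])  # ['d', 'e']
```

Entry `d` survives because 6 + 1 is not strictly less than 7: its transaction may have ended just as transaction 7 started.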
Given that the graph structure of temporal graph data changes little while temporal attributes are written frequently, the invention uses different version granularities for the graph structure and for temporal attributes. Nodes, edges, and static attributes are rarely updated, so their version control is updated in units of nodes and edges. Temporal attributes are written frequently; using the same granularity for them would generate a very large number of versions and sharply degrade transaction performance, so finer-grained version control suits them better. In the invention, the version granularity of a temporal attribute is refined to the value of a node's temporal attribute at a given time point. Using different version-control granularities for different objects increases operating speed while keeping memory overhead as low as possible.
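To make the two granularities concrete, the sketch below derives the key under which versions are tracked. The function name and tuple layout are illustrative assumptions: nodes, edges, and static attributes share one version chain per node or edge, whereas a temporal attribute gets one chain per (node, attribute, time point), so frequent writes at different time points never pile up on the same chain.

```python
from typing import Optional, Tuple


def version_key(entity_kind: str, entity_id: int,
                temporal_attr: Optional[str] = None,
                time_point: Optional[int] = None) -> Tuple:
    """Hypothetical helper: coarse keys for structure and static data,
    fine keys for temporal attribute values."""
    if temporal_attr is None:
        # Nodes, edges, and static attributes: one version chain per node or edge.
        return (entity_kind, entity_id)
    if time_point is None:
        raise ValueError("a temporal attribute version needs a time point")
    # Temporal attributes: one version chain per (node, attribute, time point).
    return (entity_kind, entity_id, temporal_attr, time_point)
```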
Read/write mechanism:
When several users operate on the same data at the same time, data consistency is easily violated; the in-memory MVCC mainly addresses read-write concurrency. Transaction operations are divided into reads and writes, and each must write or read the correct version of the data.
The writing mechanism is as follows:
A version number is acquired once when the first phase of the transaction starts and once when the second phase commits; these serve as the transaction's start and end version numbers. For a write operation, the data is checked when it is written: if it is the latest version, the versioned data is written into memory and the disk copy is updated at the same time. If it is not the latest version, the disk does not need to be updated. Write operations do not interfere with one another, so no inconsistency arises.
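A minimal Python sketch of the write path, under stated assumptions: the `Store` class and its dictionaries are hypothetical stand-ins for the memory and disk layers, non-latest versions are still kept in memory (the text only says the disk is left untouched), and "latest" is taken to mean an end version larger than every version already in memory for that key.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Version:
    start_version: int
    end_version: int
    value: object


@dataclass
class Store:
    # Memory keeps every version per key; disk keeps only the latest value, without version info.
    memory: Dict[str, List[Version]] = field(default_factory=dict)
    disk: Dict[str, object] = field(default_factory=dict)

    def write(self, key: str, start_version: int, end_version: int, value: object) -> bool:
        """Record a committed write; return True if it became the latest version."""
        chain = self.memory.setdefault(key, [])
        # Assumption: "latest" = newer end (commit) version than anything already in memory.
        is_latest = all(end_version > v.end_version for v in chain)
        chain.append(Version(start_version, end_version, value))
        if is_latest:
            self.disk[key] = value  # only the latest version reaches the disk
        return is_latest
```

A write that commits with an older end version than one already in memory is kept for readers but leaves the disk copy unchanged.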
The reading mechanism is as follows:
Data is stored both in memory and on disk, so reads fall into two cases: the memory holds multi-version data for the item (a transaction conflict is possible), or the data exists only on disk (no transaction conflict is possible). If the data exists only on disk, the disk data is read. If the memory holds multi-version data, the version selected is the one whose end version number is smaller than the current transaction's version number and whose start version number is smaller than, and closest to, the current transaction's version number.
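The visibility rule can be sketched as follows. This Python version is illustrative: the `Version` record and `read_value` function are hypothetical, and because the text does not say what happens when the memory holds versions but none is visible, the sketch returns `None` in that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Version:
    start_version: int
    end_version: int
    value: object


def read_value(memory_versions: Sequence[Version], disk_value: object,
               txn_version: int) -> Optional[object]:
    """Return the value visible to a transaction with version number txn_version."""
    if not memory_versions:
        # No multi-version data in memory: no conflict is possible, so read the disk.
        return disk_value
    visible = [v for v in memory_versions
               if v.end_version < txn_version and v.start_version < txn_version]
    if not visible:
        return None  # behaviour not specified in the text
    # Among visible versions, pick the one whose start version is closest to txn_version.
    return max(visible, key=lambda v: v.start_version).value


# Hypothetical version numbers modelled on the Fig. 5 example (read version 7).
versions = [Version(1, 2, "A"), Version(2, 4, "B"), Version(4, 5, "C"), Version(6, 9, "D")]
```

Here `read_value(versions, None, 7)` selects C: D starts at 6 but ends at 9, so it is excluded.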
Example: attribute X of a node at a certain time point has been modified five times, so its value has five versions, shown in Fig. 5. Assume the version number of the current read transaction is 7 and it reads the data in Fig. 2-5 (key information omitted). For this operation the visible values are A, B and C; since the start version number of C is closest to 7, C is returned. Although D's start version number, 6, is less than 7, its end version number is greater than 7, so D is temporarily invisible to transaction 7 to avoid an inconsistent read.
Fault recovery mechanism design:
One of the most important functions of a transaction is to guarantee data consistency at all times. Server failures are unavoidable, so keeping data consistent when a machine fails unexpectedly is an important task for the transaction mechanism.
Failures that may occur during normal database operation are generally classified as system failures and transaction failures. System failures are caused mainly by hardware errors or system bugs that make the system abort or crash; transaction failures occur when a transaction cannot execute normally because of conditions such as deadlock or erroneous input. Both can compromise database consistency. Transaction failures are handled mainly by transaction rollback; this section focuses on system failures.
Distributed transactions use a central log: the Coordinator centrally stores the complete redo log of each transaction. When cluster nodes fail, the Coordinator coordinates and directs their recovery, and its central log keeps the data consistent even when a system failure crashes a machine.
Redo log format:
Transaction Id | Version | Status | Operation
where Transaction Id is the unique transaction identifier; Version is the transaction's start version number; Status is the current state of the transaction; and Operation is the set of operations of the whole transaction.
To simplify recovery, the transaction state is reduced to three values: first done, success and rollback. A transaction is written to the log when its first phase completes, so the Coordinator writes it with the default status first done. A transaction that has completed the first phase then enters the second phase, where it is uniformly committed or rolled back. If success feedback is received from all machines, the transaction is committed and, once the commit finishes, the Coordinator updates the log status to success; if the transaction is rolled back in the second phase, the log status is updated to rollback.
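The log record and its status transitions can be modelled as a small state machine. In this Python sketch, `RedoLogEntry`, `CoordinatorLog` and their methods are hypothetical names; the fields follow the Transaction Id / Version / Status / Operation format, and `complete_second_phase` is assumed to be called after the unified commit or rollback has finished on the participants.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Status(Enum):
    FIRST_DONE = "first done"
    SUCCESS = "success"
    ROLLBACK = "rollback"


@dataclass
class RedoLogEntry:
    transaction_id: str
    version: int                      # start version number of the transaction
    status: Status
    operations: List[tuple] = field(default_factory=list)


class CoordinatorLog:
    def __init__(self) -> None:
        self.entries: Dict[str, RedoLogEntry] = {}

    def record_first_phase(self, txn_id: str, version: int, operations) -> None:
        """Written when phase one completes, so the status defaults to FIRST_DONE."""
        self.entries[txn_id] = RedoLogEntry(txn_id, version, Status.FIRST_DONE, list(operations))

    def complete_second_phase(self, txn_id: str, all_participants_ok: bool) -> Status:
        """Record the outcome once the unified commit or rollback has finished."""
        entry = self.entries[txn_id]
        if entry.status is not Status.FIRST_DONE:
            raise ValueError(f"{txn_id} already finished with status {entry.status.value}")
        entry.status = Status.SUCCESS if all_participants_ok else Status.ROLLBACK
        return entry.status
```

Only entries still in FIRST_DONE can change state, which is what makes them the candidates for redo after a Coordinator crash.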
The central node (master) maintains a heartbeat with the cluster. A machine that fails briefly and restarts does not affect overall cluster operation; if the failure is not recovered before a timeout, the master issues an instruction to stop all machines. Transactions involving the failed node that are initiated during the downtime all fail. A running transaction interrupted by a machine failure is recovered after the machine restarts. System failures are resolved mainly through the redo log, and two cases are distinguished:
1) If the failed machine acts as the Coordinator: after restarting, it checks the log and redoes every transaction whose status is first done, re-executing it for recovery.
2) If the failed machine acts as a participant and restarts quickly, the cluster is not shut down. The Coordinator detects that the target machine is unreachable and keeps retrying until it recovers. While retrying, the Coordinator reads the redo log, resends the operations to the participant for execution, and completes the unfinished work.
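The two recovery cases can be sketched as below. The Python functions and the `is_reachable`, `redo` and `send` callbacks are hypothetical; the optional `max_attempts` bound is an addition so the example terminates, standing in for the master's timeout, after which the whole cluster is stopped.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LogEntry:
    transaction_id: str
    version: int
    status: str               # "first done", "success" or "rollback"
    operations: List[tuple]   # assumed: the operations destined for one participant


def recover_as_coordinator(log: List[LogEntry], redo: Callable[[LogEntry], None]) -> List[str]:
    """Case 1: after restart, re-execute every transaction still marked 'first done'."""
    redone = []
    for entry in sorted(log, key=lambda e: e.version):
        if entry.status == "first done":
            redo(entry)
            redone.append(entry.transaction_id)
    return redone


def resend_to_participant(entry: LogEntry, is_reachable: Callable[[], bool],
                          send: Callable[[tuple], None], retry_interval_s: float = 1.0,
                          max_attempts: Optional[int] = None) -> bool:
    """Case 2: keep probing the restarted participant, then resend the logged operations."""
    attempts = 0
    while not is_reachable():
        attempts += 1
        if max_attempts is not None and attempts >= max_attempts:
            return False      # in the described system the master's timeout takes over here
        time.sleep(retry_interval_s)
    for op in entry.operations:
        send(op)
    return True
```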
A checkpoint is set on the redo log. The checkpoint corresponds to a transaction version number and is updated each time in-memory versions are cleared; all transactions before the checkpoint have completed. On restart, the system starts from the checkpoint, reads the subsequent log entries in version-number order, and recovers.
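A minimal Python sketch of checkpointing, assuming (the text does not say) that the checkpoint is advanced to the master's minimum running version at each memory cleanup and that recovery replays entries whose version is at or above the checkpoint. `RedoLog` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RedoLog:
    entries: List[Tuple[int, str, str]] = field(default_factory=list)  # (version, txn_id, status)
    checkpoint: int = 0

    def append(self, version: int, txn_id: str, status: str) -> None:
        self.entries.append((version, txn_id, status))

    def on_memory_cleanup(self, min_running_version: int) -> None:
        # Assumption: every transaction below the minimum running version has finished.
        # The checkpoint never moves backwards.
        self.checkpoint = max(self.checkpoint, min_running_version)

    def replay_from_checkpoint(self) -> List[Tuple[int, str, str]]:
        """Entries to examine on restart, in version-number order."""
        return sorted(e for e in self.entries if e[0] >= self.checkpoint)
```

Whether each replayed entry is redone then depends on its status, as in the recovery cases above.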
The system was evaluated as follows:
The improved two-phase flow and the in-memory MVCC design substantially speed up operations on temporal graph data in a distributed environment. The scheme was compared with TiDB, a distributed database that also supports transactions, by reading and writing temporal attributes of a temporal graph.
The test configuration was as follows:
1) Windows 10 Enterprise desktop: Intel(R) Core(TM) i5-6500 @ 3.20 GHz, 8 GB memory, Seagate 1 TB 7200 rpm SATA hard disk.
2) CentOS Linux release 7.8.2003: Intel(R) Core(TM) i5-4570 @ 3.20 GHz, 8 GB memory, Seagate 1 TB 7200 rpm SATA hard disk.
3) CentOS Linux release 7.8.2003: Intel(R) Core(TM) i5-4590 @ 3.30 GHz, 8 GB memory, Seagate 1 TB 7200 rpm SATA hard disk.
4) CentOS Linux release 7.8.2003: Intel(R) Core(TM) i5-4590 @ 3.30 GHz, 8 GB memory, Seagate 1 TB 7200 rpm SATA hard disk.
5) CentOS Linux release 7.8.2003: Intel(R) Core(TM) i5-4570 @ 3.20 GHz, 8 GB memory, Seagate 1 TB 7200 rpm SATA hard disk.
Machine 1 acts as the client that sends requests; machine 2 is the central node that provides version numbers and routing (both TiDB and the temporal graph transaction scheme require a central node); machines 3, 4 and 5 are storage nodes.
Fig. 6 shows the total latency of 100 transactions updating temporal attributes (each transaction updates the temporal attributes of 5000 nodes) at different concurrency levels. The total latency of the scheme adopted here (DTG) is stable at about 16 seconds, whereas TiDB's total latency is about 530 seconds. Fig. 7 shows the average latency of a single transaction in the same setting: TiDB's per-transaction latency grows with concurrency and is large (over 5000 ms), while the proposed scheme stays consistently low (about 180 ms).
Figs. 8 and 9 show the TPS for reading and writing temporal attributes at different concurrency levels. For both read and write transactions, DTG's TPS increases with concurrency. For reads, the gap between DTG and TiDB is small: TiDB stabilizes at about 380 TPS, while DTG rises gradually from 308 to 700. For writes, TiDB is at least an order of magnitude lower than DTG: TiDB's write TPS settles in the single digits, while DTG's rises from 333 to 709.
The comparison shows that the transaction scheme substantially improves read and write concurrency on temporal graph data storage while reducing transaction execution time.

Claims (6)

1. A temporal graph database distributed transaction solution system, characterized in that: the system comprises a client, a Coordinator, a master and participants, wherein the master is the central node and is used to maintain heartbeats and to provide globally unique version numbers; the Coordinator is a storage node selected in the second phase to direct the other storage nodes to commit uniformly; and the participants are the other storage nodes. Distributed transactions for the temporal graph database are implemented by designing an improved two-phase commit, together with a concurrency control mechanism and a failure recovery mechanism for the temporal graph database in a distributed environment;
specifically, the improved two-phase commit comprises: the system provides a temporal graph read/write interface at the user interface level, sends operations to the corresponding storage nodes over the network, and calls the temporal graph database storage module interface at the bottom layer of each storage node to execute them; distributed transactions of the temporal graph database are uniformly committed or rolled back using the improved two-phase commit. A transaction is initiated by a client: in the first phase, the initiating client first obtains a start version number from the master and then sends the transaction to each storage node participating in it; each storage node completes consistency verification and log writing in the first phase and, on completion, returns the operation result to both the client and the Coordinator; once the results are returned, the transaction enters the second phase. If all storage nodes return success, the client reports the transaction as successful, and the subsequent unified commit is directed by the Coordinator: at commit time, the Coordinator obtains an end version number from the master and then sends commit requests notifying the other participants to commit; if any storage node returns failure, the Coordinator coordinates a unified rollback;
the concurrency control mechanism comprises a multi-version concurrency control process and an in-memory MVCC mechanism for resolving read-write concurrency. In the multi-version concurrency control process, the database maintains multiple versions of each data item, each version carries additional version information, and a version chain is maintained; each transaction carries a version number, and transactions negotiate concurrency through their version numbers; as version numbers increase, smaller version numbers gradually exit concurrency control and are stored as historical versions. The in-memory MVCC mechanism maintains multiple versions of object data only in memory and maintains only the latest version on disk, and memory clears version data when the system determines that it is no longer needed for concurrency control. Data maintained in memory has the fields TGraph key information, Start version, end version and Value, wherein the TGraph key information comprises the node id, the edge id and the attribute name; the Start version and end version fields are the version numbers obtained from the central node when the transaction starts and ends, respectively; and Value is the attribute value. The in-memory MVCC mechanism is divided into a write mechanism and a read mechanism: in the write mechanism, a version number is obtained once when the first phase of the transaction starts and once when the second phase commits, serving as the transaction's start and end version numbers; when writing, the data is checked, and if it is the latest version, the versioned data is written into memory and the disk data is updated at the same time, otherwise the disk data need not be updated; in the read mechanism, if the data exists only on disk, the disk data is read, and if multiple versions of the data exist in memory, the version whose end version number is smaller than the current transaction's version number and whose start version number is smaller than and closest to the current transaction's version number is selected;
the failure recovery mechanism adopts a central log: the Coordinator centrally stores the complete redo log of each transaction; when cluster nodes fail, the Coordinator coordinates and directs their recovery, and its central log keeps the data consistent. In the redo log, the transaction state is one of first done, success and rollback: a transaction is written to the log when its first phase completes, so the Coordinator writes it with the default status first done; a transaction that has completed the first phase enters the second phase for unified rollback or commit; if success feedback is received from all machines, the transaction is committed and, when the commit completes, the Coordinator updates the log status to success; if the transaction is rolled back in the second phase, the log status is updated to rollback. The master maintains a heartbeat with the cluster; a transient machine failure followed by a restart does not affect overall cluster operation; if the failure is not recovered before a timeout, the master issues an instruction to stop all machines; all transactions involving the failed node that are initiated during the downtime fail; running transactions interrupted by the machine failure are recovered after the machine restarts. System failures are resolved through the redo log: a checkpoint is set on the redo log, the checkpoint corresponds to a transaction version number and is updated each time in-memory versions are cleared, all transactions before the checkpoint have completed, and on restart the system starts from the checkpoint, reads the transaction log in version-number order and recovers.
2. The temporal graph database distributed transaction solution system according to claim 1, wherein data stored on the disk does not contain start version or end version information.
3. The temporal graph database distributed transaction solution system according to claim 2, wherein the system determines that version data no longer needs concurrency control when the transaction's end version number plus 1 is lower than the minimum running version number recorded by the master, or when a timeout occurs.
4. The temporal graph database distributed transaction solution system according to claim 3, wherein nodes, edges and static attributes are versioned in units of nodes and edges, the version granularity of temporal attributes is finer than that of nodes, edges and static attributes, a static attribute is an attribute whose value rarely changes, and a temporal attribute is an attribute whose value changes frequently.
5. The temporal graph database distributed transaction solution system according to claim 4, wherein resolving system failures through the redo log comprises: if the failed machine acts as the Coordinator, after restarting the system checks the log and redoes every transaction whose status is first done, re-executing it for recovery; or, if the failed machine acts as a participant and restarts quickly after going down, so that the cluster is not completely shut down, the Coordinator finds that the target machine cannot be detected and keeps retrying until it recovers; while retrying, the Coordinator reads the redo log, resends the operations to the participant for execution, and completes the unfinished work.
6. The temporal graph database distributed transaction solution system according to claim 5, wherein the two-phase commit switches the coordinating role: in the first phase of the transaction the client acts as coordinator, and in the second phase the Coordinator does.
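Claim 1 describes the improved two-phase commit as a flow between client, master, participants and Coordinator. The single-process Python sketch below simulates that flow under simplifying assumptions: `Master`, `StorageNode`, `client_transaction` and `coordinator_phase_two` are hypothetical names, consistency verification and log writing are reduced to a flag, and phase two runs synchronously, whereas in the described system the client reports success as soon as phase one succeeds and the Coordinator commits afterwards.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class Master:
    """Central node handing out globally unique, increasing version numbers."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_version(self) -> int:
        return next(self._counter)


@dataclass
class StorageNode:
    name: str
    fail_phase_one: bool = False      # hypothetical switch simulating a failed verification
    pending: Dict[str, list] = field(default_factory=dict)
    committed: List[Tuple[int, object]] = field(default_factory=list)

    def phase_one(self, txn_id: str, ops: list) -> bool:
        """Stands in for consistency verification plus log writing."""
        if self.fail_phase_one:
            return False
        self.pending[txn_id] = list(ops)
        return True

    def commit(self, txn_id: str, end_version: int) -> None:
        for op in self.pending.pop(txn_id, []):
            self.committed.append((end_version, op))

    def rollback(self, txn_id: str) -> None:
        self.pending.pop(txn_id, None)


def coordinator_phase_two(master: Master, txn_id: str, nodes: List[StorageNode],
                          all_ok: bool) -> Optional[int]:
    """Phase two, directed by the Coordinator: unified commit or unified rollback."""
    if all_ok:
        end_version = master.next_version()   # end version fetched at commit time
        for node in nodes:
            node.commit(txn_id, end_version)
        return end_version
    for node in nodes:
        node.rollback(txn_id)
    return None


def client_transaction(master: Master, txn_id: str,
                       plan: List[Tuple[StorageNode, list]]) -> bool:
    """Phase one, driven by the client; returns the result reported to the user."""
    master.next_version()                     # start version fetched first (unused here)
    all_ok = all(node.phase_one(txn_id, ops) for node, ops in plan)
    coordinator_phase_two(master, txn_id, [node for node, _ in plan], all_ok)
    return all_ok
```

Note that `all(...)` over a generator stops at the first failing node, so later nodes never receive the operations; a real client would send them in parallel, and the unified rollback still clears any node that did accept them.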
CN202011130789.4A 2020-10-21 2020-10-21 Distributed transaction solution system of temporal graph database Active CN112214649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011130789.4A CN112214649B (en) 2020-10-21 2020-10-21 Distributed transaction solution system of temporal graph database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011130789.4A CN112214649B (en) 2020-10-21 2020-10-21 Distributed transaction solution system of temporal graph database

Publications (2)

Publication Number Publication Date
CN112214649A CN112214649A (en) 2021-01-12
CN112214649B true CN112214649B (en) 2022-02-15

Family

ID=74056288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011130789.4A Active CN112214649B (en) 2020-10-21 2020-10-21 Distributed transaction solution system of temporal graph database

Country Status (1)

Country Link
CN (1) CN112214649B (en)

Also Published As

Publication number Publication date
CN112214649A (en) 2021-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant