CN109977171A - A distributed system and method for guaranteeing transaction consistency and linear consistency - Google Patents

Publication number
CN109977171A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910247559.7A
Other languages
Chinese (zh)
Other versions
CN109977171B (en)
Inventor
卢卫
张孝
杜小勇
陈跃国
赵欣
程一舰
张真苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Renmin University of China
Publication of CN109977171A
Application granted
Publication of CN109977171B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present invention relates to a distributed system and method for guaranteeing transaction consistency and linear consistency. The system comprises multiple clients and a database server composed of an access layer, a metadata management cluster, a global Gts generation cluster, and a transaction processing and storage layer. The client provides the user with an interface for interacting with the database server and sends user requests to the database server. The access layer receives the requests sent by the client and parses them to generate an execution plan. The metadata management cluster manages the distributed cluster. The global Gts generation cluster generates global timestamps and uniquely orders the global transactions in the distributed system to achieve linear consistency. The transaction processing and storage layer comprises multiple resource management nodes, which execute the transaction logic according to the execution plan sent by the access layer; the result is returned to the client through the access layer. The present invention can be widely applied in the field of data processing.

Description

A distributed system and method for guaranteeing transaction consistency and linear consistency
Technical Field
The present invention relates to the technical field of data processing, and in particular to a distributed system and method for guaranteeing transaction consistency and linear consistency.
Background Art
First, the consistency of distributed transactions is introduced. Data processing technology needs transaction semantics and borrows the four ACID properties of relational databases to guarantee the transactional attributes of a system, so as to meet the business community's demand for electronic transactions. Here, A is atomicity, C is consistency, I is isolation, and D is durability. Electronic transaction operations rely on these four properties to guarantee that transactions are safe and reliable.
Distributed transaction processing must also satisfy the four ACID properties of transactions. Satisfying them requires multiple techniques. The most important concern data consistency and isolation: consistency determines the correctness of the data, while isolation determines the performance of a concurrent system.
Achieving transaction consistency depends on the concurrency control algorithm. Common concurrency control algorithms include lock-based concurrency control protocols, timestamp-ordering-based concurrency control protocols, MVCC-based concurrency control protocols, and OCC-based concurrency control protocols. These algorithms must first guarantee that no data anomalies occur, i.e., that the transaction schedule is serializable; only then is correctness ensured. Second, the choice of concurrency control algorithm determines the degree of concurrency of transaction processing and therefore affects the transaction throughput of the system, which is a performance issue.
In a distributed database, transaction consistency means that distributed transactions spanning multiple nodes remain consistent. The paper "Distributed snapshot isolation: global transactions pay globally, local transactions pay locally" describes two data anomalies that exist in distributed systems; if either anomaly occurs, transaction consistency cannot be guaranteed.
Second, the external consistency related to the CAP theorem is introduced. Based on the CAP theorem (also known as Brewer's theorem), consistency in distributed systems is defined at multiple levels, which can be divided into two classes: "strong consistency" and "weak consistency".
Strong consistency requires the distributed system to guarantee linear consistency. Linear consistency must guarantee a global relation: its definition explicitly requires that all operations be globally ordered and that the returns-before (visible after return) partial order between operations be preserved. In other words, from the viewpoint of an external observer, after events occur in a given time order, their effects on the data must be readable in that same order. Linear consistency is a property of distributed systems and is not directly tied to databases themselves. A distributed database system, however, being distributed as its name implies, needs to satisfy external consistency.
For weak consistency, the most commonly used form is causal consistency. The definition of causal consistency explicitly constrains the order only of operations that are causally related, so causal consistency is a weaker requirement than linear consistency. Causal consistency requires the following: if process A first reads the old value of a data item and then updates it to produce a new value, then for reads of that data item by another process B, a transaction that completes earlier must not read the new value before a transaction that completes later, i.e., the partial order already established in process A must be preserved. Meanwhile, data accesses by a process C that is unrelated to process A are unrestricted.
In a distributed system, using a globally unique transaction manager is one feasible way to guarantee transaction consistency. However, a global transaction manager still has three major problems:
1. A global transaction manager is complex to implement; many technical difficulties take time to overcome, which is very challenging for engineers.
2. The transaction processing facilities of the underlying single-machine (single-node) databases cannot be used effectively: because the upper layer implements transaction management, the lower layer no longer needs to. The transactional engine of each single node is therefore discarded and the upper layer reinvents the wheel, wasting time, manpower, and money.
3. A global transaction manager is a single-point architecture, which does not fit the decentralized philosophy of distributed systems and is a performance bottleneck.
Currently, there are the following schemes for guaranteeing transaction consistency and linear consistency in distributed systems:
First: as shown in Figure 1, the global transaction management implementation of Postgres-XC (a cluster solution for the Postgres database). Postgres-XC consists of one GTM (global transaction manager), multiple Coordinators (coordinator nodes), and multiple Datanodes (data nodes). The GTM is the core component of Postgres-XC and is used for global transaction control and tuple visibility control. Its role is to allocate global transaction numbers and manage the PGXC MVCC module; a cluster can have only one primary GTM.
However, as explained above, the core technology of concurrent access (the hardest and most complex part of a database system, with the most code, spread throughout the storage engine) resides entirely in the GTM module. As the sole global transaction management node, its work is complex and, architecturally, it is a single point, so it easily becomes a bottleneck.
Second, be a kind of algorithm of the task manager of complete decentralization of pertinent literature introduction, the algorithm is every One node (independent database) is as soon as all safeguard " global transaction manager GTM ", there is how many node in a cluster How many a GTM.Wherein, GTM is responsible for safeguarding that the serializability of global transaction, each global transaction can be endowed in entire collection range Uniquely incremental global transaction mark, this mark is a timestamp value, illustrates the sequence between global transaction, so as to Realize the serializability scheduling of affairs.Later, global transaction can be broken down into the subtransaction executed on different nodes, global thing The subtransaction of business executes (each node uses S2PL algorithm) in each node with the time identifier of global transaction, because of affairs mark Know (subtransaction mark) Time-Dependent stamp and sorts event global orderly so subtransaction is carried out success on each interdependent node of son Then global transaction can be submitted.Multiple global transactions are being dispersed in each node execution by this algorithm, have been reached and have been removed the overall situation The purpose of task manager.But the distribution of the timestamp of global transaction (referred to as GTS), when local dependent on individual node Clock, author think that the clock of multiple nodes needs whether synchronous but clock synchronizes the correctness for having no effect on its algorithm, this be because It is the timestamp value storage in algorithm the submitted affairs of each overall situation to the child node being related to, as under child node The foundation that one timestamp value generates, and use this timestamp value as the condition of transaction rollback in algorithm, new affairs are opened Compared with GTS, timestamp value is rolled back dynamic timestamp value less than the new affairs of the 
overall situation of GTS.
However, the algorithm does not use foundation of the conflicting information as Conflict solving between global affairs, but according to Rely time-sequencing global transaction to realize serializability, introduces more rollback situation;The timestamp value of child node is dependent on The timestamp value of submitted affairs introduces the case where putting off backward of transaction time stamp.Moreover, for Parallel access control The Database Systems of MVCC are not based on, in the case where read transaction is more, the performance of issued transaction is not high.
Third: Google's Spanner system. Using the TrueTime mechanism (which relies on physical devices: GPS and atomic clocks), the system achieves decentralized transaction management. It depends on a global timestamp neither as the basis for mutual exclusion in concurrent transaction access (it adopts SS2PL+MVCC as its concurrency control algorithm) nor as the basis for realizing linear consistency. However, because the mechanism depends on physical devices (GPS and atomic clocks), it is costly and, from an economic standpoint, not suitable for all users.
Summary of the Invention
In view of the above problems, the object of the present invention is to provide a distributed system and method for guaranteeing transaction consistency and linear consistency. In a distributed architecture, the system guarantees the transaction consistency of cross-node distributed transactions and the linear consistency related to the CAP theorem, so that in data processing systems including distributed database systems (SQL, NoSQL, NewSQL, relational, non-relational), distributed big data processing systems, and transactional systems with cross-node global write operations, the data operated on by cross-node global transactions is both transactionally consistent and linearly consistent.
To achieve the above object, the present invention adopts the following technical solution: a distributed system guaranteeing transaction consistency and linear consistency, comprising multiple clients and a database server composed of an access layer, a metadata management cluster, a global Gts generation cluster, and a transaction processing and storage layer. The client provides the user with an interface for interacting with the database server and sends user requests to the database server. The access layer receives the requests sent by the client and parses them to generate an execution plan. The metadata management cluster uniformly manages the distributed cluster of the distributed system. The global Gts generation cluster generates global timestamps and uniquely orders the global transactions in the distributed system to achieve linear consistency. The transaction processing and storage layer comprises multiple resource management nodes, which include coordinator nodes and data nodes; the coordinator nodes and data nodes execute the transaction logic according to the execution plan sent by the access layer, and the result is returned to the client through the access layer.
Further, the data nodes store the data in the distributed system in partitions, and the coordinator nodes coordinate the transactions in the distributed system. The roles of the resource management nodes can be assigned in either of the following two ways. Master-slave mode: some resource management nodes are dedicated as coordinator nodes for transaction processing, while the remaining resource management nodes serve as data nodes. Peer-to-peer mode: all resource management nodes are peers, and each resource management node has the functions of both a data node and a coordinator node.
Further, the global timestamp generated by the global Gts generation cluster consists of eight bytes in a hybrid physical clock format: a) the first 44 bits are the physical timestamp value; b) the last 20 bits are a counter that increases monotonically within one millisecond.
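The 44-bit/20-bit layout described above can be illustrated with a minimal sketch. The bit widths come from the text; everything else (the function names `pack_gts` and `unpack_gts`, the `GtsGenerator` class, the injectable clock, and the behavior when the counter overflows or the clock moves backwards) is a hypothetical choice made for illustration and is not stated in the patent.

```python
import time
from typing import Callable, Tuple

PHYSICAL_BITS = 44                      # physical time in milliseconds (from the text)
COUNTER_BITS = 20                       # per-millisecond counter (from the text)
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def pack_gts(physical_ms: int, counter: int) -> int:
    """Pack a millisecond timestamp and a counter into one 64-bit integer."""
    if not 0 <= physical_ms < (1 << PHYSICAL_BITS):
        raise ValueError("physical time does not fit in 44 bits")
    if not 0 <= counter <= COUNTER_MASK:
        raise ValueError("counter does not fit in 20 bits")
    return (physical_ms << COUNTER_BITS) | counter


def unpack_gts(gts: int) -> Tuple[int, int]:
    """Split a packed timestamp back into (physical_ms, counter)."""
    return gts >> COUNTER_BITS, gts & COUNTER_MASK


class GtsGenerator:
    """Hypothetical single-process generator; the patent uses a cluster."""

    def __init__(self, clock_ms: Callable[[], int] = lambda: int(time.time() * 1000)):
        self._clock_ms = clock_ms
        self._last_ms = 0
        self._counter = 0

    def next(self) -> int:
        now = self._clock_ms()
        if now > self._last_ms:
            # New millisecond: restart the counter.
            self._last_ms, self._counter = now, 0
        else:
            # Same millisecond (or clock moved backwards): bump the counter.
            self._counter += 1
            if self._counter > COUNTER_MASK:
                # Assumed overflow policy: advance the logical millisecond.
                self._last_ms += 1
                self._counter = 0
        return pack_gts(self._last_ms, self._counter)
```

Because the physical part occupies the high bits, comparing two packed values as plain integers orders them first by time and then by counter, which is what makes a single 8-byte value usable as a total order.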
Further, the data structures involved in the distributed system fall into four classes: the global and local transaction state tables, the data item data structure, the transaction's global read set and write set, and the communication protocol and messages;
The global transaction state table maintains the transaction state as seen from the global viewpoint of the distributed system and is represented by the six-tuple {TID, Lowts, Uppts, Status, Gts, Nodes}, where TID is the unique transaction identifier, Lowts is the lower bound of the transaction's logical commit timestamp, Uppts is the upper bound of the transaction's logical commit timestamp, Status is the current global state of the transaction, Gts is the timestamp at which the transaction's global commit/rollback completed, and Nodes is the set of data nodes involved in the current transaction;
The local transaction state table manages the local state of a transaction on each resource management node and is represented by {TID, Lowts, Uppts, Status}, where TID is the unique transaction identifier, Lowts is the lower bound of the transaction's logical commit timestamp, Uppts is the upper bound of the transaction's logical commit timestamp, and Status is the local state of the transaction;
The data item data structure includes a first group of data elements serving as the basis for linear consistency and a second group of data elements serving distributed transaction consistency. The first group is {gts, info_bit}, where gts represents the globally unique order of a transaction in the distributed system, and info_bit identifies whether the gts field currently holds a Gts or a TID. The second group is {wts, rts}, where wts records the logical timestamp of the transaction that created this version of the data item, and rts records the logical timestamp of the most recent transaction that read the data item;
The transaction's global read set records all data items read by the transaction and is represented by {BlockAddress, Offset, Size, Value}, where BlockAddress is the address of the block containing the data item, Offset is the offset of the data item within the block, Size is the size of the data item, and Value is the value of the data item;
The transaction's global write set records all data items that the transaction needs to update and is represented by {BlockAddress, Offset, Size, NewValue, OperationType}, where BlockAddress is the address of the block containing the data item, Offset is the offset of the data item within the block, Size is the size of the data item, NewValue is the new value of the data item, and OperationType indicates whether the operation is an update, an insert, or a delete;
The communication protocol and messages include messages sent by the coordinator node to the data nodes, messages sent by the data nodes to the coordinator node, messages sent by the coordinator node to the global Gts generation cluster, and messages sent by the global Gts generation cluster to the coordinator node. Messages from the coordinator node to a data node include the read data request message, the validation request message, and the write commit/rollback request; messages from a data node to the coordinator node include the read request feedback message and the local validation feedback message; messages from the coordinator node to the global Gts generation cluster include the global timestamp request message; messages from the global Gts generation cluster to the coordinator node include the global timestamp request feedback message.
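The tuples listed above map naturally onto record types. The following is a minimal sketch in Python; the field names follow the text, while the field types, the default values, the status string "Grunning", and the boolean encoding of info_bit are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class OperationType(Enum):
    UPDATE = "update"
    INSERT = "insert"
    DELETE = "delete"


@dataclass
class GlobalTxnState:
    """Six-tuple {TID, Lowts, Uppts, Status, Gts, Nodes}."""
    tid: int
    lowts: int = 0
    uppts: float = float("inf")      # assumed initial upper bound
    status: str = "Grunning"         # hypothetical name; the text only names Gvalidating
    gts: Optional[int] = None        # set when global commit/rollback completes
    nodes: List[str] = field(default_factory=list)


@dataclass
class LocalTxnState:
    """Four-tuple {TID, Lowts, Uppts, Status} kept on each resource management node."""
    tid: int
    lowts: int = 0
    uppts: float = float("inf")
    status: str = "running"          # hypothetical name


@dataclass
class DataItemVersion:
    """One version of a data item: {gts, info_bit} plus {wts, rts}."""
    value: bytes
    gts: int                         # holds either a Gts or a TID
    info_bit: bool                   # assumed encoding: True means gts holds a Gts
    wts: int                         # logical timestamp of the creating transaction
    rts: int                         # logical timestamp of the latest reader


@dataclass
class ReadSetEntry:
    block_address: int
    offset: int
    size: int
    value: bytes


@dataclass
class WriteSetEntry:
    block_address: int
    offset: int
    size: int
    new_value: bytes
    operation_type: OperationType
```

For example, `GlobalTxnState(tid=1, nodes=["dn1", "dn2"])` would describe a freshly started transaction whose logical commit interval is still unconstrained.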
A multi-level consistency method for the distributed system guaranteeing transaction consistency and linear consistency, comprising the following steps: 1) establishing a system consistency model capable of realizing multi-level consistency; 2) determining, according to actual business requirements, the consistency level that the distributed system needs to achieve, determining from the established system consistency model the consistency execution algorithm suited to that consistency requirement, and executing the distributed transactions and single-machine transactions in the distributed system to obtain the transaction execution results.
Further, in step 1), the method of establishing the system consistency model capable of realizing multi-level consistency comprises the following steps:
1.1) performing transaction concurrency control using an OCC strategy based on DTA, and establishing the RUC-CC algorithm for guaranteeing transaction consistency;
1.2) based on the global timestamps generated by the global Gts generation cluster and on the global transaction state, establishing a global-timestamp-based linear consistency guarantee algorithm for guaranteeing linear consistency between transactions;
1.3) using a method that reads the data twice, establishing a two-read linear consistency guarantee algorithm for guaranteeing consistency between transactions;
1.4) combining the transaction consistency and linear consistency of steps 1.1) to 1.3) with the MVCC algorithm to establish a unified consistency model that can satisfy multiple consistency levels.
Further, in step 1.1), the method of performing transaction concurrency control using the OCC strategy based on DTA and establishing the RUC-CC algorithm comprises the following steps:
1.1.1) for a transaction T sent by the client, completing the corresponding initialization on the coordinator node;
1.1.2) dividing the global execution of the transaction into three phases: the read phase, the validation phase, and the commit-write/rollback phase; under the coordination of the coordinator node, each data node relevant to the operation executes the transaction, and the entries of committed or rolled-back transactions are purged from the transaction state table.
Further, in step 1.1.2), dividing the global execution of the transaction into three phases (the read phase, the validation phase, and the commit-write/rollback phase) and having each data node relevant to the operation execute the transaction under the coordination of the coordinator node comprises the following steps:
1.1.2.1) transaction T reads the required data according to its execution logic and writes its updates into the local memory of transaction T;
1.1.2.2) transaction T validates whether it conflicts with other transactions and obtains a validation result;
1.1.2.3) according to the validation result of the validation phase, transaction T chooses to execute either the commit-write or the rollback.
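The three-phase control flow above can be sketched as a small driver. This is only an illustration of the phase ordering: the callback-based interface, the names `Phase` and `run_transaction`, and the returned trace are hypothetical, and the sketch also assumes (as the read-phase description in this document states) that the read phase itself may decide on a rollback.

```python
from enum import Enum
from typing import Callable, List


class Phase(Enum):
    READ = "read"
    VALIDATE = "validate"
    COMMIT_WRITE = "commit-write"
    ROLLBACK = "rollback"


def run_transaction(
    read: Callable[[], bool],        # returns False if the read phase demands rollback
    validate: Callable[[], bool],    # returns True if no conflict was found
    commit_write: Callable[[], None],
    rollback: Callable[[], None],
) -> List[Phase]:
    """Drive one transaction through its phases and return the phases visited."""
    trace = [Phase.READ]
    if not read():
        trace.append(Phase.ROLLBACK)
        rollback()
        return trace
    trace.append(Phase.VALIDATE)
    if validate():
        trace.append(Phase.COMMIT_WRITE)
        commit_write()
    else:
        trace.append(Phase.ROLLBACK)
        rollback()
    return trace
```

Updates made during the read phase stay in the transaction's local memory, so a rollback decided before the commit-write phase does not need to undo anything visible to other transactions.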
Further, in step 1.1.2.1), the method by which transaction T reads the required data according to its execution logic and writes its updates into the local memory of transaction T is as follows:
First, the coordinator node of transaction T sends a read data request message for data item x to the data node where x resides;
Then, after the data node where x resides receives the read data request message, it first creates or updates the entry for transaction T in the local transaction state table, then looks up the version of x that is visible within the logical lifetime of transaction T, and sends a read request feedback message to the coordinator node of transaction T;
Finally, after the coordinator node of transaction T has received the read request feedback messages from all data nodes, it decides whether a rollback is needed: if so, it enters the global rollback phase; otherwise the transaction continues to execute.
Further, in step 1.1.2.2), the method by which transaction T validates whether it conflicts with other transactions and obtains the validation result is as follows:
First, the coordinator node of transaction T changes the state of T in the global transaction state table to Gvalidating, and then sends a validation request message and the local write set to each data node involved in T;
Second, after each data node involved in T receives the validation request message, it performs the local validation operation, which comprises the following steps:
1. In the local transaction state table, update T.Lowts = max(T.Lowts, vrm.Lowts) and T.Uppts = min(T.Uppts, vrm.Uppts), where vrm is the received validation request message;
2. Check whether T.Lowts is greater than T.Uppts. If so, validation fails: return an Abort message to the coordinator node of T and enter rollback; otherwise go to step 3;
3. For each data item y in the transaction's write set, check whether the WT of y is empty:
If it is not empty, send an Abort message to the coordinator node of T and enter rollback;
Otherwise go to step 4;
4. Set the WT of each data item y in the write set to T.TID, and adjust the lower bound of T's timestamp in the local transaction state table so that it is greater than the rts of y;
5. Check whether T.Lowts is greater than T.Uppts. If so, validation fails: roll back locally and then return an Abort message to the coordinator node of T; otherwise go to step 6;
6. For each element y in the write set, adjust the timestamps of transaction T or of the transactions in RTlist to eliminate read-write conflicts;
7. Create a new version of data item y according to the updated value of y, and set a flag indicating that the new version has not yet been globally committed;
8. Return the local validation feedback message lvm of T to the coordinator node of T, where the Lowts and Uppts of lvm record the lower and upper bounds, respectively, of the logical timestamp of T on the local data node;
Finally, after the coordinator node of T has received the local validation feedback messages from all resource management nodes, it determines from the received messages whether T passes validation.
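The interval arithmetic in steps 1 to 5 and 8 of the local validation can be sketched as follows. This is a simplified illustration, not the patent's implementation: timestamps are assumed to be integers, steps 6 and 7 (RTlist adjustment and version creation) are omitted, the local rollback in step 5 is assumed to clear the WT marks, and the coordinator's rule (intersect all returned intervals and require a non-empty result) is an assumption, since the text says only that the coordinator decides from the received messages. The names `Interval`, `Item`, `local_validate`, and `coordinator_decide` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Interval:
    lowts: int
    uppts: int


@dataclass
class Item:
    rts: int = 0
    wt: Optional[int] = None    # TID of the transaction currently marking this item


def local_validate(tid: int, local: Interval, vrm: Interval,
                   write_set: List[str], store: Dict[str, Item]) -> Optional[Interval]:
    """Return the lvm bounds on success, or None to signal Abort."""
    # Step 1: intersect the local interval with the one in the request.
    local.lowts = max(local.lowts, vrm.lowts)
    local.uppts = min(local.uppts, vrm.uppts)
    # Step 2: an empty interval means validation failed.
    if local.lowts > local.uppts:
        return None
    # Step 3: any item already marked by a writer forces an abort.
    if any(store[key].wt is not None for key in write_set):
        return None
    # Step 4: mark the items and push Lowts above each item's rts.
    for key in write_set:
        store[key].wt = tid
        local.lowts = max(local.lowts, store[key].rts + 1)
    # Step 5: re-check the interval; on failure roll back locally.
    if local.lowts > local.uppts:
        for key in write_set:
            store[key].wt = None    # assumed meaning of "local rollback"
        return None
    # Steps 6 and 7 are omitted in this sketch.
    # Step 8: report the local bounds back to the coordinator.
    return Interval(local.lowts, local.uppts)


def coordinator_decide(replies: List[Optional[Interval]]) -> Optional[Interval]:
    """Assumed rule: pass only if nobody aborted and the intervals intersect."""
    if any(reply is None for reply in replies):
        return None
    low = max(reply.lowts for reply in replies)
    upp = min(reply.uppts for reply in replies)
    return Interval(low, upp) if low <= upp else None
```

With a write-set item whose rts is 10 and a request interval of [5, 50], the sketch returns the interval [11, 50]; if the item's rts were 60, the interval would become empty and the node would abort.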
Further, in step 1.2), establishing the global-timestamp-based linear consistency guarantee algorithm, based on the global timestamps generated by the global Gts generation cluster and on the global transaction state, comprises the following steps:
1.2.1) the client initiates a request for transaction T and establishes a connection through the access layer, forming a session;
1.2.2) the access layer parses transaction T and selects a coordinator node to manage the execution of the transaction;
1.2.3) when read transaction T starts, it obtains the transaction's global timestamp from the global Gts generation cluster and records the Gts in the global transaction state table entry of the read transaction; the coordinator node establishes connections with all data nodes relevant to the read transaction, packs the parsed query execution plan together with the global timestamp Gts into a data packet, and sends it over the network to all relevant data nodes;
1.2.4) each data node performs its data read: it determines the data items that meet the selection condition and then, for each data item that has multiple versions, traverses the versions starting from the newest until it finds the first visible version;
1.2.5) the coordinator node aggregates the data returned by all data nodes and returns it to the access layer; the access layer returns the data to the client with which the session was established, and the current read transaction is complete.
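The version traversal in step 1.2.4) can be sketched as below. The text does not spell out the visibility test, so the rule used here (a version is visible if it is globally committed and its gts does not exceed the reader's Gts) is an assumption, as are the names `Version` and `first_visible` and the choice to keep versions in a newest-first list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    value: str
    gts: int                  # global timestamp of the creating transaction
    committed: bool = True    # False while the version is not globally committed


def first_visible(versions_newest_first: List[Version], read_gts: int) -> Optional[Version]:
    """Walk from the newest version and return the first one the reader may see."""
    for version in versions_newest_first:
        # Assumed visibility rule: committed and not newer than the reader.
        if version.committed and version.gts <= read_gts:
            return version
    return None               # no version existed at read_gts
```

A reader holding Gts 25 that scans versions with gts 30, 20, and 10 would skip the first and return the second, so every data node answers as of the same global point in time.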
Further, in step 1.3), the process of establishing the two-read linear consistency guarantee algorithm, using a method that reads the data twice, is as follows:
1.3.1) the client initiates a request for transaction T and establishes a connection through the access layer, forming a session;
1.3.2) the access layer parses transaction T and selects a coordinator node to manage the execution of the transaction;
1.3.3) the coordinator node establishes connections with all data nodes relevant to the read transaction and sends them a data acquisition request; all relevant data nodes execute the first-read algorithm and return data to the coordinator node, which determines the global timestamp Gts of the current read transaction T from all returned data items;
1.3.4) the coordinator node sends a data acquisition request to all data nodes again, together with the determined global timestamp Gts of the current read transaction T; the data nodes execute the second-read algorithm and return the data versions that satisfy linear consistency with respect to the global timestamp Gts of the current read transaction T;
1.3.5) the coordinator node aggregates the data returned by all data nodes and returns it to the access layer; the access layer returns the data to the client with which the session was established, and the current read transaction is complete.
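The two rounds in steps 1.3.3) and 1.3.4) can be sketched as follows. The text does not say how the coordinator derives Gts from the first-round results; taking the maximum gts among the returned versions is an assumption made here for illustration. The in-memory node representation, the function names, and the `between_rounds` hook (used only to simulate a concurrent write) are likewise hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, str]            # (gts, value)
Node = Dict[str, List[Version]]      # key -> versions, newest first


def read_latest(node: Node, key: str) -> Version:
    """First read: return the newest version stored for the key."""
    return node[key][0]


def read_at(node: Node, key: str, read_gts: int) -> Optional[Version]:
    """Second read: return the newest version not newer than read_gts."""
    for version in node[key]:
        if version[0] <= read_gts:
            return version
    return None


def two_round_read(
    requests: List[Tuple[Node, str]],
    between_rounds: Callable[[], None] = lambda: None,
) -> Tuple[int, List[Optional[Version]]]:
    # Round 1: collect the newest version of every requested item.
    first = [read_latest(node, key) for node, key in requests]
    # Assumed rule: the read timestamp is the largest gts seen in round 1.
    read_gts = max(gts for gts, _ in first)
    between_rounds()                 # hook to simulate writes arriving in between
    # Round 2: re-read every item as of read_gts.
    second = [read_at(node, key, read_gts) for node, key in requests]
    return read_gts, second
```

Under this assumed rule, a write that lands on one node between the two rounds with a larger gts is ignored by the second read, so all nodes return versions from one consistent point in the global order.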
Further, in step 2), the method of determining according to actual business requirements the consistency level that the distributed system needs to achieve, determining from the established system consistency model the consistency execution algorithm suited to that consistency level, and executing the transactions in the distributed system comprises the following steps:
2.1) depending on whether a transaction needs to operate on data located on multiple resource management nodes, dividing the transactions in the distributed system into distributed transactions and single-machine transactions;
2.2) executing the distributed transactions in the distributed system using the consistency execution algorithm suited to the required consistency level;
2.3) executing the single-machine transactions in the distributed system using the consistency execution algorithm suited to the required consistency level.
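A minimal sketch of the classification in step 2.1), together with a lookup of the per-level algorithm choice that this document gives in step 2.2.4). The enum and function names are hypothetical, the level names are kept as they appear in the text, and the table only restates the pairing of data manipulation and scheduling algorithms named there.

```python
from enum import Enum
from typing import Iterable, Tuple


class TxnKind(Enum):
    DISTRIBUTED = "distributed"
    SINGLE_MACHINE = "single-machine"


class Level(Enum):
    TRANSACTION_AND_CAUSAL = "transaction logical / causal consistency"
    LINEAR = "linear consistency"
    CRASH = "crash consistency"      # level name kept as it appears in the text


# (data manipulation algorithm, transaction scheduling algorithm) per level.
ALGORITHMS = {
    Level.TRANSACTION_AND_CAUSAL: ("RUC-CC", "RUC-CC"),
    Level.LINEAR: ("linear consistency guarantee", "MVCC"),
    Level.CRASH: ("linear consistency guarantee + RUC-CC",
                  "linear consistency guarantee + RUC-CC"),
}


def classify(nodes_touched: Iterable[str]) -> TxnKind:
    """A transaction touching more than one resource management node is distributed."""
    return TxnKind.DISTRIBUTED if len(set(nodes_touched)) > 1 else TxnKind.SINGLE_MACHINE


def plan(nodes_touched: Iterable[str], level: Level) -> Tuple[TxnKind, Tuple[str, str]]:
    """Pair the transaction kind with the algorithms for the requested level."""
    return classify(nodes_touched), ALGORITHMS[level]
```

For instance, `plan(["dn1", "dn2"], Level.LINEAR)` yields a distributed transaction that uses the linear consistency guarantee algorithm for data manipulation and MVCC for scheduling.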
Further, in step 2.2), the process of executing the distributed transactions in the distributed system using the consistency execution algorithm suited to the required consistency level is as follows:
2.2.1) the client issues the request to execute transaction T; the access layer receives the request sent by the client and establishes a session with the client;
2.2.2) after receiving the request, the access layer interacts with the metadata management cluster to obtain the relevant metadata, parses the request, and routes it to different coordinator nodes;
2.2.3) the coordinator node optimizes the SQL and generates a physical execution plan, performs global transaction initialization, and records the global transaction state information; it then decomposes the execution plan into execution plans for the individual data nodes, sends them to the corresponding data nodes, and marks the global transaction state as running;
2.2.4) each data node performs data operations according to its execution plan using the algorithm suited to the required consistency level, and records the local transaction execution state; after a data node has finished its local reads and writes, it sends a "ready to validate" instruction to the coordinator node. Specifically:
For transaction logical consistency and transaction causal consistency requirements: the data node, according to the execution plan, uses the RUC-CC algorithm for data manipulation and transaction scheduling;
For the linear consistency level requirement: the data node, according to the execution plan, uses the linear consistency guarantee algorithm for data manipulation and schedules transactions based on the MVCC algorithm;
For the crash consistency level requirement: the data node, according to the execution plan, uses the linear consistency guarantee algorithm combined with the RUC-CC algorithm for data manipulation and transaction scheduling;
2.2.5) coordinator node receive whole related data nodes send " can verify " instruction after, record global transaction shape State is to verify, and send " verifying " instruction to all related data nodes;
2.2.6 after) back end receives " verifying " instruction;Into local verification process, if the verification passes, then send out Send " being verified " instruction to coordinator node;
2.2.7 after) coordinator node receives " being verified " instruction that whole related data nodes are sent, according to different consistent Property rank require to determine the need for generating cluster with overall situation Gts to interact and stabbed with obtaining the length of a game of affairs, and record Global transaction state is to have been filed on;Then, while opening two threads: first is used to result set returning to access layer, by Access layer is responsible for implementing result returning to client;Second will send " submission " to all related data nodes and refer to It enables;
When for the affair logic coherence request: coordinator node receives " being verified " that whole related data nodes are sent After instruction, record global transaction state is to have been filed on;
When for the requirement of affairs cause and effect consistency level, linear consistency and crash consistency: coordinator node receives whole After " being verified " instruction that related data node is sent, needs to interact with overall situation Gts generation cluster, obtain the complete of affairs Office's timestamp, and recording global transaction state is to have been filed on;
2.2.8 after) each back end receives " submission " instruction, process is submitted into local.
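The level-dependent choices in steps 2.2.4) and 2.2.7) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `ConsistencyLevel`, `data_node_algorithm`, `needs_gts` and `finish_validation` are hypothetical, and the "aborted" outcome for a failed validation is an assumption, since the steps above only describe the success path.

```python
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class ConsistencyLevel(Enum):
    TRANSACTION_LOGICAL = "transaction logical consistency"
    TRANSACTION_CAUSAL = "transaction causal consistency"
    LINEAR = "linear consistency"
    CRASH = "crash consistency"


def data_node_algorithm(level: ConsistencyLevel) -> str:
    """Algorithm a data node applies in step 2.2.4) for the required level."""
    if level in (ConsistencyLevel.TRANSACTION_LOGICAL,
                 ConsistencyLevel.TRANSACTION_CAUSAL):
        return "RUC-CC"
    if level is ConsistencyLevel.LINEAR:
        return "linear consistency guarantee + MVCC"
    return "linear consistency guarantee + RUC-CC"


def needs_gts(level: ConsistencyLevel) -> bool:
    """Step 2.2.7): only transaction logical consistency skips the Gts cluster."""
    return level is not ConsistencyLevel.TRANSACTION_LOGICAL


def finish_validation(level: ConsistencyLevel,
                      votes: Iterable[bool],
                      fetch_gts: Callable[[], int]) -> Tuple[str, Optional[int]]:
    """Coordinator-side decision once every data node has reported.

    `votes` holds one "validation passed" flag per relevant data node;
    `fetch_gts` stands in for the round trip to the global Gts cluster.
    """
    if not all(votes):
        # Assumption: a failed local validation rolls the transaction back.
        return ("aborted", None)
    gts = fetch_gts() if needs_gts(level) else None
    return ("committed", gts)
```

Under these assumptions, a transaction run at the transaction logical consistency level never contacts the Gts cluster, which is where the lower levels save a network round trip.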
Further, in step 2.3), the method of determining the consistency level that the distributed system needs to achieve according to actual business requirements, determining from the established system consistency model the consistency execution algorithm suited to that consistency level, and executing the single-machine transactions in the distributed system comprises the following steps:
2.3.1) The client issues a request to execute transaction T; the access layer receives the request sent by the client and establishes a session with the client;
2.3.2) After receiving the request, the access layer interacts with the metadata management cluster to obtain the relevant metadata, parses the request, and assigns it by routing to one of the coordinator nodes;
2.3.3) The coordinator node optimizes the SQL and generates a physical execution plan; it performs transaction initialization, records the transaction state as running, and then sends the execution plan directly to the selected data node;
2.3.4) The data node performs data operations according to the execution plan, using the algorithm suited to the required consistency level, and records the local transaction state; after the data node finishes its local reads and writes, it enters the validation process directly; if validation passes, it sends a "validation passed" instruction to the coordinator node and enters the local commit process;
When transaction logical consistency or transaction causal consistency is required, the data node performs data operations and transaction scheduling using the RUC-CC algorithm;
When linear consistency is required, the data node performs data operations using the linear consistency guarantee algorithm, and performs transaction scheduling based on the MVCC algorithm;
When crash consistency is required, the data node performs data operations and transaction scheduling using the linear consistency guarantee algorithm combined with the RUC-CC algorithm;
2.3.5) After receiving the "validation passed" instruction from the data node, the coordinator node determines, according to the required consistency level, whether it needs to interact with the global Gts generation cluster to obtain the global timestamp of the transaction, and records the transaction state as committed; it then returns the result set to the access layer, which returns the execution result to the client;
When transaction logical consistency is required: after receiving the "validation passed" instruction from the data node, the coordinator node records the global transaction state as committed;
When transaction causal consistency, linear consistency or crash consistency is required: after receiving the "validation passed" instruction from the data node, the coordinator node interacts with the global Gts generation cluster to obtain the global timestamp of the transaction, and records the global transaction state as committed.
By adopting the above technical scheme, the invention has the following advantages: 1) The invention proposes an optimistic concurrency control algorithm based on dynamic timestamp adjustment (the RUC-CC algorithm). It is a decentralized, efficient transaction scheduling algorithm that ensures the ACID properties of transactions across the whole system and guarantees conflict-serializable scheduling of distributed transactions. 2) Through the "global Gts generation cluster", the invention ensures that all operations in the system are consistent with their order under a global clock; by combining the "global Gts generation cluster" with the MVCC algorithm, it proposes a linear consistency guarantee algorithm that guarantees linear consistency, the strongest form of external consistency. 3) Through the design of the basic structure of data items, the invention decouples the transaction consistency guarantee algorithm from the linear consistency guarantee algorithm in form, so that the two do not affect each other; in terms of algorithm and function, however, the two can still be combined efficiently. Consistency at multiple levels can therefore be guaranteed, meeting the differing demands of application scenarios for consistency correctness and for the efficiency of the distributed database system.
Brief description of the drawings
Fig. 1 is an architecture diagram of Postgres-XC;
Fig. 2 is an architecture diagram of the distributed database system of the present invention;
Fig. 3 is a structure diagram of the global transaction state of the present invention;
Fig. 4 is a structure diagram of the local transaction state of the present invention;
Fig. 5 is a structure diagram of the data item of the present invention and of the information it must maintain;
Fig. 6 is a timing diagram of transaction execution in the global execution stage under RUC-CC of the present invention;
Fig. 7 is a schematic diagram of the multi-level consistency of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and embodiments.
As shown in Fig. 2, the distributed system for guaranteeing transaction consistency and linear consistency proposed by the present invention includes multiple clients (Client) and a database server consisting of an access layer (Proxy), a metadata management cluster (Metadata Manager), a global Gts generation cluster (Gts Manager), and a transaction processing and storage layer. The client provides the user with an interface for interacting with the database server and sends user requests to the database server; the access layer receives the requests sent by clients and parses them to generate execution plans; the metadata management cluster uniformly manages the distributed system, for example by maintaining the routing information of each data node; the global Gts generation cluster generates global timestamps and uniquely orders the global transactions in the distributed system to achieve linear consistency; the transaction processing and storage layer includes multiple resource management nodes (Resource Manager, hereinafter RM), which execute the transaction logic according to the execution plan sent by the access layer, and the result is returned to the client through the access layer.
Further, in the transaction processing and storage layer, resource management nodes play two roles: first, as data nodes of the distributed database, which store the data in partitions; second, as coordinator nodes for transaction processing, i.e. host nodes. Accordingly, there are two ways of assigning roles to the RMs:
Master-slave mode: a portion of the RMs are dedicated to serving as transaction coordinator nodes, and these RMs are referred to as host nodes; the remaining RMs serve as data nodes.
Peer-to-peer mode: all RMs are peers; every RM can serve as a host node and every RM can serve as a data node, i.e. each RM has both the data node function and the transaction coordinator function.
Further, the global timestamp (Gts) produced by the global Gts generation cluster consists of eight bytes, composed in a hybrid physical clock manner:
a) The first 44 bits are the physical timestamp value (i.e. a Unix timestamp, accurate to the millisecond). They can represent 2^44 unsigned integers in total, so in theory they can represent physical timestamps spanning about 2^44 / (1000 × 60 × 60 × 24 × 365) ≈ 557.8 years.
b) The last 20 bits are a monotonically increasing counter within a given millisecond (2^20-1 per millisecond, about one million).
c) Based on this data structure, if the transaction throughput of a single machine is 100,000 transactions per second, the structure can in theory support a distributed database system containing 10,000 data nodes. Meanwhile, the number of distinct Gts values represents the total number of transactions the distributed system can theoretically support; that is, based on this structure, the distributed system can in theory support (2^44-1) * (2^20) transactions.
d) If needed, the number of bits in Gts can be extended to support more nodes and more transactions.
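The 44-bit plus 20-bit layout described above can be sketched in a few lines. This is an illustrative sketch only: `pack_gts`, `unpack_gts` and `GtsAllocator` are hypothetical names, the allocator runs in a single process rather than as a cluster, and borrowing the next millisecond when the counter overflows is an assumption that the text does not specify.

```python
from typing import Callable, Tuple

PHYSICAL_BITS = 44                       # millisecond Unix timestamp
COUNTER_BITS = 20                        # per-millisecond counter
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def pack_gts(physical_ms: int, counter: int) -> int:
    """Combine the two parts into one 64-bit (eight-byte) integer."""
    if not 0 <= physical_ms < (1 << PHYSICAL_BITS):
        raise ValueError("physical time does not fit in 44 bits")
    if not 0 <= counter <= COUNTER_MASK:
        raise ValueError("counter does not fit in 20 bits")
    return (physical_ms << COUNTER_BITS) | counter


def unpack_gts(gts: int) -> Tuple[int, int]:
    """Split a Gts back into (physical_ms, counter)."""
    return gts >> COUNTER_BITS, gts & COUNTER_MASK


class GtsAllocator:
    """Single-process stand-in for the Gts cluster (assumption)."""

    def __init__(self, clock_ms: Callable[[], int]) -> None:
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._counter = 0

    def next(self) -> int:
        # Never let the physical part move backwards.
        now = max(self._clock_ms(), self._last_ms)
        if now == self._last_ms:
            self._counter += 1
            if self._counter > COUNTER_MASK:
                # Assumption: on overflow, borrow the next millisecond.
                now += 1
                self._counter = 0
        else:
            self._counter = 0
        self._last_ms = now
        return pack_gts(now, self._counter)
```

Because the physical time occupies the high-order bits, comparing two Gts values as plain integers orders them first by millisecond and then by counter, which is what lets a single integer comparison stand for the global order.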
In implementation, the global Gts generation cluster provides its service in a one-master, multiple-slave arrangement (one master node and multiple slave nodes); that is, the cluster is formed of multiple servers and provides a highly available service, so it does not become a single-point performance bottleneck.
Further, the basic data structures involved in the distributed system of the present invention include the global transaction state table, the local transaction state table, the data item data structure, the transaction read sets and write sets, and the communication protocol and messages. Each data structure is described in detail below.
As shown in Fig. 3, the global transaction state table is called GlobalTS; it maintains the state of a transaction as seen from the global viewpoint of the distributed system. A GlobalTS structure exists on every node in the host node set. For a global transaction T, its global transaction state is maintained on the coordinator node (host node) of T. For each global transaction T, the global transaction state is represented by a six-tuple {TID, Lowts, Uppts, Status, Gts, Nodes}, where the fields have the following meanings:
a) TID: the unique transaction identifier, consisting of 8 bytes. It uniquely identifies every transaction in the system by combining two parts, and is allocated when the transaction is initialized:
1. The first 14 bits record the number of the transaction's host node, i.e. the coordinator node that handles the transaction. 14 bits can represent 16383 (2^14-1) unsigned integers in total, which corresponds to the number of nodes that, by the estimate above, the global timestamp Gts can support.
2. The last 50 bits are filled with a monotonically increasing counter within the host node, distinguishing the different transactions on that host node; there are 2^50-1 values in total, which in theory guarantees that TIDs do not repeat within the total number of transactions bounded by Gts.
3. If the last 50 bits of the TID on a host node have been allocated up to 2^50-1, a TID reuse mechanism is needed to recycle TIDs. The TID reuse mechanism is designed with reference to the freezing mechanism provided by PostgreSQL, which is common knowledge to those skilled in the art and is not described in detail here. In theory, the TID designed in the present invention is sufficient for normal operation of the system.
b) Lowts: the lower bound of the transaction's logical commit timestamp, i.e. the earliest logical time at which the transaction can commit; its value is a non-negative integer occupying 8 bytes;
c) Uppts: the upper bound of the transaction's logical commit timestamp, i.e. the latest logical time at which the transaction can commit; its value is a non-negative integer occupying 8 bytes;
The lower and upper bounds of the logical commit timestamp constitute the logical life cycle of the transaction: [Lowts, Uppts]. The initial life cycle of a transaction is [0, +∞], and the final logical commit timestamp T.cts of global transaction T is taken from the interval [Lowts, Uppts]. The logical life cycle of a transaction is relative; its adjustment depends on the dynamic timestamp allocation algorithm (DTA), which is described later.
d) Status: the global state the transaction is currently in. In the present invention, Grunning indicates that the transaction is executing; Gvalidating indicates that the transaction is in the validation stage; Gcommitting indicates that the transaction has completed validation and is in the commit stage; Gaborting indicates that the transaction is in the rollback stage; Gcommitted indicates that the global commit of the transaction is complete; Gaborted indicates that the global rollback of the transaction is complete.
e) Gts: the timestamp at which the global commit or rollback of the transaction completes; it is generated by the "global Gts generation cluster" and is guaranteed to be globally ordered.
TID is used as the unique number of a global transaction in the system and is assigned when the global transaction starts; the TIDs of transactions on the same host node are ordered, so TID can be regarded as embodying a partial order. Gts represents the global timestamp, which gives the order of the transaction from the global viewpoint of the system, so it can be regarded as a total order. Thus both TID and Gts can uniquely identify a transaction, but they differ in the meaning of the order they express.
f) Nodes: the data nodes involved in the current transaction, i.e. a set of data nodes.
Further, as shown in Fig. 4, to manage the state of a transaction on each resource management node, the present invention maintains a local transaction state table for each transaction, called LocalTS. A LocalTS structure exists on every RM. For a global transaction T, there are multiple corresponding local transaction states, each maintained on one of the RMs involved in T. The local transaction state table contains the 4 fields {TID, Lowts, Uppts, Status}, with the following meanings:
a) TID: the unique transaction identifier, allocated when the transaction starts; it occupies 8 bytes and has the same meaning as TID in the global transaction state;
b) Lowts: same meaning as Lowts in the global transaction state; occupies 8 bytes;
c) Uppts: same meaning as Uppts in the global transaction state; occupies 8 bytes;
d) Status: describes the local state of the transaction and occupies 4 bytes. A transaction has 4 local states: running (Running), locally validated (Validated), locally committed (Committed) and locally rolled back (Aborted).
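Because each involved RM keeps its own [Lowts, Uppts] for the same transaction, the coordinator eventually has to reconcile them. The sketch below shows one plausible reconciliation, intersecting the intervals; this is an assumption made for illustration, since the DTA rules themselves are described elsewhere in the document. `merge_lifecycles` is a hypothetical name, and treating the interval as closed (empty only when Lowts > Uppts) is likewise an assumption.

```python
import math
from typing import Iterable, Optional, Tuple

Interval = Tuple[float, float]  # (Lowts, Uppts)


def merge_lifecycles(global_interval: Interval,
                     local_intervals: Iterable[Interval]) -> Optional[Interval]:
    """Intersect the global life cycle with every local one.

    Returns the narrowed interval, or None when the intersection is empty,
    in which case no logical commit timestamp can be chosen.
    """
    low, upp = global_interval
    for local_low, local_upp in local_intervals:
        low = max(low, local_low)
        upp = min(upp, local_upp)
    return (low, upp) if low <= upp else None
```

For example, starting from the initial life cycle [0, +∞], local intervals [5, 20] and [10, 30] narrow to [10, 20], from which a T.cts could be picked; local intervals [5, 8] and [10, 30] do not overlap, so under this sketch the transaction would have no valid commit timestamp.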
Further, as shown in Fig. 5, in the basic structure of a data item in the present invention, each data item includes two parts: data item header information and several data versions. These two parts contain two groups of elements related to distributed transaction processing, which decouple linear consistency from distributed transaction consistency. Their meanings are as follows:
First group: the basis for linear consistency.
a) gts: the global timestamp, also called the global linear mark, which indicates the globally unique order of a transaction in the distributed system; each data version has one gts. It is assigned as follows:
1. While the transaction that generated this data item version has not yet globally committed, the gts field is reused to record the global TID of that transaction, uniquely identifying the transaction that is generating the version and making it easy to locate the global transaction state of that transaction.
2. When the transaction that generated this data item version commits globally, gts is set to the global timestamp Gts recorded in the transaction's global transaction state, indicating the global order of the transaction within the entire distributed system and thereby achieving linear consistency.
b) info_bit: a flag occupying 1 bit, which identifies whether the gts field currently records a Gts or a TID. If info_bit is 1, the field records a Gts; if it is 0, the field records a TID.
Second group: the basis for distributed transaction consistency.
a) wts: records the logical timestamp of the transaction that created the data item version; each data item version maintains one wts;
b) rts: records the logical timestamp of the most recent transaction that read the data item; each data item maintains one rts, so it is kept in the data item header information;
To guarantee distributed transaction consistency, each data item also records which transactions are reading and writing it, as follows:
a) RTlist: records the list of active transactions that have accessed the latest version of the data item; each element of the list records the TID of one transaction;
b) WT: records the active transactions that intend to modify (write) the data item; it takes the form of a list, and each element records the TID of an active transaction;
c) Through RTlist and WT, the logical life cycle of a transaction can be adjusted.
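The reuse of the gts field can be made concrete with a small sketch. The class and method names (`DataVersion`, `DataItem`, `created_by`, `mark_globally_committed`, `creator_tid`) are hypothetical, and passing `wts` in at creation time is a simplification made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DataVersion:
    value: bytes
    wts: int           # logical timestamp of the creating transaction
    gts: int           # holds the creator's TID until global commit, then its Gts
    info_bit: int = 0  # 0: gts field holds a TID, 1: it holds a Gts

    @classmethod
    def created_by(cls, tid: int, value: bytes, wts: int) -> "DataVersion":
        """New version written by a transaction that has not yet committed."""
        return cls(value=value, wts=wts, gts=tid, info_bit=0)

    def mark_globally_committed(self, gts: int) -> None:
        """On global commit, overwrite the TID with the transaction's Gts."""
        self.gts = gts
        self.info_bit = 1

    def creator_tid(self) -> Optional[int]:
        """TID of the in-flight creator, or None once a Gts is stored."""
        return self.gts if self.info_bit == 0 else None


@dataclass
class DataItem:
    """Header information plus the version chain of one data item."""
    rts: int = 0                                     # latest reader's logical timestamp
    rtlist: Set[int] = field(default_factory=set)    # TIDs of active readers
    wt: List[int] = field(default_factory=list)      # TIDs of active writers
    versions: List[DataVersion] = field(default_factory=list)
```

Keeping gts and info_bit on the version while rts, RTlist and WT live in the header is what separates the two concerns: a reader checking global order only needs the version fields, while life cycle adjustment only needs the header and wts.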
Further, regarding the read set and write set data structures of a transaction:
The global read set of transaction T records all data items read by T; the set of data items read by T on a given resource management node RM constitutes the local read set of T on that RM, which is a subset of the global read set; the union of the local read sets of T on all relevant RMs equals the global read set of T. In the present invention, each RM involved in transaction T maintains the local read set of that transaction. Recording the read set of T serves 2 purposes:
a) When transaction T commits, the rts field of the data items in the read set is updated;
b) When transaction T commits, T is deleted from the RTlist of each read-set element x;
In the present invention a linked list is used to maintain the read set of transaction T; each list node represents one read-set element x and consists of 4 fields:
a) BlockAddress: occupies 8 bytes; the block address, indicating the block in which data item x resides;
b) Offset: occupies 4 bytes; the offset within the block, indicating the offset of data item x in the block;
c) Size: occupies 4 bytes; the data item size, recording the size of the tuple corresponding to data item x, i.e. the number of bytes in the Value field.
d) Value: variable length; the concrete value of the data item, i.e. the actual recorded value of data item x;
The global write set of transaction T records all data items that the transaction needs to update; the local write set of T on a given RM records which data items on that RM the transaction will update. The local write set of a transaction is a subset of the global write set, and the union of the local write sets of T on all RMs equals the global write set of T. Recording the write set of T serves 2 purposes:
a) In the validation stage, the host node of T divides the global write set into several local write sets according to the RM on which each write-set element resides, and sends each local write set to the relevant RM in a message, requiring the RM to create new data versions from the values of the elements in the local write set;
b) In the commit stage, each participant RM of transaction T cleans up the WT of every write-set element;
In the present invention a linked list is used to maintain the write set of transaction T; each list node corresponds to one write-set data item y and consists of 5 fields:
a) BlockAddress: occupies 8 bytes; the block address, indicating the block in which data item y resides;
b) Offset: occupies 4 bytes; the offset within the block, indicating the offset of data item y in the block;
c) Size: occupies 4 bytes; the data item size, recording the size of the tuple corresponding to data item y, i.e. the number of bytes in the NewValue field.
d) NewValue: variable length; the concrete value of the data item, i.e. the updated value of data item y;
e) OperationType: occupies 1 byte; indicates whether the operation is an update, an insertion or a deletion: the value 0 indicates update, 1 indicates insertion, and 2 indicates deletion.
In the read stage of transaction T, the global write set of T is maintained on the host node of T; in the validation stage of T, T divides the global write set into several local write sets according to the RM on which each write-set element resides, and sends each local write set to the corresponding RM to be maintained there.
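Splitting the global write set into local write sets, as done in the validation stage, can be sketched as below. The names are hypothetical, and `rm_of`, the function that maps a write-set element to the RM holding it, is an assumption standing in for whatever routing information the host node actually uses. The sketch also uses a Python list in place of the linked list named in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, Iterable, List


class OperationType(IntEnum):
    UPDATE = 0
    INSERT = 1
    DELETE = 2


@dataclass(frozen=True)
class WriteSetEntry:
    block_address: int          # 8-byte block address
    offset: int                 # 4-byte offset within the block
    new_value: bytes            # variable-length updated value
    operation_type: OperationType

    @property
    def size(self) -> int:
        """The Size field: number of bytes in NewValue."""
        return len(self.new_value)


def split_write_set(global_write_set: Iterable[WriteSetEntry],
                    rm_of: Callable[[WriteSetEntry], int]
                    ) -> Dict[int, List[WriteSetEntry]]:
    """Group write-set elements by the RM on which they reside."""
    local_sets: Dict[int, List[WriteSetEntry]] = defaultdict(list)
    for entry in global_write_set:
        local_sets[rm_of(entry)].append(entry)
    return dict(local_sets)
```

Each value of the returned dictionary is one local write set; concatenating them gives back exactly the elements of the global write set, which mirrors the union property stated above.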
Further, in the present invention, the coordinator node (host node) and the data nodes, and the coordinator node (host node) and the "global Gts generation cluster", need to communicate by means of messages. According to the sender and the receiver, the present invention divides the communication protocol and messages into the following 4 major classes:
1. Messages sent from the host node to a data node, mainly of 3 kinds:
a) Read data request message, ReadRequestMessage: in the read stage of transaction T, the host node sends this message to an RM, requesting to read the relevant data on that RM. The message includes the following 4 fields:
1. TID: the transaction identifier; occupies 8 bytes; indicates which transaction is requesting to read data;
2. Lowts: the lower bound of the transaction's logical timestamp; occupies 8 bytes; indicates the lower bound of the logical timestamp of transaction T on the host node;
3. Uppts: the upper bound of the transaction's logical timestamp; occupies 8 bytes; indicates the upper bound of the logical timestamp of transaction T on the host node.
4. ReadPlan: the query plan for reading data item x;
b) Validation request message, ValidateRequestMessage: in the validation stage of transaction T, the host node sends this message to a data node, requesting the data node to perform local validation of transaction T. The message includes the following 4 fields:
1. Type: the message type; occupies 1 byte; indicates that the message is a validation request message;
2. TID: the transaction identifier; occupies 8 bytes; indicates which transaction the data node must locally validate;
3. Lowts: the lower bound of the transaction's logical timestamp; occupies 8 bytes; indicates the lower bound of the logical timestamp of transaction T on the host node;
4. Uppts: the upper bound of the transaction's logical timestamp; occupies 8 bytes; indicates the upper bound of the logical timestamp of transaction T on the host node.
c) Commit-write/rollback request message, CommitOrAbortRequestMessage: in the commit-write/rollback stage of transaction T, the host node sends this message to a data node, requesting the data node to complete the local commit or rollback. The message includes the following 5 fields:
1. Type: the message type; occupies 1 byte; indicates that the message is a commit-write/rollback request message;
2. TID: the transaction identifier; occupies 8 bytes; indicates which transaction the data node must locally commit;
3. IsAbort: whether to roll back; occupies 1 byte; the value 1 indicates that transaction T must be rolled back, and any other value indicates that T need not be rolled back.
4. Cts: the logical commit timestamp of the transaction; occupies 8 bytes; indicates the logical commit timestamp that the host node has chosen for T;
5. Gts: the global timestamp of the transaction; occupies 8 bytes; indicates the global commit timestamp that the global Gts generation cluster has allocated to transaction T.
2. Messages sent from a data node to the host node, mainly of 2 kinds:
a) Read request feedback message, ReadReplyMessage: in the read stage of transaction T, the data node returns the value of the data item read to the host node. The message includes the following 6 fields:
1. TID: the transaction identifier; occupies 8 bytes; indicates which transaction the read request feedback message belongs to;
2. IsAbort: whether to roll back; occupies 1 byte; the value 1 indicates that transaction T must be rolled back, and any other value indicates that T need not be rolled back.
3. Lowts: the lower bound of the transaction's logical timestamp; occupies 8 bytes; indicates the lower bound of the logical timestamp of transaction T on the local data node;
4. Uppts: the upper bound of the transaction's logical timestamp; occupies 8 bytes; indicates the upper bound of the logical timestamp of transaction T on the local data node.
5. Size: occupies 4 bytes; specifies the size of the Value field;
6. Value: the concrete value of the data item; records the value of the data item read;
b) Local validation feedback message, LocalValidateMessage: in the validation stage of transaction T, after the data node completes local validation, it sends this message to the host node. The message includes the following 5 fields:
1. Type: the message type; occupies 1 byte; indicates that the message is a local validation feedback message;
2. TID: the transaction identifier; occupies 8 bytes; indicates which transaction the local validation feedback message belongs to;
3. IsAbort: whether to roll back; occupies 1 byte; the value 1 indicates that transaction T must be rolled back, and any other value indicates that T need not be rolled back.
4. Lowts: the lower bound of the transaction's logical timestamp; occupies 8 bytes; indicates the lower bound of the logical timestamp of transaction T on the data node;
5. Uppts: the upper bound of the transaction's logical timestamp; occupies 8 bytes; indicates the upper bound of the logical timestamp of transaction T on the data node.
3. Messages sent from the host node to the global Gts generation cluster, mainly of 1 kind: the global timestamp request message, GtsRequsetMessage. After transaction T passes validation, the host node sends this message to the global Gts generation cluster to request a global timestamp for transaction T. The message mainly includes the following 2 fields:
a) Type: the message type; occupies 1 byte; indicates that the message is a global timestamp request message;
b) TID: the transaction identifier; occupies 8 bytes; indicates for which transaction a global timestamp is requested.
4. Messages sent from the global Gts generation cluster to the host node, mainly of 1 kind: the global timestamp request feedback message, GtsReplyMessage. After the global Gts generation cluster allocates a global timestamp for transaction T, it sends the timestamp to the host node in this message. The message mainly includes the following 3 fields:
a) Type: the message type; occupies 1 byte; indicates that the message is a global timestamp request feedback message;
b) TID: the transaction identifier; occupies 8 bytes; indicates to which transaction the global timestamp has been allocated;
c) Gts: the global timestamp of the transaction; occupies 8 bytes; indicates the global timestamp value of the transaction.
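The fixed-width fields listed above map naturally onto a packed binary layout. The sketch below encodes the validation request message (Type, TID, Lowts, Uppts) with Python's `struct` module. Several details are assumptions not stated in the text: big-endian byte order, the numeric type code `VALIDATE_REQUEST_TYPE`, and the use of the largest unsigned 64-bit value to stand for an Uppts of +∞.

```python
import struct
from typing import NamedTuple

VALIDATE_REQUEST_TYPE = 2        # hypothetical type code
UPPTS_INFINITY = 2**64 - 1       # assumption: encodes an unbounded Uppts

# ">" = big-endian without padding; B = 1 byte, Q = 8 bytes each.
_VALIDATE_FORMAT = struct.Struct(">BQQQ")


class ValidateRequest(NamedTuple):
    tid: int
    lowts: int
    uppts: int


def encode_validate_request(msg: ValidateRequest) -> bytes:
    """Serialize Type (1 byte) + TID, Lowts, Uppts (8 bytes each)."""
    return _VALIDATE_FORMAT.pack(VALIDATE_REQUEST_TYPE,
                                 msg.tid, msg.lowts, msg.uppts)


def decode_validate_request(data: bytes) -> ValidateRequest:
    """Parse the 25-byte wire form back into its fields."""
    msg_type, tid, lowts, uppts = _VALIDATE_FORMAT.unpack(data)
    if msg_type != VALIDATE_REQUEST_TYPE:
        raise ValueError("not a validation request message")
    return ValidateRequest(tid, lowts, uppts)
```

With these choices the message is exactly 1 + 8 + 8 + 8 = 25 bytes on the wire; the other fixed-size messages could be laid out the same way, while messages carrying a variable-length Value would need the Size field to be read first.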
Based on the above description of the framework of the distributed system for guaranteeing transaction consistency and linear consistency, the present invention also provides a multi-level consistency method for guaranteeing transaction consistency and linear consistency, comprising the following steps:
1) Establish a system consistency model (United Consistency Mode, UCM for short) capable of realizing multi-level consistency.
2) Determine the consistency level that the distributed system needs to achieve according to actual business requirements, determine from the established system consistency model the consistency execution algorithm suited to that consistency level, and execute the distributed transactions and single-machine transactions in the distributed system.
In step 1) above, the method of establishing the system consistency model comprises the following steps:
1.1) Perform transaction concurrency control using an OCC (Optimistic Concurrency Control, hereinafter OCC) strategy based on DTA (Dynamic Timestamp Allocation, hereinafter DTA), establishing the RUC-CC algorithm, which guarantees transaction consistency in the distributed system.
1.2) Based on the global timestamps generated by the global Gts generation cluster and on the global transaction state, establish the linear consistency guarantee algorithm based on global timestamps, which guarantees linear consistency between transactions.
1.3) Using the method of reading data twice, establish the two-read linear consistency guarantee algorithm, which guarantees linear consistency between transactions.
1.4) Combine the transaction consistency and linear consistency algorithms of steps 1.1) to 1.3) with the MVCC (Multi-Version Concurrency Control, hereinafter MVCC) algorithm to establish a system consistency model that can satisfy multiple consistency levels.
In step 1.1) above, in order to realize decentralized distributed transaction scheduling and guarantee transaction consistency, the present invention performs transaction concurrency control with a DTA-based OCC strategy. The strategy mainly applies the algorithmic framework of OCC and combines it with DTA to reduce the transaction rollback rate, thereby improving the concurrent processing capability for transactions. For convenience of description, the present invention names it the RUC-CC algorithm.
As shown in Fig. 6, under RUC-CC scheduling the life cycle of a transaction is divided into two phases: the global initialization phase and the global execution phase. Both phases are completed, under the coordination of the host node, on the host node and on the data nodes involved in the transaction's operations. The workflow of each phase is therefore refined below by node. Specifically, it comprises the following steps:
1.1.1) Global initialization phase: for a transaction T sent by a client, the corresponding initialization work is completed on the host node.
1.1.2) Global execution phase: the global execution phase of a transaction is divided into three stages: the read stage, the validation stage, and the write-commit/rollback stage. Under the coordination of the host node, each RM involved in the operations executes the transaction, after which the entries of committed or rolled-back transactions are cleared from the transaction state table.
In step 1.1.1) above, the global initialization phase completes, on the host node, the initialization work for a transaction T sent by a client, and comprises the following steps:
1.1.1.1) Assign a globally unique transaction identifier TID to transaction T;
1.1.1.2) Record the state of transaction T in GlobalTS on the host node: Status is set to Grunning, and Lowts and Uppts are initialized to 0 and +∞ respectively.
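The two initialization steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class `GlobalTxnState`, the function `global_init`, and the counter-based TID source are hypothetical names introduced here, and uniqueness of TIDs across several host nodes is not modeled.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class GlobalTxnState:
    """One GlobalTS entry on the host node (field names follow the text)."""
    tid: int
    status: str = "Grunning"
    lowts: float = 0
    uppts: float = math.inf


# Hypothetical TID source; a real system needs cluster-wide uniqueness.
_tid_counter = itertools.count(1)


def global_init(global_ts: dict) -> GlobalTxnState:
    """Steps 1.1.1.1 and 1.1.1.2: assign a TID and register T in GlobalTS."""
    state = GlobalTxnState(tid=next(_tid_counter))
    global_ts[state.tid] = state
    return state
```

The interval [0, +∞) leaves every logical commit timestamp open at the start; later stages only ever narrow it.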
In step 1.1.2) above, the global execution phase specifically comprises the following steps:
1.1.2.1) Read stage: transaction T reads the required data according to its execution logic and writes its updates to the local memory of transaction T (that is, the local memory of the host node).
1.1.2.2) Validation stage: transaction T checks whether it conflicts with other transactions and obtains a validation result.
1.1.2.3) Write-commit or rollback stage: according to the result of the validation stage, transaction T chooses to perform either the write-commit or the rollback.
In step 1.1.2.1) above, when transaction T needs to read data item x in the read stage, the process comprises the following steps:
I. The host node of transaction T sends a read data request message, ReadRequestMessage rrqm, for reading x to the data node where data item x resides.
The four fields of the read data request message ReadRequestMessage rrqm take the following values:
A) TID: the TID of transaction T;
B) Lowts: the Lowts of transaction T on the host node;
C) Uppts: the Uppts of transaction T on the host node;
D) ReadPlan: the query plan with which transaction T reads x;
II. After receiving message rrqm, the data node where data item x resides first creates or updates the entry for transaction T in its local transaction state table, then searches for the visible version of data item x within the logical life cycle of transaction T, and sends a read request feedback message rrpm to the host node of transaction T.
III. After receiving the read request feedback message rrpm sent by the data node, the host node of transaction T judges whether a rollback is needed. If so, the transaction enters the global rollback stage; otherwise the transaction continues executing.
In step II of step 1.1.2.1) above, the data node where data item x resides sends the read request feedback message to the host node of transaction T as follows:
1. Check whether the local transaction state table LocalTS of the data node contains information about transaction T:
A) If not, initialize the information of transaction T there, i.e. insert a record into LocalTS whose values are rrqm.TID, rrqm.Lowts, rrqm.Uppts and the status Running;
B) If so, i.e. transaction T accessed other data items on this data node before reading data item x, update the information of transaction T so that T.Lowts = max(T.Lowts, rrqm.Lowts) and T.Uppts = min(T.Uppts, rrqm.Uppts).
2. Check whether the lower bound of the logical commit timestamp of transaction T does not exceed its upper bound, i.e. whether T.Lowts is less than or equal to T.Uppts:
If so, continue reading data item x;
Otherwise, update the state of transaction T in LocalTS to Aborted, i.e. T.Status = Aborted, and return an Abort message to the host node of transaction T, i.e. send the host node a read request feedback message rrpm with rrpm.IsAbort = 1;
3. The data node finds the suitable visible version of data item x according to the logical life cycle [Lowts, Uppts] of transaction T.
To find the suitable visible version of x, checking starts from the latest committed version. If T.Uppts is greater than the wts of the latest version, the latest version is the suitable visible version. Otherwise it is not, and the previous versions are examined in turn until the first data version x.v satisfying T.Uppts > wts is found, where wts is the creation timestamp of x.v.
4. After the suitable version x.v of data item x is found, modify the Lowts of transaction T so that T.Lowts > x.v.wts (eliminating write-read anomalies). In addition, if the version found is the latest version of data item x, the following operations are performed:
A) Check whether the WT of x.v is empty (WT records the transaction identifier of the transaction that is modifying x and has passed validation). If it is not empty (assume its value is TID1 and the corresponding transaction is T1), adjust the Uppts of transaction T so that T.Uppts < T1.Lowts (eliminating read-write conflicts);
B) Add T.TID to the RTlist of x.v;
C) Add data item x to the local read set of transaction T;
5. Return the read request feedback message, ReadReplyMessage rrpm, to the host node of transaction T.
The Lowts and Uppts fields of rrpm record the lower and upper bounds of the logical timestamp of transaction T on the current data node, and Value records the value of the data item read.
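The data-node side of the read stage (items 1 to 5 above) can be sketched as follows. This is an illustrative sketch only. The names `Version`, `LocalTxn` and `handle_read`, the use of a dict in place of the rrpm message, and the use of integer timestamps are assumptions made here. The case in which no version satisfies T.Uppts > wts is not covered by the text and is assumed here to abort the read.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Version:
    wts: int                      # creation timestamp of the version
    value: object
    wt: object = None             # TID of a validated writer, or None
    rtlist: list = field(default_factory=list)


@dataclass
class LocalTxn:
    tid: int
    lowts: float
    uppts: float
    status: str = "Running"
    readset: list = field(default_factory=list)


def handle_read(local_ts, versions, key, tid, lowts, uppts):
    """versions[key] is ordered newest first; the returned dict stands in for rrpm."""
    t = local_ts.get(tid)
    if t is None:                                    # item 1 A: first contact
        t = local_ts[tid] = LocalTxn(tid, lowts, uppts)
    else:                                            # item 1 B: narrow the interval
        t.lowts, t.uppts = max(t.lowts, lowts), min(t.uppts, uppts)
    if t.lowts > t.uppts:                            # item 2: empty interval
        t.status = "Aborted"
        return {"IsAbort": 1}
    chain = versions[key]
    v = next((c for c in chain if t.uppts > c.wts), None)   # item 3
    if v is None:            # not covered by the text; assumed to abort
        t.status = "Aborted"
        return {"IsAbort": 1}
    t.lowts = max(t.lowts, v.wts + 1)                # item 4: T.Lowts > x.v.wts
    if v is chain[0]:                                # latest version only
        if v.wt is not None and v.wt in local_ts:    # 4 A: stay before the writer
            t.uppts = min(t.uppts, local_ts[v.wt].lowts - 1)
        v.rtlist.append(tid)                         # 4 B
        t.readset.append(key)                        # 4 C
    return {"IsAbort": 0, "Lowts": t.lowts, "Uppts": t.uppts, "Value": v.value}
```

Note that item 4 can itself make Lowts exceed Uppts; in the text that case is caught by the host node when it processes rrpm.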
In step III of step 1.1.2.1) above, after the host node of transaction T receives the read request feedback message rrpm from the data node, it judges whether a rollback is needed as follows:
1. Check whether the message received is an Abort, i.e. whether rrpm.IsAbort equals 1. If so, enter the global rollback stage; otherwise continue;
2. Update the state of transaction T in GlobalTS: T.Lowts = max(T.Lowts, rrpm.Lowts) and T.Uppts = min(T.Uppts, rrpm.Uppts);
3. Check whether T.Lowts is greater than T.Uppts in GlobalTS. If so, enter the global rollback stage; otherwise continue executing the transaction.
In rule 3, if the host node decides to roll back transaction T, it must change the state of T in GlobalTS to Gaborting and notify the related child nodes to perform local rollback;
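The host-node rules above can be sketched as follows. The class `GlobalTxn` and the function `on_read_reply` are hypothetical names, the rrpm message is represented by a dict, and the notification of child nodes is left out. Setting Gaborting in rule 1 as well as in rule 3 is an assumption of this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class GlobalTxn:
    lowts: float = 0
    uppts: float = math.inf
    status: str = "Grunning"


def on_read_reply(t: GlobalTxn, rrpm: dict) -> bool:
    """Return True if T may continue, False if it must enter global rollback."""
    if rrpm.get("IsAbort") == 1:              # rule 1: the data node aborted T
        t.status = "Gaborting"                # assumed: same state as in rule 3
        return False
    t.lowts = max(t.lowts, rrpm["Lowts"])     # rule 2: intersect the intervals
    t.uppts = min(t.uppts, rrpm["Uppts"])
    if t.lowts > t.uppts:                     # rule 3: interval became empty
        t.status = "Gaborting"
        return False
    return True
```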
From the above rules, in the read stage of transaction T, communication occurs mainly between the host node of transaction T and the related child RMs. Each data item that transaction T reads successfully requires two communications:
A) the host node of transaction T sends the read data request message to the related child RM;
B) the related child data node sends the read request feedback message to the host node;
The read stage therefore performs at most 2n communications, and the maximum traffic is n × (request message size + response message size), where n is the number of remote reads. One optimization that saves communications is the following: when transaction T needs several data items from the same related child data node, it can send a single request and read these data items in a batch.
In step 1.1.2.2) above, in the validation stage, transaction T checks whether it conflicts with other transactions as follows:
I. The host node of transaction T first changes the state of transaction T in GlobalTS to Gvalidating, then sends the validation request message vrm and the local write set to each RM that T involves.
The Lowts and Uppts fields of the validation request message ValidateRequestMessage vrm record the lower and upper bounds of the logical timestamp of transaction T in GlobalTS. The write set local to each data node is sent to that data node together with the validation request message.
II. Each data node involved in transaction T performs the local validation operation after receiving the validation request message vrm.
III. After receiving the local validation feedback messages lvm from all data nodes, the host node of transaction T decides from the received messages whether transaction T passes validation.
In step II of step 1.1.2.2) above, the local validation operation executes the following steps in order:
1. Update T in LocalTS: T.Lowts = max(T.Lowts, vrm.Lowts) and T.Uppts = min(T.Uppts, vrm.Uppts). Note that what is updated here is the logical timestamp information of the transaction in the local transaction state table, which is used for transaction concurrency control, i.e. for guaranteeing transaction consistency;
2. Check whether T.Lowts is greater than T.Uppts. If so, validation fails and an Abort message is returned to the host node of transaction T (which in turn causes a global rollback), i.e. a local validation feedback message lvm with lvm.IsAbort = 1 is sent. Otherwise proceed to the next validation step;
3. For each data item y in the write set, check whether the WT of y is empty:
If it is not empty, another transaction is modifying data item y and has already entered the validation stage, so transaction T must be rolled back to eliminate the write-write conflict, i.e. an Abort message is sent to the host node of transaction T;
Otherwise, continue with the next operation: lock the WT of data item y to prevent other concurrent transactions from modifying y concurrently (an advisory lock is placed on data item y so that modifications of the WT of y are mutually exclusive).
4. Update the WT of each data item y in the write set to T.TID (indicating that transaction T, which has entered the validation stage, will modify y), and adjust the timestamp lower bound of transaction T in the local transaction state table so that it is greater than the rts of y, i.e. T.Lowts = max(T.Lowts, y.cts + 1) (eliminating read-write conflicts). In the implementation, the lock-free CAS technique (CAS, Compare and Swap, is a well-known lock-free algorithm) is used to assign the WT of y in order to improve performance (ordinary approaches, such as locking first and then assigning the WT of y, are not excluded).
5. Check whether T.Lowts is greater than T.Uppts. If so, validation fails: roll back locally, then return an Abort message to the host node of T. Otherwise proceed to the next validation step;
6. For each element y in the write set, adjust the timestamps of transaction T or of the transactions in RTlist to eliminate read-write conflicts.
The adjustment rules are:
A) Read-write conflict resolution: the current transaction writes, and the reads of other transactions have already completed, so the write of the current transaction is postponed until after the read operations of the transactions that have completed reading.
First, find all transactions T1 that are committed or have passed local validation, and adjust the lower bound of the timestamp interval of T itself so that it is greater than the Uppts of T1, i.e. T.Lowts = max(T.Lowts, T1.Uppts + 1).
Then check whether the timestamp interval of transaction T is still valid. If it is not, return an Abort message; otherwise update the local transaction state of transaction T to Validated, i.e. T.Status = Validated, and proceed to the next adjustment.
B) Read-write conflict resolution: the current transaction writes while other ongoing transactions read, so the other transactions must not read the data written by the current transaction:
Find all transactions T2 in the Running state and adjust the timestamp interval of T2 so that its upper bound is less than the Lowts of T, i.e. T2.Uppts = min(T2.Uppts, T.Lowts - 1). If Lowts > Uppts holds for transaction T2, T2 can be notified that it should roll back.
7. Reaching this step indicates that transaction T has passed local validation. Create the new version of y from the updated value of y, but set a flag indicating that the new version is not globally committed and is not externally visible under the RUC-CC protocol;
8. Return the local validation feedback message lvm of transaction T to the host node of transaction T, where the Lowts and Uppts fields of lvm record the lower and upper bounds of the logical timestamp of transaction T on the local data node. Note that if local validation of transaction T fails, the state of transaction T in LocalTS must be updated to Aborted, i.e. T.Status = Aborted.
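The eight local validation steps can be sketched in simplified, single-threaded form as follows. Several points are assumptions of this sketch rather than statements of the text. The names `Txn`, `Item` and `local_validate` are hypothetical. The lock or CAS on WT is replaced by a plain assignment. The transactions T1 and T2 of step 6 are taken from the RTlist of each written item. In step 4 the sketch raises the lower bound above `y.rts`, following the wording "greater than the rts of y". Releasing the WT marks on a local abort is likewise assumed.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Txn:
    tid: int
    lowts: float = 0
    uppts: float = math.inf
    status: str = "Running"


@dataclass
class Item:
    rts: int = 0
    wt: object = None                              # TID of the validated writer
    rtlist: list = field(default_factory=list)     # TIDs of readers
    pending: object = None                         # new version, not yet visible


def local_validate(local_ts, items, t, vrm_lowts, vrm_uppts, writes):
    """writes maps key -> new value; the returned dict stands in for lvm."""
    def abort():
        t.status = "Aborted"
        for k in writes:                           # release WT marks taken by T
            if items[k].wt == t.tid:
                items[k].wt = None
        return {"IsAbort": 1}

    t.lowts = max(t.lowts, vrm_lowts)              # step 1
    t.uppts = min(t.uppts, vrm_uppts)
    if t.lowts > t.uppts:                          # step 2
        return abort()
    if any(items[k].wt is not None for k in writes):   # step 3: write-write conflict
        return abort()
    for k in writes:                               # step 4 (CAS in a real system)
        items[k].wt = t.tid
        t.lowts = max(t.lowts, items[k].rts + 1)
    if t.lowts > t.uppts:                          # step 5
        return abort()
    readers = [local_ts[r] for k in writes for r in items[k].rtlist if r != t.tid]
    for r in readers:                              # step 6 A: T goes after finished reads
        if r.status in ("Committed", "Validated"):
            t.lowts = max(t.lowts, r.uppts + 1)
    if t.lowts > t.uppts:
        return abort()
    t.status = "Validated"
    doomed = []
    for r in readers:                              # step 6 B: running readers stay before T
        if r.status == "Running":
            r.uppts = min(r.uppts, t.lowts - 1)
            if r.lowts > r.uppts:
                doomed.append(r.tid)               # these should be told to roll back
    for k, value in writes.items():                # step 7: invisible new version
        items[k].pending = {"value": value, "globally_committed": False}
    return {"IsAbort": 0, "Lowts": t.lowts, "Uppts": t.uppts,
            "NotifyRollback": doomed}              # step 8
```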
In step III of step 1.1.2.2) above, after the host node of transaction T has received the local validation feedback messages lvm from all data nodes, it decides from the received messages whether transaction T passes validation, distinguishing the following cases:
1. If any lvm has its IsAbort field equal to 1, the transaction has not passed all local validations, so the host node decides to roll back transaction T globally. It updates the state of the transaction in GlobalTS to Gaborting and notifies all child nodes to complete the rollback, i.e. it sends the write-commit/rollback message coarm to the related data nodes with coarm.IsAbort = 1.
2. Otherwise, take the intersection of all received timestamp intervals of transaction T to obtain a new timestamp interval [T.Lowts, T.Uppts]. If T.Lowts > T.Uppts, decide to roll back the transaction globally, update the state of T in GlobalTS to Gaborting, and notify all child nodes to complete the rollback; otherwise proceed to the next step;
3. Decide that transaction T passes validation, and select a time point from the interval [T.Lowts, T.Uppts] at random as the logical commit timestamp of transaction T, assigning it to cts. For example, T.Lowts may be selected as the logical commit timestamp of T.
4. Update T.Lowts = T.Uppts = T.cts in GlobalTS, and update the state of the transaction in GlobalTS to Gcommitting. At this point the global transaction state is further regarded as Gcommitted; at the same time, the global Gts generation cluster is requested to assign a global timestamp, which is recorded in the Gts field of the global transaction state.
5. Notify the related data nodes to complete the commit, i.e. send the write-commit/rollback message coarm to the related data nodes with coarm.IsAbort = 0, where coarm.Cts and coarm.Gts record the logical timestamp and the global timestamp of the transaction respectively.
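The decision made by the host node in cases 1 to 3 can be sketched as follows. The function name `decide` and the dict representation of lvm are assumptions of this sketch. The request to the global Gts generation cluster and the sending of coarm are not modeled, and the lower bound is chosen as cts, which is the example given in case 3.

```python
def decide(lvms):
    """lvms: list of dicts with IsAbort, Lowts, Uppts. Returns (state, cts)."""
    if any(m["IsAbort"] == 1 for m in lvms):      # case 1: a local validation failed
        return "Gaborting", None
    low = max(m["Lowts"] for m in lvms)           # case 2: intersect all intervals
    upp = min(m["Uppts"] for m in lvms)
    if low > upp:
        return "Gaborting", None
    return "Gcommitting", low                     # case 3: pick cts = T.Lowts
```

Any value in the intersection is a legal commit timestamp; choosing the lower bound is only the simplest option.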
From the above rules, in the validation stage of transaction T, communication occurs mainly between the host node of T and the related child data nodes, and mainly comprises the following two steps:
A) the host node of T sends the validation request message and the write set local to the data node to each related child data node;
B) each related child data node sends the local validation feedback message to the host node of T.
The validation stage therefore needs at most 2m communications, and the traffic is m × (validation request message size + validation feedback message size) + global write set size, where m is the number of child data nodes related to transaction T.
One optimization: if the transaction is a local transaction, then in the validation stage, after the state of the transaction in the global transaction state table has been changed to Gvalidating, the local data node directly executes the validation process (i.e. items 2 to 7 of step II of step 1.1.2.2)):
1) If it is detected that the transaction needs to roll back, first change the local transaction state table to Aborted and complete the related rollback operations, then directly change the global transaction state table to Gaborted;
2) If the transaction passes validation, then once the validation stage has finished, the commit operation of the transaction is executed on the local data node.
In step 1.1.2.3) above, if transaction T passes validation it enters the write-commit stage, in which the updates that transaction T made to data are persisted to the database and some subsequent cleanup is done. In the write-commit stage the local data node performs the following operations:
I. For each read set element x:
A) Modify the rts of x so that it is greater than or equal to T.cts, i.e. x.rts = max(x.rts, T.cts);
B) Delete the transaction itself from RTlist(x);
II. For each write set element y:
C) Update the wts and rts of the new version of y, where wts = T.cts;
D) Update the rts of y: rts = max(y.rts, T.cts);
E) Persist y to the database and modify the flag to mark the version as externally visible under the RUC-CC protocol;
F) Empty the RTlist of y;
G) Empty the WT of y.
III. Empty the local read set and write set of transaction T;
IV. In LocalTS, update the Lowts of T to T.cts and its state to Committed (the local transaction state table here is used only for transaction consistency and is not involved in the synchronization of the global transaction state);
V. Return an ACK indicating that the commit completed successfully to the host node of transaction T;
After the host node of transaction T has received the commit-completion ACKs from all data nodes, it changes the global transaction state to Gcommitted and notifies each data node that it may clear the state of transaction T from its local transaction state table.
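Operations I to V on a data node can be sketched as follows. This sketch keeps a single current value per item instead of a version chain, and it represents persistence and the visibility flag by moving a `pending` value into `value`. These simplifications, and the names `Item` and `local_commit`, are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    value: object = None
    wts: int = 0
    rts: int = 0
    wt: object = None
    rtlist: list = field(default_factory=list)
    pending: object = None        # value created during validation, not yet visible


def local_commit(items, local_ts, tid, cts, readset, writeset):
    """Write-commit on one data node; returns the ACK sent to the host node."""
    for k in readset:                              # I
        x = items[k]
        x.rts = max(x.rts, cts)                    # A
        if tid in x.rtlist:                        # B
            x.rtlist.remove(tid)
    for k in writeset:                             # II
        y = items[k]
        y.value, y.pending = y.pending, None       # E: stands in for persist + flag
        y.wts = cts                                # C
        y.rts = max(y.rts, cts)                    # D
        y.rtlist.clear()                           # F
        y.wt = None                                # G
    readset.clear()                                # III
    writeset.clear()
    local_ts[tid] = {"Lowts": cts, "Status": "Committed"}   # IV
    return "ACK"                                   # V
```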
If transaction T does not pass validation, it enters the global rollback stage, in which transaction T is rolled back and the corresponding cleanup is done. The cleanup comprises:
I. For each read set element x, delete T from RTlist(x);
II. For each write set element y, clean up the newly created version of y and empty the WT of y;
III. Empty the local read set and write set of transaction T;
IV. Update the local transaction state of transaction T to Aborted;
V. Return an ACK indicating that the rollback is complete to the host node of transaction T;
From the above rules, in the commit/rollback stage of transaction T, communication occurs mainly between the host node of transaction T and the related child RMs, and mainly comprises the following two steps:
A) the host node of transaction T sends the commit/rollback request message to each related child RM;
B) each related child RM sends the commit/rollback completion response message to the host node.
The commit/rollback stage therefore performs at most 2m communications, and the traffic is m × (commit/rollback request message size + commit/rollback completion message size), where m is the number of child RMs related to transaction T.
After the host node of transaction T has received the rollback-completion ACKs from all RMs, it changes the global transaction state to Gaborted and notifies each RM that it may clear the state of transaction T from LocalTS. One optimization is that the system can send the cleanup messages to the RMs in batches to reduce the number of communications.
In step 1.2) above, the global-timestamp-based linear consistency guarantee algorithm uses the global timestamp that a read transaction obtains from the global Gts generation cluster when it starts to determine the order of the read transaction under the system's global clock, and thereby to determine which data satisfy linear consistency for the current read transaction. The algorithm treats the operations of a transaction over a period of time as an operation at a single point in time and, based on the global transaction state and the global timestamp, guarantees linear consistency between transactions. The main flow of the algorithm is as follows:
1.2.1) A client initiates a request for transaction T and establishes a connection through the Proxy, forming a session.
1.2.2) The Proxy parses transaction T and selects a host node to be responsible for managing the execution of the transaction.
1.2.3) When read transaction T starts, it obtains the global timestamp of the transaction from the global Gts generation cluster and records Gts in the global transaction state of the read transaction. The host node establishes connections with all data nodes related to the read transaction, packs the parsed query execution plan and Gts into a data packet, and delivers it to all related data nodes over the network.
1.2.4) Each data node performs its data read operation and determines the data items that satisfy the selection condition; then, for each data item that has multiple versions, it traverses the versions starting from the latest one until it finds the first visible version.
In step 1.2.4) above, for each logical data item, the first visible version is found as follows:
1.2.4.1) Execute the transaction state extraction algorithm to obtain the transaction state snapshot, relative to the current read transaction, of the write transaction that generated the version.
1.2.4.2) Using the Gts visibility judgment algorithm, judge whether the version is visible according to the global timestamp of the read transaction and the transaction state snapshot of the write transaction that generated the version.
In step 1.2.4.1) above, the purpose of the transaction state extraction algorithm is to find, for the current read transaction, the state at time T.Gts of the write transaction that generated a given version. If that write transaction has not been globally committed, its state may be updated while the read transaction executes, so a transaction snapshot that guarantees the consistency of read transaction T must be found. The algorithm proceeds as follows:
I. According to the gts field on data version v, read the global state record of the transaction that generated the version.
a) If a global timestamp Gts is recorded on the data item, the transaction that generated the data item is in the globally committed state Gcommitted, and its global timestamp is Gts.
b) If what is recorded on the data item is a TID, then, according to the RM.ID contained in the TID, send a request over the network to the remote host node and look up the transaction state record corresponding to the TID in the global transaction state list on the remote host node.
II. From the global state record obtained in the previous step and the Gts of the read transaction, reconstruct the transaction snapshot that guarantees the consistency of read transaction T.
Because the data version has been read into memory at this point, the snapshot information is recorded directly in the gts field of version v for use by the visibility judgment algorithm. The snapshot is obtained as follows:
a) If the status of the global transaction state record corresponding to data version v is Gcommitted, judge v.gts further:
If v.gts > Gts, the transaction that generated version v had not committed at time Gts; set v.gts to T.Gts + 1.
If v.gts ≤ Gts, the transaction was in the Gcommitted state at time Gts. Because, in the present invention, the global commit is confirmed when validation completes and the commit stage is entered, the transaction that generated version v is considered globally committed at time Gts, and v.gts need not be modified.
b) If the status of the global transaction state record corresponding to v is Grunning or Gvalidating, the transaction that generated version v was certainly not globally committed at time T.Gts.
c) If the status of the global transaction state record corresponding to v is Gaborted or Gaborting, judge v.gts further:
If v.gts > Gts, the transaction that generated version v was in the Grunning, Gvalidating or Gaborting state at time Gts (i.e. not globally committed);
If v.gts ≤ Gts, the transaction that generated version v was in the Gaborted state at time Gts.
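Rules a) to c) for reconstructing the snapshot can be sketched as a pure function. The function name `snapshot`, the three result labels, and the handling of states that the rules do not mention (raising an error) are assumptions of this sketch. The lookup of the global state record itself is not modeled.

```python
def snapshot(status, writer_gts, reader_gts):
    """Return (state of the writer as seen at reader_gts, value to store in v.gts).

    status      current status in the writer's global state record
    writer_gts  global timestamp in that record (may be None while running)
    reader_gts  T.Gts of the read transaction
    """
    if status == "Gcommitted":                       # rule a
        if writer_gts > reader_gts:
            return "uncommitted", reader_gts + 1     # committed only after T.Gts
        return "committed", writer_gts
    if status in ("Grunning", "Gvalidating"):        # rule b
        return "uncommitted", writer_gts
    if status in ("Gaborted", "Gaborting"):          # rule c
        if writer_gts > reader_gts:
            return "uncommitted", writer_gts
        return "aborted", writer_gts
    raise ValueError(f"state not covered by the rules: {status}")
```

Only the "committed" outcome can make a version visible to the read transaction; the other two outcomes both leave it invisible.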
In step 1.2.4.2) above, the Gts visibility judgment algorithm judges whether a version is visible according to the global timestamp of the read transaction and the transaction state snapshot of the write transaction that generated the version. The specific process is:
I. Each RM packs the visible data into a data packet and sends it to the host node over the network.
II. The host node gathers the data returned by all data nodes and returns it to the Proxy, which returns the data to the client with which the session was established; the current read transaction is then complete.
III. With this algorithm, data can be returned after one round of communication, plus one communication with the global Gts generation cluster.
The Gts visibility judgment algorithm runs on each data node and determines which data are globally linearly consistent. It judges whether a data item is visible according to the Gts of the read-only transaction and the gts field on the tuple. A data version v is guaranteed to be visible to read transaction T if two conditions hold together: the transaction that generated the version was globally committed at the given time point, and the transaction that modifies the version had not yet completed its global commit at the given time point. The visibility conditions are therefore as follows:
IV. If v.info_bit != 0 && v.gts < T.Gts holds, i.e. the gts field currently records a global timestamp Gts and v.gts is less than T.Gts, the version is visible and satisfies linear consistency.
V. For each logical data item, find the first physical version that satisfies the condition (the newest visible version) and take it as the version of the data item visible to this read transaction. The reason is that versions are traversed from newer to older, so the version read when the condition is first satisfied must be the most recently modified visible version of the data item.
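Conditions IV and V can be sketched as follows. The class `Version` and the function `first_visible` are hypothetical names; the fields `gts` and `info_bit` follow the text, with `info_bit` nonzero meaning that `gts` holds a global timestamp rather than a TID.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: object
    gts: int          # global timestamp, or a TID when info_bit == 0
    info_bit: int     # nonzero when gts holds a global timestamp


def first_visible(chain, reader_gts):
    """chain is ordered newest first; returns the first visible version or None."""
    for v in chain:
        if v.info_bit != 0 and v.gts < reader_gts:    # condition IV
            return v                                  # condition V: newest match wins
    return None
```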
In step 1.3) above, the two-round-read linear consistency guarantee algorithm omits the step of applying to the global Gts generation cluster for a global timestamp when a read transaction starts. Instead, it reads the data twice and computes the global timestamp of the current read operation in order to determine the position of the read operation in the global order, thereby obtaining data that satisfy global linear consistency. The main flow of the algorithm is as follows:
1.3.1) A client initiates a request for transaction T and establishes a connection through the Proxy, forming a session.
1.3.2) The Proxy parses transaction T and selects a host node to be responsible for managing the execution of the transaction.
1.3.3) First round of communication: the host node establishes connections with all data nodes related to the read transaction and sends a data acquisition request to all of them. All related data nodes execute the first-read algorithm and return the data to the host node, which determines the global timestamp Gts of the current read transaction T from all returned data items.
1.3.4) Second round of communication: the host node sends a data acquisition request to all related data nodes together with T.Gts; the second-read algorithm is executed on the data nodes, which return the data versions that satisfy linear consistency with respect to T.Gts.
1.3.5) The host node gathers the data returned by the data nodes and returns it to the Proxy, which returns the data to the client with which the session was established; the current read transaction is then complete.
In step 1.3.3) above, the first round of communication proceeds as follows:
1.3.3.1) On each data node, for the data items that satisfy the selection condition, traverse the multiple versions of each data item (starting from the current latest version) and judge each version v until a qualifying visible version is found, which is sent to the host node.
1.3.3.2) The host node determines the global timestamp of the current read transaction from the visible versions received, and thereby determines the global order of the current read transaction T. Specifically: traverse all data versions read and compare them to obtain the maximum gts value recorded on the data items, denoted gts_max. The global timestamp of the current read transaction is then gts_max, i.e. T.Gts = gts_max. At the same time, the last bit of T.Gts is set to 1 to mark the current transaction as a read transaction and to indicate that its position in the global order is after gts_max and before gts_max + 1.
In step 1.3.3.1) above, each version v is judged as follows:
I. If a global timestamp gts is already recorded on the data item, the transaction that generated the data item has been globally committed and its global timestamp is gts. The current version is then a visible version.
II. If the gts field on the data item records a TID, and the global transaction state record corresponding to the TID is found in the global transaction state list on this data node, the corresponding transaction has passed global validation. The current version is then a visible version, and the gts field on the data item is set to TID.Gts.
III. If the gts field on the data item records a TID, and no corresponding global transaction state record is found in the global transaction state list on this data node, local information cannot determine whether the version is visible. The version is then treated as invisible, and the loop continues by judging the visibility of the previous version.
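The first round can be sketched as follows. Versions are represented as dicts that carry either a `gts` or a `tid` key, and `local_states` maps a TID to the Gts found in the global transaction state list on the data node; these representations and the function names are assumptions of this sketch. The read-marker bit of T.Gts is left out, because its exact encoding is not specified in the text.

```python
def first_round_visible(chain, local_states):
    """chain is ordered newest first; returns the first locally decidable version."""
    for v in chain:
        if v.get("gts") is not None:          # rule I: already globally committed
            return v
        if v["tid"] in local_states:          # rule II: writer known on this node
            v["gts"] = local_states[v["tid"]]
            return v
        # rule III: undecidable locally, continue with the previous version
    return None


def read_gts_max(returned_versions):
    """Host node, step 1.3.3.2: T.Gts is the largest gts among the returned versions."""
    return max(v["gts"] for v in returned_versions)
```

Because rule III skips versions whose writers are unknown locally, the first round may return a version that is older than necessary; the second round corrects this by consulting the remote host node.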
In step 1.3.4) above, the second round of communication means that, on each data node, for the data items that satisfy the selection condition, the multiple versions of each data item are traversed (starting from the current latest version) and the following judgments are made on each version v to find the data version that satisfies linear consistency:
1.3.4.1) If a global timestamp gts is already recorded on the data item, the transaction that generated the data item has been globally committed and its global timestamp is gts. If T.Gts >= gts, the current version is a visible version.
1.3.4.2) If the gts field on the data item records a TID, and the global transaction state record corresponding to the TID is found in the global transaction state list on this data node, the corresponding transaction has passed global validation. If T.Gts >= TID.Gts, the current version is a visible version.
1.3.4.3) If the gts field on the data item records a TID, and no corresponding global transaction state record is found in the global transaction state list on this data node, local information cannot determine whether the version is visible. According to the global TID recorded on the data item and the RM.ID it contains, a request is sent over the network to the remote host node, and the transaction state record corresponding to the TID is looked up in the global transaction state list on the remote host node. If TID.status is Gcommitted or Gcommitting and T.Gts >= TID.Gts, the version is visible; otherwise the version is invisible, and the loop continues by judging the visibility of the previous version.
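The three second-round judgments can be sketched as follows. As before, versions are dicts with either a `gts` or a `tid` key and `local_states` maps TID to Gts. The callable `fetch_remote`, which stands in for the network request to the remote host node and returns a (status, gts) pair, is a hypothetical helper introduced here.

```python
def second_round_visible(chain, reader_gts, local_states, fetch_remote):
    """chain is ordered newest first; returns the version visible at reader_gts."""
    for v in chain:
        if v.get("gts") is not None:                  # 1.3.4.1
            if reader_gts >= v["gts"]:
                return v
        elif v["tid"] in local_states:                # 1.3.4.2
            if reader_gts >= local_states[v["tid"]]:
                return v
        else:                                         # 1.3.4.3: ask the remote host
            status, gts = fetch_remote(v["tid"])
            if status in ("Gcommitted", "Gcommitting") and reader_gts >= gts:
                return v
        # not visible: continue with the previous version
    return None
```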
It should be noted that determining the global timestamp of a read transaction by reading twice, as this algorithm does, makes it possible to judge the global clock order between read and write operations, but the order between two read transactions cannot be determined. The order of read operations therefore needs to be defined at the session layer in order to prevent out-of-order reads in complex situations.
In step 1.4) above, as shown in Fig. 7, combining different techniques yields multiple consistency levels with different system efficiencies, so as to meet different business requirements. In the present invention, consistency is divided into the following four levels (sorted by level of external consistency):
Full consistency: the system satisfies both linear consistency and transaction consistency; this is the highest consistency level.
Linear consistency: the system fully satisfies the requirements of linear consistency, but no guarantee of transaction consistency is required.
Transaction causal consistency: the system fully satisfies transaction consistency and also satisfies causal consistency.
Transaction logical consistency: the system fully satisfies transaction consistency, but no level of external consistency is required.
In step 1.4) above, based on the four levels, transaction consistency and linear consistency are combined as follows:
1.4.1) The DTA (Dynamic Timestamp Allocation) + OCC (Optimistic Concurrency Control) algorithm is combined with MVCC (Multi-Version Concurrency Control) to form the RUC-CC algorithm, so that the distributed system satisfies transaction logical consistency.
1.4.2) The MVCC algorithm is combined with the global Gts, so that global transaction operations in the distributed system satisfy linear consistency.
1.4.3) The global Gts is introduced into the RUC-CC algorithm, so that the distributed system satisfies transaction causal consistency.
1.4.4) The RUC-CC algorithm is combined with the linear consistency algorithm, so that the distributed system satisfies complete consistency.
1.4.5) The system execution efficiency of the consistency levels in steps 1.4.1) to 1.4.4) is compared.
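The mapping from consistency level to technique combination in steps 1.4.1) to 1.4.4) could be written down as a small lookup table. The sketch below is illustrative only; the enum `Level`, the table `TECHNIQUES` and the helper `needs_global_gts` are hypothetical names, and `COMPLETE` stands for the highest level, which unifies transaction consistency and linear consistency.

```python
from enum import Enum
from typing import Dict, Tuple


class Level(Enum):
    TRANSACTION_LOGICAL = 1  # transaction consistency only
    TRANSACTION_CAUSAL = 2   # transaction consistency + causal consistency
    LINEAR = 3               # linear consistency only
    COMPLETE = 4             # transaction consistency + linear consistency


# Technique combination per level, following steps 1.4.1) to 1.4.4).
# RUC-CC itself is DTA+OCC combined with MVCC.
TECHNIQUES: Dict[Level, Tuple[str, ...]] = {
    Level.TRANSACTION_LOGICAL: ("RUC-CC",),
    Level.LINEAR: ("MVCC", "global Gts"),
    Level.TRANSACTION_CAUSAL: ("RUC-CC", "global Gts"),
    Level.COMPLETE: ("RUC-CC", "linear consistency algorithm"),
}


def needs_global_gts(level: Level) -> bool:
    """Every level except transaction logical consistency relies on the global Gts."""
    return level is not Level.TRANSACTION_LOGICAL
```

For example, `TECHNIQUES[Level.TRANSACTION_CAUSAL]` lists the RUC-CC algorithm plus the global Gts, matching step 1.4.3).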
In step 1.4.1) above, the RUC-CC algorithm integrates the idea of MVCC into the DTA+OCC algorithm to achieve a more efficient serializable isolation level. The main argument is as follows:
a) The read-phase rules introduced in step 1.1.2.1) use MVCC. Suppose there are concurrent transactions T1 and T2, where T2 modifies data item x from version x0 to version x1 and T1 needs to read x. With the multi-version mechanism, T1 can read version x0, the version before T2's modification, which eliminates the write-read dependency between T2 and T1. In other words, no write-read conflict exists between any concurrent transactions in the system.
b) Because T1 can read the version before T2's modification, T1 logically reads first and T2 writes afterwards, so a read-write dependency exists. This preserves the key MVCC property that reads and writes do not block each other. The dependency is maintained through the RTlist structure in the data structure and is used to guarantee conflict serializability.
c) Because write-read conflicts are eliminated and read-write conflicts are maintained in a more lightweight way, the rollback rate of transactions can be reduced and scheduling efficiency improved while serializability is still guaranteed. The method therefore offers more efficient transaction scheduling.
In step 1.4.2) above, the linear consistency guarantee algorithms proposed in the present invention build on the idea of MVCC: from the multiple versions maintained in the system, the data that satisfies linear consistency is read. The three algorithms proposed in the present invention differ in how they judge the visibility of data that satisfies linear consistency, but all follow the MVCC approach. For example, in the algorithm based on the global timestamp Gts, version visibility is decided by comparing the gts field of the data item with the Gts of the read transaction; using multiple versions in this way makes it easy to locate the specific data version that satisfies linear consistency.
In step 1.4.3) above, combining the DTA+OCC algorithm with the global Gts generation technique guarantees causal consistency, which is one form of external consistency. The argument is as follows. Causal consistency requires that operations with a causal relationship be ordered according to their causal logic.
a) Suppose transactions T1 and T2 have the following partial order:
i. Transaction T1 first updates data item R, generating version R1, and then commits and obtains a global timestamp.
ii. Transaction T2 then updates data item R, generating version R2, and then commits and obtains a global timestamp.
b) Suppose transactions T3 and T4 both read data item R and run concurrently with T2, with the partial order that T3 precedes T4.
In this case, causal consistency is guaranteed as follows:
a) Each transaction belongs to a session (maintained when the client establishes a connection with the system service) and interacts with the "global Gts generation cluster" to obtain its position in the order. For example, the "global Gts generation cluster" guarantees that T3 is ordered before T4.
b) T3 commits before T4. Under the DTA+OCC algorithm, if the logical timestamp of T3 is adjusted to after T2, T3 reads version R2; if it is adjusted to before T2, T3 reads version R1. T3 may therefore read either R1 or R2.
c) T4 commits afterwards. If T3 read R1, T4 can only read R1 or R2, and no schedule exists in which it reads R0; if T3 read R2, the logical timestamp of T4 is necessarily greater than that of T2, so T4 can only read R2.
The DTA+OCC algorithm by itself guarantees transaction consistency, but without MVCC it suffers from a high rollback rate, so it is less efficient from the transaction scheduling perspective.
In step 1.4.4) above, combining the three techniques achieves "complete consistency" (unified consistency), which unifies transaction consistency and linear consistency and is the highest consistency level in a distributed database system. Guaranteeing "complete consistency" incurs considerable transaction validation overhead and therefore affects system performance to some extent. Because there is a trade-off between consistency and system efficiency, guaranteeing complete consistency requires sacrificing some system performance.
Under the "complete consistency" level, read operations must use the RUC-CC algorithm together with the timestamp-based linear consistency guarantee algorithm, so the execution flow of a read transaction is extended as follows:
a) First, when the transaction starts, it communicates with the "global Gts generation cluster" to obtain the global timestamp T.Gts of the current read transaction.
b) Then, in the read phase, the required data is obtained through the DTA+OCC algorithm.
c) Next, a linear consistency check is performed on the data read, by calling the global-timestamp-based linear consistency guarantee algorithm to decide whether the data currently read satisfies linear consistency.
i. If all data passes the linear consistency check, the data is collected and returned to the user.
ii. If some data fails the linear consistency check, there are two ways to handle it: (1) roll back the read transaction; (2) retry the read transaction, and roll it back if qualifying data still cannot be read after three retries (a configurable parameter).
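The extended read flow with the retry option (2) can be sketched as below. This is a hedged illustration, not the patented implementation: `get_gts`, `read_phase` and `passes_linear_check` are hypothetical callables standing in for the global Gts generation cluster, the DTA+OCC read phase and the timestamp-based linear consistency check. Reusing the same T.Gts across retries is an assumption; the text does not say whether a retry obtains a new timestamp.

```python
from typing import Callable, List, Optional, TypeVar

Row = TypeVar("Row")


def read_with_linear_check(
    get_gts: Callable[[], int],
    read_phase: Callable[[], List[Row]],
    passes_linear_check: Callable[[Row, int], bool],
    max_retries: int = 3,
) -> Optional[List[Row]]:
    """Return the rows read, or None if the read transaction must roll back."""
    gts = get_gts()  # a) obtain T.Gts when the transaction starts
    # One initial attempt plus up to max_retries retries.
    for _ in range(1 + max_retries):
        rows = read_phase()  # b) read phase
        # c) linear consistency check on every row read
        if all(passes_linear_check(row, gts) for row in rows):
            return rows  # i. all data passed: return it to the user
    return None  # ii. still failing after the retries: roll back
```

Setting `max_retries=0` corresponds to option (1), where the read transaction is rolled back on the first failed check.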
Under the "transaction causal consistency" level, where performance is the priority, the three techniques can likewise be combined to achieve efficient transaction causal consistency. The DTA+OCC algorithm combined with the global Gts generation technique already guarantees transaction causal consistency. Adding MVCC reduces the rollback rate of DTA+OCC and thus improves system performance. At this consistency level, the linear consistency check described above need not be performed on read operations, which improves their execution efficiency.
In step 1.4.5) above, different consistency levels are guaranteed by different combinations of the techniques proposed in the present invention, so system efficiency differs as well. The factors that affect efficiency under each consistency level are summarized below. They show that weaker consistency levels offer better system efficiency; since consistency requirements and system efficiency are inherently negatively correlated, the lower performance of higher consistency levels is acceptable:
1. Complete consistency level: it can only be guaranteed by using the RUC-CC algorithm together with the linear consistency guarantee algorithm. Combining the two algorithms requires the following two additional operations, which cause some performance loss:
a) Additional validation of read operations. Every read operation must run a consistency check on the data read, so that the data satisfies linear consistency as well as transaction consistency. Using the two algorithms together therefore increases data validation overhead and causes performance loss.
b) Additional transaction rollbacks. Some transaction schedules produced by the RUC-CC algorithm satisfy transaction consistency but not linear consistency. Such transactions must be rolled back, which causes additional rollback overhead under the complete consistency level.
2. Linear consistency level: it is guaranteed by the linear consistency guarantee algorithm alone. The additional read validation and additional transaction rollbacks required for complete consistency are avoided, so performance improves relative to the complete consistency level.
3. Transaction causal consistency level: it is guaranteed by DTA+OCC plus the global clock. The overhead of running the linear consistency guarantee algorithm is also avoided, so performance improves relative to the linear consistency level.
4. Transaction logical consistency level: it is guaranteed by the RUC-CC algorithm. The algorithm does not consider external consistency, so the cost of communicating with the global clock is also saved, and performance improves relative to the transaction causal consistency level.
In step 2) above, the consistency level that the distributed system needs to achieve is determined from actual business requirements, a consistency execution algorithm suited to that level is determined from the established unified consistency model, and the distributed transactions and single-node transactions in the distributed system are executed. This comprises the following steps:
2.1) Transactions in the distributed system are divided into distributed transactions and single-node transactions, according to whether a transaction needs to operate on data on multiple Resource Management nodes.
2.2) Distributed transactions in the distributed system are executed using the consistency execution algorithm suited to the required consistency level;
2.3) Single-node transactions in the distributed system are executed using the consistency execution algorithm suited to the required consistency level.
In step 2.1) above, transactions in the distributed system are classified because the transaction is the smallest unit of execution in a distributed database system. According to whether a transaction needs to operate on data on multiple data nodes, transactions are divided into distributed transactions and single-node transactions. The present invention uses a different execution flow for each kind, in order to minimize communication overhead between nodes and improve transaction processing efficiency.
A distributed transaction needs to read and write across multiple Resource Management nodes, that is, it operates on data on multiple RMs. For example, if transaction T needs to operate on nodes RM1, RM2 and RM3, it is a distributed transaction. In this case a coordinator node (host node) must be introduced to store the global transaction state information during execution and to manage the execution of the transaction. The coordinator host node can be selected in one of the following two ways:
a) Random selection: one node is chosen at random from the set of host nodes to act as the host node.
b) Deterministic selection: the host node is chosen by a fixed rule, for example by polling (round robin) over the set of host nodes.
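The two selection mechanisms can be illustrated with a short Python sketch. The function and class names (`pick_random`, `RoundRobinPicker`) and the node labels are hypothetical; round robin is used here as one possible reading of the polling rule.

```python
import itertools
import random
from typing import Iterator, List, Sequence


def pick_random(hosts: Sequence[str]) -> str:
    """Random selection: choose any host node from the set."""
    return random.choice(hosts)


class RoundRobinPicker:
    """Deterministic selection: cycle through the host nodes in order."""

    def __init__(self, hosts: Sequence[str]) -> None:
        self._cycle: Iterator[str] = itertools.cycle(list(hosts))

    def pick(self) -> str:
        return next(self._cycle)


if __name__ == "__main__":
    picker = RoundRobinPicker(["host1", "host2", "host3"])
    chosen: List[str] = [picker.pick() for _ in range(4)]
    print(chosen)  # the fourth pick wraps around to the first host
```

Random selection needs no shared state, while round robin spreads coordination work evenly but requires the selector to remember its position.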
A single-node transaction only needs to operate on data on a single data node. For example, if transaction T only needs to operate on node RM1, it is a single-node transaction. During execution, a single-node transaction needs only one round of communication with the coordinator node.
In step 2.2) above, a distributed transaction in the distributed system is executed under the different consistency levels as follows:
2.2.1) The client issues the request to execute transaction T; the Proxy receives the request sent by the client and establishes a session with the client.
2.2.2) After receiving the request, the Proxy interacts with the metadata management cluster to obtain the relevant metadata, parses the request, and routes the SQL statement to a host node.
2.2.3) The host node optimizes the SQL and generates a physical execution plan, performs global transaction initialization, and records the global transaction state information. It then decomposes the execution plan into per-node execution plans, sends them to the corresponding data nodes, and records the global transaction state as running.
2.2.4) According to the execution plan, each data node performs data operations using the algorithm suited to the required consistency level and records the local transaction execution state. When a data node has finished its local reads and writes, it sends a "ready to validate" instruction to the host node. Specifically:
For the transaction logical consistency and transaction causal consistency levels: according to the execution plan, the data node performs data operations and transaction scheduling using the RUC-CC algorithm of step 1.1).
For the linear consistency level: according to the execution plan, the data node performs data operations using the linear consistency guarantee algorithm of step 1.2) and performs transaction scheduling based on the MVCC algorithm.
For the complete consistency level: according to the execution plan, the data node performs data operations and transaction scheduling using the linear consistency guarantee algorithm of step 1.2) combined with the RUC-CC algorithm of step 1.1).
2.2.5) After receiving the "ready to validate" instructions from all involved data nodes, the host node records the global transaction state as validating and sends a "validate" instruction to all involved data nodes.
2.2.6) After receiving the "validate" instruction, a data node enters the local validation process using the validation method of step 1.1.2.2); if validation passes, it sends a "validation passed" instruction to the host node.
2.2.7) After receiving the "validation passed" instructions from all involved data nodes, the host node determines, according to the required consistency level, whether it needs to interact with the "global Gts generation cluster" to obtain the global timestamp of the transaction, and records the global transaction state as committed. It then starts two threads at the same time: the first returns the result set to the Proxy, which returns the execution result to the client; the second sends a "commit" instruction to all involved data nodes.
For the transaction logical consistency level: after receiving the "validation passed" instructions from all involved data nodes, the host node records the global transaction state as committed.
For the transaction causal consistency, linear consistency and complete consistency levels: after receiving the "validation passed" instructions from all involved data nodes, the host node must interact with the global Gts generation cluster to obtain the global timestamp of the transaction, and then records the global transaction state as committed.
2.2.8) After receiving the "commit" instruction, a data node enters the local commit process using the commit method of step 1.1.2.3).
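The coordination flow of steps 2.2.3) to 2.2.8) can be sketched from the host node's point of view as follows. This is a simplified, single-threaded illustration: `DataNode` is a hypothetical in-memory stand-in for a data node, the state and message names are illustrative, and the rollback branch on failed validation is an assumption, since the steps above only describe the successful path. The parallel return of the result set to the Proxy is omitted.

```python
from typing import Callable, List, Optional, Tuple


class DataNode:
    """Hypothetical in-memory stand-in for a data node (RM)."""

    def __init__(self, name: str, validation_ok: bool = True) -> None:
        self.name = name
        self.validation_ok = validation_ok
        self.outcome: Optional[str] = None

    def execute(self, plan: str) -> str:
        # Local reads and writes would run here (step 2.2.4).
        return "ready_to_validate"

    def validate(self) -> bool:
        # Local validation (step 2.2.6).
        return self.validation_ok

    def finish(self, outcome: str) -> None:
        # Local commit (step 2.2.8) or rollback.
        self.outcome = outcome


def run_distributed_txn(
    nodes: List[DataNode],
    plan: str,
    needs_gts: bool,
    next_gts: Callable[[], int],
) -> Tuple[List[str], Optional[int]]:
    """Return the sequence of global states and the global timestamp, if any."""
    states = ["running"]                      # step 2.2.3
    for node in nodes:
        node.execute(plan)                    # step 2.2.4
    states.append("validating")               # step 2.2.5
    results = [node.validate() for node in nodes]
    if not all(results):
        # Assumption: any failed local validation rolls the transaction back.
        states.append("rolled_back")
        for node in nodes:
            node.finish("rollback")
        return states, None
    gts = next_gts() if needs_gts else None   # step 2.2.7
    states.append("committed")
    for node in nodes:
        node.finish("commit")                 # step 2.2.8
    return states, gts
```

Passing `needs_gts=False` models the transaction logical consistency level, where the host node records the committed state without contacting the global Gts generation cluster.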
In step 2.3) above, a single-node transaction in the distributed system is executed under the different consistency levels as follows:
2.3.1) The client issues the request to execute transaction T; the Proxy receives the request sent by the client and establishes a session with the client.
2.3.2) After receiving the request, the Proxy interacts with the metadata management cluster to obtain the relevant metadata, parses the request, and routes it to a host node.
2.3.3) The host node optimizes the SQL and generates a physical execution plan, which is sent to the selected transaction coordinator host node. The host node performs transaction initialization, records the transaction state as running, and then sends the execution plan directly to the corresponding data node.
2.3.4) According to the execution plan, the data node performs data operations using the algorithm suited to the required consistency level and records the local transaction state. When the data node has finished its local reads and writes, it enters the validation process directly; if validation passes, it sends a "validation passed" instruction to the host node and enters the local commit process.
For the transaction logical consistency and transaction causal consistency levels, the data node performs data operations and transaction scheduling using the RUC-CC algorithm;
For the linear consistency level, the data node performs data operations using the linear consistency guarantee algorithm of step 1.2) and performs transaction scheduling based on the MVCC algorithm;
For the complete consistency level, the data node performs data operations and transaction scheduling using the linear consistency guarantee algorithm of step 1.2) combined with the RUC-CC algorithm of step 1.1).
2.3.5) After receiving the "validation passed" instruction sent by the RM, the host node determines, according to the required consistency level, whether it needs to interact with the global Gts generation cluster to obtain the global timestamp of the transaction, and records the transaction state as committed. It then returns the result set to the Proxy, which returns the execution result to the client.
For the transaction logical consistency level: after receiving the "validation passed" instructions from all involved data nodes, the host node records the global transaction state as committed.
For the transaction causal consistency, linear consistency and complete consistency levels: after receiving the "validation passed" instructions from all involved data nodes, the host node must interact with the global Gts generation cluster to obtain the global timestamp of the transaction, and then records the global transaction state as committed.
In particular, in the above method, if the read/write set of a transaction grows so large during execution that it exceeds what system memory can hold, the transaction cannot be executed because of memory overflow. The following three methods can be used to solve this problem:
1. Threshold method: when the size of the read set or write set of transaction T on some RM exceeds a threshold, the execution of T is terminated, which prevents T from exhausting system memory and blocking other transactions. The threshold can be set through a parameter; for example, it can equal 60% of the free memory of that RM. This method is simple to use, but transactions with large read/write sets may never be executed;
2. Dump method: when the size of the read set or write set of transaction T on some data node exceeds a threshold, the read set or write set of T is flushed to disk and read back into memory when T needs to access it. The threshold can be set through a parameter; for example, it can equal 60% of the free memory of the RM. This method does not need to terminate T, but reading and writing the disk brings additional I/O cost;
3. Optimization method: when the read set of a read-only transaction T is too large, the RUC-CC concurrency control algorithm need not be used for scheduling; instead, data is read using the global-timestamp-based linear consistency guarantee algorithm.
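The threshold and dump methods share the same trigger and differ only in the action taken once the threshold is exceeded. A minimal sketch follows; the function name `on_set_growth`, the policy labels and the returned action strings are hypothetical, and the 0.6 default mirrors the 60% example in the text.

```python
def on_set_growth(
    set_bytes: int,
    free_bytes: int,
    policy: str,
    ratio: float = 0.6,
) -> str:
    """Decide what to do when a transaction's read or write set grows.

    policy is "threshold" (terminate the transaction) or "dump" (flush to disk).
    """
    if policy not in ("threshold", "dump"):
        raise ValueError(f"unknown policy: {policy}")
    threshold = ratio * free_bytes
    if set_bytes <= threshold:
        return "continue"
    # Threshold method terminates the transaction; dump method spills the set
    # to disk and reads it back on demand, at extra I/O cost.
    return "abort" if policy == "threshold" else "spill_to_disk"
```

For instance, with 100 bytes of free memory and the default ratio, a 61-byte write set triggers "abort" under the threshold method and "spill_to_disk" under the dump method.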
It should be noted that "external consistency" as used in the present invention is a broad concept: all distributed system consistency notions, such as linear consistency, causal consistency, monotonic reads and monotonic writes, fall within the scope of "external consistency". The global clock denotes a logical total order and does not refer specifically to physical timestamps.
The above embodiments merely illustrate the present invention. The structure, connection and manufacturing process of each component may vary, and equivalent changes and improvements made on the basis of the technical solution of the present invention shall not be excluded from the scope of protection of the present invention.

Claims (15)

1. A distributed system guaranteeing transaction consistency and linear consistency, characterized in that it comprises: a plurality of clients, and a database server composed of an access layer, a meta-information management cluster, a global Gts generation cluster, and a transaction processing and storage layer;
the client is configured to provide a user with an interface for interacting with the database server and to send user requests to the database server;
the access layer is configured to receive the requests sent by the client and to parse them to generate an execution plan;
the meta-information management cluster is configured to manage the distributed cluster of the distributed system in a unified manner;
the global Gts generation cluster is configured to generate global timestamps that give the global transactions in the distributed system a unique order, so as to achieve linear consistency;
the transaction processing and storage layer comprises a plurality of Resource Management nodes, the Resource Management nodes comprise coordinator nodes and data nodes, and the coordinator nodes and data nodes are configured to execute the transaction logic according to the execution plan sent by the access layer and to return the result to the client through the access layer.
2. The distributed system guaranteeing transaction consistency and linear consistency according to claim 1, characterized in that: the data nodes are configured to store the data of the distributed system in partitions, and the coordinator node is configured to coordinate the transactions in the distributed system; the Resource Management nodes are assigned their roles in one of the following two ways:
master-slave mode: some of the Resource Management nodes are dedicated as coordinator nodes for transaction processing, and the remaining Resource Management nodes are used as data nodes;
peer-to-peer mode: all Resource Management nodes are peers, and each Resource Management node acts as both data node and coordinator node.
3. The distributed system guaranteeing transaction consistency and linear consistency according to claim 1, characterized in that: the global timestamp generated by the global Gts generation cluster consists of eight bytes, composed in hybrid physical clock form:
a) the first 44 bits are the physical timestamp value;
b) the last 20 bits are a monotonically increasing counter within one millisecond.
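The eight-byte layout can be illustrated by packing and unpacking a 64-bit integer. The sketch below assumes that the "first 44 bits" are the most significant bits and that the physical part is expressed in milliseconds (suggested by the per-millisecond counter); the names `pack_gts` and `unpack_gts` are hypothetical.

```python
from typing import Tuple

PHYSICAL_BITS = 44  # physical timestamp value
COUNTER_BITS = 20   # monotonically increasing counter within one millisecond


def pack_gts(physical_ms: int, counter: int) -> int:
    """Pack a physical time and a counter into one 64-bit global timestamp."""
    if not 0 <= physical_ms < (1 << PHYSICAL_BITS):
        raise ValueError("physical time does not fit in 44 bits")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter does not fit in 20 bits")
    return (physical_ms << COUNTER_BITS) | counter


def unpack_gts(gts: int) -> Tuple[int, int]:
    """Split a global timestamp back into (physical_ms, counter)."""
    return gts >> COUNTER_BITS, gts & ((1 << COUNTER_BITS) - 1)
```

With the physical part in the high bits, comparing two packed values as plain integers orders them first by physical time and then by counter, so up to 2^20 timestamps can be issued within a single millisecond without breaking the total order.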
4. The distributed system guaranteeing transaction consistency and linear consistency according to claim 1, characterized in that: the basic data structures involved in the distributed system include a global transaction state table, a local transaction state table, a data item data structure, a transaction global read set and write set, and four classes of communication protocol and message data structures;
the global transaction state table is used to maintain the transaction state as seen from the distributed system as a whole and is represented by the six-tuple {TID, Lowts, Uppts, Status, Gts, Nodes}, wherein TID is the unique identifier of the transaction, Lowts is the lower bound of the logical commit timestamp of the transaction, Uppts is the upper bound of the logical commit timestamp of the transaction, Status is the current global state of the transaction, Gts is the timestamp at which the global commit or rollback of the transaction completed, and Nodes is the set of data nodes involved in the current transaction;
the local transaction state table is used to manage the local state of a transaction on each Resource Management node and is represented by {TID, Lowts, Uppts, Status}, wherein TID is the unique identifier of the transaction, Lowts is the lower bound of the logical commit timestamp of the transaction, Uppts is the upper bound of the logical commit timestamp of the transaction, and Status is the local state of the transaction;
the data item data structure comprises a first group of data elements serving as the basis for linear consistency and a second group of data elements serving as the basis for distributed transaction consistency; the first group comprises {gts, info_bit}, wherein gts represents the globally unique order of a transaction in the distributed system, and info_bit identifies whether the gts field currently records a Gts or a TID; the second group comprises {wts, rts}, wherein wts records the logical timestamp of the transaction that created this version of the data item, and rts records the logical timestamp of the most recent transaction that read the data item;
the transaction global read set is used to record all data items read by the transaction and is represented by {BlockAddress, Offset, Size, Value}, wherein BlockAddress is the address of the block containing the data item, Offset is the offset of the data item within the block, Size is the size of the data item, and Value is the value of the data item;
the transaction global write set is used to record all data items that the transaction needs to update and is represented by {BlockAddress, Offset, Size, NewValue, OperationType}, wherein BlockAddress is the address of the block containing the data item, Offset is the offset of the data item within the block, Size is the size of the data item, NewValue is the value of the data item, and OperationType indicates whether the operation is an update, an insert or a delete;
the communication protocol and messages comprise messages sent by the coordinator node to the data node, messages sent by the data node to the coordinator node, messages sent by the coordinator node to the global Gts generation cluster, and messages sent by the global Gts generation cluster to the coordinator node; the messages sent by the coordinator node to the data node comprise a read data request message, a validation request message and a write commit/rollback request; the messages sent by the data node to the coordinator node comprise a read request feedback message and a local validation feedback message; the messages sent by the coordinator node to the global Gts generation cluster comprise a global timestamp request message; the messages sent by the global Gts generation cluster to the coordinator node comprise a global timestamp request feedback message.
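For illustration, the tuples recited in claim 4 could be modelled as Python dataclasses. The field names follow the claim, but all field types, the class names and the encoding of info_bit (1 meaning that gts holds a TID) are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlobalTxnState:
    """{TID, Lowts, Uppts, Status, Gts, Nodes}"""
    tid: str
    lowts: int                  # lower bound of the logical commit timestamp
    uppts: int                  # upper bound of the logical commit timestamp
    status: str                 # current global state
    gts: Optional[int] = None   # set once global commit/rollback completes
    nodes: List[str] = field(default_factory=list)


@dataclass
class LocalTxnState:
    """{TID, Lowts, Uppts, Status}"""
    tid: str
    lowts: int
    uppts: int
    status: str                 # local state on one Resource Management node


@dataclass
class DataItemMeta:
    """{gts, info_bit} for linear consistency, {wts, rts} for transaction consistency."""
    gts: int
    info_bit: int               # assumption: 1 = gts holds a TID, 0 = a Gts
    wts: int                    # logical timestamp of the creating transaction
    rts: int                    # logical timestamp of the latest reader


@dataclass
class ReadSetEntry:
    """{BlockAddress, Offset, Size, Value}"""
    block_address: int
    offset: int
    size: int
    value: bytes


@dataclass
class WriteSetEntry:
    """{BlockAddress, Offset, Size, NewValue, OperationType}"""
    block_address: int
    offset: int
    size: int
    new_value: bytes
    operation_type: str         # "update", "insert" or "delete"
```

A coordinator node would hold one `GlobalTxnState` per global transaction, while each data node keeps its own `LocalTxnState` for the same TID.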
5. A multi-level consistency method using the distributed system guaranteeing transaction consistency and linear consistency according to any one of claims 1 to 4, characterized by comprising the following steps:
1) establishing a unified consistency model capable of realizing multi-level consistency;
2) determining, according to actual business requirements, the consistency level that the distributed system needs to achieve, determining from the established unified consistency model a consistency execution algorithm suited to that consistency requirement, and executing the distributed transactions and single-node transactions in the distributed system to obtain transaction execution results.
6. The multi-level consistency method guaranteeing transaction consistency and linear consistency according to claim 5, characterized in that: in step 1), the method of establishing a unified consistency model capable of realizing multi-level consistency comprises the following steps:
1.1) performing transaction concurrency control using a DTA-based OCC strategy, and establishing the RUC-CC algorithm for guaranteeing transaction consistency;
1.2) establishing a global-timestamp-based linear consistency guarantee algorithm, based on the global timestamps generated by the global Gts generation cluster and the global transaction state, for guaranteeing linear consistency between transactions;
1.3) establishing a read-twice linear consistency guarantee algorithm, which reads data twice, for guaranteeing consistency between transactions;
1.4) combining the transaction consistency and linear consistency algorithms of steps 1.1) to 1.3) with the MVCC algorithm to establish a unified consistency model that can satisfy multiple consistency levels.
7. The multi-level consistency method guaranteeing transaction consistency and linear consistency according to claim 6, characterized in that: in step 1.1), the method of performing transaction concurrency control using the DTA-based OCC strategy and establishing the RUC-CC algorithm comprises the following steps:
1.1.1) for a transaction T sent by the client, completing the corresponding initialization on the coordinator node;
1.1.2) dividing the global execution of the transaction into three phases: a read phase, a validation phase, and a commit-write/rollback phase; under the coordination of the coordinator node, each data node involved in the operation executes the transaction, and the entries of committed or rolled-back transactions are purged from the transaction state table.
8. The multi-level consistency method for guaranteeing transaction consistency and linear consistency as claimed in claim 7, characterized in that: in step 1.1.2), dividing the global execution of the transaction into three phases (read phase, validation phase, and commit-write/rollback phase) and having each data node involved in the operation execute the transaction under the coordination of the coordinator node comprises the following steps:
1.1.2.1) transaction T reads the required data according to its execution logic and writes its updates to the local memory of transaction T;
1.1.2.2) transaction T validates whether it conflicts with other transactions, and obtains a validation result;
1.1.2.3) transaction T chooses, according to the validation result of the validation phase, to perform either commit-write or rollback.
9. The multi-level consistency method for guaranteeing transaction consistency and linear consistency as claimed in claim 8, characterized in that: in step 1.1.2.1), the method by which transaction T reads the required data according to its execution logic and writes its updates to the local memory of transaction T is:
First, the coordinator node of transaction T sends a read-request message for the data item x to be read to the data node where data item x resides;
Then, after receiving the read-request message, the data node where data item x resides first creates or updates the entry of transaction T in its local transaction state table, then searches for the version of data item x that is visible within the logical lifetime of transaction T, and sends a read-request feedback message to the coordinator node of transaction T;
Finally, after receiving the read-request feedback messages of all data nodes, the coordinator node of transaction T judges whether a rollback is needed; if so, the global rollback phase is entered; otherwise the transaction continues to execute.
10. The multi-level consistency method for guaranteeing transaction consistency and linear consistency as claimed in claim 8, characterized in that: in step 1.1.2.2), the method by which transaction T validates whether it conflicts with other transactions and obtains a validation result is:
First, the coordinator node of transaction T changes the state of transaction T in the global transaction state table to Gvalidating, and then sends a validation request message and the local write set to each data node involved in transaction T;
Second, after receiving the validation request message, each data node involved in transaction T performs a local validation operation, which specifically comprises the following steps:
1. in the local transaction state table, updating T.Lowts = max(T.Lowts, vrm.Lowts) and T.Uppts = min(T.Uppts, vrm.Uppts) of transaction T;
2. checking whether T.Lowts is greater than T.Uppts; if so, validation fails and an Abort message is returned to the coordinator node of transaction T, which enters rollback; otherwise proceeding to step 3;
3. locating each data item y in the write set of the transaction, and checking whether the WT of data item y is empty:
if it is not empty, sending an Abort message to the coordinator node of transaction T, which enters rollback;
otherwise proceeding to step 4;
4. updating the WT of each data item y in the write set to T.TID, and adjusting the timestamp lower bound of transaction T in the local transaction state table so that it is greater than the rts of y;
5. checking whether T.Lowts is greater than T.Uppts; if so, validation fails, a local rollback is performed, and an Abort message is then returned to the coordinator node of transaction T; otherwise proceeding to step 6;
6. for each element y in the write set, adjusting the timestamps of transaction T or of the transactions in the RTlist so as to eliminate read-write conflicts;
7. creating a new version of data item y according to the updated value of data item y, and at the same time setting a flag that indicates the global commit status of the new version;
8. returning the local validation feedback message lvm of transaction T to the coordinator node of transaction T, where the Lowts and Uppts of lvm respectively record the lower and upper bounds of the logical timestamp of transaction T on the local data node;
Finally, after receiving the local validation feedback messages of all resource management nodes, the coordinator node of transaction T determines from the received messages whether transaction T passes validation.
11. The multi-level consistency method for guaranteeing transaction consistency and linear consistency as claimed in claim 6, characterized in that: in step 1.2), establishing the global-timestamp-based linear consistency guarantee algorithm, based on the global timestamps generated by the global Gts generation cluster and on the global transaction state, comprises the following steps:
1.2.1) the client initiates a request for transaction T and establishes a connection through the access layer, forming a session;
1.2.2) the access layer parses transaction T and selects a coordinator node to be responsible for managing the execution of the transaction;
1.2.3) when a read transaction T starts, the global timestamp Gts of the transaction is obtained from the global Gts generation cluster and recorded in the global transaction state table of the read transaction; the coordinator node establishes connections with all data nodes relevant to the read transaction, packs the parsed query execution plan and the global timestamp Gts into a data packet, and delivers it to all relevant data nodes via network communication;
1.2.4) each data node performs the data read operation and determines the data items that satisfy the selection condition; then, for each data item that has multiple versions, it traverses the versions starting from the latest one until the first visible version is found;
1.2.5) the coordinator node aggregates the data returned by all data nodes and returns it to the access layer; the access layer returns the data to the client with which the session was established, and the current read transaction is complete.
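By way of non-limiting illustration, the version traversal of step 1.2.4) may be sketched in Python as follows. The visibility rule used here (a version is visible when it is globally committed with a commit timestamp not greater than the Gts of the read transaction) and the names `Version`, `commit_gts`, and `first_visible` are assumptions made for the sketch; the claim itself only requires traversal from the latest version to the first visible one.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Version:
    value: str
    commit_gts: Optional[int]  # None: not yet globally committed (assumed encoding)

def first_visible(versions: List[Version], read_gts: int) -> Optional[Version]:
    """Return the first visible version, scanning from newest to oldest.

    `versions` is assumed to be ordered newest first.
    """
    for version in versions:
        if version.commit_gts is not None and version.commit_gts <= read_gts:
            return version
    return None  # no version of the item is visible at read_gts
```

For example, with a chain holding an uncommitted version followed by versions committed at 20 and 10, a read transaction with Gts 15 would obtain the version committed at 10.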
12. The multi-level consistency method for guaranteeing transaction consistency and linear consistency as claimed in claim 6, characterized in that: in step 1.3), the process of establishing the two-round-read linear consistency guarantee algorithm, using the method that reads the data twice, is:
1.3.1) the client initiates a request for transaction T and establishes a connection through the access layer, forming a session;
1.3.2) the access layer parses transaction T and selects a coordinator node to be responsible for managing the execution of the transaction;
1.3.3) the coordinator node establishes connections with all data nodes relevant to the read transaction and sends a data acquisition request to all relevant data nodes; all relevant data nodes execute the first-round data read algorithm and return the data to the coordinator node; the coordinator node determines the global timestamp Gts of the current read transaction T based on all returned data items;
1.3.4) the coordinator node sends a data acquisition request to all data nodes again, together with the global timestamp Gts determined for the current read transaction T; the data nodes execute the second-round data read algorithm and return the data versions that satisfy linear consistency with respect to the global timestamp Gts of the current read transaction T;
1.3.5) the coordinator node aggregates the data returned by all data nodes and returns it to the access layer; the access layer returns the data to the client with which the session was established, and the current read transaction is complete.
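By way of non-limiting illustration, the two rounds of steps 1.3.3) and 1.3.4) may be sketched in Python as follows. The claims do not recite how the coordinator node derives Gts from the first-round results; the sketch assumes it takes the largest commit timestamp returned, and it models each data node as an in-memory dictionary. All function and type names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: key -> list of (commit_gts, value), newest first.
Node = Dict[str, List[Tuple[int, str]]]

def first_read(node: Node, keys: List[str]) -> Dict[str, Tuple[int, str]]:
    # Round 1: return the latest version of every requested key held locally.
    return {k: node[k][0] for k in keys if node.get(k)}

def second_read(node: Node, keys: List[str], gts: int) -> Dict[str, str]:
    # Round 2: return, per key, the newest version with commit_gts <= gts.
    result: Dict[str, str] = {}
    for k in keys:
        for commit_gts, value in node.get(k, []):
            if commit_gts <= gts:
                result[k] = value
                break
    return result

def two_round_read(nodes: List[Node], keys: List[str]) -> Tuple[int, Dict[str, str]]:
    round1 = [first_read(n, keys) for n in nodes]
    stamps = [ts for reply in round1 for ts, _ in reply.values()]
    gts = max(stamps) if stamps else 0   # assumed rule for choosing Gts
    merged: Dict[str, str] = {}
    for n in nodes:                      # coordinator aggregates round 2
        merged.update(second_read(n, keys, gts))
    return gts, merged
```

Reading every node a second time at one common Gts is what lets all returned versions belong to the same snapshot, at the cost of an extra network round trip compared with obtaining Gts from the generation cluster up front.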
13. The multi-level consistency method for guaranteeing transaction consistency and linear consistency as claimed in claim 6, characterized in that: in step 2), the method of determining, according to actual business requirements, the consistency level that the distributed system needs to achieve, determining, based on the established unified consistency model, a consistency execution algorithm suited to that consistency level requirement, and executing the transactions in the distributed system comprises the following steps:
2.1) dividing the transactions involved in the distributed system into two kinds, distributed transactions and single-machine transactions, according to whether a transaction needs to operate on data on multiple resource management nodes;
2.2) executing the distributed transactions in the distributed system using the consistency execution algorithm suited to the consistency level requirement;
2.3) executing the single-machine transactions in the distributed system using the consistency execution algorithm suited to the consistency level requirement.
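By way of non-limiting illustration, the classification of step 2.1) may be sketched in Python as follows. The `placement` mapping from data item to resource management node and the function name are assumptions made for the sketch.

```python
from typing import Dict, Iterable

def classify_transaction(keys: Iterable[str], placement: Dict[str, str]) -> str:
    """Classify a transaction by the number of distinct nodes it touches."""
    nodes = {placement[k] for k in keys}
    return "distributed" if len(nodes) > 1 else "single-machine"
```

A transaction whose data items all reside on one node can then follow the shorter flow of step 2.3), which skips the separate "can validate" round.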
14. The multi-level consistency method for guaranteeing transaction consistency and linear consistency as claimed in claim 13, characterized in that: in step 2.2), the process of executing the distributed transactions in the distributed system using the consistency execution algorithm suited to the consistency level requirement is:
2.2.1) the client is responsible for issuing a request to execute transaction T; the access layer is responsible for receiving the request sent by the client and establishing a session with the client;
2.2.2) after receiving the request, the access layer interacts with the metadata management cluster, obtains the relevant metadata, parses the request, and assigns it to a coordinator node by routing;
2.2.3) the coordinator node optimizes the SQL and generates a physical execution plan, performs global transaction initialization, and records the global transaction state information; it then decomposes the execution plan into execution plans for the individual data nodes, sends them to the corresponding data nodes, and records the global transaction state as running;
2.2.4) each data node performs data operations according to its execution plan, using the algorithm suited to the consistency level requirement, and records the local transaction execution state; after completing its local data reads and writes, the data node sends a "can validate" instruction to the coordinator node; specifically:
for the transaction logical consistency and transaction causal consistency requirements: the data node performs data operations and transaction scheduling according to the execution plan, using the RUC-CC algorithm;
for the linear consistency level requirement: the data node performs data operations according to the execution plan, using the linear consistency guarantee algorithm, and performs transaction scheduling based on the MVCC algorithm;
for the crash consistency level requirement: the data node performs data operations and transaction scheduling according to the execution plan, using the linear consistency guarantee algorithm combined with the RUC-CC algorithm;
2.2.5) after receiving the "can validate" instructions sent by all relevant data nodes, the coordinator node records the global transaction state as validating and sends a "validate" instruction to all relevant data nodes;
2.2.6) after receiving the "validate" instruction, each data node enters the local validation process and, if validation passes, sends a "validation passed" instruction to the coordinator node;
2.2.7) after receiving the "validation passed" instructions sent by all relevant data nodes, the coordinator node determines, according to the consistency level requirement, whether it needs to interact with the global Gts generation cluster to obtain the global timestamp of the transaction, and records the global transaction state as committed; it then starts two threads at the same time: the first returns the result set to the access layer, which is responsible for returning the execution result to the client; the second sends a "commit" instruction to all relevant data nodes;
for the transaction logical consistency requirement: after receiving the "validation passed" instructions sent by all relevant data nodes, the coordinator node records the global transaction state as committed;
for the transaction causal consistency, linear consistency, and crash consistency level requirements: after receiving the "validation passed" instructions sent by all relevant data nodes, the coordinator node needs to interact with the global Gts generation cluster to obtain the global timestamp of the transaction, and records the global transaction state as committed;
2.2.8) after receiving the "commit" instruction, each data node enters the local commit process.
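By way of non-limiting illustration, the per-level choices recited in steps 2.2.4) and 2.2.7) may be tabulated in Python as follows. The enum and function names are hypothetical, and the returned strings are merely labels for the algorithms named in the claims.

```python
from enum import Enum
from typing import Tuple

class Level(Enum):
    TRANSACTION_LOGICAL = "transaction logical consistency"
    TRANSACTION_CAUSAL = "transaction causal consistency"
    LINEAR = "linear consistency"
    CRASH = "crash consistency"

def data_node_algorithms(level: Level) -> Tuple[str, str]:
    """Return (data-operation algorithm, transaction-scheduling algorithm)."""
    if level in (Level.TRANSACTION_LOGICAL, Level.TRANSACTION_CAUSAL):
        return ("RUC-CC", "RUC-CC")
    if level is Level.LINEAR:
        return ("linear consistency guarantee", "MVCC")
    # Crash consistency: the two algorithms are used in combination.
    combined = "linear consistency guarantee + RUC-CC"
    return (combined, combined)

def needs_global_timestamp(level: Level) -> bool:
    # Only transaction logical consistency commits without contacting
    # the global Gts generation cluster (step 2.2.7).
    return level is not Level.TRANSACTION_LOGICAL
```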
15. The multi-level consistency method for guaranteeing transaction consistency and linear consistency as claimed in claim 13, characterized in that: in step 2.3), the method of executing the single-machine transactions in the distributed system using the consistency execution algorithm suited to the consistency level requirement comprises the following steps:
2.3.1) the client is responsible for issuing a request to execute transaction T; the access layer is responsible for receiving the request sent by the client and establishing a session with the client;
2.3.2) after receiving the request, the access layer interacts with the metadata management cluster, obtains the relevant metadata, parses the request, and assigns it to a coordinator node by routing;
2.3.3) the coordinator node optimizes the SQL and generates a physical execution plan; it performs transaction initialization, records the transaction state as running, and then sends the physical execution plan directly to the selected data node;
2.3.4) the data node performs data operations according to the execution plan, using the algorithm suited to the consistency level requirement, and records the local transaction state; after completing its local data reads and writes, the data node enters the validation process directly and, if validation passes, sends a "validation passed" instruction to the coordinator node and enters the local commit process;
for the transaction logical consistency and transaction causal consistency level requirements, the data node performs data operations and transaction scheduling using the RUC-CC algorithm;
for the linear consistency level requirement, the data node performs data operations using the linear consistency guarantee algorithm, and performs transaction scheduling based on the MVCC algorithm;
for the crash consistency level requirement, the data node performs data operations and transaction scheduling using the linear consistency guarantee algorithm combined with the RUC-CC algorithm;
2.3.5) after receiving the "validation passed" instruction sent by the data node, the coordinator node determines, according to the consistency level requirement, whether it needs to interact with the global Gts generation cluster to obtain the global timestamp of the transaction, records the transaction state as committed, and then returns the result set to the access layer, which is responsible for returning the execution result to the client;
for the transaction logical consistency requirement: after receiving the "validation passed" instruction sent by the data node, the coordinator node records the global transaction state as committed;
for the transaction causal consistency, linear consistency, and crash consistency level requirements: after receiving the "validation passed" instruction sent by the data node, the coordinator node needs to interact with the global Gts generation cluster to obtain the global timestamp of the transaction, and records the global transaction state as committed.
CN201910247559.7A 2019-02-02 2019-03-29 Distributed system and method for ensuring transaction consistency and linear consistency Active CN109977171B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019101072809 2019-02-02
CN201910107280 2019-02-02

Publications (2)

Publication Number Publication Date
CN109977171A true CN109977171A (en) 2019-07-05
CN109977171B CN109977171B (en) 2023-04-28

Family

ID=67081468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910247559.7A Active CN109977171B (en) 2019-02-02 2019-03-29 Distributed system and method for ensuring transaction consistency and linear consistency

Country Status (1)

Country Link
CN (1) CN109977171B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286123A (en) * 2006-12-28 2008-10-15 英特尔公司 Efficient and consistent software transactional memory
US20110161281A1 (en) * 2009-12-30 2011-06-30 Sybase, Inc. Distributed Transaction Management in a Distributed Shared Disk Cluster Environment
CN102831156A (en) * 2012-06-29 2012-12-19 浙江大学 Distributed transaction processing method on cloud computing platform
US20130036105A1 (en) * 2011-08-01 2013-02-07 Tagged, Inc. Reconciling a distributed database from hierarchical viewpoints
CN103198159A (en) * 2013-04-27 2013-07-10 国家计算机网络与信息安全管理中心 Transaction-redo-based multi-copy consistency maintaining method for heterogeneous clusters

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427427B (en) * 2019-08-02 2022-05-27 北京快立方科技有限公司 Method for realizing global transaction distributed processing through pin bridging
CN110427427A (en) * 2019-08-02 2019-11-08 北京快立方科技有限公司 A kind of bridged by pin realizes global transaction distributed approach
CN110457157A (en) * 2019-08-05 2019-11-15 腾讯科技(深圳)有限公司 Distributed transaction abnormality eliminating method, device, computer equipment and storage medium
CN110457157B (en) * 2019-08-05 2021-05-11 腾讯科技(深圳)有限公司 Distributed transaction exception handling method and device, computer equipment and storage medium
WO2021036768A1 (en) * 2019-08-27 2021-03-04 腾讯科技(深圳)有限公司 Data reading method, apparatus, computer device, and storage medium
CN111190935A (en) * 2019-08-27 2020-05-22 中国人民大学 Data reading method and device, computer equipment and storage medium
US11822540B2 (en) 2019-08-27 2023-11-21 Tencent Technology (Shenzhen) Company Limited Data read method and apparatus, computer device, and storage medium
JP2022531867A (en) * 2019-08-27 2022-07-12 テンセント・テクノロジー・(シェンジェン)・カンパニー・リミテッド Data reading methods, devices, computer devices and computer programs
JP7220807B2 (en) 2019-08-27 2023-02-10 テンセント・テクノロジー・(シェンジェン)・カンパニー・リミテッド Data reading method, device, computer device and computer program
CN112650561A (en) * 2019-10-11 2021-04-13 中兴通讯股份有限公司 Transaction management method, system, network device and readable storage medium
EP4024236A4 (en) * 2019-10-11 2022-10-26 ZTE Corporation Transaction management method and system, network device and readable storage medium
CN110807046B (en) * 2019-10-31 2022-06-07 浪潮云信息技术股份公司 Novel distributed NEWSQL database intelligent transaction optimization method
CN110807046A (en) * 2019-10-31 2020-02-18 浪潮云信息技术有限公司 Novel distributed NEWSQL database intelligent transaction optimization method
CN111399447A (en) * 2019-12-26 2020-07-10 德华兔宝宝装饰新材股份有限公司 Board-like customization furniture quality control system based on MES
CN111159252B (en) * 2019-12-27 2022-10-21 腾讯科技(深圳)有限公司 Transaction execution method and device, computer equipment and storage medium
CN111159252A (en) * 2019-12-27 2020-05-15 腾讯科技(深圳)有限公司 Transaction execution method and device, computer equipment and storage medium
CN111240810A (en) * 2020-01-20 2020-06-05 上海达梦数据库有限公司 Transaction management method, device, equipment and storage medium
CN111240810B (en) * 2020-01-20 2024-02-06 上海达梦数据库有限公司 Transaction management method, device, equipment and storage medium
CN111338766B (en) * 2020-03-12 2022-10-25 腾讯科技(深圳)有限公司 Transaction processing method and device, computer equipment and storage medium
CN111338766A (en) * 2020-03-12 2020-06-26 腾讯科技(深圳)有限公司 Transaction processing method and device, computer equipment and storage medium
CN111597015B (en) * 2020-04-27 2023-01-06 腾讯科技(深圳)有限公司 Transaction processing method and device, computer equipment and storage medium
CN111597015A (en) * 2020-04-27 2020-08-28 腾讯科技(深圳)有限公司 Transaction processing method and device, computer equipment and storage medium
CN111475585A (en) * 2020-06-22 2020-07-31 阿里云计算有限公司 Data processing method, device and system
CN111475585B (en) * 2020-06-22 2021-06-01 阿里云计算有限公司 Data processing method, device and system
WO2022002044A1 (en) * 2020-06-29 2022-01-06 中兴通讯股份有限公司 Method and apparatus for processing distributed database, and network device and computer-readable storage medium
CN111651244B (en) * 2020-07-01 2023-08-18 中国银行股份有限公司 Distributed transaction processing system
CN111651244A (en) * 2020-07-01 2020-09-11 中国银行股份有限公司 Processing system for distributed transactions
CN112286992A (en) * 2020-10-29 2021-01-29 星环信息科技(上海)股份有限公司 Query method, distributed system, device and storage medium
CN112732414B (en) * 2020-12-29 2023-12-08 北京浪潮数据技术有限公司 Distributed transaction processing method and system in OLTP mode and related components
CN112732414A (en) * 2020-12-29 2021-04-30 北京浪潮数据技术有限公司 Distributed transaction processing method, system and related components in OLTP mode
CN112463311A (en) * 2021-01-28 2021-03-09 腾讯科技(深圳)有限公司 Transaction processing method and device, computer equipment and storage medium
CN112948064B (en) * 2021-02-23 2023-11-03 北京金山云网络技术有限公司 Data reading method, device and system
CN112948064A (en) * 2021-02-23 2021-06-11 北京金山云网络技术有限公司 Data reading method and device and data reading system
WO2022213526A1 (en) * 2021-04-06 2022-10-13 华为云计算技术有限公司 Transaction processing method, distributed database system, cluster, and medium
CN113238892A (en) * 2021-05-10 2021-08-10 深圳巨杉数据库软件有限公司 Time point recovery method and device for global consistency of distributed system
CN113391885A (en) * 2021-06-18 2021-09-14 电子科技大学 Distributed transaction processing system
CN113778632A (en) * 2021-09-14 2021-12-10 杭州沃趣科技股份有限公司 Distributed transaction management method based on cassandra database
WO2023061249A1 (en) * 2021-10-11 2023-04-20 阿里云计算有限公司 Data processing method and system for distributed database, and device and storage medium
CN114328613A (en) * 2022-03-03 2022-04-12 阿里云计算有限公司 Method, device and system for processing distributed transactions in Structured Query Language (SQL) database
CN114510539B (en) * 2022-04-18 2022-06-24 北京易鲸捷信息技术有限公司 Method for generating and applying consistency check point of distributed database
CN114510539A (en) * 2022-04-18 2022-05-17 北京易鲸捷信息技术有限公司 Method for generating and applying consistency check point of distributed database
CN115145942A (en) * 2022-09-05 2022-10-04 北京奥星贝斯科技有限公司 Distributed database system and method and device for realizing monotone reading of distributed database system
CN115145942B (en) * 2022-09-05 2023-01-17 北京奥星贝斯科技有限公司 Distributed database system and method and device for realizing monotonous reading of distributed database system

Also Published As

Publication number Publication date
CN109977171B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN109977171A (en) A kind of distributed system and method guaranteeing transaction consistency and linear consistency
CN111338766B (en) Transaction processing method and device, computer equipment and storage medium
Adya Weak consistency: a generalized theory and optimistic implementations for distributed transactions
US9672017B2 (en) Object storage and synchronization hooks for occasionally-connected devices
Herlihy Apologizing versus asking permission: Optimistic concurrency control for abstract data types
Breitbart et al. Overview of multidatabase transaction management
US6816873B2 (en) Method for managing distributed savepoints across multiple DBMS's within a distributed transaction
CN111597015B (en) Transaction processing method and device, computer equipment and storage medium
CN104679881B (en) A kind of concurrency control method and device
CN111190935B (en) Data reading method and device, computer equipment and storage medium
Chairunnanda et al. ConfluxDB: Multi-master replication for partitioned snapshot isolation databases
EP4216061A1 (en) Transaction processing method, system, apparatus, device, storage medium, and program product
Özsu et al. Data replication
Nawab et al. Message Futures: Fast Commitment of Transactions in Multi-datacenter Environments.
CN106648840A (en) Method and apparatus for determining time sequence between transactions
Monteiro et al. A mechanism for replicated data consistency in mobile computing environments
Kanungo et al. Effective correctness criteria for serializability in multiversion concurrency control technique
Grov et al. Scalable and fully consistent transactions in the cloud through hierarchical validation
Alomari Ensuring serializable executions with snapshot isolation dbms
Goyal et al. Concurrency control for object bases
Fan Building Scalable and Consistent Distributed Databases Under Conflicts
Lingam Analysis of real-time multi version concurrency control algorithms using serialisability graphs
SINGH TRANSACTION PROCESSING FOR TRANSACTION PROCESSING FOR DISTRIBUTED REAL TIME DATABASE SYSTEM
Elmagarmid et al. Reservable transactions: An approach for reliable multidatabase transaction management
Holliday Exploiting communication mechanisms in replicated databases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant