CN103164219A - Distributed transaction processing system using multi-type replica in decentralized schema - Google Patents
- Publication number: CN103164219A (other identifiers listed: CN2013100058578A, CN201310005857A)
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a distributed transaction processing system using multi-type replicas in a decentralized architecture. The system comprises a transaction interface module, a transaction processing module and a transaction storage module. The transaction interface module comprises an external interface submodule and a transaction preprocessing submodule. The transaction processing module comprises a multi-type replica submodule, a read transaction processing submodule, a replica group transaction state submodule, a read request distribution submodule, a replica repair submodule, a multiversion concurrency control (MVCC) read submodule, a local write transaction processing submodule, a local write transaction Paxos replica consistency submodule, a local write transaction commit submodule, a global write transaction processing submodule, a main sub-transaction Paxos replica consistency submodule, a secondary sub-transaction Paxos replica consistency submodule and a global write transaction commit submodule. The system addresses three problems of existing systems: they suit only a narrow range of read/write environments, read/write availability cannot be configured according to application requirements, and global transactions depend on locking and are therefore costly.
Description
Technical field
The invention belongs to the technical field of distributed storage and, more specifically, relates to a distributed transaction processing system using multi-type replicas in a decentralized architecture.
Background art
With the development of Internet technology, the volume of data on the Internet is growing at an astonishing rate, and how to store and process data at this scale has become a research focus of the big-data era. Decentralized NoSQL is a class of mass data storage systems with the following characteristics: high read/write performance, no single point of failure, high availability and high scalability. The Cassandra system, for example, uses a column-family-oriented storage model to obtain high read/write performance, a decentralized architecture to avoid a single point of failure and obtain high availability, and consistent hashing to obtain high scalability.
Existing distributed transaction processing systems in decentralized architectures include the following. Megastore is a system that Google built on top of Bigtable. It uses a special data model, entity groups, together with an additional system module, the Coordinator, and replica servers to guarantee consistency. Its commit algorithm is a variant of the Paxos algorithm and keeps replicas synchronized and consistent across multiple data centers. However, it uses fixed read and write success counts, so availability cannot be tuned for different applications, and its global transactions use an expensive two-phase commit that can block. Scalaris is a distributed transaction system on a Chord# ring. It uses a symmetric replication strategy and an improved Paxos atomic commit protocol, needs three phases to complete one transaction, and likewise cannot tune availability for different applications. There is also a prototype system from academic research that studied distributed transactions in a P-Ring environment and proposed an MVCC algorithm, LSTP. That work focuses on read-heavy environments: read-only transactions are never aborted or blocked, but the approach does not suit write-heavy environments, so its application scenarios are rather limited.
In summary, existing distributed transaction processing systems have the following shortcomings: (1) they suit only a narrow range of read/write environments, and read/write availability cannot be configured according to the needs of the application; (2) global transactions depend on locking, which is costly.
Summary of the invention
In view of the deficiencies of the prior art, the object of the invention is to provide a distributed transaction processing system using multi-type replicas in a decentralized architecture. It is intended to solve the problems of existing systems: they suit only a narrow range of read/write environments, read/write availability cannot be configured according to application requirements, and global transactions depend on locking, which is costly.
To achieve the above object, the invention provides a distributed transaction processing system using multi-type replicas in a decentralized architecture, comprising a transaction interface module, a transaction processing module and a transaction storage module. The transaction interface module comprises an external interface submodule and a transaction preprocessing submodule. The transaction processing module comprises a multi-type replica submodule, a read transaction processing submodule, a replica group transaction state submodule, a read request distribution submodule, a replica repair submodule, an MVCC read submodule, a local write transaction processing submodule, a local write transaction Paxos replica consistency submodule, a local write transaction commit submodule, a global write transaction processing submodule, a main sub-transaction Paxos replica consistency submodule, a secondary sub-transaction Paxos replica consistency submodule and a global write transaction commit submodule.
The external interface submodule receives a transaction request from a client and sends it to the transaction preprocessing submodule. The transaction preprocessing submodule determines whether the transaction request is a read transaction request or a write transaction request. A read transaction request is sent to the read transaction processing submodule. For a write transaction request, the submodule further determines whether it is a local write transaction request or a global write transaction request; a local write transaction request is sent to the local write transaction processing submodule, and a global write transaction request is sent to the global write transaction processing submodule.
The read transaction processing submodule obtains from the multi-type replica submodule, for each read instruction contained in the read transaction request, the addresses of the corresponding data replicas and the required number of responses; sends the read transaction request, the data replica addresses and the required number of responses to the replica group transaction state submodule; and controls timeout and retry for the whole read transaction request flow. The replica group transaction state submodule reads the transaction execution state of the corresponding replica group according to the data replica addresses, to obtain the maximum committed-state log number and the maximum committed-state log timestamp for each read instruction. The read request distribution submodule determines whether a read instruction in the read transaction request can be executed locally. If so, it sends the read instruction, the maximum committed-state log number and the maximum committed-state log timestamp to the local replica repair submodule; otherwise it sends them to any one of the data replicas corresponding to the read instruction. The replica repair submodule brings the node on which it resides up to date with the maximum committed-state log number obtained by the replica group transaction state submodule. The MVCC read submodule reads data from the transaction storage module according to the read instruction and the maximum committed-state log timestamp, and returns the data to the read transaction processing submodule. The read transaction processing submodule also sends the data to the external interface submodule, which sends it to the client.
The local write transaction processing submodule obtains from the multi-type replica submodule the addresses of the witness replicas and data replicas corresponding to the write instructions contained in the local write transaction request, together with the required number of data replica responses; sends the local write transaction request, these addresses and the required number of data replica responses to the local write transaction Paxos replica consistency submodule; and controls timeout and retry for the whole local write transaction request flow. The local write transaction Paxos replica consistency submodule establishes a consistent log value on the witness replicas of the witness replica group, sends this log value to the local write transaction commit submodule, and adds a local transaction tag to the log entry. The local write transaction commit submodule commits the log value to the corresponding data replicas according to the data replica addresses and the required number of data replica responses, and returns the success result to the local write transaction processing submodule. The local write transaction processing submodule also sends the success result to the external interface submodule, which sends it to the client.
The global write transaction processing submodule obtains from the multi-type replica submodule the addresses of the witness replicas and data replicas corresponding to each write instruction contained in the global write transaction request, together with the required number of data replica responses; sends the global write transaction request, these addresses and the required number of data replica responses to the main sub-transaction Paxos replica consistency submodule; and controls timeout and retry for the whole global write transaction request flow. The main sub-transaction Paxos replica consistency submodule establishes a consistent log value on the witness replicas of the main sub-transaction's witness replica group, sends this log value and the position information of the main sub-transaction to the secondary sub-transaction commit submodule, and adds a global transaction tag to the log entry. The secondary sub-transaction Paxos replica consistency submodule establishes, for every secondary sub-transaction, a consistent log value on the witness replicas of that sub-transaction's witness replica group; adds the position information of the main sub-transaction and a global transaction tag to the log entries of all secondary sub-transactions; and sends the log value of the main sub-transaction and the log values of all secondary sub-transactions to the main sub-transaction commit submodule. The global write transaction commit submodule commits the log value of the main sub-transaction to the corresponding data replicas according to the data replica addresses of the main sub-transaction and the required number of data replica responses, and returns the success result to the global write transaction processing submodule. After that succeeds, it commits the log value of each secondary sub-transaction to the corresponding data replicas according to the data replica addresses of that secondary sub-transaction and the required number of data replica responses. The global write transaction processing submodule also sends the success result to the external interface submodule, which sends it to the client.
The transaction preprocessing submodule determines the type of a transaction request by reading its OPER field: 0 denotes a read transaction request and 1 denotes a write transaction request. For a write transaction request, it applies the consistent hash function to the key of each write operation the request contains and decides the type from the results: if the keys of all write operations map to the same node, the request is a local write transaction request; otherwise it is a global write transaction request.
Copy is repaired submodule and will all be obtained homogeneity value less than all journal entries of this numbering and need judge whether submission, if the transaction types of this journal entry record is local affairs, when journal entry reaches consistent in the witness copy, namely think and to submit to, otherwise just submit blank operation to; If the transaction types of this journal entry record is global transaction, whether reach unanimously in the witness copy except the needs inspection, whether the main subtransaction that also will check storage is submitted to, when main subtransaction has been submitted to, just think and to submit to, otherwise just submit blank operation to, the affairs that at last all need to be submitted to are carried out to completing attitude.
The local write transaction Paxos replica consistency submodule uses the Paxos algorithm to try to reach agreement on the log value at the same log position of each witness replica. The log value consists of the operations of the write instructions in the local write transaction request plus a timestamp, and this timestamp is greater than the maximum committed-state log timestamp of the witness replica group before this run of the Paxos algorithm.
A global write transaction comprises two or more local write transactions, each of which is given the global transaction tag. One of them is designated the main sub-transaction and serves as the commit point. The other local write transactions are designated secondary sub-transactions and record the position information of the main sub-transaction for use in replica repair.
The main sub-transaction Paxos replica consistency submodule uses the Paxos algorithm to try to reach agreement on the log value at the same log position of each witness replica of the main sub-transaction. The log value consists of the operations of the write instructions in the main sub-transaction plus a timestamp, and this timestamp is greater than the maximum committed-state log timestamp of the witness replica group before this run of the Paxos algorithm.
The secondary sub-transaction Paxos replica consistency submodule uses the Paxos algorithm, for each secondary sub-transaction, to try to reach agreement on the log value at the same log position of each witness replica of that secondary sub-transaction. The log value consists of the operations of the write instructions in the secondary sub-transaction plus a timestamp, and this timestamp is greater than the maximum committed-state log timestamp of the witness replica group before this run of the Paxos algorithm.
With the above technical solution, the invention has the following beneficial effects compared with the prior art:
(1) Separated read and write nodes
Because the system adopts the multi-type replica submodule, the local write transaction Paxos replica consistency submodule, the local write transaction commit submodule, the main sub-transaction Paxos replica consistency submodule, the secondary sub-transaction Paxos replica consistency submodule and the global write transaction commit submodule, the nodes that process logs are separated from the nodes that process data, which improves configurability.
(2) Configurable read/write availability levels for distributed transactions
Because the system adopts the multi-type replica submodule, the number of data replicas required for a read or a write can be set, so the read/write availability level can be tuned while consistency is still guaranteed.
(3) Lock-free global transactions
Because the system adopts the main sub-transaction Paxos replica consistency submodule, the secondary sub-transaction Paxos replica consistency submodule and the global write transaction commit submodule, the main sub-transaction can serve as the commit point, which avoids locking.
(4) Strong scalability
Because the decentralized architecture uses consistent hashing, the system scales well. When the data volume grows, the system can be scaled out simply by adding nodes: a node only needs a token value to be set and it can then join the server cluster on its own. Overall performance grows nearly linearly with scale.
(5) High reliability
The replication mechanism in the system ensures data reliability: the same data can be stored on several nodes at once, so data is not lost when a node fails. By configuration, replication can also be raised to the data-center level to provide disaster-level fault tolerance.
Brief description of the drawings
Fig. 1 is the interconnection topology diagram of the distributed transaction processing system using multi-type replicas in a decentralized architecture.
Fig. 2 is the architecture diagram of the distributed transaction processing system using multi-type replicas in a decentralized architecture.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
As shown in Fig. 1, the distributed transaction processing system using multi-type replicas in a decentralized architecture is applied in a distributed system comprising a client and multiple server-side nodes A, B and C, and is deployed on the nodes. The client responds to user requests and interacts with nodes A, B and C to submit transactions and obtain results. Nodes A, B and C store data, respond to read and write transactions, and so on, and are interconnected by a high-speed network. The distributed system uses a distributed hash table as its underlying layer and handles data in key-value form. Each node maps a key to a token with the consistent hash function and determines the storage node of the key-value pair from that token; each node is responsible for storing the key-value pairs in a certain range.
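The key-to-token-to-node mapping described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Ring` class, the choice of MD5 as the hash, the token values and the "first node at or after the token" ownership rule are all assumptions made for the example.

```python
import bisect
import hashlib


def token_of(key):
    """Map a key to a token. MD5 is an illustrative choice of hash."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical token ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # Sort nodes by token so that lookups can use binary search.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def node_for_token(self, token):
        # First node whose token is >= the key token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token) % len(self._ring)
        return self._ring[i][1]

    def node_for_key(self, key):
        return self.node_for_token(token_of(key))


# Tiny token values keep the example readable; real tokens span the hash range.
ring = Ring({"A": 100, "B": 200, "C": 300})
```

Adding a node only requires giving it a token; keys in the range it takes over move to it, and the rest of the ring is unaffected.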
As shown in Fig. 2, the distributed transaction processing system using multi-type replicas in a decentralized architecture comprises a transaction interface module 1, a transaction processing module 2 and a transaction storage module 3.
The transaction interface module 1 comprises an external interface submodule 101 and a transaction preprocessing submodule 102.
The transaction processing module 2 comprises a multi-type replica submodule 201, a read transaction processing submodule 202, a replica group transaction state submodule 203, a read request distribution submodule 204, a replica repair submodule 205, an MVCC read submodule 206, a local write transaction processing submodule 207, a local write transaction Paxos replica consistency submodule 208, a local write transaction commit submodule 209, a global write transaction processing submodule 210, a main sub-transaction Paxos replica consistency submodule 211, a secondary sub-transaction Paxos replica consistency submodule 212 and a global write transaction commit submodule 213.
The external interface submodule 101 receives a transaction request from the client and sends it to the transaction preprocessing submodule 102.
The transaction preprocessing submodule 102 determines whether a transaction request is a read transaction request or a write transaction request. A read transaction request is sent to the read transaction processing submodule 202. For a write transaction request, the submodule further determines whether it is a local write transaction request or a global write transaction request; a local write transaction request is sent to the local write transaction processing submodule 207, and a global write transaction request is sent to the global write transaction processing submodule 210. Specifically, the type is determined by reading the OPER field of the transaction request: 0 denotes a read transaction request and 1 denotes a write transaction request. For a write transaction request, the consistent hash function is applied to the key of each write operation the request contains, and the type is decided from the results: if the keys of all write operations map to the same node, the request is a local write transaction request; otherwise it is a global write transaction request.
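The classification rule above can be sketched in a few lines. Only the `OPER` field and its values 0 and 1 come from the text; the dictionary layout of a request, the `writes` list and the `owner_of` callback (standing in for the consistent hash lookup) are hypothetical.

```python
def classify(request, owner_of):
    """Return 'read', 'local_write' or 'global_write' for a transaction request.

    request  -- dict with an 'OPER' field (0 = read, 1 = write); write requests
                carry a 'writes' list of operations, each with a 'key'
                (this layout is an assumption for illustration)
    owner_of -- function mapping a key to the node that stores it, standing in
                for the consistent hash function
    """
    if request["OPER"] == 0:
        return "read"
    if request["OPER"] != 1:
        raise ValueError("unknown OPER value: %r" % (request["OPER"],))
    # Collect the distinct nodes that the write keys map to.
    owners = {owner_of(op["key"]) for op in request["writes"]}
    return "local_write" if len(owners) == 1 else "global_write"
```

For example, with an `owner_of` that routes by the first character of the key, writes to `a1` and `a2` form a local write transaction, while writes to `a1` and `b1` form a global one.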
The read transaction processing submodule 202 obtains from the multi-type replica submodule 201, for each read instruction contained in the read transaction request, the addresses of the corresponding data replicas and the required number of responses R (R is a positive integer); sends the read transaction request, the data replica addresses and the required number of responses to the replica group transaction state submodule 203; and controls timeout and retry for the whole read transaction request flow.
The replica group transaction state submodule 203 reads the transaction execution state of the corresponding replica group according to the data replica addresses, to obtain the maximum committed-state log number and the maximum committed-state log timestamp for each read instruction. Specifically, it maintains the consistency of the logs on the nodes of the replica group. Each log entry has a monotonically increasing number and timestamp and is in one of several states: waiting, committed or completed. Each node records the maximum completed-state log number, the maximum committed-state log number and the maximum committed-state log timestamp known to it, and the result obtained is the maximum over at least R successful responses.
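A minimal sketch of the "maximum over at least R successful responses" rule follows. The response format (a pair per replica, `None` for a failed or timed-out replica) and the `NotEnoughResponses` exception are assumptions for illustration; the text does not specify how failures are signalled.

```python
class NotEnoughResponses(Exception):
    """Raised when fewer than R replicas answered (hypothetical signal)."""


def group_state(responses, r):
    """Combine per-replica state into the replica group's view.

    responses -- one item per replica: (max_committed_log_no, max_committed_ts),
                 or None if the replica failed or timed out
    r         -- required number of successful responses
    """
    ok = [resp for resp in responses if resp is not None]
    if len(ok) < r:
        raise NotEnoughResponses("got %d of %d required" % (len(ok), r))
    max_log_no = max(log_no for log_no, _ in ok)
    max_ts = max(ts for _, ts in ok)
    return max_log_no, max_ts
```

The caller (the read transaction processing submodule in the text) would be the natural place to catch the failure and retry.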
The read request distribution submodule 204 determines whether a read instruction in the read transaction request can be executed locally. If so, it sends the read instruction, the maximum committed-state log number and the maximum committed-state log timestamp to the local replica repair submodule 205; otherwise it sends them to any one of the data replicas corresponding to the read instruction.
The replica repair submodule 205 brings the node on which it resides up to date with the maximum committed-state log number obtained by the replica group transaction state submodule 203. Specifically, it obtains the agreed value of every log entry numbered below this number and decides whether each entry needs to be committed. If the transaction type recorded in the log entry is a local transaction, the entry is considered ready to commit once it has reached agreement on the witness replicas; otherwise a no-op is committed in its place. If the transaction type recorded in the log entry is a global transaction, the submodule checks not only whether the entry has reached agreement on the witness replicas but also whether the recorded main sub-transaction has been committed; the entry is considered ready to commit only when the main sub-transaction has been committed, and otherwise a no-op is committed. Finally, all transactions that need to be committed are executed through to the completed state.
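The per-entry decision made during replica repair can be written as a small function. The entry layout (`type` of `"local"` or `"global"`) and the two boolean inputs are hypothetical; in a real system they would come from querying the witness replicas and the main sub-transaction's log position.

```python
def repair_decision(entry, witness_agreed, main_committed=False):
    """Decide what replica repair applies for one log entry.

    entry          -- dict with a 'type' of 'local' or 'global' (assumed layout)
    witness_agreed -- True if the entry reached agreement on the witness replicas
    main_committed -- for global entries, True if the recorded main
                      sub-transaction has been committed
    Returns 'commit' or 'noop'.
    """
    if not witness_agreed:
        return "noop"
    if entry["type"] == "local":
        return "commit"
    if entry["type"] == "global":
        # A global entry additionally depends on its main sub-transaction.
        return "commit" if main_committed else "noop"
    raise ValueError("unknown transaction type: %r" % (entry["type"],))
```

The global branch is where the main sub-transaction acts as the commit point: a secondary entry that reached agreement is still replaced by a no-op if the main sub-transaction never committed.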
The MVCC read submodule 206 reads data from the transaction storage module 3 according to the read instruction and the maximum committed-state log timestamp, and returns the data to the read transaction processing submodule 202.
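A timestamp-bounded MVCC read can be illustrated with a toy in-memory store. The `VersionedStore` class is hypothetical and stands in for the transaction storage module; the rule shown, returning the newest version whose timestamp does not exceed the maximum committed-state log timestamp, is the usual MVCC snapshot read and is assumed here rather than quoted from the text.

```python
class VersionedStore:
    """Toy multi-version store: each key keeps every (timestamp, value) written."""

    def __init__(self):
        self._versions = {}  # key -> list of (timestamp, value)

    def write(self, key, ts, value):
        self._versions.setdefault(key, []).append((ts, value))

    def read(self, key, max_ts):
        """Return the newest value with timestamp <= max_ts, or None."""
        best = None
        for ts, value in self._versions.get(key, []):
            if ts <= max_ts and (best is None or ts > best[0]):
                best = (ts, value)
        return None if best is None else best[1]


store = VersionedStore()
store.write("x", 10, "a")
store.write("x", 20, "b")
```

With these two versions, a read bounded at timestamp 15 sees the value written at 10, so a write that is still in progress at timestamp 20 does not block or disturb the reader.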
The read transaction processing submodule 202 also sends the data to the external interface submodule 101.
The external interface submodule 101 also sends the data to the client.
The local write transaction processing submodule 207 obtains from the multi-type replica submodule 201 the addresses of the witness replicas and data replicas corresponding to the write instructions contained in the local write transaction request, together with the required number of data replica responses W (W is a positive integer); sends the local write transaction request, these addresses and the required number of data replica responses to the local write transaction Paxos replica consistency submodule 208; and controls timeout and retry for the whole local write transaction request flow.
The local write transaction Paxos replica consistency submodule 208 establishes a consistent log value on the witness replicas of the witness replica group, sends this log value to the local write transaction commit submodule 209, and adds a local transaction tag to the log entry. Specifically, it uses the Paxos algorithm to try to reach agreement on the log value at the same log position of each witness replica. The log value consists of the operations of the write instructions in the local write transaction request plus a timestamp, and this timestamp is greater than the maximum committed-state log timestamp of the witness replica group before this run of the Paxos algorithm.
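The timestamp constraint on the proposed log value can be sketched as below. The Paxos rounds themselves are omitted. The entry layout, the function name and the use of a millisecond clock value passed in as `now_ms` are assumptions; the only requirement taken from the text is that the timestamp exceeds the group's maximum committed-state log timestamp.

```python
def make_local_log_entry(write_ops, max_committed_ts, now_ms):
    """Build the log value to propose for a local write transaction.

    write_ops        -- the operations of the write instructions
    max_committed_ts -- maximum committed-state log timestamp of the witness
                        replica group, read before running Paxos
    now_ms           -- current clock reading (illustrative time source)
    """
    # Use the clock when it is ahead; otherwise step just past the known maximum.
    ts = max(now_ms, max_committed_ts + 1)
    return {"ops": list(write_ops), "timestamp": ts, "tag": "local"}
```

Taking the maximum guards against a lagging local clock: even if `now_ms` is behind, the new entry still sorts after everything already committed in the group.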
The local write transaction commit submodule 209 commits the log value to the corresponding data replicas according to the data replica addresses and the required number of data replica responses W, and returns the success result to the local write transaction processing submodule 207.
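The "commit to data replicas until W respond" step might look like the following sketch. Modelling each data replica as a callable that returns `True` on success or raises `ConnectionError` when unreachable is an assumption, as is trying every replica rather than stopping at the W-th acknowledgement.

```python
def commit_to_replicas(replicas, log_value, w):
    """Send log_value to every data replica; succeed if at least w acknowledge.

    replicas -- callables taking the log value and returning True on success
                (a hypothetical stand-in for a network call to a data replica)
    w        -- required number of data replica responses
    """
    acks = 0
    for apply_log in replicas:
        try:
            if apply_log(log_value):
                acks += 1
        except ConnectionError:
            # An unreachable replica simply does not count towards W.
            continue
    return acks >= w
```

Lowering W lets a write succeed with more replicas down, which is the kind of availability tuning the text attributes to the multi-type replica design.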
The local write transaction processing submodule 207 also sends the success result to the external interface submodule 101.
The external interface submodule 101 also sends the success result to the client.
The global write transaction processing submodule 210 obtains from the multi-type replica submodule 201 the addresses of the witness replicas and data replicas corresponding to each write instruction contained in the global write transaction request, together with the required number of data replica responses V (V is a positive integer); sends the global write transaction request, these addresses and the required number of data replica responses to the main sub-transaction Paxos replica consistency submodule 211; and controls timeout and retry for the whole global write transaction request flow. Specifically, a global write transaction comprises two or more local write transactions, each of which is given the global transaction tag. One of them is designated the main sub-transaction and serves as the commit point. The other local write transactions are designated secondary sub-transactions and record the position information of the main sub-transaction for use in replica repair.
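Splitting a global write transaction into a main and several secondary sub-transactions can be sketched as follows. The text does not say how the main sub-transaction is chosen, so picking the first node in sorted order is purely an assumption; the record layout and the `owner_of` callback are likewise hypothetical.

```python
def split_global_write(writes, owner_of):
    """Group write operations by owning node and designate a main sub-transaction.

    writes   -- list of operations, each a dict with a 'key' (assumed layout)
    owner_of -- function mapping a key to its storage node
    Returns (main, secondaries).
    """
    groups = {}
    for op in writes:
        groups.setdefault(owner_of(op["key"]), []).append(op)
    if len(groups) < 2:
        raise ValueError("a global write transaction spans two or more nodes")
    nodes = sorted(groups)
    main_node = nodes[0]  # selection rule is an assumption, not from the text
    main = {"node": main_node, "ops": groups[main_node],
            "tag": "global", "role": "main"}
    secondaries = [
        # Each secondary records where the main sub-transaction lives,
        # which replica repair later uses to check the commit point.
        {"node": n, "ops": groups[n], "tag": "global",
         "role": "secondary", "main_location": main_node}
        for n in nodes[1:]
    ]
    return main, secondaries
```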
Main subtransaction Paxos copy consistency submodule 211 is for consistent daily record value on witness copy in the witness replica group that main subtransaction is set, the positional information of this daily record value and main subtransaction is sent to secondary subtransaction submits submodule 214 to, and add the global transaction mark for this journal entry; Particularly, to use the Paxos algorithm, trial reaches the consistent of daily record value on the same daily record position of each witness copy of main subtransaction, this daily record value adds timestamp for the operation of write command in this main subtransaction, and before this timestamp is carried out greater than this Paxos algorithm, the maximum of this witness replica group is submitted attitude daily record timestamp to.
Secondary subtransaction Paxos copy consistency submodule 212 is for consistent daily record value on witness copy in the witness replica group that all secondary subtransactions is arranged this secondary subtransaction, add positional information and the global transaction mark of main subtransaction for the journal entry of all secondary subtransactions, and the daily record value of main subtransaction and the daily record value of all secondary subtransactions are sent to main subtransaction submission submodule 215; Particularly, to use the Paxos algorithm, to each secondary subtransaction, trial reaches the consistent of daily record value on the same daily record position of each witness copy of secondary subtransaction, this daily record value for this reason in secondary subtransaction the operation of write command add timestamp, before this timestamp is carried out greater than this Paxos algorithm, the maximum of this witness replica group is submitted attitude daily record timestamp to.
The global write transaction commit submodule 213 commits the log value of the main subtransaction to the corresponding hard copies according to the addresses of the main subtransaction's hard copies and the required number V of hard-copy responses, and returns the success result to the global write transaction management submodule 212. After this succeeds, for each secondary subtransaction it commits that subtransaction's log value to the corresponding hard copies according to the addresses of that subtransaction's hard copies and the required number V of hard-copy responses.
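The commit order described above (main subtransaction first, secondary subtransactions only after it succeeds) can be sketched as follows. The `send` callback standing in for the network call to a hard copy, the sequential delivery, and the choice to leave a failed secondary commit to later copy repair are assumptions of this sketch, not details given in the description.

```python
from typing import Callable, List, Tuple

Send = Callable[[str, str], bool]   # (hard-copy address, log value) -> acknowledged?
SubCommit = Tuple[List[str], str]   # (hard-copy addresses, log value)


def commit_to_hard_copies(addresses: List[str], log_value: str,
                          v: int, send: Send) -> bool:
    """Send the log value to every hard copy; succeed if at least V acknowledge."""
    acks = sum(1 for addr in addresses if send(addr, log_value))
    return acks >= v


def commit_global_write(main: SubCommit, secondaries: List[SubCommit],
                        v: int, send: Send) -> bool:
    """Commit the main subtransaction first, then each secondary subtransaction."""
    main_addrs, main_value = main
    if not commit_to_hard_copies(main_addrs, main_value, v, send):
        return False  # commit point not reached: secondaries are not touched
    for addrs, value in secondaries:
        # Assumption: a secondary that misses V responses is left to copy repair.
        commit_to_hard_copies(addrs, value, v, send)
    return True
```

Returning success as soon as the main subtransaction is committed mirrors its role as the commit point of the whole global write transaction.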
The global write transaction management submodule 212 is also used to send the success result to the external interface submodule 101.
The external interface submodule 101 is also used to send the success result to the client.
Example:
To verify the feasibility and effectiveness of the method of the invention, the distributed transaction processing system using multi-type replicas in a decentralized architecture was configured and tested in a real environment.
The basic hardware and software configuration of the servers is shown in Table 1:
Table 1
The present invention effectively combines distributed transaction processing with multi-type replicas and a decentralized architecture. The decentralized architecture provides strong scalability; the replica mechanism improves the reliability and recoverability of data and yields higher data availability; and the system provides strongly consistent distributed transactions. By using multi-type replicas, the system separates the physical nodes that serve reads from those that serve writes, which effectively reduces the impact of node failures on the availability of read transactions and write transactions. This is significant for configuring read and write availability to suit different application scenarios, and the approach has considerable application potential.
Those skilled in the art will readily understand that the above is merely a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (7)
1. A distributed transaction processing system using multi-type replicas in a decentralized architecture, comprising a transaction interface module, a transaction processing module and a transaction storage module, wherein the transaction interface module comprises an external interface submodule and a transaction preprocessing submodule, and the transaction processing module comprises a multi-type replica submodule, a read transaction processing submodule, a replica group transaction status submodule, a read request distribution submodule, a copy repair submodule, an MVCC read submodule, a local write transaction processing submodule, a local write transaction Paxos copy consistency submodule, a local write transaction commit submodule, a global write transaction processing submodule, a main subtransaction Paxos copy consistency submodule, a secondary subtransaction Paxos copy consistency submodule and a global write transaction commit submodule, characterized in that:
The external interface submodule is used to receive a transaction request from a client and to send the transaction request to the transaction preprocessing submodule;
The transaction preprocessing submodule is used to determine whether the transaction request is a read transaction request or a write transaction request; if it is a read transaction request, the read transaction request is sent to the read transaction processing submodule; if it is a write transaction request, it is further determined whether the write transaction request is a local write transaction request or a global write transaction request; a local write transaction request is sent to the local write transaction processing submodule, and a global write transaction request is sent to the global write transaction processing submodule;
The read transaction processing submodule is used to obtain from the multi-type replica submodule the addresses of the hard copies corresponding to each read instruction contained in the read transaction request and the required number of responses, to send the read transaction request, the hard-copy addresses and the required number of responses to the replica group transaction status submodule, and to control timeout and retry for the entire read transaction request processing flow;
The replica group transaction status submodule is used to read, according to the hard-copy addresses, the transaction execution state in the corresponding replica group, so as to obtain the maximum committed-state log number and the maximum committed-state log timestamp corresponding to each read instruction;
The read request distribution submodule is used to determine whether a read instruction in the read transaction request can be executed locally; if so, it sends the read instruction, the maximum committed-state log number and the maximum committed-state log timestamp to the local copy repair submodule; otherwise it sends the read instruction, the maximum committed-state log number and the maximum committed-state log timestamp to any one of the hard copies corresponding to the read instruction;
The copy repair submodule is used to bring the node on which it resides up to date with the maximum committed-state log number obtained by the replica group transaction status submodule;
The MVCC read submodule is used to read data from the transaction storage module according to the read instruction and the maximum committed-state log timestamp, and to return the data to the read transaction processing submodule;
The read transaction processing submodule is also used to send the data to the external interface submodule;
The external interface submodule is also used to send the data to the client;
The local write transaction processing submodule is used to obtain from the multi-type replica submodule the addresses of the witness copies and hard copies corresponding to the write instructions contained in the local write transaction request and the required number of hard-copy responses, to send the local write transaction request, the addresses of the witness copies and hard copies and the required number of hard-copy responses to the local write transaction Paxos copy consistency submodule, and to control timeout and retry for the entire local write transaction request processing flow;
The local write transaction Paxos copy consistency submodule is used to establish a consistent log value on the witness copies in the witness replica group, to send this log value to the local write transaction commit submodule, and to add the local transaction mark to the log entry;
The local write transaction commit submodule is used to commit the log value to the corresponding hard copies according to the hard-copy addresses and the required number of hard-copy responses, and to return the success result to the local write transaction processing submodule;
The local write transaction processing submodule is also used to send the success result to the external interface submodule;
The external interface submodule is also used to send the success result to the client;
The global write transaction management submodule is used to obtain from the multi-type replica submodule the addresses of the witness copies and hard copies corresponding to each write instruction contained in the global write transaction request and the required number of hard-copy responses, to send the global write transaction request, the addresses of the witness copies and hard copies corresponding to each write instruction and the required number of hard-copy responses to the main subtransaction Paxos copy consistency submodule, and to control timeout and retry for the entire global write transaction request processing flow;
The main subtransaction Paxos copy consistency submodule is used to establish a consistent log value on the witness copies in the witness replica group of the main subtransaction, to send this log value and the position information of the main subtransaction to the secondary subtransaction commit submodule, and to add the global transaction mark to this log entry;
The secondary subtransaction Paxos copy consistency submodule is used to establish, for every secondary subtransaction, a consistent log value on the witness copies in that secondary subtransaction's witness replica group, to add the position information of the main subtransaction and the global transaction mark to the log entries of all secondary subtransactions, and to send the log value of the main subtransaction and the log values of all secondary subtransactions to the main subtransaction commit submodule;
The global write transaction commit submodule is used to commit the log value of the main subtransaction to the corresponding hard copies according to the addresses of the main subtransaction's hard copies and the required number of hard-copy responses, and to return the success result to the global write transaction management submodule; after this succeeds, for each secondary subtransaction it commits that subtransaction's log value to the corresponding hard copies according to the addresses of that subtransaction's hard copies and the required number of hard-copy responses;
The global write transaction management submodule is also used to send the success result to the external interface submodule;
The external interface submodule is also used to send the success result to the client.
2. The distributed transaction processing system according to claim 1, characterized in that the transaction preprocessing submodule determines the type of a transaction request by reading the OPER field in the transaction request, the value of which indicates whether the request is a read transaction request or a write transaction request; for a write transaction request, it applies a consistent hash function to the key of each write operation contained in the write transaction request and determines the type of the write transaction request from the results: if the keys of all write operations map to the same node, the write transaction request is a local write transaction request; otherwise it is a global write transaction request.
3. The distributed transaction processing system according to claim 1, characterized in that the copy repair submodule obtains the consistent value of every log entry whose number is less than this number and determines whether each entry needs to be committed; if the transaction type recorded in the log entry is a local transaction, the entry is considered committable once it has reached agreement in the witness copies, and otherwise a no-op is committed; if the transaction type recorded in the log entry is a global transaction, then in addition to checking whether agreement has been reached in the witness copies, it also checks whether the recorded main subtransaction has been committed; the entry is considered committable only when the main subtransaction has been committed, and otherwise a no-op is committed; finally, all transactions that need to be committed are executed to the completed state.
4. The distributed transaction processing system according to claim 1, characterized in that the local write transaction Paxos copy consistency submodule uses the Paxos algorithm to try to reach agreement on the log value at the same log position on every witness copy; the log value consists of the operations of the write instructions in the local write transaction request together with a timestamp, and this timestamp is greater than the maximum committed-state log timestamp of the witness replica group before this run of the Paxos algorithm.
5. The distributed transaction processing system according to claim 1, characterized in that a global write transaction comprises two or more local write transactions, each of which is tagged with the global transaction mark; one of them is designated the main subtransaction and serves as the commit point, and the other local write transactions are designated secondary subtransactions, each of which records the position information of the main subtransaction for use in copy repair.
6. The distributed transaction processing system according to claim 1, characterized in that the main subtransaction Paxos copy consistency submodule uses the Paxos algorithm to try to reach agreement on the log value at the same log position on every witness copy of the main subtransaction; the log value consists of the operations of the write instructions in the main subtransaction together with a timestamp, and this timestamp is greater than the maximum committed-state log timestamp of the witness replica group before this run of the Paxos algorithm.
7. The distributed transaction processing system according to claim 1, characterized in that the secondary subtransaction Paxos copy consistency submodule uses the Paxos algorithm, for each secondary subtransaction, to try to reach agreement on the log value at the same log position on every witness copy of that secondary subtransaction; the log value consists of the operations of the write instructions in that secondary subtransaction together with a timestamp, and this timestamp is greater than the maximum committed-state log timestamp of the witness replica group before this run of the Paxos algorithm.
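The request classification recited in claim 2 can be illustrated with a minimal sketch. The claim names the OPER field and a consistent hash function but gives neither the field values nor the hash construction, so the `"READ"` value, the request layout, the MD5-based ring and its virtual-node count are all assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


class HashRing:
    """A small consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: List[str], vnodes: int = 16) -> None:
        ring = sorted((self._hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [point for point, _ in ring]
        self._owners = [owner for _, owner in ring]

    @staticmethod
    def _hash(text: str) -> int:
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owners[i]


def classify(request: Dict, ring: HashRing) -> str:
    """Return 'read', 'local_write' or 'global_write' for a transaction request."""
    if request["OPER"] == "READ":  # hypothetical field value
        return "read"
    keys = [key for key, _ in request["writes"]]
    if not keys:
        raise ValueError("a write transaction request needs at least one write")
    owners = {ring.node_for(key) for key in keys}
    # All keys on one node: local write; otherwise the write spans nodes.
    return "local_write" if len(owners) == 1 else "global_write"
```

With a ring that contains a single node, every write request is classified as a local write; a request becomes a global write only when its keys hash to different nodes.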
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310005857.8A CN103164219B (en) | 2013-01-08 | 2013-01-08 | Distributed transaction processing system using multi-type replica in decentralized schema |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103164219A (en) | 2013-06-19 |
CN103164219B (en) | 2015-09-23 |
Family
ID=48587340
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310005857.8A Expired - Fee Related CN103164219B (en) | Distributed transaction processing system using multi-type replica in decentralized schema | 2013-01-08 | 2013-01-08 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103164219B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103530362A (en) * | 2013-10-12 | 2014-01-22 | 清华大学 | Computer data read-write method for multi-copy distributed system |
CN104699527A (en) * | 2013-12-10 | 2015-06-10 | 杭州海康威视系统技术有限公司 | Critical resource management method and device in cloud storage system |
CN105208096A (en) * | 2015-08-24 | 2015-12-30 | 用友网络科技股份有限公司 | Distributed cache system and method |
CN106021277A (en) * | 2016-04-27 | 2016-10-12 | 湖南蚁坊软件有限公司 | State-based method for implementation of lock-less distributed ACID consistency |
CN108322459A (en) * | 2018-01-31 | 2018-07-24 | 北京信息科技大学 | A kind of decentralization domain names method of servicing and system based on EPaxos |
CN109074387A (en) * | 2016-04-18 | 2018-12-21 | 亚马逊科技公司 | Versioned hierarchical data structure in Distributed Storage area |
CN109783578A (en) * | 2019-01-09 | 2019-05-21 | 腾讯科技(深圳)有限公司 | Method for reading data, device, electronic equipment and storage medium |
CN109902127A (en) * | 2019-03-07 | 2019-06-18 | 腾讯科技(深圳)有限公司 | History state data processing method, device, computer equipment and storage medium |
CN112995262A (en) * | 2019-12-18 | 2021-06-18 | 中国移动通信集团浙江有限公司 | Distributed transaction submission method, system and computing equipment |
US11308123B2 (en) | 2017-03-30 | 2022-04-19 | Amazon Technologies, Inc. | Selectively replicating changes to hierarchial data structures |
WO2022134876A1 (en) * | 2020-12-24 | 2022-06-30 | 中兴通讯股份有限公司 | Data synchronization method and apparatus, and electronic device and storage medium |
CN115357600A (en) * | 2022-10-21 | 2022-11-18 | 鹏城实验室 | Data consensus processing method, system, device, equipment and readable storage medium |
US11550763B2 (en) | 2017-03-30 | 2023-01-10 | Amazon Technologies, Inc. | Versioning schemas for hierarchical data structures |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156174A (en) * | 2015-04-16 | 2016-11-23 | 中国移动通信集团山西有限公司 | The system and method that a kind of db transaction processes |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020083078A1 (en) * | 2000-11-02 | 2002-06-27 | Guy Pardon | Decentralized, distributed internet data management |
CN102521330A (en) * | 2011-12-07 | 2012-06-27 | 华中科技大学 | Mirror distributed storage method under desktop virtual environment |
CN102831156A (en) * | 2012-06-29 | 2012-12-19 | 浙江大学 | Distributed transaction processing method on cloud computing platform |
Also Published As
Publication number | Publication date |
---|---|
CN103164219B (en) | 2015-09-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20150923 |