CN101706811A - Transaction commit method of distributed database system - Google Patents

Publication number
CN101706811A
CN101706811A CN200910238270A CN200910238270A CN101706811A CN 101706811 A CN101706811 A CN 101706811A CN 200910238270 A CN200910238270 A CN 200910238270A CN 200910238270 A CN200910238270 A CN 200910238270A CN 101706811 A CN101706811 A CN 101706811A
Authority
CN
China
Prior art keywords
transaction
participant
node
message
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910238270A
Other languages
Chinese (zh)
Other versions
CN101706811B (en)
Inventor
付艳艳
陈驰
王伏根
殷佳欣
张大朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN2009102382705A
Publication of CN101706811A
Application granted
Publication of CN101706811B
Legal status: Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a transaction commit method for a distributed database system, belonging to the technical field of computer networks. The method comprises the following steps: 1) a buffer is allocated in the memory of each participant node and of the coordinator node for caching transaction logs; 2) the coordinator node determines the participant nodes according to the transaction content, establishes connections with them, and determines the operation request information of the transaction; 3) the coordinator node sends the single-step operation request message of the transaction to the participant nodes and records a transaction request log for each operation request; 4) each participant node processes its local log according to the completion status of each operation request and sends a corresponding message to the coordinator node; and 5) the coordinator node judges from the messages received from all participant nodes whether the transaction is finished and, if so, makes the final decision. Compared with the prior art, the method greatly reduces the number of log operations, improves system and transaction efficiency, and offers very high availability.

Description

A transaction commit method for a distributed database system
Technical field
The present invention relates to a transaction commit method for database systems, and in particular to a transaction commit method in a distributed database system. It belongs to the technical field of computer networks.
Background art
A distributed system is a multi-node system in which data are stored in various databases. A node can be any data processing system, such as a computer system. The nodes can be in one location or spread over several locations, and are connected to one another by a network such as a local area network (LAN) or a wide area network. Examples of distributed systems include database systems and mail server systems.
Because a transaction may modify data on several nodes of a distributed system, data consistency requires that the following condition hold regardless of failures (for example power failure or hardware conflicts): a transaction is atomic, that is, either all requests of the transaction are executed successfully or none of them is executed.
Distributed systems usually use the two-phase commit (2PC) protocol to maintain data consistency. In a 2PC system, the coordinator node of a transaction is the node to which the client (for example, an application) submits the transaction. For each request in the transaction, the coordinator node identifies the node in the distributed system that holds the corresponding resource and is responsible for processing the request. Each node that holds a requested resource and is assigned to process a request of the transaction is called a participant node. This step determines the participant nodes of the transaction and establishes connections from the coordinator node to the participant nodes, in preparation for the transaction.
Each participant node in the two-phase commit protocol votes on whether to commit or roll back the transaction and sends its vote to the coordinator node. The coordinator node then makes the final decision to commit or roll back the transaction based on the votes from the participant nodes. If all participant nodes vote to commit, the coordinator node writes a log record indicating the result and notifies all participant nodes to commit the transaction. If one or more participant nodes vote that the transaction cannot be committed, the coordinator node writes a log record indicating the result and notifies all participant nodes that voted to commit to roll back the transaction.
Once a participant node has voted to commit, it must wait for the coordinator's final decision before taking any further action. If a participant votes that the transaction cannot be committed, it rolls back on its own.
After a participant node has committed or rolled back the transaction, it must reply to the coordinator node that the transaction has ended. After the coordinator node has received this feedback from all participant nodes, it records a transaction end log and the transaction formally ends.
The two-phase commit protocol has the following shortcomings:
1) Log writes are frequent. To guarantee the atomicity of transactions and the recoverability of the system, log writes must be forced during the execution of the two-phase commit protocol. In a 2PC system consisting of n participant nodes and 1 coordinator node, one execution of the protocol requires 2n+1 forced log writes.
2) The protocol is not message-efficient. In a 2PC system consisting of n participant nodes and 1 coordinator node, one execution of the protocol transmits 4n messages.
3) Efficient execution of the protocol depends on an unobstructed network spanning the coordinator and all participant nodes. Under network congestion, any participant node that has voted to commit must wait for the coordinator's final decision before it can complete the operation and release the resources it holds. During this period, every operation that requests a resource held by that participant must wait in a queue. Moreover, until it has received the transaction-end replies from the participant nodes, the coordinator node cannot consider the transaction finished and must keep the transaction log; it cannot "forget" it (release the memory).
Existing techniques mainly either pursue a single goal, reducing messages or making log operations efficient, or optimize only for the characteristics of one class of transactions. They cannot fully achieve efficient transaction commit in distributed systems, so a distributed transaction commit method more effective than existing commit techniques is needed.
Summary of the invention
In view of the low efficiency of transaction commit in current distributed systems, the object of the present invention is to provide a transaction commit method for a distributed database system that keeps both the number of transmitted messages and the number of forced log writes small, whether the transaction is successfully committed or rolled back.
The technical solution of the present invention is:
A transaction commit method for a distributed database system, comprising the steps of:
1) allocating a buffer in the memory of each participant node and of the coordinator node for caching transaction logs;
2) the coordinator node determines the participant nodes according to the transaction content, connects to them, and determines the operation request information of the transaction;
3) the coordinator node sends the single-step operation request message of the transaction to the participant nodes and records a transaction request log for each operation request;
4) each participant node processes its local log according to the completion status of each operation request and sends a corresponding message to the coordinator node;
5) the coordinator node makes a judgment based on the messages received from all participant nodes:
a) if the transaction must be rolled back, the coordinator node caches its local transaction log and sends a transaction rollback message to every participant node whose returned message reported the operation request as successful; each such participant node then executes the received transaction rollback message and caches its transaction log;
b) if the transaction can be committed, the coordinator node caches its local transaction log and sends a commit message to all participant nodes of the transaction; each participant node then executes the received commit message and caches its transaction log;
c) if the returned messages of all participant nodes report the operation request as successful and the transaction has not yet finished, steps 3) to 5) are repeated.
Further, the operation request information of the transaction comprises each operation request of the transaction and the total number n of operation requests of the transaction.
Further, at the start of each transaction the coordinator node determines whether cached transaction logs currently exist in the coordinator node and, if so, writes them to disk in a single write; a participant node writes its cached transaction logs to disk in a single write when its buffer is full, when a log has been cached longer than a set time, or when a forced log write occurs.
Further, the operation request message of the transaction comprises: transaction ID, coordinator ID, the IDs of all participants involved in this operation request, the single-step request content, i, and the coordinator's local timestamp. The corresponding message sent to the coordinator node comprises: coordinator ID, coordinator transaction ID, local participant ID, i, vote, and the latest timestamp sent by the coordinator node. The local log of a participant node comprises: local transaction ID, coordinator ID, coordinator transaction ID, the IDs of all participants involved in this operation request, the single-step request content, vote, and the latest timestamp sent by the coordinator node. Here i indicates that the request is the i-th operation request of the transaction, and vote is the completion status of the operation request.
Further, the coordinator node records the transaction request log of each operation request as follows: if the operation request is the first request of a new transaction, the transaction start log of the new transaction is force-written to disk; if the operation request is the i-th request of the current transaction, its log is cached.
Further, the transaction start log comprises: transaction ID, coordinator ID, the IDs of all participants involved in any operation of the transaction, the content of the first request of the transaction, the transaction start flag, and the coordinator's local timestamp. The log of the i-th request of the transaction comprises: transaction ID, coordinator ID, the IDs of all participants involved in the i-th operation, the content of the i-th request, i, and the coordinator's local timestamp, where i > 1.
Further, a participant node processes its local log and sends the corresponding message to the coordinator node as follows: if the participant node failed to complete the operation request, it caches the transaction log, returns a message to the coordinator node, and rolls back the transaction; if the participant node completed the operation request successfully, it only caches the transaction log, returns a message to the coordinator node, and waits for the coordinator node to send the next operation request message of the transaction.
Further, in step 5), if one or more participant nodes return no message within the set time, or a returned message reports that the operation request failed, the coordinator node judges that the transaction must be rolled back; if the returned messages of all participant nodes report success and the operation request number in the returned messages equals the total number of operation requests of the transaction, the coordinator node judges that the transaction can be committed.
Further, in step 5), the transaction rollback message comprises: transaction ID, coordinator ID, participant ID, the rollback decision, the transaction end flag, and the coordinator's local timestamp. The transaction commit message comprises: transaction ID, coordinator ID, participant ID, the commit decision, the transaction end flag, and the coordinator's local timestamp. The local transaction log cached by the coordinator node comprises: transaction ID, coordinator ID, participant ID, the final decision, the transaction end flag, and the coordinator's local timestamp. The transaction log cached by a participant node comprises: local transaction ID, coordinator ID, coordinator transaction ID, participant ID, the transaction end flag, the final decision, and the latest timestamp sent by the coordinator node. The final decision is either the rollback decision or the commit decision.
Further, if not all participant nodes receive the rollback or commit message from the coordinator node within the set time, the participant nodes first query one another, using the received list of participants executing the same transaction step:
a) if any participant reports that its operation request failed, all participants of the transaction roll back the transaction;
b) if all participants report that their operation requests succeeded, the participant nodes continue to wait for the message from the coordinator node;
c) if some participant node has received the rollback or commit message, the remaining participants consult it and execute that decision;
d) if some participant node cannot be contacted within the time limit and it cannot be determined whether the transaction may be rolled back, the participants continue to wait.
Further, if the coordinator node fails, then after restarting it first looks up the timestamp of its last local transaction end log, and then asks all participant nodes for the logs of unfinished transactions that the coordinator initiated after that time:
a) if the logs returned by one or more participant nodes show that the transaction has already been rolled back locally, the coordinator node collates the received logs to restore its local log, saves a transaction rollback record, and notifies all participants again that the transaction is rolled back;
b) if the logs returned by all participants show that the transaction is still waiting, the coordinator node collates the received logs to restore its local log and decides to commit or roll back according to the execution status of the transaction;
c) if some participant has not yet recovered from a participant node failure and does not have complete logs, or some participant cannot be contacted, the transaction is considered rolled back: the coordinator node collates the received logs to restore its local log, writes a rollback log, and sends the transaction decision to that participant node once contact with it is restored.
To reduce the frequent forced log writes of transactions, the present invention first allocates a buffer in the memory of each participant node and of the coordinator node for caching transaction logs. A participant node writes the cached logs to disk in a single write when the buffer is full, when a log has been cached longer than a certain time, or when a log write is forced. The coordinator node writes its cached logs in a single write at the start of each transaction.
Before a transaction begins, the coordinator node finds the corresponding participants according to the transaction content and sends each instruction of the transaction to the corresponding participant node for execution. The messages exchanged between the coordinator node and the participant nodes, together with the coordinator node's timestamps, guarantee the atomicity of the whole transaction and recoverability after a failure, and also increase the flexibility of the system. The method has two aspects:
The first aspect is transaction preparation. This comprises: locating the relevant resources according to the transaction content and determining the nodes where they reside; establishing, for the application, a connection to each of the resources; and determining the context of the current transaction.
The second aspect is strict adherence to the steps and requirements of message passing, so that the transaction proceeds smoothly.
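The participant-side flush policy described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `LogBuffer`, its parameters, and the `write_to_disk` callback are all assumptions made for the example, and the clock is injectable only so the behaviour can be exercised without waiting.

```python
import time
from typing import Callable, List, Optional


class LogBuffer:
    """Hypothetical in-memory log buffer, flushed to disk in a single write."""

    def __init__(self, capacity: int, max_age_s: float,
                 write_to_disk: Callable[[List[str]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity          # assumed: maximum number of cached records
        self.max_age_s = max_age_s        # assumed: longest time a record may stay cached
        self.write_to_disk = write_to_disk
        self.clock = clock
        self.records: List[str] = []
        self.oldest_at: Optional[float] = None

    def must_flush(self) -> bool:
        # Flush is due when the buffer is full or the oldest record is too old.
        if not self.records:
            return False
        full = len(self.records) >= self.capacity
        too_old = self.clock() - self.oldest_at >= self.max_age_s
        return full or too_old

    def append(self, record: str, force: bool = False) -> None:
        # Check the buffer before caching, as the description requires.
        if self.must_flush():
            self.flush()
        if not self.records:
            self.oldest_at = self.clock()
        self.records.append(record)
        if force:                         # a forced log write flushes everything cached
            self.flush()

    def flush(self) -> None:
        if self.records:
            self.write_to_disk(list(self.records))  # one write for all cached records
            self.records.clear()
            self.oldest_at = None
```

Batching this way trades a bounded window of possible log loss (at most `max_age_s`) for far fewer disk writes.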
Specifically, the distributed transaction commit scheme of the present invention is realized by the following steps:
1) The coordinator node looks up the resources the transaction needs according to the transaction content and determines the nodes where they reside (that is, determines the participant nodes); establishes connections to the resources; determines whether cached logs currently exist in the coordinator node; and determines each operation request of the transaction and the total number of requests.
2) The coordinator node sends the operation request message of the transaction to the participant nodes. The message contains the coordinator ID, the coordinator's local timestamp TimeStamp, the single-step request content, the IDs of all participants involved in this single-step request, and the coordinator's local transaction ID. The message format is as follows:
Transaction ID | Coordinator ID | Participant IDs | Single-step request content | i | TimeStamp
Here i indicates that the request is the i-th operation of the transaction.
At the same time, the coordinator node writes its log according to the kind of transaction request:
1. If this is a new transaction, the transaction start log is force-written. If step 1) found cached transaction logs in the coordinator node, the cached logs and the transaction start log are flushed to disk together. If step 1) found no cached transaction logs in the coordinator node, only the transaction start log needs to be force-written.
The transaction start log format is as follows:
Transaction ID | Coordinator ID | Participant IDs | Single-step request content | start | TimeStamp
Here start is the transaction start flag, indicating that the transaction begins; it can also be represented by "1". Participant IDs are the IDs of all participants taking part in any step of the transaction.
2. If this is the i-th request of the transaction, its log is cached.
The log format of the i-th request is:
Transaction ID | Coordinator ID | Participant IDs | Single-step request content | i | TimeStamp
Here i > 1, and Participant IDs are the IDs of the participants taking part in the i-th step of the transaction.
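The coordinator's logging rule in step 2) can be illustrated with a short sketch. The names `StepRecord` and `CoordinatorLog`, and the use of in-memory lists to stand in for the cache and the disk, are assumptions for illustration; only the field layout and the "force-write on a new transaction, cache otherwise" rule come from the description.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class StepRecord:
    """One row of the message / log layout described in step 2)."""
    txn_id: int
    coordinator_id: str
    participant_ids: Tuple[str, ...]
    op: str                       # single-step request content
    step: Union[int, str]         # i, or "start" in the transaction start log
    timestamp: float


class CoordinatorLog:
    """Hypothetical coordinator-side log with a cache and a simulated disk."""

    def __init__(self) -> None:
        self.cache: List[StepRecord] = []
        self.disk: List[StepRecord] = []
        self.forced_writes = 0

    def record_request(self, txn_id: int, coordinator_id: str,
                       step_participants: Sequence[str],
                       all_participants: Sequence[str],
                       op: str, i: int, ts: float) -> StepRecord:
        if i == 1:
            # New transaction: force-write the start log together with any cached logs.
            start = StepRecord(txn_id, coordinator_id, tuple(all_participants), op, "start", ts)
            self.disk.extend(self.cache + [start])
            self.cache.clear()
            self.forced_writes += 1
        else:
            # Later step of the current transaction: cache only.
            self.cache.append(
                StepRecord(txn_id, coordinator_id, tuple(step_participants), op, i, ts))
        # The returned record is the request message sent to the step's participants.
        return StepRecord(txn_id, coordinator_id, tuple(step_participants), op, i, ts)
```

Note that the start log lists every participant of the whole transaction, while each request message lists only the participants of that step.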
3) Each participant processes its local log according to the completion status of the operation request and sends a message to the coordinator node. Both the local log and the message take the corresponding information from the message received in step 2).
The message format is as follows:
Coordinator ID | Coordinator transaction ID | This participant's ID | i | vote | TimeStamp
Here "Coordinator transaction ID" is the "Transaction ID" that the transaction has at the coordinator node, as transmitted in step 2); "This participant's ID" is the ID of the local participant; "i" indicates that this is the i-th operation request of the transaction, taken from the information transmitted in step 2); "vote" describes the completion status of the operation request and is "yes" or "no", corresponding to success or failure of the operation request; "TimeStamp" is the timestamp contained in the coordinator message that this message answers.
The log format is as follows:
Local transaction ID | Coordinator ID | Coordinator transaction ID | Participant IDs | OP | vote | TimeStamp
Here "Local transaction ID" is the transaction ID that the operation request has locally at the participant node, and "OP" is the single-step request content, taken from the message sent by the coordinator node.
If the participant node failed to complete the operation request, it caches the log, returns a message to the coordinator node, and rolls back the transaction immediately. If the participant node completed the operation request, it only caches the log, returns a message to the coordinator node, and waits for further instructions from the coordinator node.
If, before caching the log, the participant finds that the local log buffer is full or that the oldest cached log has reached the maximum cache time, it force-writes the logs and then sends the corresponding message to the coordinator node.
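A participant's handling of one step can be sketched as below. The types `StepRequest` and `Vote`, the function `handle_step`, and the action strings are hypothetical names chosen for this example; the `execute` callback stands in for whatever the local database does with the request.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class StepRequest:
    txn_id: int                       # the coordinator's transaction ID
    coordinator_id: str
    participant_ids: Tuple[str, ...]
    op: str
    step: int
    timestamp: float


@dataclass(frozen=True)
class Vote:
    coordinator_id: str
    coordinator_txn_id: int
    participant_id: str
    step: int
    vote: str                         # "yes" or "no"
    timestamp: float                  # echoed from the coordinator's request


def handle_step(me: str, req: StepRequest,
                execute: Callable[[str], bool]) -> Tuple[Vote, str]:
    """Run one step locally and build the reply for the coordinator."""
    ok = execute(req.op)
    reply = Vote(req.coordinator_id, req.txn_id, me, req.step,
                 "yes" if ok else "no", req.timestamp)
    # On failure the participant rolls back at once; on success it waits.
    action = "wait_for_coordinator" if ok else "rollback_locally"
    return reply, action
```

In both outcomes the participant would also cache a local log record before replying; that part is omitted here to keep the sketch short.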
4) The coordinator node receives the messages from the participant nodes, analyzes the votes, and sends a message to the participant nodes.
1. If one or more participant nodes do not vote in time, or vote "no", the transaction must be rolled back. In this case the coordinator node caches its local log and sends a transaction rollback message to all participant nodes that voted "yes". The cached log is written with the coordinator node's next forced log write. The method proceeds to step 5). The message format is as follows:
Transaction ID | Coordinator ID | Participant ID | Abort | end | TimeStamp
The log format is as follows:
Transaction ID | Coordinator ID | Participant ID | Abort | end | TimeStamp
Here "end" indicates that the transaction has ended and can be represented by "0".
2. If all participants vote "yes", and the operation request number i shown in the messages returned by the participant nodes equals the total number of operation requests of the transaction determined in step 1), the transaction can be committed. In this case the coordinator node caches its local log and sends a commit message to all participant nodes of the transaction. The cached log is written with the coordinator node's next forced log write. The method proceeds to step 5).
The message format is as follows:
Transaction ID | Coordinator ID | Participant ID | Commit | end | TimeStamp
The log format is as follows:
Transaction ID | Coordinator ID | Participant ID | Commit | end | TimeStamp
Here "end" indicates that the transaction has ended and can be represented by "0".
3. If all participant nodes vote "yes" and the transaction has not yet finished, the method returns to step 2).
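The three outcomes of step 4) reduce to a small decision function. The sketch below is illustrative only: `decide` and `abort_recipients` are hypothetical names, and a participant that timed out is modelled simply as a missing entry in the `votes` mapping.

```python
from typing import Dict, Iterable, List


def decide(participants: Iterable[str], votes: Dict[str, str],
           step: int, total_steps: int) -> str:
    """Return "abort", "commit", or "next_step" for the current step."""
    for p in participants:
        # A missing vote (timeout) is treated the same as a "no" vote.
        if votes.get(p) != "yes":
            return "abort"
    # All voted yes: commit only if this was the last operation request.
    return "commit" if step == total_steps else "next_step"


def abort_recipients(participants: Iterable[str], votes: Dict[str, str]) -> List[str]:
    """Rollback messages go only to participants that voted "yes";
    the others have already rolled back or never answered."""
    return [p for p in participants if votes.get(p) == "yes"]
```

For example, with participants A, B, C, votes `{"A": "yes", "B": "no"}` give "abort", and only A needs a rollback message.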
5) Each participant node executes the final decision and caches its log. The log structure is as follows:
Local transaction ID | Coordinator ID | Coordinator transaction ID | Participant ID | end | fi | TimeStamp
Here "end" indicates that the transaction has ended, and "fi" is the final decision of the transaction, either "commit" or "abort".
If, before caching the log, the participant finds that the local log buffer is full or that the oldest cached log has reached the maximum cache time, it force-writes the logs and then performs the required operation.
Failures in a distributed system fall into two classes: communication failures caused by the network, and failures of the coordinator node or a participant node (site failures). The handling of each class is described below.
Communication failure in step 2): possibly no participant node can receive the coordinator node's request, in which case the coordinator node, after waiting until timeout, can decide on its own to roll back the transaction. Alternatively, only some participant nodes fail to receive the request; in that case the coordinator node assumes that the participant nodes that did not reply have rolled back the transaction, and therefore decides to roll it back.
Communication failure in step 3): the coordinator node may not receive the votes of all participant nodes. In this case the coordinator node assumes that the participant nodes that did not vote have rolled back the transaction, and therefore decides to roll it back.
Communication failure in step 4): not all participant nodes may receive the final decision. In this case the participant nodes first query one another, using the received list of participants executing the same transaction. If any participant voted to roll back, all participants of the transaction roll it back accordingly and reply with a transaction rollback message to any participant that asks. If every participant that can be contacted voted to commit, the participants must continue to wait for the coordinator node's final decision, because a participant node can judge neither whether the transaction has executed all of its requests nor whether the participant list it holds contains all participants. If some participants have received the final decision, the remaining participants execute that decision and reply with it to any participant that asks. If some participant nodes cannot be contacted within the time limit and it cannot be determined whether the transaction may be rolled back, the participants continue to wait.
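The mutual-inquiry rule for a missing final decision can be sketched as a function over what a participant learns from its peers. The name `resolve_without_coordinator` and the encoding of peer states as strings are assumptions made for this example.

```python
from typing import Dict, Optional


def resolve_without_coordinator(peer_states: Dict[str, Optional[str]]) -> str:
    """Decide what a participant does when the coordinator's decision is late.

    peer_states maps each peer on the participant list to one of:
      "abort" / "commit": the peer has received the final decision
      "no":               the peer voted no (and has rolled back)
      "yes":              the peer voted yes and is still waiting
      None:               the peer could not be contacted in time
    """
    states = list(peer_states.values())
    if "abort" in states or "no" in states:
        return "abort"        # any rollback vote or decision settles the outcome
    if "commit" in states:
        return "commit"       # a peer already holds the coordinator's decision
    # All reachable peers voted yes, or some are unreachable: a participant cannot
    # tell whether further steps remain, so it keeps waiting for the coordinator.
    return "wait"
```

The asymmetry is deliberate: rollback can be concluded from a single peer, whereas commit can only be copied from a peer that actually received the coordinator's decision.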
Site failures are discussed in two cases.
If the coordinator node fails, then after restarting it first looks up the timestamp of its last local transaction end log, and then asks all participant nodes for the logs of unfinished transactions that the coordinator initiated after that time. Each participant node returns its logs related to the transaction. Three situations can arise:
1. The logs show that the transaction has already been rolled back locally. Because the participant nodes received no message from the coordinator node for a long time, they inevitably timed out and then queried one another; if any participant node voted no, the transaction was rolled back. If the coordinator node receives a transaction rollback message from one or more participant nodes, it collates the received logs to restore its local log and notifies all participants again that the transaction is rolled back.
2. The logs show that the transaction is still waiting. All participant nodes reply that the transaction is waiting. The coordinator node then collates the received logs to restore its local log and decides to commit or roll back according to the execution status of the transaction.
3. Some participant has not yet recovered from a participant node failure and does not have complete logs, or cannot be contacted. The transaction is then considered rolled back: the coordinator node collates the received logs to restore its local log, writes a rollback log, and sends the decision to that participant node once contact with it is restored.
During this process, if a participant node is notified that a transaction is rolled back but has itself lost the records of that transaction, it must again request the logs related to the transaction from the coordinator node.
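The three recovery situations for a restarted coordinator can be condensed into a sketch. The function name `recover_transaction` and the reply encoding are hypothetical; how the coordinator decides in the "all waiting" case depends on the execution status recovered from the logs, which the description does not spell out, so the sketch only signals that case rather than resolving it.

```python
from typing import Dict, Optional


def recover_transaction(replies: Dict[str, Optional[str]]) -> str:
    """Classify an unfinished transaction after a coordinator restart.

    replies maps each participant to:
      "rolled_back": its log shows the transaction was rolled back locally
      "waiting":     its log shows the transaction is still waiting
      None:          it is unreachable or has lost part of its logs
    """
    values = list(replies.values())
    if "rolled_back" in values:
        return "abort"                        # situation 1: notify everyone again
    if any(v is None for v in values):
        return "abort"                        # situation 3: treat as rolled back
    return "decide_from_execution_state"      # situation 2: all are still waiting
```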
If a participant node fails, then after restarting it first looks up the last timestamp in its local on-disk log, and then asks the coordinator node about the status of transactions after that time. It recovers according to the coordinator node's log records.
Compared with traditional 2PC, this transaction commit method has the following advantages:
1) For a distributed system with n participant nodes and 1 coordinator node, only 1 forced log write is needed, compared with the 2n forced log writes of traditional 2PC. This significantly reduces the number of log operations and improves system efficiency.
2) The number of messages exchanged changes from 4n to 2n_op+n, where n_op is the total number of operations the transaction performs. For example, suppose the transaction comprises k steps in total and each request involves about 60% of the participant nodes; then n_op = k*60%*n. For a typical transaction comprising only one request, n_op = 60%*n, and the number of messages exchanged is 2.2n, well below the 4n messages of traditional 2PC.
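The message-count comparison above can be checked with a few lines of arithmetic. The function names and the `participation` parameter (the fraction of participant nodes involved in each request, 60% in the example) are labels introduced here for illustration.

```python
def messages_2pc(n: int) -> int:
    """Messages for one run of traditional 2PC with n participants."""
    return 4 * n


def messages_proposed(n: int, k: int, participation: float) -> float:
    """Messages for the proposed method: 2 * n_op + n.

    n_op = k * participation * n is the total number of operations, assuming
    each of the k steps involves the same fraction of the n participants.
    """
    n_op = k * participation * n
    return 2 * n_op + n
```

With n = 10, k = 1 and 60% participation this gives 2.2n = 22 messages against 40 for 2PC. Under the same assumption the count grows with k and reaches 4n at k = 2.5, so the saving applies to transactions with few steps or low per-step participation.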
3) Using the participant list to let participants query one another reduces participant waiting time in some communication-failure situations, improves transaction efficiency, and gives high availability. Where it can be determined that the transaction is rolled back, this effectively avoids the transaction queuing that traditional 2PC suffers under network congestion. Because the participant list carried in a message is limited to the participants of the same transaction step, the scale of the inquiry is also well controlled.
All logs of participant nodes use the caching policy: they are written to disk in a single write when the cache space is full or the cache time limit is reached. The safety of the logs can be described briefly as follows:
If the failure rate of a host is a constant $\lambda$, the mean time to failure (MTTF) of the host is $1/\lambda$. Because host failures are mutually independent, with $n$ hosts the probability that all hosts fail within the interval from time 0 to $t$ is $p = (1 - e^{-\lambda t})^n$. If the MTTF of a single host is $T$, then $\lambda = 1/T$, so
$p = (1 - e^{-t/T})^n$
Suppose the MTTF of a single host is 100 hours and the transaction takes 1 minute to execute. The probability that all hosts fail within this time, so that the logs are lost and cannot be queried, is:
$p = (0.00016665277854932547)^n \approx (2 \times 10^{-4})^n$
Because each forced log write by the coordinator node writes all cached logs in the system to disk together, the time logs spend in the buffer is shortened, and the probability of losing the corresponding logs is even smaller.
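The figure above can be reproduced numerically. The sketch below uses a hypothetical helper `p_all_fail` with the MTTF and the time window both expressed in hours, so that 1 minute is 1/60 hour.

```python
import math


def p_all_fail(mttf_hours: float, window_hours: float, n: int) -> float:
    """Probability that all n independent hosts fail within the window,
    for exponentially distributed failures: (1 - exp(-t / T)) ** n."""
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    single = -math.expm1(-window_hours / mttf_hours)
    return single ** n


# One host, MTTF 100 hours, 1-minute transaction: about 1.6665e-4.
p1 = p_all_fail(100.0, 1.0 / 60.0, 1)
```

The probability falls geometrically with the number of hosts: for n = 3 it is already on the order of 5e-12.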
Description of the drawings
Fig. 1 is the flow of the traditional two-phase commit protocol.
Fig. 2 is the transaction commit flow of the present invention.
Embodiment
The present invention is described in further detail below with reference to the drawings and an example, which do not limit the scope of the invention in any way.
Consider the following example. A commercial chain consists of a head office and n different stores. The head office stores the employee data, the product categories of the whole chain, the inventory categories of each point of sale, and so on; each store keeps its own sales and inventory data. Suppose a manager wants to query all stores, find the stock quantity of toothpaste in every store, and restock every store whose stock is below 300 up to 300.
Assume the data are as follows:
Stores 1, 2, 3, 4 and 5 stock toothpaste, with stock levels of 100, 200, 300, 400 and 500 respectively. The coordinator node's transaction ID is 654321; the participant nodes' transaction IDs are 154321, 254321, 354321, 454321 and 554321 respectively; the coordinator node ID is "X", and the participant node IDs are "A", "B", "C", "D" and "E" in order. The transaction request is completed in two steps: the query request op_1 and the update request op_2. The vote of each participant node is yes or no.
The transaction is executed in the following steps:
1) The node at which the transaction is submitted, i.e. the node where it is handled, automatically becomes the coordinator node. From its own records, the coordinator node determines which stores' inventory categories include toothpaste and establishes connections to those databases; that is, it determines that stores 1, 2, 3, 4 and 5 are the participant nodes and establishes connections to them. It also checks its own log buffer to determine whether any cached log records have not yet been written to persistent storage.
2) The coordinator node sends the transaction message to all participant nodes. The message format is:
654321 X A,B,C,D,E Op_1 1 TimeStamp
At the same time, the coordinator node performs a forced log write. The log structure is:
654321 X A,B,C,D,E Op_1 start TimeStamp
If step 1) found cached log records that had not yet been written to persistent storage, they are written together with this record.
3) If a participant node completes the Op_1 transaction request, it caches a local log record and sends its vote to the coordinator node. The log structure and the message sent by each participant node are as follows:
Participant node 1:
Log format:
154321 X 654321 A,B,C,D,E Op_1 yes TimeStamp
Message format:
X 654321 A 1 yes TimeStamp
Participant node 2:
Log format:
254321 X 654321 A,B,C,D,E Op_1 yes TimeStamp
Message format:
X 654321 B 1 yes TimeStamp
Participant node 3:
Log format:
354321 X 654321 A,B,C,D,E Op_1 yes TimeStamp
Message format:
X 654321 C 1 yes TimeStamp
Participant node 4:
Log format:
454321 X 654321 A,B,C,D,E Op_1 yes TimeStamp
Message format:
X 654321 D 1 yes TimeStamp
Participant node 5:
Log format:
554321 X 654321 A,B,C,D,E Op_1 yes TimeStamp
Message format:
X 654321 E 1 yes TimeStamp
If a participant node fails to complete the transaction request, it caches a transaction rollback log record and rolls back the transaction. The log structure and message format are the same as above, except that the corresponding yes is changed to no.
Before caching a log record, the node checks whether a forced log write is required; if so, it performs the forced log write first and then carries out the other operations.
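The check-before-cache rule, combined with the caching policy described earlier (write everything in one operation when the buffer is full or the time limit is reached), can be sketched as follows. The class name LogBuffer, its parameters, and the write_to_disk callback are hypothetical; the patent does not prescribe any particular data structure.

```python
import time
from typing import Callable, List


class LogBuffer:
    """In-memory log cache that is written to disk in a single operation."""

    def __init__(self, capacity: int, max_age_seconds: float,
                 write_to_disk: Callable[[List[str]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._max_age = max_age_seconds
        self._write = write_to_disk
        self._clock = clock
        self._pending: List[str] = []
        self._oldest = 0.0  # time at which the oldest pending record was cached

    def needs_flush(self) -> bool:
        """True when the buffer is full or the oldest record is too old."""
        if not self._pending:
            return False
        too_old = self._clock() - self._oldest >= self._max_age
        return len(self._pending) >= self._capacity or too_old

    def flush(self) -> None:
        """Forced write: hand all cached records to disk at once."""
        if self._pending:
            self._write(list(self._pending))
            self._pending.clear()

    def cache(self, record: str) -> None:
        """Check whether a forced write is due, perform it, then cache."""
        if self.needs_flush():
            self.flush()
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(record)
```

A participant node would call cache() for each vote record and flush() whenever a forced write is triggered; injecting the clock makes the time limit easy to exercise in tests.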
4) The coordinator node analyzes the voting results. If all participant nodes returned a yes message in step 3), the transaction proceeds to step 5); otherwise it proceeds to step 7).
5) The coordinator node sends the transaction message to the participant nodes involved in this step. The message format is:
654321 X A,B Op_2 2 TimeStamp
At the same time, the coordinator node caches a log record. The log structure is:
654321 X A,B Op_2 2 TimeStamp
6) If a participant node completes the Op_2 transaction request, it caches a local log record and sends its vote to the coordinator node. The log structure and the message sent by each participant node are as follows:
Participant node 1:
Log format:
154321 X 654321 A,B Op_2 yes TimeStamp
Message format:
X 654321 A 2 yes TimeStamp
Participant node 2:
Log format:
254321 X 654321 A,B Op_2 yes TimeStamp
Message format:
X 654321 B 2 yes TimeStamp
If a participant node fails to complete the Op_2 transaction request, it caches a transaction rollback log record and rolls back the transaction. The log structure and message format are the same as above, except that the corresponding yes is changed to no.
Before caching a log record, the node checks whether a forced log write is required; if so, it performs the forced log write first and then carries out the other operations.
7) The coordinator node decides the next operation of the transaction according to the voting results.
If one or more participant nodes fail to vote in time or vote "no", the transaction must be rolled back. In this case the coordinator node caches a local log record and sends a transaction rollback message to all nodes that voted yes. This cached log record can be written out together with the coordinator node's next forced log write.
The message format is as follows:
654321 X A,B,C,D,E Abort end TimeStamp
The log format is as follows:
654321 X A,B,C,D,E Abort end TimeStamp
If all participant nodes vote "yes" and the transaction step number is 2, the transaction can be committed. In this case the coordinator node caches a local log record and sends a commit message to all participant nodes.
The message format is as follows:
654321 X A,B,C,D,E Commit end TimeStamp
The log format is as follows:
654321 X A,B,C,D,E Commit end TimeStamp
8) Each participant node caches a log record according to the final decision and carries the decision out. When the final decision is commit, the log structure is as follows:
Participant node 1:
154321 X 654321 A,B,C,D,E end Commit TimeStamp
The other participant nodes only need to substitute their own local transaction ID.
If the final decision is to roll back the transaction, Commit is simply replaced with Abort.
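The coordinator's judgment in steps 4) and 7) can be summarized in a short sketch. The enum, the function name and the representation of votes as a dictionary are assumptions for illustration; a missing entry stands for a participant that did not vote within the time limit.

```python
from enum import Enum
from typing import Dict, Iterable


class Decision(Enum):
    ABORT = "Abort"          # roll the transaction back
    COMMIT = "Commit"        # all steps done, all votes yes
    NEXT_STEP = "NextStep"   # send the next single-step request


def decide(expected: Iterable[str], votes: Dict[str, str],
           step: int, total_steps: int) -> Decision:
    """Apply the coordinator's rule to the votes collected for one step."""
    for participant in expected:
        # A missing vote (timeout) or a "no" vote forces a rollback.
        if votes.get(participant) != "yes":
            return Decision.ABORT
    # Every participant of this step voted yes.
    return Decision.COMMIT if step == total_steps else Decision.NEXT_STEP
```

In the example above, five yes votes for Op_1 (step 1 of 2) lead to the next step, and two yes votes for Op_2 (step 2 of 2) lead to commit.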
With this flow, successfully executing this transaction requires exchanging 19 messages and performing 1 forced log write. With traditional 2PC, even excluding the communication of the transaction-request sending stage, 20 messages must still be exchanged and 11 forced log writes performed. The method of the present invention therefore has a clear advantage.
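One way to arrive at these figures is sketched below. The breakdown is an assumption, since the text gives only the totals: for the proposed method, one request and one vote per participant in each step plus one final message per participant; for traditional 2PC, four messages per participant (prepare, vote, decision, acknowledgement) and forced writes of two records per participant plus one at the coordinator.

```python
from typing import List, Sequence


def proposed_message_count(step_participants: Sequence[Sequence[str]],
                           n_participants: int) -> int:
    """Requests and votes for every step, plus one final decision message
    to each participant of the transaction (assumed accounting)."""
    per_step = sum(2 * len(participants) for participants in step_participants)
    return per_step + n_participants


def classic_2pc_message_count(n_participants: int) -> int:
    """Prepare, vote, decision and acknowledgement for each participant."""
    return 4 * n_participants


def classic_2pc_forced_writes(n_participants: int) -> int:
    """Two forced records per participant and one at the coordinator."""
    return 2 * n_participants + 1


if __name__ == "__main__":
    steps: List[List[str]] = [["A", "B", "C", "D", "E"], ["A", "B"]]
    print(proposed_message_count(steps, 5))   # 19
    print(classic_2pc_message_count(5))       # 20
    print(classic_2pc_forced_writes(5))       # 11
```

Under this accounting the saving in forced log writes (1 versus 11) is much larger than the saving in messages, which is consistent with the emphasis the description places on reducing log operations.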

Claims (11)

1. A transaction commit method for a distributed database system, comprising the steps of:
1) allocating a buffer in the memory of each participant node and of the coordinator node for caching transaction logs;
2) the coordinator node determining the participant nodes according to the transaction content and establishing connections with them, and at the same time determining the operation request information of the transaction;
3) the coordinator node sending a single-step operation request message of the transaction to the participant nodes, and at the same time recording a transaction request log for each operation request;
4) each participant node processing its local log according to the completion status of each operation request, and sending a corresponding message to the coordinator node;
5) the coordinator node making a judgment according to the messages received from all participant nodes:
a) if the transaction needs to be rolled back, the coordinator node caches a local transaction log and sends a transaction rollback message to the participant nodes whose return messages indicate that the operation request completed successfully; each participant node then carries out the received transaction rollback message and caches a transaction log;
b) if the transaction can be committed, the coordinator node caches a local transaction log and sends a commit message to all participant nodes of the transaction; each participant node then carries out the received commit message and caches a transaction log;
c) if the return messages of all participant nodes indicate that the operation request completed successfully and the transaction has not yet ended, steps 3) to 5) are repeated.
2. the method for claim 1 is characterized in that the operation requests information of described affairs comprises: each operation requests of affairs, and the operation requests of affairs sum n.
3. method as claimed in claim 1 or 2 is characterized in that described coordinator node when each affairs begin, determines whether there is the buffer memory transaction journal in the current coordinator node, if exist then with its one-time write disk; If the buffer zone of described participant's node is full or buffer memory surpasses setting-up time or when taking place to force to write the buffer memory transaction journal, with buffer memory transaction journal one-time write disk.
4. The method of claim 2, wherein the operation request message of the transaction comprises: transaction ID, coordinator ID, the IDs of all participants involved in this operation request, the single-step request content of the transaction, i, and the coordinator's local timestamp; the corresponding message sent to the coordinator node comprises: coordinator ID, coordinator transaction ID, local participant ID, i, vote, and the latest timestamp sent by the coordinator node; the local log of the participant node comprises: local transaction ID, coordinator ID, coordinator transaction ID, the IDs of all participants involved in this operation request, the single-step request content of the transaction, vote, and the latest timestamp sent by the coordinator node; wherein i denotes the i-th step operation request of the transaction, and vote is the completion status of this operation request.
5. The method of claim 4, wherein the coordinator node records the transaction request log of each operation request as follows: if the operation request is an operation request of a new transaction, the transaction-start log of the new transaction is force-written to disk; if the operation request is the i-th step request of the current transaction, the log is cached.
6. The method of claim 5, wherein the transaction-start log comprises: transaction ID, coordinator ID, the IDs of all participants involved in all operations of the transaction, the request content of step 1 of the transaction, the start flag of the transaction, and the coordinator's local timestamp; and the log of the i-th step request of the transaction comprises: transaction ID, coordinator ID, the IDs of all participants involved in the i-th step operation of the transaction, the request content of the i-th step, i, and the coordinator's local timestamp, wherein i > 1.
7. The method of claim 1 or 4, wherein the participant node processes its local log and sends the corresponding message to the coordinator node as follows: if the participant node's completion of the operation request is a failure, it caches a transaction log, returns a message to the coordinator node, and rolls back the transaction; if the participant node's completion of the operation request is a success, it only needs to cache a transaction log, return a message to the coordinator node, and wait for the coordinator node to send the next operation request message of the transaction.
8. The method of claim 2, wherein in step 5), if one or more participant nodes do not return a message within a set time, or a return message indicates that the operation request failed, the coordinator node judges that the transaction needs to be rolled back; if the return messages of all participant nodes indicate that the operation request completed successfully, and the operation request number in the participant nodes' return messages equals the total number of operation requests of the transaction, the coordinator node judges that the transaction can be committed.
9. the method for claim 1 is characterized in that in the described step 5), and described transaction rollback message comprises: affairs ID, telegon ID, participant ID, transaction rollback decision, affairs end mark, telegon local time stamp; Described affairs submit to message to comprise: affairs ID, telegon ID, participant ID, affairs are for decision, affairs end mark, telegon local time stamp; The local matter daily record of coordinator node buffer memory comprises: affairs ID, telegon ID, participant ID, affairs final decision, affairs end mark, telegon local time stamp; The transaction journal of participant's nodal cache comprises: the up-to-date timestamp that local matter ID, telegon ID, telegon affairs ID, participant ID, affairs end mark, affairs final decision, coordinator node send; Described affairs final decision is that transaction rollback decision or affairs are for decision.
10. the method for claim 1, it is characterized in that then participant's node is at first inquired mutually according to participant's list of the same affairs step of execution that receives if all participant's nodes are not all received the rollback message that coordinator node is sent or submitted message in setting-up time:
Be failure if a) there is participant's operation requests to finish, then all affairs participants all carry out transaction rollback message;
B) be success if all participants' operation requests is finished, then participant's node continues the message that the wait coordinator node is sent;
C) if there is participant's node to receive rollback message or submission message, then all the other participants consult and carry out;
D) if there is participant's node in the time that limits, can't get in touch, but and can't determine whether rollback of affairs, then continue to wait for.
11. the method for claim 1, it is characterized in that if coordinator node lost efficacy, then after coordinator node is restarted, at first search the time stamp of the last affairs end log in this locality, inquire unclosed transaction journal that telegon is initiated before between all participant's nodes are at this moment then:
If a) daily record that has or above participant's node to reply shows affairs rollback voluntarily, then the daily record that receives of coordinator node arrangement is preserved the transaction rollback record and is notified all participants this transaction rollback again to recover local daily record;
B) if the daily record that all participants all reply shows affairs still to be waited for, then the daily record that receives of coordinator node arrangement is submitted to or rollback according to the implementation status decision affairs of affairs to recover local daily record;
C) if having in the participant Shang Weicong participant node failure and recover, do not have whole daily records; Perhaps have the participant to get in touch, then think transaction rollback, the daily record that the coordinator node arrangement receives to be to recover local daily record, writes the rollback daily record and sends the affairs decision to participant's node recovering getting in touch the back with participant's node.
CN2009102382705A 2009-11-24 2009-11-24 Transaction commit method of distributed database system Expired - Fee Related CN101706811B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102382705A CN101706811B (en) 2009-11-24 2009-11-24 Transaction commit method of distributed database system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009102382705A CN101706811B (en) 2009-11-24 2009-11-24 Transaction commit method of distributed database system

Publications (2)

Publication Number Publication Date
CN101706811A true CN101706811A (en) 2010-05-12
CN101706811B CN101706811B (en) 2012-01-25

Family

ID=42377036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102382705A Expired - Fee Related CN101706811B (en) 2009-11-24 2009-11-24 Transaction commit method of distributed database system

Country Status (1)

Country Link
CN (1) CN101706811B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100561920C (en) * 2004-12-27 2009-11-18 北京航空航天大学 Web service transacter and processing method
US7725446B2 (en) * 2005-12-19 2010-05-25 International Business Machines Corporation Commitment of transactions in a distributed system

Also Published As

Publication number Publication date
CN101706811B (en) 2012-01-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120125

Termination date: 20181124