US8407195B2 - Efficient multi-version locking for main memory databases - Google Patents

Publication number
US8407195B2
Authority
US
United States
Prior art keywords
transaction
version
wait
dependency
read
Prior art date
Legal status
Active, expires
Application number
US13/042,269
Other versions
US20120233139A1 (en)
Inventor
Per-Ake Larson
Spyridon Blanas
Cristian Diaconu
Current Assignee
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US13/042,269
Assigned to MICROSOFT CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DIACONU, CRISTIAN; LARSON, PER-AKE; BLANAS, SPYRIDON
Priority to CN201210057483.XA (CN102682071B)
Publication of US20120233139A1
Application granted
Publication of US8407195B2
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MICROSOFT CORPORATION
Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G06F16/2315 Optimistic concurrency control
    • G06F16/2329 Optimistic concurrency control using versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Definitions

  • a database system optimized for in-memory storage and running on a many-core processor can support very high transaction rates and levels of concurrency. Efficiently ensuring isolation between concurrently executing transactions becomes challenging in such an environment.
  • Current database systems typically implement isolation by means of locking
  • traditional single-version locking suffers from scalability constraints, making traditional locking unsuitable for systems with very high transaction rates.
  • the present invention extends to methods, systems, and computer program products for implementing concurrency control by means of efficient multi-version locking in main memory databases where locks are non-blocking and correct ordering of transactions is enforced by a dependency mechanism.
  • a first transaction places a read marker (a.k.a. read lock) on a version of a record in a database.
  • the read marker indicates that the first transaction is reading the version of the record, but does not prevent another transaction from reading or updating the record concurrently.
  • a second transaction acquires a write lock on the version of the record. The write lock prevents another transaction from updating the version of the record.
  • the second transaction also creates a wait for dependency on the version. The second transaction continues processing, but waits to begin its commit until the first transaction terminates and removes the read marker on the version.
  • one or more first transactions each place a scan marker on a bucket in a hash table.
  • a second transaction attempts to add a new version of a record to the bucket.
  • the second transaction upon detecting the one or more scan markers on the bucket, creates a wait for dependency on each of the one or more first transactions.
  • the second transaction continues processing, but waits to begin its commit until each of the one or more first transactions terminates.
  • a first transaction acquires a write lock on a version of a record. While the version is write locked by the first transaction, a second transaction attempts to place a read marker on the version. Upon determining that the version is write locked by the first transaction, the second transaction creates a wait for dependency on the version for the first transaction and places a read marker on the version. The wait for dependency causes the first transaction to wait to begin its commit until the second transaction has terminated.
  • FIG. 1 illustrates an example of how a version's timestamps are set according to one or more embodiments.
  • FIG. 2 illustrates an exemplary data structure representing a record lock according to one or more embodiments.
  • FIG. 3 illustrates an exemplary data structure representing a transaction object according to one or more embodiments.
  • FIG. 4 illustrates an exemplary data structure representing a scan marker according to one or more embodiments.
  • FIG. 5 illustrates a flowchart of a method for creating a wait for dependency when a transaction acquires a write lock on a version for which a read marker is currently issued.
  • FIG. 6 illustrates a flowchart of a method for creating a wait for dependency when a transaction adds a new version to a bucket that is locked by one or more other transactions.
  • FIG. 7 illustrates a flowchart of a method for creating a wait for dependency when a transaction acquires a read marker on a version that is already write locked by another transaction.
  • FIG. 8 illustrates an exemplary data structure representing a transaction object according to one or more embodiments.
  • the present invention extends to methods, systems, and computer program products for implementing multi-version concurrency control in main memory databases where locks are non-blocking and correct ordering of transactions is enforced by a dependency mechanism.
  • the present invention also includes embodiments of a multi-version concurrency control database that can implement both optimistic and pessimistic transactions simultaneously.
  • a first transaction places a read marker on a version of a record in a database.
  • the read marker indicates that the first transaction is reading the version of the record, but does not prevent another transaction from reading or updating the record concurrently.
  • a second transaction acquires a write lock on the version of the record. The write lock prevents another transaction from updating the version of the record.
  • the second transaction also creates a wait for dependency on the version. The second transaction continues processing, but waits to begin its commit until the first transaction terminates and removes its read marker on the version.
  • each of one or more first transactions places a scan marker on a bucket in a hash table.
  • a second transaction attempts to add a new version of a record to the bucket.
  • the second transaction upon detecting the one or more scan markers on the bucket, creates a wait for dependency on each of the one or more first transactions.
  • the second transaction continues processing, but waits to begin its commit until each of the one or more first transactions terminates.
  • a first transaction acquires a write lock on a version of a record. While the version is write locked by the first transaction, a second transaction attempts to place a read marker on the version. Upon determining that the version is write locked by the first transaction, the second transaction creates a wait for dependency on the version for the first transaction and places a read marker on the version. The wait for dependency causes the first transaction to wait to begin its commit until the second transaction has terminated.
  • Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below.
  • Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures.
  • Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer.
  • Computer-readable media that store computer-executable instructions are physical computer storage media (devices).
  • Computer-readable media that carry computer-executable instructions are transmission media.
  • embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.
  • Computer storage media includes RAM, ROM, EEPROM, CD-ROM, DVD, or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means (software) in the form of computer-executable instructions or data structures and which can be accessed and executed by one or more processors of a general purpose or special purpose computer to implement aspects of the invention, such that they are not merely transitory carrier waves or propagating signals.
  • a “network” is defined as one or more data links that enable the transport of electronic data between computers and/or modules and/or other electronic devices.
  • a network or another communications connection can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
  • program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (devices) (or vice versa).
  • computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer RAM and/or to less volatile computer storage media (devices) at a computer.
  • computer storage media (devices) can be included in computer components that also (or even primarily) utilize transmission media.
  • Computer-executable instructions comprise, for example, instructions and data which, when executed at one or more processors, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
  • the computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
  • the invention may be practiced in network computing environments with many types of computer configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like.
  • the invention may also be practiced in distributed system environments where local and remote computers, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
  • program modules may be located in both local and remote memory storage devices.
  • A transaction is given two unique timestamps that indicate the logical times of its begin and end events, respectively. These timestamps are used to define the overall ordering among transaction events.
  • A timestamp as used herein may be a value received from a monotonically increasing counter and is not limited to a clock value.
  • When a transaction begins, it can receive a begin timestamp by reading and incrementing a timestamp counter. This begin timestamp uniquely identifies the transaction and therefore in some embodiments can serve as the transaction ID. When a transaction terminates, it can also receive an end timestamp by reading the timestamp counter and incrementing it. If the transaction terminates by committing, this end timestamp can also serve as its commit timestamp. This use of timestamps enables the multi-versioning scheme to preserve serializability among the concurrent transactions.
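
The timestamp handling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `TimestampCounter`, `tick`, `Transaction`, and `terminate` are hypothetical, and a Python lock stands in for whatever atomic read-and-increment a real engine would use.

```python
import itertools
import threading


class TimestampCounter:
    """Monotonically increasing logical counter (not a wall-clock value)."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def tick(self):
        # Read and increment under a lock so every timestamp is unique.
        with self._lock:
            return next(self._counter)


class Transaction:
    def __init__(self, clock):
        self.clock = clock
        self.begin_ts = clock.tick()  # also usable as the transaction ID
        self.end_ts = None
        self.committed = False

    @property
    def txn_id(self):
        return self.begin_ts

    def terminate(self, commit):
        # On commit, the end timestamp doubles as the commit timestamp.
        self.end_ts = self.clock.tick()
        self.committed = commit
        return self.end_ts


clock = TimestampCounter()
t1 = Transaction(clock)
t2 = Transaction(clock)
t1_end = t1.terminate(commit=True)
```

Because both begin and end events draw from the same counter, the resulting values give a total order over all transaction events.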
  • Timestamps are also used to identify versions of records and their valid times.
  • a committed version of a record contains two timestamps, a start timestamp and an end timestamp.
  • The start timestamp of a committed version is equal to the commit time of the transaction that created the version. For example, if a transaction T1 creates a version of a record during its processing (such as by modifying an existing record or creating a new record), the created version will receive T1's commit timestamp as its start timestamp.
  • A version's end timestamp is initially set to a value, such as infinity, that indicates that the timestamp is not yet determined.
  • When another transaction T2 commits an update or deletion of the record, the version's end timestamp is set to the commit timestamp of T2. In other words, once T2 commits (thus making its new version of the record or deletion of the record durable), the previous version of the record is no longer valid.
  • Prior to T2 committing, the end timestamp of the version is set to T2's transaction ID because T2's commit time is not yet known. This same transaction ID is also initially used as the start timestamp of the new version for the same reason.
  • Thus, when a transaction creates a new version, it assigns its transaction ID to the end timestamp of the version being modified and to the start timestamp of the new version.
  • When T2 commits, it writes its commit timestamp as the end timestamp of the old version and as the start timestamp of the new version.
  • A flag may be used to distinguish a timestamp field that holds a transaction ID from one that holds a timestamp.
  • FIG. 1 illustrates an exemplary lifetime of a version V2 and how its timestamps are set.
  • A transaction T1 creates version V2 by updating a prior version.
  • T1 sets V2's start timestamp to its own transaction ID and V2's end timestamp to a value such as infinity to indicate that the end timestamp is undetermined. Because V2's start timestamp is set to T1's transaction ID, other transactions can find T1's transaction object and thus determine T1's status.
  • T1 also sets the end timestamp of version V1 (the version that is updated to create V2) to T1's transaction ID to indicate to other transactions that V1 has been updated and is write locked.
  • T1 precommits. Precommitting involves T1 obtaining an end timestamp and entering a validation stage prior to committing. This time, t2, is the start of V2's valid time. However, because T1 has not yet committed, and may still abort, the existence of V2 is still in doubt. Accordingly, V2's start timestamp remains as T1's transaction ID.
  • T1 completes the validation stage and commits.
  • T1 then updates V2's start timestamp from its transaction ID to its end timestamp, t2.
  • V2's start timestamp indicates that it became valid (from the perspective of other transactions) as soon as T1 committed, which made V2 durable.
  • V2's start and end timestamps are t2 and infinity, respectively.
  • T1 also updates V1's end timestamp to t2 (not shown in the figure), which indicates that V1's valid time ended at t2.
  • A transaction T2 updates V2 to create a new version V3.
  • T2 takes similar steps to set V2's and V3's timestamps as T1 did to set V2's and V1's timestamps as described above. For example, T2 sets V2's end timestamp to T2's transaction ID.
  • T2 precommits and receives t6 as its end timestamp. If T2 proceeds to commit, t6 will be the end of V2's valid time. Once committed, T2 sets V2's end timestamp to t6.
  • V2's start timestamp takes on two values. First, it is initialized with T1's transaction ID upon being created, and then set to T1's end timestamp once T1 commits. This indicates that V2 becomes valid once the changes made by T1 are durable. In contrast, V2's end timestamp takes on three values. First, it is initialized to infinity, then it is set to T2's transaction ID, and finally, it is set to T2's end timestamp once T2 commits. This indicates that once V3, which is created by T2, becomes durable upon T2 committing, V2 is no longer valid.
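
The version lifetime just described can be traced with a small sketch. Everything named here (`TxnId`, `Version`, `update`, `commit`, `visible_at`) is hypothetical, the transaction IDs 1 and 4 are made-up values, and `visible_at` is a simplified visibility check assumed for illustration only; the text above does not spell out the visibility rules.

```python
import math
from dataclasses import dataclass

INFINITY = math.inf


@dataclass(frozen=True)
class TxnId:
    """Wrapper marking a timestamp slot that currently holds a transaction ID."""
    value: int


@dataclass
class Version:
    payload: str
    start: object = INFINITY
    end: object = INFINITY


def update(old, payload, txn_id):
    # The updater stamps its ID on both versions; its commit time is unknown.
    old.end = TxnId(txn_id)
    return Version(payload, start=TxnId(txn_id), end=INFINITY)


def commit(old, new, commit_ts):
    # On commit, the transaction ID is replaced by the commit timestamp.
    old.end = commit_ts
    new.start = commit_ts


def visible_at(version, read_ts):
    # Assumption: an uncommitted creator hides the version, and an
    # uncommitted updater leaves the old version valid.
    if isinstance(version.start, TxnId):
        return False
    end = INFINITY if isinstance(version.end, TxnId) else version.end
    return version.start <= read_ts < end


v1 = Version("a", start=0)
v2 = update(v1, "b", txn_id=1)   # T1 creates V2
pending_start = v2.start         # still T1's transaction ID
commit(v1, v2, commit_ts=2)      # T1 commits at t2
v3 = update(v2, "c", txn_id=4)   # T2 creates V3
pending_end = v2.end             # still T2's transaction ID
commit(v2, v3, commit_ts=6)      # T2 commits at t6
```

After the last call, V2's valid time is the half-open interval from t2 to t6, matching the two values taken by its start timestamp and the three taken by its end timestamp.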
  • a concurrency control technique is called pessimistic if it relies on proactively preventing harmful interference between concurrent transactions from ever occurring. This is typically implemented by means of locking.
  • An optimistic concurrency control technique does not attempt to prevent interference proactively but instead relies on validating that no harmful interference occurred before allowing a transaction to commit.
  • a transaction is called pessimistic or optimistic depending on the type of concurrency control technique it relies on.
  • the present invention allows pessimistic and optimistic transactions to co-exist.
  • a pessimistic transaction uses read markers, scan markers and write locks to implement the multi-version concurrency control scheme of the present invention.
  • a pessimistic transaction prevents its reads from being invalidated by placing markers.
  • two different types of markers may be used to implement pessimistic transactions: read markers and scan markers.
  • Read markers are placed on versions to ensure read stability, whereas scan markers are placed on buckets to prevent phantoms.
  • a bucket may refer to a hash index; however, the present invention is not limited to databases using hash indexes. Scan markers can be applied equally to ordered indexes and the like.
  • a transaction places a scan marker on a hash table bucket before beginning a scan of the records in the bucket. This does not prevent new records from being added to the bucket but the new versions cannot be committed until the scan marker has been removed.
  • a scan marker on a node protects the subtree rooted at that node.
  • a scan marker on a tower protects the range from that tower to the next tower of the same height. Phantoms occur when the set of versions returned by a query at the start of a transaction is different from the set of versions returned by the same query at the end of the transaction.
  • a transaction places a read marker on a version V by incrementing V's read marker count.
  • a version may be limited to a maximum number of read markers and may also include a flag to prevent any further read markers from being placed. Therefore, at any given time, a version may have multiple read markers. In contrast, a version may only have a single write lock at any given time.
  • FIG. 2 illustrates an exemplary data structure 200 representing read markers according to some embodiments of the present invention. As described above, each version contains an end timestamp field 201 a which may contain a timestamp. FIG. 2 further describes how the present invention can use this field to record read markers.
  • The first bit of the field is designated as a content type bit, which defines the type of content the field contains.
  • When the content type bit is set to a first value (e.g. 0), the remaining 63 bits of the field are the timestamp field 201 a as described above.
  • When the content type bit is set to a second value (e.g. 1), the remaining 63 bits of the field are interpreted differently. For example, as shown in FIG. 2, the 63 bits may be divided between a no more read markers flag 202 a, a read marker count 202 b, and a write lock field 202 c.
  • The no more read markers flag 202 a may be set to prevent any further read markers from being placed on the version.
  • the read marker count 202 b records the current number of read markers on the version.
  • the write lock field 202 c contains the transaction ID of the transaction (if any) holding a write lock on the version, or infinity if the version is not locked as described above.
  • a transaction can write lock a version by writing its transaction ID into the version's write lock field 202 c .
  • a transaction can place a read marker on a version by incrementing the version's read marker count 202 b .
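
One way to picture the record lock layout is the bit-packing sketch below. The field widths are assumptions: the text only says the 63 remaining bits are divided among the three fields, so the 1-bit flag, 8-bit count, and 54-bit transaction ID used here are illustrative, as is the choice of an all-ones write lock field to stand in for "infinity". All function names are hypothetical, and a real engine would need to update the word atomically, which this sketch ignores.

```python
CONTENT_TYPE_BIT = 1 << 63        # 0 = timestamp, 1 = record lock
NO_MORE_READ_MARKERS = 1 << 62    # flag 202a
COUNT_SHIFT = 54
COUNT_MASK = 0xFF                 # assumed 8-bit read marker count (202b)
WRITE_LOCK_MASK = (1 << 54) - 1   # assumed 54-bit write lock field (202c)
UNLOCKED = WRITE_LOCK_MASK        # all ones stands in for "infinity"


def pack_lock(no_more, count, writer):
    if not 0 <= count <= COUNT_MASK:
        raise ValueError("read marker count out of range")
    if not 0 <= writer <= WRITE_LOCK_MASK:
        raise ValueError("transaction ID out of range")
    word = CONTENT_TYPE_BIT | (count << COUNT_SHIFT) | writer
    if no_more:
        word |= NO_MORE_READ_MARKERS
    return word


def unpack_lock(word):
    if not word & CONTENT_TYPE_BIT:
        raise ValueError("field holds a timestamp, not a record lock")
    no_more = bool(word & NO_MORE_READ_MARKERS)
    count = (word >> COUNT_SHIFT) & COUNT_MASK
    return no_more, count, word & WRITE_LOCK_MASK


def add_read_marker(word):
    no_more, count, writer = unpack_lock(word)
    if no_more or count == COUNT_MASK:
        return None               # marker refused
    return pack_lock(no_more, count + 1, writer)


def write_lock(word, txn_id):
    no_more, count, writer = unpack_lock(word)
    if writer != UNLOCKED:
        return None               # only one write lock at a time
    return pack_lock(no_more, count, txn_id)


word = pack_lock(False, 0, UNLOCKED)
word = add_read_marker(word)      # one reader
word = write_lock(word, 42)       # the writer is not blocked by the reader
state = unpack_lock(word)
second_writer = write_lock(word, 43)
```

Packing all three fields into one 64-bit word means a single read reveals the flag, the marker count, and the write lock holder together.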
  • a read marker in the present invention is different from a read lock in typical database implementation because a read marker on a version does not prevent another transaction from updating the version as will now be further described.
  • In a typical database implementation, when a transaction attempts to update a version that is read locked, it is forced to block.
  • In contrast, in the present invention, while a version has one or more read markers, another transaction may write lock the version to update it.
  • The updating transaction is not forced to block until the read markers are removed.
  • The updating transaction may continue processing, including updating the version; however, the updating transaction cannot commit until all read markers on the version have been removed.
  • Similarly, while a version is write locked by one transaction, another transaction may concurrently place a read marker on the version.
  • In this case as well, the updating transaction (the one with the write lock) cannot commit until the read marker is removed.
  • a read marker can be removed by either the reading transaction committing or aborting. Accordingly, in each of the above described scenarios, the updating transaction is forced to wait to commit until all read markers on the version are removed whether the updating transaction write locks the version before or after the one or more read markers are placed.
  • Similar rules apply to scan markers. For example, if a first transaction has placed a scan marker on a bucket, a second transaction is allowed to insert a new version into the bucket. However, the second transaction is not allowed to commit until the first transaction removes its scan marker on the bucket.
  • a wait for dependency forces an update transaction to wait before it can acquire an end timestamp and begin commit processing.
  • a transaction keeps track of its incoming and outgoing wait for dependencies.
  • An incoming dependency is one that the transaction waits on whereas a transaction has an outgoing dependency if some other transaction waits on the transaction to complete.
  • each transaction includes fields to track dependencies.
  • the fields may be contained in the transaction object, as shown in FIG. 3 , or elsewhere.
  • For incoming wait for dependencies two fields can be maintained: a wait for count 301 and a no more wait for dependencies flag 302 .
  • the wait for count 301 indicates how many incoming wait for dependencies a transaction is waiting for.
  • the no more wait for dependencies flag 302 can be set to prevent the creation of any more incoming dependencies. This flag can be used, for example, to prevent starvation by new incoming dependencies continuously being added.
  • For outgoing wait for dependencies, a waiting transaction list 303 is maintained. This list contains the transaction IDs of any other transactions that are waiting for the transaction to complete.
  • An updating transaction TU can obtain a wait for dependency on a version V even if V has no read markers when TU write locks it. If TU obtains a write lock on V while V's read marker count 202 b is zero, TU will not initially take out a wait for dependency on V. While V is locked by TU, another transaction TR may attempt to place a read marker on V. TR will detect that V's read marker count 202 b is zero, but that V is write locked. TR then reads TU's no more wait for dependencies flag 302 to determine whether TU will allow a wait for dependency to be created.
  • If TU allows it, TR places a read marker on V by incrementing V's read marker count 202 b, and gives TU a wait for dependency on V by incrementing TU's wait for count 301. For this reason, TR can be viewed as giving TU the wait for dependency in this example.
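
The two ways a writer can end up with a wait for dependency on a version can be sketched as follows. This is an illustrative model only: the class and function names are hypothetical, the version holds a reference to the writer's transaction object instead of its transaction ID, returning `False` when a request is refused is an assumption, and no atomicity is modeled.

```python
class Transaction:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.wait_for_count = 0          # incoming dependencies (301)
        self.no_more_wait_fors = False   # flag 302
        self.waiting_txn_ids = []        # outgoing dependencies (303)

    def can_begin_commit(self):
        return self.wait_for_count == 0


class Version:
    def __init__(self):
        self.no_more_read_markers = False  # flag 202a
        self.read_marker_count = 0         # count 202b
        self.writer = None                 # field 202c; None means unlocked


def acquire_write_lock(version, tu):
    if version.writer is not None:
        return False                     # only one write lock at a time
    version.writer = tu
    if version.read_marker_count > 0:
        tu.wait_for_count += 1           # TU takes the dependency itself
    return True


def place_read_marker(version):
    if version.no_more_read_markers:
        return False
    tu = version.writer
    if tu is not None and version.read_marker_count == 0:
        # First reader on a write-locked version: give TU a dependency.
        if tu.no_more_wait_fors:
            return False
        tu.wait_for_count += 1
    version.read_marker_count += 1
    return True


# Writer first, readers later.
v_a, tu_a = Version(), Transaction(7)
acquire_write_lock(v_a, tu_a)
count_before_readers = tu_a.wait_for_count
place_read_marker(v_a)
place_read_marker(v_a)

# Reader first, writer later.
v_b, tu_b = Version(), Transaction(8)
place_read_marker(v_b)
acquire_write_lock(v_b, tu_b)
```

In both orders the writer holds exactly one dependency per version, however many read markers the version carries, and it cannot begin its commit while that dependency is outstanding.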
  • When removing its read marker from a version V, a transaction TR performs different steps depending on various factors including whether V has other outstanding read markers, and whether another transaction TU has a write lock on V.
  • In a first scenario, if V is not write locked, TR simply decrements V's read marker count 202 b and proceeds.
  • In a second scenario, if V is write locked and V's read marker count is greater than one, TR also simply decrements V's read marker count 202 b and proceeds.
  • In a third scenario, if V is write locked and V's read marker count is equal to one (meaning that TR is the only transaction with a read marker on V), TR is about to remove the last read marker on V. In this third scenario, TR must release TU's wait for dependency on V. To do so, TR sets V's read marker count 202 b to zero and V's no more read markers flag 202 a to true, thus preventing any further read markers from being obtained on V. Then, TR locates TU (by reading its transaction ID in V's write lock field 202 c) and decrements TU's wait for count 301.
  • V's no more read markers flag 202 a is set to true prior to releasing TU's wait for dependency on V to ensure that no other transaction places a read marker on V prior to TU committing the updated version of V. This is necessary because once TU's wait for dependency is removed, TU can proceed to commit. Thus V will become invalid by being replaced by an updated version V′ created by TU.
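
A compact sketch of the marker-removal logic is shown below, under the same simplifying assumptions as any single-threaded model: the names are hypothetical, the version references the writer's transaction object directly instead of looking it up by ID, and the updates are not atomic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    txn_id: int
    wait_for_count: int = 0


@dataclass
class Version:
    read_marker_count: int = 0
    no_more_read_markers: bool = False
    writer: Optional[Transaction] = None


def remove_read_marker(version):
    if version.read_marker_count < 1:
        raise ValueError("no read marker to remove")
    if version.writer is None or version.read_marker_count > 1:
        # No writer, or other readers remain: just drop the marker.
        version.read_marker_count -= 1
        return
    # Last marker on a write-locked version: close the version to new
    # readers first, then release the writer's dependency.
    version.read_marker_count = 0
    version.no_more_read_markers = True
    version.writer.wait_for_count -= 1


tu = Transaction(txn_id=7, wait_for_count=1)
v = Version(read_marker_count=2, writer=tu)
remove_read_marker(v)
after_first = (v.read_marker_count, v.no_more_read_markers, tu.wait_for_count)
remove_read_marker(v)
after_second = (v.read_marker_count, v.no_more_read_markers, tu.wait_for_count)
```

Setting the flag before decrementing the writer's count mirrors the ordering requirement stated above: once the writer is released it may commit, so no new reader should be able to mark the old version in between.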
  • FIG. 4 illustrates an exemplary data structure 400 that is used to implement scan markers in some embodiments of the invention.
  • Wait for dependencies related to scan markers function similarly to wait for dependencies related to read markers.
  • a transaction TR places a scan marker on a bucket B by incrementing B's marker count 401 and adding its transaction ID to B's marker list 402 .
  • the purpose of the scan marker is not to prevent a version from being added to the bucket, but, instead to prevent any versions which are added while the scan marker is in place from becoming visible to TR during its processing.
  • another transaction TU can add a version to B, but TU cannot commit until TR removes its marker on B. This is enforced by TU obtaining a wait for dependency on TR.
  • In the scan marker scenario, this specification refers to the wait for dependency as being on another transaction, whereas in the record lock scenario, the specification refers to the wait for dependency as being on the version. This distinction reflects that the wait for dependency in the scan marker scenario depends on one or more transactions releasing their scan markers (i.e., markers that cover a plurality of versions rather than a single version as in the read marker scenario).
  • A transaction TU can acquire a wait for dependency caused by a scan marker in two ways. First, if TU is attempting to add a new version V to a bucket B with one or more scan markers, TU takes out a wait for dependency on every transaction listed in B's marker list 402 (i.e., each transaction that has a scan marker on B). To do so, TU adds its own transaction ID to the waiting transaction list 303 of every transaction listed in B's marker list 402. TU also increments its own wait for count 301 for each transaction listed in B's marker list 402.
  • TR registers a wait for dependency for TU on TR by adding TU's transaction ID to TR's waiting transaction list 303 and incrementing TU's wait for count 301 . This wait for dependency is created to prevent TU from committing before TR which would make V a phantom to TR.
  • FIG. 5 illustrates a flowchart of a method 500 for creating a wait for dependency in a multi-version concurrency control scheme of a main memory database.
  • Method 500 will be described with reference to the exemplary data structures in FIGS. 2 and 3 .
  • a first transaction places a read marker on a version of a record in a database (act 501 ).
  • the read marker indicates that the first transaction is reading the version of the record, but does not prevent another transaction from reading or updating the record concurrently.
  • the first transaction may acquire the read marker by incrementing the version's read marker count 202 b .
  • a second transaction acquires a write lock on the version of the record (act 502 ).
  • the write lock prevents another transaction from updating the version of the record.
  • the second transaction may acquire the write lock by writing its transaction ID to the version's write lock field 202 c .
  • the second transaction also creates a wait for dependency on the version (act 503 ).
  • the second transaction may increment its wait for count 301 which may be stored in its transaction object.
  • the second transaction continues processing, but waits to commit until the first transaction terminates and removes the read marker on the version (act 504 ).
  • Method 500 may further include the second transaction determining that the version has outstanding read markers prior to creating the wait for dependency by reading the version's read marker count 202 b and determining that the read marker count 202 b is greater than zero.
  • method 500 may also include the first transaction determining that its read marker is the last read marker on the version (such as by determining that the version's read marker count 202 b is equal to one prior to the first transaction terminating).
  • the method may also include the first transaction decrementing the version's read marker count 202 b , setting the version's no more read markers flag 202 a , and decrementing the second transaction's wait for count 301 .
  • the first transaction may identify the second transaction by reading the second transaction's transaction ID in the version's write lock field 202 c.
  • method 500 may also include the first transaction determining that one or more other read markers have been placed on the version, and the first transaction removing its read marker by decrementing the version's read marker count 202 b .
  • the version's no more read markers flag 202 a , read marker count 202 b , and write lock field 202 c are stored within the version.
  • FIG. 6 illustrates a flowchart of a method 600 for creating a wait for dependency when a transaction adds a new version to a bucket with one or more scan markers.
  • Method 600 will be described with reference to the exemplary data structures in FIGS. 3 and 4 .
  • each of one or more first transactions places a scan marker on a bucket (act 601).
  • the one or more first transactions may place the scan markers by incrementing marker count 401 and adding their transaction IDs to marker list 402 .
  • a second transaction attempts to add a new version of a record to the bucket (act 602 ).
  • the second transaction upon detecting the one or more scan markers on the bucket, creates a wait for dependency on each of the one or more first transactions (act 603 ).
  • the second transaction may detect the one or more markers on the bucket by reading the bucket's marker count 401 .
  • the second transaction may then create the one or more wait for dependencies by adding its transaction ID to the waiting transaction list 303 of every transaction listed in the marker list 402 .
  • the second transaction continues processing, but waits to commit until each of the one or more first transactions terminates (act 604 ).
  • Upon terminating, each of the one or more first transactions may decrement the second transaction's wait for count 301 .
  • Once the second transaction's wait for count 301 reaches zero, indicating that the second transaction has no more wait for dependencies, the second transaction may proceed to commit.
  • FIG. 7 illustrates a flowchart of a method 700 for creating a wait for dependency when a transaction places a read marker on a version that is already write locked by another transaction.
  • Method 700 will be described with reference to the exemplary data structures in FIGS. 2 and 3 .
  • a first transaction acquires a write lock on a version of a record (act 701 ).
  • the first transaction may acquire the write lock by writing its transaction ID in the version's write lock field 202 c .
  • a second transaction attempts to place a read marker on the version (act 702 ).
  • Upon determining that the version is write locked by the first transaction, the second transaction creates a wait for dependency on the version for the first transaction and places a read marker on the version (act 703 ). For example, the second transaction can detect that the version is write locked by determining that the version's write lock field 202 c contains the first transaction's transaction ID. The second transaction may create the first transaction's wait for dependency on the version by incrementing the first transaction's wait for count 301 , and may place the read marker by incrementing the version's read marker count 202 b . The wait for dependency causes the first transaction to wait to commit until the second transaction has terminated and removed its read marker on the version. For example, the first transaction may continue processing, but will not commit until its wait for count 301 equals zero.
  • embodiments of the present invention may also implement commit dependencies simultaneously with wait for dependencies.
  • commit dependencies can be either incoming or outgoing dependencies as will be further described below.
  • a transaction only needs to know the number of incoming commit dependencies and therefore maintains an incoming commit dependency count. Further, a transaction must track each of its outgoing commit dependencies and thus maintains an outgoing commit dependency set.
  • Although V 2 is valid from t 2 to t 6 , there are periods where V 2 's validity is in doubt. In other words, because a transaction may abort after creating a new version of a record, it cannot be known that the new version will be valid until the transaction commits. Specifically, V 2 is created at t 1 , but the start of its valid time interval is not known until T 1 precommits at time t 2 . During this time (t 1 -t 2 ), V 2 is only visible to T 1 .
  • V 2 is not stable until T 1 actually commits at t 3 because T 1 may still abort after it has pre-committed.
  • commit dependencies allow the reading transaction to assume that T 1 will commit thus allowing the reading transaction to read the updated version V 2 before T 1 has committed.
  • Commit dependencies can be used by both pessimistic and optimistic transactions.
  • a reading transaction TR may register a commit dependency with T 1 .
  • the implementation of commit dependencies will be described with reference to FIG. 8 .
  • FIG. 8 is similar to FIG. 3 in that it includes similar fields to those illustrated in FIG. 3 .
  • TR increments its own incoming commit dependency count 804 and registers its transaction ID in T 1 's outgoing commit dependency set 805 .
  • Once T 1 has committed, it locates TR's transaction ID in its outgoing commit dependency set 805 (as well as any other transaction IDs of other transactions that have registered commit dependencies with T 1 ), and decrements TR's commit dependency count 804 .
  • If TR's only dependency was with T 1 , its commit dependency count 804 will now be zero, indicating that it is no longer waiting for any other transactions to commit. TR, therefore, can now commit.
  • TR is able to read a value from a version before it is certain that the version will be valid. If T 1 aborts rather than commits, T 1 will notify TR of the abort thus causing TR to abort as well (because it has read a value that will never become valid). This can be accomplished using an abort flag 806 in each transaction which when set causes the transaction to abort. The aborting transaction (in this case T 1 ) could set this flag in TR.
  • the present invention includes embodiments of a multi-version concurrency control technique that can implement both optimistic and pessimistic transactions, as described above, by utilizing the read markers, scan markers and write locks, as well as both commit dependencies and wait for dependencies.
  • the exemplary data structures illustrated in the figures and described above enable the concurrent use of both types of dependencies with the read markers, scan markers and write locks.
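The commit dependency bookkeeping described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the class and method names are hypothetical, the sketch stores references to transaction objects where the description stores transaction IDs, and it ignores the atomicity a concurrent system would need.

```python
class Transaction:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.commit_dep_count = 0    # incoming commit dependency count (804)
        self.commit_dep_set = set()  # outgoing commit dependency set (805)
        self.abort_flag = False      # abort flag (806)
        self.state = "active"

    def register_commit_dependency(self, writer):
        """Called by a reader (TR) that read a version `writer` (T1) has not yet committed."""
        self.commit_dep_count += 1
        writer.commit_dep_set.add(self)

    def can_commit(self):
        return self.commit_dep_count == 0 and not self.abort_flag

    def commit(self):
        self.state = "committed"
        # Release every transaction that was waiting on this commit.
        for dependent in self.commit_dep_set:
            dependent.commit_dep_count -= 1
        self.commit_dep_set.clear()

    def abort(self):
        self.state = "aborted"
        # Dependents read a value that will never become valid, so they must abort too.
        for dependent in self.commit_dep_set:
            dependent.abort_flag = True
        self.commit_dep_set.clear()


# T1 commits: TR is released.
t1, tr = Transaction(1), Transaction(2)
tr.register_commit_dependency(t1)
waiting_before_commit = not tr.can_commit()
t1.commit()

# T3 aborts: TR2 is told to abort as well.
t3, tr2 = Transaction(3), Transaction(4)
tr2.register_commit_dependency(t3)
t3.abort()
```

Because a reader only needs to know how many dependencies are still outstanding, a counter suffices on the incoming side, while the writer needs the full set so that it can find every dependent when it commits or aborts.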

Abstract

A transaction creates a wait for dependency on a version in a main memory database implementing a multi-version concurrency control scheme. The wait for dependency allows the transaction to update the version while other transactions are reading the version. The multi-version concurrency control scheme also allows commit dependencies to be implemented concurrently with wait for dependencies. Commit dependencies allow a transaction to read an updated version before the updated version is committed.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
N/A
BACKGROUND
Main memories are becoming sufficiently large that the working set of most Online Transaction Processing databases can be stored in memory. A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods do not scale to the high transaction rates achievable by such systems.
A database system optimized for in-memory storage and running on a many-core processor can support very high transaction rates and levels of concurrency. Efficiently ensuring isolation between concurrently executing transactions becomes challenging in such an environment. Current database systems typically implement isolation by means of locking. However, traditional single-version locking suffers from scalability constraints, making traditional locking unsuitable for systems with very high transaction rates.
BRIEF SUMMARY
The present invention extends to methods, systems, and computer program products for implementing concurrency control by means of efficient multi-version locking in main memory databases where locks are non-blocking and correct ordering of transactions is enforced by a dependency mechanism.
In one embodiment, a first transaction places a read marker (a.k.a. read lock) on a version of a record in a database. The read marker indicates that the first transaction is reading the version of the record, but does not prevent another transaction from reading or updating the record concurrently. Before the first transaction terminates, a second transaction acquires a write lock on the version of the record. The write lock prevents another transaction from updating the version of the record. The second transaction also creates a wait for dependency on the version. The second transaction continues processing, but waits to begin its commit until the first transaction terminates and removes the read marker on the version.
In another embodiment, one or more first transactions each place a scan marker on a bucket in a hash table. A second transaction then attempts to add a new version of a record to the bucket. The second transaction, upon detecting the one or more scan markers on the bucket, creates a wait for dependency on each of the one or more first transactions. The second transaction continues processing, but waits to begin its commit until each of the one or more first transactions terminates.
In another embodiment, a first transaction acquires a write lock on a version of a record. While the version is write locked by the first transaction, a second transaction attempts to place a read marker on the version. Upon determining that the version is write locked by the first transaction, the second transaction creates a wait for dependency on the version for the first transaction and places a read marker on the version. The wait for dependency causes the first transaction to wait to begin its commit until the second transaction has terminated.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
FIG. 1 illustrates an example of how a version's timestamps are set according to one or more embodiments.
FIG. 2 illustrates an exemplary data structure representing a record lock according to one or more embodiments.
FIG. 3 illustrates an exemplary data structure representing a transaction object according to one or more embodiments.
FIG. 4 illustrates an exemplary data structure representing a scan marker according to one or more embodiments.
FIG. 5 illustrates a flowchart of a method for creating a wait for dependency when a transaction acquires a write lock on a version for which a read marker is currently issued.
FIG. 6 illustrates a flowchart of a method for creating a wait for dependency when a transaction adds a new version to a bucket that is locked by one or more other transactions.
FIG. 7 illustrates a flowchart of a method for creating a wait for dependency when a transaction acquires a read marker on a version that is already write locked by another transaction.
FIG. 8 illustrates an exemplary data structure representing a transaction object according to one or more embodiments.
DETAILED DESCRIPTION
The present invention extends to methods, systems, and computer program products for implementing multi-version concurrency control in main memory databases where locks are non-blocking and correct ordering of transactions is enforced by a dependency mechanism. The present invention also includes embodiments of a multi-version concurrency control database that can implement both optimistic and pessimistic transactions simultaneously.
In one embodiment, a first transaction places a read marker on a version of a record in a database. The read marker indicates that the first transaction is reading the version of the record, but does not prevent another transaction from reading or updating the record concurrently. Before the first transaction terminates, a second transaction acquires a write lock on the version of the record. The write lock prevents another transaction from updating the version of the record. The second transaction also creates a wait for dependency on the version. The second transaction continues processing, but waits to begin its commit until the first transaction terminates and removes its read marker on the version.
In another embodiment, each of one or more first transactions places a scan marker on a bucket in a hash table. A second transaction then attempts to add a new version of a record to the bucket. The second transaction, upon detecting the one or more scan markers on the bucket, creates a wait for dependency on each of the one or more first transactions. The second transaction continues processing, but waits to begin its commit until each of the one or more first transactions terminates.
In another embodiment, a first transaction acquires a write lock on a version of a record. While the version is write locked by the first transaction, a second transaction attempts to place a read marker on the version. Upon determining that the version is write locked by the first transaction, the second transaction creates a wait for dependency on the version for the first transaction and places a read marker on the version. The wait for dependency causes the first transaction to wait to begin its commit until the second transaction has terminated.
Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. Computer-readable media that store computer-executable instructions are physical computer storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.
Computer storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, DVD, or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means (software) in the form of computer-executable instructions or data structures and which can be accessed and executed by one or more processors of a general purpose or special purpose computer to implement aspects of the invention, such that they are not merely transitory carrier waves or propagating signals.
A “network” is defined as one or more data links that enable the transport of electronic data between computers and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Further, upon reaching various computer components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (devices) (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer RAM and/or to less volatile computer storage media (devices) at a computer. Thus, it should be understood that computer storage media (devices) can be included in computer components that also (or even primarily) utilize transmission media.
Computer-executable instructions comprise, for example, instructions and data which, when executed at one or more processors, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computers, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
Prior to discussing the use of pessimistic transactions to implement the multi-version concurrency control scheme of the present invention, various basic concepts of an exemplary multi-version concurrency control scheme usable in the present invention will be described. In this exemplary multi-version concurrency control scheme, a transaction is given two unique timestamps that indicate the logical time of its begin and end events, respectively. These timestamps are used to define the overall ordering among transaction events. A timestamp as used herein may be a value received from a monotonically increasing counter and is not limited to a clock value.
For example, when a transaction begins, it can receive a timestamp by reading and incrementing a timestamp counter. This begin timestamp uniquely identifies the transaction and therefore in some embodiments can serve as the transaction id. When a transaction terminates, it can also receive an end timestamp by reading the timestamp counter and incrementing it. If the transaction terminates by committing, this end timestamp can also serve as its commit timestamp. This use of timestamps enables the multi-versioning scheme to preserve serializability among the concurrent transactions.
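The timestamp assignment described above might be sketched in Python as follows. The names `TimestampCounter` and `Transaction` are illustrative assumptions, and the lock stands in for the atomic read-and-increment that a real main memory database would more likely use.

```python
import itertools
import threading


class TimestampCounter:
    """Monotonically increasing logical clock (hypothetical helper)."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next(self):
        # Read and increment the counter as a single atomic step.
        with self._lock:
            return next(self._counter)


class Transaction:
    def __init__(self, clock):
        self._clock = clock
        # The begin timestamp is unique, so it can double as the transaction ID.
        self.begin_ts = clock.next()
        self.txn_id = self.begin_ts
        self.end_ts = None

    def terminate(self):
        # If the transaction commits, this end timestamp is also its commit timestamp.
        self.end_ts = self._clock.next()
        return self.end_ts


clock = TimestampCounter()
t1 = Transaction(clock)
t2 = Transaction(clock)
t1.terminate()
```

Here `t1` and `t2` receive begin timestamps 1 and 2, and `t1` receives end timestamp 3, so all begin and end events fall into a single total order.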
Records in the main memory database are versioned to allow for concurrent access by multiple transactions. Timestamps are also used to identify versions of records and their valid times. For example, a committed version of a record contains two timestamps, a start timestamp and an end timestamp. The start timestamp of a committed version is equal to the commit time of the transaction that created the version. For example, if a transaction T1 creates a version of a record during its processing (such as by modifying an existing record or creating a new record), the created version will receive the same start timestamp as the transaction T1's commit timestamp.
A version's end timestamp is initially set to a value that indicates that the timestamp is not yet determined such as infinity. However, when another transaction T2 commits a modification to the version (whether an update to the version that thus creates a new version, or a deletion of the version), the version's end timestamp is set to the commit timestamp of transaction T2. In other words, once T2 commits (thus making its new version of the record or deletion of the record durable), the previous version of the record is no longer valid.
Prior to T2 committing, the end timestamp of the version is set to T2's transaction ID because T2's commit time is not yet known. This same transaction ID is also initially used as the start timestamp of the new version for the same reason. Thus, when a transaction creates a new version, it assigns its transaction ID to the end timestamp of the version being modified, and the start timestamp of the new version. Once T2 commits, it writes its commit timestamp as the end timestamp of the old version and as the start timestamp of the new version. To distinguish between versions that contain a valid timestamp and those that have a temporary transaction ID assigned as their timestamp, a flag may be used.
FIG. 1 illustrates an exemplary lifetime of a version V2 and how its timestamps are set. At time t1, a transaction T1 creates version V2 by updating a prior version. T1 sets V2's start timestamp to its own transaction ID and V2's end timestamp to a value such as infinity to indicate that the end timestamp is undetermined. Because V2's start timestamp is set to T1's transaction ID, other transactions can find T1's transaction object and thus determine T1's status. T1 also sets the end timestamp of version V1 (the version that is updated to create V2) to T1's transaction ID to indicate to other transactions that V1 has been updated and is write locked.
At time t2, T1 precommits. Precommitting involves T1 obtaining an end timestamp and entering a validation stage prior to committing. This time, t2, is the start of V2's valid time. However, because T1 has not yet committed, and may still abort, the existence of V2 is still in doubt. Accordingly, V2's start timestamp remains as T1's transaction ID.
At time t3, T1 completes the validation stage and commits. At time t4, T1 then updates V2's start timestamp from its transaction ID to its end timestamp, t2. Thus, V2's start timestamp indicates that it became valid (from the perspective of other transactions) as soon as T1 committed which made V2 durable. V2's start and end timestamps, at this point, are t2 and infinity, respectively. At the same time T1 also updates V1's end timestamp to t2 (not shown in the figure) which indicates that V1's valid time ended at t2.
At some later time, t5, a transaction T2 updates V2 to create a new version V3. T2 takes similar steps to set V2's and V3's timestamps as T1 did to set V2's and V1's timestamps as described above. For example, T2 sets V2's end timestamp to T2's transaction ID. At time t6, T2 precommits and receives t6 as its end timestamp. If T2 proceeds to commit, t6 will be the end of V2's valid time. Once committed, T2 sets V2's end timestamp to t6.
To summarize the above example, V2's start timestamp takes on two values. First, it is initialized with T1's transaction ID upon being created, and then set to T1's end timestamp once T1 commits. This indicates that V2 becomes valid once the changes made by T1 are durable. In contrast, V2's end timestamp takes on three values. First, it is initialized to infinity, then it is set to T2's transaction ID, and finally, it is set to T2's end timestamp once T2 commits. This indicates that once V3, which is created by T2, becomes durable upon T2 committing, V2 is no longer valid.
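The lifetime summarized above can be traced with a small Python sketch. The `Stamp` and `Version` types, the function names, and the concrete numbers (transaction IDs 101 and 102, timestamps 10, 20 and 60) are assumptions made for illustration; the `is_txn_id` field plays the role of the flag that distinguishes a real timestamp from a temporary transaction ID.

```python
import math
from dataclasses import dataclass

INFINITY = math.inf  # stands for "end timestamp not yet determined"


@dataclass
class Stamp:
    value: float             # a timestamp, or a transaction ID while uncommitted
    is_txn_id: bool = False  # True while `value` is a temporary transaction ID


@dataclass
class Version:
    start: Stamp
    end: Stamp


def create_version(old, txn_id):
    """A transaction updates `old`: both affected fields temporarily hold its ID."""
    old.end = Stamp(txn_id, is_txn_id=True)
    return Version(start=Stamp(txn_id, is_txn_id=True), end=Stamp(INFINITY))


def finalize_commit(old, new, end_ts):
    """After committing, the transaction replaces its ID with its end timestamp."""
    old.end = Stamp(end_ts)
    new.start = Stamp(end_ts)


v1 = Version(Stamp(10), Stamp(INFINITY))
v2 = create_version(v1, txn_id=101)   # t1: T1 creates V2 from V1
finalize_commit(v1, v2, end_ts=20)    # T1 commits with end timestamp t2 = 20
v3 = create_version(v2, txn_id=102)   # t5: T2 creates V3 from V2
finalize_commit(v2, v3, end_ts=60)    # T2 commits with end timestamp t6 = 60
```

After the last call, V2's valid time is the interval from 20 to 60, matching the t2 to t6 interval in the example.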
Concurrently running transactions may interfere with each other so as to produce incorrect results. A concurrency control technique is called pessimistic if it relies on proactively preventing such harmful interference from ever occurring. This is typically implemented by means of locking. An optimistic concurrency control technique, on the other hand, does not attempt to prevent interference proactively but instead relies on validating that no harmful interference occurred before allowing a transaction to commit. Similarly, a transaction is called pessimistic or optimistic depending on the type of concurrency control technique it relies on.
The present invention allows pessimistic and optimistic transactions to co-exist. A pessimistic transaction uses read markers, scan markers and write locks to implement the multi-version concurrency control scheme of the present invention. A pessimistic transaction prevents its reads from being invalidated by placing markers. In the present invention two different types of markers may be used to implement pessimistic transactions: read markers and scan markers. Read markers are placed on versions to ensure read stability, whereas scan markers are placed on buckets to prevent phantoms. A bucket may refer to a hash index; however, the present invention is not limited to databases using hash indexes, and scan markers can be applied equally to ordered indexes and the like.
A transaction places a scan marker on a hash table bucket before beginning a scan of the records in the bucket. This does not prevent new records from being added to the bucket but the new versions cannot be committed until the scan marker has been removed. If an ordered index is implemented by a tree structure, a scan marker on a node protects the subtree rooted at that node. Also, if an ordered index is implemented by skip lists, a scan marker on a tower protects the range from that tower to the next tower of the same height. Phantoms occur when the set of versions returned by a query at the start of a transaction is different from the set of versions returned by the same query at the end of the transaction.
A transaction places a read marker on a version V by incrementing V's read marker count. In some embodiments, a version may be limited to a maximum number of read markers and may also include a flag to prevent any further read markers from being placed. Therefore, at any given time, a version may have multiple read markers. In contrast, a version may only have a single write lock at any given time.
FIG. 2 illustrates an exemplary data structure 200 representing read markers according to some embodiments of the present invention. As described above, each version contains an end timestamp field 201 a which may contain a timestamp. FIG. 2 further describes how the present invention can use this field to record read markers.
As shown in FIG. 2, to enable the use of the end timestamp field 201 a for both a timestamp as well as read markers, a first bit is designated as a content type bit which defines the type of content the field contains. In the exemplary data structure shown in FIG. 2, the first bit of the field is defined as this content type bit. When the content type bit is set to a first value (e.g. 0), the remaining 63 bits of the field are the timestamp field 201 a as described above. However, when the content type bit is set to a second value (e.g. 1), the remaining 63 bits of the field are interpreted differently. For example, as shown in FIG. 2, the 63 bits may be divided between a no more read marker flag 202 a, a read marker count 202 b, and a write lock field 202 c. The no more read marker flag 202 a may be set to prevent any further read markers from being placed on the version. The read marker count 202 b records the current number of read markers on the version. The write lock field 202 c contains the transaction ID of the transaction (if any) holding a write lock on the version, or infinity if the version is not locked as described above.
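One possible packing of this 64-bit field is sketched below in Python. The description only states that the 63 remaining bits are divided among the three fields, so the specific widths used here (1 bit for the flag, 8 bits for the count, 54 bits for the write lock) and the all-ones encoding of "infinity" are assumptions for illustration, as are the function names.

```python
CONTENT_TYPE_BIT = 1 << 63        # 0 = field holds a timestamp, 1 = field holds lock data
NO_MORE_READ_MARKERS = 1 << 62    # no more read markers flag 202a
COUNT_SHIFT, COUNT_BITS = 54, 8   # read marker count 202b (assumed width)
WRITE_LOCK_BITS = 54              # write lock field 202c (assumed width)
COUNT_MASK = (1 << COUNT_BITS) - 1
WRITE_LOCK_MASK = (1 << WRITE_LOCK_BITS) - 1
NOT_LOCKED = WRITE_LOCK_MASK      # assumed stand-in for "infinity" (no write lock)


def pack_lock_word(no_more, count, writer=NOT_LOCKED):
    """Build the 64-bit field value for the lock interpretation."""
    if not 0 <= count <= COUNT_MASK:
        raise ValueError("read marker count out of range")
    if not 0 <= writer <= WRITE_LOCK_MASK:
        raise ValueError("transaction ID out of range")
    word = CONTENT_TYPE_BIT | (count << COUNT_SHIFT) | writer
    if no_more:
        word |= NO_MORE_READ_MARKERS
    return word


def unpack_end_field(word):
    """Interpret the field according to its content type bit."""
    if not word & CONTENT_TYPE_BIT:
        return {"timestamp": word}  # the remaining 63 bits are the timestamp
    return {
        "no_more_read_markers": bool(word & NO_MORE_READ_MARKERS),
        "read_marker_count": (word >> COUNT_SHIFT) & COUNT_MASK,
        "write_lock": word & WRITE_LOCK_MASK,
    }


word = pack_lock_word(no_more=False, count=3, writer=42)
fields = unpack_end_field(word)
```

Keeping all three lock fields in one machine word is what would let a real implementation update them together with a single atomic compare-and-swap; the sketch does not model that.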
Using the exemplary data structure of FIG. 2, a transaction can write lock a version by writing its transaction ID into the version's write lock field 202 c. Similarly, a transaction can place a read marker on a version by incrementing the version's read marker count 202 b. A read marker in the present invention is different from a read lock in a typical database implementation because a read marker on a version does not prevent another transaction from updating the version as will now be further described.
In a traditional locking implementation of a database, when a transaction attempts to update a version that is read locked, it would be forced to block. In contrast, in the present invention, if a read marker has been placed on a version by one or more transactions, another transaction may write lock the version to update it. In other words, the updating transaction is not forced to block until the read markers are removed. The updating transaction may continue processing, including updating the version; however, the updating transaction cannot commit until all read markers on the version have been removed.
Similarly, in the present invention, if a version is write locked by one transaction, another transaction may concurrently place a read marker on the version. In this scenario, the updating transaction (the one with the write lock) cannot commit until the read marker is removed. A read marker can be removed by either the reading transaction committing or aborting. Accordingly, in each of the above described scenarios, the updating transaction is forced to wait to commit until all read markers on the version are removed whether the updating transaction write locks the version before or after the one or more read markers are placed.
Similar rules apply to scan markers. For example, if a first transaction has placed a scan marker on a bucket, a second transaction is allowed to insert a new version into the bucket. However, the second transaction is not allowed to commit until the first transaction removes its scan marker on the bucket.
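The scan marker rule can be illustrated with the following Python sketch. It is a simplified, single-threaded model: the `Bucket` and `Transaction` classes and the `terminate` helper are hypothetical, the lists hold transaction objects rather than transaction IDs, and the inserter's wait for count is incremented once per scanner on the assumption that each scanner later decrements it once.

```python
class Transaction:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.wait_for_count = 0   # incoming wait for dependencies
        self.waiting_txns = []    # transactions waiting for this one to terminate

    def can_commit(self):
        return self.wait_for_count == 0


class Bucket:
    def __init__(self):
        self.marker_count = 0     # number of scan markers on the bucket
        self.marker_list = []     # transactions holding a scan marker
        self.versions = []

    def place_scan_marker(self, txn):
        self.marker_count += 1
        self.marker_list.append(txn)

    def remove_scan_marker(self, txn):
        self.marker_count -= 1
        self.marker_list.remove(txn)

    def insert_version(self, txn, version):
        # The insert is never blocked; the inserter only defers its commit.
        if self.marker_count > 0:
            for scanner in self.marker_list:
                scanner.waiting_txns.append(txn)
                txn.wait_for_count += 1
        self.versions.append(version)


def terminate(txn, buckets):
    """A scanner commits or aborts: drop its markers and release its waiters."""
    for bucket in buckets:
        bucket.remove_scan_marker(txn)
    for waiter in txn.waiting_txns:
        waiter.wait_for_count -= 1
    txn.waiting_txns.clear()


bucket = Bucket()
scanner1, scanner2, inserter = Transaction(1), Transaction(2), Transaction(3)
bucket.place_scan_marker(scanner1)
bucket.place_scan_marker(scanner2)
bucket.insert_version(inserter, "new version")
could_commit_early = inserter.can_commit()
terminate(scanner1, [bucket])
terminate(scanner2, [bucket])
```

The inserter adds its version immediately but cannot commit until both scanners have terminated, which is how phantoms are prevented without blocking the insert.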
To facilitate correct serialization when using these schemes, the present invention implements wait for dependencies. A wait for dependency forces an update transaction to wait before it can acquire an end timestamp and begin commit processing. To implement these wait for dependencies, a transaction keeps track of its incoming and outgoing wait for dependencies. An incoming dependency is one that the transaction waits on whereas a transaction has an outgoing dependency if some other transaction waits on the transaction to complete.
As shown in FIG. 3, each transaction includes fields to track dependencies. The fields may be contained in the transaction object, as shown in FIG. 3, or elsewhere. For incoming wait for dependencies, two fields can be maintained: a wait for count 301 and a no more wait for dependencies flag 302. The wait for count 301 indicates how many incoming wait for dependencies a transaction is waiting for. The no more wait for dependencies flag 302 can be set to prevent the creation of any more incoming dependencies. This flag can be used, for example, to prevent starvation by new incoming dependencies continuously being added. For outgoing wait for dependencies, a waiting transaction list 303 is maintained. This list contains the transaction IDs of any other transactions that are waiting for the transaction to complete.
The following paragraphs describe how the two exemplary data structures shown in FIGS. 2 and 3 can be used to implement wait for dependencies utilizing the multi-version concurrency control schemes of the present invention. When a transaction TU updates a version V, it obtains a write lock on V by copying its transaction ID into V's write lock field 202 c. If V's read marker count 202 b is greater than zero, TU takes a wait for dependency on V by incrementing TU's wait for count 301. In this example, it can be viewed that TU creates its own wait for dependency.
TU can also obtain a wait for dependency in another way. If TU obtains a write lock on V while V's read marker count 202 b is zero, TU will not initially take out a wait for dependency on V. While V is locked by TU, another transaction TR may attempt to place a read marker on V. TR will detect that V's read marker count 202 b is zero, but that V is write locked. TR then reads TU's no more wait for dependencies flag 302 to determine whether TU will allow a wait for dependency to be created. If TU's no more wait for dependencies flag 302 is not set, TR places a read marker on V by incrementing V's read marker count, and gives TU a wait for dependency on V by incrementing TU's wait for count 301. For this reason, it can be viewed that TR gives TU a wait for dependency in this example.
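The two ways in which TU can obtain a wait for dependency on V can be sketched as follows. This is a simplified, single-threaded illustration; the names Txn, Version, acquire_write_lock and place_read_marker are hypothetical. Refusing a second writer, and refusing the read marker when TU's no more wait for dependencies flag is set, are assumptions of the sketch rather than behavior stated above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Txn:
    txn_id: int
    wait_for_count: int = 0          # field 301
    no_more_wait_fors: bool = False  # flag 302


@dataclass
class Version:
    no_more_read_markers: bool = False  # flag 202a
    read_marker_count: int = 0          # count 202b
    write_lock: Optional[int] = None    # field 202c: ID of the locking transaction


def acquire_write_lock(tu: Txn, v: Version) -> bool:
    """TU write locks V; if V already has readers, TU creates its own dependency."""
    if v.write_lock is not None:
        return False  # assumption: a second writer is simply refused
    v.write_lock = tu.txn_id
    if v.read_marker_count > 0:
        tu.wait_for_count += 1
    return True


def place_read_marker(v: Version, txns: Dict[int, Txn]) -> bool:
    """A reader marks V; the first marker on a write-locked V gives the writer a dependency."""
    if v.no_more_read_markers:
        return False
    if v.write_lock is not None and v.read_marker_count == 0:
        writer = txns[v.write_lock]
        if writer.no_more_wait_fors:
            return False  # assumption: the reader may not mark V in this case
        writer.wait_for_count += 1
    v.read_marker_count += 1
    return True
```

Note that the writer holds at most one wait for dependency per version in this sketch: later readers of an already marked, write-locked version only increment the read marker count.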
To remove a read marker on a version V, a transaction TR performs different steps depending on various factors including whether V has outstanding read markers, and whether another transaction TU has a write lock on V. In a first scenario, if V is not write locked, TR simply decrements V's read marker count 202 b and proceeds. In a second scenario, if V is write locked, but one or more other transactions have placed read markers on V (i.e. V's read marker count is greater than one), TR also simply decrements V's read marker count 202 b and proceeds.
However, in a third scenario, if V is write locked and V's read marker count is equal to one (meaning that TR is the only transaction with a read marker on V), TR is about to remove the last read marker on V. In this third scenario, TR must release TU's wait for dependency on V. To do so, TR sets V's read marker count 202 b to zero and V's no more read markers flag 202 a to true thus preventing any further read markers from being obtained on V. Then, TR locates TU (by reading its transaction ID in V's write lock field 202 c) and decrements TU's wait for count 301.
V's no more read markers flag 202 a is set to true prior to releasing TU's wait for dependency on V to ensure that no other transaction places a read marker on V prior to TU committing the updated version of V. This is necessary because once TU's wait for dependency is removed, TU can proceed to commit. Thus V will become invalid by being replaced by an updated version V′ created by TU.
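The three scenarios for removing a read marker can be summarized in a short sketch. The names are hypothetical and the code is single-threaded; it only illustrates the ordering described above, in which the no more read markers flag is set before the writer's wait for count is decremented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Txn:
    txn_id: int
    wait_for_count: int = 0  # field 301


@dataclass
class Version:
    no_more_read_markers: bool = False  # flag 202a
    read_marker_count: int = 0          # count 202b
    write_lock: Optional[int] = None    # field 202c


def remove_read_marker(v: Version, txns: Dict[int, Txn]) -> None:
    """Remove one read marker from V, releasing the writer if it was the last one."""
    if v.write_lock is None or v.read_marker_count > 1:
        # First and second scenarios: no writer, or other readers remain.
        v.read_marker_count -= 1
        return
    # Third scenario: last read marker on a write-locked version.
    v.read_marker_count = 0
    v.no_more_read_markers = True            # set before releasing the writer
    txns[v.write_lock].wait_for_count -= 1   # release TU's dependency on V
```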
FIG. 4 illustrates an exemplary data structure 400 that is used to implement scan markers in some embodiments of the invention. Wait for dependencies related to scan markers function similarly to wait for dependencies related to read markers. A transaction TR places a scan marker on a bucket B by incrementing B's marker count 401 and adding its transaction ID to B's marker list 402. The purpose of the scan marker is not to prevent a version from being added to the bucket, but, instead to prevent any versions which are added while the scan marker is in place from becoming visible to TR during its processing. In other words, another transaction TU can add a version to B, but TU cannot commit until TR removes its marker on B. This is enforced by TU obtaining a wait for dependency on TR.
It is noted that in this scan marker scenario, this specification refers to the wait for dependency as being on another transaction whereas in the record lock scenario, the specification refers to the wait for dependency as being on the version. This is to distinguish that the wait for dependency in the scan marker scenario is dependent on one or more transactions releasing their scan markers (i.e. a marker on a plurality of versions rather than on a single version as in the read marker scenario).
A transaction TU can acquire a wait for dependency caused by a scan marker in two ways. First, if TU is attempting to add a new version V to a bucket B with one or more scan markers, TU takes out a wait for dependency on every transaction listed in B's marker list 402 (i.e. each transaction that has a scan marker on B). To do so, TU adds its own transaction ID to the waiting transaction list 303 of every transaction listed in B's marker list 402. TU also increments its own wait for count 301 for each transaction listed in B's marker list 402.
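The first way can be sketched as follows, together with the release that occurs when a scanning transaction terminates. Txn, Bucket, place_scan_marker, add_version and terminate are hypothetical names, the bucket contents are plain strings for brevity, and the code ignores the atomicity a concurrent implementation would need.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Txn:
    txn_id: int
    wait_for_count: int = 0                                    # field 301
    waiting_txn_list: List[int] = field(default_factory=list)  # list 303


@dataclass
class Bucket:
    marker_count: int = 0                                 # count 401
    marker_list: List[int] = field(default_factory=list)  # list 402
    versions: List[str] = field(default_factory=list)


def place_scan_marker(tr: Txn, b: Bucket) -> None:
    """TR marks bucket B before scanning it."""
    b.marker_count += 1
    b.marker_list.append(tr.txn_id)


def add_version(tu: Txn, b: Bucket, version: str, txns: Dict[int, Txn]) -> None:
    """TU inserts into B and takes a dependency on every scanning transaction."""
    for scanner_id in b.marker_list:
        txns[scanner_id].waiting_txn_list.append(tu.txn_id)
        tu.wait_for_count += 1
    b.versions.append(version)


def terminate(tr: Txn, buckets: List[Bucket], txns: Dict[int, Txn]) -> None:
    """TR removes its scan markers and releases every transaction waiting on it."""
    for b in buckets:
        if tr.txn_id in b.marker_list:
            b.marker_list.remove(tr.txn_id)
            b.marker_count -= 1
    for waiter_id in tr.waiting_txn_list:
        txns[waiter_id].wait_for_count -= 1
    tr.waiting_txn_list.clear()
```

With two scan markers on a bucket, an inserting transaction ends up with a wait for count of two and can begin commit processing only after both scanning transactions have terminated.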
Second, if a transaction TR scans a bucket B and finds a version V that satisfies TR's search predicate but that is not visible to TR because V is write locked by a transaction TU that is still active, TR registers a wait for dependency for TU on TR by adding TU's transaction ID to TR's waiting transaction list 303 and incrementing TU's wait for count 301. This wait for dependency is created to prevent TU from committing before TR which would make V a phantom to TR.
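The second way can be illustrated with a simplified scan routine. The names and the integer payload are hypothetical, and visibility is reduced to a single rule that is an assumption of this sketch: a version write locked by another, still active transaction is treated as not visible to the scanner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Txn:
    txn_id: int
    active: bool = True
    wait_for_count: int = 0                                    # field 301
    waiting_txn_list: List[int] = field(default_factory=list)  # list 303


@dataclass
class Version:
    payload: int
    write_lock: Optional[int] = None  # ID of the locking transaction, if any


def scan_bucket(tr: Txn, versions: List[Version],
                predicate: Callable[[int], bool],
                txns: Dict[int, Txn]) -> List[Version]:
    """Return the versions visible to TR, registering dependencies for hidden matches."""
    visible = []
    for v in versions:
        if not predicate(v.payload):
            continue
        owner = txns.get(v.write_lock) if v.write_lock is not None else None
        if owner is not None and owner is not tr and owner.active:
            # TU must not commit before TR, or V would become a phantom to TR.
            tr.waiting_txn_list.append(owner.txn_id)
            owner.wait_for_count += 1
            continue
        visible.append(v)
    return visible
```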
FIG. 5 illustrates a flowchart of a method 500 for creating a wait for dependency in a multi-version concurrency control scheme of a main memory database. Method 500 will be described with reference to the exemplary data structures in FIGS. 2 and 3. In method 500, a first transaction places a read marker on a version of a record in a database (act 501). The read marker indicates that the first transaction is reading the version of the record, but does not prevent another transaction from reading or updating the record concurrently. For example, the first transaction may acquire the read marker by incrementing the version's read marker count 202 b. Before the first transaction terminates, a second transaction acquires a write lock on the version of the record (act 502). The write lock prevents another transaction from updating the version of the record. For example, the second transaction may acquire the write lock by writing its transaction ID to the version's write lock field 202 c. The second transaction also creates a wait for dependency on the version (act 503). For example, the second transaction may increment its wait for count 301 which may be stored in its transaction object. The second transaction continues processing, but waits to commit until the first transaction terminates and removes the read marker on the version (act 504).
Method 500 may further include the second transaction determining that the version has outstanding read markers prior to creating the wait for dependency by reading the version's read marker count 202 b and determining that the read marker count 202 b is greater than zero.
In some embodiments, method 500 may also include the first transaction determining that its read marker is the last read marker on the version (such as by determining that the version's read marker count 202 b is equal to one prior to the first transaction terminating). The method may also include the first transaction decrementing the version's read marker count 202 b, setting the version's no more read markers flag 202 a, and decrementing the second transaction's wait for count 301. The first transaction may identify the second transaction by reading the second transaction's transaction ID in the version's write lock field 202 c.
In other embodiments, method 500 may also include the first transaction determining that one or more other read markers have been placed on the version, and the first transaction removing its read marker by decrementing the version's read marker count 202 b. In some embodiments, the version's no more read markers flag 202 a, read marker count 202 b, and write lock field 202 c are stored within the version.
FIG. 6 illustrates a flowchart of a method 600 for creating a wait for dependency when a transaction adds a new version to a bucket with one or more scan markers. Method 600 will be described with reference to the exemplary data structures in FIGS. 3 and 4. In method 600, one or more first transactions place a scan marker on a bucket (act 601). For example, the one or more first transactions may place the scan markers by incrementing marker count 401 and adding their transaction IDs to marker list 402. A second transaction then attempts to add a new version of a record to the bucket (act 602). The second transaction, upon detecting the one or more scan markers on the bucket, creates a wait for dependency on each of the one or more first transactions (act 603). For example, the second transaction may detect the one or more markers on the bucket by reading the bucket's marker count 401. The second transaction may then create the one or more wait for dependencies by adding its transaction ID to the waiting transaction list 303 of every transaction listed in the marker list 402 and incrementing its own wait for count 301 once for each such transaction. The second transaction continues processing, but waits to commit until each of the one or more first transactions terminates (act 604). For example, upon terminating, each of the one or more first transactions may decrement the second transaction's wait for count 301. Once the second transaction's wait for count 301 reaches zero, indicating that the second transaction has no more wait for dependencies, the second transaction may proceed to commit.
FIG. 7 illustrates a flowchart of a method 700 for creating a wait for dependency when a transaction places a read marker on a version that is already write locked by another transaction. Method 700 will be described with reference to the exemplary data structures in FIGS. 2 and 3. In method 700, a first transaction acquires a write lock on a version of a record (act 701). For example, the first transaction may acquire the write lock by writing its transaction ID in the version's write lock field 202 c. While the version is write locked by the first transaction, a second transaction attempts to place a read marker on the version (act 702). Upon determining that the version is write locked by the first transaction, the second transaction creates a wait for dependency on the version for the first transaction and places a read marker on the version (act 703). For example, the second transaction can detect that the version is write locked by determining that the version's write lock field 202 c contains the first transaction's transaction ID. The second transaction may create the first transaction's wait for dependency on the version by incrementing the first transaction's wait for count 301, and may place the read marker by incrementing the version's read marker count 202 b. The wait for dependency causes the first transaction to wait to commit until the second transaction has terminated and removed its read marker on the version. For example, the first transaction may continue processing, but will not commit until its wait for count 301 equals zero.
In addition to wait for dependencies as described above, embodiments of the present invention may also implement commit dependencies simultaneously with wait for dependencies.
Like wait for dependencies, commit dependencies can be either incoming or outgoing dependencies as will be further described below. Similarly, a transaction only needs to know the number of incoming commit dependencies and therefore maintains an incoming commit dependency count. Further, a transaction must track each of its outgoing commit dependencies and thus maintains an outgoing commit dependency set.
Referring again to FIG. 1, although V2 is valid from t2 to t6, there are periods where V2's validity is in doubt. In other words, because a transaction may abort after creating a new version of a record, it cannot be known that the new version will be valid until the transaction commits. Specifically, V2 is created at t1, but the start of its valid time interval is not known until T1 precommits at time t2. During this time (t1-t2), V2 is only visible to T1.
Further, although the start of V2's valid time is known once T1 precommits at t2, V2 is not stable until T1 actually commits at t3 because T1 may still abort after it has pre-committed. However, using commit dependencies according to the present invention, another transaction may be allowed to read V2 during this interval (t2-t3). Commit dependencies allow the reading transaction to assume that T1 will commit thus allowing the reading transaction to read the updated version V2 before T1 has committed. Commit dependencies can be used by both pessimistic and optimistic transactions.
A reading transaction TR, in this scenario, may register a commit dependency with T1. The implementation of commit dependencies will be described with reference to FIG. 8. FIG. 8 is similar to FIG. 3 in that it includes similar fields to those illustrated in FIG. 3. With reference to FIG. 8, to register a commit dependency with T1, TR increments its own incoming commit dependency count 804 and registers its transaction ID in T1's outgoing commit dependency set 805. Then, when T1 has committed, it locates TR's transaction ID in its outgoing commit dependency set 805 (as well as any other transaction IDs of other transactions that have registered commit dependencies with T1), and decrements TR's commit dependency count 804.
If TR's only dependency was with T1, its commit dependency count 804 will now be zero indicating that it is no longer waiting for any other transactions to commit. TR, therefore, can now commit. As can be seen, using this approach, TR is able to read a value from a version before it is certain that the version will be valid. If T1 aborts rather than commits, T1 will notify TR of the abort thus causing TR to abort as well (because it has read a value that will never become valid). This can be accomplished using an abort flag 806 in each transaction which when set causes the transaction to abort. The aborting transaction (in this case T1) could set this flag in TR.
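The commit dependency bookkeeping of FIG. 8 can be sketched as follows. The names are hypothetical; in particular, the finish function, which handles both the commit and the abort of the transaction being depended on, and the guard against registering the same dependency twice are assumptions of this single-threaded sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Txn:
    txn_id: int
    commit_dep_count: int = 0                                  # count 804
    outgoing_commit_deps: Set[int] = field(default_factory=set)  # set 805
    abort_flag: bool = False                                   # flag 806


def register_commit_dependency(reader: Txn, writer: Txn) -> None:
    """The reader speculatively reads the writer's version and depends on its commit."""
    if reader.txn_id not in writer.outgoing_commit_deps:
        writer.outgoing_commit_deps.add(reader.txn_id)
        reader.commit_dep_count += 1


def finish(writer: Txn, committed: bool, txns: Dict[int, Txn]) -> None:
    """On commit, release dependents; on abort, tell them to abort as well."""
    for reader_id in writer.outgoing_commit_deps:
        if committed:
            txns[reader_id].commit_dep_count -= 1
        else:
            txns[reader_id].abort_flag = True
    writer.outgoing_commit_deps.clear()


def can_commit(t: Txn) -> bool:
    """A transaction may commit once nothing it depends on is still undecided."""
    return not t.abort_flag and t.commit_dep_count == 0
```

A reader that registers with T1 cannot commit until finish is called for T1; if T1 aborts, the reader's abort flag is set and can_commit stays false.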
Because most transactions commit, this speculative read approach using commit dependencies is very efficient. Additionally, in many cases, the reading transaction will never wait because the transaction on which the reading transaction depends finishes processing before the reading transaction is ready to commit.
The present invention includes embodiments of a multi-version concurrency control technique that can implement both optimistic and pessimistic transactions, as described above, by utilizing the read markers, scan markers and write locks, as well as both commit dependencies and wait for dependencies. The exemplary data structures illustrated in the figures and described above enable the concurrent use of both types of dependencies with the read markers, scan markers and write locks.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (20)

What is claimed is:
1. A method for creating a wait for dependency in a multi-version concurrency control scheme of a main memory database, the method comprising:
a first transaction placing a read marker on a version of a record in a database, the read marker indicating that the first transaction is reading the version of the record, but does not prevent another transaction from reading or updating the record concurrently;
before the first transaction terminates, a second transaction acquiring a write lock on the version of the record, the write lock preventing another transaction from updating the version of the record;
as part of acquiring the write lock, the second transaction creating a wait for dependency on the version; and
the second transaction continuing processing, but waiting to begin commit until the first transaction terminates and removes the read marker on the version.
2. The method of claim 1, wherein the second transaction creates the wait for dependency by incrementing the second transaction's wait for count.
3. The method of claim 2, wherein the wait for count is stored in the second transaction's transaction object.
4. The method of claim 2, further comprising:
the first transaction determining that the version's read marker count indicates that the first transaction is the only transaction with a read marker on the version; and
the first transaction setting the version's no more read markers flag to prevent another transaction from placing a read marker on the version.
5. The method of claim 4, further comprising:
the first transaction identifying the second transaction by reading the second transaction's transaction ID in a write lock field of the version, and decrementing the second transaction's wait for count.
6. The method of claim 1, wherein the first transaction places the read marker on the version by incrementing a read marker count of the version.
7. The method of claim 1, wherein the second transaction acquires the write lock by writing its transaction ID into a write lock field of the version.
8. The method of claim 7, wherein the version's write lock field is stored within the version.
9. The method of claim 1, further comprising:
one or more other transactions placing a read marker on the version while the version is write locked by the second transaction, wherein each of the one or more other transactions place a read marker by incrementing the version's read marker count.
10. The method of claim 9, further comprising:
the first transaction terminating and removing its read marker on the version prior to the one or more other transactions terminating and removing their read markers on the version;
the one or more other transactions terminating and removing their read markers on the version, wherein the last of the one or more other transactions to terminate and remove its read marker on the version further performs the following:
determining that the version's read marker count indicates that the last of the one or more other transactions is the only transaction with a read marker on the version;
setting the version's no more read markers flag to prevent another transaction from acquiring a read marker on the version;
identifying the second transaction as the transaction with the write lock on the version by reading the second transaction's transaction ID in the version's write lock field; and
decrementing the second transaction's wait for count.
11. The method of claim 1, further comprising:
the second transaction creating a modified version of the version;
the second transaction precommitting;
while the second transaction is precommitting, but before the second transaction commits, a third transaction creating a commit dependency on the second transaction and reading the modified version; and
the third transaction continuing processing, but waiting to commit until the second transaction commits.
12. The method of claim 11, wherein creating the commit dependency comprises the third transaction incrementing its commit dependency count, and writing its transaction ID to the second transaction's outgoing commit dependency set.
13. The method of claim 12, further comprising:
upon the second transaction committing, the second transaction reading the third transaction's transaction ID in the second transaction's outgoing commit dependency set; and
the second transaction decrementing the third transaction's commit dependency count.
14. A method for creating a wait for dependency in a multi-version concurrency control scheme of a main memory database, the method comprising:
one or more first transactions placing a scan marker on a bucket;
while the bucket is marked by the one or more first transactions, a second transaction attempting to add a new version of a record to the bucket;
upon detecting the one or more markers on the bucket, the second transaction creating a wait for dependency on each of the one or more first transactions; and
the second transaction continuing processing, but waiting to commit until each of the one or more first transactions terminate.
15. The method of claim 14, wherein the second transaction creating the wait for dependency on each of the one or more first transactions comprises the second transaction adding its transaction ID to each of the one or more first transaction's waiting transaction list, and the second transaction incrementing its wait for count for each of the one or more first transactions.
16. The method of claim 15, further comprising:
upon terminating, each of the one or more first transactions decrementing the second transaction's wait for count.
17. A method for creating a wait for dependency in a multi-version concurrency control scheme of a main memory database, the method comprising:
a first transaction acquiring a write lock on a version of a record;
while the version is write locked by the first transaction, a second transaction attempting to place a read marker on the version;
upon determining that the version is write locked by the first transaction, the second transaction creating a wait for dependency on the version for the first transaction, wherein the second transaction creates the wait for dependency on the version for the first transaction by incrementing the first transaction's wait for count;
the second transaction placing a read marker on the version, wherein the wait for dependency causes the first transaction to wait to commit until the second transaction has terminated;
the second transaction, upon terminating, decrementing the first transaction's wait for count;
the first transaction determining that its wait for count indicates that the first transaction has no more wait for dependencies; and
the first transaction committing.
18. The method of claim 17, further comprising the first transaction creating a modified version of the version.
19. The method of claim 18, wherein the method further comprises, after the second transaction terminates, the first transaction pre-committing and while the first transaction is pre-committing, but before the first transaction commits, a third transaction creating a commit dependency on the first transaction and reading the modified version.
20. The method of claim 19, wherein the method further includes the third transaction continuing processing, but waiting to commit until the first transaction commits.
US13/042,269 2011-03-07 2011-03-07 Efficient multi-version locking for main memory databases Active 2031-09-23 US8407195B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/042,269 US8407195B2 (en) 2011-03-07 2011-03-07 Efficient multi-version locking for main memory databases
CN201210057483.XA CN102682071B (en) 2011-03-07 2012-03-06 Efficient multi version for main storage database locks

Publications (2)

Publication Number Publication Date
US20120233139A1 US20120233139A1 (en) 2012-09-13
US8407195B2 true US8407195B2 (en) 2013-03-26

Family

ID=46797011

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/042,269 Active 2031-09-23 US8407195B2 (en) 2011-03-07 2011-03-07 Efficient multi-version locking for main memory databases

Country Status (2)

Country Link
US (1) US8407195B2 (en)
CN (1) CN102682071B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120310934A1 (en) * 2011-06-03 2012-12-06 Thomas Peh Historic View on Column Tables Using a History Table
EP2740055A4 (en) * 2011-08-01 2015-09-09 Tagged Inc Systems and methods for asynchronous distributed database management
US8983900B2 (en) * 2012-10-23 2015-03-17 Sap Se Generic semantic layer for in-memory database reporting
US9805074B2 (en) * 2012-11-28 2017-10-31 Sap Ag Compressed representation of a transaction token
US9436561B2 (en) 2013-03-28 2016-09-06 Microsoft Technology Licensing, Llc Recovery processing using torn write detection
CN103744936B (en) * 2013-12-31 2017-02-08 华为技术有限公司 Multi-version concurrency control method in database and database system
US9645844B2 (en) 2014-03-28 2017-05-09 Futurewei Technologies, Inc. Systems and methods to optimize multi-version support in indexes
US9928264B2 (en) * 2014-10-19 2018-03-27 Microsoft Technology Licensing, Llc High performance transactions in database management systems
US11301457B2 (en) * 2015-06-29 2022-04-12 Microsoft Technology Licensing, Llc Transactional database layer above a distributed key/value store
CN106708608B (en) * 2015-11-16 2020-08-11 阿里巴巴集团控股有限公司 Distributed lock service method, acquisition method and corresponding device
US20180329900A1 (en) * 2015-11-19 2018-11-15 Entit Software Llc Prediction models for concurrency control types
US11080261B2 (en) 2016-01-29 2021-08-03 Hewlett Packard Enterprise Development Lp Hybrid concurrency control
US9959176B2 (en) * 2016-02-22 2018-05-01 Red Hat Inc. Failure recovery in shared storage operations
CN109947742B (en) * 2019-02-28 2021-08-03 上海交通大学 Multi-version database concurrency control method and system for two-stage lock
US11567899B2 (en) 2019-12-03 2023-01-31 Western Digital Technologies, Inc. Managing dependent delete operations among data stores
US11409711B2 (en) * 2019-12-03 2022-08-09 Western Digital Technologies, Inc. Barriers for dependent operations among sharded data stores
US12111794B2 (en) 2019-12-03 2024-10-08 Western Digital Technologies, Inc. Replication barriers for dependent data transfers between data stores
CN112231070B (en) * 2020-10-15 2024-08-30 北京金山云网络技术有限公司 Data writing and reading method, device and server
CN112433868B (en) * 2020-11-27 2024-07-30 深圳前海微众银行股份有限公司 Transaction processing method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5623659A (en) * 1993-04-30 1997-04-22 International Business Machines Corporation Parent/child subset locking scheme for versioned objects
US5701480A (en) 1991-10-17 1997-12-23 Digital Equipment Corporation Distributed multi-version commitment ordering protocols for guaranteeing serializability during transaction processing
US20020138353A1 (en) * 2000-05-03 2002-09-26 Zvi Schreiber Method and system for analysis of database records having fields with sets
US6681226B2 (en) 2001-01-30 2004-01-20 Gemstone Systems, Inc. Selective pessimistic locking for a concurrently updateable database
US20080034172A1 (en) * 2006-08-04 2008-02-07 Microsoft Corporation Combined pessimistic and optimistic concurrency control
US20080256073A1 (en) * 2007-04-11 2008-10-16 Microsoft Corporation Transactional memory using buffered writes and enforced serialization order
US20090132535A1 (en) 2007-11-19 2009-05-21 Manik Ram Surtani Multiversion concurrency control in in-memory tree-based data structures
US20100174875A1 (en) * 2009-01-08 2010-07-08 David Dice System and Method for Transactional Locking Using Reader-Lists

Non-Patent Citations (18)

* Cited by examiner, † Cited by third party
Title
"CIS 307: Deadlocks:" 7 pgs. Retrieved online Dec. 17, 2010 http://www.cis.temple.edu/~ingargio/cis307/readings/deadlock.html.
"Design of Main Memory Database System/Concurrency", Dec. 1983, from Wikibooks, the open-content textbook collection, 11 pgs. Retrieved online Dec. 17, 2010, http://en.wikibooks.org/wiki/Design-of-Main-Memory-Database-System/Concurrency11.
"Pessimistic vs. Optimistic concurrency control". 3 pgs. Retrieved online Dec. 17, 2010 http://publib.boulder.ibm.com/infocenter/soliddb/v6r3/topic/com.ibm.swg.im.soliddb.sql.doc/doc/pessimistic.vs.optimistic.concurrency.control.html.
Berenson et al., "A Critique of ANSI SQL Isolation Levels", Proc. ACM SIGMOD 95, pp. 1-12, San Jose CA, Jun. 1995.
Bernstein, Philip A. and Goodman, Nathan, "Multiversion Concurrency Control-Theory and Algorithms", ACM Transactions on Database Systems, vol. 8, No. 4, Dec. 1983, pp. 465-483.
Cahill, Michael James, "Serializable Isolation for Snapshot Databases", Aug. 2009, 135 pgs.
Carey, Michael J. et al., "The Performance of Multiversion Concurrency Control Algorithms" ACM Transactions on Database Systems, vol. 4, No. 4, Nov. 1986, pp. 338-378.
Carey, Michael J., "Multiple versions and the performance of optimistic concurrency control", 28 pgs, Computer Science Technical Report #517, C.S.D., University of Wisconsin, Oct. 1993.
Concurrency Control: (Chapter 13), 1996-2010. 1pg, Retrieved online Dec. 17, 2010, http://www.postgresql.org/docs/8.4/static/mvcc.html.
Fraser, Kier, "Practical lock-freedom" Technical Report #579, Cambridge University Feb. 2004, 119 pgs.
Herlihy, Maurice et al., "The Repeat Offender Problem: A Mechanism for Supporting Dynamic-Sized, Lock-Free Data Structures" C.S.D. Brown University, Jul. 2002, 15 pgs.
Johnson, Ryan et al. "Improving OLTP Scalability using Speculative Lock Inheritance" VLDB '09, Aug. 24-28, 2009, 11pgs.
Kung, H.T. et al., "On optimistic methods for concurrency control" ACM Transactions on Database Systems, vol. 6, No. 2, Jun. 1981, pp. 213-226.
Maged, Michael M., "Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects" IEEE Transactions on Parallel and Distributed Systems, vol. 15, No. 6, Jun. 2004, pp. 491-504.
Rastogi, Rajeev et al., "Logical and Physical Versioning in Main Memory Databases", 10 pgs, Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997.
Thomasian, Alexander et al, "Performance Analysis of Two-Phase Locking" IEEE Transactions on Software Engineering, vol. 17, No. 5, May 1991, pp. 386-402.
Thomasian, Alexander, "Concurrency Control: Methods, Performance, and Analysis", ACM Computing Surveys, vol. 30, No. 1 Mar. 1998, pp. 70-119.

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10346386B2 (en) * 2016-11-04 2019-07-09 Salesforce.Com, Inc. Multiversion concurrency control of database records with uncommitted transactions
US11416470B2 (en) * 2016-11-04 2022-08-16 Salesforce, Inc. Multiversion concurrency control of database records with uncommitted transactions
US20230306011A1 (en) * 2020-08-06 2023-09-28 Leanxcale, S.L. System for conflict less concurrency control
US11934373B2 (en) * 2020-08-06 2024-03-19 Leanxcale, S.L. System for conflict less concurrency control

Also Published As

Publication number Publication date
CN102682071B (en) 2016-12-21
CN102682071A (en) 2012-09-19
US20120233139A1 (en) 2012-09-13

Similar Documents

Publication Publication Date Title
US8407195B2 (en) Efficient multi-version locking for main memory databases
US11314716B2 (en) Atomic processing of compound database transactions that modify a metadata entity
US9336258B2 (en) Reducing database locking contention using multi-version data record concurrency control
EP3111325B1 (en) Automatically retrying transactions with split procedure execution
Ports et al. Serializable snapshot isolation in PostgreSQL
US11321299B2 (en) Scalable conflict detection in transaction management
US20190213203A1 (en) Distributed database transaction protocol
US20220197896A1 (en) Transactional database layer above a distributed key/value store
US8396831B2 (en) Optimistic serializable snapshot isolation
US7644106B2 (en) Avoiding lock contention by using a wait for completion mechanism
US7716182B2 (en) Version-controlled cached data store
US9679003B2 (en) Rendezvous-based optimistic concurrency control
US9576038B1 (en) Consistent query of local indexes
US10754854B2 (en) Consistent query of local indexes
US20100076940A1 (en) Method for providing maximal concurrency in a tree structure
WO2011009274A1 (en) Method and apparatus of concurrency control
CN110716936B (en) Database optimistic lock implementation method and system based on SpringBoot + JPA
US11243820B1 (en) Distributed deadlock detection and resolution in distributed databases
CN106648840B (en) Method and device for determining time sequence between transactions
WO2023124242A1 (en) Transaction execution method and apparatus, device, and storage medium
Zhou et al. Posterior snapshot isolation
US20230205785A1 (en) A distributed database that uses hybrid table secondary indexes
Zhou et al. Decentralizing MVCC by Leveraging Visibility
El-Shaikh et al. Lightweight Latches for B-Trees to Cope with High Contention
Chen et al. Spectrum: Speedy and Strictly-Deterministic Smart Contract Transactions for Blockchain Ledgers

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LARSON, PER-AKE;BLANAS, SPYRIDON;DIACONU, CRISTIAN;SIGNING DATES FROM 20110304 TO 20110307;REEL/FRAME:025913/0910

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034544/0001

Effective date: 20141014

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12