WO2015116208A1 - Orthogonal key-value locking - Google Patents

Orthogonal key-value locking

Info

Publication number
WO2015116208A1
WO2015116208A1 (application PCT/US2014/014296, US2014014296W)
Authority
WO
WIPO (PCT)
Prior art keywords
key
value
locking
lock
orthogonal
Prior art date
Application number
PCT/US2014/014296
Other languages
French (fr)
Inventor
Goetz Graefe
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2014/014296 priority Critical patent/WO2015116208A1/en
Publication of WO2015116208A1 publication Critical patent/WO2015116208A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2308Concurrency control
    • G06F16/2336Pessimistic concurrency control approaches, e.g. locking or multiple versions without time stamps
    • G06F16/2343Locking methods, e.g. distributed locking or locking implementation details
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees

Definitions

  • Ordered indexes are often used in computer science to enable data access, search, updates, insertions, and deletions in large data stores, such as databases. Concurrency control is needed to coordinate access to a single index by multiple users, multiple applications, and/or multiple transactions.
  • Figure 1 is a diagrammatic illustration of an example system which may implement orthogonal key-value locking.
  • Figure 2 is an example index record within a non-unique secondary index.
  • Figure 3 is a table illustrating construction of example lock modes.
  • Figure 4 is an illustration of an example primary index with index records.
  • Figures 5a-f are illustrations of locking an example non-unique secondary index for the example index records in the primary index shown in Figure 4.
  • Figure 6 is a diagrammatic illustration of example operations which may implement orthogonal key-value locking.
  • Orthogonality of locking a key value (versus locking a gap between neighboring key values) includes the ability to lock an entire key value, and only that key value. Locking a key value in its entirety (all actual and possible instances) and/or in partitions enables independent locks on different key values, and thus concurrency among transactions accessing different partitions within the list of items associated with a specific key value.
  • FIG. 1 is a diagrammatic illustration of an example system which may implement orthogonal key-value locking.
  • orthogonal key-value locking may be implemented for ordered indexes, such as B-trees 100.
  • B-trees are a good example of a data structure which may implement orthogonal key-value locking.
  • B-trees 100 offer several advantages over other types of data structures, for example, when the time to access the data from a node exceeds the data processing time for that data. Examples include, but are not limited to, when the node data resides in secondary storage, such as database systems and file systems for disk drives and/or other non-volatile memory.
  • the B-tree 100 structure maintains keys in sorted order for sequential traversing.
  • B-trees 100 use a hierarchical index to minimize the number of disk reads, and use partially full blocks to speed insertions and deletions.
  • B-tree is used herein to refer to a specific data structure.
  • B-tree is also used herein in a more general sense, to include variations thereon, such as but not limited to a B+-tree and a B * -tree, and/or any other data structures now known or later developed which can benefit from the teachings herein.
  • orthogonal key-value locking may be implemented with other types of ordered indexes.
  • a B-tree 100 is made up of a plurality of nodes.
  • the internal (or non-leaf) nodes of a B-tree 100 can have a variable number of child nodes (within a pre-defined range). When data is inserted or removed from a node, the number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split.
  • each internal node 110 may have 2 or 3 child nodes 120a-c.
  • the number of child nodes 120a-c (or "branches") from an internal node 110 is one more than the number of keys 115a-b stored in the node 110.
  • the internal nodes 110 of a 2-3 B-tree 100 may store either one key with two child nodes, or two keys with three child nodes (as illustrated in Figure 1). It should be noted that the 2-3 B-tree 100 in Figure 1 is described herein only for purposes of a simplified illustration. Most computer systems implement more sophisticated B-trees, but the basic principles are the same.
  • each internal node 110 of the B-tree 100 includes a number of keys 115a-b.
  • the keys 115a-b are defined as separation values which divide child nodes 120a-c (or "subtrees") of the B-tree 100.
  • the number of keys 115a-b may be selected to vary between d and 2d, where d is the minimum number of keys, and d +1 is the minimum degree or branching factor of the tree.
  • a factor of 2 allows nodes to be split or combined, although this is not necessary in all B-tree designs.
  • when a node 110 is full (i.e., holds 2d keys), adding a key 115a-b can be accomplished by splitting the node 110 into two d-key nodes (not shown) and moving the middle key up to the parent node. Each split node (not shown) has the minimum number of keys.
  • if an internal node 110 and its neighbor each have d keys 115a-b, then a key may be deleted from the internal node 110 by combining it with its neighbor (not shown). Deleting the key 115a-b would leave the internal node 110 with d-1 keys; joining the neighbor (not shown) would add its d keys plus one more key pulled down from the shared parent (not shown). The result is an entirely full node of 2d keys.
  • if an internal node 110 has 3 child nodes 120a-c (or subtrees), then it has 2 keys (e.g., Key 1 and Key 2). All values (e.g., 1, 2, 3, ..., 6) in subtree 120a are less than the value of Key 1 (e.g., 7); all values (e.g., 8, 9, ..., 14) in subtree 120b are between the value of Key 1 (e.g., 7) and the value of Key 2 (e.g., 15); and all values (e.g., 16, ..., 20) in subtree 120c are greater than the value of Key 2 (e.g., 15).
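  • The routing rule above (keys as separation values) can be sketched as follows; this is a minimal illustration, not the patent's implementation:

```python
# Minimal sketch of routing through the 2-3 node above: the keys 7 and
# 15 act as separation values (in a classic B-tree, values equal to a
# separator are found in the node itself).
import bisect

def child_index(keys, value):
    """Return which subtree's range contains `value`."""
    return bisect.bisect_right(keys, value)

keys = [7, 15]                 # Key 1 and Key 2 of the internal node
print(child_index(keys, 3))    # 0 -> subtree 120a (values below 7)
print(child_index(keys, 9))    # 1 -> subtree 120b (between 7 and 15)
print(child_index(keys, 20))   # 2 -> subtree 120c (above 15)
```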
  • Key values in secondary indexes may or may not be unique. In other words, there may be many, even thousands, of rows with the same value in an indexed attribute. In those cases, compression is often useful (e.g., bitmaps instead of lists of row identifiers). However, the representation choices are independent from transactional concurrency control.
  • B-tree locking may be implemented to control concurrent searches and updates in B-trees. These locking techniques separate transactions in order to protect the B-tree contents during node access (e.g., insertions/deletions).
  • the locking technique disclosed herein permits covering a distinct key value and all its index entries with a single lock acquisition, and at other times permits high concurrency among updates and transactions.
  • the technique provides benefits of orthogonal key-range locking (e.g., complete separation of lock modes for key value and gap), and of key-value locking (e.g., a single lock is used for a distinct key value and all its instances). As such, the technique is referred to herein as "orthogonal key-value locking.”
  • Orthogonal key-value locking may be implemented for B-trees, such as may be used in file systems (not shown), or in a database management system 150, as illustrated in Figure 1.
  • the operations are embodied in program code 160 stored on a non-transient computer readable medium 170 and executed by a processor 180.
  • the program code 160 executes the function of machine readable instructions.
  • the program code 160 executes to divide a set of index entries for a specific key value into a fixed number of partitions; lock index entries within a same partition together; and lock index entries in different partitions independently.
  • orthogonal key-value locking results in a lock request which may cover: (i) a gap (open interval) between two (actual) key values, (ii) a key value with its entire list of (actual and possible) row identifiers (in non-unique secondary indexes), (iii) a specific pair of unique key value and row identifier, or (iv) a distinct key value and a fraction of all (actual and possible) row identifiers.
  • Orthogonal key-value locking also offers the benefits of high concurrency (by using a small granularity of locking) and low overhead (by using a large granularity of locking).
  • the orthogonal key-value locking technique described herein is based on a strict division between low-level concurrency control (also known as latching), which ensures the physical integrity of the data structure, and high-level transactional coordination (locking), which protects the logical contents of database tables and indexes.
  • orthogonal key-value locking assumes a granularity of locking finer than pages or entire B-tree nodes.
  • Orthogonal key-value locking provides correctness in the form of two-phase locking.
  • An example implementation of orthogonal key-value locking offers simplicity, to enable ready understanding, efficient development, and thorough quality assurance.
  • An example implementation of orthogonal key-value locking also provides high concurrency, via a fine granularity of locking, as may be desirable for updates guided by searches in other indexes.
  • An example implementation of orthogonal key-value locking provides run time efficiency, via a coarse granularity of locking. This granularity may be particularly desirable for large operations, such as range queries and equality queries in non-unique indexes, including the queries (read-only search) used to determine a set of rows to update or delete.
  • Orthogonal key-value locking also enables a single lock that covers a key value and its entire list of row identifiers; concurrent locks on individual instances of non-unique keys; and independent locks for actual key values and the gaps therebetween.
  • Orthogonal key-value locking divides a set of index entries for a specific key value into a fixed number of partitions (e.g., k partitions). Methods for lock acquisition are modified to consider not just one, but multiple lock modes in a single lock manager invocation. For example, by using two lock modes in orthogonal key range locking, it is possible to lock the key value without locking the gap between key values and vice versa. Orthogonal key-value locking extends this idea to k+1 lock modes. One of the lock modes is for the gap between two distinct key values. The other k lock modes pertain to the k partitions in the list of index entries.
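  • The k+1 lock modes described above can be pictured as a vector of modes passed to the lock manager in a single call. A minimal sketch, assuming k = 4 partitions and the primitive modes "S" (shared), "X" (exclusive), and "N" (no lock):

```python
# Sketch only, not the patent's implementation. One lock-manager call
# carries k partition modes plus one mode for the gap to the next key.
K = 4

def make_request(partition_modes, gap_mode="N"):
    """Build one lock request: k partition modes plus one gap mode."""
    assert len(partition_modes) == K
    return tuple(partition_modes) + (gap_mode,)

# Lock an entire key value (all partitions), gap left free:
whole_key = make_request(["S"] * K)
# Lock a single partition, e.g., for an update of one index entry:
one_part = make_request(["N", "N", "X", "N"])
# Phantom protection: lock only the gap to the next key value:
gap_only = make_request(["N"] * K, gap_mode="S")
print(whole_key, one_part, gap_only)
```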
  • FIG. 2 is an example index record 200 within a non-unique secondary index.
  • the non-unique secondary index is for an employee table stored in a database.
  • Each index record contains a key value 210 (e.g., Gender), a count 220 of instances, and a corresponding set 230 of instances as a sorted list of bookmarks (e.g., primary key values).
  • the assignment from bookmark value to partition uses a simple "modulo 4" calculation. Index entries within the same partition (and for the same index key value) are locked together; index entries in different partitions can be locked independently.
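  • The "modulo 4" assignment can be sketched as follows (assumptions: bookmarks are integer primary-key values, and the number of partitions k is fixed at 4):

```python
# Sketch of the "modulo 4" partition assignment described above.
K = 4

def partition_of(bookmark: int) -> int:
    """Map a bookmark to one of the k fixed partitions."""
    return bookmark % K

bookmarks = [1, 2, 3, 5, 6, 9]   # instances under one distinct key value
for b in bookmarks:
    print(b, "->", partition_of(b))
# 1, 5, and 9 land in partition 1 and are locked together;
# 2 and 6 (partition 2) can be locked independently of them.
```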
  • lock modes may be defined such as "XN" or "NS" (pronounced "key exclusive, gap free" and "key free, gap shared"). They are formed from exclusive ("X"), shared ("S"), and no-lock ("N") modes on the key and on the gap. Additional primitive lock modes (e.g., update and increment locks) can readily be supported by orthogonal key-value locking.
  • the optimal value of k may depend at least in part on design parameters.
  • Example design parameters may include, but are not limited to the desired degree of concurrency and/or the number of hardware and software threads in the system.
  • lock requests with k+1 lock modes impose little extra complexity on the lock manager. Specifically, each request for a lock on a key value in an ordered (B-tree) index has to be tested in each of the k+1 components. On the other hand, in most implementations the cost of hash table lookup and of concurrency control (latching) in the lock manager likely dominates the cost of manipulating locks.
  • the locks may be identified (and located in the lock manager's hash table) by the index identifier and a distinct key value (e.g., Gender). A lock request is granted if the request is compatible with all locks already granted.
  • lock modes are considered to be “compatible” if the lock modes are compatible on a component-by-component basis. For example, multiple exclusive locks on the same key value, but on different components or partitions, are considered compatible. If the lock manager speeds up testing for compatibility by maintaining a "cumulative lock mode" (e.g., summarizing all granted locks for a specific index and key value), then this mode is computed and maintained on a component-by-component basis.
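  • The component-by-component compatibility test can be sketched as follows; a combined mode is represented here as a tuple, e.g., ("X", "N") for "XN" (key exclusive, gap free):

```python
# Minimal sketch of component-by-component compatibility. "N" (no lock)
# is compatible with everything; "S" conflicts only with "X".
COMPAT = {
    ("N", "N"): True,  ("N", "S"): True,  ("N", "X"): True,
    ("S", "N"): True,  ("S", "S"): True,  ("S", "X"): False,
    ("X", "N"): True,  ("X", "S"): False, ("X", "X"): False,
}

def compatible(granted, requested):
    """Grant the request iff every component pair is compatible."""
    return all(COMPAT[pair] for pair in zip(granted, requested))

# "XN" vs. "NS": the locks touch different components, so they coexist.
print(compatible(("X", "N"), ("N", "S")))   # True
# "XN" vs. "SN": both touch the key value; X and S conflict.
print(compatible(("X", "N"), ("S", "N")))   # False
```

The same test extends to k+1 components: two exclusive locks on different partitions of the same key value pass, because each component pair involves at least one "N".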
  • Releasing a lock usually involves re-computing the cumulative lock mode by scanning the list of all granted locks. Alternatively, multiple reference counts may be maintained for each lock. Releasing a lock on a single partition, however, may be simpler because such a lock is often an exclusive lock. In such cases, the cumulative lock mode may simply set the appropriate partition to "no lock.”
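  • Re-computing a cumulative mode for one component can be sketched as follows, assuming only the chain N < S < X (intention modes such as IS and IX would need a full supremum table rather than a total order):

```python
# Sketch: the cumulative mode of one component is the strongest mode
# among all locks still granted on that component.
ORDER = {"N": 0, "S": 1, "X": 2}

def cumulative(granted_modes):
    """Strongest granted mode on this component; 'N' if none remain."""
    return max(granted_modes, key=ORDER.__getitem__, default="N")

print(cumulative(["S", "N", "S"]))   # 'S'
print(cumulative([]))                # 'N' - last lock released, the
                                     # partition drops back to no-lock
```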
  • This additional lock component on the entire key value permits absolute lock modes (e.g., S, X), in addition to intention lock modes (e.g., IS, IX), and mixed modes (e.g., SIX).
  • a lock request for an individual partition includes an intention lock on the entire key value.
  • a lock request for all partitions at once locks the entire key value in an absolute mode, with no locks on individual partitions.
  • a mixed lock mode combines these two aspects.
  • FIG. 3 is a table 300 illustrating construction of example lock modes. In this illustration, both lock modes are on all partitions of a distinct key value and on gaps between such key values.
  • Column 310 lists the set of possible lock modes for the entire key value.
  • the remaining columns 320a-c and rows 330a-f show lock modes formed by pairing the entire key value and the gap to the next key value.
  • with this set of lock modes, the number of components in a lock request shrinks back from k+2 to k+1.
  • lock acquisition and release is as fast as traditional lock managers.
  • the locking technique locks index entries on the leading field (e.g., the student identifier), just as in a non-unique index.
  • queries may lock the entire enrollment information of one student with a single lock (locking all partitions in the student's list of course numbers).
  • Updates may lock subsets of course numbers (e.g., by locking only a single partition within the list of course numbers). With the number of partitions appropriately selected, the concurrency among updates is practically the same as with locks on individual course numbers.
  • this pattern of queries is fairly typical for many-to-many relationships. That is, a precise retrieval in one of the entity types (e.g., a specific student), an index that facilitates efficient navigation to the other entity type (e.g., on enrollment, with student identifier as leading key), and retrieval of all instances related to the initial instance.
  • Another example is also typical, listing all students for a single course using an index on course number of enrollment.
  • This example has to be well supported in a database system, its indexes, repertoire of query execution plans, etc.
  • B-tree indexes cluster relevant index entries due to their sort order; the proposed use of key-value locking supports these queries with the minimal number of locks and of lock requests.
  • a table's primary key is also the search key in the table's primary index.
  • the table's bookmark adds a system-generated value to the user-defined search key. This value can be unique by itself within the table or even the entire database. For example, some systems attach a unique, high-resolution time stamp to each row upon creation (often referred to as a "database key"). In other systems, the system-generated value is unique only for a specific table and a specific value of the user-defined search key. If a specific value of the user-defined search key happens to be unique, this system-generated "uniquifier" might be null or even entirely absent.
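  • The uniquifier idea can be sketched as follows (assumptions: a simple counter stands in for a high-resolution time stamp or database key; the helper name is hypothetical):

```python
# Sketch of a system-generated "uniquifier" suffix appended to a
# user-defined search key so that the full index key is unique.
import itertools

_next_suffix = itertools.count(1)

def make_index_key(user_key, key_is_unique=False):
    """Append a uniquifier only when the user-defined key may repeat."""
    if key_is_unique:
        return (user_key,)          # uniquifier entirely absent
    return (user_key, next(_next_suffix))

print(make_index_key("Joe"))        # ('Joe', 1)
print(make_index_key("Joe"))        # ('Joe', 2) - same user key, but
                                    # the full index key stays unique
```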
  • if the user-defined search key for the primary index is unique, orthogonal key-value locking in the primary index does not offer any further benefit from partitioning, and operates similarly to orthogonal key-range locking. But if the user-defined search key for the primary index is not unique, then partitioning is appropriate for the system-generated key suffix. In other words, the user-defined key value is locked like a distinct key value in a non-unique secondary index, and values of the system-defined key suffix are partitioned like bookmarks.
  • phantom protection may be implemented in both the primary indexes and the secondary indexes.
  • Orthogonal key-value locking (using a lock on the gap between distinct key values), protects such a transaction against insertion of phantom records without blocking any operations on the adjacent key values. In other words, insertion and deletion of rows with one of the adjacent user-defined key values remain unrestricted by the lock for phantom protection.
  • orthogonal key-value locking combines all the advantages of locking a distinct key value and all its index entries in a single lock manager invocation (i.e., low overhead for locking), and the advantages of locking individual index entries (i.e., high concurrency). Moreover, orthogonal key-value locking opens new opportunities for concurrency control in unique secondary indexes as well as primary indexes.
  • FIG. 4 is an illustration of an example primary index 400 with index records for the following case studies. The comparisons discussed below with reference to each of the case studies are based on the example employee information shown in the index 400. It is noted that for purposes of this illustration, data is shown in table format, having a primary index on its primary key (unique, not null) and a non-unique secondary index on one of the non-unique columns. The rows are shown sorted to simplify the illustration. It is noted, however, that the index records would actually be maintained in a data structure (e.g., a B-tree).
  • Figures 5a-f are illustrations showing locking of an example non-unique secondary index 500 for the example index records in the primary index 400 shown in Figure 4.
  • the following case studies discuss the locks and the enabled concurrency of each locking technique for record-level locking in ordered (e.g., B-tree) indexes. These comparisons are qualitative in nature but nonetheless serve to highlight the differences among the locking techniques. It is noted that actual performance and scalability for a particular implementation may depend at least in part on the workload.
  • Figure 5a corresponds to a first example case study illustrating "phantom protection.” Phantom protection may be implemented for unsuccessful queries or searches for non-existing key values. In other words, this example illustrates techniques that lock the absence of key values.
  • This example assumes a serializable transaction, wherein a lock is used for phantom protection until end-of-transaction.
  • Index records are maintained in a non-unique secondary index 500 on FirstName 520 (corresponding to FirstName 420 in the primary index 400 shown in Figure 4).
  • distinct key values are paired with a list of bookmarks.
  • Unique search keys in the primary index (400 in Figure 4) serve as bookmarks in the secondary index 500 shown in Figure 5.
  • the bookmarks are also the primary key of the secondary index 500.
  • orthogonal key-value locking provides phantom protection with the least restrictive lock scope.
  • Figure 5b corresponds to a second example case study illustrating "successful equality queries.”
  • Successful equality queries are successful index searches for a single key value. Note that this case study may occur both in selection queries and in index nested loops join operations.
  • orthogonal key-value locking acquires a single lock 551 covering all actual and possible index entries with FirstName Joe. Both adjacent key values remain unlocked (i.e., Gary and Larry).
  • Figure 5c corresponds to a third example case study illustrating "successful range queries.”
  • a successful range query is specified as "...where FirstName between 'Gary' and 'Larry'.”
  • the query result includes all instances of Gary and Larry. Locking for serializability prevents insertion of additional instances of these key values.
  • the issues to compare among the locking schemes are the number of lock requests and the precision of locking to no more than the subject key range.
  • orthogonal key-value locking acquires three locks 552a-c on the three distinct key values within the range of the query predicate (i.e., FirstName Gary, Joe, and Larry). For these, all actual and possible EmpNo values are locked. In addition, the two gaps between these key values are locked with appropriate lock modes in two of the lock requests (i.e., S mode for all partitions as well as the gap to the next higher key value). For the highest key value within the range of the query predicate (i.e., FirstName Larry), the lock request leaves the gap to the next higher key value free (i.e., no lock).
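  • The lock set above can be sketched as follows (assumptions: the distinct key values within the range are already known; a mode is a (key, gap) pair built from "S" and "N"):

```python
# Sketch of the locks for "...where FirstName between 'Gary' and
# 'Larry'": every key value in range is locked in S mode on all
# partitions; the gap is locked too, except above the highest key.
def range_query_locks(keys_in_range):
    locks = {}
    for i, key in enumerate(keys_in_range):
        highest = (i == len(keys_in_range) - 1)
        locks[key] = ("S", "N") if highest else ("S", "S")
    return locks

print(range_query_locks(["Gary", "Joe", "Larry"]))
# {'Gary': ('S', 'S'), 'Joe': ('S', 'S'), 'Larry': ('S', 'N')}
```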
  • Figure 5d corresponds to a fourth example case study illustrating "non- key updates.”
  • Non-key updates are updates that modify non-key columns. In the primary index of the running example, such an update modifies any column other than the primary key.
  • the secondary index 500 may be implemented with a small definition change to make such an update possible, namely a column added to each index entry using the "include" clause supported by some systems. This places the included column behind the bookmark, making it irrelevant to the sort order and the index organization. A non-key update of this extended secondary index modifies such an included column.
  • each element in the list carries an instance of the included columns.
  • orthogonal key-value locking locks the distinct key value Joe but only a single partition within the list of instances, as illustrated by lock 550d.
  • this is almost always a single instance.
  • locking a partition may mean locking many instances even if only one instance is needed. Larger choices of k may alleviate hash conflicts and false sharing. The gaps above and below FirstName Joe remain unlocked.
  • Figure 5e corresponds to a fifth example case study illustrating "deletions.”
  • Deletions include deletion of rows in a table and/or deletions of index entries in a B-tree.
  • any update of a key column usually is modeled as a pair of deletion and insertion.
  • in non-unique secondary indexes, one bit is needed for each index entry to indicate its ghost status.
  • a bit is needed per element in the list.
  • a key value becomes itself a ghost when its list is empty or when all remaining entries are ghosts.
  • ghosts can be removed outside of user transactions using system transactions or top-level actions. Ghost removal does not use transactional locks; latches coordinate threads and protect data structures; log records describe ghost removal if subsequent user transactions log their actions with references to slot numbers within pages rather than key values; and system transactions or top-level actions dispense with a log flush upon commit.
  • Deletion via a ghost is a standard implementation method. Among other advantages, it ensures fast and reliable transaction rollback if a user transaction aborts or fails. Specifically, rollback of a user transaction does not use space allocation, for example a split of a B-tree node. Deletion via a ghost record uses the same locks as a non-key update, discussed earlier. The remainder of this subsection pertains to deletion without ghosts, whether or not this is the recommended implementation technique. If nothing else, it offers a level comparison, where the commit-duration X lock on the next key indicates that immediate record removal is assumed (i.e., record deletion without ghost records).
  • orthogonal key-value locking works best with ghost records, both during insertion and deletion.
  • cases (i) and (ii) above acquire and hold locks 550e, but the next key value itself remains unlocked.
  • locking follows the rules for non-key updates.
  • Figure 5f corresponds to a sixth example case study illustrating "insertions." Insertions include inserting new rows into a table, and thus inserting new index entries in primary and secondary indexes. For purposes of illustration, a command may be specified as "insert... values (4, 'Joe', 54546, 4499)".
  • orthogonal key-value locking works best with a ghost record 550f left by a prior deletion or by a system transaction. Once the ghost record 550f is in place, orthogonal key-value locking locks the appropriate partition within the list of bookmarks associated with the distinct key value, exactly as described earlier for updates on non-key attributes. If insertion via a ghost record is deemed undesirable for some reason, insertion of a new distinct key value uses a lock on the gap below the next higher key value. This lock is for phantom protection and may be held only briefly. Thereafter, a lock on the new distinct key value and the appropriate partition are used, with no lock on the other partitions or on the gaps above or below the new distinct key value. If the insertion merely adds another instance of an existing key value, a lock on the appropriate partition suffices.
  • FIG. 6 is a diagrammatic illustration of example operations which may implement orthogonal key-value locking.
  • Operations 600 may be embodied as logic instructions on one or more computer-readable medium. When executed on a processor, the logic instructions cause a general purpose computing device to be programmed as a special-purpose machine that implements the described operations.
  • the components and connections depicted in the figures may be used.
  • operation 610 includes selecting a lock 620.
  • the lock may be configured to protect at least one of the following in a data structure: (621) a key value and an adjoining gap to a neighboring key value; (622) a gap between key values, without locking the key value; (623) a key value without locking any adjoining gaps; (624) a partition of possible instances of a key value; and (625) multiple partitions of possible instances of a key value.
  • operation 630 includes maintaining concurrency among transactions on the data structure during access of the key value.
  • Further operations may also be implemented.
  • other operations include combining locks on partitions with locks on the gap between key values.
  • Other example operations include setting a specific fixed set of partitions for each separate key value.
  • Still other example operations may include resetting the specific fixed set of partitions when the key value is unlocked.
  • both the fixed set of partitions and partitioning rule may be reset.
  • resetting is by range partitioning or hash partitioning.
  • the operations may also support multiple lock modes.
  • the techniques provide support for k+1 lock modes in a single lock manager invocation.
  • one of the k+1 lock modes may be for a gap between two distinct key values.
  • the other k lock modes may correspond to k partitions in a list of index entries.
  • Further operations may include support for various index operations (e.g., those discussed above for the case studies illustrated in Figures 5a-f).
  • Index operations may include, but are not limited to: searching for non-existing key values; successful index searches for a single key value; successful range queries; modifying non-key columns; deleting rows in a table and/or index entries in a B-tree; and inserting new rows into a table, and thus new index entries into primary and secondary indexes of a B-tree.
  • orthogonal key-value locking is not limited in scope. Indeed, the orthogonal key-value locking techniques may be adapted to work with other locking techniques.
  • the index management (IM) technique locks logical rows of a table. By locking a bookmark, a transaction locks a record in the primary data structure, one record in each secondary index, plus a gap (to the next lower key value) in each secondary index. While very good for single-row insertion and deletion, the design suffers from too many locks in non-unique secondary indexes and from too much "footprint.” For example, for phantom protection, one might want to lock a gap between two index entries in some secondary index. But the only lock scope available also locks the key value above the gap as well as the logical row it pertains to (i.e., including index entries in other indexes as well as gaps between index entries in other indexes).
  • Orthogonal key-value locking techniques offer: (i) the separation of key value and gap, (ii) partitioning within the list of bookmarks, (iii) a hierarchy consisting of the entire key value and the partitions, with lock modes that reduce the number of lock manager invocations.
  • Orthogonal row locking may be implemented by separating the logical row from the gaps between key values.
  • a lock on a logical row has two modes, one for the row and its index entries and another one for the gaps in all the indexes. This permits, for example, one transaction locking a row merely for phantom protection in one of the indexes while another transaction modifies the row including some of its index entries, but excluding all index keys. In other words, this update may modify non-key fields including the ghost bits, thus logically deleting the row and all its index entries. Combined lock modes such as this may help reduce the number of lock manager invocations.
  • Orthogonal key-value locking may also be implemented with a list of locks for indexes and their gaps (i.e., a lock request for a row would specify the indexes in which gaps ought to be locked). This permits, for example, phantom protection with a locked gap in a single index only. In a way, this idea introduces key-range locking to the index management (IM) technique. If the set of possible indexes is very large, the size of this list can be limited by partitioning the set of possible indexes such that locks cover a partition rather than a specific index (and the gaps therein).
  • Orthogonal key-value locking may also be implemented to define a hierarchy of locking scopes.
  • the bottom of the hierarchy may be the set of partitions, and the top of the hierarchy may be the row and all its index entries (i.e., all partitions).
  • This adaptation enables lock acquisition with the scope and speed of the original index management (IM) technique, as well as locks with much reduced footprint with little additional overhead.
  • a single lock request for a logical row can lock merely the gap between two index entries within a single index. Nonetheless, if desired, a single lock request can also lock a logical row and all its index entries, optionally with all gaps in all indexes or with some or none of the gaps. Locking all gaps emulates the original index management (IM) technique, locking none seems more appropriate for single-row updates, and locking gaps in a subset of the indexes only seems more appropriate for phantom protection.
  • Orthogonal key-value locking may also be implemented by applying partitioning to the index entries. For that, a lock for a logical row is augmented with a list of indexes in which index entries are locked. Thus, one transaction might update the row in the primary data structure as well as some affected indexes while another transaction still searches other indexes of the same table, including index entries for the row being updated (in indexes not affected by the update).
  • query optimization may hold an SN lock (pronounced 'schema shared, data free' here), whereas dropping an index uses an XX lock (pronounced 'schema exclusive, data exclusive' here).
  • Intention lock modes are supported, at least for the data.
  • Another application of orthogonal key-value locking may be to divide the data lock into individual data structures (e.g., a table's set of indexes).
  • a single lock request (with a list of lock modes) may lock individual indexes.
  • a table typically has only a few indexes, but it can have dozens.
  • each index may have its own entry in a list of lock modes provided with each lock request for a table.
  • the set of indexes might be partitioned using a hash function to map index identifiers to positions in the lock list. The most immediate benefit may be to reduce the number of lock requests.
  • a transaction may instead lock the table in IX mode and, within the same lock manager request, acquire IX locks on two indexes (or actually, on two partitions within the set of indexes, with each partition probably containing only a single index).
  • Another application of orthogonal key-value locking may divide the schema lock into individual components of a schema (e.g., column set or individual columns, index set or individual indexes, integrity constraints, partitions of a table or indexes, histograms, etc.).
  • Sets of schema entries with individual locks may improve concurrency among changes in the data definition or concurrency of schema changes (DDL) with query and transaction processing (DML). For example, one transaction's query execution may join two secondary indexes while another transaction adds a new index.
  • online index creation no longer uses a "schema modify" lock for the entire table when it first registers a new index and when it eventually releases the completed index for unrestricted use.
  • This application of orthogonal key-value locking may improve the online behavior of DDL with minimal impact on active users and applications.
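The per-index mode list described in the items above can be sketched as follows. This is a hedged illustration rather than the patent's implementation; the slot count, the modulo mapping, and all names are assumptions introduced here.

```python
# Hypothetical sketch of a table lock whose single request carries a mode
# for the table plus one mode per partition ("slot") of the table's index
# set. Slot count and modulo mapping are assumptions for illustration.
NUM_SLOTS = 8  # partitions of the set of possible indexes

def index_slot(index_id: int, slots: int = NUM_SLOTS) -> int:
    """Hash an index identifier to a position in the per-table lock list."""
    return index_id % slots

def table_lock_request(table_mode: str, index_modes: dict) -> tuple:
    """One lock manager request: a table mode plus a mode per index slot."""
    slots = ["N"] * NUM_SLOTS  # "N" = no lock
    for index_id, mode in index_modes.items():
        slots[index_slot(index_id)] = mode
    return (table_mode, tuple(slots))

# An update touching indexes 3 and 5: IX on the table and IX on two
# slots, all acquired within the same lock manager invocation.
request = table_lock_request("IX", {3: "IX", 5: "IX"})
assert request == ("IX", ("N", "N", "N", "IX", "N", "IX", "N", "N"))
```

With a small slot count, each partition will usually contain only a single index, so per-slot locks behave much like per-index locks while keeping the request size fixed.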

Abstract

Orthogonal key-value locking is disclosed. An example method includes selecting a lock configured to protect at least one of the following in a data structure: a key value and an adjoining gap to a neighboring key value; a gap between key values, without locking the key value; the key value, without locking any adjoining gaps; a partition of possible instances of the key value; and multiple partitions of possible instances of the key value. The method also includes maintaining concurrency among transactions on the data structure during access of the key value.

Description

ORTHOGONAL KEY-VALUE LOCKING
BACKGROUND
[0001] Ordered indexes are often used in computer science to enable data access, search, updates, insertions, and deletions in large data stores, such as databases. Concurrency control is needed to coordinate access to a single index by multiple users, multiple applications, and/or multiple transactions.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Figure 1 is a diagrammatic illustration of an example system which may implement orthogonal key-value locking.
[0003] Figure 2 is an example index record within a non-unique secondary index.
[0004] Figure 3 is a table illustrating construction of example lock modes.
[0005] Figure 4 is an illustration of an example primary index with index records.
[0006] Figures 5a-f are illustrations of locking an example non-unique secondary index for the example index records in the primary index shown in Figure 4.
[0007] Figure 6 is a diagrammatic illustration of example operations which may implement orthogonal key-value locking.
DETAILED DESCRIPTION
[0008] Systems and methods of orthogonal key-value locking are disclosed. "Orthogonality" of locking a key value (versus locking a gap between neighboring key values) includes the ability to lock an entire key value, yet this key value only. Locking a key value in its entirety (all actual and possible instances) and/or in partitions, enables the use of independent locks on different key values, and thus concurrency among transactions accessing different partitions within a list of items associated with a specific key value.
[0009] Before continuing, it should be noted that as used herein, the terms "includes" and "including" mean, but are not limited to, "includes" or "including" and "includes at least" or "including at least." The term "based on" means "based on" and "based at least in part on."
[0010] Figure 1 is a diagrammatic illustration of an example system which may implement orthogonal key-value locking. In an example, orthogonal key-value locking may be implemented for ordered indexes, such as B-trees 100. Of course, orthogonal key-value locking is not limited to use with B-trees. But B-trees are a good example of one type of data structure which may implement orthogonal key-value locking.
[0011] B-trees 100 offer several advantages over other types of data structures, for example, when the time to access the data from a node exceeds the data processing time for that data. Examples include, but are not limited to, when the node data resides in secondary storage, such as database systems and file systems for disk drives and/or other non-volatile memory. By design, the B-tree 100 structure maintains keys in sorted order for sequential traversing. B-trees 100 use a hierarchical index to minimize the number of disk reads, and use partially full blocks to speed insertions and deletions.
[0012] It is noted that the term "B-tree" is used herein to refer to a specific data structure. The term "B-tree" is also used herein in a more general sense, to include variations thereon, such as but not limited to a B+-tree and a B*-tree, and/or any other data structures now known or later developed which can benefit from the teachings herein.
[0013] Again, while B-trees are discussed herein for purposes of illustration, in other examples orthogonal key-value locking may be implemented with other types of ordered indexes.
[0014] In general, a B-tree 100 is made up of a plurality of nodes. The internal (or non-leaf) nodes of a B-tree 100 can have a variable number of child nodes (within a pre-defined range). When data is inserted or removed from a node, the number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split.
[0015] The lower and upper bounds on the number of child nodes may be fixed for a particular implementation. For example, in a 2-3 B-tree 100 (such as the example illustrated in Figure 1), each internal node 110 may have 2 or 3 child nodes 120a-c. The number of child nodes 120a-c (or "branches") from an internal node 110 is one more than the number of keys 115a-b stored in the node 110. For example, the internal nodes 110 of a 2-3 B-tree 100 may store either one key with two child nodes, or two keys with three child nodes (as illustrated in Figure 1). It should be noted that the illustration of a 2-3 B-tree 100 in Figure 1 is only described herein for purposes of a simplified illustration. Most computer systems implement more sophisticated B-tree(s), but the basic principles are the same.
[0016] As mentioned above, each internal node 110 of the B-tree 100 includes a number of keys 115a-b. The keys 115a-b are defined as separation values which divide child nodes 120a-c (or "subtrees") of the B-tree 100. The number of keys 115a-b may be selected to vary between d and 2d, where d is the minimum number of keys, and d+1 is the minimum degree or branching factor of the tree.
[0017] It is noted that a factor of 2 allows nodes to be split or combined, although this is not necessary in all B-tree designs. For example, if an internal node 110 has 2d keys, then adding a key 115a-b to that node 110 can be accomplished by splitting the 2d key node 110 into two d key nodes (not shown) and adding the key to the parent node. Each split node (not shown) has the minimum number of keys. Similarly, if an internal node 110 and its neighbor each have d keys 115a-b, then a key may be deleted from the internal node 110 by combining with its neighbor (not shown). Deleting the key 115a-b would make the internal node 110 have d-1 keys; joining the neighbor (not shown) would add d keys plus one more key from the neighbor's parent (not shown). The result is an entirely full node of 2d keys.
[0018] Again with reference to the 2-3 B-tree illustration shown in Figure 1, if an internal node 110 has 3 child nodes 120a-c (or subtrees) then it will have 2 keys (e.g., Key 1 and Key 2). All values (e.g., 1, 2, 3, . . . 6) in subtree 120a are less than the value of Key 1 (e.g., 7); all values (e.g., 8, 9, . . . 14) in subtree 120b are between the value of Key 1 (e.g., 7) and the value of Key 2 (e.g., 15); and all values (e.g., 16 . . . 20) in subtree 120c are greater than the value of Key 2 (e.g., 15).
[0019] Many data stores support indexes on multiple attributes of data items stored in a B-tree data structure. In databases, for example, these are known as secondary indexes or as non-clustered indexes. Most relational database management systems permit dozens of secondary indexes for each table in the database, even if a smaller number of secondary indexes per table is much more common.
[0020] Key values in secondary indexes may or may not be unique. In other words, there may be many, even thousands, of rows with the same value in an indexed attribute. In those cases, compression is often useful (e.g., bitmaps instead of lists of row identifiers). However, the representation choices are independent from transactional concurrency control.
[0021] B-tree locking may be implemented to control concurrent searches and updates in B-trees. These locking techniques separate transactions in order to protect the B-tree contents during node access (e.g., insertions/deletions).
[0022] Record-level locking for ordered indexes (such as B-trees) has become ubiquitous over the last 20 years. There are multiple locking techniques, each offering a specific tradeoff between: (i) high concurrency and a fine granularity of locking during updates, (ii) efficient coarse locks during equality and range queries, (iii) run time efficiency with the fewest possible invocations of lock manager methods, and (iv) conceptual simplicity for efficient development and quality assurance.
[0023] These tradeoffs leave B-tree designers with difficult choices to make when deciding which locking technique to implement in a particular database index. That is, each locking technique suffers from limited concurrency and/or excessive overhead (i.e., too many locks and lock manager invocations).
[0024] The locking technique disclosed herein permits covering a distinct key value and all its index entries with a single lock acquisition, and at other times permits high concurrency among updates and transactions. The technique provides benefits of orthogonal key-range locking (e.g., complete separation of lock modes for key value and gap), and of key-value locking (e.g., a single lock is used for a distinct key value and all its instances). As such, the technique is referred to herein as "orthogonal key-value locking."
[0025] Orthogonal key-value locking may be implemented for B-trees, such as may be used in file systems (not shown), or in a database management system 150, as illustrated in Figure 1. In an example, the operations are embodied in program code 160 stored on a non-transient computer readable medium 170 and executed by a processor 180. The program code 160 executes the function of machine readable instructions. In an example operation, the program code 160 executes to divide a set of index entries for a specific key value into a fixed number of partitions; lock index entries within a same partition together; and lock index entries in different partitions independently.
[0026] In an example, orthogonal key-value locking results in a lock request which may cover: (i) a gap (open interval) between two (actual) key values, (ii) a key value with its entire list of (actual and possible) row identifiers (in non-unique secondary indexes), (iii) a specific pair of unique key value and row identifier, or (iv) a distinct key value and a fraction of all (actual and possible) row identifiers. Orthogonal key-value locking also offers the benefits of high concurrency (by using a small granularity of locking) and low overhead (by using a large granularity of locking).
[0027] Early work on locking in B-tree data structures focused on protecting the physical data structure. However, protection of the physical data structure is not the focus of this disclosure. Instead, orthogonal key-value locking is directed to locks in ordered indexes (e.g., a B-tree), the effect of lock scopes on the number of locks, and on the degree of concurrency enabled. These topics are orthogonal to the data structures and organization of the lock manager. For example, locks may be managed within a single traditional lock manager (and its hash table), or the locks may be attached to data pages in the buffer pool. Both data structure organizations are compatible with any of a variety of lock types.
[0028] It is noted that the orthogonal key-value locking technique described herein is based on a strict division between low-level concurrency control (also known as latching), which ensures the physical integrity of the data structure, and high-level transactional coordination, which protects the logical contents of database tables and indexes. Moreover, orthogonal key-value locking assumes that locking is selected with a granularity of locking finer than pages or entire B-tree nodes.
[0029] Orthogonal key-value locking provides correctness in the form of two-phase locking. An example implementation of orthogonal key-value locking offers simplicity, to enable ready understanding, efficient development, and thorough quality assurance. An example implementation of orthogonal key-value locking also provides high concurrency, via a fine granularity of locking, as may be desirable for updates guided by searches in other indexes. An example implementation of orthogonal key-value locking provides run time efficiency, via a coarse granularity of locking. This granularity may be particularly desirable for large operations, such as range queries and equality queries in non-unique indexes, including the queries (read-only search) used to determine a set of rows to update or delete. This granularity may also be desirable for queries (index search) in index nested loops join operations. It is noted that all transactions benefit from a minimal number of lock manager invocations, as well as the fewest and earliest lock acquisition failures in cases of contention.
[0030] Orthogonal key-value locking also enables a single lock that covers a key value and its entire list of row identifiers; concurrent locks on individual instances of non-unique keys; and independent locks for actual key values and the gaps therebetween.
[0031] Before continuing, it should be noted that the components shown in Figure 1 are provided only for purposes of illustration of an example execution environment, and are not intended to limit implementation of orthogonal key-value locking to any particular execution environment.
[0032] Orthogonal key-value locking divides a set of index entries for a specific key value into a fixed number of partitions (e.g., k partitions). Methods for lock acquisition are modified to consider not just one, but multiple lock modes in a single lock manager invocation. For example, by using two lock modes in orthogonal key-range locking, it is possible to lock the key value without locking the gap between key values and vice versa. Orthogonal key-value locking extends this idea to k+1 lock modes. One of the lock modes is for the gap between two distinct key values. The other k lock modes pertain to the k partitions in the list of index entries. In a lock acquisition, it is possible to request the mode "no lock" for any of the k partitions or for the gap to the next key value. An example implementation of orthogonal key-value locking may be described in the context of a non-unique secondary index, e.g., as illustrated with reference to Figure 2.
[0033] Figure 2 is an example index record 200 within a non-unique secondary index. For purposes of this illustration, the non-unique secondary index is for an employee table stored in a database. Each index record contains a key value 210 (e.g., Gender), a count 220 of instances, and a corresponding set 230 of instances as a sorted list of bookmarks (e.g., primary key values). The list may be partitioned (e.g., k=4 partitions 231-234). In this example, the assignment from bookmark value to partition uses a simple "modulo 4" calculation. Index entries within the same partition (and for the same index key value) are locked together; index entries in different partitions can be locked independently.
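The "modulo 4" assignment and per-partition locking just described can be sketched as follows. This is an illustrative sketch rather than code from the patent, and the mode letters anticipate the S (shared), X (exclusive), and N (no-lock) conventions used later in this description.

```python
# Illustrative sketch (an assumption, not code from the patent): map each
# bookmark to one of k partitions and build a (k+1)-component lock request,
# whose final component covers the gap to the next distinct key value.
K = 4  # partitions per distinct key value, as in the "modulo 4" example

def partition_of(bookmark: int, k: int = K) -> int:
    """Assign a bookmark (e.g., a primary key value) to a partition."""
    return bookmark % k

def lock_request(partition_modes, gap_mode):
    """A request lists one mode per partition plus one mode for the gap."""
    assert len(partition_modes) == K
    return "".join(partition_modes) + gap_mode

# Bookmarks 2 and 6 fall in the same partition and are locked together;
# bookmarks 2 and 5 fall in different partitions and lock independently.
assert partition_of(2) == partition_of(6) == 2
assert partition_of(5) == 1
assert lock_request(["N"] * 4, "S") == "NNNNS"  # gap only (phantom protection)
```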
[0034] It makes little difference whether the lock acquisition method lists two lock modes or k+1 lock modes, and whether these lock modes are listed individually (e.g., in an array) or new lock modes are defined as combinations of primitive lock modes. For example, lock modes may be defined such as "XN" or "NS" (pronounced "key exclusive, gap free" and "key free, gap shared"). They are formed from exclusive ("X"), shared ("S"), and no-lock ("N") modes on key and gap. Additional primitive lock modes (e.g., update and increment locks) can readily be implemented by orthogonal key-value locking.
[0035] Each lock request in orthogonal key-value locking lists k+1 modes. In order to lock only the gap between key values for an implementation with k=4 partitions per list, the requested mode might be NNNNS (i.e., no lock on any of the partitions plus a shared lock on the gap to the next key value). A request may lock any subset of these partitions, and typically locks either one partition or all partitions. For example, a query with an equality predicate on the non-unique index key locks all partitions with a single method invocation (i.e., all actual and possible row identifiers) by requesting a lock in mode SSSSN for k=4 partitions. No lock is acquired on the gap following the key value. An insertion or a deletion locks only one partition for one key value in that index (e.g., NXNNN to lock partition 1 among k=4 partitions). In this way, multiple transactions may proceed concurrently with insertions and deletions for the same key value, each with its own partition locked, and without conflict.
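A minimal sketch of how such (k+1)-mode requests might be tested for compatibility, assuming the standard shared/exclusive/no-lock rules; the table and function names are assumptions introduced for illustration.

```python
# Hypothetical sketch: component-by-component compatibility of two
# (k+1)-mode lock requests. Mode letters follow the text: S (shared),
# X (exclusive), N (no lock).
COMPATIBLE = {
    ("N", "N"): True,  ("N", "S"): True,  ("N", "X"): True,
    ("S", "N"): True,  ("S", "S"): True,  ("S", "X"): False,
    ("X", "N"): True,  ("X", "S"): False, ("X", "X"): False,
}

def compatible(granted: str, requested: str) -> bool:
    """Two requests conflict only if they conflict in some component."""
    return all(COMPATIBLE[(g, r)] for g, r in zip(granted, requested))

# Two insertions into different partitions of the same key value proceed
# concurrently; an equality query conflicts with either insertion.
assert compatible("NXNNN", "NNXNN")      # different partitions: compatible
assert not compatible("SSSSN", "NXNNN")  # query vs. insertion: conflict
assert compatible("NNNNS", "NXNNN")      # gap lock vs. partition insert: ok
```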
[0036] Individual entries within a list are assigned to specific partitions using a hash function. The hash function is applied to the unique identifier of the index entry, excluding the key value. In a non-unique secondary index, the bookmarks (pointing to records in a primary data structure) serve as input into this hash function. By using locks on individual partitions, concurrent transactions may modify different entries at the same time. Yet when a query uses the entire set of entries for a search key, the entire set can be locked in a single invocation of the lock manager.
[0037] In an example, values for k may include k=1 (for unique indexes) and k=3 to k=100 (for non-unique indexes). The optimal value of k may depend at least in part on design parameters. Example design parameters may include, but are not limited to the desired degree of concurrency and/or the number of hardware and software threads in the system.
[0038] It is noted that lock requests with k+1 lock modes impose little extra complexity on the lock manager. Specifically, each request for a lock on a key value in an ordered (B-tree) index has to be tested in each of the k+1 components. On the other hand, in most implementations the cost of hash table lookup and of concurrency control (latching) in the lock manager likely dominates the cost of manipulating locks.
[0039] With reference again to Figure 2, the locks may be identified (and located in the lock manager's hash table) by the index identifier and a distinct key value (e.g., Gender). A lock request is granted if the request is compatible with all locks already granted.
[0040] It is noted that lock modes are considered to be "compatible" if they are compatible on a component-by-component basis. For example, multiple exclusive locks on the same key value, but on different components or partitions, are considered compatible. If the lock manager speeds up testing for compatibility by maintaining a "cumulative lock mode" (e.g., summarizing all granted locks for a specific index and key value), then this mode is computed and maintained on a component-by-component basis.
[0041] Releasing a lock usually involves re-computing the cumulative lock mode by scanning the list of all granted locks. Alternatively, multiple reference counts may be maintained for each lock. Releasing a lock on a single partition, however, may be simpler because such a lock is often an exclusive lock. In such cases, the cumulative lock mode may simply set the appropriate partition to "no lock."
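For the simple S/X/N modes, the component-wise summary described above can be sketched as taking the strongest granted mode per component. Richer mode sets (e.g., increment or update locks) would need a proper per-mode supremum, so this is only an illustrative simplification, not the patent's implementation.

```python
# Hedged sketch: maintain a "cumulative lock mode" by summarizing all
# granted (k+1)-mode requests component by component. For S/X/N, the
# stronger of two modes (N < S < X) suffices as the summary.
STRENGTH = {"N": 0, "S": 1, "X": 2}

def stronger(a: str, b: str) -> str:
    return a if STRENGTH[a] >= STRENGTH[b] else b

def cumulative(granted_requests):
    """Component-wise summary of all granted (k+1)-mode requests."""
    modes = list(granted_requests[0])
    for request in granted_requests[1:]:
        modes = [stronger(m, r) for m, r in zip(modes, request)]
    return "".join(modes)

# Releasing a lock re-computes the summary from the remaining requests;
# releasing an exclusive per-partition lock simply clears that component.
granted = ["NXNNN", "NNXNN", "NNNNS"]
assert cumulative(granted) == "NXXNS"
granted.remove("NNXNN")
assert cumulative(granted) == "NXNNS"
```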
[0042] For small values of k (e.g., k=3, or even k=1 for unique indexes), checking each component is efficient. For large values of k (e.g., k=31 or k=100), and in particular if equality queries frequently lock all partitions at once, it may be more efficient to reserve one component for the entire key value (i.e., all partitions). In a sense, this change re-introduces a lock on the distinct key value. This change also has aspects of locking all actual and possible instances of a key value, and of separate lock modes for key value and gap.
[0043] With the additional lock on an entire key value, the number of components in the lock increases from k+1 to 1+k+1 (=k+2) lock modes for each key value in the index. This additional lock component on the entire key value permits absolute lock modes (e.g., S, X), in addition to intention lock modes (e.g., IS, IX), and mixed modes (e.g., SIX). A lock request for an individual partition includes an intention lock on the entire key value. A lock request for all partitions at once locks the entire key value in an absolute mode, with no locks on individual partitions. A mixed lock mode combines these two aspects. That is, a SIX lock on a key value combines an S lock on all possible instances of that key value, and the right to acquire X locks on individual partitions (of bookmarks associated with the key value).
[0044] Figure 3 is a table 300 illustrating construction of example lock modes. In this illustration, both lock modes are on all partitions of a distinct key value and on gaps between such key values. Column 310 lists the set of possible lock modes for the entire key value. The remaining columns 320a-c and rows 330a-f show lock modes formed by pairing the entire key value and the gap to the next key value. With this set of lock modes, the number of components in a lock request shrinks back from k+2 to k+1. Moreover, when a transaction needs to lock an entire key value (e.g., for a query with an equality predicate), lock acquisition and release is as fast as traditional lock managers.
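The absolute, intention, and mixed modes above follow the classical multi-granularity locking rules; the compatibility matrix below is that standard matrix, shown here as a sketch for the extra "entire key value" component rather than as the patent's own code.

```python
# Standard multi-granularity lock compatibility (an illustration applied
# here to the "entire key value" component): absolute modes S and X,
# intention modes IS and IX, and the mixed mode SIX.
COMPAT = {
    ("IS", "IS"): True,   ("IS", "IX"): True,   ("IS", "S"): True,
    ("IS", "SIX"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,   ("IX", "IX"): True,   ("IX", "S"): False,
    ("IX", "SIX"): False, ("IX", "X"): False,
    ("S", "IS"): True,    ("S", "IX"): False,   ("S", "S"): True,
    ("S", "SIX"): False,  ("S", "X"): False,
    ("SIX", "IS"): True,  ("SIX", "IX"): False, ("SIX", "S"): False,
    ("SIX", "SIX"): False, ("SIX", "X"): False,
    ("X", "IS"): False,   ("X", "IX"): False,   ("X", "S"): False,
    ("X", "SIX"): False,  ("X", "X"): False,
}

# A transaction locking one partition in X takes IX on the key value;
# a query locking all partitions takes S on the key value. The conflict
# surfaces at the key-value component, before any partition is examined.
assert COMPAT[("IX", "IX")]     # two single-partition updates coexist
assert not COMPAT[("S", "IX")]  # equality query blocks a partition update
assert COMPAT[("SIX", "IS")]    # SIX still admits intention-shared readers
```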
[0045] Another optimization pertains to unique indexes locked with k=1 (i.e., a single partition). In this case, it is sufficient to rely entirely on the top portion of table 300, rendering the one and only per-partition lock moot. For unique indexes, the default is to use a single partition (k=1) to match the behavior of orthogonal key-range locking.
[0046] There is a locking technique specifically for multi-column unique keys. For example, it is possible to lock some prefix of the key (e.g., the key in a non-unique index), and to lock the remaining columns starting with the suffix of the unique key (like the bookmarks for non-unique indexes).
[0047] To illustrate, consider two entity types and their tables (e.g., called "student" and "course"), and the many-to-many relationship with its own table (e.g., called "enrollment"). An index on enrollment may be unique on the combination of student identifier and course number, which suggests orthogonal key-value locking with a single partition (k=1).
[0048] In this illustration, the locking technique locks index entries on the leading field (e.g., the student identifier), just as in a non-unique index. But here, queries may lock the entire enrollment information of one student with a single lock (locking all partitions in the student's list of course numbers). Updates may lock subsets of course numbers (e.g., by locking only a single partition within the list of course numbers). With the number of partitions appropriately selected, the concurrency among updates is practically the same as with locks on individual course numbers.
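As a hypothetical sketch of this enrollment example (the partition count and the sample values are assumptions): the student identifier plays the role of the distinct key value, and course numbers are hashed to partitions like bookmarks.

```python
# Hypothetical illustration of the enrollment example: the student
# identifier acts as the "distinct key value" and course numbers are
# partitioned like bookmarks. K=4 and the values below are assumptions.
K = 4

def course_partition(course_no: int) -> int:
    """Hash a course number (the key suffix) to a partition."""
    return course_no % K

def whole_student_lock() -> str:
    """Query: lock all partitions of one student's list of courses."""
    return "S" * K + "N"  # all partitions shared, gap unlocked

def single_enrollment_lock(course_no: int) -> str:
    """Update: lock only the partition holding one course number."""
    modes = ["N"] * K
    modes[course_partition(course_no)] = "X"
    return "".join(modes) + "N"

# One lock covers a student's entire enrollment; an update locks only
# a single partition within the same student's list of course numbers.
assert whole_student_lock() == "SSSSN"
assert single_enrollment_lock(210) == "NNXNN"
```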
[0049] It is noted that this pattern of queries is fairly typical for many-to-many relationships. That is, a precise retrieval in one of the entity types (e.g., a specific student), an index that facilitates efficient navigation to the other entity type (e.g., on enrollment, with student identifier as leading key), and retrieval of all instances related to the initial instance.
[0050] Another example is also typical, listing all students for a single course using an index on course number of enrollment. This example has to be well supported in a database system, its indexes, repertoire of query execution plans, etc. B-tree indexes cluster relevant index entries due to their sort order; the proposed use of key-value locking supports these queries with the minimal number of locks and of lock requests.
[0051] In many uses, a table's primary key is also the search key in the table's primary index. In other uses (e.g., if the user-defined search key in the primary index is not unique), the table's bookmark adds a system-generated value to the user-defined search key. This value can be unique by itself within the table or even the entire database. For example, some systems attach a unique, high-resolution time stamp to each row upon creation (e.g., often referred to as a "database key"). In some systems, the system-generated value is unique only for a specific table and a specific value of the user-defined search key. If a specific value of the user-defined search key happens to be unique, this system-generated "uniquifier" might be null or even entirely absent.
[0052] If the user-defined search key for the primary index is unique, orthogonal key-value locking in the primary index does not offer any further benefit from partitioning, and operates similarly to orthogonal key-range locking. But if the user-defined search key for the primary index is not unique, then partitioning is appropriate for the system-generated key suffix. In other words, the user-defined key value is locked like a distinct key value in a non-unique secondary index and values of the system-defined key suffix are partitioned like bookmarks.
[0053] With this design in mind, when a unique bookmark value (including the suffix) is used to search the primary index (e.g., when fetching additional columns after searching a secondary index), a single partition is the appropriate granularity of locking in the primary index. When a specific value only for the user-defined search key is used to search the primary index (e.g., based on an equality predicate in a query), orthogonal key-value locking can, with a single invocation of the lock manager, lock all rows with that search key value.
[0054] When a search is not successful in a serializable transaction, phantom protection may be implemented in both the primary indexes and the secondary indexes. Orthogonal key-value locking (using a lock on the gap between distinct key values), protects such a transaction against insertion of phantom records without blocking any operations on the adjacent key values. In other words, insertion and deletion of rows with one of the adjacent user-defined key values remain unrestricted by the lock for phantom protection.
[0055] In summary, orthogonal key-value locking combines all the advantages of locking a distinct key value and all its index entries in a single lock manager invocation (i.e., low overhead for locking), and the advantages of locking individual index entries (i.e., high concurrency). Moreover, orthogonal key-value locking opens new opportunities for concurrency control in unique secondary indexes as well as primary indexes.
[0056] Implementations and advantages of orthogonal key-value locking may be better understood with reference to the following illustrative case studies. Figure 4 is an illustration of an example primary index 400 with index records for the following case studies. The comparisons discussed below with reference to each of the case studies are based on the example employee information shown in the index 400. It is noted that for purposes of this illustration, data is shown in table format, having a primary index on its primary key (unique, not null) and a non-unique secondary index on one of the non-unique columns. The rows are shown sorted to simplify the illustration. It is noted, however, that the index records would actually be maintained in a data structure (e.g., a B-tree). Note the skipped values in the sequence of values for EmpNo 410 (e.g., 2, 4, 7, 8, . . . ), and the duplicate values for FirstName 420 ("Joe" and "Joe"). Other employee data is shown for purposes of illustration, e.g., PostalCode 430 and Phone 440.
[0057] Figures 5a-f are illustrations showing locking of an example non-unique secondary index 500 for the example index records in the primary index 400 shown in Figure 4. In order to clarify the specific behaviors of the various locking schemes, the following case studies discuss the locks and the enabled concurrency of each locking technique for record-level locking in ordered (e.g., B-tree) indexes. These comparisons are qualitative in nature but nonetheless serve to highlight the differences among the locking techniques. It is noted that actual performance and scalability for a particular implementation may depend at least in part on the workload.
[0058] Figure 5a corresponds to a first example case study illustrating "phantom protection." Phantom protection may be implemented for unsuccessful queries or searches for non-existing key values. In other words, this example illustrates techniques that lock the absence of key values.
[0059] This example assumes a serializable transaction, wherein a lock is used for phantom protection until end-of-transaction. Index records are maintained in a non-unique secondary index 500 on FirstName 520 (corresponding to FirstName 420 in the primary index 400 shown in Figure 4). In this example, distinct key values are paired with a list of bookmarks. Unique search keys in the primary index (400 in Figure 4) serve as bookmarks in the secondary index 500 shown in Figure 5. In this example, the bookmarks are also the primary key of the secondary index 500.
[0060] For purposes of illustration, an example query may be specified as "Select... where FirstName = 'Hank'." There is no entry for "Hank" and thus this query falls into the gap between key values for "Gary" and "Joe."
[0061] In this example, orthogonal key-value locking locks 550 the preceding distinct key value (i.e., FirstName 520 = Gary) in a mode that protects the gap (open interval) between Gary and Joe, but imposes no restrictions on those key values or their lists of EmpNo values 510. For example, another transaction may insert a new row with FirstName 520 = Gary or Joe and with any EmpNo value 510. Removal of rows with FirstName 520 = Joe has no restrictions. Deletion of rows with FirstName 520 = Gary and removal of their index entries means that the value Gary must remain in the index, at least as a ghost record, until the need for phantom protection ends and the lock on key value Gary is released. As such, orthogonal key-value locking provides phantom protection with the least restrictive lock scope.
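The separation of key value and gap described above can be sketched as follows. This is a minimal illustrative model, not the patented implementation: the names (KeyValueLock, try_acquire) and the single-holder S/X/N mode lattice are assumptions of the sketch.

```python
FREE, S, X = "N", "S", "X"  # no lock, shared, exclusive

def compatible(m1, m2):
    # simplest possible mode lattice: S agrees with S; X agrees with nothing
    return m1 == FREE or m2 == FREE or (m1, m2) == (S, S)

class KeyValueLock:
    """One lock manager entry for a distinct key value, with independent
    modes for the key value itself and for the gap to the next higher key."""
    def __init__(self):
        self.key_mode = FREE
        self.gap_mode = FREE

    def try_acquire(self, key_mode=FREE, gap_mode=FREE):
        # a real lock manager tracks per-transaction holders; this sketch
        # keeps only a single strongest mode per component
        if compatible(self.key_mode, key_mode) and compatible(self.gap_mode, gap_mode):
            if key_mode != FREE:
                self.key_mode = key_mode
            if gap_mode != FREE:
                self.gap_mode = gap_mode
            return True
        return False

# phantom protection for 'Hank': lock only the gap above 'Gary'
gary = KeyValueLock()
assert gary.try_acquire(gap_mode=S)       # query locks the gap
assert gary.try_acquire(key_mode=X)       # another 'Gary' row can still be inserted
assert not gary.try_acquire(gap_mode=X)   # inserting 'Hank' into the gap is blocked
```

The final three assertions mirror the case study: the gap lock blocks insertion of "Hank" while leaving the key value Gary itself available to other transactions.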
[0062] Figure 5b corresponds to a second example case study illustrating "successful equality queries." Successful equality queries are successful index searches for a single key value. Note that this case study may occur both in selection queries and in index nested loops join operations.
[0063] For purposes of this illustration, an example query predicate, FirstName = 'Joe', is deliberately chosen to focus on a key value with multiple instances in the indexed column. While the examples shown in Figures 4 and 5b illustrate only two instances of FirstName = Joe, it is noted that there may be many (e.g., thousands or more) instances. Serializability means that other transactions must not add or remove instances satisfying this search predicate.

[0064] In this example, orthogonal key-value locking acquires a single lock 551 covering all actual and possible index entries with FirstName Joe. Both adjacent key values remain unlocked (i.e., Gary and Larry). Moreover, even the gaps below and above FirstName Joe remain unlocked (e.g., other transactions are free to insert new index entries with FirstName Hank or Ken). As such, orthogonal key-value locking protects the repeatability of successful equality queries with the fewest locks and with precision.
[0065] Figure 5c corresponds to a third example case study illustrating "successful range queries." For purposes of illustration, a successful range query is specified as "...where FirstName between 'Gary' and 'Larry'." The query result includes all instances of Gary and Larry. Locking for serializability prevents insertion of additional instances of these key values. The issues to compare among the locking schemes are the number of lock requests and the precision of locking to no more than the subject key range.
[0066] In this example, orthogonal key-value locking acquires three locks 552a-c on the three distinct key values within the range of the query predicate (i.e., FirstName Gary, Joe, and Larry). For these, all actual and possible EmpNo values are locked. In addition, the two gaps between these key values are locked with appropriate lock modes in two of the lock requests (i.e., S mode for all partitions as well as the gap to the next higher key value). For the highest key value within the range of the query predicate (i.e., FirstName Larry), the lock request leaves the gap to the next higher key value free (i.e., no lock). Thus, these three locks 552a-c protect precisely the key range of the query predicate, leaving other transactions free to insert and delete index entries outside of the locks 552a-c. As such, orthogonal key-value locking acquires the fewest locks to protect the key range of the query predicate with precision.
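The three lock requests of this case study can be derived mechanically. The following sketch assumes an illustrative (key, key_mode, gap_mode) request format and a simple S/N mode vocabulary; it is not drawn from the patented implementation.

```python
def range_lock_requests(distinct_keys, low, high):
    """Return (key, key_mode, gap_mode) requests that cover the closed
    key range [low, high] exactly; 'N' means the gap stays unlocked."""
    in_range = [k for k in sorted(distinct_keys) if low <= k <= high]
    requests = []
    for key in in_range:
        gap_mode = "S" if key != in_range[-1] else "N"  # highest key: gap free
        requests.append((key, "S", gap_mode))
    return requests

locks = range_lock_requests(["Gary", "Joe", "Larry", "Mary"], "Gary", "Larry")
# → [('Gary', 'S', 'S'), ('Joe', 'S', 'S'), ('Larry', 'S', 'N')]
```

As in the case study, the highest in-range key value (Larry) is locked without its gap, so insertions above Larry (e.g., a new "Mary" instance's neighbors) remain unaffected.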
[0067] Figure 5d corresponds to a fourth example case study illustrating "non-key updates." Non-key updates are updates that modify non-key columns. In the primary index of the running example, such an update modifies any column other than the primary key.
[0068] The secondary index 500 may be implemented with a small definition change for such an update to be possible, namely a column added to each index entry using the "include" statement supported by some systems. This puts the included column behind the bookmark, where it is irrelevant for the sort order and the index organization. A non-key update of this extended secondary index modifies such an included column. In the list representation of non-unique secondary indexes, each element in the list carries an instance of the included columns.
[0069] For purposes of illustration, the secondary index of Figure 5d is extended by "include PostalCode" 530, and an update is specified as "update... set PostalCode = ... where EmpNo = 3".
[0070] In this example, orthogonal key-value locking locks the distinct key value Joe but only a single partition within the list of instances, as illustrated by lock 550d. For k=7, for example, this locks about 1 in 7 or 14% of those instances. For a short list as shown in Figure 5d, this is almost always a single instance. For a longer list, locking a partition may mean locking many instances even if only one instance is needed. Larger choices of k may alleviate hash conflicts and false sharing. The gaps above and below FirstName Joe remain unlocked.
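The partitioning described above can be sketched as follows, assuming k = 7 partitions and a simple modulo hash of the bookmark (both illustrative choices):

```python
k = 7  # number of partitions per distinct key value (illustrative choice)

def partition_of(bookmark, k=k):
    # any stable hash of the bookmark works; modulo suffices for integers
    return bookmark % k

empnos_for_joe = [3, 6]           # bookmarks listed under FirstName = 'Joe'
locked = partition_of(3)          # the update of EmpNo = 3 locks this partition
covered = [e for e in empnos_for_joe if partition_of(e) == locked]
assert covered == [3]             # EmpNo = 6 hashes elsewhere and stays unlocked
```

With the short list of Figure 5d, the locked partition happens to contain a single instance; with a longer list, several bookmarks may hash to the same partition, which is the false sharing that a larger k alleviates.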
[0071] Figure 5e corresponds to a fifth example case study illustrating "deletions." Deletions include deletion of rows in a table and/or deletions of index entries in a B-tree. Within an index, any update of a key column usually is modeled as a pair of deletion and insertion. For purposes of this illustration, an example deletion may be specified as "delete... where EmpNo = 3".
[0072] There are three cases to consider, which (with decreasing complexity) are (i) removal of a distinct key value as part of deletion of its last remaining index entry, (ii) deletion of a single index entry, with other index entries remaining for the same key value, and (iii) turning an existing, valid index entry into a ghost, also known as a pseudo-deleted or invalid record.
[0073] In non-unique secondary indexes, one bit is needed for each index entry to indicate the ghost status. In a list representation, a bit is needed per element in the list. A key value becomes itself a ghost when its list is empty or when all remaining entries are ghosts. Ghosts can be removed outside of user transactions using system transactions or top-level actions. Ghost removal does not use transactional locks; latches coordinate threads and protect data structures; log records describe ghost removal if subsequent user transactions log their actions with references to slot numbers within pages rather than key values; and system transactions or top- level actions dispense with log flush upon commit.
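The ghost bookkeeping described above may be sketched as follows; the class and field names are illustrative assumptions, not the patented data structures:

```python
class IndexEntry:
    def __init__(self, bookmark):
        self.bookmark = bookmark
        self.ghost = False  # one ghost bit per element in the list

class DistinctKey:
    def __init__(self, key, bookmarks):
        self.key = key
        self.entries = [IndexEntry(b) for b in bookmarks]

    def is_ghost(self):
        # a key value is itself a ghost when its list is empty
        # or when all remaining entries are ghosts
        return all(e.ghost for e in self.entries)

joe = DistinctKey("Joe", [3, 6])
joe.entries[0].ghost = True   # logical deletion of EmpNo = 3
assert not joe.is_ghost()     # 'Joe' still has a valid entry
joe.entries[1].ghost = True
assert joe.is_ghost()         # now removable by a system transaction
```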
[0074] Deletion via a ghost is a standard implementation method. Among other advantages, it ensures fast and reliable transaction rollback if a user transaction aborts or fails. Specifically, rollback of a user transaction does not use space allocation, for example a split of a B-tree node. Deletion via a ghost record uses the same locks as a non-key update, discussed earlier. The remainder of this subsection pertains to deletion without ghosts, whether or not this is the recommended implementation technique. If nothing else, it offers a level comparison, where the commit-duration X lock on the next key indicates that immediate record removal is assumed (i.e., record deletion without ghost records).
[0075] In this example, orthogonal key-value locking works best with ghost records, both during insertion and deletion. When forced to avoid ghost records, cases (i) and (ii) above acquire and hold locks 550e, but the next key value itself remains unlocked. As such, locking follows the rules for non-key updates.
[0076] Figure 5f corresponds to a sixth example case study illustrating "insertions." Insertions include inserting new rows into a table, and thus inserting new index entries in primary and secondary indexes. For purposes of illustration, a command may be specified as "insert... values (4, 'Joe', 54546, 4499)".
[0077] As insertion is the opposite of deletion, there are again three cases to consider, which (with decreasing complexity) are (i) insertion of an entirely new distinct key value, (ii) insertion of another instance of an existing key value, and (iii) insertion via ghost status, with the ghost record a remnant of a prior deletion or the result of a deliberately invoked system transaction or top-level action.
[0078] In this example, orthogonal key-value locking works best with a ghost record 550f left by a prior deletion or by a system transaction. Once the ghost record 550f is in place, orthogonal key-value locking locks the appropriate partition within the list of bookmarks associated with the distinct key value, exactly as described earlier for updates on non-key attributes. If insertion via a ghost record is deemed undesirable for some reason, insertion of a new distinct key value uses a lock on the gap below the next higher key value. This lock is for phantom protection and may be held only briefly. Thereafter, a lock on the new distinct key value and the appropriate partition are used, with no lock on the other partitions or on the gaps above or below the new distinct key value. If the insertion merely adds another instance of an existing key value, a lock on the appropriate partition suffices.
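The non-ghost insertion path described above, with its brief gap lock followed by a lock on the new key value and its partition, can be sketched as follows. The LockMgr stand-in records locks without conflict testing and is purely illustrative:

```python
class LockMgr:
    """Minimal stand-in that records held locks; real conflict testing
    against retained phantom-protection locks is omitted."""
    def __init__(self):
        self.held = []

    def acquire(self, key, **modes):
        self.held.append((key, modes))

    def release(self, key):
        self.held = [(k, m) for (k, m) in self.held if k != key]

def insert_new_distinct_key(mgr, next_higher_key, new_key, partition):
    # brief phantom-protection test: lock the gap below the next higher key
    mgr.acquire(next_higher_key, gap_mode="X")
    mgr.release(next_higher_key)             # "held only briefly"
    # then lock only the new key value and the affected partition
    mgr.acquire(new_key, key_mode="X", partition=partition)

mgr = LockMgr()
insert_new_distinct_key(mgr, "Larry", "Ken", partition=4 % 7)
assert mgr.held == [("Ken", {"key_mode": "X", "partition": 4})]
```

At end of the sketch, only the new key value and its partition remain locked; the gaps above and below the new key value, and all other partitions, are free.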
[0079] It can be seen by this example case study that for insertion via ghost status, locking follows the rules for non-key updates. Orthogonal locking techniques only hold a lock on the newly inserted index entry after testing for conflicting locks retained for phantom protection. Insertion via ghost status is the recommended implementation technique, and may also offer advantages over other techniques.
[0080] Before continuing, it should be noted that the examples described above are provided for purposes of illustration, and are not intended to be limiting. Other devices and/or device configurations may be utilized to carry out the operations described herein.
[0081] Figure 6 is a diagrammatic illustration of example operations which may implement orthogonal key-value locking. Operations 600 may be embodied as logic instructions on one or more computer-readable media. When executed on a processor, the logic instructions cause a general purpose computing device to be programmed as a special-purpose machine that implements the described operations. In an example, the components and connections depicted in the figures may be used.
[0082] In an example, operation 610 includes selecting a lock 620. The lock may be configured to protect at least one of the following in a data structure: (621) a key value and an adjoining gap to a neighboring key value; (622) a gap between key values, without locking the key value; (623) a key value without locking any adjoining gaps; (624) a partition of possible instances of a key value; and (625) multiple partitions of possible instances of a key value. Based on selection of the lock, operation 630 includes maintaining concurrency among transactions on the data structure during access of the key value.
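The five lock scopes of operation 610 can be written as illustrative request constructors; the (key, gap, partitions) encoding and all names are assumptions of this sketch, with None meaning "not locked":

```python
def key_and_gap(mode):       # key value plus adjoining gap
    return {"key": mode, "gap": mode, "partitions": None}

def gap_only(mode):          # gap only, key value stays free
    return {"key": None, "gap": mode, "partitions": None}

def key_only(mode):          # key value only, gaps stay free
    return {"key": mode, "gap": None, "partitions": None}

def some_partitions(mode, parts):  # one or several partitions of instances
    return {"key": None, "gap": None, "partitions": {p: mode for p in parts}}

# phantom protection (Figure 5a) needs only the gap:
assert gap_only("S")["key"] is None
# a non-key update (Figure 5d) needs only one partition:
assert some_partitions("X", [3])["partitions"] == {3: "X"}
```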
[0083] Further operations may also be implemented. For example, other operations include combining locks on partitions with locks on the gap between key values. Other example operations include setting a specific fixed set of partitions for each separate key value.
[0084] Still other example operations may include resetting the specific fixed set of partitions when the key value is unlocked. In an example, both the fixed set of partitions and partitioning rule may be reset. In another example, resetting is by range partitioning or hash partitioning.
[0085] The operations may also support multiple lock modes. The techniques provide support for k+1 lock modes in a single lock manager invocation. For example, one of the k+1 lock modes may be for a gap between two distinct key values. The other k lock modes may correspond to k partitions in a list of index entries.

[0086] Further operations may include support for various index operations (e.g., those discussed above for the case studies illustrated in Figures 5a-f). Index operations may include, but are not limited to: search for non-existing key values; successful index search for a single key value; return successful range queries; modify non-key columns; delete rows in a table and/or index entries in a B-tree; and insert new rows into a table, and thus new index entries in primary and secondary indexes of a B-tree.
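The k+1 modes in a single invocation may be modeled as a vector, with one component for the gap and one per partition; two requests conflict as soon as any component conflicts. The vector encoding and mode vocabulary below are illustrative assumptions:

```python
def compatible_modes(a, b):
    return "N" in (a, b) or (a, b) == ("S", "S")

def compatible_requests(r1, r2):
    # a request carries k+1 modes: [gap, partition 0 .. k-1];
    # two requests conflict as soon as any single component conflicts
    return all(compatible_modes(a, b) for a, b in zip(r1, r2))

k = 3
equality_query = ["N"] + ["S"] * k      # all partitions shared, gap free
non_key_update = ["N", "N", "X", "N"]   # exclusive on partition 1 only
phantom_guard  = ["S", "N", "N", "N"]   # gap locked only

assert not compatible_requests(equality_query, non_key_update)
assert compatible_requests(non_key_update, phantom_guard)
```

The second assertion shows the payoff of orthogonality: a non-key update and a phantom-protection lock on the same distinct key value coexist because they touch disjoint components.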
[0087] The operations shown and described herein are provided to illustrate example implementations. The operations are not limited to the operations and/or ordering shown.
[0088] It should be noted that application of orthogonal key-value locking is not limited in scope. Indeed, the orthogonal key-value locking techniques may be adapted to work with other locking techniques. By way of illustration, the index management (IM) technique locks logical rows of a table. By locking a bookmark, a transaction locks a record in the primary data structure, one record in each secondary index, plus a gap (to the next lower key value) in each secondary index. While very good for single-row insertion and deletion, the design suffers from too many locks in non-unique secondary indexes and from too much "footprint." For example, for phantom protection, one might want to lock a gap between two index entries in some secondary index. But the only lock scope available also locks the key value above the gap as well as the logical row it pertains to (i.e., including index entries in other indexes as well as gaps between index entries in other indexes).
[0089] Orthogonal key-value locking techniques offer: (i) the separation of key value and gap, (ii) partitioning within the list of bookmarks, (iii) a hierarchy consisting of the entire key value and the partitions, with lock modes that reduce the number of lock manager invocations. Adapting orthogonal key-value locking to the index management (IM) technique results in what is referred to herein as "orthogonal row locking."
[0090] Orthogonal row locking may be implemented by separating the logical row from the gaps between key values. In other words, a lock on a logical row has two modes, one for the row and its index entries and another one for the gaps in all the indexes. This permits, for example, one transaction locking a row merely for phantom protection in one of the indexes while another transaction modifies the row including some of its index entries, but excluding all index keys. In other words, this update may modify non-key fields including the ghost bits, thus logically deleting the row and all its index entries. Combined lock modes such as this may help reduce the number of lock manager invocations.
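The two independent modes of an orthogonal row lock can be sketched componentwise; the tuple encoding and the simple S/X/N lattice are illustrative assumptions:

```python
def mode_compatible(a, b):
    return "N" in (a, b) or (a, b) == ("S", "S")

def row_locks_compatible(t1, t2):
    # t = (row_and_entries_mode, gaps_mode); compatible iff both components are
    return mode_compatible(t1[0], t2[0]) and mode_compatible(t1[1], t2[1])

phantom = ("N", "S")   # phantom protection: gaps only
update  = ("X", "N")   # non-key update: row and its entries only, no gaps
assert row_locks_compatible(phantom, update)
```

This reproduces the example in the text: one transaction holds phantom protection via the gaps while another modifies the row, because the two combined modes are compatible in both components.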
[0091] Orthogonal key-value locking may also be implemented with a list of locks for indexes and their gaps (i.e., a lock request for a row would specify the indexes in which gaps ought to be locked). This permits, for example, phantom protection with a locked gap in a single index only. In a way, this idea introduces key-range locking to the index management (IM) technique. If the set of possible indexes is very large, the size of this list can be limited by partitioning the set of possible indexes such that locks cover a partition rather than a specific index (and the gaps therein).
[0092] Orthogonal key-value locking may also be implemented to define a hierarchy of locking scopes. The bottom of the hierarchy may be the set of partitions, and the top of the hierarchy may be the row and all its index entries (i.e., all partitions). This adaptation enables lock acquisition with the scope and speed of the original index management (IM) technique, as well as locks with much reduced footprint with little additional overhead. For example, for phantom protection, a single lock request for a logical row can lock merely the gap between two index entries within a single index. Nonetheless, if desired, a single lock request can also lock a logical row and all its index entries, optionally with all gaps in all indexes or with some or none of the gaps. Locking all gaps emulates the original index management (IM) technique, locking none seems more appropriate for single-row updates, and locking gaps in a subset of the indexes only seems more appropriate for phantom protection.
[0093] Orthogonal key-value locking may also be implemented by applying partitioning to the index entries. For that, a lock for a logical row is augmented with a list of indexes in which index entries are locked. Thus, one transaction might update the row in the primary data structure as well as some affected indexes while another transaction still searches other indexes of the same table, including index entries for the row being updated (in indexes not affected by the update).
[0094] Another possible application combines lock modes with schemas and data, resulting in what is referred to herein as "orthogonal schema locking." Most database systems support an interpreted data definition language (DDL), and thus use type information stored as data in catalogs rather than compiled into all programs and scripts written in the system's data manipulation language (DML). For this schema information, Microsoft® SQL Server has special lock modes called "schema stability" and "schema modification." The former is used, for example, during query optimization, without any locks on data such as indexes, partitions, pages, records, or keys. The latter is used, for example, while dropping an index, thereby ensuring exclusive use of both schema and all data structures. All other lock modes permissible for a table (e.g., S, X, IS, IX) imply a schema stability lock; they are implied by a schema modification lock and incompatible with another transaction's schema modification lock. In order to simplify schema management and schema locking in databases, it is possible to avoid these special lock modes by mapping the concurrency control to combined locks, while avoiding any increase in lock manager invocations. In fact, it may be useful to divide either the schema or the data and to achieve a finer granularity of locking using combined locks.
[0095] In a design with combined lock modes, query optimization may hold an SN lock (pronounced 'schema shared, data free' here), whereas dropping an index uses an XX lock (pronounced 'schema exclusive, data exclusive' here). Intention lock modes are supported, at least for the data.
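Combined modes such as SN and XX behave componentwise, which can be sketched as follows (intention modes omitted; the two-letter encoding is an illustrative assumption):

```python
def mode_compatible(a, b):
    return "N" in (a, b) or (a, b) == ("S", "S")

def combined_compatible(m1, m2):
    # first letter: schema component; second letter: data component
    return mode_compatible(m1[0], m2[0]) and mode_compatible(m1[1], m2[1])

assert combined_compatible("SN", "SN")      # two optimizers may share the schema
assert not combined_compatible("SN", "XX")  # drop-index excludes query optimization
```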
[0096] Another application of orthogonal key-value locking may be to divide the data lock into individual data structures (e.g., a table's set of indexes). In other words, rather than locking a table and all its data structures, a single lock request (with a list of lock modes) may lock individual indexes. Typically, a table has only a few indexes, but it can have dozens. With only a few indexes, each index may have its own entry in a list of lock modes provided with each lock request for a table. With a multitude of indexes, the set of indexes might be partitioned using a hash function to map index identifiers to positions in the lock list. The most immediate benefit may be to reduce the number of lock requests. Where a small transaction would otherwise first lock a table in IX mode and then two indexes each in IX mode, it may instead lock the table in IX mode and, within the same lock manager request, acquire IX locks on the two indexes (or, more precisely, on two partitions within the set of indexes, with each partition probably containing only a single index).
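The partitioned list of index lock modes described above can be sketched as follows, assuming an illustrative list length of 4 and a modulo hash of integer index identifiers:

```python
LIST_LEN = 4  # length of the per-table list of index lock modes (illustrative)

def slot(index_id):
    # hash index identifiers to list positions; modulo suffices here
    return index_id % LIST_LEN

request = {"table": "IX", "slots": ["N"] * LIST_LEN}
for index_id in (2, 5):                    # the two indexes being updated
    request["slots"][slot(index_id)] = "IX"
# a single invocation now covers the table and both indexes
assert request["slots"] == ["N", "IX", "IX", "N"]
```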
[0097] Another application of orthogonal key-value locking may divide the schema lock into individual components of a schema (e.g., column set or individual columns, index set or individual indexes, integrity constraints, partitions of a table or indexes, histograms, etc.). Sets of schema entries with individual locks may improve concurrency among changes in the data definition or concurrency of schema changes (DDL) with query and transaction processing (DML). For example, one transaction's query execution may join two secondary indexes while another transaction adds a new index. In other words, online index creation no longer uses a "schema modify" lock for the entire table when it first registers a new index and when it eventually releases the completed index for unrestricted use. This application of orthogonal key-value locking may improve the online behavior of DDL with minimal impact on active users and applications.
[0098] Still other adaptations and applications of orthogonal key-value locking may also be implemented, as will be understood by those having ordinary skill in the database and/or file management arts, after becoming familiar with the teachings herein.
[0099] It is noted that the examples shown and described are provided for purposes of illustration and are not intended to be limiting. Still other examples are also contemplated.

Claims

1. An orthogonal key-value locking method, comprising:
selecting a lock configured to protect at least one of the following in a data structure:
a key value and an adjoining gap to a neighboring key value,
a gap between key values, without locking the key value,
the key value without locking any adjoining gaps,
a partition of possible instances of the key value, and
multiple partitions of possible instances of the key value; and
maintaining concurrency among transactions on the data structure during access of the key value.
2. The orthogonal key-value locking method of claim 1, further comprising combining locks on partitions with locks on the gap between key values.
3. The orthogonal key-value locking method of claim 1, further comprising setting a specific fixed set of partitions for each separate key value.
4. The orthogonal key-value locking method of claim 3, further comprising resetting the specific fixed set of partitions when the key value is unlocked.
5. The orthogonal key-value locking method of claim 4, wherein both the fixed set of partitions and partitioning rule are reset.
6. The orthogonal key-value locking method of claim 4, wherein resetting is by range partitioning or hash partitioning.
7. The orthogonal key-value locking method of claim 1, further comprising, with an additional lock on an entire key value, increasing a number of components in the lock from k+1 to 1+k+1 (= k+2) lock modes for each key value in an index of the data structure.
8. The orthogonal key-value locking method of claim 1, further comprising locking a non-unique prefix in a unique index.
9. An orthogonal key-value locking computer program product embodied as computer-readable instructions stored on a non-transient computer-readable medium and executable to:
select a lock configured to protect at least one of the following in a data structure:
a key value and an adjoining gap to a neighboring key value,
a gap between key values, without locking the key value,
the key value without locking any adjoining gaps,
a partition of possible instances of the key value, and
multiple partitions of possible instances of the key value; and
maintain concurrency among transactions on the data structure during access of the key value.
10. The orthogonal key-value locking computer program product of claim 9, further executable to search for non-existing key values.
11. The orthogonal key-value locking computer program product of claim 9, further executable to perform a successful index search for a single key value.
12. The orthogonal key-value locking computer program product of claim 9, further executable to return successful range queries.
13. The orthogonal key-value locking computer program product of claim 9, further executable to modify non-key columns.
14. The orthogonal key-value locking computer program product of claim 9, further executable to delete rows in a table and/or index entries in a B-tree.
15. The orthogonal key-value locking computer program product of claim 9, further executable to insert new rows into a table, and thus new index entries in primary and secondary indexes of a B-tree.
PCT/US2014/014296 2014-01-31 2014-01-31 Orthogonal key-value locking WO2015116208A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2014/014296 WO2015116208A1 (en) 2014-01-31 2014-01-31 Orthogonal key-value locking


Publications (1)

Publication Number Publication Date
WO2015116208A1 true WO2015116208A1 (en) 2015-08-06

Family

ID=53757587



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7689602B1 (en) * 2005-07-20 2010-03-30 Bakbone Software, Inc. Method of creating hierarchical indices for a distributed object system
US20100281013A1 (en) * 2009-04-30 2010-11-04 Hewlett-Packard Development Company, L.P. Adaptive merging in database indexes
US20100306222A1 (en) * 2009-05-29 2010-12-02 Microsoft Corporation Cache-friendly b-tree accelerator
US20110320403A1 (en) * 2010-06-28 2011-12-29 O'krafka Brian W Approaches for the replication of write sets
US20110320496A1 (en) * 2010-06-28 2011-12-29 Microsoft Corporation Concurrency control for confluent trees


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10061777B1 (en) * 2017-04-04 2018-08-28 International Business Machines Corporation Testing of lock managers in computing environments
US10614040B2 (en) 2017-04-04 2020-04-07 International Business Machines Corporation Testing of lock managers in computing environments
US10614039B2 (en) 2017-04-04 2020-04-07 International Business Machines Corporation Testing of lock managers in computing environments
WO2020010325A1 (en) * 2018-07-05 2020-01-09 Fungible, Inc. Data processing unit with key value store
US11258796B2 (en) 2018-07-05 2022-02-22 Fungible, Inc. Data processing unit with key value store
US11520769B1 (en) 2021-06-25 2022-12-06 International Business Machines Corporation Block level lock on data table
CN114238704A (en) * 2022-02-21 2022-03-25 北京金山云网络技术有限公司 Tree index splitting method, data access method and device and electronic equipment


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14880458; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 14880458; Country of ref document: EP; Kind code of ref document: A1)