WO2019233590A1 - Concurrent datastructure and device to access it - Google Patents

Concurrent datastructure and device to access it

Info

Publication number
WO2019233590A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
thread
delegation
delegation queue
queue
Application number
PCT/EP2018/065111
Other languages
French (fr)
Inventor
Oren AMOR
Liran MISHALI
Shay Goikhman
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/EP2018/065111
Publication of WO2019233590A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F 9/526 Mutual exclusion algorithms


Abstract

The present invention relates to the field of concurrent data structures. In particular, the present invention presents a data structure for concurrent access, and a device for accessing a data structure. The data structure comprises a plurality of nodes, wherein the data structure is provided at each of multiple nodes with a delegation queue pointer and a delegation queue. The device can access the data structure with at least one thread, and is configured to, when accessing with a thread a node that is currently not accessed by another thread of the device and/or another device, atomically update the delegation queue pointer of the node with the thread to point to the delegation queue of the node.

Description

CONCURRENT DATASTRUCTURE AND DEVICE TO ACCESS IT
TECHNICAL FIELD

The present invention relates to concurrent data structures, i.e. to data structures for concurrent access by e.g. a thread, core or device. Accordingly, the present invention proposes such a concurrent data structure and a method for its creation. Further, the present invention proposes a device for accessing a data structure and a corresponding method.
BACKGROUND
The design of efficient concurrent data structures is of paramount significance. For example, the performance and scalability of important applications such as databases, and thus the effective use of the emerging multicore architectures on which these databases run, depend crucially on the scalability of their concurrent index data structures.
The technical challenge in the design of a concurrent data structure is relatively high, and there are only a few known efficient parallel implementations of serial data structures, most of which are implemented through either fine-grain locking or lock-free techniques. Delegation is a relatively new approach that advocates transforming a serial data structure into a concurrent one.

Delegation is a message-passing technique, where access to a data structure is mediated through one or more helper cores running threads that are allowed to manipulate the data structure directly on behalf of other cores. Thus, delegation promises to offload performance-critical, problematic data structure accesses, thereby minimizing data and meta-data invalidation traffic, and to improve throughput by utilizing single-thread locality. So far, though, delegation in the design of concurrent data structures has had only limited applicability, because of various deficiencies.
Several conventional approaches exist, but all have their individual deficiencies: • In one conventional approach, a hash table is partitioned across Non-Uniform Memory Access (NUMA) nodes, statically assigning helper threads responsible for disjoint portions of the data structure. The resulting data structure's performance is inferior to spinlocks. Its performance suffers mostly in stressed dynamic circumstances, e.g., skewed growth, hot spots, load imbalance and starvation.
• A Flat Combining (FC) approach proposes a dynamic binding of helper threads and advocates combining, thus facilitating better utilization of semantics and locality. Yet, the architecture might suffer from helper starvation. Moreover, FC proposes a single global FC lock protecting the whole of the data structure, thereby effectively utilizing only single-thread performance and negating parallelism.
• Queue Delegation (QD) locking enables a dynamic binding of helper threads, yet partitions the data structure's code into critical sections, passing critical section function pointers to the helper thread. Thus, combining requests is not possible. In the concurrent pairing heap construction, a single global QD lock is used, essentially protecting the data structure with QD locks instead of mutexes.
• The HTM-assisted Combining Framework (HCF) approach uses Hardware Transactional Memory (HTM) to run both single-operation as well as combiner threads across the data structure, targeting data structure types that have some low-conflict operations and, as well, can benefit from combining in other operations. The basic idea of HCF is to try running the operations as hardware transactions, and if unsuccessful after a number of retries, to delegate the operations to dynamically chosen combiners, which again try running in HTM and, after a number of retries, take a global lock on the data structure.

However, in all these approaches to delegation, the premise is a given serial data structure algorithm with delegation assigned statically to the whole data structure, negating parallelism inside the data structure.

In other words, no multiple concurrent delegations in progress, spread over the data at a finer grain, have been proposed.

SUMMARY
In view of the above-mentioned challenges, the present invention aims to improve the conventional approaches. The present invention has the objective of providing an improved concurrent data structure and an improved device for accessing such a data structure. In particular, finer-grain parallelism among helper threads and increased single-thread efficiency of helper threads should be achieved. To this end, the invention aims to provide the possibility of multiple concurrent delegations in progress. The objective of the present invention is achieved by the solution provided in the enclosed independent claims. Advantageous implementations of the present invention are further defined in the dependent claims.

In particular, the present invention proposes providing the data structure with multiple delegation queues, one at each of multiple data structure nodes, and a device for accessing these delegation queues with one or more threads.

A first aspect of the present invention provides a data structure for concurrent access, the data structure comprising a plurality of nodes, wherein the data structure is provided at each of multiple nodes with a delegation queue pointer and a delegation queue.

Since the data structure has multiple delegation queues at multiple nodes, it allows multiple concurrent delegations in progress, and thus finer-grain parallelism among helper threads. As a consequence, the single-thread efficiency of helper threads is improved.
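This per-node arrangement can be pictured with a short C++ sketch. The names Request, DelegationQueue and Node, and the use of a plain vector as the queue body, are illustrative assumptions only, not the claimed structure:

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Illustrative sketch of a node augmented with a delegation queue
// pointer and a delegation queue, per the first aspect.
struct Request {
    int      opcode;  // operation requested by a delegating thread
    uint64_t key;     // operand(s) of the operation
};

struct DelegationQueue {
    std::vector<Request> requests;    // tail: delegating threads enqueue here
    std::atomic<bool>    closed{false};
    // A production queue would be a multi-producer/single-consumer queue;
    // a plain vector is used only to keep the sketch short.
};

struct Node {
    // nullptr means "currently not accessed"; a non-null value means the
    // node is being updated and contending threads may delegate requests.
    std::atomic<DelegationQueue*> queue_ptr{nullptr};
    std::vector<uint64_t>         keys;  // stand-in payload of the node
};
```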
In an implementation form of the first aspect, a delegation queue pointer of a node points to the delegation queue of the node, if the node is being updated.
Thus, if the node is being updated, another thread accessing the node can provide its requests into the delegation queue.
In a further implementation form of the first aspect, a delegation queue pointer is set to NULL if the related node is currently not accessed. Thus, a thread accessing the node in this state knows that it can become a helper thread.
In a further implementation form of the first aspect, a head part of each delegation queue is accessible at the same time by only one thread of a device suitable for accessing the data structure.
In particular, a thread accessing the head part of the delegation queue is the helper thread. Other threads can write operation requests into the tail of the delegation queue.
In a further implementation form of the first aspect, the data structure is provided at each root node or internal node with one or more pointers pointing to one or more child nodes, a child node being either an internal node or a leaf node.
In a further implementation form of the first aspect, the data structure is provided at each of multiple child nodes with a delegation queue pointer and a delegation queue, wherein particularly a delegation queue pointer of a child node points to the delegation queue of the child node, if the child node is being updated.
Thus, the data structure is recursively defined with delegation queues at each of multiple nodes, thereby increasing thread parallelism.
A second aspect of the present invention provides a device for accessing a data structure according to the first aspect or any of its implementation forms with at least one thread. In particular, the device is configured to run a plurality of threads for accessing a data structure according to the first aspect or any of its implementation forms. The device is configured to, when accessing with a thread of the plurality of threads a node of the data structure that is currently not accessed by another thread of the plurality of threads and/or by threads run by another device, atomically update the delegation queue pointer of the node with the thread to point to the delegation queue of the node.

Thus, the device's thread is able to become the helper thread, and since the device atomically updates the delegation queue pointer, further threads or devices accessing the same node will write their operation requests into the delegation queue. Since the device is able to do this per thread for one node of the data structure, thread parallelism is improved.

In a further implementation form of the second aspect, the thread becomes a helper thread if it updates the delegation queue pointer, and is thus allowed to access the node to perform its update operation on the node.
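One plausible realization of this atomic update is a single compare-and-swap on the delegation queue pointer; a minimal sketch, reusing the illustrative types above (try_become_helper is an assumed name):

```cpp
// A thread tries to become the node's helper by atomically swinging the
// delegation queue pointer from nullptr to its own thread-local queue.
// This succeeds only if no other thread currently accesses the node.
bool try_become_helper(Node& node, DelegationQueue* my_queue) {
    DelegationQueue* expected = nullptr;
    return node.queue_ptr.compare_exchange_strong(
        expected, my_queue, std::memory_order_acq_rel);
}
```

If the exchange fails, another thread already opened a queue at the node, and the current thread delegates its request there instead.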
In a further implementation form of the second aspect, the thread becomes a helper thread, if it updates the delegation queue pointer, and is thus allowed to access the delegation queue and access the node on behalf of at least one other thread run by the device and/or a thread run by another device.
In a further implementation form of the second aspect, the device is configured to run the thread to: read an operation request of at least one other thread of the plurality of threads run by the device and/or threads run by another device from the delegation queue of the node, to which the delegation queue pointer of the node points, and access the node based on the at least one operation request.
Accordingly, it can carry out the operation requests of the other threads, which have been written into the delegation queue. Safety and consistency of the node are guaranteed since a single thread performs a direct access on the node.
In a further implementation form of the second aspect, the device is configured to run the thread to: read multiple operation requests of other threads of the plurality of threads run by the device and/or of threads run by another device from the delegation queue, combine the multiple operation requests into at least one combined operation, carry out the combined operation on a replication of the node, atomically install the replication of the node into the data structure, and close the delegation queue.
Thus, an efficient operation on the data structure is achieved.
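A sketch of this helper-side sequence follows, again with assumed names: parent_slot stands for whatever pointer in the data structure references the node, and replicate() and apply() are illustrative helpers, not the patent's API:

```cpp
#include <cstddef>

// Combine a batch of delegated requests on a private replica of the node,
// atomically install the replica, then close the queue.
Node* replicate(const Node& n) {
    Node* copy = new Node;
    copy->keys = n.keys;  // copy the payload; the replica's queue_ptr
    return copy;          // starts out as nullptr ("not accessed")
}

void apply(Node& n, const Request& r) {
    if (r.opcode == 0) n.keys.push_back(r.key);  // 0 = insert (illustrative)
}

void helper_combine(std::atomic<Node*>& parent_slot, const Node* node,
                    DelegationQueue* q, std::size_t batch_limit) {
    Node* replica = replicate(*node);
    std::size_t processed = 0;
    for (const Request& r : q->requests) {
        if (processed++ == batch_limit) break;  // bound the batch size
        apply(*replica, r);                     // combined operation(s)
    }
    parent_slot.store(replica, std::memory_order_release);  // atomic install
    q->closed.store(true, std::memory_order_release);       // close the queue
}
```

The batch_limit bound anticipates the starvation avoidance discussed later in the description: the helper processes at most a fixed number of delegated requests before closing the queue.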
In a further implementation form of the second aspect, the device is configured to run the thread to replace the node with the replication of the node on which the combined operation has been carried out.
In a further implementation form of the second aspect, the device is configured to, when an access request or a combined operation of access requests from the delegation queue involves another node of the data structure, particularly a child node of the node of a data structure according to the respective implementation forms of the first aspect, determine whether another thread of the plurality of threads run by the device or by another device currently accesses the other node, and delegate the operation request or the combined operation to the other node.

Thus, the efficiency of accessing the data structure in a parallel manner is improved.
In a further implementation form of the second aspect, the device is configured to, when accessing a node currently accessed by another thread of the plurality of threads run by the device or by another device, with the thread, generate an operation request, and provide the operation request into the delegation queue, to which the delegation queue pointer of the node points.
In a further implementation form of the second aspect, the device is configured to, with the thread, proceed further, if no result of the operation request is required, and/or provide an operation result placeholder and an operation completed flag, and wait for the operation completed flag to indicate a completed operation request, if a result of the operation request is required.
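The "future" mentioned here can be as simple as a result slot paired with an atomic completion flag. A hedged sketch, with Future, wait_for and complete as assumed names:

```cpp
#include <atomic>
#include <cstdint>

// Operation result placeholder plus operation-completed flag ("future").
struct Future {
    uint64_t          result = 0;    // result placeholder
    std::atomic<bool> done{false};   // operation completed flag
};

// Delegating side: called only when a result is required; otherwise the
// thread simply proceeds after providing its operation request.
uint64_t wait_for(Future& f) {
    while (!f.done.load(std::memory_order_acquire)) {
        // spin; a real implementation might yield or back off here
    }
    return f.result;
}

// Helper side: publish the result first, then raise the flag.
void complete(Future& f, uint64_t value) {
    f.result = value;
    f.done.store(true, std::memory_order_release);
}
```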
A third aspect of the present invention provides a method for generating a data structure for concurrent access, the method comprising providing the data structure with a plurality of nodes, and providing the data structure at each of multiple nodes with a delegation queue pointer and a delegation queue, wherein particularly a delegation queue pointer of a node points to the delegation queue of the node, if the node is being updated.

The method of the third aspect achieves all advantages and effects of the data structure of the first aspect and its implementation forms.

A fourth aspect of the present invention provides a method for accessing a data structure according to the first aspect or any of its implementation forms with at least one thread, the method comprising, when accessing with a thread a node of the data structure that is currently not accessed by another thread, atomically updating the delegation queue pointer of the node to point to the delegation queue of the node, reading at least one operation request of another thread and/or device from the delegation queue of the node, and updating and/or accessing the node according to the combined operation, comprising the request of the helper thread and any other requests from the delegation queue. The method of the fourth aspect achieves all advantages and effects of the device of the second aspect and its implementation forms.
It has to be noted that all devices, elements, units and means described in the present application could be implemented in software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application, as well as the functionalities described to be performed by the various entities, are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
The above described aspects and implementation forms of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:
FIG. 1 shows a data structure for concurrent access according to an embodiment of the present invention.
FIG. 2 shows a device for accessing the data structure according to an embodiment of the present invention.

FIG. 3 shows a data structure according to an embodiment of the present invention with delegation per node.

FIG. 4 shows a principle of the data structure according to an embodiment of the present invention compared with a conventional data structure.
FIG. 5 shows datatypes of an enhanced Adaptive Radix Tree (ART) according to an embodiment of the present invention.
FIG. 6 shows delegation based recursive ART insert processing according to an embodiment of the present invention.
FIG. 7 shows a general delegation processing.
FIG. 8 shows a method according to an embodiment of the present invention for generating a data structure.

FIG. 9 shows a method according to an embodiment of the present invention for accessing a data structure.
DETAILED DESCRIPTION OF THE EMBODIMENTS

FIG. 1 shows a concurrent data structure 100 according to an embodiment of the present invention. The data structure 100 may be implemented on a conventional storage device, and may be controlled by at least one processor.

The data structure 100 comprises in particular a plurality of nodes 101. At multiple of these nodes 101, i.e. at a subset of the plurality of nodes 101 or at all nodes 101, the data structure 100 is provided with a delegation queue pointer 102 and the delegation queue 103. Accordingly, the data structure comprises a plurality of delegation queue pointers 102 and a plurality of delegation queues 103. Preferably, all nodes 101 in the data structure may be provided with a delegation queue pointer 102 and/or a delegation queue 103. Initially, the delegation queue pointers 102 can be set to NULL, indicating that the related node 101 is currently not accessed. In case a node 101 is accessed (e.g. for updating the node 101), the related delegation queue pointer can be updated to point to the delegation queue 103 of the node 101, indicating that the node is currently accessed.

FIG. 2 shows a device 200 for accessing a data structure, particularly a concurrent data structure 100 as shown in FIG. 1. In particular, the device 200 is able to access the data structure 100 with at least one thread 201 or core. To this end, the device 200 may comprise processing circuitry, like a processor, in order to implement the threads 201.
The device 200 is particularly configured to, when accessing with a thread 201 a node 101 that is currently not accessed by another thread 201 of the device 200 and/or by another device 200, atomically update the delegation queue pointer 102 of the node 101 with the thread 201 to point to the delegation queue 103 of the node 101. Before the delegation queue pointer 102 is updated by the device 200 with the thread 201, it typically is set to NULL, indicating that the related node 101 is currently not accessed.
The data structure 100 and the device 200 of the invention provide a methodology to dynamically enable multiple concurrent delegations at fine grain by means of associating a delegation queue with a node 101. The methodology is explained in more detail with respect to FIG. 3.

FIG. 3 shows a concurrent data structure 100, which builds on the data structure 100 shown in FIG. 1. Same elements in FIG. 1 and FIG. 3 are labelled with the same reference signs and function likewise.
• A node's 101 data structure is augmented with a delegation queue pointer 102 to a delegation queue 103 type, initialized to NULL. The first thread 201 to update the node 101 opens a thread-local queue (delegation queue 103), and atomically updates the node's 101 queue pointer 102 to point to the thread-local queue 103, and thus becomes a helper thread 300 of the node 101. In a specific implementation example, the first thread 201, i.e. the helper thread 300, thereby may itself provide the delegation queue 103 as its thread-local queue.
• Threads 201 contending at the node 101 delegate their operation and parameters into the queue 103, particularly into a tail 302 of the queue 103, and proceed further if no result of the operation is needed. Otherwise they pass a "future", consisting of a result placeholder and a flag, and can wait, continuously if desired, for the result on the "future".

• The helper thread 300 may combine the batch of requests from the delegation queue 103, particularly taken from the head 303 of the queue 103, into one or more combined operations on a replica of the node 101, may atomically install the replica, and may then close the queue 103, clean up, and continue. Starvation of the helper thread 300 may be avoided by dynamically limiting the number of delegated operations. Different unrelated nodes 101 with disjoint subtrees underneath are thus served concurrently by their respective helper threads 300.
• If a request coming from the queue 103 or the resulting combined operation involves some other node 101, possibly hierarchically underneath or above the first node 101, that happens to be served by yet another helper thread 300, the first helper thread 300 can delegate the operation along with related arguments to the second node 101, thus increasing concurrency in the data structure 100.
• Conversely, the node's 101 helper thread 300 may utilize the semantics of a hierarchical data structure 100, in which lower-level nodes 101 are operated on underneath the node 101, and pass the futures through to the delegating threads 201, which will become the helper threads 300 of the nodes 101 operated on. Thereby, the dynamic nature of concurrency can be structured by the semantics and the load.
The major advantages of the present invention implemented by the data structure 100 and device 200, respectively, according to embodiments of the present invention are as follows:
• Finer distribution of contention on the data structure nodes 101.

• Enhanced fine-grain parallelism among helper threads 300 and increased single-thread efficiency of the helper threads 300. The actual update to the data structure's node 101 is performed by the helper thread 300 combining the batch of delegated requests, thus further optimizing the series of updates (e.g. by exploiting the semantics) while amortizing the accumulated update cost. Disjoint operation of the helper threads 300 is enabled. The helper thread 300 can also delegate work further to other helper threads 300, and conversely, combine requests across several nodes 101.
• Synchronization overhead across the updating threads 201 is kept low because of localized contention, enabling high performance and throughput. The updating threads 201 can avoid waiting; after the delegation of their operation requests into the delegation queue 103, the threads 201 can continue performing useful work.
• Delegation queue 103 supports affinity/locality considerations, e.g., enabling grouping of the requests coming from the same Non-Uniform Memory Access (NUMA) node into a sub-batch. Consequently, the combined update can reflect the desired NUMA policy, e.g. distribute the result of the series of updates across the NUMA-nodes.
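As a rough illustration of such sub-batching, delegated requests tagged with the NUMA node of their issuing thread can be grouped before combining; TaggedRequest and group_by_numa are assumptions for illustration, not part of the patent text:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Illustrative: a delegated request carrying its origin NUMA node.
struct TaggedRequest {
    int      numa_node;  // NUMA node of the delegating thread
    uint64_t key;        // operand
};

// Form one sub-batch per NUMA node, so the combined update can follow
// the desired NUMA policy.
std::map<int, std::vector<TaggedRequest>>
group_by_numa(const std::vector<TaggedRequest>& batch) {
    std::map<int, std::vector<TaggedRequest>> sub_batches;
    for (const TaggedRequest& r : batch)
        sub_batches[r.numa_node].push_back(r);
    return sub_batches;
}
```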
FIG. 4 shows the principle of the data structure 100 of the present invention compared to a conventional data structure. The delegation approach for the conventional data structure is depicted on the left side of FIG. 4. There is one global delegation queue, on which all threads contend and enqueue their requests targeting the gray area symbolizing the possible range of needed data in the data structure. On the right side of FIG. 4, the approach of the data structure 100 according to an embodiment of the present invention is depicted. The data structure 100 is used to navigate and distribute the threads 201 to their corresponding target gray areas. Each gray area is associated with a delegation queue 103 and a helper thread 300. The threads 201 deposit their requests in the respective delegation queue 103 (per node), and the helper threads 300 carry them out. This delegation approach may be achieved by:

• Dynamically setting up a delegation queue 103 within the node 101 by its first updater, which becomes the node's helper thread 300. Any subsequent contention on the node 101 is exploited through combining the delegated requests. Thus, hotspots and skews are localized. The efficiency of the helper thread 300 is maximized through combining that utilizes the semantics and locality at the node. A delegating thread 201 is free to continue after the successful delegation, thereby enhancing concurrency.

• Enabling concurrent operation of helper threads 300 on disjoint portions of the data structure 100 by means of simple lock-free techniques. All other threads 201 can traverse the data structure 100 and get a consistent global view of it.
• Further sub-grouping the delegation requests coming from same affinity domain enables affinity-related combining and policy managing. As well, maintaining a delegation queue 103 sinking requests from several data structure nodes 101 enables efficient combining of processing across these nodes 101.
As an example of the invention, a delegation-based concurrent ART index is now presented in order to exhibit the above-described methodology. The original ART data structures are modified as explained in the following and as shown in FIG. 5.

FIG. 5 shows an enhanced ART node 101 structure, consisting of an array of keys 502, each taking one byte, a corresponding array of child pointers 500, and a prefix 502 string. The prefix 502 is used to store the common prefix of all the child nodes 501 for path compression. The node 101 is not constrained in size, and includes a delegation queue pointer 102, shown as the leftmost pointer, pointing here to the delegation queue 103 with three elements. Each element of the queue 103 holds an operation opcode and its arguments. The child pointers 500 point either to sub-trees via an internal node (inode) as child node 501, or to a leaf node, shown in the picture as the rightmost child node 501. The internal nodes enable concurrent replacement of connected parent 101 and child nodes 501. The leaf node is augmented with a delegation queue pointer 102 as well, shown in the picture as pointing to NULL.
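In C++ terms, such an enhanced ART node might look roughly as follows. The field names and the use of std::vector are assumptions for illustration; the figure's reference signs are noted in comments:

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <vector>

struct DelegationQueue;  // per-node queue, as sketched earlier
struct ArtInode;         // internal node (inode)

// A leaf also carries a delegation queue pointer (102), initially NULL.
struct ArtLeaf {
    std::string key;
    uint64_t    value;
    std::atomic<DelegationQueue*> queue_ptr{nullptr};
};

// A child slot (501) refers either to a sub-tree via an inode or to a leaf.
struct ArtChild {
    ArtInode* inode = nullptr;
    ArtLeaf*  leaf  = nullptr;
};

struct ArtInode {
    std::atomic<DelegationQueue*> queue_ptr{nullptr};  // leftmost pointer (102)
    std::string                   prefix;    // common prefix (path compression)
    std::vector<uint8_t>          keys;      // one byte per child (502)
    std::vector<ArtChild>         children;  // child pointers (500)
};
```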
FIG. 6 shows, by way of example, a high-level recursive insert algorithm, modified to accommodate the delegation scheme according to the present invention. Insertions are explained for the purpose of exemplifying the general case.

At each step, a leaf key's character at position depth navigates the tree traversal to the next node 101 in the tree. The recursion stops at the leaf node, see Steps 1-2. In the original ART algorithm, a new intermediate node is inserted instead of the found leaf, having the found leaf and the argument leaf as children, with the proper adjustment of the common prefix for lazy evaluation. In the present invention, the found leaf includes a delegation queue 103, which is opened by the first thread (the helper thread 300), and into which any subsequent contending threads 201 can offload their inserts and continue. The helper thread 300 processes all the inserts in the leaf's queue to build a subtree, and before exiting it replaces the found leaf with the root of the subtree. The details of delegation are discussed in the sequel.
At Step 3 an internal node 101 is encountered, and its prefix is compared to the next substring in the argument key. If there is a match, then Step 6 is executed, looking for the next child node 501 to recurse to in Step 7. If the child is nil, the current node is expanded to record the leaf.
As the operation at Step 8 employs delegation, multiple threads 201 at this step will delegate their inserts into the current node 101 to the helper thread 300. The helper thread 300 at Step 8 can thus size the required expansion of the current node 101 exactly and, significantly, can exploit the locality while inserting leaves into the node 101 residing in its cache.
If, however, the prefix of the inspected node at Step 4 is longer than the next substring in the key, a new intermediate node should be inserted above the inspected one. As the algorithm proceeds, the first thread (becoming the helper thread 300) that traverses the path through Step 3, Step 4, and Step 5 opens the node's delegation queue 103, and when subsequent threads 201 follow this path, each with a different prefix, they enqueue their insert operation and the argument on the queue. The details of the delegation processing from the side of the helper thread 300 and the delegating thread 201 are given later. The delegating thread 201 can detach after the enqueuing, while the helper thread 300 builds a subtree corresponding to the delegated leaves and atomically updates the tree with this subtree.
FIG. 7 shows the details of the delegation procedure for both the helper thread 300 and the delegating threads 201, as a common template for all the 'Delegate' steps in FIG. 6. At Step 1, the node's delegation queue 103 assignment is tested. If it is already opened, the delegating thread 201 attempts (at Step 3) to enqueue its operation and operands. If the queue closes meanwhile, the thread 201 retries the enqueue until it succeeds or becomes the helper thread 300. In FIG. 7, after the delegating thread 201 enqueues, it detaches, i.e., continues on. In other scenarios the thread 201 can pass a "future", i.e. a location for the returned result and a flag, to be set by the helper thread 300 when the thread's 201 request processing is completed.
At Step 1, when the thread 201 succeeds in swapping the node's queue pointer 102 with its own queue 103, it becomes the helper thread 300 and moves to Step 2. It then allocates a new node 101 with predetermined capacity, replicates the node 101, and performs its own operation on the new node 101. Recall that, for Steps 2 and 5 in FIG. 6, the helper thread 300 attaches the node 101 and the leaf as children 501 of the new node 101, adjusting the new node's and the current node's prefixes. If the queue at Step 4 is empty, or the helper thread 300 has already processed a fixed number of batched requests, it finishes at Step 5 by closing the queue 103 and atomically updating the tree. A predetermined fixed number of requests that the helper thread 300 can process is maintained to avoid the helper's 300 starvation. At Step 6, the helper thread 300 reads all the requests in the queue 103, which constitute an array of leaves to be inserted, and can utilize the knowledge of the semantics of the algorithm for combining. E.g., it can order the prefixes of the leaves from the queue 103 in ascending order, such that it needs to build the subtree traversing it just once. Additional optimizations are made possible concerning memory copying and the use of Single Instruction Multiple Data (SIMD) instructions. Compiler loop optimizations are effective at this combining step, as essentially the same computation is performed in the combining loop. At the end of combining, the helper can adjust any of the allocated storage at the new node if more than needed was allocated in the first place. At last, it returns to Step 4 to check for the termination conditions.
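A compressed sketch of this template follows, reusing the earlier illustrative types: the retry loop corresponds to Steps 1 and 3, and the ascending ordering of the batched keys to the combining at Step 6. delegate_or_help and sort_batch are assumed names, and a real enqueue must be made multi-producer-safe and synchronized with queue closing:

```cpp
#include <algorithm>
#include <vector>

// Delegating side: enqueue on an open queue, retrying if the queue closes
// meanwhile, until the enqueue succeeds or the thread becomes the helper.
void delegate_or_help(Node& node, const Request& r, DelegationQueue* my_queue) {
    for (;;) {
        DelegationQueue* q = node.queue_ptr.load(std::memory_order_acquire);
        if (q == nullptr) {
            DelegationQueue* expected = nullptr;
            if (node.queue_ptr.compare_exchange_strong(expected, my_queue)) {
                // Became the helper: replicate the node, perform the own
                // operation, then combine batches as sketched earlier.
                return;
            }
        } else if (!q->closed.load(std::memory_order_acquire)) {
            q->requests.push_back(r);  // sketch only: must atomically fail
            return;                    // if the queue closed meanwhile
        }
        // Queue was closed meanwhile: retry (re-reading the pointer, which
        // the helper resets when it installs the new node).
    }
}

// Helper side, Step 6: order the delegated insert keys ascending so the
// subtree can be built traversing the batch just once.
void sort_batch(std::vector<Request>& batch) {
    std::sort(batch.begin(), batch.end(),
              [](const Request& a, const Request& b) { return a.key < b.key; });
}
```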
Note that, as the new node is replaced atomically, the inserting threads 201 follow invariant child pointers 500 until they find the node 101 or leaf to enqueue on. That is, when a node 101 has a child pointer 500, and has a delegation queue 103 with requests that the helper thread 300 is processing, an incoming thread 201 can safely follow this child pointer 500, as the delegated requests address disjoint paths in the tree.
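A sketch of such a descent over the hypothetical Node type above; prefix (path-compression) handling is left out, since only the child-pointer invariant matters here.

```cpp
// Descend one key byte per level; safe to run while a helper higher up
// the path is still processing its delegation queue, because a child
// pointer only ever changes by an atomic install of a fully built node.
inline Node* descend(Node* root, const unsigned char* key, std::size_t len) {
    Node* n = root;
    for (std::size_t depth = 0; depth < len; ++depth) {
        Node* next = n->child[key[depth]].load(std::memory_order_acquire);
        if (next == nullptr) break;  // deepest node: enqueue/operate here
        n = next;
    }
    return n;
}
```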
In some applications, lookups that return slightly stale data can be tolerated. Such lookups are trivially supported when the required key is being processed by the helper thread 300 at the time the lookup thread 201 encounters the node 101 and does not find the key. Otherwise, if serialized lookups and inserts are required, the lookup operation is delegated to the helper thread 300, and its result is returned through the associated future. The helper thread 300 may need to match all the inserts enqueued prior to the lookup against the lookup key. If the partial subtree corresponding to these inserts has already been built, the matching is the same as a key lookup in the tree. Alternatively, the lookup key can be compared to each of the prior insert keys. The design of the helper thread 300 thus has several choices and tradeoffs to consider in building the subtree while answering serialized lookup requests. Serialized scans are supported using a similar principle: a sub-range scan predicate and the corresponding future offset are computed for each delegated request. A range scan can thus take advantage of the concurrency of multiple helper threads 300 computing the scan result.
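The future mechanism can be sketched as follows, building on delegate_once() and descend() from the earlier listings; the spin-wait and the direct-lookup placeholder are simplifications, and all names remain hypothetical.

```cpp
// Issue a serialized lookup through a "future": a result slot plus a
// completion flag, both filled in by the helper after it has matched the
// key against the inserts batched ahead of the lookup.
inline bool delegated_lookup(Node* root, const unsigned char* key,
                             std::size_t len, void*& out,
                             DelegationQueue* own) {
    std::atomic<bool> done{false};
    void* result = nullptr;
    Request r{/*op=*/1, key, len, &result, &done};
    for (;;) {
        Node* n = descend(root, key, len);
        switch (delegate_once(n, r, own)) {
        case DelegateResult::kDelegated:
            while (!done.load(std::memory_order_acquire)) { /* spin/yield */ }
            out = result;
            return out != nullptr;
        case DelegateResult::kBecameHelper:
            // Uncontended: perform the lookup directly on the node (not
            // shown) and close the queue again before returning.
            n->queue.store(nullptr, std::memory_order_release);
            out = nullptr;               // placeholder for the direct path
            return false;
        case DelegateResult::kRetry:
            break;                       // queue closed or race: re-descend
        }
    }
}
```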
FIG. 8 shows a method 800 for generating a data structure 100 for concurrent access according to an embodiment of the present invention. In particular, the method 800 generates a data structure 100 as shown in FIG. 1. The method 800 comprises a step 801 of providing the data structure 100 with a plurality of nodes 101. Further, the method 800 comprises a step 802 of providing the data structure 100 at each of multiple nodes 101 with a delegation queue pointer 102 and a delegation queue 103, wherein in particular a delegation queue pointer 102 of a node 101 points to the delegation queue 103 of the node 101, if the node 101 is being updated.
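In terms of the hypothetical Node type sketched above, the generating method reduces to allocating nodes whose delegation queue pointers start out NULL, i.e. no node of a freshly generated structure is marked as being updated:

```cpp
// Method 800 in miniature: steps 801/802 with the assumed Node type.
inline Node* make_tree() {
    return new Node{};  // queue == nullptr, all child pointers null
}
```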
FIG. 9 shows a method 900 for accessing a data structure 100, in particular as shown in FIG. 1, according to an embodiment of the present invention. The method 900 may be carried out by a device 200 as shown in FIG. 2. The method 900 comprises, when accessing with a thread 201 a node 101 of the data structure 100 that is currently not accessed by another thread 201, a step 901 of atomically updating the delegation queue pointer 102 of the node 101 to point to the delegation queue 103 of the node 101. Further, the method 900 comprises a step 902 of reading at least one operation request of another thread and/or device from the delegation queue 103 of the node 101. Further, the method 900 comprises a step 903 of updating and/or accessing the node 101 according to the combined operation, which comprises the request of the helper thread 300 and any other requests from the delegation queue 103.
The present invention has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art when practicing the claimed invention, from a study of the drawings, this disclosure and the independent claims. In the claims as well as in the description the word "comprising" does not exclude other elements or steps and the indefinite article "a" or "an" does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

Claims

1. Data structure (100) for concurrent access, the data structure (100) comprising a plurality of nodes (101), wherein
the data structure (100) is provided at each of multiple nodes (101) with a delegation queue pointer (102) and a delegation queue (103).
2. Data structure (100) according to claim 1, wherein
a delegation queue pointer (102) of a node (101) points to the delegation queue (103) of the node (101), if the node is being updated.
3. Data structure (100) according to claim 1 or 2, wherein
a delegation queue pointer (102) is set to NULL, if the related node (101) is currently not accessed.
4. Data structure (100) according to one of claims 1 to 3, wherein
a head part (303) of each delegation queue (103) is accessible at the same time by only one thread (300) of a device (200) suitable for accessing the data structure (100).
5. Data structure (100) according to one of claims 1 to 4, wherein
the data structure (100) is provided at each root node or internal node with one or more pointers (500) pointing to one or more child nodes (501), a child node (501) being either an internal node or a leaf node.
6. Data structure (100) according to claim 5, wherein
the data structure (100) is provided at each of multiple child nodes (501) with a delegation queue pointer (102) and a delegation queue (103),
wherein a delegation queue pointer (102) of a child node (501) points to the delegation queue (103) of the child node (501), if the child node (501) is being updated.
7. Device (200):
configured to run a plurality of threads (201) for accessing a data structure (100) according to one of the claims 1 to 6, the device (200) being configured to, when accessing with a thread (201) of the plurality of threads a node (101) of the data structure (100) that is currently not accessed by another thread (201') of the plurality of threads and/or by threads run by another device (200),
atomically update the delegation queue pointer (102) of the node (101) with the thread (201) to point to the delegation queue (103) of the node (101).
8. Device (200) according to claim 7, wherein
the thread (201) becomes a helper thread (300), if it updates the delegation queue pointer (102), and is thus allowed to access the node (101) to perform its update operation on the node (101).
9. Device (200) according to claim 7 or 8, wherein
the thread (201) becomes a helper thread (300), if it updates the delegation queue pointer (102), and is thus allowed to access the delegation queue (103) and access the node (101) on behalf of at least one other thread (201) of the device (200) and/or a thread run by another device (200).
10. Device (200) according to claim 9, configured to run the thread (201) to:
read an operation request of at least one other thread (201) of the plurality of threads and/or of threads run by another device (200) from the delegation queue (103) of the node (101), to which the delegation queue pointer (102) of the node (101) points, and access the node (101) based on the at least one operation request.
11. Device (200) according to claim 10, configured to run the thread (201) to:
read multiple operation requests of other threads (201') of the plurality of threads and/or of threads run by another device (200) from the delegation queue (103),
combine the multiple operation requests into at least one combined operation, carry out the combined operation on a replication of the node (101),
atomically install the replication of the node (101) into the data structure (100), and
close the delegation queue (103).
12. Device (200) according to claim 11, configured to run the thread (201) to
replace the node (101) with the replication of the node on which the combined operation has been carried out.
13. Device (200) according to claim 10 or 11, configured to, when an access request or a combined operation of access requests from the delegation queue (103) involves another node (101) of the data structure, particularly a child node (501) of the node (101) of a data structure (100) according to claim 5 or 6,
determine whether another thread (201') of the plurality of threads run by the device (200) or by another device (200) currently accesses the other node (101), and
delegate the operation request or the combined operation to the other node (101).
14. Device (200) according to one of claims 7 to 13, configured to, when accessing a node (101) currently accessed by another thread (201') of the plurality of threads run by the device (200) or by another device (200), with the thread (201),
generate an operation request, and
provide the operation request into the delegation queue (103), to which the delegation queue pointer (102) of the node (101) points.
15. Device (200) according to claim 14, configured to, with the thread (201),
proceed further, if no result of the operation request is required, and/or provide an operation result placeholder and an operation completed flag, and wait for the operation completed flag to indicate a completed operation request, if a result of the operation request is required.
16. Method (800) for generating a data structure (100) for concurrent access, the method (800) comprising
providing (801) the data structure (100) with a plurality of nodes (101), and providing (802) the data structure (100) at each of multiple nodes (101) with a delegation queue pointer (102) and a delegation queue (103),
wherein particularly a delegation queue pointer (102) of a node (101) points to the delegation queue (103) of the node (101), if the node (101) is being updated.
17. Method (900) for accessing a data structure (100) according to one of the claims 1 to 6 with at least one thread (201), the method (900) comprising, when accessing with a thread (201) a node (101) of the data structure (100) that is currently not accessed by another thread (201'), atomically updating (901) the delegation queue pointer (102) of the node (101) to point to the delegation queue (103) of the node (101),
reading (902) at least one operation request of the other thread (201') from the delegation queue (103) of the node (101), and
updating and/or accessing (903) the node (101) according to the combined operation, which comprises the request of the helper thread (300) and any other requests from the delegation queue (103).
PCT/EP2018/065111 2018-06-08 2018-06-08 Concurrent datastructure and device to access it WO2019233590A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2018/065111 WO2019233590A1 (en) 2018-06-08 2018-06-08 Concurrent datastructure and device to access it

Publications (1)

Publication Number Publication Date
WO2019233590A1 2019-12-12

Family ID=62620841

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/065111 WO2019233590A1 (en) 2018-06-08 2018-06-08 Concurrent datastructure and device to access it

Country Status (1)

Country Link
WO (1) WO2019233590A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060090044A1 (en) * 2004-10-21 2006-04-27 International Business Machines Corporation Memory controller and method for optimized read/modify/write performance
US20060230411A1 (en) * 2005-04-12 2006-10-12 Microsoft Corporation Resource accessing with locking

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DAVID KLAFTENEGGER ET AL: "Queue Delegation Locking", 13 February 2014 (2014-02-13), pages 1 - 10, XP055559077, Retrieved from the Internet <URL:http://arachne.it.uu.se/research/group/languages/software/qd_lock_lib/paper.pdf> [retrieved on 20190219] *
DAVID KLAFTENEGGER ET AL: "Queue Delegation Locking", IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS., vol. 29, no. 3, 1 March 2018 (2018-03-01), US, pages 687 - 704, XP055559072, ISSN: 1045-9219, DOI: 10.1109/TPDS.2017.2767046 *
Y. OYAMA ET AL: "EXECUTING PARALLEL PROGRAMS WITH SYNCHRONIZATION BOTTLENECKS EFFICIENTLY", PROC. OF THE INTERNATIONAL WORKSHOP ON PARALLEL AND DISTRIBUTED COMPUTING FOR SYMBOLIC AND IRREGULAR APPLICATIONS, 1 January 1999 (1999-01-01), pages 182 - 204, XP055559066, Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.159.183&rep=rep1&type=pdf> [retrieved on 20190219] *

Similar Documents

Publication Publication Date Title
EP3304298B1 (en) Controlling atomic updates of indexes using hardware transactional memory
JP5137971B2 (en) Method and system for achieving both locking fairness and locking performance with spin lock
US9454560B2 (en) Cache-conscious concurrency control scheme for database systems
US6360220B1 (en) Lock-free methods and systems for accessing and storing information in an indexed computer data structure having modifiable entries
US7490214B2 (en) Relocating data from a source page to a target page by marking transaction table entries valid or invalid based on mappings to virtual pages in kernel virtual memory address space
US11073995B2 (en) Implementing scalable memory allocation using identifiers that return a succinct pointer representation
Zhang et al. A lock-free priority queue design based on multi-dimensional linked lists
KR100714766B1 (en) Method of prefetching non-contiguous data structures
US20070288718A1 (en) Relocating page tables
Peters et al. Fast in‐place, comparison‐based sorting with CUDA: a study with bitonic sort
TW201227301A (en) Real address accessing in a coprocessor executing on behalf of an unprivileged process
WO2002101557A1 (en) Cache-conscious concurrency control scheme for database systems
Jiang et al. Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation
Xie et al. A comprehensive performance evaluation of modern in-memory indices
US8914601B1 (en) Systems and methods for a fast interconnect table
Li et al. Phast: Hierarchical concurrent log-free skip list for persistent memory
Pandey et al. IcebergHT: High performance PMEM hash tables through stability and low associativity
Chen et al. Concurrent hash tables on multicore machines: Comparison, evaluation and implications
Wang et al. Circ-Tree: A B+-Tree variant with circular design for persistent memory
Boyd-Wickizer Optimizing communication bottlenecks in multiprocessor operating system kernels
Mohapatra Authentication of sub-numa clustering effect on intel skylake for memory latency and bandwidth
US20220269675A1 (en) Hash-based data structure
Tinoco et al. EnigMap: External-Memory Oblivious Map for Secure Enclaves
WO2019233590A1 (en) Concurrent datastructure and device to access it
Schmeißer et al. B2-Tree: Page-Based String Indexing in Concurrent Environments

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18731387; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: pct application non-entry in european phase (Ref document number: 18731387; Country of ref document: EP; Kind code of ref document: A1)