CN109407979A - Design and implementation method of a multithreaded persistent B+ tree data structure - Google Patents
Design and implementation method of a multithreaded persistent B+ tree data structure
- Publication number
- CN109407979A CN109407979A CN201811129623.3A CN201811129623A CN109407979A CN 109407979 A CN109407979 A CN 109407979A CN 201811129623 A CN201811129623 A CN 201811129623A CN 109407979 A CN109407979 A CN 109407979A
- Authority
- CN
- China
- Prior art keywords
- node
- tree
- persistence
- linked list
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0238—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
- G06F12/0246—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0643—Management of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a design and implementation method for a multithreaded persistent B+ tree data structure. The method includes: introducing a layer of shadow leaf nodes based on a linked-list structure into a preset B+ tree; storing the linked-list-based leaf nodes in NVM through a data layout strategy based on hybrid main memory, generating a list layer based on a linked-list structure, while the remaining parts of the index data structure are stored in DRAM, generating an array-based tree layer, so that the layered design of a volatile tree structure over a persistent list structure avoids the persistence overhead of balancing and sorting; and designing an embedded fine-grained locking mechanism and an optimistic write mechanism for concurrency control between read and write operations and between write operations, respectively. The method uses a hybrid main-memory data structure combining non-volatile memory and volatile memory, increases the concurrency of data retrieval while providing durable data storage, solves the problem of amplified lock overhead, and accelerates recovery of the data structure after a system failure.
Description
Technical field
The present invention relates to the technical field of non-volatile main memory storage, and in particular to a design and implementation method for a multithreaded persistent B+ tree data structure.
Background technique
Non-volatile main memory (Non-Volatile Memory, NVM) is a new class of storage medium with such advantages as byte addressability, non-volatility across power loss, high storage density, no need for dynamic refresh, and low static power consumption. It also has drawbacks, including asymmetric read/write performance, limited write endurance, and relatively high write energy. Its emergence brings enormous new opportunities and challenges to the storage field and has triggered a research boom in industry and academia on heterogeneous hybrid memory architectures and their system software. Non-volatile memory has many new implications for computer architecture, system software, software libraries, and application programs. Non-volatile memory devices can be combined with dynamic random access memory (Dynamic Random Access Memory, DRAM) devices to form a hybrid main memory, in which an application keeps transient data in DRAM and places data that must be durably stored in NVM. The emergence of non-volatile main memory has prompted researchers to design memory-resident storage systems, including file systems and database systems. The index structure is a key module of such a storage system and largely determines its performance. In a storage system based on non-volatile main memory, the index structure must simultaneously guarantee efficient consistency and multithreaded scalability, which poses new challenges to index designers.
In traditional index data structures such as the B+ tree, sorting and rebalancing account for a large proportion of the total overhead of tree operations. Worse, persistence delays further lengthen the time during which a tree operation holds a lock, so persistent B+ trees in the related art face severe performance problems under multithreaded workloads: as the persistence delay of non-volatile main memory increases, the lock-holding time of tree operations grows approximately linearly, and B+ tree performance degrades dramatically.
Summary of the invention
The present invention aims to solve at least one of the technical problems in the related art.
To this end, an object of the present invention is to provide a design and implementation method for a multithreaded persistent B+ tree data structure. The method uses a hybrid main-memory data structure combining non-volatile memory and volatile memory, increases the concurrency of data retrieval while providing durable data storage, solves the problem of amplified lock overhead, and accelerates recovery of the data structure after a system failure.
To achieve the above objects, an embodiment of the present invention proposes a design and implementation method for a multithreaded persistent B+ tree data structure, comprising the following steps: introducing a layer of shadow leaf nodes based on a linked-list structure into a preset B+ tree; storing the linked-list-based leaf nodes in NVM through a data layout strategy based on hybrid main memory, generating a list layer based on a linked-list structure, and storing the remaining parts of the index data structure in DRAM, generating an array-based tree layer, so that the layered design of a volatile tree structure over a persistent list structure avoids the persistence overhead of balancing and sorting; and designing an embedded fine-grained locking mechanism and an optimistic write mechanism for concurrency control between read and write operations and between write operations, respectively.
With the design and implementation method of the embodiment of the present invention, the hybrid main-memory data structure of non-volatile memory and volatile memory yields balanced search operations with good spatial locality while effectively reducing expensive persistence operations; the embedded fine-grained lock and optimistic write mechanism solve the problem of amplified lock overhead; and the multithreaded recovery mechanism and persistent garbage collector support consistency management of non-volatile main memory and accelerate recovery of the data structure.
In addition, the multithreaded persistent B+ tree design and implementation method according to the above embodiment of the present invention may further have the following additional technical features:
Further, in one embodiment of the invention, the embedded fine-grained locking mechanism designs an update flag bit and a deletion flag bit for each list node, removing persistence delays that do not satisfy a preset condition from the version-validation path of read operations; the optimistic write mechanism separates the concurrency control of tree nodes from that of list nodes, removing persistence delays from the locking path at tree-node granularity.
Further, in one embodiment of the invention, in the array-based tree layer in DRAM, each node can hold a preset number of key-value pairs, wherein each key-value pair of a tree node points to a tree node in the next level or to a list node. When the number of key-value pairs in a tree node exceeds or falls below a preset threshold, the tree node performs a split or a merge operation, inserting a key-value pair into, or deleting one from, the tree node one level above.
Further, in one embodiment of the invention, the list layer based on a linked-list structure is stored in non-volatile main memory, wherein the list layer is an ordered linked list; each list node stores a single key-value pair and is connected by a right pointer, and CPU atomic operations are used to guarantee atomic and consistent insert/delete/update operations.
Further, in one embodiment of the invention, each tree operation searches from the root node until the corresponding leaf node is found, wherein before any tree node is accessed, a prefetch instruction is executed to read the entire node into the CPU cache, masking the node's memory access latency; the key array and the value array are stored in separate regions of main memory so that only the key array is prefetched, reducing the amount of data read by each prefetch.
Optionally, for key arrays below a preset threshold size, a linear search operation can replace binary search; the linear search is performed in main memory and accelerated with SIMD instructions. Each key-value pair is given a 1-byte fingerprint, each fingerprint being a hash of the corresponding key, and the fingerprint array is stored at the head of the leaf node.
Further, in one embodiment of the invention, conflicts between read and write operations are handled with a version-number-based concurrency control mechanism, in which each tree node carries a version counter; the version number is incremented whenever the tree node's state changes. An insert, delete, or update operation acquires the lock before modifying the tree node and marks the corresponding version number dirty; after the operation completes and the version number is incremented, the lock on the tree node is released. If a read operation finds the version number modified or the node locked, it repeats the process until version validation succeeds. Conflicts between write operations are handled with a locking mechanism at tree-node granularity: tree-node locks ensure that write operations modifying different tree nodes execute simultaneously; leaf nodes are connected by right pointers, leaf splits are constrained to proceed only from left to right, and tree-node locks are acquired bottom-up, with the lock of the tree node one level above acquired only when a node splits or merges. List nodes and the key-value pairs of leaf nodes are in one-to-one correspondence, so a write operation may modify a list node only after it has acquired the lock of the corresponding leaf node in the tree layer.
Further, in one embodiment of the invention, before list nodes are allocated or freed, a block of non-volatile main memory is first obtained from the system memory allocator, and the address and length of that block are persisted into a persistent list; the allocated region is divided into memory blocks of a preset size and managed through a volatile free-block list, which serves allocation and release of list-layer memory. During system recovery, the recovery threads scan the metadata in the persistent list and the list-layer nodes, determine which memory blocks are in use and which are not, and thereby rebuild the volatile free-block list.
Further, in one embodiment of the invention, the method may also include: maintaining one global epoch counter and three garbage-collection lists to correctly reclaim freed tree nodes and list nodes, wherein before executing an operation a worker thread first registers the current epoch number, and each deleted tree or list node is placed into the garbage-collection list corresponding to the current global epoch number.
Further, in one embodiment of the invention, the method further includes: on a normal system shutdown, persisting all volatile inner tree nodes and the garbage collector to a preset location in non-volatile main memory; after the system restarts, the recovery threads copy the volatile inner tree nodes and the garbage collector from non-volatile main memory back into DRAM.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of the design and implementation method for a multithreaded persistent B+ tree data structure according to one embodiment of the invention;
Fig. 2 is a schematic diagram of the linked-list-based multithreaded persistent B+ tree structure according to one embodiment of the invention;
Fig. 3 is a diagram of the optimization strategies for read-write conflicts and write-write conflicts according to one embodiment of the invention;
Fig. 4 is an analysis chart of the limited multithreaded scalability of persistent B+ trees according to one embodiment of the invention.
Specific embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote, throughout, identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
The design and implementation method for a multithreaded persistent B+ tree data structure proposed according to embodiments of the present invention is described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the design and implementation method for a multithreaded persistent B+ tree data structure of one embodiment of the invention. As shown in Fig. 1, the method includes the following steps.
In step S101, a layer of shadow leaf nodes based on a linked-list structure is introduced into a preset B+ tree.
Further, in one embodiment of the invention, each tree operation searches from the root node until the corresponding leaf node is found, wherein before any tree node is accessed, a prefetch instruction is executed to read the entire node into the CPU cache, masking the node's memory access latency; the key array and the value array are stored in separate regions of main memory so that only the key array is prefetched, reducing the amount of data read by each prefetch.
In step S102, the linked-list-based leaf nodes are stored in NVM through a data layout strategy based on hybrid main memory, generating a list layer based on a linked-list structure, and the remaining parts of the index data structure are stored in DRAM, generating an array-based tree layer, so that the layered design of a volatile tree structure over a persistent list structure avoids the persistence overhead of balancing and sorting.
Further, in one embodiment of the invention, in the array-based tree layer in DRAM, each node can hold a preset number of key-value pairs, wherein each key-value pair of a tree node points to a tree node in the next level or to a list node. When the number of key-value pairs in a tree node exceeds or falls below a preset threshold, the tree node performs a split or a merge operation, inserting a key-value pair into, or deleting one from, the tree node one level above.
Further, in one embodiment of the invention, the list layer based on a linked-list structure is stored in non-volatile main memory, wherein the list layer is an ordered linked list; each list node stores a single key-value pair and is connected by a right pointer, and CPU atomic operations are used to guarantee atomic and consistent insert/delete/update operations.
Optionally, for key arrays below a preset threshold size, a linear search operation can replace binary search; the linear search is performed in main memory and accelerated with SIMD instructions. Each key-value pair is given a 1-byte fingerprint, each fingerprint being a hash of the corresponding key, and the fingerprint array is stored at the head of the leaf node.
In step S103, an embedded fine-grained locking mechanism and an optimistic write mechanism are designed for concurrency control between read and write operations and between write operations, respectively.
Further, in one embodiment of the invention, the embedded fine-grained locking mechanism designs an update flag bit and a deletion flag bit for each list node, removing persistence delays that do not satisfy a preset condition from the version-validation path of read operations; the optimistic write mechanism separates the concurrency control of tree nodes from that of list nodes, removing persistence delays from the locking path at tree-node granularity.
Further, in one embodiment of the invention, conflicts between read and write operations are handled with a version-number-based concurrency control mechanism, in which each tree node carries a version counter; the version number is incremented whenever the tree node's state changes. An insert, delete, or update operation acquires the lock before modifying the tree node and marks the corresponding version number dirty; after the operation completes and the version number is incremented, the lock on the tree node is released. If a read operation finds the version number modified or the node locked, it repeats the process until version validation succeeds. Conflicts between write operations are handled with a locking mechanism at tree-node granularity: tree-node locks ensure that write operations modifying different tree nodes execute simultaneously; leaf nodes are connected by right pointers, leaf splits are constrained to proceed only from left to right, and tree-node locks are acquired bottom-up, with the lock of the tree node one level above acquired only when a node splits or merges. List nodes and the key-value pairs of leaf nodes are in one-to-one correspondence, so a write operation may modify a list node only after it has acquired the lock of the corresponding leaf node in the tree layer.
Further, in one embodiment of the invention, before list nodes are allocated or freed, a block of non-volatile main memory is first obtained from the system memory allocator, and the address and length of that block are persisted into a persistent list; the allocated region is divided into memory blocks of a preset size and managed through a volatile free-block list, which serves allocation and release of list-layer memory. During system recovery, the recovery threads scan the metadata in the persistent list and the list-layer nodes, determine which memory blocks are in use and which are not, and thereby rebuild the volatile free-block list.
Further, in one embodiment of the invention, the method may also include: maintaining one global epoch counter and three garbage-collection lists to correctly reclaim freed tree nodes and list nodes, wherein before executing an operation a worker thread first registers the current epoch number, and each deleted tree or list node is placed into the garbage-collection list corresponding to the current global epoch number.
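The epoch-based reclamation with one global counter and three garbage lists can be sketched as a single-threaded simulation. This is an illustrative assumption, not the patented implementation: the modulo-3 rotation of the lists and the `active` reader bookkeeping are simplifications chosen for this sketch.

```python
# Sketch of epoch-based garbage collection: a global epoch counter plus three
# garbage lists. A retired node is freed only after the epoch has advanced far
# enough that no registered thread can still hold a reference to it.

class EpochGC:
    EPOCHS = 3

    def __init__(self):
        self.global_epoch = 0
        self.garbage = [[] for _ in range(self.EPOCHS)]  # one list per epoch slot
        self.active = [0] * self.EPOCHS   # threads registered in each epoch slot

    def enter(self):
        """A worker thread registers the current epoch before an operation."""
        e = self.global_epoch
        self.active[e % self.EPOCHS] += 1
        return e

    def leave(self, e):
        """The worker thread deregisters when its operation completes."""
        self.active[e % self.EPOCHS] -= 1

    def retire(self, node):
        """A deleted tree/list node goes into the current epoch's garbage list."""
        self.garbage[self.global_epoch % self.EPOCHS].append(node)

    def advance(self):
        """Advance the global epoch and free the list two epochs behind,
        provided no thread is still registered in that slot."""
        self.global_epoch += 1
        stale = (self.global_epoch + 1) % self.EPOCHS  # = (epoch - 2) mod 3
        if self.active[stale] == 0:
            freed = self.garbage[stale]
            self.garbage[stale] = []
            return freed
        return []
```

In a real multithreaded setting the counter updates would need to be atomic; the three-slot rotation is what lets a node retired in epoch *e* be reclaimed once the global epoch reaches *e + 2*.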
Further, in one embodiment of the invention, the method further includes: on a normal system shutdown, persisting all volatile inner tree nodes and the garbage collector to a preset location in non-volatile main memory; after the system restarts, the recovery threads copy the volatile inner tree nodes and the garbage collector from non-volatile main memory back into DRAM.
The embodiment of the present invention thus proposes a hybrid main-memory data structure combining non-volatile memory and volatile memory: a traditional tree-shaped data structure is used in volatile memory, and a chained (linked-list) data structure is used in non-volatile memory. The tree-shaped structure increases the concurrency of data retrieval, while the linked structure provides durable data storage on the non-volatile medium; the tree offers balanced search operations with good spatial locality, and the linked structure effectively reduces expensive persistence operations. An embedded fine-grained lock and an optimistic write mechanism are also designed for this data structure, solving the problem of amplified lock overhead, and a multithreaded recovery mechanism together with a persistent garbage collector supports consistency management of non-volatile main memory and accelerates recovery of the data structure.
Specifically, the embodiment of the present invention proposes a data structure optimized for a hybrid main-memory storage system built from non-volatile memory (NVM) and volatile memory (DRAM). The optimized data structure mainly has the following characteristics: it comprises two levels, where the first level is an array-based tree layer (Tree Layer) stored in DRAM, and the second level is a list layer (List Layer) based on a linked-list structure stored in NVM. The list layer effectively reduces the persistence operations of the data structure, and the tree layer provides balanced search operations with good spatial locality.
The optimized data structure specifically includes the following features:
(1) An array-based tree layer located in DRAM, in which each node can hold a fixed number of key-value pairs; the sorted key-value pairs are stored in a contiguous region of main memory to guarantee good spatial locality and support tree operations with O(log n) time complexity. Each key-value pair of a tree node points to a tree node in the next level or to a list node. If the number of key-value pairs in a tree node exceeds or falls below a specific threshold, the tree node performs a split or a merge operation, inserting a key-value pair into, or deleting one from, the tree node one level above, without generating any persistence overhead. Because the tree layer serves only to accelerate searches over the list layer, the volatile tree layer can be rebuilt from the persistent list layer after a system failure; with this method, balancing and sorting occur only in DRAM, so no excessive persistent-write overhead is introduced and the performance of the index structure is effectively improved.
(2) A list layer located in NVM; the list layer is stored only in non-volatile main memory. The list layer is an ordered linked list in which each list node stores a single key-value pair and is connected by a right pointer; CPU atomic operations (the aligned 64-bit atomic operations supported by the x86 platform) guarantee atomic and consistent insert/delete/update operations. Specifically, taking an insert as an example, after the correct insertion position is found, only two persistence operations are needed to keep the list layer ordered and consistent: the first persists the newly created list node (whose pointer already references its successor), and the second persists the pointer of the preceding list node (now referencing the newly created node). If a system failure occurs between the two operations, the new node has not yet been linked into the list layer, so the consistency of the list layer is unaffected; for a node that was not successfully inserted, the persistent garbage collector prevents the loss of that block of main memory. Because the list layer can accommodate an unbounded number of list nodes, balancing operations are eliminated entirely.
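The two persistence operations of the insert example above can be sketched as follows. This is a hedged simulation, not the patented implementation: `persist()` is a stand-in for a real cache-line flush plus fence (e.g. CLWB followed by SFENCE on x86), and the class names are illustrative.

```python
# Sketch of a crash-consistent insert into the ordered list layer: the new node
# is fully built and persisted *before* it becomes reachable, so a crash
# between the two persist steps leaves the list unchanged (the orphaned node
# is reclaimed later by the garbage collector).

class ListNode:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None          # right pointer to the successor node

def persist(obj):
    """Placeholder for flushing the object's cache lines to NVM."""
    pass

class ListLayer:
    def __init__(self):
        self.head = ListNode(float("-inf"), None)   # sentinel head node

    def insert(self, key, value):
        # 1. Find the predecessor of the insertion position.
        prev = self.head
        while prev.right is not None and prev.right.key < key:
            prev = prev.right
        # 2. Build the new node already pointing at its successor; persist it.
        node = ListNode(key, value)
        node.right = prev.right
        persist(node)              # first persistence operation
        # 3. Redirect the predecessor's right pointer (an aligned 8-byte store
        #    on real hardware, hence atomic); persist the pointer.
        prev.right = node
        persist(prev)              # second persistence operation

    def keys(self):
        out, cur = [], self.head.right
        while cur is not None:
            out.append(cur.key)
            cur = cur.right
        return out
```

The ordering is the whole point of the design choice: a reader (or a recovery scan) either sees the old list or the new node in full, never a half-written node.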
(3) Every tree operation must search from the root node until the corresponding leaf node is found, reading all tree nodes on the search path, so the memory access latency of tree nodes becomes the main factor affecting tree-layer performance. Before accessing a tree node, the embodiment of the present invention executes a prefetch instruction that reads the entire node into the CPU cache, masking the node's memory access latency; in addition, the key array and the value array are stored in separate regions of main memory, so that only the key array is prefetched, reducing the amount of data read by each prefetch.
(4) For the tree-shaped data structure located in DRAM, key arrays below a certain threshold size can use a linear search operation in place of binary search. Further, the linear search is performed in main memory and accelerated with SIMD instructions: for a search operation, SIMD instructions compare the target key against several different keys simultaneously, and a similar strategy is used in sorting and balancing to move several entries at once, improving index performance.
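The size-threshold dispatch between linear and binary search can be sketched as below. Python has no SIMD intrinsics, so this sketch shows only the dispatch logic with a plain scan; the threshold value is an assumed tuning constant, not taken from the patent.

```python
# Sketch: below a size threshold, a linear scan of the sorted key array
# replaces binary search. Small linear scans are cache-friendly and branch-
# predictable, and on real hardware the inner comparisons would be performed
# several keys at a time with SIMD instructions.

import bisect

LINEAR_SEARCH_THRESHOLD = 16   # assumed tuning constant for illustration

def find_slot(keys, target):
    """Return the index of the first key >= target in the sorted array."""
    if len(keys) <= LINEAR_SEARCH_THRESHOLD:
        for i, k in enumerate(keys):      # linear scan for small arrays
            if k >= target:
                return i
        return len(keys)
    return bisect.bisect_left(keys, target)  # binary search for large arrays
```

Both branches return the same answer; the threshold only trades the O(log n) comparison count of binary search for the better constant factors of a short scan.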
(5) On the leaf nodes of the data structure, each key-value pair of each leaf node is given a 1-byte fingerprint, where each fingerprint is the hash of the corresponding key, and the fingerprint array is stored at the head of the leaf node. During a lookup, the full key is compared only when the hash of the target key equals some fingerprint. Because each fingerprint is far smaller than a key, the comparison over the fingerprint array adds little cost while filtering out most full-key comparisons, further improving lookup performance.
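A minimal sketch of fingerprint-filtered lookup in a leaf, assuming a simple 1-byte hash; the class and function names are illustrative, not from the patent.

```python
# Sketch: each key-value pair in a leaf gets a 1-byte fingerprint (a hash of
# the key) stored in a compact array at the head of the leaf. A lookup scans
# the small fingerprint array first and compares full keys only on a hit.

def fingerprint(key):
    return hash(key) & 0xFF            # 1-byte hash of the key (illustrative)

class Leaf:
    def __init__(self):
        self.fps = []      # fingerprint array, kept at the "head" of the leaf
        self.keys = []
        self.vals = []

    def insert(self, key, value):
        self.fps.append(fingerprint(key))
        self.keys.append(key)
        self.vals.append(value)

    def lookup(self, key):
        fp = fingerprint(key)
        for i, f in enumerate(self.fps):       # cheap scan of 1-byte values
            if f == fp and self.keys[i] == key:  # full compare only on a hit
                return self.vals[i]
        return None
```

With 256 possible fingerprint values, a miss is resolved by the byte scan alone in the vast majority of cases, which is why the filter pays for itself.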
Further, the embodiment of the present invention describes the version-number-based concurrency control mechanism of the data structure, whose main content is as follows: conflicts between read and write operations use a version-number-based concurrency control mechanism, while conflicts between write operations use locks at tree-node granularity.
On the one hand, for conflicts between read and write operations, a version counter on each tree node serves as the communication medium between concurrently executing readers and writers, sparing read operations the overhead of acquiring a lock. The version number is incremented whenever the tree node's state changes. For an insert, delete, or update, the lock is applied before the tree node is modified and the version number is marked dirty; after the operation completes, the version number is incremented and the node's lock is released. If a read operation finds the version number modified or the node locked, it repeats the process until version validation succeeds.
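The version-validation protocol for read-write conflicts can be sketched as a minimal simulation; the even/odd encoding of the dirty state and the class name are illustrative assumptions made for this sketch, not details from the patent.

```python
# Sketch of version-number concurrency control on a tree node: a writer locks
# the node and bumps the version around its modification; a reader validates a
# version snapshot instead of locking.

import threading

class VersionedNode:
    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0           # even = stable, odd = dirty (being written)
        self.data = {}

    def write(self, key, value):
        with self._lock:
            self.version += 1      # odd: mark the node dirty
            self.data[key] = value
            self.version += 1      # even again: publish the new state

    def optimistic_read(self, key):
        while True:
            v = self.version
            if v % 2 == 1:         # node is dirty: a writer is active, retry
                continue
            result = self.data.get(key)
            if self.version == v:  # validation: no writer interfered
                return result
```

The reader never touches the lock; it only re-checks the counter, which is what removes lock acquisition from the read path.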
On the other hand, for conflicts between write operations, locks at tree-node granularity ensure that write operations modifying different tree nodes execute simultaneously, because a write operation only needs to hold the lock of the tree node it is about to modify. Each write operation reaches the target leaf node through version validation; leaf nodes are connected by right pointers, and it is stipulated that a leaf node may split only from left to right, which avoids the situation in which a split causes the target key-value pair to become unreachable. Tree-node locks are then applied bottom-up, and the lock of the tree node one level above is applied only when a node splits or merges. Because list nodes and the key-value pairs of leaf nodes are in one-to-one correspondence, a write operation may modify a list node only after it has acquired the lock of the corresponding leaf node in the tree layer; the tree-node concurrency control mechanism thus also resolves concurrency conflicts at the list level.
Further, the embodiment of the present invention proposes a concurrency control mechanism that, through version counters, supports optimistic reads. In the optimistic read mechanism, a read operation takes a snapshot of the current version without locking it, then reads the data and checks the version again; if the version has not changed and is not flagged dirty, the read operation succeeds. Because no lock bit needs to be taken, read concurrency is improved. For write-write conflicts, the data structure applies a version-and-lock word to each tree node. A write operation first locates the node to be written through a top-down read; once the target node is located, the write operation locks it and starts the write and persistence process. Any write operation that requires rebalancing acquires locks from the bottom of the tree upward; through this method, the mechanism requires only the locks of the affected tree nodes rather than a lock on the entire tree. For an insert operation: first, the target node is locked and its version counter is increased before the write and persistence are carried out; second, the write and persistence operations are applied to the leaf node; finally, the version number is incremented again and the lock is released. For a read operation, the lookup and read are executed and the snapshot version is compared with the latest version; if the version is dirty or has been modified, the read operation fails and restarts validation until it succeeds.
Specifically, for read-write conflicts: because a write operation marks the version dirty at the leaf-node layer and only increments the version after the write completes, a long write execution time leads to a high probability of reads being stalled. A read operation, even one reading a different key, must keep retrying until the version number becomes stable, which leads to a high read-abort rate. The persistence latency can, however, be removed from the version-validation critical path. The detailed process relies on the element-wise organization of the linked-list layer to provide element-granularity control. First, the data structure uses an embedded pointer that contains an update bit and a delete bit. Second, the combination of a leaf node's array layer and linked-list layer supports optimistic reads together with element-granularity write locking: a version counter is used for each list node, and the linked-list layer can provide locking for each modified element. On this basis, a list node can perform its persistence under an embedded micro-lock without disturbing read operations on other keys. After the persistence operation, the list node updates the version number in the array node, and the persistence latency is removed from the version-validation critical path. In the embodiment of the present invention, embedded bits must be provided for update and delete operations so that the data structure can indicate that a list node is being modified; after a list node is inserted and persisted, the array layer is updated using the version mechanism. Finally, the data structure clears the embedded bit and unlocks the array node; the embedded bit is used only for scheduling and does not itself need to be persisted. For a delete operation, only the delete bit is set before the memory space is reclaimed, which prevents read operations from accessing a dangling pointer.
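The embedded update/delete bits can be modeled by borrowing the low-order bits of an aligned next-node pointer, so a reader detects an in-flight modification without any extra lock word. The bit positions here are an assumption for the sketch:

```cpp
#include <cassert>
#include <cstdint>

// Low two bits of an aligned pointer used as embedded flags (illustrative).
constexpr uintptr_t UPDATE_BIT = 0x1;   // "being updated"
constexpr uintptr_t DELETE_BIT = 0x2;   // "logically deleted"
constexpr uintptr_t FLAG_MASK  = ~(UPDATE_BIT | DELETE_BIT);

uintptr_t set_update(uintptr_t p)  { return p | UPDATE_BIT; }
uintptr_t set_delete(uintptr_t p)  { return p | DELETE_BIT; }
uintptr_t clear_flags(uintptr_t p) { return p & FLAG_MASK; }
bool is_updating(uintptr_t p)      { return (p & UPDATE_BIT) != 0; }
bool is_deleted(uintptr_t p)       { return (p & DELETE_BIT) != 0; }
```

Because the flags travel inside the pointer word itself, one atomic store or CAS on the pointer publishes both the link and its modification state.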
For write-write conflicts, similarly to read-write conflicts, the persistence overhead inside a write operation delays the release of the write lock. For the linked-list layer beneath the leaf-node layer, the data structure allows concurrent writes to different keys of the same leaf node. To reach such concurrent writes, the work is divided in two: the first part covers newly generated insert nodes that are not yet linked into the list; these nodes can be written and persisted in arbitrary order. The second part covers nodes already in the list that are being modified, including inserted, deleted, and updated nodes. A single atomic CAS operation changes the state of a list node, which makes it possible to decouple the concurrency control of the linked-list layer from that of the array layer. Specifically, nodes that are not yet reachable may be written freely. An insert operation involves two persistence operations: the first is node persistence, which persists the newly generated list node together with its pointer to the next node; the second makes the node reachable in the list. On this basis, an insert operation of the data structure needs no lock while the list node is generated and persisted; a lock is needed only when the pointer of the previous node is updated and persisted to make the new node reachable.
First, each insert operation obtains, through version validation, the previous and next list nodes of the insert position; a new list node then has its sibling pointer connected to the next node, and the entire node is persisted. Second, the array lock is acquired and it is determined whether the previous or next node has been modified: if not, the pointer of the previous node is connected directly to the new node, the node is persisted, and the array layer is updated using the version mechanism; otherwise, the insert operation is executed in the traditional lock-based manner. Finally, the lock is released; the persistence cost of the list node is thereby removed from the locking path. The concurrency control of the array layer and the linked-list layer is decoupled: in DRAM, the linked-list layer can realize a lock-free concurrency mechanism atomically through a series of CAS instructions, but a CAS instruction alone does not guarantee the atomicity of a persistent write to NVM. Specifically, a persistent CAS operation must guarantee the following aspects: first, an atomic update of the shared variable; second, persistence of the cache line containing the shared variable, so that the update itself is durable. A volatile CAS causes incorrect behavior in persistent memory: a concurrent read operation may observe the value of the shared variable and issue a durable write based on it, and if the system fails during that write, the system becomes inconsistent. To ensure the consistency of concurrent operations, the data structure requires the persistent CAS to wait for the persistence of the list node; the modification is not yet visible to the leaf layer, visibility being realized through the embedded micro-lock, and the persistent CAS is realized by decoupling the atomicity of the linked-list layer from the persistence visibility of the array layer. For each insert operation: first, the previous and next nodes of the target are determined, then the new list node is pointed at the next node and persisted; second, the sibling pointer of the previous node is modified with an atomic CAS and persisted. The newly inserted element becomes visible only when it has been inserted into the upper layer. If the CAS instruction fails, execution restarts from the first step; otherwise the lock-based mechanism is used to insert the element into the upper-layer node and make it globally visible.
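The two persistence steps of an insert can be sketched as below. This is a single-threaded model: `persist()` stands in for a cache-line write-back plus fence (e.g. CLWB/SFENCE on real NVM hardware) and is a no-op here, and all names are illustrative assumptions:

```cpp
#include <atomic>
#include <cassert>

struct ListNode {
    int key = 0;
    std::atomic<ListNode*> next{nullptr};
};

// Placeholder for a cache-line flush + fence on real persistent memory.
void persist(const void* /*addr*/) {}

// Insert 'node' after 'prev': persist the new node first (it is not yet
// reachable, so no lock is needed), then publish it with a CAS on
// prev->next, then persist the updated pointer.
bool insert_after(ListNode* prev, ListNode* node) {
    ListNode* succ = prev->next.load();
    node->next.store(succ);
    persist(node);                                   // step 1: node persistence
    if (!prev->next.compare_exchange_strong(succ, node))
        return false;                                // prev changed: caller retries
    persist(&prev->next);                            // step 2: pointer persistence
    return true;
}
```

A crash between step 1 and step 2 leaves a persisted but unreachable node, which is exactly the case the consistent memory manager described later reclaims.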
Further, for each delete operation, the node to be deleted is located, and a CAS instruction atomically sets the delete mark on it, performing the logical delete. Second, the node is physically deleted by modifying the pointer of the previous node, persisting it, and atomically pointing it at the next node. The data structure can also use the CAS instruction to check whether the target node has been modified or deleted and whether the previous node has been modified. For each update operation that modifies an existing key, the concurrency control mechanism is similar to that of the delete operation, except that the list node uses the update bit to announce that an update operation is in progress.
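The two-phase delete above can be sketched as follows: a first CAS sets a delete mark in the low bit of the victim's next pointer (logical delete), and a second CAS unlinks the node by swinging the predecessor's pointer (physical delete). This is a single-threaded model with illustrative names:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct DNode {
    int key = 0;
    std::atomic<uintptr_t> next{0};  // low bit doubles as the delete mark
};

constexpr uintptr_t DEL = 0x1;
DNode* node_ptr(uintptr_t v) { return reinterpret_cast<DNode*>(v & ~DEL); }

// Phase 1: logically delete by marking the victim's own next pointer.
bool logical_delete(DNode* victim) {
    uintptr_t old = victim->next.load();
    if (old & DEL) return false;                 // already marked
    return victim->next.compare_exchange_strong(old, old | DEL);
}

// Phase 2: physically unlink by swinging prev->next past the victim.
// The CAS fails if prev was concurrently modified or deleted.
bool physical_delete(DNode* prev, DNode* victim) {
    uintptr_t expect = reinterpret_cast<uintptr_t>(victim);
    uintptr_t succ = victim->next.load() & ~DEL; // skip over the victim
    return prev->next.compare_exchange_strong(expect, succ);
}
```

Marking the victim first is what prevents another thread from inserting a new node behind a node that is about to disappear.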
Specifically, in the consistent main-memory management mechanism of the data structure of the embodiment of the present invention, instead of allocating and releasing one list node at a time, a larger block of non-volatile main-memory space is allocated from the system main-memory allocator each time, and the address and length of this block are persisted in a persistent linked list. The allocated main-memory space is then divided into main-memory blocks of a particular size and maintained through a volatile free-block linked list, which serves the main-memory allocation and release operations of the linked-list layer. Upon system recovery, a recovery thread scans the metadata information on the persistent list and the nodes of the linked-list layer, determines which main-memory blocks are in use and which are not, and thereby rebuilds the volatile free-block linked list; new main memory is allocated from the system main-memory allocator again only after all of the small main-memory blocks have been used.
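The block-carving scheme above can be sketched as follows. One large region stands in for the NVM allocation whose address/length would live in the persistent list; block size and names are assumptions for the sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class BlockAllocator {
    std::vector<char> region;       // stands in for the large NVM region
    std::vector<char*> free_list;   // the volatile free-block list
    static constexpr size_t BLOCK = 64;
public:
    explicit BlockAllocator(size_t blocks) : region(blocks * BLOCK) {
        for (size_t i = 0; i < blocks; ++i)           // carve into fixed blocks
            free_list.push_back(region.data() + i * BLOCK);
    }
    char* alloc() {                 // O(1): pop a block off the free list
        if (free_list.empty()) return nullptr;        // would request a new region
        char* b = free_list.back();
        free_list.pop_back();
        return b;
    }
    void release(char* b) { free_list.push_back(b); }
    size_t available() const { return free_list.size(); }
};
```

After a crash, the free list is volatile and lost; it is rebuilt by scanning the persisted region metadata and the list nodes, as the text describes.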
Specifically, the consistent main-memory management mechanism of the data structure of the embodiment of the present invention correctly reclaims released tree nodes and list nodes by maintaining one global epoch counter and three garbage-collection lists. Before executing the relevant operations, each worker thread first registers the current epoch number. For each deleted tree/list node, the thread places it into the corresponding garbage-collection list according to the current global epoch number: if the current epoch number is T, the deleted node is placed into garbage-collection list [T mod 3]. When the garbage collector wants to move the main-memory blocks on a garbage-collection list onto the free-block list, it first checks whether all worker threads have already entered the current epoch number; if the check succeeds, the global epoch number is incremented. This method ensures that all threads are within the range of epochs T and T+1, so the main-memory blocks on the garbage-collection list corresponding to epoch T-1 can be safely reclaimed.
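The three-list epoch scheme can be sketched as a single-threaded model: nodes retired in epoch T go to list [T mod 3], and once every thread has advanced past T, the list for epoch T-1 is safe to free. The per-thread registration check is elided here, which is an assumption of the sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct EpochGC {
    unsigned long epoch = 2;            // current global epoch T
    std::vector<int> lists[3];          // retired node ids, indexed by epoch mod 3

    // Deleted node goes to the list of the current epoch.
    void retire(int node_id) { lists[epoch % 3].push_back(node_id); }

    // All threads are within [T, T+1]: list (T-1) mod 3 is safe to reclaim.
    size_t reclaim() {
        auto& safe = lists[(epoch - 1) % 3];
        size_t n = safe.size();
        safe.clear();                   // blocks move back to the free list
        ++epoch;                        // advance the global epoch
        return n;
    }
};
```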
Specifically, in the multithreaded recovery mechanism of the data structure of the embodiment of the present invention, upon a normal system shutdown all volatile inner tree nodes and the garbage collector are persisted to a specific location in non-volatile main memory; after the system restarts, the recovery thread copies all the volatile inner tree nodes and the garbage collector from non-volatile main memory back into DRAM, so the system restart process can be completed in a very short time. When recovering after a system failure, recovery threads scan all list nodes offline and rebuild all the inner tree nodes and the garbage collector. Specifically, during normal execution the positions of some list nodes are recorded with a group of persistent trackers: for every 10,000 insert operations, a tracker records the main-memory address of a randomly chosen new list node and persists it to a reserved area of non-volatile main memory; when a tracked list node is deleted, the corresponding tracker is also reset. The recovery process mainly comprises two stages: in the first stage, the trackers are sorted according to the keys of the list nodes they record and then distributed to recovery threads; each thread independently scans a disjoint segment of the linked-list layer and rebuilds its part of the data structure. In the second stage, after the disjoint parts have been rebuilt, a single thread assembles these parts into one complete data structure.
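The two-stage recovery can be sketched as below, modeling the linked-list layer as a sorted key sequence. The tracker keys act as partition points; each segment would be rebuilt by one recovery thread (stage 1), and one thread then stitches the parts together (stage 2). The representation is an illustrative assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stage 1: split the key sequence at the sorted tracker positions; each
// resulting segment is scanned and rebuilt by an independent thread.
std::vector<std::vector<int>> partition_by_trackers(
        const std::vector<int>& keys, std::vector<int> trackers) {
    std::sort(trackers.begin(), trackers.end());
    std::vector<std::vector<int>> segments(trackers.size() + 1);
    for (int k : keys) {
        size_t seg = std::upper_bound(trackers.begin(), trackers.end(), k)
                     - trackers.begin();
        segments[seg].push_back(k);
    }
    return segments;
}

// Stage 2: a single thread assembles the disjoint parts into one structure.
std::vector<int> merge_segments(const std::vector<std::vector<int>>& segs) {
    std::vector<int> out;
    for (const auto& s : segs) out.insert(out.end(), s.begin(), s.end());
    return out;
}
```

Because the segments are disjoint, stage 1 needs no synchronization between threads, which is where the reduced conflict comes from.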
The embodiment of the present invention takes the index data structure of a storage system under a non-volatile-memory scenario as its optimization object. For current main-memory-based storage systems, it proposes introducing into the traditional B+ tree one layer of shadow leaf nodes based on a linked-list structure, together with a data-layout strategy based on hybrid main memory: the list-based leaf nodes are stored in NVM while the other parts are stored in DRAM, eliminating the persistence overhead brought by sorting and rebalancing operations. Embedded fine-grained locking and an optimistic write mechanism are designed for concurrency control between read and write operations and between write operations, respectively. The embedded fine-grained lock mechanism designs one update mark bit and one delete mark bit for each list node; this fine-grained concurrency control mechanism removes unnecessary persistence latency from the version-validation path of read operations. The optimistic write mechanism separates the concurrency control of the tree nodes from that of the list nodes and further removes the persistence latency from the tree-node-granularity locking path, reducing the concurrency conflicts between write operations. Optionally, the embodiment of the present invention also designs a persistent garbage collector to support consistency management of the non-volatile main memory, and finally accelerates the recovery process of the data structure after a system crash through a multithreaded recovery technique.
Next, the multithreaded persistent B+ tree data structure design and implementation method of the present invention is described in detail according to specific embodiments.
As shown in Fig. 2, the B+ tree supporting multithreaded persistent concurrent access of the embodiment of the present invention adopts a hybrid DRAM-and-NVM main-memory architecture: in DRAM it is a tree similar to a traditional B+ tree, serving as the run-time index; in NVM it is a list-based data structure storing all user data and its relationships. When not in operation, the system preserves only the list structure located on NVM; upon system restart or failure recovery, the tree data structure in DRAM is reconstructed from the list structure on NVM, and at run time the tree data structure is used to accelerate concurrent index operations.
In the embodiment of the present invention, memory-access latency can be reduced using a prefetch mechanism. A tree search starts from the root node and proceeds until the corresponding leaf node is found; since this process must read every tree node on the search path, the memory-access latency of these tree nodes severely affects the search performance of the whole tree. To solve this problem, in the embodiment of the present invention a prefetch instruction is executed before each tree node is accessed, prefetching the entire tree node into the CPU cache so as to hide its memory-access latency. In addition, the key array and the value array are stored in separate main-memory spaces and only the key array is prefetched, reducing the total amount of data of each prefetch operation.
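The prefetch-before-access descent can be sketched with GCC/Clang's `__builtin_prefetch`. The node layout (keys separated from child pointers so only the key array is prefetched) follows the separation described above; the fanout and field names are illustrative assumptions:

```cpp
#include <cassert>
#include <cstddef>

constexpr int FANOUT = 8;

struct InnerNode {
    int keys[FANOUT];                 // key array: the only data prefetched
    InnerNode* children[FANOUT + 1];  // stored apart from the keys
    bool leaf = false;
};

// Descend from 'node' to the leaf covering 'key', issuing a prefetch for
// each node's key array before it is searched.
InnerNode* descend(InnerNode* node, int key) {
    while (!node->leaf) {
        __builtin_prefetch(node->keys);   // pull the key array into cache
        int i = 0;
        while (i < FANOUT && key >= node->keys[i]) ++i;
        node = node->children[i];
    }
    return node;
}
```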
In the embodiment of the present invention, processing is accelerated using a SIMD mechanism. A linear search operation executes over a contiguous main-memory space, so it can be accelerated with Single Instruction Multiple Data (SIMD) instructions. Most modern processors support SIMD instructions, which apply the same arithmetic or comparison operation to multiple data elements simultaneously. For a search operation, a SIMD compare instruction compares the target key with multiple different keys at once; a similar optimization strategy is used in sorting and rebalancing operations so that multiple data elements can be moved simultaneously. The 24-core Intel processor used in the embodiment of the present invention supports 256-bit SIMD operations; the data structure can therefore compare 32 fingerprints simultaneously, accelerating the search process of the leaf node.
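The 32-wide fingerprint comparison can be modeled in scalar code that produces the same 32-bit match mask an AVX2 `_mm256_cmpeq_epi8` + `_mm256_movemask_epi8` pair would; a real implementation would replace the loop with those intrinsics. The function name is an illustrative assumption:

```cpp
#include <cassert>
#include <cstdint>

// One "wide compare" of 32 one-byte fingerprints against the target,
// yielding a bitmask of candidate slots to verify with full key comparison.
uint32_t match_mask(const uint8_t fps[32], uint8_t target) {
    uint32_t mask = 0;
    for (int i = 0; i < 32; ++i)        // one SIMD lane per fingerprint slot
        if (fps[i] == target) mask |= (1u << i);
    return mask;
}
```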
In an embodiment of the present invention, the version-number-based concurrency control mechanism and locks of tree-node granularity ensure that write operations modifying different tree nodes can be executed simultaneously. Fig. 2 gives the structure of the version number, which uses a 32-bit word: the first bit is the locked-or-not identifier, the second bit is the root-node identifier, the third bit is the leaf-node identifier, and the remaining 29 bits are the incremental version number, which is incremented each time the state of the tree node changes.
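The 32-bit version word of Fig. 2 can be sketched with bit masks. The field widths (1 lock bit, 1 root bit, 1 leaf bit, 29-bit counter) follow the text; placing the flag bits at the top of the word is an assumption of this sketch:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t LOCK_BIT = 1u << 31;        // "is locked" identifier
constexpr uint32_t ROOT_BIT = 1u << 30;        // "is root node" identifier
constexpr uint32_t LEAF_BIT = 1u << 29;        // "is leaf node" identifier
constexpr uint32_t VER_MASK = (1u << 29) - 1;  // 29-bit incremental version

uint32_t lock_word(uint32_t w) { return w | LOCK_BIT; }

// Completing a write: increment the 29-bit version, drop the lock bit.
uint32_t unlock_word(uint32_t w) {
    return ((w & ~VER_MASK) | ((w + 1) & VER_MASK)) & ~LOCK_BIT;
}

bool is_locked(uint32_t w)      { return (w & LOCK_BIT) != 0; }
uint32_t version_of(uint32_t w) { return w & VER_MASK; }
```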
Before a tree node is modified, the lock of that tree node is acquired and the version number is set to dirty; after the operation completes, the version number is incremented by 1 and the lock of the tree node is released. For a query operation, the version number of the node is recorded before the tree node is read; after the read completes, the latest version number of the tree node is compared with the previously recorded version number to judge whether the tree node was modified by other operations during the read. If the version number has been modified or is locked, the read operation re-executes the above process until version validation passes.
In the embodiment of the present invention, excessive persistence latency can block other write operations to different key-value pairs of the same leaf node. In an array-based tree node, each key-value pair has a strong positional relationship with its neighbors, and any single write operation may trigger an expensive rebalancing operation that modifies most key-value pairs of the same tree node. It is therefore difficult to design a key-value-pair-granularity lock to coordinate concurrent writes to different key-value pairs of the same leaf node; instead, the optimistic write mechanism removes the persistence latency from the locking path of the tree node.
In the embodiment of the present invention, the index structure serializes, with a mutex lock, those write operations that modify critical-section data, avoiding concurrency conflicts between write operations. Therefore, persistence operations that modify non-critical-section data can be removed from the mutex-lock path.
As shown in Fig. 3 and Fig. 4, these are the processing steps of the data structure for read-write conflicts and write-write conflicts. For read-write conflicts, each insert operation first locates the insert position and obtains the preceding and succeeding list nodes. For an update operation: in the first step, a new list node is allocated, its right pointer is pointed at the succeeding node, and the entire list node is persisted. In the second step, the lock of the tree-layer leaf node is acquired, and it is judged whether the preceding and succeeding nodes have been modified. In the third step, if they have not been modified, the pointer of the preceding node is pointed at the new node and persisted. In the fourth step, the key-value pair and version number of the leaf node are updated. In the fifth step, if they have been modified, the insert operation is executed in the traditional lock-based manner, and the lock is released. Through this optimization strategy, the persistence operations that do not modify the linked-list layer are removed from the locking path. It is worth noting that a system crash occurring between the first and third steps does not lead to a leak of non-volatile main memory.
In the embodiment of the present invention, a read operation obtains the pointer of the corresponding list node through version validation as in the first, second, and third steps described above, reads the data of the list node as in the fourth step, and then checks the mark bits of that node. If a mark bit is set to dirty, the list node being read is in the process of being updated or deleted: if the update mark bit is dirty, the read operation waits until the update operation has completed and been persisted; if the delete mark bit is dirty, the read operation restarts from the root node. Specifically, the difference between the basic read-write concurrency control mechanism and the one based on embedded fine-grained locks is that the latter removes the persistence overhead of write operations to different key-value pairs of the same leaf node from the version-check path of read operations.
In the embodiment of the present invention, the linked-list layer only needs a series of CAS operations to execute update operations with guaranteed atomicity, and the embedded fine-grained lock ensures that a modified key-value pair becomes visible only after its persistence operation completes. Through these two techniques, the concurrency control mechanisms of the tree layer and the linked-list layer are separated: key-value-pair-granularity concurrency control is used in the persistent linked-list layer, and tree-node-granularity locking is used in the volatile tree layer, thereby removing the persistence overhead of the linked-list layer from the locking path of the tree nodes. Each insert operation locates the insert position through version validation and obtains the preceding and succeeding nodes. In the first step, a new list node is allocated and pointed at the succeeding node, and then the persistence operation is executed. In the second step, a CAS instruction points the right pointer of the preceding node at the new list node. Because the state of the preceding node is stored in its right pointer, the CAS operation also prevents the new node from being inserted after an already-deleted list node; a traditional atomic operation alone cannot guarantee the persistence of the data.
In the embodiment of the present invention, a newly inserted list node becomes visible only after the upper-layer tree node has been updated, which prevents other operations from seeing a new node that has not yet been persisted. After persisting the new list node and the pointer of the preceding node, the data structure inserts the new key-value pair into the upper-layer leaf node in the traditional lock-based manner, making it visible. Each delete operation removes an existing key-value pair. First, the target list node is located and a CAS operation sets its delete mark bit, completing the logical delete; this prevents other threads from inserting any newly generated list node after the node about to be deleted, which could otherwise cause the new node to be lost. Second, the right pointer of the preceding node is atomically modified to point at the succeeding node and persisted, completing the physical delete; the CAS operation is used to check whether the target node is being deleted or updated and whether the preceding node is being deleted. After completing these operations, the data structure deletes the key-value pair from the upper-layer tree node. Each update operation modifies the value of an existing key-value pair; apart from the target node, an update operation affects no other node of the linked-list layer, so its concurrency control mechanism is very simple: the update mark bit notifies other threads that this list node is in the process of being updated. Fig. 3(c) shows the concurrent execution of two insert operations; the data structure described in the embodiment of the present invention removes the persistence latency of the linked-list level from the locking path of tree-node granularity.
In the embodiment of the present invention, to guarantee the consistency and persistence of the linked-list layer upon a system crash: unfinished operations (such as insert and delete operations) may cause the loss of newly allocated list nodes, leading to a leak of main-memory space, and read operations may see list/tree nodes already deleted by other threads. To solve these problems, the data structure designs lightweight consistent main-memory management and a persistent garbage collector.
In the embodiment of the present invention, a larger block of non-volatile main-memory space is allocated from the system main memory each time, and the address and length of this block are persisted in a persistent linked list; the allocated main-memory space is divided into main-memory blocks of a particular size and maintained through a volatile free-block linked list serving the main-memory allocation and release operations of the linked-list layer. Upon system recovery, a recovery thread scans the metadata information on the persistent list and the nodes of the linked-list layer, determines which main-memory blocks are in use and which are not, and rebuilds the volatile free-block linked list; new main memory is allocated from the system main-memory allocator again only after these small main-memory blocks have all been used.
In the embodiment of the present invention, to prevent read operations from seeing list nodes and tree nodes deleted by other threads, released tree nodes and list nodes are correctly reclaimed by maintaining one global epoch counter and three garbage-collection lists. Before executing an operation of the data structure, each worker thread first registers the current epoch number. For each deleted tree/list node, the thread places it into the corresponding garbage-collection list according to the current global epoch number: if the current epoch number is T, the deleted node is placed into garbage-collection list [T mod 3]. When the garbage collector wants to move the main-memory blocks on a garbage-collection list onto the free-block list, it first checks whether all worker threads have already entered the current epoch number; if the check succeeds, the global epoch number is incremented. This method ensures that all threads are within the range of epochs T and T+1, so the main-memory blocks on the garbage-collection list corresponding to epoch T-1 can be safely reclaimed.
In the embodiment of the present invention, a multithreaded recovery mechanism accelerates the system recovery process. All volatile inner tree nodes and the garbage collector are persisted to a specific location in non-volatile main memory; upon restart after a normal shutdown, the recovery thread copies all the volatile inner tree nodes and the garbage collector from non-volatile main memory back into DRAM, completing the system restart in a very short time. When recovering from a system failure, recovery threads scan all list nodes in an offline state and rebuild all the inner tree nodes and the garbage collector. Specifically, during normal execution a group of persistent trackers records the positions of some list nodes: for every 10,000 insert operations, a tracker records the main-memory address of a randomly chosen new list node and persists it to a reserved area of non-volatile main memory; when a tracked list node is deleted, the corresponding tracker is also reset.
Upon system recovery, the recovery process mainly comprises two stages. In the first stage, the trackers are sorted according to the keys of the list nodes they record and then distributed to recovery threads; each thread independently scans a disjoint segment of the linked-list layer and reconstructs its part of the data structure. In the second stage, after the disjoint parts have been rebuilt, one thread assembles these parts into a complete data structure; this method effectively reduces conflicts between threads.
The multithreaded persistent B+ tree data structure design and implementation method of the embodiment of the present invention, by using a hybrid main-memory data structure of non-volatile memory and volatile memory, retains search and rebalancing operations with good spatial locality while effectively reducing expensive persistence operations; it further designs embedded fine-grained locks and an optimistic write mechanism to solve the problem of amplified lock overhead, and adopts a multithreaded recovery mechanism and a persistent garbage collector to support consistency management of the non-volatile main memory and accelerate the system recovery process of the data structure.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the quantity of the technical features referred to. A feature defined with "first" or "second" may thus explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two, three, etc., unless otherwise specifically defined.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples", etc. means that a specific feature, structure, material, or characteristic described in conjunction with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no mutual contradiction, those skilled in the art may combine and unite the features of different embodiments or examples described in this specification.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those skilled in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.
Claims (10)
1. A multithreaded persistent B+ tree data structure design and implementation method, characterized by comprising the following steps:
introducing a layer of shadow leaf nodes based on a linked-list structure into a preset B+ tree;
storing the linked-list-based leaf nodes in NVM according to a hybrid-main-memory data layout strategy to generate a link layer based on a linked-list structure, and storing the other parts of the index data structure in DRAM to generate a tree layer based on an array structure, so that the layered design of a volatile tree structure and a persistent linked-list structure avoids the persistence overhead of balancing and sorting;
designing an embedded fine-grained lock mechanism and an optimistic write mechanism, used respectively for concurrency control between read and write operations and between write operations.
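The layered design of claim 1 can be sketched minimally as follows: a volatile array-based leaf node in DRAM whose slots map one-to-one (per claim 7) onto persistent linked-list nodes in NVM. All struct and function names here are illustrative assumptions, not the patent's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Persistent link layer (would reside in NVM): an ordered singly linked
// list in which each node holds exactly one key-value pair.
struct ListNode {
    uint64_t key;
    uint64_t value;
    ListNode* right;  // right pointer connecting the ordered list
};

// Volatile tree layer (resides in DRAM): a leaf-level tree node whose
// key-value pairs point one-to-one at linked-list nodes.
struct LeafNode {
    std::vector<uint64_t>  keys;   // sorted key array
    std::vector<ListNode*> slots;  // slots[i] corresponds to keys[i]
};

// Resolve a key through the DRAM leaf into its NVM list node.
ListNode* leaf_lookup(const LeafNode& leaf, uint64_t key) {
    for (std::size_t i = 0; i < leaf.keys.size(); ++i)
        if (leaf.keys[i] == key) return leaf.slots[i];
    return nullptr;  // key absent from this leaf
}
```

Because only the linked-list layer is persistent, splitting or rebalancing the DRAM tree layer never incurs NVM write costs, which is the point of the layered design.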
2. The multithreaded persistent B+ tree data structure design and implementation method according to claim 1, wherein the embedded fine-grained lock mechanism designs one update flag bit and one delete flag bit for each linked-list node, and removes the persistence delay that does not satisfy a preset condition from the version-verification path of read operations; and the optimistic write mechanism separates the concurrency control of tree nodes from that of linked-list nodes, and removes the persistence delay from the locking path of tree-node granularity.
3. The multithreaded persistent B+ tree data structure design and implementation method according to claim 1, wherein, for the array-based tree layer located in the DRAM, each tree node can accommodate a preset number of key-value pairs, each key-value pair of a tree node points to a tree node or a linked-list node of the next layer, and when the number of key-value pairs in any tree node exceeds or falls below a preset threshold, the tree node performs a split or merge operation and a key-value pair is inserted into or deleted from the tree node of the upper layer.
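The split/merge trigger in claim 3 amounts to an occupancy check against preset bounds. In the sketch below the quarter-capacity underflow threshold is an assumed illustration; the claim only says "a preset threshold".

```cpp
#include <cassert>
#include <cstddef>

enum class Action { None, Split, Merge };

// Decide whether a tree node must rebalance after an insert or delete.
// capacity is the preset number of key-value pairs a node can hold; the
// capacity/4 underflow bound is an arbitrary illustrative choice.
Action rebalance_action(std::size_t count, std::size_t capacity) {
    if (count > capacity)     return Action::Split;  // overflow: split, insert separator key above
    if (count < capacity / 4) return Action::Merge;  // underflow: merge, delete separator key above
    return Action::None;
}
```

A split inserts one new key-value pair into the parent and a merge deletes one, which is why rebalancing propagates upward one layer at a time.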
4. The multithreaded persistent B+ tree data structure design and implementation method according to claim 1, wherein the link layer, based on a linked-list structure and located in the NVM, is stored in non-volatile main memory; the link layer is an ordered linked list, each linked-list node stores only one key-value pair and is connected by a right pointer, and CPU atomic operations are used to guarantee the atomicity and consistency of its insert/delete/update operations.
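Claim 4's single-atomic-operation update can be sketched with a compare-and-swap on the right pointer: the new node is fully initialized before it becomes reachable. This is a minimal volatile illustration; the cache-line flush and fence a real persistent implementation would issue before the pointer swing is only noted in a comment.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct PNode {
    uint64_t key;
    uint64_t value;
    std::atomic<PNode*> right;  // right pointer of the ordered list
    PNode(uint64_t k, uint64_t v, PNode* r) : key(k), value(v), right(r) {}
};

// Link a fully initialized node after pred with one CPU atomic operation,
// so concurrent readers see either the old list or the complete new node.
bool insert_after(PNode* pred, PNode* node) {
    PNode* succ = pred->right.load();
    node->right.store(succ);
    // A persistent version would flush node's cache line and fence here,
    // so the node is durable before it becomes reachable.
    return pred->right.compare_exchange_strong(succ, node);
}
```

Deletion and in-place update can follow the same pattern, each publishing its effect with a single atomic store or CAS.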
5. The multithreaded persistent B+ tree data structure design and implementation method according to claim 1, wherein each tree operation searches from the root node until the corresponding leaf node is found; before any tree node is accessed, a prefetch instruction is executed to read the entire tree node into the CPU cache, so as to hide the memory access latency of the entire tree node; and the key array and the value array are stored in separate main-memory spaces so that only the key array is prefetched each time, reducing the total amount of prefetched data.
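Claim 5's prefetch of an entire tree node before access can be sketched with the GCC/Clang `__builtin_prefetch` hint, one issue per 64-byte cache line. The function below is illustrative; the prefetch is purely a performance hint and changes no results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Prefetch every cache line of a node's key array, then scan it; the
// prefetches overlap memory latency with the work before the scan.
uint64_t sum_keys_prefetched(const uint64_t* keys, std::size_t n) {
    const char* p = reinterpret_cast<const char*>(keys);
    for (std::size_t off = 0; off < n * sizeof(uint64_t); off += 64)
        __builtin_prefetch(p + off);  // GCC/Clang builtin; semantically a no-op
    uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += keys[i];
    return sum;
}
```

Keeping keys and values in separate arrays (as the claim does) means only the key array's lines are prefetched during the search, halving or better the prefetched byte count.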
6. The multithreaded persistent B+ tree data structure design and implementation method according to claim 1, wherein a key array size of a preset threshold is selected and a linear search operation is used in place of a binary search operation; the linear search operation is performed in contiguous main-memory space and accelerated with SIMD instructions, wherein each key-value pair is provided with a 1-byte fingerprint, each fingerprint is the hash value of the corresponding key, and the fingerprint array is stored at the head of the leaf node.
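The 1-byte fingerprint filter of claim 6 can be sketched as below. The multiplicative hash constant is an arbitrary illustrative choice, and the byte-wise scan is exactly the loop a SIMD byte compare (e.g. 16 fingerprints per SSE2 `_mm_cmpeq_epi8`) would vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 1-byte fingerprint: a hash of the key (mixing constant is illustrative).
uint8_t fingerprint(uint64_t key) {
    return static_cast<uint8_t>((key * 0x9E3779B97F4A7C15ULL) >> 56);
}

// Linear search via the fingerprint array stored at the head of the leaf:
// compare cheap 1-byte fingerprints first, full keys only on a hit.
int find_slot(const uint8_t* fps, const uint64_t* keys,
              std::size_t n, uint64_t key) {
    const uint8_t fp = fingerprint(key);
    for (std::size_t i = 0; i < n; ++i)      // SIMD-friendly byte scan
        if (fps[i] == fp && keys[i] == key)  // full compare only on fp match
            return static_cast<int>(i);
    return -1;
}
```

Since a random key matches a given fingerprint with probability 1/256, almost all non-matching slots are rejected after touching only one byte each.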
7. The multithreaded persistent B+ tree data structure design and implementation method according to claim 1, wherein,
for conflicts between read and write operations, a version-number-based concurrency control mechanism is used, wherein a version-number counter is employed on each tree node and the version number is incremented whenever the tree-node state changes; for an insert, delete, or update operation, a lock is acquired before the tree node is modified and the corresponding version number is set dirty, and after the operation completes and the version number is incremented by 1, the lock on the corresponding tree node is released; and if the version number has been modified or is locked, the read operation repeats the above process until the version number is verified;
for conflicts between write operations, a lock mechanism of tree-node granularity is used, wherein locks of tree-node granularity ensure that write operations modifying different tree nodes execute simultaneously; leaf nodes are connected by right pointers, the split direction of leaf nodes is preset to be left-to-right only, locks on tree nodes are acquired bottom-up, and when a tree node splits or is deleted, the lock of the upper-layer tree node is acquired; linked-list nodes and the key-value pairs of leaf nodes have a one-to-one correspondence, so that a write operation can modify a linked-list node only after acquiring the lock of the corresponding leaf node in the tree layer.
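The version-number mechanism of claim 7 can be sketched with a single counter that is odd while a writer holds the node (the "dirty" state) and even when stable; readers retry until they observe an unchanged even version. This single-node illustration stands in for the patent's full per-tree-node protocol.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct VersionedNode {
    std::atomic<uint64_t> version{0};  // odd = locked/dirty, even = stable
    uint64_t payload = 0;
};

// Writer: set the version dirty, modify, then increment again to release.
void write_node(VersionedNode& n, uint64_t v) {
    n.version.fetch_add(1);  // version becomes odd: node marked dirty
    n.payload = v;
    n.version.fetch_add(1);  // version becomes even: operation complete
}

// Optimistic reader: repeat until the version verifies unchanged.
uint64_t read_node(const VersionedNode& n) {
    for (;;) {
        uint64_t v1 = n.version.load();
        if (v1 & 1) continue;                    // dirty: writer in progress
        uint64_t val = n.payload;
        if (n.version.load() == v1) return val;  // version verified
    }
}
```

Readers therefore never block writers; they simply re-run the short validation loop when a concurrent modification is detected.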
8. The multithreaded persistent B+ tree data structure design and implementation method according to claim 1, wherein, before each allocation and release of a linked-list node, a block of non-volatile main-memory space is allocated from the system main-memory allocator each time, the address and length of the non-volatile main-memory space are persisted into a persistent linked list, the allocated main-memory space is divided into main-memory blocks of a preset size and maintained through a volatile free-block linked list, for the main-memory allocation and release operations of the link layer; and upon system recovery, a recovery thread scans the metadata information on the persistent linked list and the nodes of the link layer, determines which main-memory blocks are currently in use and which are not, and thereby rebuilds the volatile free-block linked list.
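Claim 8's recovery step reduces to a set difference: the recovery thread scans the persisted region metadata and the live link-layer nodes, and every block not in use goes back on the volatile free list. Block indices and container choices below are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Rebuild the volatile free-block list after a restart: scan a persisted
// region of num_blocks fixed-size blocks against the set of blocks still
// referenced by link-layer nodes.
std::vector<std::size_t> rebuild_free_list(std::size_t num_blocks,
                                           const std::set<std::size_t>& in_use) {
    std::vector<std::size_t> free_blocks;
    for (std::size_t b = 0; b < num_blocks; ++b)
        if (in_use.count(b) == 0)     // block not referenced: free again
            free_blocks.push_back(b);
    return free_blocks;
}
```

Because only the region addresses and lengths are persisted, normal allocation and release touch just the volatile free list and incur no NVM writes.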
9. The multithreaded persistent B+ tree data structure design and implementation method according to claim 1, further comprising:
correctly reclaiming released tree nodes and linked-list nodes by maintaining one global epoch counter and three garbage collection linked lists, wherein, before executing a relevant operation, a worker thread first registers the current epoch number, and for each deleted tree or linked-list node, the thread places it into the corresponding garbage collection linked list according to the current global epoch number.
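The one-counter, three-list reclamation of claim 9 can be sketched as below: a retired node is handed back only after the epoch has advanced twice, so no thread registered in the retiring epoch can still hold a reference. Thread registration itself is omitted for brevity, and the structure is a single-threaded illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

struct EpochGC {
    uint64_t global_epoch = 0;
    std::array<std::vector<void*>, 3> limbo;  // three GC linked lists

    // A deleted tree/list node is parked in the current epoch's list.
    void retire(void* node) { limbo[global_epoch % 3].push_back(node); }

    // Advancing the epoch drains the list that is two epochs old: no
    // thread registered in that epoch can still reference its nodes.
    std::vector<void*> advance() {
        ++global_epoch;
        std::vector<void*> reclaim;
        reclaim.swap(limbo[(global_epoch + 1) % 3]);
        return reclaim;
    }
};
```

Three lists suffice because at any moment only the current epoch, the previous one, and the drained one need distinct buckets.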
10. The multithreaded persistent B+ tree data structure design and implementation method according to claim 1, further comprising:
upon normal system shutdown, persisting all volatile internal tree nodes and the garbage collector to a preset location in non-volatile main memory; and after system restart, a recovery thread copies all the volatile internal tree nodes and the garbage collector from the non-volatile main memory back into the DRAM.
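Claim 10's fast restart path can be sketched as a byte-wise snapshot: persist the volatile structures to a reserved location on clean shutdown and copy them straight back after reboot, avoiding a rebuild of the tree layer from the link layer. The vector standing in for the NVM region is purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the preset non-volatile main-memory location.
static std::vector<uint8_t> nvm_region;

// On normal shutdown: persist the volatile tree layer wholesale.
void snapshot_to_nvm(const void* dram, std::size_t len) {
    const uint8_t* p = static_cast<const uint8_t*>(dram);
    nvm_region.assign(p, p + len);
}

// After restart: the recovery thread copies the snapshot back to DRAM,
// skipping a full reconstruction of the tree layer from the link layer.
void restore_from_nvm(void* dram) {
    std::memcpy(dram, nvm_region.data(), nvm_region.size());
}
```

After an unclean shutdown the snapshot is absent or stale, and recovery instead falls back to scanning the persistent link layer as in claim 8.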
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811129623.3A CN109407979B (en) | 2018-09-27 | 2018-09-27 | Multithreading persistent B + tree data structure design and implementation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811129623.3A CN109407979B (en) | 2018-09-27 | 2018-09-27 | Multithreading persistent B + tree data structure design and implementation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109407979A true CN109407979A (en) | 2019-03-01 |
CN109407979B CN109407979B (en) | 2020-07-28 |
Family
ID=65465484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811129623.3A Active CN109407979B (en) | 2018-09-27 | 2018-09-27 | Multithreading persistent B + tree data structure design and implementation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109407979B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110825734A (en) * | 2019-10-09 | 2020-02-21 | 上海交通大学 | Concurrent updating method and read-write system for balance tree |
CN111159056A (en) * | 2019-12-11 | 2020-05-15 | 上海交通大学 | Scalable memory allocation method and system for nonvolatile memory |
CN111274456A (en) * | 2020-01-20 | 2020-06-12 | 中国科学院计算技术研究所 | Data indexing method and data processing system based on NVM (non-volatile memory) main memory |
CN111352860A (en) * | 2019-12-26 | 2020-06-30 | 天津中科曙光存储科技有限公司 | Method and system for recycling garbage in Linux Bcache |
CN111611246A (en) * | 2020-05-25 | 2020-09-01 | 华中科技大学 | Method and system for optimizing B + tree index performance based on persistent memory |
CN111651455A (en) * | 2020-05-26 | 2020-09-11 | 上海交通大学 | Efficient concurrent index data structure based on machine learning |
CN112286928A (en) * | 2019-09-16 | 2021-01-29 | 重庆傲雄在线信息技术有限公司 | Chain type storage system |
CN112543237A (en) * | 2020-11-27 | 2021-03-23 | 互联网域名系统北京市工程研究中心有限公司 | Lock-free DNS (Domain name Server) caching method and DNS server |
CN112612803A (en) * | 2020-12-22 | 2021-04-06 | 浙江大学 | Key value pair storage system based on persistent memory and data concurrent insertion method |
CN112732725A (en) * | 2021-01-22 | 2021-04-30 | 上海交通大学 | NVM (non volatile memory) hybrid memory-based adaptive prefix tree construction method, system and medium |
CN112947856A (en) * | 2021-02-05 | 2021-06-11 | 彩讯科技股份有限公司 | Memory data management method and device, computer equipment and storage medium |
CN113656444A (en) * | 2021-08-26 | 2021-11-16 | 傲网信息科技(厦门)有限公司 | Data persistence method, server and management equipment |
WO2022068289A1 (en) * | 2020-09-29 | 2022-04-07 | 北京金山云网络技术有限公司 | Data access method, apparatus and device, and computer-readable storage medium |
CN114341817A (en) * | 2019-08-22 | 2022-04-12 | 美光科技公司 | Hierarchical memory system |
US20230078081A1 (en) * | 2020-02-14 | 2023-03-16 | Inspur Suzhou Intelligent Technology Co., Ltd. | B-plus tree access method and apparatus, and computer-readable storage medium |
CN115905246A (en) * | 2023-03-14 | 2023-04-04 | 智者四海(北京)技术有限公司 | KV cache method and device based on dynamic compression prefix tree |
CN116719832A (en) * | 2023-08-07 | 2023-09-08 | 金篆信科有限责任公司 | Database concurrency control method and device, electronic equipment and storage medium |
CN117131012A (en) * | 2023-08-28 | 2023-11-28 | 中国科学院软件研究所 | Sustainable and extensible lightweight multi-version ordered key value storage system |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120221538A1 (en) * | 2011-02-28 | 2012-08-30 | International Business Machines Corporation | Optimistic, version number based concurrency control for index structures with atomic, non-versioned pointer updates |
CN103268291A (en) * | 2013-05-23 | 2013-08-28 | 清华大学 | Method for delaying persistent indexing metadata in flash memory storage system |
CN103765381A (en) * | 2011-08-29 | 2014-04-30 | 英特尔公司 | Parallel operation on B+ trees |
KR20140070834A (en) * | 2012-11-28 | 2014-06-11 | 연세대학교 산학협력단 | Modified searching method and apparatus for b+ tree |
CN104881371A (en) * | 2015-05-29 | 2015-09-02 | 清华大学 | Persistent internal memory transaction processing cache management method and device |
CN105930280A (en) * | 2016-05-27 | 2016-09-07 | 诸葛晴凤 | Efficient page organization and management method facing NVM (Non-Volatile Memory) |
US20160350015A1 (en) * | 2015-05-27 | 2016-12-01 | Nutech Ventures | Enforcing Persistency for Battery-Backed Mobile Devices |
CN106775435A (en) * | 2015-11-24 | 2017-05-31 | 腾讯科技(深圳)有限公司 | Data processing method, device and system in a kind of storage system |
CN107273443A (en) * | 2017-05-26 | 2017-10-20 | 电子科技大学 | A kind of hybrid index method based on big data model metadata |
CN107463447A (en) * | 2017-08-21 | 2017-12-12 | 中国人民解放军国防科技大学 | B + tree management method based on remote direct nonvolatile memory access |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120221538A1 (en) * | 2011-02-28 | 2012-08-30 | International Business Machines Corporation | Optimistic, version number based concurrency control for index structures with atomic, non-versioned pointer updates |
CN103765381A (en) * | 2011-08-29 | 2014-04-30 | 英特尔公司 | Parallel operation on B+ trees |
KR20140070834A (en) * | 2012-11-28 | 2014-06-11 | 연세대학교 산학협력단 | Modified searching method and apparatus for b+ tree |
CN103268291A (en) * | 2013-05-23 | 2013-08-28 | 清华大学 | Method for delaying persistent indexing metadata in flash memory storage system |
US20160350015A1 (en) * | 2015-05-27 | 2016-12-01 | Nutech Ventures | Enforcing Persistency for Battery-Backed Mobile Devices |
CN104881371A (en) * | 2015-05-29 | 2015-09-02 | 清华大学 | Persistent internal memory transaction processing cache management method and device |
CN106775435A (en) * | 2015-11-24 | 2017-05-31 | 腾讯科技(深圳)有限公司 | Data processing method, device and system in a kind of storage system |
CN105930280A (en) * | 2016-05-27 | 2016-09-07 | 诸葛晴凤 | Efficient page organization and management method facing NVM (Non-Volatile Memory) |
CN107273443A (en) * | 2017-05-26 | 2017-10-20 | 电子科技大学 | A kind of hybrid index method based on big data model metadata |
CN107463447A (en) * | 2017-08-21 | 2017-12-12 | 中国人民解放军国防科技大学 | B + tree management method based on remote direct nonvolatile memory access |
Non-Patent Citations (1)
Title |
---|
Shu Jiwu et al.: "Research progress of storage system technologies based on non-volatile memory", Science & Technology Review * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11650843B2 (en) | 2019-08-22 | 2023-05-16 | Micron Technology, Inc. | Hierarchical memory systems |
CN114341817A (en) * | 2019-08-22 | 2022-04-12 | 美光科技公司 | Hierarchical memory system |
CN112286928B (en) * | 2019-09-16 | 2023-11-28 | 重庆傲雄在线信息技术有限公司 | Chain type storage system |
CN112286928A (en) * | 2019-09-16 | 2021-01-29 | 重庆傲雄在线信息技术有限公司 | Chain type storage system |
CN110825734B (en) * | 2019-10-09 | 2023-04-28 | 上海交通大学 | Concurrent updating method of balance tree and read-write system |
CN110825734A (en) * | 2019-10-09 | 2020-02-21 | 上海交通大学 | Concurrent updating method and read-write system for balance tree |
CN111159056A (en) * | 2019-12-11 | 2020-05-15 | 上海交通大学 | Scalable memory allocation method and system for nonvolatile memory |
CN111352860A (en) * | 2019-12-26 | 2020-06-30 | 天津中科曙光存储科技有限公司 | Method and system for recycling garbage in Linux Bcache |
CN111352860B (en) * | 2019-12-26 | 2022-05-13 | 天津中科曙光存储科技有限公司 | Garbage recycling method and system in Linux Bcache |
CN111274456A (en) * | 2020-01-20 | 2020-06-12 | 中国科学院计算技术研究所 | Data indexing method and data processing system based on NVM (non-volatile memory) main memory |
CN111274456B (en) * | 2020-01-20 | 2023-09-12 | 中国科学院计算技术研究所 | Data indexing method and data processing system based on NVM (non-volatile memory) main memory |
US20230078081A1 (en) * | 2020-02-14 | 2023-03-16 | Inspur Suzhou Intelligent Technology Co., Ltd. | B-plus tree access method and apparatus, and computer-readable storage medium |
US11762827B2 (en) * | 2020-02-14 | 2023-09-19 | Inspur Suzhou Intelligent Technology Co., Ltd. | B-plus tree access method and apparatus, and computer-readable storage medium |
CN111611246A (en) * | 2020-05-25 | 2020-09-01 | 华中科技大学 | Method and system for optimizing B + tree index performance based on persistent memory |
CN111611246B (en) * | 2020-05-25 | 2023-04-25 | 华中科技大学 | Method and system for optimizing index performance of B+ tree based on persistent memory |
CN111651455A (en) * | 2020-05-26 | 2020-09-11 | 上海交通大学 | Efficient concurrent index data structure based on machine learning |
WO2022068289A1 (en) * | 2020-09-29 | 2022-04-07 | 北京金山云网络技术有限公司 | Data access method, apparatus and device, and computer-readable storage medium |
CN112543237B (en) * | 2020-11-27 | 2023-07-11 | 互联网域名系统北京市工程研究中心有限公司 | Lock-free DNS caching method and DNS server |
CN112543237A (en) * | 2020-11-27 | 2021-03-23 | 互联网域名系统北京市工程研究中心有限公司 | Lock-free DNS (Domain name Server) caching method and DNS server |
CN112612803A (en) * | 2020-12-22 | 2021-04-06 | 浙江大学 | Key value pair storage system based on persistent memory and data concurrent insertion method |
CN112612803B (en) * | 2020-12-22 | 2022-07-12 | 浙江大学 | Key value pair storage system based on persistent memory and data concurrent insertion method |
CN112732725B (en) * | 2021-01-22 | 2022-03-25 | 上海交通大学 | NVM (non volatile memory) hybrid memory-based adaptive prefix tree construction method, system and medium |
CN112732725A (en) * | 2021-01-22 | 2021-04-30 | 上海交通大学 | NVM (non volatile memory) hybrid memory-based adaptive prefix tree construction method, system and medium |
CN112947856A (en) * | 2021-02-05 | 2021-06-11 | 彩讯科技股份有限公司 | Memory data management method and device, computer equipment and storage medium |
CN112947856B (en) * | 2021-02-05 | 2024-05-03 | 彩讯科技股份有限公司 | Memory data management method and device, computer equipment and storage medium |
CN113656444A (en) * | 2021-08-26 | 2021-11-16 | 傲网信息科技(厦门)有限公司 | Data persistence method, server and management equipment |
CN113656444B (en) * | 2021-08-26 | 2024-02-27 | 友安云(厦门)数据科技有限公司 | Data persistence method, server and management equipment |
CN115905246A (en) * | 2023-03-14 | 2023-04-04 | 智者四海(北京)技术有限公司 | KV cache method and device based on dynamic compression prefix tree |
CN116719832A (en) * | 2023-08-07 | 2023-09-08 | 金篆信科有限责任公司 | Database concurrency control method and device, electronic equipment and storage medium |
CN116719832B (en) * | 2023-08-07 | 2023-11-24 | 金篆信科有限责任公司 | Database concurrency control method and device, electronic equipment and storage medium |
CN117131012A (en) * | 2023-08-28 | 2023-11-28 | 中国科学院软件研究所 | Sustainable and extensible lightweight multi-version ordered key value storage system |
CN117131012B (en) * | 2023-08-28 | 2024-04-16 | 中国科学院软件研究所 | Sustainable and extensible lightweight multi-version ordered key value storage system |
Also Published As
Publication number | Publication date |
---|---|
CN109407979B (en) | 2020-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109407979A (en) | Multithreading persistence B+ data tree structure design and implementation methods | |
US11288252B2 (en) | Transactional key-value store | |
Fang et al. | High performance database logging using storage class memory | |
US11023453B2 (en) | Hash index | |
Levandoski et al. | LLAMA: A cache/storage subsystem for modern hardware | |
JP5647203B2 (en) | Memory page management | |
KR930002331B1 (en) | Method and apparatus for concurrent modification of an index tree | |
Lee et al. | A case for flash memory SSD in enterprise database applications | |
CN109407978A (en) | The design and implementation methods of high concurrent index B+ linked list data structure | |
CN104246764B (en) | The method and apparatus for placing record in non-homogeneous access memory using non-homogeneous hash function | |
CN100412823C (en) | Method and system for managing atomic updates on metadata tracks in a storage system | |
CN100367239C (en) | Cache-conscious concurrency control scheme for database systems | |
US20180011892A1 (en) | Foster twin data structure | |
US20060265373A1 (en) | Hybrid multi-threaded access to data structures using hazard pointers for reads and locks for updates | |
CN105408895A (en) | Latch-free, log-structured storage for multiple access methods | |
JPH0887511A (en) | Method and system for managing b-tree index | |
CN112597254B (en) | Hybrid DRAM-NVM (dynamic random Access memory-non volatile memory) main memory oriented online transactional database system | |
CN107665219B (en) | Log management method and device | |
US20180004798A1 (en) | Read only bufferpool | |
CN111414134B (en) | Transaction write optimization framework method and system for persistent memory file system | |
Wang et al. | Persisting RB-Tree into NVM in a consistency perspective | |
TW202139000A (en) | Method of data storage, key -value store and non-transitory computer readable medium | |
CN111414320B (en) | Method and system for constructing disk cache based on nonvolatile memory of log file system | |
Li et al. | Phast: Hierarchical concurrent log-free skip list for persistent memory | |
US8001084B2 (en) | Memory allocator for optimistic data access |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |