CN106708442A

CN106708442A - Massive data storage method simultaneously applicable to disk and solid state disk reading and writing features

Info

Publication number: CN106708442A
Application number: CN201611255923.7A
Authority: CN
Inventors: 龚才鑫; 龚奕利
Original assignee: Wuhan Safety Technology Co Ltd
Current assignee: Hard rock technology (Wuhan) Co., Ltd
Priority date: 2016-12-30
Filing date: 2016-12-30
Publication date: 2017-05-24
Anticipated expiration: 2036-12-30
Also published as: CN106708442B

Abstract

The invention provides a massive data storage method suited at the same time to the read/write characteristics of magnetic disks and solid-state disks. Full sorting of the records in each block is replaced by partial sorting, and a Bloom filter is added at the tail of each block. A Log-Structured Append-Tree is built. When the amount of data stored in a block of the tree reaches a threshold, the block's data is appended directly to the corresponding child blocks. A child block's data therefore consists of several sorted sequences, rather than being fully sorted by merge sorting. Each block in the tree stores one Bloom filter. Without sacrificing other properties, the method greatly reduces write amplification and greatly improves random write efficiency. It also better protects and prolongs the service life of solid-state disks. In mixed read/write workloads random read performance is improved as well, and the method has important market value.

Description

Mass data storage method simultaneously adapted to the read/write characteristics of magnetic disks and solid-state disks

Technical field

The invention belongs to the field of mass data storage, and relates more particularly to a storage tree. The method is adapted at the same time to the read/write characteristics of magnetic disks and solid-state disks.

Background art

Index trees commonly used on hard disks include the B-tree, the LSM-tree and the buffer-tree. The B-tree is the traditional, classic tree. Under random-write workloads it inevitably performs random disk writes, so its performance is relatively low when storing massive data. Variants of it are therefore often used for massive data; BigTable, for example, combines a B-tree variant with an LSM-tree. For massive data storage, the LSM-tree or the buffer-tree (also called the fractal tree) is often used as the index tree. Both defer the writing of incoming records and process them in a batch once a certain amount has accumulated. This largely solves the B-tree's problem of random disk writes under random-write workloads and substantially raises write throughput.

Under random-write workloads, the LSM-tree and the buffer-tree have more layers, and their blocks are much larger than B-tree blocks. Read amplification is therefore large and random read performance drops noticeably. To address this, projects such as BigTable and LevelDB store Bloom filter information in each node of their LSM-tree implementations. This greatly reduces the LSM-tree's read amplification and largely solves the problem of low random read performance.

However, the B-tree and the LSM-tree/buffer-tree all suffer from large write amplification. Disk throughput is limited, so large write amplification prevents any further substantial improvement in the random write performance of these index trees, and it seriously shortens the lifetime of solid-state disks. Large write amplification also consumes most of the disk throughput. In mixed read/write workloads, the disk bandwidth used by random writes therefore degrades random read performance to some extent as well.

Summary of the invention

Problems to be solved by the invention: the large write amplification of traditional trees makes random writes inefficient, and on solid-state disks it also seriously shortens the disk's lifetime. Large write amplification occupies most of the throughput of a mechanical disk or solid-state disk. In mixed read/write workloads, the bandwidth used by random writes therefore degrades random read performance to some extent. The invention therefore designs a tree called the Log-Structured Append-Tree (LSA-tree).

The invention provides a mass data storage method adapted at the same time to the read/write characteristics of magnetic disks and solid-state disks. Full sorting of the records within a block is changed to partial sorting, and a Bloom filter is added at the tail of each block. The implementation is as follows. Memory holds a mutable memory cache, an immutable memory cache and the tree's metadata, and on-disk data is organized as an LSA-tree. If the tree has n layers, layer i (1 ≤ i ≤ n-1) has at least tⁱ and at most tⁱ+1 blocks, where the parameter t is the ratio between the block-count thresholds of adjacent layers. The last layer has at most tⁿ blocks. Each block covers a key range. When the amount of data stored in a block reaches the corresponding threshold, the block's data is flushed into those blocks of the next layer whose ranges overlap it. The flushed data is appended directly to the corresponding blocks. A block's data therefore consists of several sorted sequences, rather than being fully sorted by merge sorting. Each block in the tree stores a Bloom filter;

Further, the background thread's operations on the blocks of the LSA-tree fall into three classes: flush-down, split and merge. Every operation is initiated only on blocks of a non-final layer. A block of the current layer and one or more blocks of the layer below whose key ranges overlap it are said to be in a parent-child relation. The block of the current layer is called the parent block, and the one or more blocks of the next layer are called its child blocks;

A flush-down operation moves the data of a block down into the next layer. The block's range is kept, so the number of blocks in its layer does not change;

A flush-down is triggered when the data stored in the block reaches the storage threshold and the block has fewer than 2t child blocks;

Once triggered, the flush-down is carried out only when both of the following execution conditions are met:

Condition 1: the number of blocks in the layer below is less than tⁱ⁺¹+1 if i+1 < n, or less than tⁿ if i+1 = n;

Condition 2: if the layer below is not the last layer, none of the child blocks has reached the storage threshold;

A split operation divides a block into two, so that the two newly generated blocks have equal numbers of child blocks;

A split is triggered when the data stored in the block reaches the storage threshold and the block has more than 2t child blocks;

Its execution condition is that the number of blocks in the block's layer is less than tⁱ+1;

A merge operation moves the data of a block down into the next layer and deletes the block's range after the flush-down. The number of blocks in the block's layer therefore decreases by 1;

A merge is triggered when the number of blocks in the block's layer equals tⁱ+1;

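The merge must meet the same two execution conditions as a flush-down. First, the number of blocks in the layer below must be less than tⁱ⁺¹+1 if i+1 < n, or less than tⁿ if i+1 = n. Second, if the layer below is not the last layer, none of the child blocks may have reached the storage threshold;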

Further, when a user thread inserts a record, there are three cases:

1) If the mutable memory cache has not reached its capacity threshold, the record is appended to the user log and then inserted into the mutable memory cache;

2) If the mutable memory cache has reached its capacity threshold and no immutable memory cache exists, the mutable cache is first renamed to the immutable memory cache. A new mutable memory cache is then created to receive the record;

3) If the mutable memory cache has reached its capacity threshold and an immutable memory cache exists, the user thread waits. Once the background thread has written the immutable memory cache to disk and destroyed it, the user thread proceeds as in case 2);

Further, on the LSA-tree, the background thread writes the immutable memory cache to disk in the following steps:

Step 1.1: if the number of blocks in the last layer equals tⁿ, set n = n+1 and create a new layer, which becomes the new last layer;

Step 1.2: select a task to process. Each task consists of the block to be processed and the operation to perform on it. The immutable memory cache is also treated as a special block. The selection uses three priorities, from high to low:

Priority 1: flush down the immutable memory cache. If the flush-down execution conditions are not met, go on to check the Priority 2 condition;

Priority 2: for the non-final layers, check from the top down whether there is a layer i whose block count equals tⁱ+1 and whose lower layer has fewer than tⁱ⁺¹+1 blocks (if i+1 < n) or fewer than tⁿ blocks (if i+1 = n);

If such a layer exists, a block in that layer is selected for a merge, which reduces the layer's block count. The best block is chosen from a candidate set, and the merge is performed on it;

If no such layer exists, go on to check the Priority 3 condition;

Priority 3: scan from the top layer down for a block whose stored data has reached the storage threshold. If one exists, choose the first such block encountered. If the block has fewer than 2t child blocks, a flush-down is performed on it;

If the block has 2t or more child blocks, a split is performed on it;

The chosen block may be due for a flush-down whose execution condition is not met because one of its child blocks has already reached the storage threshold. In that case the child block is selected for a flush-down or split instead. The search continues recursively in the same way until the first block that satisfies the flush-down or split execution condition is found;

If in the end no target block and operation are selected, execution restarts from Step 1.1 when the user writes more data;

Step 1.3: perform the actual disk operation for the obtained task, which is a flush-down, a merge or a split;

Step 1.4: request an exclusive lock. Once it is granted, write the tree-structure changes made by the disk operation to the tree metadata change log, and update the in-memory tree metadata with this information;

Step 1.5: if the processed operation was the flush-down of the immutable memory cache, destroy the immutable memory cache. If a user thread is sleeping, wake it up. Release all locks held by this thread, and continue from Step 1.1.

Further, when the user reads data, the following steps are performed:

Step 2.1: read the mutable memory cache; if the required record is found, return it;

Step 2.2: read the immutable memory cache; if the required record is found, return it;

Step 2.3: read layers 1 to n in turn, and return the record as soon as it is found. If it has not been found by the last layer, the database does not contain the record.

Further, in Step 1.3, a flush-down task has three cases:

Case 1: if the block to be flushed down has no child blocks, go directly to Step 1.4 and modify the block's metadata to move it down. The range of the moved block is kept in the current layer;

Case 2: if the block to be flushed down has child blocks and the next layer is the last layer:

records that fall within the range of a block in the last layer are written directly to that block;

for records that fall outside the ranges of all blocks in the last layer, the child block whose range is nearest to the key of the record being inserted is modified, and its range is extended;

a child block in the last layer is modified as follows. If the data stored in the block has not reached the threshold, the data is appended. If it has, the incoming data is merge-sorted with the existing data to generate several new blocks;

Case 3: if the block to be flushed down has child blocks and the next layer is not the last layer:

records that fall within the range of a block in the next layer are appended directly to that block;

records that fall outside the ranges of all blocks are appended to the child block nearest to the key of the record being inserted, and that child block's key range is extended. The range of the moved block is kept in the current layer.

Further, for a merge task, the block's data is moved into the next layer in the same way as in a flush-down. The block's range is then deleted, so the number of blocks in its layer decreases by 1.

Further, the data stored in a block consists of index data, a Bloom filter and user records. The index data and the Bloom filter are stored at the end of the block, and the user records at the front of the block.

Further, the free hole in the middle of a block may be unable to hold all the data of the current write but still able to hold its index data and Bloom filter. (A hole is address space that is free logically but has no physical address on the mechanical disk or solid-state disk bound to it.) In that case the index data and Bloom filter are stored at the rear of the block, and the user records are appended to the tail of the block;

Further, the free hole in the middle of a block may be unable to hold even the index data and Bloom filter of the current write. In that case the data to be written is merge-sorted with the existing data to generate a new block. Alternatively, instead of the merge sort, the index data, Bloom filter and user records are all appended to the tail of the block.

According to the invention, write amplification is greatly reduced and random write efficiency greatly improved without sacrificing any other performance. In mixed read/write workloads, random read performance is also improved. The service life of solid-state disks is better protected and extended, and the method has important market value.

Brief description of the drawings

Fig. 1 is the basic framework diagram of the storage method according to an embodiment of the invention, mainly a schematic of the LSA-tree structure.

Fig. 2 is a logical schematic of flushing a block's data down into the last layer when the embodiment performs a disk operation.

Fig. 3 is a logical schematic of flushing a block's data down into a non-final layer when the embodiment performs a disk operation.

Fig. 4 is a schematic of the on-disk layout of a block designed in the embodiment.

Fig. 5 is a schematic of an alternative on-disk layout of a block designed in the embodiment.

Detailed description of the embodiments

The key problems the invention solves are these. The large write amplification of traditional trees lowers write performance and mixed read/write performance, and on solid-state disks it also seriously shortens the disk's lifetime. The invention changes full sorting of the records within a block to partial sorting. It then adds a Bloom filter at the tail of each block, which keeps the impact of this scheme on read performance to a minimum.

Fig. 1 is the basic framework diagram of the storage method provided by the embodiment, divided into a memory part and a disk part. Memory holds one mutable memory cache, one immutable memory cache and the tree's metadata. The tree metadata describes each block in the tree. A block's metadata includes its range, the layer it belongs to, the size of the free hole in the middle of the block, the number of times it has been appended to, and so on. Block metadata is grouped by layer. Within each group the entries are compared by the block ranges they store and kept sorted. On-disk data is organized as an LSA-tree.
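A minimal sketch of the per-layer metadata grouping described above. It assumes integer keys and uses hypothetical names (`BlockMeta`, `LayerIndex`). Each layer's entries are kept sorted by range start, and the sketch finds the block whose range covers a key. It is an illustration, not the patent's implementation.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class BlockMeta:
    lo: int                                       # range start; the only sort key
    hi: int = field(compare=False)                # range end (inclusive)
    layer: int = field(compare=False)
    hole_bytes: int = field(compare=False, default=0)    # free hole in the block
    append_count: int = field(compare=False, default=0)  # times appended to


class LayerIndex:
    """Block metadata grouped by layer, each group sorted by range."""

    def __init__(self):
        self.layers = {}                          # layer number -> sorted [BlockMeta]

    def add(self, meta):
        bisect.insort(self.layers.setdefault(meta.layer, []), meta)

    def covering(self, layer, key):
        """Return the metadata of the block in `layer` whose range contains key."""
        metas = self.layers.get(layer, [])
        starts = [m.lo for m in metas]            # rebuilt per call for brevity
        i = bisect.bisect_right(starts, key) - 1
        if i >= 0 and metas[i].lo <= key <= metas[i].hi:
            return metas[i]
        return None
```

Ranges within one layer do not overlap, so a binary search on the range start is enough to locate the single candidate block.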

Blocks in memory use a fully sorted structure and come in two kinds: the mutable memory cache and the immutable memory cache. The mutable cache has not reached the block capacity threshold, and user records can be inserted into it directly. The immutable cache has reached the threshold and can only be read, not modified. When a user thread inserts a record, there are three cases:

1) If the mutable memory cache has not reached its capacity threshold, append the record to the user log, insert it into the mutable memory cache, and return;

2) If the mutable memory cache has reached its capacity threshold and no immutable memory cache exists, first rename it to the immutable memory cache. Then create a new mutable memory cache, insert the record, and return;

3) If the mutable memory cache has reached its capacity threshold and an immutable memory cache exists, wait until the background thread has written the immutable memory cache to disk and destroyed it (this process is detailed below). Then proceed as in case 2).
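The three insert cases can be sketched in Python as follows. This is a simplified illustration under stated assumptions. `MemCaches`, `insert` and `finish_flush` are hypothetical names, the capacity threshold is counted in records, and a dict and a list stand in for the sorted memory cache and the persistent user log.

```python
import threading
from typing import Optional


class MemCaches:
    """Hypothetical sketch of the mutable/immutable memory caches of Fig. 1."""

    def __init__(self, capacity: int):
        self.capacity = capacity                # assumed threshold, in records
        self.mutable: dict = {}                 # stands in for a sorted structure
        self.immutable: Optional[dict] = None
        self.user_log: list = []                # stands in for the persistent user log
        self.cond = threading.Condition()

    def insert(self, key, value) -> None:
        with self.cond:
            # Case 3: mutable cache full and an immutable cache still exists -> sleep.
            while len(self.mutable) >= self.capacity and self.immutable is not None:
                self.cond.wait()
            # Case 2: mutable cache full, no immutable cache -> rename, start a new one.
            if len(self.mutable) >= self.capacity:
                self.immutable, self.mutable = self.mutable, {}
            # Case 1 (and the end of case 2): log the record, then insert it.
            self.user_log.append((key, value))
            self.mutable[key] = value

    def finish_flush(self) -> Optional[dict]:
        """Background thread: the immutable cache is on disk, so destroy it."""
        with self.cond:
            flushed, self.immutable = self.immutable, None
            self.cond.notify_all()              # wake any sleeping user threads
            return flushed
```

A background thread would call `finish_flush()` once the immutable cache has been written into the tree. A user thread blocked in case 3 then wakes up and proceeds as in case 2.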

On-disk data is organized as an LSA-tree. The tree has n layers, each made up of multiple blocks, and the number of blocks per layer grows exponentially. Layer i (1 ≤ i ≤ n-1) has tⁱ or tⁱ+1 blocks, and the last layer (layer n) has at most tⁿ blocks, where t is an integer ≥ 2, for example 10. In Fig. 1, from top to bottom, layer L₁ has t¹ blocks, L₂ has t² blocks, ..., L_(n-1) has t^(n-1) blocks, and L_n has x blocks (0 < x ≤ tⁿ). The parameter t is the ratio between the block-count thresholds of adjacent layers. In a concrete implementation, those skilled in the art can preset the number of layers n and the parameter t as needed, e.g. n = 7 and t = 10. Each block covers a key range. When the data stored in a block reaches the corresponding threshold, the block's data is flushed into those blocks of the next layer whose key ranges overlap it. In most cases this process appends the flushed data directly to the corresponding blocks instead of merge-sorting it, so the resulting block's data consists of several sorted sequences. This avoids excessive write amplification. Suppose the block size threshold is in the tens of megabytes, e.g. 64 MB. Even if the data is spread across several blocks of the next layer, the average amount written to each block still reaches several megabytes. This makes good use of the sequential write performance of disks and solid-state disks.
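The per-layer block-count rules can be expressed as a small check. The function names are hypothetical. The lower bound of 0 for the last layer is an assumption that covers the empty layer created by Step 1.1; Fig. 1 shows 0 < x ≤ tⁿ for a populated tree.

```python
def layer_bounds(i, n, t):
    """(min, max) number of blocks allowed in layer i (1-based) of an n-layer tree."""
    if not 1 <= i <= n:
        raise ValueError("layer index out of range")
    if i < n:
        return t ** i, t ** i + 1        # non-final layer: t^i or t^i + 1 blocks
    return 0, t ** n                      # last layer: at most t^n (assumed 0 right after Step 1.1)


def shape_ok(counts, t):
    """counts[i - 1] is the block count of layer i."""
    n = len(counts)
    return all(lo <= c <= hi
               for i, c in enumerate(counts, start=1)
               for lo, hi in [layer_bounds(i, n, t)])


def needs_new_layer(counts, t):
    """Step 1.1: the last layer is full, so a new last layer must be created."""
    return counts[-1] == t ** len(counts)
```

With t = 10, for example, a three-layer tree may hold 10 or 11 blocks in L₁, 100 or 101 in L₂, and up to 1000 in L₃.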

Each block in the tree stores Bloom filter information. When reading a record, the user therefore does not need to read every sorted sequence in the block. Only the Bloom filter data, which occupies little space, needs to be read to determine whether the queried record may be in a particular sequence of the block. Compared with fully sorted blocks, the user's read performance is barely affected.
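Below is a sketch of a block holding several appended sorted runs, each guarded by its own Bloom filter. This matches the per-write index data and Bloom filters of Fig. 4. The filter size, hash count, SHA-256-based hashing and the newest-run-first lookup order are illustrative assumptions, not details from the patent.

```python
import bisect
import hashlib


class BloomFilter:
    def __init__(self, m_bits=1024, k=3):
        self.m, self.k, self.bits = m_bits, k, 0     # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.k):
            h = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class AppendBlock:
    """A block made of several sorted runs, one per append, each with its own filter."""

    def __init__(self):
        self.runs = []                                # list of (sorted_keys, values, filter)

    def append_run(self, records):
        keys = sorted(records)
        bf = BloomFilter()
        for k in keys:
            bf.add(k)
        self.runs.append((keys, [records[k] for k in keys], bf))

    def get(self, key):
        # Assumed: newest run first, so a later append shadows an older value.
        for keys, values, bf in reversed(self.runs):
            if not bf.might_contain(key):
                continue                              # skip the run without searching it
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None
```

A false positive only costs one binary search in a run that turns out not to contain the key. A negative answer lets the run be skipped entirely.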

The background thread's operations on the blocks of the tree fall into three classes: flush-down, split and merge. Every operation is initiated only on blocks of a non-final layer, denoted L_i (1 ≤ i ≤ n-1). For convenience of description, a block of the current layer and one or more blocks of the layer below whose key ranges overlap it are said to be in a parent-child relation. The block of the current layer is called the parent block, and the one or more blocks of the next layer are called its child blocks.

A flush-down operation moves the data of a block down into the next layer. The block's range is kept, so the number of blocks in its layer does not change. It is triggered when the data stored in the block has reached the storage threshold and the block has fewer than 2t child blocks. It is carried out only when both of the following execution conditions hold. Condition 1: the number of blocks in the layer below is less than tⁱ⁺¹+1 (if i+1 < n, i.e. the next layer is not the last layer) or less than tⁿ (if i+1 = n, i.e. the next layer is the last layer L_n). Condition 2: if the layer below is not the last layer (i+1 < n), none of the child blocks has reached the storage threshold. See Step 1.3 for the detailed flush-down procedure.

A split operation divides a block into two, so that the two newly generated blocks have equal numbers of child blocks. It is triggered when the data stored in the block has reached the storage threshold and the block has more than 2t child blocks. Its execution condition is that the number of blocks in the block's layer is less than tⁱ+1. See Step 1.3 for the detailed procedure.

A merge operation is similar to a flush-down and also moves the block's data down into the next layer. The only difference is that the block's range is deleted after the flush-down, so the number of blocks in its layer decreases by 1. It is triggered when the number of blocks in the block's layer equals tⁱ+1. It is carried out only when both of the following execution conditions hold. Condition 1: the number of blocks in the layer below is less than tⁱ⁺¹+1 (if i+1 < n, i.e. the next layer is not the last layer) or less than tⁿ (if i+1 = n, i.e. the next layer is the last layer L_n). Condition 2: if the layer below is not the last layer, none of the child blocks has reached the storage threshold.
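The trigger and execution conditions above can be written as predicates. The sketch assumes that block sizes and thresholds use the same unit. It also assumes a hypothetical `Block` type whose `children` list holds the overlapping blocks of the next layer, and a list `counts` of per-layer block counts. It follows this paragraph in using "more than 2t" for the split trigger; Priority 3 in Step 1.2 splits at 2t or more.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    size: int                                     # data currently stored in the block
    children: list = field(default_factory=list)  # overlapping blocks of the next layer


# counts[j - 1] is the number of blocks in layer j; i is the parent block's layer.

def lower_has_room(i, counts, t):
    """Condition 1: layer i+1 holds < t^(i+1) + 1 blocks (non-final) or < t^n (final)."""
    n = len(counts)
    limit = t ** (i + 1) + 1 if i + 1 < n else t ** n
    return counts[i] < limit


def children_below_threshold(block, i, n, threshold):
    """Condition 2: only required when layer i+1 is not the last layer."""
    return i + 1 == n or all(c.size < threshold for c in block.children)


def flush_triggered(block, threshold, t):
    return block.size >= threshold and len(block.children) < 2 * t


def split_triggered(block, threshold, t):
    return block.size >= threshold and len(block.children) > 2 * t


def merge_triggered(i, counts, t):
    return counts[i - 1] == t ** i + 1


def flush_or_merge_allowed(block, i, counts, t, threshold):
    """Flush-down and merge share the same two execution conditions."""
    return (lower_has_room(i, counts, t)
            and children_below_threshold(block, i, len(counts), threshold))


def split_allowed(i, counts, t):
    return counts[i - 1] < t ** i + 1
```

A block for which a predicate is triggered but not allowed is waiting on a blocking layer or block, in the sense defined next.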

A layer or block that prevents an operation's execution condition from being met is called a blocking layer or blocking block, because it blocks that operation from proceeding.

Operations on data blocks with no logical dependency between them can run in parallel. Each on-disk block, as well as the mutable and immutable memory caches, is bound one-to-one to an exclusive lock. When an operation modifies blocks, it must acquire the exclusive locks of the blocks to be modified one by one. This prevents a block from being modified by several threads at once and corrupting data.

In the embodiment, the immutable memory cache is written into the LSA-tree by the following flow, which is the operation flow executed by the background thread:

Step 1.1: if the number of blocks in the last layer equals tⁿ, set n = n+1 and create a new layer, which becomes the new last layer. Go to Step 1.2.

Step 1.2: select a task to process. Each task consists of the block to be processed (here the immutable memory cache is also treated as a special block) and the operation to perform on it. The selection uses three priorities. They ensure that the number of blocks in each layer meets the per-layer requirement above, and they let the tree efficiently store the data flushed down from the immutable memory cache. From high to low they are:

Priority 1: flush down the immutable memory cache. If the flush-down execution conditions are not met, go on to check the Priority 2 condition.

Priority 2: for the non-final layers L_i (1 ≤ i ≤ n-1), check from the top down whether there is a layer whose block count equals tⁱ+1 and whose lower layer has fewer than tⁱ⁺¹+1 blocks (if i+1 < n, i.e. the next layer is not the last layer) or fewer than tⁿ blocks (if the next layer is the last layer L_n).

If such a layer exists, a block in that layer is selected for a merge, which reduces the layer's block count. The block is selected as follows:

All blocks of the layer that have at most t child blocks are added to a candidate set. The per-layer block-count constraints guarantee that this set contains at least one block. The best block is then chosen from the candidate set. The larger the ratio of the block's stored data to its number of child blocks, the better. The fewer child blocks the new range has (the range formed by merging the block's range with that of an adjacent block), the better. A merge is performed on the chosen block. If no such layer exists, go on to check the Priority 3 condition.

Priority 3: scan from the top layer down for a block whose stored data has reached the storage threshold. If one exists, choose the first such block encountered; blocks that block the immutable memory cache take precedence. If the block has fewer than 2t child blocks, a flush-down is performed on it. If it has 2t or more child blocks, a split is performed on it. The chosen block may be due for a flush-down whose execution condition is not met because one of its child blocks has already reached the storage threshold. In that case the child block is selected for a flush-down or split instead. The search continues recursively in the same way until the first block that satisfies the flush-down or split execution condition is found. If in the end no target block and operation are selected, execution restarts from Step 1.1.
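Task selection for Priorities 2 and 3 can be sketched as below. It is a hypothetical illustration. The patent does not say how the two merge criteria are combined, so the ordering used here (ratio first, then merged child count) is an assumption. The split's own execution condition and the preference for blocks that block the immutable memory cache are not modelled. `has_room(j)` is an assumed callback standing for Condition 1 on layer j.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                               # identity equality, hashable
class Node:
    size: int                                      # data stored in the block
    children: list = field(default_factory=list)


def pick_merge_block(layer, t):
    """Priority 2: index of the block to merge away from an over-full layer, or None."""
    def merged_children(i):
        # Child count of the range formed by joining block i with a neighbour.
        mine = set(layer[i].children)
        sizes = [len(mine | set(layer[j].children))
                 for j in (i - 1, i + 1) if 0 <= j < len(layer)]
        return min(sizes, default=len(mine))

    cands = [i for i, b in enumerate(layer) if len(b.children) <= t]
    if not cands:
        return None
    # Assumed ordering: larger size/children ratio first, then fewer merged children.
    return max(cands, key=lambda i: (layer[i].size / max(1, len(layer[i].children)),
                                     -merged_children(i)))


def resolve(block, i, n, threshold, t, has_room):
    """Priority 3: task for `block` in layer i, redirected to a full child if blocked."""
    if len(block.children) >= 2 * t:
        return block, "split"                      # split execution condition not checked
    if i + 1 < n:                                  # Condition 2 applies above non-final layers
        full = [c for c in block.children if c.size >= threshold]
        if full:
            return resolve(full[0], i + 1, n, threshold, t, has_room)
    return (block, "flush") if has_room(i + 1) else None


def pick_priority3(layers, threshold, t, has_room):
    n = len(layers)
    for i, layer in enumerate(layers[:-1], start=1):   # only non-final layers initiate
        for b in layer:
            if b.size >= threshold:
                return resolve(b, i, n, threshold, t, has_room)
    return None                                    # nothing selected: restart at Step 1.1
```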

Once the block to be processed and the operation to perform on it have been selected, the blocks to be modified are locked one by one. If all locks are acquired, the task has been obtained successfully. If locking any block fails, all acquired locks are released and execution restarts from Step 1.1.
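The all-or-nothing locking above can be sketched with non-blocking lock acquisition. Using `threading.Lock.acquire(blocking=False)` is an assumption, and the function names are hypothetical. Because everything is released on failure before the retry from Step 1.1, two background threads cannot each end up holding part of the other's lock set.

```python
import threading


def try_lock_all(locks):
    """Take every lock without blocking; on any failure release what was taken."""
    taken = []
    for lk in locks:
        if lk.acquire(blocking=False):
            taken.append(lk)
        else:
            for held in reversed(taken):
                held.release()
            return None                 # caller restarts from Step 1.1
    return taken


def release_all(taken):
    """Step 1.5: release every lock held by this thread."""
    for lk in reversed(taken):
        lk.release()
```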

Step 1.1 ensures that, when Step 1.2 and the following steps run, the last layer has fewer than tⁿ blocks. Suppose Priority 2 in Step 1.2 finds one or more layers whose block count equals tⁱ+1. Then at least one of them also has a lower layer with fewer than tⁱ⁺¹+1 blocks (if i+1 < n, i.e. the next layer is not the last layer) or fewer than tⁿ blocks (if the next layer is the last layer L_n), and a merge can be performed on that layer.

Priority 2 in Step 1.2 exists to guarantee that Priority 3 always has a task whose execution conditions can be met. Without it, every Priority 3 task could be blocked because every layer's block count equals tⁱ+1. Priority 2 thus ensures that the tree can always operate normally.

Priority 3 in Step 1.2 exists to process the blocks in the tree that have reached the storage threshold. Some of these are blocking blocks of an upper-layer task; processing them lets the tree keep storing the data flushed down from the immutable memory cache. Others block no upper-layer task; processing them optimizes performance.

Step 1.3: perform the actual disk operation for the obtained task. The concrete operations are as follows; logically independent, non-conflicting tasks can execute in parallel:

1) For a flush-down there are three cases:

Case 1: if the block to be flushed down has no child blocks, go directly to Step 1.4 and modify the block's metadata to move it down. The range of the moved block is kept in the current layer.

Case 2: the block to be flushed down has child blocks and the next layer is the last layer. Records that fall within the range of a block in the last layer are written directly to that block. For records that fall outside the ranges of all blocks in the last layer, the child block nearest to the key of the record being inserted is modified; in this case the child block's range must also be extended. A child block of the last layer is modified as follows. If the data stored in the block has not reached the threshold, the data is appended. If it has, the incoming data is merge-sorted with the existing data to generate several new blocks, keeping the number of newly generated blocks only slightly above the original number. The range of the moved block is kept in the current layer. As shown in Fig. 2, some child blocks are appended to, and others are merge-sorted.

Case 3: the block to be flushed down has child blocks and the next layer is not the last layer. When the data is flushed down, records that fall within the range of a block of the next layer are appended directly to that block. Records that fall outside the ranges of all blocks are appended to the child block nearest to the key of the record being inserted, and that child block's key range is extended. The range of the moved block is kept in the current layer. As shown in Fig. 3, only appended child blocks occur.

Compared with the prior art, this operation almost entirely avoids merge sorting and replaces it with appends. It therefore greatly reduces write amplification and improves write performance.

2) A merge follows a procedure similar to a flush-down: the block's data is moved into the next layer in the same way as in the flush-down described above. The only difference is that the block's range is deleted after the flush-down, so the number of blocks in its layer decreases by 1.

3) A split divides the block into two new blocks. After the split, the two newly generated blocks own equal numbers of child blocks.
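The three parts of Step 1.3 can be sketched as follows: the key-routing rule of Cases 2 and 3, the append-or-merge rule for last-layer children, and the split. The sketch makes several assumptions. Keys are integers, child ranges are kept as mutable `[lo, hi]` lists, and thresholds are counted in keys. Ties in distance go to the lower child, and the first key of the right half's first child is used as the split boundary. All names are hypothetical.

```python
import heapq


def route(keys, child_ranges):
    """Cases 2 and 3: assign flushed keys to child blocks.

    child_ranges: sorted, non-overlapping [lo, hi] lists, extended in place when a
    key falls outside every range. Returns {child_index: sorted keys}.
    """
    out = {}
    for k in sorted(keys):
        hit = next((j for j, (lo, hi) in enumerate(child_ranges) if lo <= k <= hi), None)
        if hit is None:
            # Outside every range: the nearest child (ties to the lower one) takes it.
            hit = min(range(len(child_ranges)),
                      key=lambda j: min(abs(k - child_ranges[j][0]),
                                        abs(k - child_ranges[j][1])))
            child_ranges[hit][0] = min(child_ranges[hit][0], k)
            child_ranges[hit][1] = max(child_ranges[hit][1], k)
        out.setdefault(hit, []).append(k)
    return out


def write_last_layer_child(runs, incoming, threshold, block_cap):
    """Case 2 rule for a last-layer child: append a run, or merge-sort and re-split.

    runs: the child's existing sorted runs. Returns the resulting blocks (lists of runs).
    """
    if sum(len(r) for r in runs) < threshold:
        return [runs + [sorted(incoming)]]          # same block, one more sorted run
    merged = list(heapq.merge(*runs, sorted(incoming)))
    return [[merged[i:i + block_cap]] for i in range(0, len(merged), block_cap)]


def split_block(records, children):
    """Split: each new parent owns half of the children (equal for an even count)."""
    half = len(children) // 2
    boundary = children[half][0]                    # first key of the right-hand parent
    left = ([k for k in records if k < boundary], children[:half])
    right = ([k for k in records if k >= boundary], children[half:])
    return left, right
```

In Case 3 the routed keys are only ever appended, which corresponds to the first branch of `write_last_layer_child`.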

Step 1.4: request an exclusive lock, which ensures that only one background thread can perform this step at any moment. Once it is granted, write the tree-structure changes made by the disk operation to the "tree metadata change log", and update the in-memory tree metadata with this information.
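Step 1.4 can be sketched as an append-only JSON-lines journal guarded by one exclusive lock. The file format, the `put`/`del` change records and the `recover` replay method are hypothetical additions. The patent only states that the changes are written to the tree metadata change log and then applied to the in-memory metadata.

```python
import json
import os
import threading


class TreeMeta:
    """Hypothetical in-memory tree metadata backed by an append-only change journal."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.blocks = {}                      # block id -> {"layer": ..., "range": [...]}
        self._lock = threading.Lock()         # the single exclusive lock of Step 1.4

    def _apply(self, changes):
        for c in changes:
            if c["op"] == "put":
                self.blocks[c["id"]] = c["meta"]
            else:                             # "del", e.g. a range removed by a merge
                self.blocks.pop(c["id"], None)

    def commit(self, changes):
        with self._lock:
            # Persist the structural change first, then update the in-memory copy.
            with open(self.journal_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(changes) + "\n")
                f.flush()
                os.fsync(f.fileno())
            self._apply(changes)

    @classmethod
    def recover(cls, journal_path):
        """Assumed restart path: rebuild the metadata by replaying the journal."""
        meta = cls(journal_path)
        with open(journal_path, encoding="utf-8") as f:
            for line in f:
                meta._apply(json.loads(line))
        return meta
```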

Step 1.5, if what is processed is the operation that moves down of " immutable memory cache ", destroys " immutable memory cache ", if There is user thread just to sleep, then wake up user thread；By all locks unblock acquired in this thread.This thread from step 1.1 after It is continuous to start to perform.

When a user reads data, the following steps are executed:

Step 2.1: read the "variable memory cache"; if the required record is found, return it.

Step 2.2: read the "immutable memory cache"; if the required record is found, return it.

Step 2.3: read layers L₁ to Lₙ in order and return the record once it is found; if it is still not found after the last layer, the database contains no such record. Because of MVCC (multi-version concurrency control), no lock needs to be held while reading from disk.
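The read path of steps 2.1 to 2.3 can be sketched as follows. The data model is a simplification assumed for illustration: both memory caches are `dict`s, and each layer is a list of `(lo, hi, data)` tuples whose key ranges do not overlap, so at most one block per layer can hold a given key. MVCC versioning and deletions are not modelled.

```python
def lookup(key, mutable_cache, immutable_cache, layers):
    """Return the value for key, or None if the database has no such record."""
    # Step 2.1: the variable (mutable) memory cache holds the newest data.
    if key in mutable_cache:
        return mutable_cache[key]
    # Step 2.2: the immutable memory cache, if one currently exists.
    if immutable_cache is not None and key in immutable_cache:
        return immutable_cache[key]
    # Step 2.3: layers L1 .. Ln, upper layers hold newer data.
    for layer in layers:
        for lo, hi, data in layer:
            if lo <= key <= hi:
                if key in data:
                    return data[key]
                break                      # ranges do not overlap: go to the next layer
    return None                            # not found even in the last layer
```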

Fig. 4 shows how this method organizes a block on disk. As shown in Fig. 4 (left), the data stored in a block consists of index data, Bloom filters and user records; the former two are stored at the end of the block and the user records at the head of the block. When data is stored, three cases may occur:

1) If the idle hole in the middle of the block can hold all the data of the current write, the storage layout of Fig. 4 (left) is used: the user records of the 1st to the n-th writes are stored in sequence from the front of the block, and the index data and Bloom filters of the 1st to the n-th writes are stored in sequence at the back end of the block;

2) If the idle hole in the middle of the block cannot hold all the data of the current write but can hold its index data and Bloom filter, the storage layout of Fig. 4 (right) is used: the index data and Bloom filter of the (n+1)-th write are stored next in sequence at the back end of the block, and the user records of the (n+1)-th write are appended to the tail of the block;

3) If the idle hole in the middle cannot hold even the index data and Bloom filter, the new data is merge-sorted with the existing data to generate a new block. In practice, this case can be avoided almost entirely by treating a block as having reached its storage threshold once the data stored in it reaches 95% of its capacity. Further, the invention proposes, as a more preferred option, replacing the merge sort with the approach of Fig. 5, which avoids merging altogether: the index data, Bloom filter and user records are all appended to the tail of the block.
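The choice among the three cases can be sketched as a small decision function. The byte counters `head_used` (user records at the front) and `tail_used` (index data and Bloom filters at the back) are hypothetical bookkeeping assumed for illustration; the 95% ratio is the value given in the text.

```python
def place_write(block_size, head_used, tail_used, record_bytes, meta_bytes):
    """Decide how one write (records + index/Bloom metadata) lands in a block."""
    free = block_size - head_used - tail_used   # the idle hole in the middle
    if record_bytes + meta_bytes <= free:
        return "fill_hole"             # case 1, Fig. 4 (left): everything fits in the hole
    if meta_bytes <= free:
        return "append_records"        # case 2, Fig. 4 (right): records go past the block end
    return "merge_or_append_all"       # case 3: merge sort, or append everything (Fig. 5)


def reached_threshold(head_used, tail_used, block_size, ratio=0.95):
    """Treat a block as full at 95% so that case 3 almost never happens."""
    return head_used + tail_used >= ratio * block_size
```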

In an implementation, a block can be realized as a file with a threshold of 64 MB per file; a file may nevertheless exceed 64 MB, for example when data is stored in the layout of Fig. 4 (right).

Note that when the records in a block are read, the most recently appended sequence is read first; if the required record is found there, the sequences not yet read need not be read and the record can be returned directly.
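Reading newest-first within a block can be sketched as below, assuming each appended sequence ("run") carries its own Bloom filter, as in the per-write layout of Fig. 4. The Bloom filter here is a deliberately simple one built on SHA-256 with arbitrary sizing (`m_bits=1024`, `k=3`); these parameters, and the `Run` class, are illustrative assumptions rather than values from the text.

```python
import hashlib
from bisect import bisect_left


class BloomFilter:
    def __init__(self, m_bits=1024, k=3):        # m_bits must be a multiple of 8
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key):
        for i in range(self.k):                   # k independent hashes via a salt
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class Run:
    """One appended sorted sequence together with its own Bloom filter."""

    def __init__(self, records):
        self.records = sorted(records)
        self.keys = [k for k, _ in self.records]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key):
        if not self.bloom.might_contain(key):     # skip the run without searching it
            return None
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.records[i][1]
        return None                               # Bloom filter false positive


def block_get(runs, key):
    for run in reversed(runs):                    # most recently appended run first
        value = run.get(key)
        if value is not None:
            return value                          # older runs need not be read
    return None
```

The Bloom filter lets a read skip most runs that cannot contain the key, which is how partial sorting keeps its cost on read performance small.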

The process described above is only one example of the idea of "changing the full sorting of the records in a block into partial sorting (the block consists of multiple sorted sequences) to greatly reduce write amplification, and then adding a Bloom filter to the index information of each block to minimize the impact of this scheme on read performance". Any modification, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection; for example, applying the same logic within the blocks of a buffer-tree also falls within the scope of protection of the invention.

Claims

1. A mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks, characterized in that: the full sorting of the records in a block is changed into partial sorting, and a Bloom filter is added at the tail of each block, implemented as follows: the memory holds a variable memory cache, an immutable memory cache and the metadata of the tree; a structure called the Log-Structured Append-Tree is built, and the data on disk is organized in the Log-Structured Append-Tree structure; if the tree is divided into n layers, the i-th layer has at least tⁱ blocks and at most tⁱ + 1 blocks, 1 ≤ i ≤ n − 1, where the parameter t is the ratio between the block-count thresholds of adjacent layers, and the last layer has at most tⁿ blocks; each block has a key range; when the data volume stored in a block reaches its threshold, the data in the block is flushed into those blocks of the next layer whose ranges overlap it, and the flushed data is appended directly to the corresponding blocks, so that the data of a block consists of several sorted sequences instead of being fully sorted within the block by merge sorting; every block in the tree stores a Bloom filter.

2. The mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks according to claim 1, characterized in that: the operations of the background thread on the blocks of the Log-Structured Append-Tree are divided into three classes, namely flush-down, split and merge; all operations are initiated only on blocks of non-last layers; the overlapping relationship between the key range of a block in the current layer and the key ranges of one or more blocks in the next layer is called a parent-child relationship, the block of the current layer is called the parent block, and the one or more blocks of the next layer are called child blocks;

The flush-down operation moves the data in a block down into the next layer, but the range of the block is retained and the number of blocks in the block's layer does not change;

The trigger condition of the flush-down operation is that the data volume stored in the block reaches the storage threshold and the number of child blocks of the block is less than 2t;

The trigger condition of the split operation is that the data volume stored in the block reaches the storage threshold and the number of child blocks of the block is greater than 2t;

The merge operation moves the data in a block down into the next layer, and the range of the block is deleted after the flush, so that the number of blocks in the block's layer decreases by 1;

The merge operation must satisfy the following two execution conditions:

Condition 1: the number of blocks in the lower layer is less than tⁱ⁺¹ + 1 and i + 1 < n, or less than tⁿ and i = n − 1;

Condition 2: if the lower layer is not the last layer, none of the child blocks may have reached the storage threshold.
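The trigger and execution conditions of claim 2 can be written as two predicates. This is a sketch under one labelled assumption: the text covers a full block with fewer than 2t children (flush-down) or more than 2t children (split), so the function returns `None` for exactly 2t rather than guessing. Layer numbers `i` are 1-based as in the claims.

```python
def full_block_operation(child_count, t):
    """Operation triggered by a block whose stored data reached the storage threshold."""
    if child_count < 2 * t:
        return "flush"
    if child_count > 2 * t:
        return "split"
    return None                       # exactly 2t children: not specified in the text


def merge_allowed(i, n, t, lower_block_count, lower_children_full):
    """Conditions 1 and 2 for merging a block of layer i (1 <= i <= n-1) into layer i+1.

    lower_children_full: one bool per child block, True if it reached the storage threshold.
    """
    if not 1 <= i <= n - 1:
        raise ValueError("operations start only from non-last layers")
    if i + 1 < n:
        # Lower layer is not the last layer: it needs room (condition 1)
        # and no child block may be full (condition 2).
        return lower_block_count < t ** (i + 1) + 1 and not any(lower_children_full)
    # i == n - 1: the lower layer is the last layer, bounded by t**n blocks.
    return lower_block_count < t ** n
```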

3. The mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks according to claim 2, characterized in that: when a user thread inserts a record, there are the following three cases:

1) If the variable memory cache has not reached its capacity threshold, the record is appended to the user log and then inserted into the variable memory cache;

2) If the variable memory cache has reached its capacity threshold and no immutable memory cache exists, the variable memory cache is first renamed to the immutable memory cache, and then a new variable memory cache is created and the record is inserted into it;

3) If the variable memory cache has reached its capacity threshold and the immutable memory cache exists, the user thread waits until the background thread has written the immutable memory cache to disk and destroyed it, and then proceeds as in case 2).
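The three insertion cases of claim 3 can be sketched with a `threading.Condition`. Several details are assumptions for illustration: the capacity threshold is counted in records, the user log is any object with an `append` method (a plain `list` in the usage below), every insert is logged (the text states this explicitly only for case 1), and `release_immutable` stands in for the background thread destroying the immutable cache after writing it to disk.

```python
import threading


class MemTables:
    """Hypothetical variable/immutable memory cache pair for claim 3."""

    def __init__(self, capacity, user_log):
        self.capacity = capacity          # assumption: threshold counted in records
        self.user_log = user_log          # assumption: any object with append()
        self.mutable = {}
        self.immutable = None
        self.cond = threading.Condition()

    def put(self, key, value):
        with self.cond:
            if len(self.mutable) >= self.capacity:
                # Case 3: an immutable cache still exists, so sleep until the
                # background thread has written it to disk and destroyed it.
                while self.immutable is not None:
                    self.cond.wait()
                # Case 2: rename the full cache to immutable, start a new variable cache.
                self.immutable = self.mutable
                self.mutable = {}
            # Case 1: log the record first, then insert it into the variable cache.
            self.user_log.append((key, value))
            self.mutable[key] = value

    def release_immutable(self):
        """Called by the background thread after flushing the immutable cache."""
        with self.cond:
            self.immutable = None
            self.cond.notify_all()        # wake user threads sleeping in case 3
```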

4. The mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks according to claim 3, characterized in that: based on the Log-Structured Append-Tree, writing the immutable memory cache to disk by the background thread comprises the following steps:

Step 1.1: if the number of blocks in the last layer equals tⁿ, set n = n + 1 and create a new layer, which becomes the new last layer;

Step 1.2: choose a task to process; each task consists of the block selected for processing and the operation to be performed on it, and the immutable memory cache is also regarded as a special block; the selection uses three priorities, from high to low as follows:

Priority 1: flush-down of the immutable memory cache; if the execution conditions of the flush-down are not satisfied, go on to check the condition of priority 2;

Priority 2: for the non-last layers, check from the top layer down whether there is a layer whose number of blocks equals tⁱ + 1 while the lower layer has fewer than tⁱ⁺¹ + 1 blocks and i + 1 < n, or while the lower layer has fewer than tⁿ blocks and i = n − 1;

If such a layer exists, a block in that layer is selected for a merge operation to reduce the number of blocks in the layer: the optimal block is chosen from the candidate set, and the merge operation is performed on the selected optimal block;

If no such layer exists, go on to check the condition of priority 3;

Priority 3: check, layer by layer from the top down, whether there is a block whose stored data volume has reached the storage threshold; if so, choose the first such block encountered in the traversal; if the number of child blocks of that block is less than 2t, a flush-down operation is to be performed on it;

If the chosen block is to be flushed down but the execution condition of that operation is not satisfied because the block has a child block that has already reached the storage threshold, that child block is selected instead for a flush-down or split operation, and the search recurses in the same way until it finally selects the first block that satisfies the execution condition of a flush-down or split;

If no target block and operation are selected in the end, execution restarts from step 1.1 when the user writes more data;
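The three-priority task selection of step 1.2 can be sketched as follows. This is an illustration, not the claimed procedure in every detail: layers are lists of a hypothetical `Block` type, `immutable_flushable` is assumed to be computed elsewhere, a block with exactly 2t children is treated as a split candidate (the claims leave that case open), full children in the last layer do not block a flush (they are merge-sorted per Situation 2), and "optimal block" is replaced by a hypothetical "fewest children" rule.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    full: bool = False                            # stored data reached the storage threshold
    children: list = field(default_factory=list)


def resolve(block, idx, n, t):
    """Priority 3 descent; idx is the 0-based layer index of block."""
    if len(block.children) >= 2 * t:              # assumption: exactly 2t treated as split
        return ("split", block.name)
    child_is_last = idx + 1 == n - 1
    # A full child blocks the flush unless it lives in the last layer.
    blocker = None if child_is_last else next((c for c in block.children if c.full), None)
    if blocker is None:
        return ("flush", block.name)
    return resolve(blocker, idx + 1, n, t)        # recurse into the full child


def select_task(layers, t, immutable_flushable):
    n = len(layers)
    # Priority 1: flush the immutable memory cache.
    if immutable_flushable:
        return ("flush", "immutable-cache")
    # Priority 2: a layer with t**i + 1 blocks whose lower layer still has room.
    for i in range(1, n):                         # 1-based number of a non-last layer
        count, lower = len(layers[i - 1]), len(layers[i])
        room = (lower < t ** (i + 1) + 1) if i + 1 < n else (lower < t ** n)
        if count == t ** i + 1 and room:
            cands = [b for b in layers[i - 1]
                     if i + 1 == n or not any(c.full for c in b.children)]
            if cands:
                best = min(cands, key=lambda b: len(b.children))  # hypothetical "optimal"
                return ("merge", best.name)
    # Priority 3: first full block, top-down, in the non-last layers.
    for idx in range(n - 1):
        for block in layers[idx]:
            if block.full:
                return resolve(block, idx, n, t)
    return None                                   # nothing to do: wait for more writes
```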

Step 1.3: perform the actual disk operation according to the task obtained, namely a flush-down, merge or split operation;

Step 1.4: acquire an exclusive lock; once it is acquired, write the tree structure changes caused by the disk operation actually performed to the tree metadata change log, and use this information to update the tree metadata in memory;

Step 1.5: if the operation processed moved down the immutable memory cache, destroy the immutable memory cache; if a user thread is sleeping, wake it up; release all locks acquired by this thread, and the thread resumes execution from step 1.1.

5. The mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks according to claim 4, characterized in that:

When a user reads data, the following steps are executed:

Step 2.3: read the 1st layer to the n-th layer in order and return the record once it is found; if it is still not found after the last layer, the database contains no such record.

6. The mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks according to claim 4, characterized in that: in step 1.3, if the task is a flush-down operation, there are 3 situations:

Situation 1: if the block to be flushed down has no child blocks, go directly to step 1.4 and modify the block's metadata to move it down; the range of the current-layer block that was moved down is retained;

For records that fall outside the ranges of all blocks in the last layer, the child block whose range is closest to the key of the record is selected and modified, and the key range of that child block is modified;

A last-layer child block is modified as follows: if the data stored in the block has not reached the threshold, an append operation is performed; if it has, the new data and the existing data are merge-sorted to generate several new blocks;

For records that fall outside the ranges of all blocks, the new data is appended to the child block whose range is closest to the key of the record, and the key range of that child block is modified; the range of the current-layer block that was moved down is retained.

7. The mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks according to claim 6, characterized in that: if the task is a merge operation, the data in the block is moved down into the next layer in the same way as in the flush-down operation, and the range of the block is deleted after the flush, so that the number of blocks in the block's layer decreases by 1.

8. The mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks according to claim 1, 2, 3, 4, 5, 6 or 7, characterized in that: the data stored in a block comprises index data, Bloom filters and user records; the index data and Bloom filters are stored at the end of the block, and the user records are stored at the front of the block.

9. The mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks according to claim 8, characterized in that: when the idle hole in the middle of the block cannot hold all the data of the current write but can hold its index data and Bloom filter, the index data and Bloom filter are stored at the back end of the block and the user records are appended to the tail of the block.

10. The mass data storage method adapted simultaneously to the read/write characteristics of disks and solid state disks according to claim 8, characterized in that: when the idle hole in the middle of the block cannot hold the index data and Bloom filter of the current write, the new data is merge-sorted with the existing data to generate a new block; alternatively, instead of merge sorting, the index data, Bloom filter and user records are all appended to the tail of the block.