CN106844650A - Merging method and system for a log-structured merge tree - Google Patents

Info

Publication number
CN106844650A
Authority
CN
China
Prior art keywords
sstable
real
key
virtual
merging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710047936.3A
Other languages
Chinese (zh)
Inventor
潘锋烽
熊劲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201710047936.3A
Publication of CN106844650A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a merging method and system for a log-structured merge tree. The method comprises: a real merge step, in which both data and metadata are merged to generate Real SSTables, the data merge being the merging of SSTables; a virtual merge step, in which Virtual SSTables are generated by merging metadata only and the data sources of each Virtual SSTable are recorded; a Real SSTable read step, in which, when a key falls within the key range of a Real SSTable, the value corresponding to the key is looked up directly on that Real SSTable; a Virtual SSTable read step; and a Virtual SSTable merge step, in which a Virtual SSTable is merged during reads and thereby turned into Real SSTables.

Description

Merging method and system for a log-structured merge tree
Technical field
The present invention relates to the technical field of log-structured merge trees, and in particular to a merging method and system for a log-structured merge tree.
Background art
A log-structured merge tree (LSM-Tree) consists of multiple components: one memory component and several disk components whose sizes grow exponentially. Its architecture is shown in Fig. 2. The memory component consists of a memory table (Memtable), and each disk component consists of one or more sorted string tables (SSTables). The main idea of the LSM-Tree is to buffer writes and updates in memory and, once a specified threshold is reached, to write them to the storage device sequentially. Update operations such as insert, update and delete are served by the memory component, while operations such as lookup and scan are served by all components. The LSM-Tree updates out of place, so in an LSM-Tree the same key may have values of several versions. A delete only adds a deletion marker to the memory component and does not actually remove the data. A subsequent compaction deletes and merges data according to the old and new versions and the deletion markers of each key, sorts the different keys, generates new SSTable files, and deletes the old SSTable files.
To keep future reads and writes of the LSM-Tree efficient, data must be moved continually from the smaller components to the larger ones. When the size of a component exceeds its threshold, a merge (compaction) operation is triggered. Each data movement takes place between adjacent components; during it, the data of the two components is sorted and invalid and obsolete data is deleted. This process is called compaction. Taking Fig. 3 as an example, when component C2 exceeds its threshold, an SSTable T22 in C2 is selected for merging, and the SSTables T32 and T33 in component C3 whose key ranges overlap that of T22 are selected. T22, T32 and T33 are merged into three new SSTables, T35, T36 and T37, and the newly built SSTables are placed in C3. In this way the data of T22 is moved down from component C2 to component C3.
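The selection of T32 and T33 in the Fig. 3 example comes down to a key-range overlap test. The following is a minimal Python sketch of that test, not code from the patent: the class SSTableMeta, the function names, the concrete key ranges, and the extra tables T31 and T34 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableMeta:
    name: str
    smallest: str  # smallest key stored in the file
    largest: str   # largest key stored in the file


def overlaps(a: SSTableMeta, b: SSTableMeta) -> bool:
    """Two closed key ranges overlap unless one ends before the other starts."""
    return a.smallest <= b.largest and b.smallest <= a.largest


def pick_compaction_inputs(victim, next_component):
    """Return the SSTables of the next component that overlap the victim."""
    return [t for t in next_component if overlaps(victim, t)]


# Hypothetical key ranges that mirror the shape of the Fig. 3 example.
t22 = SSTableMeta("T22", "g", "p")
c3 = [
    SSTableMeta("T31", "a", "e"),
    SSTableMeta("T32", "f", "k"),
    SSTableMeta("T33", "m", "r"),
    SSTableMeta("T34", "s", "z"),
]
selected = pick_compaction_inputs(t22, c3)
```

With these assumed ranges only T32 and T33 are selected, so the compaction reads three files (T22, T32, T33) and leaves the rest of C3 untouched.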
Compaction governs the I/O behaviour of key/value stores based on the LSM-Tree. During compaction, key-value pairs flow from a smaller level (lower level) to a larger level (higher level). Because each level has a preset capacity limit, the slow flow of key-value pairs towards the larger levels can cause write stalls in the system. Fig. 4 shows the reads and writes incurred while one key-value pair flows from a smaller level to a larger level. During compaction a key-value pair may be read in and written out many times, even within the same level. The reason is that compaction is scheduled by polling, and polling proceeds faster in a smaller level than in a larger one, so a key-value pair has already taken part in several compactions before it moves to the adjacent level. This severe write amplification causes frequent write stalls and thus reduces the write performance of the system.
Existing solutions to the write amplification problem of the LSM-Tree mainly fall into the following categories:
Technical solution 1: Relax the condition under which a traditional LSM-Tree triggers compaction, that is, increase the threshold size of each component. Its main drawback is that it can only alleviate write amplification from component 0 to component 1; the subsequent levels still suffer from write amplification.
Technical solution 2: bLSM divides the keys into multiple key ranges so that compaction is confined to a small number of key ranges and data in unrelated key ranges is not compacted. As shown in Fig. 5, if data is continually inserted in the range N~Q, compaction is triggered only within N~Q and not in the other ranges. Its main drawback is that it cannot avoid the write amplification caused by compaction within a given key range.
bLSM was published in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data.
Technical solution 3: When VT-Tree determines that, for a key-value pair, several consecutive levels contain no other version of the same key, the pair can skip those levels and go directly to the level that contains the same key or to the last level, which saves the I/O cost of moving the pair through multiple levels, as shown in Fig. 6. Its main drawback is that it cannot alleviate the write amplification caused by compacting hot data.
VT-Tree was published at the 11th USENIX Conference on File and Storage Technologies (FAST '13).
Technical solution 4: WiscKey stores keys and values separately: the key and a pointer to the value are stored in the LSM-Tree, while the actual value data is stored elsewhere. During compaction, therefore, only the keys and the value pointers are repeatedly read and written, as shown in Fig. 7. Its main drawback is that keys are still read and written repeatedly by compaction, so in scenarios where keys are large the write amplification of the LSM-Tree remains.
WiscKey was published at the 14th USENIX Conference on File and Storage Technologies (FAST '16).
Summary of the invention
The object of the present invention is to solve the problem, left unsolved by the above prior art, of LSM-Tree write amplification caused by data being repeatedly read and written during merging (both within a level and across levels). To that end it proposes a merging method and system for a log-structured merge tree.
The present invention proposes a merging method for a log-structured merge tree, comprising:
A real merge step, comprising a data merge and a metadata merge and generating Real SSTables, wherein the data merge is the merging of SSTables, and the metadata merge comprises merging key range, file number and file size information;
A virtual merge step, generating Virtual SSTables by merging metadata only, wherein the metadata merge comprises merging key range, file number and file size information, and recording the data sources of each Virtual SSTable;
A Real SSTable read step, reading the Real SSTable, wherein when a key falls within the key range of the Real SSTable, the value corresponding to the key is looked up directly on the Real SSTable;
A Virtual SSTable read step, wherein when a key falls within the key range of a Virtual SSTable, the data sources of the Virtual SSTable corresponding to the key are found through the metadata information, and the value corresponding to the key is looked up in those Real SSTables;
A Virtual SSTable merge step, merging the Virtual SSTable during reads so that the Virtual SSTable becomes Real SSTables, thereby improving read performance.
The real merge step comprises:
11. Read the key/value pairs of each Real SSTable in turn;
12. Merge-sort them and write the qualifying key/value pairs sequentially into a Real SSTable of fixed size;
13. When it is full, create a new Real SSTable and continue writing the sorted key/value pairs into it, until all have been written.
The virtual merge step comprises:
21. Collect the metadata of the Real SSTables;
22. From the key ranges of the Real SSTables, derive a new key range that covers the ranges of all the Real SSTables;
23. Divide the new key range into N parts according to the number N of Real SSTables, the N parts representing N Virtual SSTables, wherein the file size of each Virtual SSTable is set equal to the file size of a Real SSTable, the file numbers of the Virtual SSTables are assigned uniformly by the virtual merge process, and the data sources are all the Real SSTables involved in the virtual merge.
The Virtual SSTable read step comprises:
41. Determine whether the key to be looked up falls within the key range of the Virtual SSTable;
42. If not, return;
43. If so, find the data sources, i.e. the Real SSTables, through the metadata of the Virtual SSTable;
44. Look up the key using the index of a Real SSTable;
45. If found, return the value of the key;
46. If not found, look up the next Real SSTable, until all the Real SSTables contained in the data sources of the Virtual SSTable have been searched;
47. If still not found, return.
The Virtual SSTable merge step comprises:
51. Obtain the data sources of the Virtual SSTable, i.e. the Real SSTables;
52. Read the key/value pairs of each Real SSTable in turn;
53. Merge-sort them and write the qualifying key/value pairs sequentially into a Real SSTable of fixed size;
54. When it is full, create a new Real SSTable and continue writing the sorted key/value pairs into it, until all have been written.
The present invention also proposes a merging system for a log-structured merge tree, comprising:
A real merge module, performing a data merge and a metadata merge and generating Real SSTables, wherein the data merge is the merging of SSTables, and the metadata merge comprises merging key range, file number and file size information;
A virtual merge module, for generating Virtual SSTables by merging metadata only, wherein the metadata merge comprises merging key range, file number and file size information, and for recording the data sources of each Virtual SSTable;
A Real SSTable read module, for reading the Real SSTable, wherein when a key falls within the key range of the Real SSTable, the value corresponding to the key is looked up directly on the Real SSTable;
A Virtual SSTable read module, for finding, when a key falls within the key range of a Virtual SSTable, the data sources of the Virtual SSTable corresponding to the key through the metadata information, and for looking up the value corresponding to the key in those Real SSTables;
A Virtual SSTable merge module, for merging the Virtual SSTable during reads so that the Virtual SSTable becomes Real SSTables, thereby improving read performance.
The real merge module performs:
11. Read the key/value pairs of each Real SSTable in turn;
12. Merge-sort them and write the qualifying key/value pairs sequentially into a Real SSTable of fixed size;
13. When it is full, create a new Real SSTable and continue writing the sorted key/value pairs into it, until all have been written.
The virtual merge module performs:
21. Collect the metadata of the Real SSTables;
22. From the key ranges of the Real SSTables, derive a new key range that covers the ranges of all the Real SSTables;
23. Divide the new key range into N parts according to the number N of Real SSTables, the N parts representing N Virtual SSTables, wherein the file size of each Virtual SSTable is set equal to the file size of a Real SSTable, the file numbers of the Virtual SSTables are assigned uniformly by the virtual merge process, and the data sources are all the Real SSTables involved in the virtual merge.
The Virtual SSTable read module performs:
41. Determine whether the key to be looked up falls within the key range of the Virtual SSTable;
42. If not, return;
43. If so, find the data sources, i.e. the Real SSTables, through the metadata of the Virtual SSTable;
44. Look up the key using the index of a Real SSTable;
45. If found, return the value of the key;
46. If not found, look up the next Real SSTable, until all the Real SSTables contained in the data sources of the Virtual SSTable have been searched;
47. If still not found, return.
The Virtual SSTable merge module performs:
51. Obtain the data sources of the Virtual SSTable, i.e. the Real SSTables;
52. Read the key/value pairs of each Real SSTable in turn;
53. Merge-sort them and write the qualifying key/value pairs sequentially into a Real SSTable of fixed size;
54. When it is full, create a new Real SSTable and continue writing the sorted key/value pairs into it, until all have been written.
As can be seen from the above scheme, the advantages of the present invention are:
1. The present invention reduces the write amplification caused by merging and thereby improves the write performance of the system; it is orthogonal to other methods of reducing LSM-Tree write amplification and can be used together with them;
2. The present invention reduces the amount of I/O during merging, which extends the service life of storage devices such as SSDs;
Fig. 1 shows a performance evaluation of the present invention against RocksDB. As can be seen from Fig. 1:
Under write-intensive loads, overall performance improves by 30% to 100%;
Under read-intensive loads, overall performance is roughly on a par with RocksDB;
The LSM-Tree itself is mainly used for write-intensive loads.
Brief description of the drawings
Fig. 1 is a performance evaluation chart of the present invention;
Fig. 2 is an architecture diagram of the LSM-Tree;
Fig. 3 is an example of LSM-Tree compaction;
Fig. 4 shows the flow of a key-value pair during LSM-Tree compaction;
Fig. 5 is a diagram of bLSM;
Fig. 6 is a diagram of VT-Tree;
Fig. 7 is a diagram of WiscKey;
Fig. 8 is a detailed flow chart of the real merge;
Fig. 9 is a detailed flow chart of the virtual merge;
Fig. 10 is a flow chart of reading a Virtual SSTable;
Fig. 11 is a diagram of merging a Virtual SSTable;
Fig. 12 shows the effect of different VCT values on write performance;
Fig. 13 shows the effect of different VCT values on I/O volume and merge time;
Fig. 14 shows the effect of different MCT values on read performance.
Detailed description
The overall flow of the present invention is as follows:
1. Real merge
A real merge comprises a data merge and a metadata merge, and the SSTables it produces are Real SSTables. The data merge is the merging of SSTables, and the metadata merge comprises merging information such as key range, file number and file size. The detailed flow is shown in Fig. 8.
11. Read the key/value pairs of each Real SSTable in turn;
12. Merge-sort them and write the qualifying key/value pairs sequentially into a Real SSTable of fixed size (2 MB by default);
13. When it is full, create a new Real SSTable and continue writing the sorted key/value pairs into it, until all have been written.
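Steps 11 to 13 amount to a k-way merge sort followed by a split into fixed-size output files. The following Python sketch illustrates this under several assumptions that are not taken from the patent: each input Real SSTable is modelled as an in-memory list of (key, value) pairs sorted by key, inputs are passed newest first, a value of None stands for a deletion marker, and max_entries stands in for the fixed 2 MB file size.

```python
import heapq


def real_merge(inputs, max_entries=4):
    """Merge sorted runs (newest run first) into fixed-size sorted tables.

    inputs: list of runs, each a list of (key, value) pairs sorted by key,
            with keys unique within a run (assumption).
    """
    # Tag entries with the age of their run so that, for equal keys,
    # the entry from the newest run (age 0) comes out of the merge first.
    tagged = [
        [(key, age, value) for key, value in run]
        for age, run in enumerate(inputs)
    ]
    outputs, current, last_key = [], [], object()
    for key, _age, value in heapq.merge(*tagged):   # steps 11 and 12
        if key == last_key:
            continue          # an older version of a key already handled
        last_key = key
        if value is None:
            # Simplification: dropping a deletion marker is only safe when
            # no older version of the key can exist in a deeper component.
            continue
        current.append((key, value))
        if len(current) == max_entries:             # step 13: table is full
            outputs.append(current)
            current = []
    if current:
        outputs.append(current)
    return outputs


newer = [("a", "1"), ("c", None), ("e", "5")]
older = [("a", "0"), ("b", "2"), ("c", "3"), ("d", "4"), ("f", "6"), ("g", "7")]
tables = real_merge([newer, older], max_entries=3)
```

In this example the stale version of "a" and the deleted key "c" are discarded, and the six surviving pairs are written into two output tables of three entries each.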
2. Virtual merge
The SSTables produced by a virtual merge are Virtual SSTables. A virtual merge merges metadata only. The metadata merge comprises merging the key range, file number, file size and so on, and records which Real SSTables the Virtual SSTable is composed of (referred to as parentSST), i.e. the data sources of the Virtual SSTable. The detailed flow is shown in Fig. 9.
21. Collect the metadata of the Real SSTables involved in the virtual merge;
22. From the key ranges of the Real SSTables, derive a new key range that covers the ranges of all the above Real SSTables;
23. According to the number N of Real SSTables in the virtual merge, divide the new key range into N parts, the N key ranges representing N Virtual SSTables, wherein the file size is uniformly set to the size of a Real SSTable (2 MB by default), the file numbers are assigned uniformly by the virtual merge process, and the data sources are all the Real SSTables involved in this virtual merge.
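Steps 21 to 23 can be sketched as a pure metadata operation. The Python below is an illustration under stated assumptions rather than the patented implementation: keys are integers, the new key range is split into N slices of equal width (the text does not say how the range is divided), and the class names, field names and the file-number counter are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class RealSSTable:
    file_number: int
    smallest: int
    largest: int
    size_bytes: int = 2 * 1024 * 1024  # 2 MB default mentioned in the text


@dataclass(frozen=True)
class VirtualSSTable:
    file_number: int
    smallest: int
    largest: int
    size_bytes: int
    parent_ssts: tuple  # data sources (parentSST): the Real SSTables


def virtual_merge(reals, file_numbers):
    """Merge metadata only; no key/value data is read or written."""
    n = len(reals)                                   # step 21
    lo = min(t.smallest for t in reals)              # step 22
    hi = max(t.largest for t in reals)
    width = hi - lo + 1
    parents = tuple(reals)
    virtuals = []
    for i in range(n):                               # step 23
        # Equal-width slicing is an assumption; it expects width >= n.
        start = lo + i * width // n
        end = lo + (i + 1) * width // n - 1
        virtuals.append(VirtualSSTable(
            file_number=next(file_numbers), smallest=start, largest=end,
            size_bytes=reals[0].size_bytes, parent_ssts=parents))
    return virtuals


reals = [RealSSTable(7, 0, 49), RealSSTable(12, 30, 99), RealSSTable(13, 60, 119)]
virtuals = virtual_merge(reals, count(100))
```

Here three overlapping Real SSTables covering keys 0 to 119 yield three Virtual SSTables with disjoint ranges, each pointing back to all three parents, and no data file is rewritten.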
3. Reading a Real SSTable
Read flow of a Real SSTable (identical to traditional LSM-Tree-based key-value systems such as LevelDB and RocksDB): when a key falls within the key range of a Real SSTable, the value corresponding to the key is looked up directly on it.
4. Reading a Virtual SSTable
Read flow of a Virtual SSTable: when a key falls within the key range of a Virtual SSTable, the data sources of the Virtual SSTable, i.e. the Real SSTables, are found through the metadata information, and the value corresponding to the key is looked up in those Real SSTables. The detailed flow is shown in Fig. 10.
41. Determine whether the key to be looked up falls within the key range of the Virtual SSTable;
42. If not, return;
43. If so, find its data sources, i.e. the Real SSTables, through the metadata of the Virtual SSTable;
44. Look up the key using the index of a Real SSTable;
45. If found, return the value of the key;
46. If not found, look up the next Real SSTable, until all the Real SSTables contained in the data sources of the Virtual SSTable have been searched;
47. If still not found, return;
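Steps 41 to 47 can be sketched as follows. This Python fragment is illustrative only: tables are modelled as plain dictionaries, the "index" dictionary stands in for the on-disk index of a Real SSTable, and the parents are assumed to be ordered newest first so that the first hit is the most recent version. The text does not specify the search order.

```python
def get_from_virtual(vsst, key):
    """Return the value for key from a Virtual SSTable, or None if absent."""
    lo, hi = vsst["key_range"]
    if not (lo <= key <= hi):                 # steps 41 and 42
        return None
    for parent in vsst["parent_ssts"]:        # step 43: follow the data sources
        plo, phi = parent["key_range"]
        if plo <= key <= phi and key in parent["index"]:   # step 44
            return parent["index"][key]       # step 45
        # step 46: fall through to the next Real SSTable
    return None                               # step 47


newer = {"key_range": ("a", "m"), "index": {"a": "1", "k": "new"}}
older = {"key_range": ("c", "z"), "index": {"k": "old", "t": "9"}}
vsst_low = {"key_range": ("a", "m"), "parent_ssts": [newer, older]}
vsst_high = {"key_range": ("n", "z"), "parent_ssts": [newer, older]}
```

The sketch shows why reads on a Virtual SSTable cost more than reads on a Real SSTable: in the worst case every parent has to be probed before the lookup can report a miss.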
5. Merging a Virtual SSTable
To reduce the impact of Virtual SSTables on read performance, a Virtual SSTable can be merged during reads so that it becomes Real SSTables, which improves read performance. The detailed flow is shown in Fig. 11.
51. Obtain the data sources of the Virtual SSTable, i.e. the Real SSTables;
52. Read the key/value pairs of each Real SSTable in turn;
53. Merge-sort them and write the qualifying key/value pairs sequentially into a Real SSTable of fixed size (2 MB by default);
54. When it is full, create a new Real SSTable and continue writing the sorted key/value pairs into it, until all have been written.
Example 1: Merge operation (combination of virtual merge and real merge)
In the present invention, when a level of the LSM-Tree reaches its threshold and a merge is required, the detailed flow is as follows:
a. Select the SSTables to be merged;
b. Count the number of Real SSTables contained in these SSTables, denoted N;
c. If N exceeds the threshold VCT, perform a real merge;
d. If N is less than VCT, perform a virtual merge;
VCT is the parameter that distinguishes real merges from virtual merges, and different VCT values affect merging differently:
The smaller VCT is, the more real merges are performed, which leads to higher I/O overhead;
The larger VCT is, the more virtual merges are performed; although this saves I/O, the subsequent real merge is more expensive, which leads to a longer merge time;
Fig. 12 shows the effect of different VCT values on write performance under a 100% write load. Write performance is best when VCT=12, for two main reasons: the amount of I/O saved and the total merge time. As shown in Fig. 13, VCT=12 saves a certain amount of I/O and has the shortest merge time, so VCT=12 is optimal in this example.
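The decision in steps a to d can be sketched in a few lines of Python. Two points are assumptions rather than statements from the text: N counts the distinct Real SSTables behind the selection (a Virtual SSTable contributes its parents), and the boundary case N = VCT, which the text leaves open, is treated as a virtual merge. The dictionary layout is hypothetical.

```python
def count_real_sstables(selected):
    """Step b: count the distinct Real SSTables behind the selected tables."""
    real_ids = set()
    for table in selected:
        if table["kind"] == "real":
            real_ids.add(table["file_number"])
        else:  # a Virtual SSTable contributes its data sources
            real_ids.update(table["parent_file_numbers"])
    return len(real_ids)


def choose_merge(selected, vct=12):
    """Steps c and d: real merge if N exceeds VCT, otherwise virtual merge."""
    n = count_real_sstables(selected)
    return "real" if n > vct else "virtual"


selected = [
    {"kind": "real", "file_number": 1},
    {"kind": "virtual", "file_number": 50, "parent_file_numbers": [2, 3, 4]},
]
```

The default vct=12 reflects the value that performed best in the example above; a different workload may call for a different setting.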
Example 2: Get operation (reading Real SSTables and Virtual SSTables, merging Virtual SSTables)
Its basic flow is as follows:
a. Search the LSM-Tree from top to bottom;
b. In each level, locate the SSTable through its metadata, i.e. its key range;
c. If it is a Real SSTable, look up the key directly;
d. If it is a Virtual SSTable, find its data sources through the parentSST of the Virtual SSTable and then look up the key;
e. If the key is not found, continue with the corresponding SSTable located in this level or the next level, until the key is found or all levels have been searched;
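The basic Get flow in steps a to e can be sketched as a loop over levels. This is a simplified Python illustration with assumptions of its own: levels is a list ordered from top to bottom, each table is a dictionary with a "kind", a "key_range" and either an "index" (Real SSTable) or "parent_ssts" (Virtual SSTable), and deletion markers are ignored for brevity.

```python
def lookup_in_table(table, key):
    """Steps c and d: search a Real SSTable directly, or a Virtual
    SSTable through its parentSST list."""
    if table["kind"] == "real":
        return table["index"].get(key)
    for parent in table["parent_ssts"]:
        value = parent["index"].get(key)
        if value is not None:
            return value
    return None


def get(levels, key):
    """Steps a, b and e: walk the levels top-down until the key is found."""
    for level in levels:
        for table in level:
            lo, hi = table["key_range"]
            if lo <= key <= hi:          # locate by key range metadata
                value = lookup_in_table(table, key)
                if value is not None:
                    return value
    return None


levels = [
    [{"kind": "real", "key_range": ("a", "f"), "index": {"b": "1"}}],
    [{"kind": "virtual", "key_range": ("a", "z"),
      "parent_ssts": [{"index": {"b": "0", "q": "7"}}]}],
]
```

Because the upper level is searched first, the newer value of "b" shadows the older one held by the Virtual SSTable below it.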
The Get operation of the present invention may also involve merging a Virtual SSTable. The detailed flow is as follows:
a. When a Get operation touches a Virtual SSTable, two values of the current Virtual SSTable are checked: the cumulative number of reads of the Virtual SSTable (R) and the number of Real SSTables the Virtual SSTable contains (M);
b. When R>RCT && M>MCT, the Virtual SSTable is merged; otherwise it is not merged;
The cumulative number of reads of the Virtual SSTable and the number of Real SSTables it contains are used as the condition for triggering a Virtual SSTable merge, mainly for the following reasons:
The cumulative read count reflects hotness: the hotter a Virtual SSTable, the more frequently it is read; the colder it is, the more rarely it is read;
The more Real SSTables a Virtual SSTable contains, the worse its read performance; conversely, the fewer it contains, the better its read performance;
Therefore, when R and M satisfy the condition simultaneously, the current Virtual SSTable is hot and the number of Real SSTables it contains has exceeded the threshold, which would seriously degrade read performance, so it needs to be merged. The default value of RCT is 5. As for the choice of MCT, Fig. 14 shows the effect of different MCT values on read performance.
When MCT=3~5, read performance is essentially the same as that of RocksDB;
When MCT=7~11, read performance is poor, mainly because the Virtual SSTables that have not been merged degrade read performance;
Therefore, the choice of MCT must take two factors into account: read performance and the overhead that merging brings. Weighing the two, MCT=5 is the more suitable choice in this example, because 1) read performance is essentially the same as that of RocksDB; 2) compared with MCT=3, fewer Virtual SSTable merges are triggered, which saves resources.
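The trigger condition R>RCT && M>MCT can be sketched with a small per-table counter. The Python below is a hedged illustration: the class and function names are hypothetical, and the defaults RCT=5 and MCT=5 are simply the values discussed in this example.

```python
class VirtualSSTableStats:
    """Per-Virtual-SSTable bookkeeping used by the merge trigger."""

    def __init__(self, parent_count):
        self.reads = 0                    # R: cumulative number of reads
        self.parent_count = parent_count  # M: Real SSTables it contains


def should_merge(stats, rct=5, mct=5):
    """Merge only when the table is both hot (R > RCT) and wide (M > MCT)."""
    return stats.reads > rct and stats.parent_count > mct


def on_get(stats, rct=5, mct=5):
    """Record one read and report whether a merge should be triggered."""
    stats.reads += 1
    return should_merge(stats, rct, mct)


hot_and_wide = VirtualSSTableStats(parent_count=6)
decisions = [on_get(hot_and_wide) for _ in range(6)]

narrow = VirtualSSTableStats(parent_count=3)
narrow_decisions = [on_get(narrow) for _ in range(10)]
```

With these settings a table backed by six Real SSTables is merged on its sixth read, while a table backed by only three is never merged no matter how often it is read.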

Claims (10)

1. a kind of daily record merges the merging method of tree, it is characterised in that including:
Real combining step, including data merge and merge with metadata, and generate Real SSTable, wherein data merge into by SSTable is merged, and metadata merges to be included merging key range, file number, file size information;
Empty combining step, generates Virtual SSTable, and only metadata merged, and metadata is merged including closing And key range, file number, file size information, and record the data source of the Virtual SSTable;
The read step of Real SSTable, is read out to the Real SSTable, wherein when key falls in the Real In the key range of SSTable, then the value values corresponding to key are directly searched on the Real SSTable;
The read step of Virtual SSTable, when key falls in the key range of Virtual SSTable, then by unit Data source Virtual SSTable corresponding with key described in data information search, and looked into the Real SSTable Look for the value values corresponding to key;
The combining step of Virtual SSTable, merges in reading process to the Virtual SSTable, by institute State Virtual SSTable and become Real SSTable, so as to lift reading performance.
2. daily record as claimed in claim 1 merges the merging method of tree, it is characterised in that the real combining step includes:
11. key/value for being successively read each Real SSTable;
12. are sorted by merger, and qualified key/value is sequentially written in the Real SSTable of fixed size;
After 13. write completely, a new Real SSTable is re-created, continued sorted key/value write-ins wherein, Until writing.
3. daily record as claimed in claim 1 merges the merging method of tree, it is characterised in that the empty combining step includes:
21. metadata for collecting Real SSTable;
The 22. key range in Real SSTable, obtain new key range, cover all Real SSTable's Scope;
New key range, according to the number N of Real SSTable, are divided into N number of by 23., represent N number of Virtual SSTable, The file size of wherein Virtual SSTable is arranged to identical with the file size of Real SSTable, Virtual The reference number of a document of SSTable passes through empty merging process and distributes unitedly, and data source is then all Real involved in empty merging SSTable。
4. The merging method for a log-structured merge tree as claimed in claim 1, characterized in that the Virtual SSTable read step comprises:
41. determining whether the key to be looked up falls within the key range of the Virtual SSTable;
42. if it does not, returning;
43. if it does, finding the data source, i.e. the Real SSTables, through the metadata of the Virtual SSTable;
44. looking up the key according to the index of a Real SSTable;
45. if the key is found, returning the value of the key;
46. if the key is not found, searching the next Real SSTable, until all of the Real SSTables contained in the data source of the Virtual SSTable have been searched once;
47. if the key is still not found, returning.
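The lookup in steps 41 to 47 can be sketched in a few lines of Python. The classes below are hypothetical stand-ins: a dictionary plays the role of a Real SSTable's index and data blocks, and the data source is assumed to be ordered newest first so that the first hit is the current value.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RealSSTable:
    entries: dict[str, str]    # stands in for the on-disk index and data

    def get(self, key: str) -> Optional[str]:
        return self.entries.get(key)

@dataclass
class VirtualSSTable:
    smallest: str
    largest: str
    sources: list[RealSSTable]  # data source, newest first (assumption)

def virtual_get(table: VirtualSSTable, key: str) -> Optional[str]:
    # Steps 41-42: return immediately if the key is outside the range.
    if not (table.smallest <= key <= table.largest):
        return None
    # Step 43: the metadata names the Real SSTables to consult.
    for real in table.sources:
        value = real.get(key)      # step 44: look up via the table's index
        if value is not None:
            return value           # step 45: found
        # step 46: otherwise fall through to the next Real SSTable
    return None                    # step 47: not found anywhere
```

In the worst case every Real SSTable in the data source is probed, which is why reads on a Virtual SSTable cost more than reads on a Real SSTable.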
5. The merging method for a log-structured merge tree as claimed in claim 1, characterized in that the Virtual SSTable merge step comprises:
51. obtaining the data source of the Virtual SSTable, i.e. the Real SSTables;
52. reading the key/value pairs of each Real SSTable in turn;
53. merge-sorting them, and writing the qualifying key/value pairs sequentially into a Real SSTable of fixed size;
54. once that Real SSTable is full, creating a new Real SSTable and continuing to write the sorted key/value pairs into it, until all of them have been written.
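A hedged sketch of steps 51 to 54 follows. It models a Real SSTable as a key-sorted list of (key, value) pairs and approximates "fixed size" by an entry count. Two points are assumptions rather than statements of the claim: "qualifying" is taken to mean keys inside the Virtual SSTable's own range, and the newest version of a duplicate key is kept. The name `materialize` is hypothetical.

```python
import heapq

# Hypothetical model: a Real SSTable is a key-sorted list of (key, value) pairs.
SSTable = list[tuple[str, str]]

def materialize(smallest: str, largest: str, sources: list[SSTable],
                max_entries: int) -> list[SSTable]:
    """Turn one Virtual SSTable (range + data source) into Real SSTables.

    `sources` is ordered newest first.
    """
    # Steps 51-52: read each source table, keeping only keys in range
    # (assumed meaning of "qualifying") and tagging entries with table age.
    streams = [
        [(k, age, v) for k, v in table if smallest <= k <= largest]
        for age, table in enumerate(sources)
    ]
    output: list[SSTable] = [[]]
    last_key = None
    # Step 53: merge-sort and write sequentially.
    for key, _age, value in heapq.merge(*streams):
        if key == last_key:
            continue               # older duplicate, already superseded
        # Step 54: start a new Real SSTable when the current one is full.
        if len(output[-1]) == max_entries:
            output.append([])
        output[-1].append((key, value))
        last_key = key
    return output if output[0] else []
```

After this step the range is served by Real SSTables again, so later reads no longer need to probe several source tables.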
6. A merging system for a log-structured merge tree, characterized by comprising:
A real merging module, which performs both a data merge and a metadata merge and generates Real SSTables, wherein the data merge merges the SSTables, and the metadata merge includes merging the key range, file number and file size information;
A virtual ("empty") merging module, for generating Virtual SSTables by merging only the metadata, wherein the metadata merge includes merging the key range, file number and file size information, and recording the data source of each Virtual SSTable;
A Real SSTable read module, for reading the Real SSTable: when the key falls within the key range of the Real SSTable, the value corresponding to the key is looked up directly on that Real SSTable;
A Virtual SSTable read module: when the key falls within the key range of a Virtual SSTable, the data source of that Virtual SSTable corresponding to the key is located through the metadata information, and the value corresponding to the key is looked up in the Real SSTables;
A Virtual SSTable merging module, for merging the Virtual SSTable during the read process, converting the Virtual SSTable into Real SSTables, so as to improve read performance.
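How the read modules and the Virtual SSTable merging module might interact can be sketched as below. This is only an illustration: the claim does not state when a Virtual SSTable is converted, so converting it on the first read that hits it is an assumed policy, and the output is simplified to a single table rather than a series of fixed-size ones. `Table`, `lookup` and `materialize` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Table:
    smallest: str
    largest: str
    entries: dict[str, str] = field(default_factory=dict)  # data, if Real
    sources: list["Table"] = field(default_factory=list)   # Reals, if Virtual

    @property
    def is_virtual(self) -> bool:
        return bool(self.sources)

def materialize(table: Table) -> None:
    """Virtual SSTable merging module: convert the table to a Real one."""
    merged: dict[str, str] = {}
    for real in reversed(table.sources):     # oldest first, newer overwrite
        for k, v in real.entries.items():
            if table.smallest <= k <= table.largest:
                merged[k] = v
    table.entries = dict(sorted(merged.items()))
    table.sources = []                        # no data source: now Real

def lookup(table: Table, key: str) -> Optional[str]:
    if not (table.smallest <= key <= table.largest):
        return None
    if not table.is_virtual:
        return table.entries.get(key)         # Real SSTable read module
    value = None
    for real in table.sources:                # Virtual SSTable read module
        value = real.entries.get(key)
        if value is not None:
            break
    materialize(table)                        # merge during the read process
    return value
```

The first read of a Virtual SSTable pays for the deferred data merge; subsequent reads of the same range take the direct Real SSTable path.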
7. The merging system for a log-structured merge tree as claimed in claim 6, characterized in that the real merging module comprises:
11. reading the key/value pairs of each Real SSTable in turn;
12. merge-sorting them, and writing the qualifying key/value pairs sequentially into a Real SSTable of fixed size;
13. once that Real SSTable is full, creating a new Real SSTable and continuing to write the sorted key/value pairs into it, until all of them have been written.
8. The merging system for a log-structured merge tree as claimed in claim 6, characterized in that the virtual ("empty") merging module comprises:
21. collecting the metadata of the Real SSTables;
22. deriving, from the key ranges of the Real SSTables, a new key range that covers the ranges of all the Real SSTables;
23. dividing the new key range into N parts according to the number N of Real SSTables, the parts representing N Virtual SSTables, wherein the file size of each Virtual SSTable is set equal to the file size of a Real SSTable, the file numbers of the Virtual SSTables are allocated uniformly by the virtual merge process, and the data source is all of the Real SSTables involved in the virtual merge.
9. The merging system for a log-structured merge tree as claimed in claim 6, characterized in that the Virtual SSTable read module comprises:
41. determining whether the key to be looked up falls within the key range of the Virtual SSTable;
42. if it does not, returning;
43. if it does, finding the data source, i.e. the Real SSTables, through the metadata of the Virtual SSTable;
44. looking up the key according to the index of a Real SSTable;
45. if the key is found, returning the value of the key;
46. if the key is not found, searching the next Real SSTable, until all of the Real SSTables contained in the data source of the Virtual SSTable have been searched once;
47. if the key is still not found, returning.
10. The merging system for a log-structured merge tree as claimed in claim 6, characterized in that the Virtual SSTable merging module comprises:
51. obtaining the data source of the Virtual SSTable, i.e. the Real SSTables;
52. reading the key/value pairs of each Real SSTable in turn;
53. merge-sorting them, and writing the qualifying key/value pairs sequentially into a Real SSTable of fixed size;
54. once that Real SSTable is full, creating a new Real SSTable and continuing to write the sorted key/value pairs into it, until all of them have been written.
CN201710047936.3A 2017-01-20 2017-01-20 A kind of daily record merges the merging method and system of tree Pending CN106844650A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710047936.3A CN106844650A (en) 2017-01-20 2017-01-20 A kind of daily record merges the merging method and system of tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710047936.3A CN106844650A (en) 2017-01-20 2017-01-20 A kind of daily record merges the merging method and system of tree

Publications (1)

Publication Number Publication Date
CN106844650A true CN106844650A (en) 2017-06-13

Family

ID=59119444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710047936.3A Pending CN106844650A (en) 2017-01-20 2017-01-20 A kind of daily record merges the merging method and system of tree

Country Status (1)

Country Link
CN (1) CN106844650A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147359A (en) * 2017-12-13 2019-08-20 北京奇虎科技有限公司 A kind of increment generation method, device and a kind of data-updating method, device
CN110532228A (en) * 2019-09-02 2019-12-03 深圳市网心科技有限公司 A kind of method, system, equipment and the readable storage medium storing program for executing of block chain reading data
CN110716690A (en) * 2018-07-12 2020-01-21 阿里巴巴集团控股有限公司 Data recovery method and system
CN111694992A (en) * 2019-03-15 2020-09-22 阿里巴巴集团控股有限公司 Data processing method and device
CN112307016A (en) * 2019-07-29 2021-02-02 华为技术有限公司 Data unit merging method and device
CN112486994A (en) * 2020-11-30 2021-03-12 武汉大学 Method for quickly reading data of key value storage based on log structure merging tree
CN112527804A (en) * 2021-01-27 2021-03-19 中智关爱通(南京)信息科技有限公司 File storage method, file reading method and data storage system
EP3825866A4 (en) * 2018-08-14 2021-08-25 Huawei Technologies Co., Ltd. Partition merging method and database server
CN116595015A (en) * 2023-07-18 2023-08-15 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198150A (en) * 2013-04-24 2013-07-10 清华大学 Big data indexing method and system
CN104142958A (en) * 2013-05-10 2014-11-12 华为技术有限公司 Storage method for data in Key-Value system and related device
CN104809237A (en) * 2015-05-12 2015-07-29 百度在线网络技术(北京)有限公司 LSM-tree (The Log-Structured Merge-Tree) index optimization method and LSM-tree index optimization system
CN104915145A (en) * 2014-03-11 2015-09-16 华为技术有限公司 Method and device for reducing LSM Tree writing amplification
CN105138622A (en) * 2015-08-14 2015-12-09 中国科学院计算技术研究所 Append operation method for LSM tree memory system and reading and merging method for loads of append operation
CN105159915A (en) * 2015-07-16 2015-12-16 中国科学院计算技术研究所 Dynamically adaptive LSM (Log-structured merge) tree combination method and system
CN105302487A (en) * 2015-10-20 2016-02-03 中国科学院信息工程研究所 Flow control based treelike storage structure write amplification optimization method
CN105468298A (en) * 2015-11-19 2016-04-06 中国科学院信息工程研究所 Key value storage method based on log-structured merged tree

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198150A (en) * 2013-04-24 2013-07-10 清华大学 Big data indexing method and system
CN104142958A (en) * 2013-05-10 2014-11-12 华为技术有限公司 Storage method for data in Key-Value system and related device
CN104915145A (en) * 2014-03-11 2015-09-16 华为技术有限公司 Method and device for reducing LSM Tree writing amplification
CN104809237A (en) * 2015-05-12 2015-07-29 百度在线网络技术(北京)有限公司 LSM-tree (The Log-Structured Merge-Tree) index optimization method and LSM-tree index optimization system
CN105159915A (en) * 2015-07-16 2015-12-16 中国科学院计算技术研究所 Dynamically adaptive LSM (Log-structured merge) tree combination method and system
CN105138622A (en) * 2015-08-14 2015-12-09 中国科学院计算技术研究所 Append operation method for LSM tree memory system and reading and merging method for loads of append operation
CN105302487A (en) * 2015-10-20 2016-02-03 中国科学院信息工程研究所 Flow control based treelike storage structure write amplification optimization method
CN105468298A (en) * 2015-11-19 2016-04-06 中国科学院信息工程研究所 Key value storage method based on log-structured merged tree

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FENGFENG PAN et al.: "dCompaction: Delayed Compaction for the LSM-Tree", 《INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING》 *
FENG-FENG PAN et al.: "dCompaction: Speeding up Compaction of the LSM-Tree via Delayed Compaction", 《JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147359A (en) * 2017-12-13 2019-08-20 北京奇虎科技有限公司 A kind of increment generation method, device and a kind of data-updating method, device
CN110716690A (en) * 2018-07-12 2020-01-21 阿里巴巴集团控股有限公司 Data recovery method and system
CN110716690B (en) * 2018-07-12 2023-02-28 阿里巴巴集团控股有限公司 Data recovery method and system
EP3825866A4 (en) * 2018-08-14 2021-08-25 Huawei Technologies Co., Ltd. Partition merging method and database server
US11762881B2 (en) 2018-08-14 2023-09-19 Huawei Cloud Computing Technologies Co., Ltd. Partition merging method and database server
CN111694992A (en) * 2019-03-15 2020-09-22 阿里巴巴集团控股有限公司 Data processing method and device
CN111694992B (en) * 2019-03-15 2023-05-26 阿里巴巴集团控股有限公司 Data processing method and device
CN112307016B (en) * 2019-07-29 2022-08-26 华为技术有限公司 Data unit merging method and device
CN112307016A (en) * 2019-07-29 2021-02-02 华为技术有限公司 Data unit merging method and device
CN110532228A (en) * 2019-09-02 2019-12-03 深圳市网心科技有限公司 A kind of method, system, equipment and the readable storage medium storing program for executing of block chain reading data
CN112486994A (en) * 2020-11-30 2021-03-12 武汉大学 Method for quickly reading data of key value storage based on log structure merging tree
CN112486994B (en) * 2020-11-30 2024-04-19 武汉大学 Data quick reading method based on key value storage of log structure merging tree
CN112527804A (en) * 2021-01-27 2021-03-19 中智关爱通(南京)信息科技有限公司 File storage method, file reading method and data storage system
CN116595015A (en) * 2023-07-18 2023-08-15 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN116595015B (en) * 2023-07-18 2023-12-15 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106844650A (en) A kind of daily record merges the merging method and system of tree
CN101937377B (en) Data recovery method and device
CN110268394A (en) KVS tree
US7689574B2 (en) Index and method for extending and querying index
CN110383261A (en) Stream for multithread storage device selects
CN110268399A (en) Merging tree for attended operation is modified
CN110291518A (en) Merge tree garbage index
KR100856245B1 (en) File system device and method for saving and seeking file thereof
US10445022B1 (en) Optimization of log-structured merge (LSM) tree-based databases using object solid state drive (SSD) devices
CN111399777A (en) Differentiated key value data storage method based on data value classification
CN111026329B (en) Key value storage system based on host management tile record disk and data processing method
CN103198150B (en) A kind of large data index method and system
US10496612B2 (en) Method for reliable and efficient filesystem metadata conversion
CN107391774A (en) The rubbish recovering method of JFS based on data de-duplication
CN108959119A (en) The method and system of garbage collection in storage system
CN104461388B (en) A kind of storage array configuration is preserved and referee method
CN114780530A (en) Time sequence data storage method and system based on LSM tree key value separation
JP4825719B2 (en) Fast file attribute search
CN105068761B (en) A kind of video interception storage method and system convenient for retrieval
CN106648991A (en) Duplicated data deletion method in data recovery system
CN106528436B (en) Data storage device and data maintenance method thereof
KR100809452B1 (en) Methods for automatically classifying patents using computing machines and systems thereof
CN111324284B (en) Memory device
CN113391916A (en) Organization architecture data processing method, device, computer equipment and storage medium
CN106126555A (en) A kind of file management method and file system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170613