US10725988B2 - KVS tree - Google Patents
KVS tree
- Publication number: US10725988B2
- Authority: United States (US)
- Prior art keywords: key, kvset, node, value, tree
- Legal status: Active, expires (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS; G06—COMPUTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/2246—Indexing; Data structures therefor; Indexing structures; Trees, e.g. B+trees
- G06F16/2455—Querying; Query processing; Query execution
Definitions
- Embodiments described herein generally relate to a key-value data store and more specifically to implementing a KVS tree.
- Data structures are organizations of data that permit a variety of ways to interact with the data stored therein.
- Data structures may be designed to permit efficient searches of the data, such as in a binary search tree, to permit efficient storage of sparse data, such as with a linked list, or to permit efficient storage of searchable data such as with a B-tree, among others.
- FIG. 1 illustrates an example of a KVS tree, according to an embodiment.
- FIG. 4 is a block diagram illustrating an example of a storage organization for keys and values, according to an embodiment.
- FIG. 5 is a block diagram illustrating an example of a configuration for key-blocks and value-blocks, according to an embodiment.
- FIG. 6 illustrates an example of a KB tree, according to an embodiment.
- FIG. 7 is a block diagram illustrating KVS tree ingestion, according to an embodiment.
- FIG. 8 illustrates an example of a method for KVS tree ingestion, according to an embodiment.
- FIG. 12 illustrates an example of a method for key-value compaction, according to an embodiment.
- FIG. 13 illustrates an example of a spill value and its relation to a tree, according to an embodiment.
- FIG. 14 illustrates an example of a method for a spill value function, according to an embodiment.
- FIG. 15 is a block diagram illustrating spill compaction, according to an embodiment.
- FIG. 16 illustrates an example of a method for spill compaction, according to an embodiment.
- FIG. 17 is a block diagram illustrating hoist compaction, according to an embodiment.
- FIG. 19 illustrates an example of a method for performing maintenance on a KVS tree, according to an embodiment.
- FIG. 20 illustrates an example of a method for modifying KVS tree operation, according to an embodiment.
- FIG. 21 is a block diagram illustrating a key search, according to an embodiment.
- FIG. 22 illustrates an example of a method for performing a key search, according to an embodiment.
- FIG. 23 is a block diagram illustrating a key scan, according to an embodiment.
- FIG. 24 is a block diagram illustrating a key scan, according to an embodiment.
- FIG. 25 is a block diagram illustrating a prefix scan, according to an embodiment.
- LSM trees have become a popular storage structure for data for which high-volume writes are expected and for which efficient access to the data is also required.
- Portions of the LSM tree are tuned for the media upon which they are kept, and a background process generally addresses moving data between the different portions (e.g., from the in-memory portion to the on-disk portion).
- As used herein, in-memory refers to a random access, byte-addressable device (e.g., static random access memory (SRAM) or dynamic random access memory (DRAM)), and on-disk refers to a block-addressable device (e.g., a hard disk drive, compact disc, digital versatile disc, or solid-state drive (SSD) such as a flash memory based device), which may also be referred to as a media device or a storage device.
- LSM trees leverage the ready access provided by the in-memory device to sort incoming data, by key, to provide ready access to the corresponding values. As the data is merged onto the on-disk portion, the resident on-disk data is merged with the new data and written in blocks back to disk.
- While LSM trees have become a popular structure underlying a number of database and volume storage (e.g., cloud storage) designs, they do have some drawbacks.
- Write amplification is an increase in the minimum number of writes for data that is imposed by a given storage technique. For example, to store data, it is written at least once to disk. This may be accomplished, for example, by simply appending the latest piece of data onto the end of already written data.
- This structure is slow to search (e.g., it grows linearly with the amount of data), and may result in inefficiencies as data is changed or deleted.
- LSM trees increase write amplification as they read data from disk to be merged with new data and then re-write that data back to disk.
- The write amplification problem may be exacerbated when storage device activities are included, such as defragmenting hard disk drives or garbage collection of SSDs.
- Write amplification on SSDs may be particularly pernicious as these devices may “wear out” as a function of a number of writes. That is, SSDs have a limited lifetime measured in writes. Thus, write amplification with SSDs works to shorten the usable life of the underlying hardware.
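As a rough numeric illustration (not from the patent text), write amplification can be quantified as the ratio of bytes the media actually writes to bytes the host asked it to write:

```python
def write_amplification_factor(host_bytes, media_bytes):
    """Bytes physically written to the media divided by bytes the
    application requested to write; 1.0 is the ideal minimum."""
    return media_bytes / host_bytes

# An LSM-style merge that re-reads and re-writes 4 GiB of resident
# on-disk data in order to absorb 1 GiB of new data writes 5 GiB total.
GIB = 2 ** 30
waf = write_amplification_factor(host_bytes=1 * GIB, media_bytes=5 * GIB)
# waf == 5.0: every host byte cost five media writes
```

On an SSD with a finite write budget, a factor of 5 consumes the device's lifetime five times faster than the ideal.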
- A second issue arises because LSM trees ensure that on-disk portions are sorted by key. If the amount of data resident on-disk is large, a large amount of temporary, or scratch, space may be consumed to perform the merge. This may be somewhat mitigated by dividing the on-disk portions into non-overlapping structures to permit merges on data subsets, but a balance between structure overhead and performance may be difficult to achieve.
- A third issue with LSM trees includes possibly limited write throughput. This issue stems from the essentially always-sorted nature of the entirety of the LSM data. Thus, large-volume writes that overwhelm the in-memory portion must wait until the in-memory portion is cleared with a possibly time-consuming merge operation.
- A write buffer (WB) tree has been proposed in which smaller data inserts are manipulated to avoid the merge issues in this scenario. Specifically, a WB tree hashes incoming keys to spread the data, and stores the key-hash and value combinations in smaller intake sets. These sets may be merged at various times or written to child nodes based on the key-hash value. This avoids the expensive merge operation of LSM trees while remaining performant when looking up a particular key.
- However, because WB trees are sorted by key-hash, they require expensive whole-tree scans to locate values that are not directly referenced by a key-hash, such as when searching for a range of keys.
- KVS trees are a tree data structure including nodes with connections between a parent node and a child node based on a predetermined derivation of a key rather than the content of the tree.
- The nodes include temporally ordered sequences of key-value sets (kvsets).
- The kvsets contain key-value pairs in a key-sorted structure.
- Kvsets are also immutable once written.
- The KVS tree achieves the write throughput of WB trees while improving upon WB tree searching by maintaining kvsets in nodes, the kvsets including sorted keys as well as, in an example, key metrics (such as bloom filters, minimum and maximum keys, etc.), to provide efficient search of the kvsets.
- KVS trees may improve upon the temporary storage issues of LSM trees by separating keys from values and merging smaller kvset collections. Additionally, the described KVS trees may reduce write amplification through a variety of maintenance operations on kvsets. Further, as the kvsets in nodes are immutable, issues such as write wear on SSDs may be managed by the data structure, reducing garbage collection activities of the device itself. This has the added benefit of freeing up internal device resources (e.g., bus bandwidth, processing cycles, etc.) that result in better external drive performance (e.g., read or write speed). Additional details and example implementations of KVS trees and operations thereon are described below.
- FIG. 1 illustrates an example of a KVS tree 100 , according to an embodiment.
- the KVS tree 100 is a key-value data structure that is organized as a tree.
- values are stored in the tree 100 with corresponding keys that reference the values.
- Key-entries are used to contain both the key and additional information, such as a reference to the value; however, unless otherwise specified, key-entries are simply referred to as keys for simplicity.
- Keys themselves have a total ordering within the tree 100 . Thus, keys may be sorted amongst each other. Keys may also be divided into sub-keys. Generally, sub-keys are non-overlapping portions of a key.
- The total ordering of keys is based on comparing like sub-keys between multiple keys (e.g., the first sub-key of one key is compared to the first sub-key of another key).
- A key prefix is a beginning portion of a key. The key prefix may be composed of one or more sub-keys when they are used.
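As a sketch of the ordering and prefix behavior above (the keys and helper name are illustrative, not from the patent), modeling a key as a tuple of non-overlapping sub-keys gives both properties directly:

```python
# A key modeled as a tuple of non-overlapping sub-keys. Python's tuple
# comparison compares like sub-keys in order, yielding a total ordering.
key_a = ("user", "alice", "2017")
key_b = ("user", "bob", "2016")
assert key_a < key_b  # decided at the second sub-key: "alice" < "bob"

def has_prefix(key, prefix):
    """A key prefix is a beginning portion of the key; here, one or more
    leading sub-keys."""
    return key[:len(prefix)] == prefix

assert has_prefix(key_a, ("user", "alice"))
```

The same comparison rule works for byte-string keys compared lexicographically; tuples simply make the sub-key boundaries explicit.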
- The tree 100 includes one or more nodes, such as node 110.
- The node 110 includes a temporally ordered sequence of immutable key-value sets (kvsets). As illustrated, kvset 115 includes an 'N' badge to indicate that it is the newest of the sequence, while kvset 120 includes an 'O' badge to indicate that it is the oldest of the sequence. Kvset 125 includes an 'I' badge to indicate that it is intermediate in the sequence.
- These badges are used throughout to label kvsets; however, another badge (such as an 'X') denotes a specific kvset rather than its position in a sequence (e.g., new, intermediate, old, etc.), unless it is a tilde '~', in which case it is simply an anonymous kvset.
- Older key-value entries occur lower in the tree 100.
- Thus, bringing values up a tree-level, such as from L2 to L1, results in a new kvset in the oldest position in the recipient node.
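A minimal sketch of this temporal ordering (the class and method names are assumptions, not the patent's API): freshly ingested data lands in the newest slot, while kvsets brought up from a lower tree-level land in the oldest slot of the recipient node:

```python
from collections import deque

class Node:
    """A KVS-tree node holding a temporally ordered sequence of kvsets,
    newest first (index 0) to oldest (index -1)."""
    def __init__(self):
        self.kvsets = deque()

    def add_newest(self, kvset):
        # Freshly ingested data becomes the newest kvset in the node.
        self.kvsets.appendleft(kvset)

    def add_oldest(self, kvset):
        # Values brought up a tree-level (e.g., from L2 to L1) arrive as
        # a new kvset in the oldest position of the recipient node.
        self.kvsets.append(kvset)

node = Node()
node.add_newest({"a": 1})
node.add_newest({"b": 2})
node.add_oldest({"c": 3})
# order is now: {"b": 2} (newest), {"a": 1}, {"c": 3} (oldest)
```

Keeping the sequence strictly temporal is what lets a search stop at the first (newest) kvset containing a key.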
- The node 110 also includes a determinative mapping from a key-value pair in a kvset of the node to any one child node of the node 110.
- The determinative mapping means that, given a key-value pair, an external entity could trace a path through the tree 100 of possible child nodes without knowing the contents of the tree 100. This is quite different from, for example, a B-tree, where the contents of the tree determine where a given key's value will fall in order to maintain the search-optimized structure of the tree.
- Instead, the determinative mapping provides a rule such that, for example, given a key-value pair, one may calculate the child node at L3 to which this pair would map, even if the maximum tree-level (e.g., tree depth) is currently only L1.
- In an example, the determinative mapping includes a portion of a hash of a portion of the key.
- Thus, a sub-key may be hashed to arrive at a mapping set, and a portion of this set may be used for any given level of the tree.
- In an example, the portion of the key is the entire key; there is no reason that the entire key may not be used.
- In an example, the hash includes a multiple of non-overlapping portions including the portion of the hash.
- In an example, each of the multiple of non-overlapping portions corresponds to a level of the tree.
- In an example, the portion of the hash is determined from the multiple of non-overlapping portions by a level of the node.
- In an example, a maximum number of child nodes for the node is defined by a size of the portion of the hash.
- In an example, the size of the portion of the hash is a number of bits. These examples may be illustrated by taking a hash of a key that results in eight bits. These eight bits may be divided into three sets: the first two bits; bits three through six (four bits); and bits seven and eight.
- Child nodes may be indexed based on a set of bits, such that children at the first level (e.g., L1) have two-bit names, children on the second level (e.g., L2) have four-bit names, and children on the third level (e.g., L3) have two-bit names.
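The eight-bit example above can be sketched as follows (the bit split comes from the example; the hash function choice and names are illustrative assumptions):

```python
import hashlib

# Bit widths per tree-level from the example: L1 uses the first two bits
# of the hash, L2 the next four, and L3 the final two.
LEVEL_BITS = [2, 4, 2]

def child_indices(hash_byte):
    """Split an 8-bit key hash into per-level child indices, so the full
    path of possible child nodes is determined by the key alone."""
    indices = []
    shift = 8
    for bits in LEVEL_BITS:
        shift -= bits
        indices.append((hash_byte >> shift) & ((1 << bits) - 1))
    return indices

def key_hash(key):
    """Derive an 8-bit hash from a key bytestring (illustrative; any
    determinative hash works, as the mapping depends only on the key)."""
    return hashlib.blake2b(key, digest_size=1).digest()[0]

# 0b10110110 maps to child 0b10 at L1, 0b1101 at L2, and 0b10 at L3.
assert child_indices(0b10110110) == [2, 13, 2]
```

Because the mapping is a pure function of the key, the L3 child is computable even when the tree currently only extends to L1.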
- Kvsets are the key and value store organized in the nodes of the tree 100 .
- The immutability of the kvsets means that a kvset, once placed in a node, does not change.
- A kvset may, however, be deleted, and some or all of its contents may be added to new kvsets, etc.
- The immutability of the kvset also extends to any control or meta-data contained within the kvset. This is generally possible because the contents to which the meta-data applies are unchanging, and thus the meta-data will often also be static at that point.
- The KVS tree 100 does not require uniqueness among keys throughout the tree 100, but a kvset has only one instance of a given key. That is, every key in a given kvset is different from the other keys of the kvset. This last statement is true for a particular kvset, and thus may not apply when, for example, a kvset is versioned. Kvset versioning may be helpful for creating a snapshot of the data. With a versioned kvset, the uniqueness of a key in the kvset is determined by a combination of the kvset identification (ID) and the version. However, two different kvsets (e.g., kvset 115 and kvset 120) may each include the same key.
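Because two kvsets in the same node may contain the same key, a lookup examines the kvsets newest-to-oldest and the first match wins; a sketch with illustrative names:

```python
def node_get(kvsets_newest_first, key):
    """Search a node's temporally ordered kvsets for a key.

    Keys are unique within one kvset but may repeat across kvsets, so
    the newest kvset containing the key holds the current value.
    """
    for kvset in kvsets_newest_first:
        if key in kvset:
            return kvset[key]
    return None  # not in this node; a real search would descend to a child

kvsets = [{"k1": "new"}, {"k1": "old", "k2": "v2"}]  # newest first
assert node_get(kvsets, "k1") == "new"
assert node_get(kvsets, "k2") == "v2"
```

The temporal ordering thus substitutes for in-place updates: writing a newer kvset logically supersedes older entries without mutating them.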
- In an example, the kvset includes a key-tree to store key entries of key-value pairs of the kvset.
- A variety of data structures may be used to efficiently store and retrieve unique keys in the key-tree (it may not even be a tree), such as binary search trees, B-trees, etc.
- In an example, the keys are stored in leaf nodes of the key-tree.
- In an example, a maximum key in any subtree of the key-tree is in a rightmost entry of a rightmost child.
- In an example, a rightmost edge of a first node of the key-tree is linked to a sub-node of the key-tree.
- In an example, all keys in a subtree rooted at the sub-node of the key-tree are greater than all keys in the first node of the key-tree.
- In an example, key entries of the kvset are stored in a set of key-blocks including a primary key-block and zero or more extension key-blocks.
- Members of the set of key-blocks correspond to media blocks for a storage medium, such as an SSD, hard disk drive, etc.
- Each key-block includes a header to identify it as a key-block.
- The primary key-block includes a list of media block identifications for the one or more extension key-blocks of the kvset.
- In an example, the primary key-block includes a header to a key-tree of the kvset.
- The header may include a number of values to make interacting with the keys, or the kvset generally, easier.
- In an example, the primary key-block, or its header, includes a copy of a lowest key in a key-tree of the kvset.
- The lowest key is determined by a pre-set sort-order of the tree (e.g., the total ordering of keys in the tree 100).
- In an example, the primary key-block includes a copy of a highest key in a key-tree of the kvset, the highest key determined by a pre-set sort-order of the tree.
- In an example, the primary key-block includes a list of media block identifications for a key-tree of the kvset.
- In an example, the primary key-block includes a bloom filter header for a bloom filter of the kvset.
- In an example, the primary key-block includes a list of media block identifications for a bloom filter of the kvset.
- In an example, values of the kvset are stored in a set of value-blocks.
- Members of the set of value-blocks correspond to media blocks for the storage medium.
- Each value-block includes a header to identify it as a value-block.
- In an example, a value-block includes a storage section holding one or more values without separation between them.
- Thus, the bits of a first value run into the bits of a second value on the storage medium without a guard, container, or other delimiter between them.
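A sketch of such delimiter-free packing (illustrative; the patent does not prescribe this code): values are concatenated into one block body, and each key entry records an (offset, length) reference into it:

```python
def pack_values(values):
    """Concatenate values into one value-block body with no delimiters,
    returning the packed bytes plus an (offset, length) reference per
    value for the corresponding key entries to store."""
    body = bytearray()
    refs = []
    for value in values:
        refs.append((len(body), len(value)))
        body.extend(value)
    return bytes(body), refs

def read_value(body, ref):
    """Recover one value from the packed body via its reference."""
    offset, length = ref
    return body[offset:offset + length]

body, refs = pack_values([b"alpha", b"beta"])
assert body == b"alphabeta"            # no guard bytes between values
assert read_value(body, refs[1]) == b"beta"
```

Since key entries carry the offsets, the value storage itself needs no framing overhead, which suits immutable, write-once blocks.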
- In an example, the primary key-block includes a list of media block identifications for value-blocks in the set of value-blocks.
- Thus, the primary key-block manages storage references to value-blocks.
- In an example, the primary key-block includes a set of metrics for the kvset.
- In an example, the set of metrics includes a total number of keys stored in the kvset.
- In an example, the set of metrics includes a number of keys with tombstone values stored in the kvset.
- A tombstone is a data marker indicating that the value corresponding to the key has been deleted.
- Generally, a tombstone will reside in the key entry, and no value-block space will be consumed for this key-value pair.
- The purpose of the tombstone is to mark the deletion of the value while avoiding the possibly expensive operation of purging the value from the tree 100.
- Because the tombstone is found in a temporally ordered search, one knows that the corresponding value is deleted, even if an expired version of the key-value pair resides at an older location within the tree 100.
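A sketch of tombstone-aware lookup (the sentinel and function names are illustrative): deletion ingests a new kvset containing a tombstone, which a temporally ordered search encounters before any older value:

```python
TOMBSTONE = object()  # illustrative sentinel marking a deleted key

def lookup(kvsets_newest_first, key):
    """Temporally ordered search: the first (newest) kvset containing
    the key decides the result, so a tombstone hides any older value."""
    for kvset in kvsets_newest_first:
        if key in kvset:
            value = kvset[key]
            return None if value is TOMBSTONE else value
    return None

kvsets = [{"k": "v"}]               # existing immutable kvset, newest first
assert lookup(kvsets, "k") == "v"
kvsets.insert(0, {"k": TOMBSTONE})  # delete: ingest a tombstone, no purge
assert lookup(kvsets, "k") is None  # older value is now logically deleted
```

The stale entry lower in the sequence is reclaimed later by compaction rather than at delete time.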
- In an example, the set of metrics stored in the primary key-block includes a sum of all key lengths for keys stored in the kvset. In an example, the set of metrics includes a sum of all value lengths for keys stored in the kvset. These last two metrics give an approximate (or exact) amount of storage consumed by the kvset. In an example, the set of metrics includes an amount of unreferenced data in value-blocks (e.g., unreferenced values) of the kvset. This last metric gives an estimate of the space that may be reclaimed in a maintenance operation. Additional details of key-blocks and value-blocks are discussed below with respect to FIGS. 4 and 5 .
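Because a kvset is immutable, these metrics can be computed once, when the kvset is written, and stored in its primary key-block; a sketch with illustrative names, treating a value of None as a tombstone:

```python
def kvset_metrics(entries, tombstone=None):
    """Compute primary key-block metrics for a kvset from its (key,
    value) pairs, where value == `tombstone` marks a deleted key."""
    live_values = [v for _, v in entries if v is not tombstone]
    return {
        "key_count": len(entries),
        "tombstone_count": sum(1 for _, v in entries if v is tombstone),
        "key_bytes": sum(len(k) for k, _ in entries),    # sum of key lengths
        "value_bytes": sum(len(v) for v in live_values), # sum of value lengths
    }

m = kvset_metrics([(b"k1", b"hello"), (b"k2", None)])
# m: 2 keys, 1 tombstone, 4 key bytes, 5 value bytes
```

Since the contents never change, the metrics stay valid for the kvset's lifetime and can drive maintenance decisions without rescanning the data.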
- In an example, the tree 100 includes a first root 105 in a first computer readable medium of the at least one machine readable medium, and a second root 110 in a second computer readable medium of the at least one computer readable medium.
- The second root 110 is the only child of the first root 105.
- In an example, the first computer readable medium is byte addressable and the second computer readable medium is block addressable. This is illustrated in FIG. 1 with node 105 being in the MEM tree-level to signify its in-memory location, while node 110 is at L0 to signify that it is the root on-disk element of the tree 100.
- FIGS. 2 and 3 illustrate a technique to leverage the structure of the KVS tree 100 to implement an effective use of multi-stream storage devices.
- Storage devices comprising flash memory, or SSDs, may operate more efficiently and have greater endurance (e.g., will not “wear out”) if data with a similar lifetime is grouped in flash erase blocks.
- Storage devices comprising other non-volatile media may also benefit from grouping data with a similar lifetime, such as shingled magnetic recording (SMR) hard-disk drives (HDDs).
- data has a similar lifetime if it is deleted at the same time, or within a relatively small time interval.
- the method for deleting data on a storage device may include explicitly deallocating, logically overwriting, or physically overwriting the data on the storage device.
- the storage device may provide an interface for data access commands (e.g., reading or writing) that identify a logical lifetime group with which the data is associated.
- the industry standard SCSI and proposed NVMe storage device interfaces specify write commands comprising data to be written to a storage device and a numeric stream identifier (stream ID) for a lifetime group called a stream, to which the data corresponds.
- a storage device supporting a plurality of streams is a multi-stream storage device.
- Temperature is a stability value to classify data, whereby the value corresponds to a relative probability that the data will be deleted in any given time interval.
- HOT data may be expected to be deleted (or changed) within a minute while COLD data may be expected to last an hour.
- a finite set of stability values may be used to specify such a classification.
- the set of stability values may be ⁇ Hot, Warm, Cold ⁇ where, in a given time interval, data classified as Hot has a higher probability of being deleted than data classified as Warm, which in turn has a higher probability of being deleted than data classified as Cold.
- FIGS. 2 and 3 address assigning different stream IDs to different writes based on a given stability value as well as one or more attributes of the data with respect to one or more KVS trees.
- a first set of stream identifiers may be used with write commands for data classified as Hot
- a second set of stream identifiers may be used with write commands for data classified as Warm
- a third set of stream identifiers may be used with write commands for data classified as Cold, where a stream identifier is in at most one of these three sets.
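- the disjoint per-temperature sets of stream identifiers described above can be sketched as a simple partition. This is an illustrative assumption about how the split might be computed; the stream IDs and fractions below are not values from this description:

```python
# Hypothetical sketch: partition a device's stream IDs into disjoint
# per-temperature sets, so a given stream ID is in at most one set.
def partition_streams(stream_ids, fractions):
    """fractions: mapping of temperature -> share of the streams."""
    groups, start = {}, 0
    temps = list(fractions)
    for i, temp in enumerate(temps):
        count = round(len(stream_ids) * fractions[temp])
        if i == len(temps) - 1:            # last group takes the remainder
            count = len(stream_ids) - start
        groups[temp] = set(stream_ids[start:start + count])
        start += count
    return groups

# e.g., eight streams split across Hot, Warm, and Cold
groups = partition_streams(list(range(8)), {"HOT": 0.5, "WARM": 0.25, "COLD": 0.25})
```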
- FIG. 2 is a block diagram illustrating an example of a write to a multi-stream storage device (e.g., device 260 or 265 ), according to an embodiment.
- FIG. 2 illustrates multiple KVS trees, KVS tree 205 and KVS tree 210 . As illustrated, each tree is respectively performing a write operation 215 and 220 .
- These write operations are handled by a storage subsystem 225 .
- the storage subsystem may be a device driver, such as for device 260 , may be a storage product to manage multiple devices (e.g., device 260 and device 265 ) such as those found in operating systems, network attached storage devices, etc.
- the storage subsystem 225 will complete the writes to the storage devices in operations 250 and 255 respectively.
- the stream-mapping circuits 230 provide a stream ID to a given write 215 to be used in the device write 250 .
- the immutability of kvsets results in entire kvsets being written or deleted at a time.
- the data comprising a kvset has a similar lifetime.
- Data comprising a new kvset may be written to a single storage device or to several storage devices (e.g., device 260 and device 265 ) using techniques such as erasure coding or RAID. Further, as the size of kvsets may be larger than any given device write 250 , writing the kvset may involve directing multiple write commands to a given storage device 260 .
- one or more of the following may be provided for selecting a stream ID for each such write command 250 :
- the stream-mapping circuits 230 may include an electronic hardware implemented controller 235 , accessible stream ID (A-SID) table 240 and a selected stream ID (S-SID) table 245 .
- the controller 235 is arranged to accept as input a stream-mapping tuple and respond with the stream ID.
- the controller 235 is configured to operate with a plurality of storage devices 260 and 265 storing a plurality of KVS trees 205 and 210 .
- the controller 235 is arranged to obtain (e.g., by configuration, querying, etc.) a configuration for accessible devices.
- the controller 235 is also arranged to configure the set of stability values TEMPSET, and for each value TEMP in TEMPSET configure a fraction, number, or other determiner of the number of streams on a given storage device to use for data classified by that value.
- the controller 235 is arranged to obtain (e.g., receive via configuration, message, etc., retrieve from configuration device, firmware, etc.) a temperature assignment method.
- the temperature assignment method will be used to assign stability values to the write request 215 in this example.
- a stream-mapping tuple may include any one or more of DID, FID, TID, LNUM, NNUM, KVSETID, WTYPE or WLAST and be used as input to the temperature assignment method executed by the controller 235 to select a stability value TEMP from the TEMPSET.
- a KVS tree scope is a collection of parameters for a write specific to the KVS tree component (e.g., kvset) being written.
- the KVS tree scope includes one or more of FID, TID, LNUM, NNUM, or KVSETID.
- the stream-mapping tuple may include components of the KVS tree scope as well as device specific or write specific components, such as DID, WLAST, or WTYPE.
- a stability, or temperature, scope tuple TSCOPE is derived from the stream-mapping tuple. The following are example constituent KVS tree scope components that may be used to create TSCOPE:
- the controller 235 may implement a static temperature assignment method.
- the static temperature assignment method may read the selected TEMP, for example, from a configuration file, a database, or KVS tree metadata, including metadata stored in the KVS tree TID.
- these data sources include mappings from the TSCOPE to a stability value.
- the mapping may be cached (e.g., upon controller 235 's activation or dynamically during later operation) to speed the assignment of stability values as write requests arrive.
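- a minimal sketch of such a static, cached assignment follows. The TSCOPE shape of (TID, LNUM), the mapping contents, and the COLD default are illustrative assumptions standing in for a real configuration source:

```python
from functools import lru_cache

# Stand-in for a configuration file, database, or KVS tree metadata that
# maps a TSCOPE to a stability value. Entries here are illustrative.
CONFIGURED_TEMPS = {
    ("tree-1", 0): "HOT",      # TSCOPE assumed to be (TID, LNUM)
    ("tree-1", 1): "WARM",
    ("tree-1", 2): "COLD",
}

@lru_cache(maxsize=None)       # cache assignments as write requests arrive
def static_temp(tscope):
    return CONFIGURED_TEMPS.get(tscope, "COLD")   # assumed default
```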
- the controller 235 may implement a dynamic temperature assignment method.
- the dynamic temperature assignment method may compute the selected TEMP based on a frequency with which kvsets are written to TSCOPE.
- the frequency with which the controller 235 executes the temperature assignment method for a given TSCOPE may be measured and clustered around TEMPS in TEMPSET.
- a computation may, for example, define a set of frequency ranges and a mapping from each frequency range to a stability value so that the value of TEMP is determined by the frequency range containing the frequency with which kvsets are written to TSCOPE.
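- the frequency-range computation above might be sketched as follows. The sliding-window length and the writes-per-second thresholds are assumptions for illustration, not values from this description:

```python
from collections import deque

# Illustrative dynamic temperature assignment: measure how often kvsets are
# written to a TSCOPE within a sliding window, then map that frequency onto
# a stability value via frequency-range floors.
class DynamicTemp:
    RANGES = [(1.0, "HOT"), (0.1, "WARM"), (0.0, "COLD")]   # writes/sec floors

    def __init__(self, window=60.0):
        self.window = window
        self.events = {}                  # TSCOPE -> deque of write timestamps

    def record(self, tscope, now):
        q = self.events.setdefault(tscope, deque())
        q.append(now)
        while q and now - q[0] > self.window:   # drop samples outside window
            q.popleft()

    def temp(self, tscope, now):
        freq = len(self.events.get(tscope, ())) / self.window
        for floor, label in self.RANGES:
            if freq >= floor:
                return label
        return "COLD"
```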
- the controller 235 is arranged to obtain (e.g., receive via configuration, message, etc., retrieve from configuration device, firmware, etc.) a stream assignment method.
- the stream assignment method will consume the KVS tree 205 aspects of the write 215 as well as the stability value (e.g., from the temperature assignment) to produce the stream ID.
- controller 235 may use the stream-mapping tuple (e.g., including KVS tree scope) in the stream assignment method to select the stream ID.
- any one or more of DID, FID, TID, LNUM, NNUM, KVSETID, WTYPE or WLAST along with the stability value may be used in the stream assignment method executed by the controller 235 to select the stream ID.
- a stream-scope tuple SSCOPE is derived from the stream-mapping tuple. The following are example constituent KVS tree scope components that may be used to create SSCOPE:
- the controller 235 may be arranged to, prior to accepting inputs, initialize the A-SID table 240 and the S-SID table 245 .
- A-SID table 240 is a data structure (table, dictionary, etc.) that may store entries for tuples (DID, TEMP, SID) and may retrieve such entries with specified values for DID and TEMP.
- the notation A-SID(DID, TEMP) refers to all entries in A-SID table 240 , if any, with the specified values for DID and TEMP.
- the A-SID table 240 may be initialized for each configured storage device 260 and 265 and temperature value in TEMPSET.
- the A-SID table 240 initialization may proceed as follows: For each configured storage device DID, the controller 235 may be arranged to:
- the A-SID table 240 includes an entry for each configured storage device DID and value TEMP in TEMPSET assigned a unique SID.
- the technique for obtaining the number of streams available for a configured storage device 260 , and a usable SID for each, differs by storage device interface; however, these are readily accessible via the interfaces of multi-stream storage devices.
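- the A-SID table 240 initialization can be sketched as below. The device inventory and the even split of streams across temperature values are illustrative assumptions; a real controller would obtain stream counts and usable SIDs via the storage device interface:

```python
# Sketch of A-SID table initialization: one (DID, TEMP, SID) entry per
# assigned stream, splitting each device's streams across TEMPSET.
def init_asid(devices, tempset):
    """devices: {DID: [SID, ...]}; returns (DID, TEMP, SID) entries."""
    table = []
    for did, sids in devices.items():
        per_temp = max(1, len(sids) // len(tempset))   # even-split assumption
        it = iter(sids)
        for temp in tempset:
            for _ in range(per_temp):
                sid = next(it, None)
                if sid is not None:
                    table.append((did, temp, sid))
    return table

def a_sid(table, did, temp):
    """A-SID(DID, TEMP): all entries with the given DID and TEMP."""
    return [e for e in table if e[0] == did and e[1] == temp]
```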
- the S-SID table 245 maintains a record of streams already in use (e.g., already a part of a given write).
- S-SID table 245 is a data structure (table, dictionary, etc.) that may store entries for tuples (DID, TEMP, SSCOPE, SID, Timestamp) and may retrieve or delete such entries with specified values for DID, TEMP, and optionally SSCOPE.
- the notation S-SID(DID, TEMP) refers to all entries in S-SID table 245 , if any, with the specified values for DID and TEMP,
- the S-SID table 245 may be initialized by the controller 235 .
- the controller 235 is arranged to initialize the S-SID table 245 for each configured storage device 260 and 265 and temperature value in TEMPSET.
- the entries in S-SID table 245 represent currently, or already, assigned streams for write operations.
- the S-SID table 245 is empty after initialization, entries being created by the controller 235 as stream IDs are assigned.
- the controller 235 may implement a static stream assignment method.
- the static stream assignment method selects the same stream ID for a given DID, TEMP, and SSCOPE.
- the static stream assignment method may determine whether S-SID(DID, TEMP) has an entry for SSCOPE. If there is no conforming entry, the static stream assignment method selects a stream ID SID from A-SID(DID, TEMP) and creates an entry in S-SID table 245 for (DID, TEMP, SSCOPE, SID, timestamp), where timestamp is the current time after the selection.
- the selection from A-SID(DID, TEMP) is random, or the result of a round-robin process.
- the stream ID SID is returned to the storage subsystem 225 .
- the entry in S-SID table 245 for (DID, TEMP, SSCOPE) is deleted. This last example demonstrates the usefulness of having WLAST to signal the completion of a write 215 for a kvset or the like that would be known to the tree 205 but not to the storage subsystem 225 .
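- the static stream assignment method just described might be sketched as follows, with the round-robin variant of the A-SID selection. The class and table shapes are illustrative assumptions:

```python
import itertools
import time

# Minimal sketch of static stream assignment: a given (DID, TEMP, SSCOPE)
# keeps the same SID while its S-SID entry exists; new entries draw from
# A-SID(DID, TEMP) round-robin, and WLAST deletes the entry afterward.
class StaticAssigner:
    def __init__(self, asid):            # asid: {(DID, TEMP): [SID, ...]}
        self.rr = {k: itertools.cycle(v) for k, v in asid.items()}
        self.ssid = {}                   # (DID, TEMP, SSCOPE) -> (SID, ts)

    def assign(self, did, temp, sscope, wlast=False):
        key = (did, temp, sscope)
        if key not in self.ssid:         # no conforming entry: select a SID
            self.ssid[key] = (next(self.rr[(did, temp)]), time.time())
        sid, _ = self.ssid[key]
        if wlast:                        # last write for this kvset: clean up
            del self.ssid[key]
        return sid
```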
- the controller 235 may implement a least recently used (LRU) stream assignment method.
- the LRU stream assignment method selects the same stream ID for a given DID, TEMP, and SSCOPE within a relatively small time interval.
- the LRU assignment method determines whether S-SID(DID, TEMP) has an entry for SSCOPE. If the entry exists, the LRU assignment method then selects the stream ID in this entry and sets the timestamp in this entry in S-SID table 245 to the current time.
- the LRU stream assignment method determines whether the number of entries S-SID(DID, TEMP) equals the number of entries A-SID(DID, TEMP). If this is true, then the LRU assignment method selects the stream ID SID from the entry in S-SID(DID, TEMP) with the oldest timestamp. Here, the entry in S-SID table 245 is replaced with the new entry (DID, TEMP, SSCOPE, SID, timestamp) where timestamp is the current time after the selection.
- the method selects a stream ID SID from A-SID(DID, TEMP) such that there is no entry in S-SID(DID, TEMP) with the selected stream ID and creates an entry in S-SID table 245 for (DID, TEMP, SSCOPE, SID, timestamp) where timestamp is the current time after the selection.
- the stream ID SID is returned to the storage subsystem 225 .
- the entry in S-SID table 245 for (DID, TEMP, SSCOPE) is deleted.
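- the LRU stream assignment steps above can be sketched as one method; the data-structure shapes are illustrative assumptions:

```python
# Sketch of LRU stream assignment: reuse an existing S-SID entry when one
# exists; when every available stream is already assigned, replace the
# entry with the oldest timestamp; otherwise take an unused stream.
class LRUAssigner:
    def __init__(self, asid):            # asid: {(DID, TEMP): set of SIDs}
        self.asid = asid
        self.ssid = {}                   # (DID, TEMP, SSCOPE) -> [SID, ts]

    def assign(self, did, temp, sscope, now):
        key = (did, temp, sscope)
        if key in self.ssid:             # entry exists: refresh its timestamp
            self.ssid[key][1] = now
            return self.ssid[key][0]
        in_use = {k: v for k, v in self.ssid.items()
                  if k[0] == did and k[1] == temp}
        avail = self.asid[(did, temp)]
        if len(in_use) == len(avail):    # all streams in use: evict oldest
            oldest = min(in_use, key=lambda k: in_use[k][1])
            sid = in_use[oldest][0]
            del self.ssid[oldest]
        else:                            # pick a stream with no S-SID entry
            sid = min(avail - {v[0] for v in in_use.values()})
        self.ssid[key] = [sid, now]
        return sid
```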
- the controller 235 is configured to assign a stability value for a given stream-mapping tuple received as part of the write request 215 . Once the stability value is determined, the controller 235 is arranged to assign the SID.
- the temperature assignment and stream assignment methods may each reference and update the A-SID table 240 and the S-SID table 245 .
- the controller 235 is also arranged to provide the SID to a requester, such as the storage subsystem 225 .
- KVS trees may be used in a forest, or grove, whereby several KVS trees are used to implement a single structure, such as a file system. For example, one KVS tree may use block number as the key and bits in the block as a value while a second KVS tree may use file path as the key and a list of block numbers as the value. In this example, it is likely that kvsets for a given file referenced by path and the kvsets holding the block numbers have similar lifetimes. Thus the inclusion of FID above.
- a computing system implementing several KVS trees stored on one or more storage devices may use knowledge of the KVS tree to more efficiently select streams in multi-stream storage devices.
- the system may be configured so that the number of concurrent write operations (e.g., ingest or compaction) executed for the KVS trees is restricted based on the number of streams on any given storage device that are reserved for the temperature classifications assigned to kvset data written by these write operations. This is possible because, within a kvset, the life expectancy of that data is the same as kvsets are written and deleted in their entirety.
- keys and values may be separated.
- keys written together will have the same life-time, which is likely shorter than value life-times when key compaction, discussed below, is performed.
- tree-level experimentally appears to be a strong indication of data life-time: older data, and thus greater (e.g., deeper) tree-levels, have a longer life-time than younger data at higher tree-levels.
- for a given KVS tree and controller 235 , it may be advantageous for the number of ingest operations to be a fraction of H (e.g., one-half) and the number of compaction operations to be a fraction of C (e.g., three-fourths) because LRU stream assignment with SSCOPE computed as (TID, LNUM) may not take advantage of WLAST in a stream-mapping tuple to remove unneeded S-SID table 245 entries upon receiving the last write for a given KVSET in TID, resulting in a suboptimal SID selection.
- LSM Tree variants store collections of key-value pairs and tombstones whereby a given collection may be created by an ingest operation or garbage collection operation (often referred to as a compaction or merge operation), and then later deleted in whole as the result of a subsequent ingest operation or garbage collection operation.
- the data comprising such a collection has a similar lifetime, like the data comprising a kvset in a KVS tree.
- a tuple similar to the stream-mapping tuple above may be defined for most other LSM Tree variants, where the KVSETID may be replaced by a unique identifier for the collection of key-value pairs or tombstones created by an ingest operation or garbage collection operation in a given LSM Tree variant.
- the stream-mapping circuits 230 may then be used as described to select stream identifiers for the plurality of write commands used to store the data comprising such a collection of key-value pairs and tombstones.
- FIG. 3 illustrates an example of a method 300 to facilitate writing to a multi-stream storage device, according to an embodiment.
- the operations of the method 300 are implemented with electronic hardware, such as that described throughout this application, including below with respect to FIG. 26 (e.g., circuits).
- the method 300 provides a number of examples to implement the discussion above with respect to FIG. 2 .
- notification of a KVS tree write request for a multi-stream storage device is received.
- the notification includes a KVS tree scope corresponding to data in the write request.
- the KVS tree scope includes at least one of: a kvset ID corresponding to a kvset of the data; a node ID corresponding to a node of the KVS tree corresponding to the data; a level ID corresponding to a tree-level corresponding to the data; a tree ID for the KVS tree; a forest ID corresponding to the forest to which the KVS tree belongs; or a type corresponding to the data.
- the type is either a key-block type or a value-block type.
- the notification includes a device ID for the multi-stream device.
- the notification includes a WLAST flag corresponding to a last write request in a sequence of write requests to write a kvset, identified by the kvset ID, to the multi-stream storage device.
- assigning the stream ID to the write request based on the KVS tree scope and the stability value of the write request includes creating a stream-scope value from the KVS tree scope.
- the stream-scope value includes a tree ID for the data.
- the stream-scope value includes a level ID for the data.
- the stream-scope value includes a node ID for the data.
- the stream-scope value includes a kvset ID for the data.
- assigning the stream ID to the write request based on the KVS tree scope and the stability value of the write request also includes performing a lookup in a selected-stream data structure using the stream-scope value.
- performing the lookup in the selected-stream data structure includes: failing to find the stream-scope value in the selected-stream data structure; performing a lookup on an available-stream data structure using the stability value; receiving a result of the lookup that includes a stream ID; and adding an entry to the selected-stream data structure that includes the stream ID, the stream-scope value, and a timestamp of a time when the entry is added.
- multiple entries of the available-stream data structure correspond to the stability value, and wherein the result of the lookup is at least one of a round-robin or random selection of an entry from the multiple entries.
- the available-stream data structure may be initialized by: obtaining a number of streams available from the multi-stream storage device; obtaining a stream ID for all streams available from the multi-stream storage device, each stream ID being unique; adding stream IDs to stability value groups; and creating a record in the available-stream data structure for each stream ID, the record including the stream ID, a device ID for the multi-stream storage device, and a stability value corresponding to a stability value group of the stream ID.
- locating the stream ID from either the selected-stream data structure or an available-stream data structure based on the contents of the selected stream data structure includes: comparing a first number of entries from the selected-stream data structure to a second number of entries from the available-stream data structure to determine that the first number of entries and the second number of entries are equal; locating a group of entries from the selected-stream data structure that correspond to the stability value; and returning a stream ID of an entry in the group of entries that has the oldest timestamp.
- locating the stream ID from either the selected-stream data structure or an available-stream data structure based on the contents of the selected stream data structure includes: comparing a first number of entries from the selected-stream data structure to a second number of entries from the available-stream data structure to determine that the first number of entries and the second number of entries are not equal; performing a lookup on the available-stream data structure using the stability value and stream IDs in entries of the selected stream data structure; receiving a result of the lookup that includes a stream ID that is not in the entries of the selected-stream data structure; and adding an entry to the selected-stream data structure that includes the stream ID, the stream-scope value, and a timestamp of a time when the entry is added.
- assigning the stream ID to the write request based on the KVS tree scope and the stability value of the write request also includes returning a stream ID corresponding to the stream-scope from the selected-stream data structure.
- returning the stream ID corresponding to the stream-scope from the selected-stream data structure includes updating a timestamp for an entry in the selected-stream data structure corresponding to the stream ID.
- the write request includes a WLAST flag, and wherein returning the stream ID corresponding to the stream-scope from the selected-stream data structure includes removing an entry from the selected-stream data structure corresponding to the stream ID.
- the method 300 may be extended to include removing entries from the selected-stream data structure with a timestamp beyond a threshold.
- the method 300 may be optionally extended to include assigning the stability value based on the KVS tree scope.
- the stability value is one of a predefined set of stability values.
- the predefined set of stability values includes HOT, WARM, and COLD, wherein HOT indicates a lowest expected lifetime of the data on the multi-stream storage device and COLD indicates a highest expected lifetime of the data on the multi-stream storage device.
- assigning the stability value includes locating the stability value from a data structure using a portion of the KVS tree scope.
- the portion of the KVS tree scope includes a level ID for the data.
- the portion of the KVS tree scope includes a type for the data.
- the portion of the KVS tree scope includes a tree ID for the data. In an example, the portion of the KVS tree scope includes a node ID for the data.
- FIG. 4 is a block diagram illustrating an example of a storage organization for keys and values according to an embodiment.
- a kvset may be stored using key-blocks to hold keys (along with tombstones as needed) and value-blocks to hold values.
- the key-blocks may also contain indexes and other information (such as bloom filters) for efficiently locating a single key, locating a range of keys, or generating the total ordering of all keys in the kvset, including key tombstones, and for obtaining the values associated with those keys, if any.
- a tree representation for the kvset is illustrated to span the key-blocks 410 and 415 .
- the leaf nodes contain value references (VID) to the values 425 , 430 , 435 , and 445 , and two keys with tombstones. This illustrates that, in an example, the tombstone does not have a corresponding value in a value block, even though it may be referred to as a type of key-value pair.
- the illustration of the value blocks demonstrates that each may have a header and values that run next to each other without delineation.
- the reference to particular bits in the value block for a value, such as value 425 , is generally stored in the corresponding key entry, for example, in an offset and extent format.
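- packing values into a value-block with (offset, extent) references can be sketched as below. The b"VBLK" magic number and the helper names are assumptions for illustration:

```python
# Illustrative value-block layout: a header followed by values packed
# back-to-back, with each key entry holding an (offset, extent) reference.
def build_value_block(pairs):
    """pairs: (key, value_bytes) tuples; returns (block, {key: (off, len)})."""
    block, refs = bytearray(b"VBLK"), {}   # assumed magic-number header
    for key, value in pairs:
        refs[key] = (len(block), len(value))   # offset into block, extent
        block.extend(value)                    # no separation between values
    return bytes(block), refs

def read_value(block, ref):
    offset, length = ref
    return block[offset:offset + length]
```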
- FIG. 5 is a block diagram illustrating an example of a configuration for key-blocks and value-blocks, according to an embodiment.
- the key-block and value-block organization of FIG. 5 illustrates the generally simple nature of the extension key-block and the value-blocks. Specifically, each is generally a simple storage container with a header to identify its type (e.g., key-block or value-block) and perhaps a size, location on storage, or other meta data.
- the value-block includes a header 540 with a magic number indicating that it is a value-block and storage 545 to store bits of values.
- the key-extension block includes a header 525 indicating that it is an extension block and stores a portion of the key structure 530 , such as a KB tree, B-tree, or the like.
- the primary key-block provides a location for much kvset meta data in addition to simply storing the key structure.
- the primary key-block includes a root of the key structure 520 .
- the primary key block may also include a header 505 , bloom filters 510 , or a portion of the key structure 515 .
- References to the components of the primary key-block are included in the header 505 , such as the blocks of the bloom filter 510 , or the root node 520 .
- Metrics such as kvset size, value-block addresses, compaction performance, or use may also be contained in the header 505 .
- the bloom filters 510 are computed when the kvset is created and provide a ready mechanism to ascertain whether a key is not in the kvset without performing a search on the key structure. This advance permits greater efficiency in scanning operations as noted below.
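- the pre-check the bloom filter enables can be illustrated with a toy implementation: a negative answer proves the key is absent from the kvset, so the key structure need not be searched. The bit-array size and hash count below are illustrative parameters:

```python
import hashlib

# Toy bloom filter: k hash positions per key over an m-bit array held in a
# single Python int. False from maybe_contains() means definitely absent.
class Bloom:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def maybe_contains(self, key):       # False means definitely not present
        return all(self.bits >> p & 1 for p in self._positions(key))
```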
- FIG. 6 illustrates an example of a KB tree 600 , according to an embodiment.
- An example key structure to use in a kvset's key-blocks is the KB tree.
- the KB tree 600 has structural similarities to B+ trees.
- the KB tree 600 has 4096-byte nodes (e.g., node 605 , 610 , and 615 ). All keys of the KB tree reside in leaf nodes (e.g., node 615 ).
- Internal nodes (e.g., node 610 ) have copies of selected leaf-node keys to navigate the tree 600 .
- the result of a key lookup is a value reference, which may be, in an example, a value-block ID, an offset, and a length.
- the KB tree 600 has the following properties:
- the KB tree 600 may be searched via a binary search among the keys in the root node 605 to find the appropriate “edge” key.
- the link to the edge key's child may be followed. This procedure is then repeated until a match is found in a leaf node 615 or no match is found.
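- the search procedure just described can be sketched as below, assuming a simple dict-based node layout (an illustrative assumption) in which each internal node stores the "edge" key of each child:

```python
from bisect import bisect_left

# Sketch of the KB tree lookup: binary-search a node's keys for the
# appropriate edge key (the first key >= the search key), follow its child
# link, and repeat until a leaf yields a match or a miss.
def kb_search(node, key):
    i = bisect_left(node["keys"], key)
    if "children" in node:                   # internal node: follow edge key
        if i == len(node["keys"]):
            return None                      # key beyond the largest edge key
        return kb_search(node["children"][i], key)
    if i < len(node["keys"]) and node["keys"][i] == key:
        return node["values"][i]             # leaf: exact match
    return None                              # leaf: no match
```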
- creating the KB tree 600 may be different than other tree structures that mutate over time.
- the KB tree 600 may be created in a bottom-up fashion.
- the leaf nodes 615 are created first, followed by their parents 610 , and so on until there is one node left—the root node 605 .
- creation starts with a single empty leaf node, the current node. Each new key is added to the current node. When the current node becomes full, a new leaf node is created and it becomes the current node. When the last key is added, all leaf nodes are complete.
- nodes at the next level up (i.e., the parents of the leaf nodes) are created in a similar fashion, using the maximum key from each leaf node as the input stream. When those keys are exhausted, that level is complete. This process repeats until the most recently created level consists of a single node, the root node 605 .
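- the bottom-up construction can be sketched as follows. The fan-out of 3 and the dict-based node layout are illustrative assumptions:

```python
# Bottom-up KB tree construction sketch: fill leaf nodes from the sorted key
# stream, then build each parent level from its children's maximum keys,
# repeating until a single node (the root) remains.
def build_kb_tree(items, fanout=3):
    """items: (key, value) pairs already in sort order."""
    level = [{"keys": [k for k, _ in items[i:i + fanout]],
              "values": [v for _, v in items[i:i + fanout]]}
             for i in range(0, len(items), fanout)]       # leaf nodes first
    while len(level) > 1:                # parents from children's max keys
        level = [{"keys": [c["keys"][-1] for c in level[i:i + fanout]],
                  "children": level[i:i + fanout]}
                 for i in range(0, len(level), fanout)]
    return level[0]
```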
- an edge that crosses from a first key-block to a second key-block includes a reference to the second key-block.
- FIG. 7 is a block diagram illustrating KVS tree ingestion, according to an embodiment.
- in a KVS tree, the process of writing a new kvset to the root node 730 is referred to as an ingest.
- Key-value pairs 705 (including tombstones) are accumulated in-memory 710 of the KVS tree, and are organized into kvsets ordered from newest 715 to oldest 720 .
- the kvset 715 may be mutable to accept key-value pairs synchronously. This is the only mutable kvset variation in the KVS tree.
- the ingest 725 writes the key-value pairs and tombstones in the oldest kvset 720 in main memory 710 to a new (and the newest) kvset 735 in the root node 730 of the KVS tree, and then deletes that kvset 720 from main memory 710 .
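- the ingest flow just described can be sketched with two newest-first sequences; the class shape is an illustrative assumption:

```python
from collections import deque

# Sketch of ingest: in-memory kvsets are kept newest first; ingest moves the
# oldest in-memory kvset to the newest position of the root node's kvset
# sequence and removes it from main memory.
class KvsTreeSketch:
    def __init__(self):
        self.mem = deque()       # index 0 is the newest in-memory kvset
        self.root = deque()      # index 0 is the newest kvset in the root

    def put_kvset(self, kvset):
        self.mem.appendleft(kvset)

    def ingest(self):
        oldest = self.mem.pop()          # oldest kvset in main memory
        self.root.appendleft(oldest)     # becomes the newest root kvset
        return oldest
```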
- a key-value set (kvset) is received to store in a key-value data structure.
- the key-value data structure is organized as a tree and the kvset includes a mapping of unique keys to values.
- the keys and the values of the kvset are immutable and nodes of the tree have a temporally ordered sequence of kvsets.
- the primary key-block includes a list of media block identifications for the one or more extension key-blocks of the kvset. In an example, the primary key-block includes a list of media block identifications for value-blocks in the set of value blocks. In an example, the primary key-block includes a copy of a lowest key in a key-tree of the kvset, the lowest key determined by a pre-set sort-order of the tree. In an example, the primary key-block includes a copy of a highest key in a key-tree of the kvset, the highest key determined by a pre-set sort-order of the tree. In an example, the primary key-block includes a header to a key-tree of the kvset.
- the primary key-block includes a list of media block identifications for a key-tree of the kvset.
- the primary key-block includes a bloom filter header for a bloom filter of the kvset.
- the primary key-block includes a list of media block identifications for a bloom filter of the kvset.
- values are stored in a set of value-blocks in operation 805 .
- members of the set of value-blocks correspond to media blocks for the at least one storage medium, with each value-block including a header to identify it as a value-block.
- a value-block includes a storage section holding one or more values without separation between the values.
- the primary key-block includes a set of metrics for the kvset.
- the set of metrics include a total number of keys stored in the kvset.
- the set of metrics include a number of keys with tombstone values stored in the kvset.
- the set of metrics include a sum of all key lengths for keys stored in the kvset.
- the set of metrics include a sum of all value lengths for keys stored in the kvset.
- the set of metrics include an amount of unreferenced data in value-blocks of the kvset.
- the kvset is written to a sequence of kvsets of a root-node of the tree.
- the method 800 may be extended to include operations 815 - 825 .
- a key and a corresponding value to store in the key-value data structure are received.
- the key and the value are placed in a preliminary kvset, the preliminary kvset being mutable.
- when a rate of writing to the preliminary root node is beyond a threshold, the method 800 may be extended to throttle write requests to the key-value data structure.
- the kvset is written to the key-value data structure when a metric is reached.
- the metric is a size of a preliminary root node.
- the metric is an elapsed time.
- KVS trees may use compaction. Details of several compaction operations are discussed below with respect to FIGS. 9-18 .
- the illustrated compaction operations are forms of garbage collection because they may remove obsolete data, such as keys or key-value pairs during the merge.
- Compaction occurs under a variety of triggering conditions, such as when the kvsets in a node meet specified or computed criteria.
- compaction criteria include the total size of the kvsets or the amount of garbage in the kvsets.
- garbage in kvsets is key-value pairs or tombstones in one kvset rendered obsolete, for example, by a key-value pair or tombstone in a newer kvset, or a key-value pair that has violated a time-to-live constraint, among others.
- garbage in kvsets is unreferenced data in value-blocks (unreferenced values) resulting from key compactions.
- the inputs to a compaction operation are some or all of the kvsets in a node at the time the compaction criteria are met. These kvsets are called a merge set and comprise a temporally consecutive sequence of two or more kvsets.
- the method 800 may be extended to support compaction, however, the following operations may also be triggered when, for example, there are free processing resources, or other convenient scenarios to perform the maintenance.
- the KVS tree may be compacted.
- the compacting is performed in response to a trigger.
- the trigger is an expiration of a time period.
- the trigger is a metric of the node.
- the metric is a total size of kvsets of the node.
- the metric is a number of kvsets of the node.
- the metric is a total size of unreferenced values of the node.
- the metric is a number of unreferenced values.
- FIG. 9 is a block diagram illustrating key compaction, according to an embodiment.
- Key compaction reads the keys and tombstones, but not values, from the merge set, removes all obsolete keys or tombstones, writes the resulting keys and tombstones into one or more new kvsets (e.g., by writing into new key-blocks), deletes the key-stores, but not the values, from the node.
- the new kvsets atomically replace, and are logically equivalent to, the merge set both in content and in placement within the logical ordering of kvsets from newest to oldest in the node.
- the kvsets KVS 3 (the newest), KVS 2 , and KVS 1 (the oldest) undergo key compaction for the node.
- when the key-stores for these kvsets are merged, collisions occur on keys A and B.
- because the new kvset, KVS 4 (illustrated below), may contain only one of each merged key, the collisions are resolved in favor of the most recent (the leftmost as illustrated) keys, which refer to value ID 10 and value ID 11 for keys A and B respectively.
- Key C has no collision and so will be included in the new kvset.
- the key entries that will be part of the new kvset, KVS 4 are shaded in the top node.
- KVS 4 is drawn to span KVS 1 , KVS 2 , and KVS 3 in the node and the value entries are drawn in a similar location in the node.
- the purpose of these positions is to demonstrate that the values are not changed in a key compaction; rather, only the keys are changed. As explained below, this provides a more efficient search by reducing the number of kvsets searched in any given node and may also provide valuable insights to direct maintenance operations.
- the values 20 and 30 are illustrated with dashed lines, denoting that they persist in the node but are no longer referenced by a key entry as their respective key entries were removed in the compaction.
- Key compaction is non-blocking as a new kvset (e.g., KVS 5 ) may be placed in the newest position (e.g., to the left) of KVS 3 or KVS 4 during the compaction because, by definition, the added kvset will be logically newer than the kvset resulting from the key compaction (e.g., KVS 4 ).
- FIG. 10 illustrates an example of a method 1000 for key compaction, according to an embodiment.
- the operations of the method 1000 are implemented with electronic hardware, such as that described throughout this application, including below with respect to FIG. 26 (e.g., circuits).
- a subset of kvsets from a sequence of kvsets for the node is selected.
- the subset of kvsets are contiguous kvsets and include an oldest kvset.
- a set of collision keys is located.
- Members of the set of collision keys include key entries in at least two kvsets in the sequence of kvsets for the node.
- a most recent key entry for each member of the set of collision keys is added to a new kvset.
- where the node has no children, and where the subset of kvsets includes the oldest kvset, writing the most recent key entry for each member of the set of collision keys to the new kvset and writing entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset includes omitting any key entries that include a tombstone.
- writing the most recent key entry for each member of the set of collision keys to the new kvset and writing entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset includes omitting any key entries that are expired.
- entries for each key in members of the subset of kvsets that are not in the set of collision keys are added to the new kvset.
- operations 1020 and 1015 may operate concurrently to add entries to the new kvset.
- the subset of kvsets is replaced with the new kvset by writing the new kvset and removing (e.g., deleting, marking for deletion, etc.) the subset of kvsets.
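The key compaction described above can be sketched as follows (illustrative Python; the dict-of-value-IDs representation of a kvset is an assumption for exposition, not the patent's on-media key-block format):

```python
# Sketch of key compaction: key entries reference value IDs, and the values
# themselves are never read or rewritten. Newest entries win collisions.

def key_compact(merge_set):
    """merge_set: list of kvsets ordered newest-first; each kvset is a
    dict mapping key -> value ID."""
    new_kvset = {}
    unreferenced = []  # value IDs that lose their key entry (a garbage metric)
    for kvset in merge_set:            # iterate newest first
        for key, vid in kvset.items():
            if key in new_kvset:       # collision: a newer entry already won
                unreferenced.append(vid)
            else:
                new_kvset[key] = vid
    return new_kvset, unreferenced

# KVS3 (newest) .. KVS1 (oldest), loosely mirroring FIG. 9
kvs3 = {"A": 10, "B": 11}
kvs2 = {"B": 20, "C": 12}
kvs1 = {"A": 30}
kvs4, garbage = key_compact([kvs3, kvs2, kvs1])
# kvs4 keeps A->10, B->11, C->12; value IDs 20 and 30 become unreferenced
```

The unreferenced list corresponds to the dashed values in FIG. 9: they persist in the node's value-blocks but are no longer referenced by any key entry.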
- FIG. 11 is a block diagram illustrating key-value compaction, according to an embodiment.
- Key value compaction differs from key compaction in its treatment of values.
- Key-value compaction reads the key-value pairs and tombstones from the merge set, removes obsolete key-value pairs or tombstones, writes the resulting key-value pairs and tombstones to one or more new kvsets in the same node, and deletes the kvsets comprising the merge set from the node.
- the new kvsets atomically replace, and are logically equivalent to, the merge set both in content and in placement within the logical ordering of kvsets from newest to oldest in the node.
- Kvsets KVS 3 , KVS 2 , and KVS 1 comprise the merge set.
- the shaded key entries and values will be kept in the merge and placed in the new KVS 4 , written to the node to replace KVS 3 , KVS 2 , and KVS 1 .
- the key collisions for keys A and B are resolved in favor of the most recent entries.
- What is different in key-value compaction from key compaction is the removal of the unreferenced values.
- KVS 4 is illustrated to consume only the space required to hold its current keys and values.
- when keys and values are stored separately in key-blocks and value-blocks, KVS 4 includes both new key-blocks (like the result of key compaction) and new value-blocks (unlike the result of key compaction). Again, however, key-value compaction does not block writing additional kvsets to the node while the key-value compaction is executing because the added kvsets will be logically newer than KVS 4, the result of the key-value compaction. Accordingly, KVS 4 is illustrated in the oldest position (e.g., to the right) of the node.
- FIG. 12 illustrates an example of a method 1200 for key-value compaction, according to an embodiment.
- the operations of the method 1200 are implemented with electronic hardware, such as that described throughout this application, including below with respect to FIG. 26 (e.g., circuits).
- a subset of kvsets (e.g., a merge set) from a sequence of kvsets for the node is selected.
- the subset of kvsets are contiguous kvsets and include an oldest kvset.
- a set of collision keys is located.
- Members of the set of collision keys include key entries in at least two kvsets in the sequence of kvsets for the node.
- a most recent key entry, and corresponding value, for each member of the set of collision keys is added to a new kvset.
- writing the most recent key entry for each member of the set of collision keys to the new kvset and writing entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset includes omitting any key entries that include a tombstone.
- writing the most recent key entry for each member of the set of collision keys to the new kvset and writing entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset includes omitting any key entries that are expired.
- entries for each key, and value, in members of the subset of kvsets that are not in the set of collision keys are added to the new kvset.
- the subset of kvsets is replaced with the new kvset by writing the new kvset (e.g., to storage) and removing the subset of kvsets.
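A similar sketch for key-value compaction, assuming a leaf node and a merge set that includes the oldest kvset so that tombstones may safely be dropped (the TOMB sentinel and dict representation are illustrative assumptions):

```python
# Sketch of key-value compaction: values travel with their keys, and
# tombstones may be discarded when the merge set includes the node's oldest
# kvset and the node has no children (nothing older can exist below).

TOMB = object()  # tombstone sentinel

def key_value_compact(merge_set, node_is_leaf, includes_oldest):
    new_kvset = {}
    for kvset in merge_set:  # newest first; the newest entry wins a collision
        for key, value in kvset.items():
            new_kvset.setdefault(key, value)
    if node_is_leaf and includes_oldest:
        # safe to drop tombstones: they cannot shadow anything older
        new_kvset = {k: v for k, v in new_kvset.items() if v is not TOMB}
    return new_kvset

kvs3 = {"A": "a-new", "B": TOMB}
kvs2 = {"B": "b-old", "C": "c"}
result = key_value_compact([kvs3, kvs2], node_is_leaf=True, includes_oldest=True)
# result keeps A and C; B's tombstone removed both itself and the older value
```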
- Spill and hoist compactions, discussed below with respect to FIGS. 15-18, are a form of key-value compaction where the resultant kvsets are placed in a child node or a parent node respectively. Because each traverses the tree, and the KVS tree enforces a determinative mapping between parents and children, a brief discussion of this determinative mapping is presented here before discussing these other compaction operations.
- FIG. 13 illustrates an example of a spill value and its relation to a tree, according to an embodiment.
- the determinative mapping ensures that, given a key, one may know which child a key-value pair will be mapped to without regard to the KVS tree's contents.
- a spill function accepts a key and produces a spill value corresponding to the determinative mapping for the KVS tree.
- the spill function accepts both the key and a current tree-level and produces a spill value specific to a parent or a child node for the key at that tree-level.
- a simple determinative mapping may include, for example, an alphabetical mapping where, for keys composed of alphabet characters, each tree-level includes a child for each letter of the alphabet, and the mapping uses the characters of the key in turn: the first character determines the L1 child, the second character determines the L2 child, and so on. While simple and meeting the determinative mapping of the KVS tree, this technique suffers somewhat from rigidity, poor balance in the tree, and a lack of control over tree fan-out.
- a better technique is to perform a hash of the key and designate portions of the hash for each tree-level mapping. This ensures that the keys are evenly spread (assuming an adequate hash technique) as they traverse the tree and that fan-out is controlled by selecting the size of the hash portions for any given tree-level. Further, as hash techniques generally allow the size of the resultant hash to be configured, an adequate number of bits, for example, may be ensured, avoiding a problem with the simple technique discussed above, where a short word (such as "the") has only enough characters for a three-level tree.
- FIG. 13 illustrates a result of the key hash with portions 1305 , 1310 , and 1315 respectively corresponding to L1, L2, and L3 of the tree.
- a traversal of the tree proceeds along the dashed lines and nodes. Specifically, starting at the root node 1320 , portion 1305 directs the traversal to node 1325 . Next, portion 1310 directs the traversal to node 1330 . The traversal completes as portion 1315 points toward node 1335 at the deepest level of the tree possible based on the size and apportionment of the illustrated key hash.
- a hash of the key K (or a subkey of key K) is called the spill value for key K.
- two different keys may have the same spill value.
- with subkeys, it is often desirable for this to occur to enable prefix scans or prefix tombstones as discussed below.
- the spill value for a given key K is a constant, and the binary representation of the spill value comprises B bits.
- the B bits in a spill value are numbered zero through (B−1).
- the KVS tree is configured such that nodes at tree-level L all have the same number of child nodes, and this number of child nodes is an integer power of two greater than or equal to two.
- the bits of the spill value for a key K for key distribution may be used as illustrated below.
- the spill value for key K specifies the child node of the node used for spill compaction as follows:
- the table below illustrates a specific example of the above radix-based key distribution technique given a KVS tree with seven (7) levels, a key K, and a 16-bit spill value for key K:
| Level | Child node count | Spill value bits | Key K spill value | Child node selected |
|-------|------------------|------------------|-------------------|---------------------|
| 0     | 2                | 0                | 0                 | 0                   |
| 1     | 8                | 1-3              | 110               | 6                   |
| 2     | 4                | 4-5              | 01                | 1                   |
| 3     | 16               | 6-9              | 1110              | 14                  |
| 4     | 32               | 10-14            | 10001             | 17                  |
| 5     | 2                | 15               | 1                 | 1                   |
- Level is a level number in the KVS tree; Child node count is the number of child nodes configured for all nodes at the specified level; Spill value bits is the spill value bit numbers that spill compaction uses for key distribution at the specified level;
- Key K spill value is the binary representation of the given 16-bit spill value for the given key K, specifically 0110011110100011—for clarity, the spill value is segmented into the bits that spill compaction uses for key distribution at the specified level;
- Child node selected is the child node number that spill compaction selects for any (non-obsolete) key-value pair or tombstone with the given spill value—this includes all (non-obsolete) key-value pairs or tombstones with the given key K, as well as other keys different from key K that may have the same spill value.
- the spill value computation and spill value size (in bits) may be the same for all keys.
- using an adequate hash permits controlling the number of bits in the spill value while also, for example, ensuring a spill value size sufficient to accommodate a desired number of tree-levels and a desired number of child nodes for the nodes at each level.
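The radix-based distribution in the table above can be sketched as follows; the per-level bit-counts 1, 3, 2, 4, 5, 1 follow directly from the child node counts 2, 8, 4, 16, 32, 2 (a fan-out of 2^n consumes n bits):

```python
# Reproduce the "Child node selected" row of the table: at each level, read
# the next bit-count of spill-value bits and interpret them as a child index.

def child_at_levels(spill_value_bits, bit_counts):
    children, offset = [], 0
    for width in bit_counts:
        children.append(int(spill_value_bits[offset:offset + width], 2))
        offset += width
    return children

spill = "0110011110100011"          # the 16-bit spill value for key K
print(child_at_levels(spill, [1, 3, 2, 4, 5, 1]))
# → [0, 6, 1, 14, 17, 1], matching the "Child node selected" row
```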
- the spill value for a key K may be either computed as needed or stored on storage media (e.g., cached).
- FIG. 14 illustrates an example of a method 1400 for a spill value function, according to an embodiment.
- the operations of the method 1400 are implemented with electronic hardware, such as that described throughout this application, including below with respect to FIG. 26 (e.g., circuits).
- a portion of a key is extracted.
- the portion of the key is the entire key.
- a spill value is derived from the portion of the key.
- deriving the spill value from the portion of the key includes performing a hash of the portion of the key.
- a portion of the spill value is returned based on the tree-level of the parent node.
- returning the portion of the spill value based on the tree-level of the parent node includes applying a pre-set apportionment to the spill value, and returning the portion of the spill value corresponding to the pre-set apportionment and the tree-level of the parent node.
- the pre-set apportionment defines the portions of the spill value that apply to respective levels of the tree.
- the pre-set apportionment defines a maximum number of child nodes for at least some of the tree-levels. In an example, the pre-set apportionment defines a maximum depth to the tree. In an example, the pre-set apportionment defines a sequence of bit-counts, each bit-count specifying a number of bits, the sequence ordered from low tree-levels to high tree-levels, such that the spill value portion for the lowest tree-level consists of the first bit-count of bits starting at the beginning of the spill value, and the spill value portion for the n-th tree-level consists of the n-th bit-count of bits at an offset into the spill value equal to the sum of the first through (n−1)-th bit-counts.
- FIG. 15 is a block diagram illustrating spill compaction, according to an embodiment.
- spill compaction is a combination of a key-value compaction with a tree traversal (to a child node) to place the resultant kvsets.
- spill compaction (or just spill) reads the key-value pairs and tombstones from the merge set, removes all obsolete key-value pairs or tombstones (garbage), writes the resulting key-value pairs and tombstones to new kvsets in some or all of the child nodes of the node containing the merge set, and deletes the kvsets comprising the merge set.
- These new kvsets atomically replace, and are logically equivalent to, the merge set.
- Spill compaction uses a deterministic technique for distributing the key-value pairs and tombstones in a merge set to the child nodes of the node containing the merge set.
- spill compaction may use any such key distribution method such that for a given node and a given key K, spill compaction always writes any (non-obsolete) key-value pair or tombstone with key K to the same child node of that node.
- spill compaction uses a radix-based key distribution method such as the one in the example presented in detail below.
- the parent node includes two kvsets that comprise the merge set.
- Key-value pairs 1505 , 1510 , and 1515 in the two kvsets respectively have spill values of 00X, 01X, and 11X, which respectively correspond to three of the parent node's four child nodes.
- key-value pair 1505 is placed into the new kvset X
- key-value pair 1510 is placed into the new kvset Y
- key-value pair 1515 is placed into the new kvset Z, with each new kvset being written to the child corresponding to the spill value.
- the new kvsets are written to the newest (e.g., left-most) position in the respective child nodes.
- the merge set for a spill compaction must include the oldest kvset in the node containing the merge set. In an example, if the node containing the merge set has no child nodes at the start of a spill compaction, the configured number of child nodes is created.
- a subset of the sequence of kvsets is selected.
- the subset includes contiguous kvsets that also includes an oldest kvset.
- a child-mapping for each key in each kvset of the subset of kvsets is calculated.
- the child mapping is a determinative map from a parent node to a child node based on a particular key and a tree-level of the parent node.
- keys and corresponding values are collected into kvsets based on the child-mapping, with each kvset mapped to exactly one child node. Key collisions may occur during this collection. As discussed above with respect to FIGS. 10 and 12, such a collision is resolved in favor of the newer key entry.
- the kvsets are written to a newest position in respective sequences of kvsets in respective child nodes.
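The spill steps above can be sketched as follows; the SHA-256-based per-level routing is an assumption standing in for any determinative spill function, and the dict representation of kvsets is illustrative:

```python
# Sketch of spill compaction: merge the merge set (newest entry wins), then
# route each surviving pair to one new kvset per child via a determinative
# per-level mapping, so a given key always lands in the same child.

import hashlib

def spill_child(key, level, fanout=4):
    digest = hashlib.sha256(key.encode()).digest()
    return digest[level] % fanout   # stable child index for (key, level)

def spill_compact(merge_set, level, fanout=4):
    merged = {}
    for kvset in merge_set:         # newest first; newest entry wins
        for key, value in kvset.items():
            merged.setdefault(key, value)
    children = {c: {} for c in range(fanout)}
    for key, value in merged.items():
        children[spill_child(key, level, fanout)][key] = value
    return children  # one new kvset per child (empty ones elided in practice)

children = spill_compact([{"A": 1, "B": 2}, {"A": 9}], level=0)
# "A" keeps value 1 (the newest) and always routes to the same child
```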
- the method 1600 may be extended to include performing a second spill operation on a child node in response to a metric of the child node exceeding a threshold after operation of the spill operation.
- FIG. 17 is a block diagram illustrating hoist compaction, according to an embodiment.
- Hoist compaction differs from spill compaction in that the new kvset is written to a parent node.
- hoist compaction, or just hoist reads the key-value pairs and tombstones from the merge set, removes all obsolete key-value pairs or tombstones, writes the resulting key-value pairs and tombstones to new kvsets in the parent node of the node containing the merge set, and deletes the kvsets comprising the merge set.
- These new kvsets atomically replace, and are logically equivalent to, the merge set.
- a hoist compaction includes the newest kvset in the node containing the merge set, and the kvsets resulting from the hoist compaction are placed in the oldest position in the sequence of kvsets in the parent node of the node. Unlike the other compactions discussed above, in order to ensure that the newest kvset from the node being compacted is in the merge set, new kvsets cannot be added to the node containing the merge set while the hoist compaction is executing. Thus, the hoist compaction is a blocking compaction.
- the key-value pairs of KVS 1705 and 1710 are merged into the new KVS M 1715 and stored in the oldest position in the parent node's sequence of kvsets.
- a hoist compaction may be applied to a merge set when, for example, the goal is to reduce the number of levels in a KVS tree and thereby increase the efficiency of searching for keys in the KVS tree.
- FIG. 18 illustrates an example of a method 1800 for hoist compaction, according to an embodiment.
- the operations of the method 1800 are implemented with electronic hardware, such as that described throughout this application, including below with respect to FIG. 26 (e.g., circuits).
- a key and value compaction is performed on the child node to produce a new kvset without writing the new kvset to the child node.
- the new kvset is written to the node in an oldest position for a sequence of kvsets of the node.
- Key-value compaction, spill compaction, and hoist compaction operations may physically remove obsolete key-value pairs and tombstones from a merge set and may thereby reduce the amount (for example, in bytes) of key-value data stored in a KVS tree. In doing so, these compaction operations read non-obsolete values from value-blocks, for example, in the merge set and write these values to value-blocks in the kvsets resulting from the compaction operation.
- a key compaction operation may physically remove keys (and tombstones) but only logically removes values from a merge set.
- the values physically remain in the kvsets resulting from the key compaction.
- Key compaction may increase the efficiency of searching for keys in the node containing the merge set by reducing the number of kvsets in that node while avoiding the additional reading and writing of value-blocks incurred by, for example, a key-value compaction operation.
- the key compaction provides useful information for future maintenance operations. Key compaction is uniquely supported by KVS trees due to the separation of keys and values in key-blocks and value-blocks as described above.
- the KVS tree maintenance techniques operate when a trigger condition is met. Controlling when and where (e.g., which nodes) maintenance occurs may provide optimizations to processing, or time, spent versus increased space or searching efficiency.
- Some metrics gathered during maintenance, or during ingestion, may enhance the system's ability to optimize later maintenance operations. Here, these metrics are referred to either as a garbage metric or an estimated garbage metric based on how the metric was computed. Examples of such garbage metrics include the number of obsolete key-value pairs and tombstones in a node or the amount of storage capacity they consume, and the amount of storage capacity consumed by unreferenced data in value-blocks in a node. Such garbage metrics indicate how much garbage may be eliminated by performing, for example, a key-value compaction, spill compaction, or hoist compaction on the kvsets of a node.
- some kvset statistics may be gathered or maintained.
- these statistics are maintained within the kvset set itself, such as in a primary key-block header for the kvset.
- Computed garbage metrics involve the computation of known quantities to produce a known result. For example, if it is known that there are n-bits that are obsolete in a kvset, key-value compacting the kvset will result in freeing those n-bits.
- a source of metrics for computed garbage metrics are key compactions. Key compactions logically remove obsolete key-value pairs and tombstones, and physically remove redundant keys, from a merge set. However, unreferenced data may remain in the value-blocks of the kvsets resulting from key compactions. Thus, key compaction results in knowing which values are unreferenced in the new kvset and their size. Knowing the size of those values permits an accurate count of storage that will be freed under other compactions. Thus, when executing a key compaction on a merge set in a KVS tree, garbage metrics for each of the resulting kvsets may be recorded in the respective kvsets.
- Example garbage metrics that may be maintained from a key compaction include:
- garbage metrics recorded from the first key compaction may be added to like garbage metrics recorded from the second key compaction. For example, if the first key compaction operation resulted in a single kvset S with associated key compaction garbage metrics specifying Ucnt count of unreferenced values, then Ucnt may be included in the count of unreferenced values in the key compaction garbage metrics resulting from the second key compaction operation.
- the key compaction garbage metrics recorded may include:
- Estimated garbage metrics provide a value that estimates the gain from performing a compaction on a node. Generally, estimated garbage metrics are gathered without performing a key compaction. The following terms are used in the discussion below. Let:
- a form of estimated garbage metrics are historical garbage metrics.
- Historical garbage collection information may be used to estimate garbage metrics for a given node in a KVS tree. Examples of such historical garbage collection information include, but are not limited to:
- a Node Simple Moving Average may be performed to create the historical garbage metrics.
- NSMA(E): the mean of the fractions of obsolete key-value pairs in the most recent E executions of garbage collection operations in the given node, where E is configurable.
- the NodeSMA estimated garbage metrics for the given node may include the following:
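A NodeSMA sketch, assuming the obsolete-pair fraction observed by each garbage collection run is recorded as it completes (the class and method names are illustrative; the window size corresponds to the configurable E):

```python
# Node Simple Moving Average: average the obsolete-pair fractions from the
# most recent E garbage-collection executions on a node; older samples
# fall off the window automatically.

from collections import deque

class NodeSMA:
    def __init__(self, window=4):            # window = E, configurable
        self.samples = deque(maxlen=window)

    def record(self, obsolete_fraction):
        self.samples.append(obsolete_fraction)

    def estimate(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

sma = NodeSMA(window=3)
for frac in (0.10, 0.20, 0.30, 0.40):  # the fourth sample evicts the first
    sma.record(frac)
print(round(sma.estimate(), 2))  # → 0.3 (mean of the last three samples)
```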
- a LevelSMA (Level Simple Moving Average) may be performed to create the historical garbage metrics: LevelSMA(E) is the mean of the fractions of obsolete key-value pairs in the most recent E executions of garbage collection operations in any node at the same level of the KVS tree as the given node, where E is configurable.
- the LevelSMA estimated garbage metrics for the given node may include:
- a given kvset includes a bloom filter to efficiently determine if the kvset might contain a given key, where there is one entry in the bloom filter for the kvset for each key in the kvset.
- These bloom filters may be used to estimate garbage metrics for a given node in a KVS tree.
- BloomDelta garbage metrics: given a node in a KVS tree, the bloom-estimated cardinality of the node and kvset statistics permit estimated garbage metrics for the node to be generated in several ways.
- the BloomDelta garbage metrics for the given node may include:
- Probabilistic filters other than bloom filters, for which it is possible to approximate the cardinality of the intersection of sets of keys represented by two or more such filters, may be used as a substitute for bloom filters in the estimated garbage metrics.
- Computed and estimated garbage metrics may be combined to produce hybrid garbage metrics, themselves a form of estimated garbage metrics due to the inclusion of an estimated component. For example, given a node comprising T kvsets, if key compaction garbage metrics are available for W of these kvsets and W < T, then hybrid garbage metrics for the node may be generated as follows. For the W kvsets in the node for which key compaction garbage metrics are available, let:
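One hedged way such a combination might look, assuming computed metrics are used for the W kvsets that have them and a historical level estimate covers the remaining T − W kvsets (the field names and fallback choice are illustrative, not the patent's specific formula):

```python
# Hybrid garbage metric sketch: sum exact obsolete counts where key
# compaction recorded them; otherwise estimate from a historical fraction.

def hybrid_obsolete_count(kvsets, level_obsolete_fraction):
    total = 0
    for kv in kvsets:  # each kvset: {"pairs": N, "computed_obsolete": M or None}
        if kv["computed_obsolete"] is not None:   # computed garbage metrics
            total += kv["computed_obsolete"]
        else:                                     # fall back to the estimate
            total += kv["pairs"] * level_obsolete_fraction
    return total

kvsets = [
    {"pairs": 100, "computed_obsolete": 25},    # metrics from key compaction
    {"pairs": 200, "computed_obsolete": None},  # no computed metrics
]
print(hybrid_obsolete_count(kvsets, level_obsolete_fraction=0.1))  # → 45.0
```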
- Garbage metrics allow the prioritization of garbage collection operations to the tree-levels or nodes with a sufficient amount of garbage to justify the overhead of a garbage collection operation. Prioritizing garbage collection operations in this manner increases their efficiency and reduces associated write-amplification. In addition, estimating the number of valid key-value pairs and number of obsolete key-value pairs in the tree, and the amount of storage capacity consumed by each category, is useful in reporting capacity utilization for the tree.
- FIG. 19 illustrates an example of a method 1900 for performing maintenance on a KVS tree, according to an embodiment.
- the operations of the method 1900 are implemented with electronic hardware, such as that described throughout this application, including below with respect to FIG. 26 (e.g., circuits).
- a kvset is created for a node in a KVS tree.
- a set of kvset metrics is computed for the kvset.
- the set of kvset metrics include a number of key-value pairs in the kvset.
- the set of kvset metrics include a number of tombstones in the kvset.
- the set of kvset metrics include a storage capacity to store all key entries for key-value pairs and tombstones in the kvset.
- the set of kvset metrics include a storage capacity for all values of key-value pairs in the kvset.
- the set of kvset metrics include key size statistics for keys in the kvset.
- the key size statistics include at least one of maximum, minimum, median, or mean.
- the set of kvset metrics include value size statistics for keys in the kvset.
- the value size statistics include at least one of maximum, minimum, median, or mean.
- the set of kvset metrics include a minimum or a maximum time-to-live (TTL) value for a key-value pair in the kvset.
- TTL may be useful when an ingest operation specifies a period for which a key-value pair will be valid. Thus, after the key-value pair's expiration, it is a prime target for reclamation via a compaction operation.
- the kvset is created in response to a compaction operation.
- the compaction operation is at least one of a key compaction, a key-value compaction, a spill compaction, or a hoist compaction.
- the compaction operation is a key compaction.
- the set of kvset metrics may include metrics of unreferenced values in the kvset as a result of the key compaction.
- the unreferenced value metrics include at least one of a count of unreferenced values or a storage capacity consumed by unreferenced values.
- the storage capacity consumed is measured in bits, bytes, blocks, or the like used by an underlying storage device to hold key entries or values as the case may be.
- the set of kvset metrics may include an estimate of obsolete key-value pairs in the kvset.
- the estimate is such because the compaction only gains insight into obsolete (e.g., superseded) key-value pairs in the merge set subject to the compaction and thus does not know whether a seemingly current key-value pair is made obsolete by an entry in a newer kvset that is not part of the compaction.
- the estimate of obsolete key-value pairs may be calculated by summing a number of key entries from pre-compaction kvsets that were not included in the kvset.
- a number of obsolete pairs with respect to the merge set will be known and may be used as an estimate of obsolete data in the created kvset.
- an estimate of valid key-value pairs in the kvset may be calculated by summing a number of key entries from pre-compaction kvsets that were included in the kvset and be a part of the set of kvset metrics.
- the set of kvset metrics include an estimated storage size of obsolete key-value pairs in the kvset.
- the set of kvset metrics may include an estimated storage size of valid key-value pairs in the kvset, calculated by summing storage sizes of key entries and corresponding values from pre-compaction kvsets that were included in the kvset.
- These estimates may be used for historical metrics as, unless a key-compaction is performed, the estimated obsolete values will be removed in the compaction. However, if a node has a regular (e.g., historical) performance in a compaction, one may assume that this performance continues in the future.
- the set of kvset metrics are stored in the kvset (e.g., in a primary key block header). In an example, the set of kvset metrics are stored in the node and not in the kvset. In an example, a subset of the kvset metrics are stored in the kvset and a second subset of the kvset metrics are stored in the node.
- the kvset is added to the node. Generally, once added to the node, the kvset is also written (e.g., to on-disk storage).
- the node is selected for a compaction operation based on a metric in the set of kvset metrics.
- the kvset metrics or the node metrics discussed below, or both, may contribute to a decision by a garbage collector or similar tree maintenance process.
- selecting the node for the compaction operation includes collecting sets of kvset metrics for a multiple of nodes, sorting the multiple of nodes based on the sets of kvset metrics, and selecting a subset of the multiple of nodes based on a sort order from the sorting.
- operation 1920 may be implemented such that performing the compaction operation on the node includes performing the compaction operation on each node in the subset of the multiple of nodes (including the node).
- a cardinality of the subset of the multiple of nodes is set by a performance value.
- the performance value is an efficiency of performing the compaction as measured by space recovered. This may often be implemented as a threshold.
- a threshold function may be used that accepts a number of parameters, such as the amount of unused storage capacity left on the underlying storage device and an estimate of capacity to be reclaimed in the compaction operation to arrive at a decision as to whether or not to perform a given compaction operation.
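A threshold function of this kind might be sketched as follows; the parameter names and cutoff values are illustrative assumptions, not the patent's specific policy:

```python
# Compaction-decision sketch: compact when storage is under pressure, or
# when the estimated reclaimable capacity justifies the compaction overhead.

def should_compact(free_capacity, estimated_reclaim,
                   low_space=0.10, min_gain=0.05):
    """free_capacity and estimated_reclaim as fractions of device capacity."""
    if free_capacity < low_space:
        return True                       # space pressure: compact regardless
    return estimated_reclaim >= min_gain  # otherwise require a worthwhile gain

print(should_compact(free_capacity=0.05, estimated_reclaim=0.01))  # → True
print(should_compact(free_capacity=0.50, estimated_reclaim=0.01))  # → False
```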
- the compaction operation is performed on the node.
- a type of compaction operation (e.g., key compaction, key-value compaction, spill compaction, or hoist compaction) may be selected based on a metric in the set of kvset metrics.
- the operations of the method 1900 may be extended to include modifying node metrics in response to adding the kvset to the node.
- the node metrics include a value of a fraction of estimated obsolete key-value pairs in kvsets subject to prior compactions performed on a node group including the node.
- the value is a simple average.
- the value is a moving average.
- the value is a weighted average.
- the value is a mean of the fraction of estimated obsolete key-value pairs in kvsets subject to a set number of most recent prior compactions for the node.
- the value is a mean of the fraction of estimated obsolete key-value pairs in kvsets subject to a set number of most recent prior compactions for all nodes at a tree-level of the node.
- the node group includes only the node.
- the node group includes all nodes on a tree-level of the node.
- the node metrics include a summation of like metrics in the set of kvset metrics resulting from a compaction operation and previous kvset metrics from compaction operations performed on the node.
- the node metrics include an estimated number of keys that are the same in the kvset and a different kvset of the node.
- the estimated number of keys are calculated by obtaining a first key bloom filter from the kvset, obtaining a second key bloom filter from the different kvset, and intersecting the first key bloom filter and the second key bloom filter to produce a node bloom filter estimated cardinality (NBEC).
- the node metrics include subtracting the NBEC from a NKVcnt value to estimate a number of obsolete key-value pairs in the node.
- the NKVcnt value is a total count of key value pairs in each kvset of the node for which a bloom filter was intersected to produce the NBEC.
- the node metrics include multiplying a NKVcap value by a Fobs value.
- the NKVcap value is a total storage capacity used by keys and values in each kvset in the node for which a bloom filter was intersected to produce the NBEC.
- the Fobs value is the result of subtracting the NBEC from an NKVcnt value and dividing by NKVcnt, where the NKVcnt value is a total count of key value pairs in each kvset of the node for which a bloom filter was intersected to produce the NBEC.
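The NBEC, NKVcnt, NKVcap, and Fobs arithmetic above can be sketched as follows. This is a hedged illustration: the bloom filters are modeled as plain bit masks, and the intersection cardinality is estimated with the standard fill-based formula n ≈ -(m/k)·ln(1 - X/m), which is an assumed estimation method rather than one specified by the description:

```python
import math

def bloom_cardinality(bits, m, k):
    """Estimate distinct keys in a bloom filter from its fill:
    n ~= -(m / k) * ln(1 - X / m), where X is the number of set bits,
    m the filter size in bits, and k the number of hash functions."""
    x = bin(bits).count("1")
    if x >= m:
        return float("inf")          # saturated filter: no estimate
    return -(m / k) * math.log(1.0 - x / m)

def node_garbage_estimate(bloom_a, bloom_b, m, k, nkv_cnt, nkv_cap):
    """Sketch of the node metrics above (representation assumed):
    NBEC = estimated keys common to the kvsets (intersected blooms);
    Fobs = (NKVcnt - NBEC) / NKVcnt.
    Returns (estimated obsolete pairs, estimated reclaimable capacity)."""
    nbec = bloom_cardinality(bloom_a & bloom_b, m, k)   # NBEC
    obsolete = nkv_cnt - nbec                           # NKVcnt - NBEC
    fobs = obsolete / nkv_cnt                           # Fobs
    return obsolete, nkv_cap * fobs                     # NKVcap * Fobs
```

Note that a bitwise AND of two bloom filters tends to overestimate the true intersection, which is consistent with these metrics being estimates.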
- the node metrics are stored in the node.
- the node metrics are stored along with node metrics from other nodes.
- the node metrics are stored in a tree-level, the tree-level being common to all nodes in a level of the KVS tree.
- the garbage collection metrics described above, and their use to improve KVS tree performance, may be aided in a number of ways by modifying the vanilla operation of the KVS tree or elements therein (e.g., tombstones) under certain circumstances. Examples may include tombstone acceleration, update tombstones, prefix tombstones, or immutable data KVS trees.
- a tombstone represents a deleted key-value in a KVS tree.
- if the compaction includes the oldest kvset in the leaf node, the tombstone is actually removed; otherwise it remains, to prevent a possibly obsolete value for the key from being returned in a search.
- tombstone acceleration includes writing non-obsolete tombstones to one or more new kvsets in some or all of these child nodes following the key distribution method used for spill compaction in the KVS tree.
- the merge set for a key compaction or key-value compaction operation includes the oldest kvset in the node containing the merge set, then accelerated tombstones (if any) need not be included in the new kvsets created by the compaction operation in that node. Otherwise, if the merge set for a key compaction or key-value compaction operation does not include the oldest kvset in the node containing the merge set, then accelerated tombstones (if any) are also included in the new kvsets created by the compaction operation in that node.
- the distribution of the accelerated tombstones into older areas of the KVS tree facilitates garbage collection by allowing the removal of key-value pairs in child nodes without waiting for the original tombstones to be pushed to the child nodes.
- a key compaction or key-value compaction operation may apply specified or computed criteria to determine whether or not to also perform tombstone acceleration.
- tombstone acceleration criteria include, but are not limited to, the number of non-obsolete tombstones in a merge set and the amount (for example, in bytes) of key-value data logically deleted by the tombstones in a merge set, which may be known or estimated.
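A minimal sketch of applying such criteria follows. The thresholds are invented, and the merge set is assumed to be represented as (key, is_tombstone, deleted_bytes) tuples purely for illustration:

```python
def accelerate_tombstones(merge_set, min_count=128, min_bytes=1 << 20):
    """Decide whether a key or key-value compaction should also perform
    tombstone acceleration.  Hypothetical criteria: enough tombstones in
    the merge set, or enough logically deleted data, to justify the
    extra kvsets that acceleration creates.

    merge_set: iterable of (key, is_tombstone, deleted_bytes) tuples.
    """
    count = sum(1 for _, is_ts, _ in merge_set if is_ts)
    size = sum(b for _, is_ts, b in merge_set if is_ts)
    return count >= min_count or size >= min_bytes
```

The `deleted_bytes` field corresponds to the tombstone delete size discussed below, which improves the accuracy of the size criterion.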
- Update tombstones operate similarly to accelerated tombstones though the original ingest value is not a tombstone. Essentially, when a new value is added to the KVS tree, all older values for that key may be garbage collected. Pushing a tombstone, akin to an accelerated tombstone, down the tree will allow compactions on these child nodes to remove the obsolete values.
- an ingest operation adds a new kvset to the root node and a key-value pair with key K in this new kvset includes a flag or other indicator that it is an update key-value pair that is replacing a key-value pair with key K that was included in an earlier ingest operation. It is an expectation, but not a requirement, that this indicator is accurate. If an update key-value pair with key K is included with an ingest operation, and if the root node has child nodes, then the ingest operation may also write a key tombstone for key K, the update tombstone, to a new kvset in a child node of the root node following the key distribution method used for spill compaction in the KVS tree.
- a key compaction or key-value compaction operation on a merge set in the root node may, in response to processing an update key-value pair with key K, also write a key tombstone for key K, again referred to as an update tombstone, to a new kvset in a child node of the root node following the key distribution method used for spill compaction in the KVS tree.
- while KVS tree prefix operations are discussed below with respect to FIG. 25 , the concept may be used in tombstones as well.
- in prefix operations, a portion of the key, the prefix, is used for matches.
- the prefix portion of the key is used in its entirety to create the spill value, although a smaller portion may be used with deeper tree determinations fanning out to all children after the prefix path is consumed.
- Prefix tombstones exploit the ability of a prefix to match multiple keys, allowing a single entry to represent the deletion of many key-value pairs.
- spill compaction uses a key distribution method based on a spill value of the first sub key of the keys, the first sub key being the key prefix.
- the prefix tombstone is a logical record comprising the key prefix and indicates that all keys starting with the prefix and their associated values, if any, have been logically deleted from the KVS tree at a particular point in time.
- a prefix tombstone serves the same purpose in a KVS tree as a key tombstone, except that a prefix tombstone may logically delete more than one valid key-value pair whereas a key tombstone may logically delete exactly one valid key-value pair.
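The one-to-many deletion performed by a prefix tombstone can be illustrated with simple string prefixes. The dict-of-entries representation is an assumption made for the example, not the kvset layout described elsewhere:

```python
def prefix_tombstone_matches(tombstone_prefix, key):
    """A prefix tombstone logically deletes every key sharing its prefix."""
    return key.startswith(tombstone_prefix)

def apply_tombstones(entries, prefix_tombstones):
    """Return the entries that survive a set of prefix tombstones.

    entries: dict mapping key -> value (a stand-in for kvset contents).
    prefix_tombstones: iterable of key prefixes that have been deleted.
    """
    return {k: v for k, v in entries.items()
            if not any(k.startswith(p) for p in prefix_tombstones)}
```

A single `"user."` prefix tombstone here removes every `user.*` pair, where a key tombstone would remove exactly one.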
- tombstone acceleration may be applied to prefix tombstones as well as key tombstones.
- Prefix tombstones may be treated differently than key tombstones in applying tombstone acceleration criteria because prefix tombstones may result in the physical removal of a large number of obsolete key-value pairs or tombstones in subsequent garbage collection operations.
- tombstone acceleration techniques discussed above result in a greater number of kvsets being created and thus may be inefficient.
- a tombstone may include a size of the data it is replacing from the application. This information may be used by the system to determine whether or not to perform the tombstone acceleration (or generate update tombstones) discussed above.
- Some data may be immutable.
- immutable key-value data include time series data, log data, sensor data, machine-generated data, and the output of database extract, transform, and load (ETL) processes, among others.
- a KVS tree may be configured to store immutable key-value data. In such a configuration the expectation, but not requirement, is that kvsets added to the KVS tree by an ingest operation do not contain tombstones.
- a KVS tree may be configured to store an amount of immutable data that is only restricted by the capacity of the storage media containing the KVS tree.
- the only garbage collection operation executed is key compaction.
- key compaction is performed to increase the efficiency of searching for keys in the KVS tree by reducing the number of kvsets in the root node.
- the root node will be the only node in the KVS tree.
- the compaction criteria may include the number of kvsets in the root node, or key search time statistics, such as the minimum, maximum, average and mean time to search.
- the merge set for a key compaction may include some or all of the kvsets in the root node.
- the KVS tree may be configured to store an amount of immutable data that is restricted by a retention criterion that may be enforced by removing key-value pairs from the KVS tree in a first-in first-out (FIFO) manner.
- retention criteria include: the maximum count of key-value pairs in the KVS tree; the maximum bytes of key-value data in the KVS tree; or the maximum age of a key-value pair in the KVS tree.
- the only garbage collection operation executed is key compaction.
- the key compaction is performed both to increase the efficiency of searching for keys in the KVS tree by reducing the number of kvsets in the root node and to facilitate removing key-value pairs from the KVS tree in a FIFO manner to enforce the retention criterion.
- the compaction criteria may specify that a key compaction is executed whenever two or more consecutive kvsets in the root node, comprising the merge set for the key compaction, meet a configured fraction of the retention criterion, referred to as the retention increment. The following are some examples of retention requirements:
- the retention criterion is W key-value pairs in a KVS tree
- the configured retention increment is 0.10*W key-value pairs
- the kvsets in the root node of the KVS tree will each have approximately 0.10*W key-value pairs, with the possible exception of the newest kvsets, which combined may have fewer than 0.10*W key-value pairs.
- when the KVS tree exceeds W key-value pairs by at least 0.10*W key-value pairs, the oldest kvset in the KVS tree may be deleted.
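The FIFO deletion rule in this example (retention criterion W, retention increment 0.10*W) can be sketched as follows, assuming each kvset's key-value pair count is tracked in a list ordered oldest-first:

```python
def enforce_fifo_retention(kvset_counts, w, increment_fraction=0.10):
    """Enforce a count-based retention criterion in FIFO order.

    kvset_counts: per-kvset key-value pair counts, oldest first (an
    assumed representation).  While the tree exceeds the retention
    criterion W by at least one retention increment, the oldest kvset
    is deleted whole.  Returns the surviving counts.
    """
    increment = increment_fraction * w
    while sum(kvset_counts) >= w + increment:
        kvset_counts.pop(0)          # drop the oldest kvset entirely
    return kvset_counts
```

Because key compaction has already grouped the root's kvsets into retention increments, dropping whole kvsets removes key-value pairs in approximately increment-sized FIFO steps.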
- tombstone acceleration, update acceleration, or prefix tombstones may be applied to key-value stores other than KVS trees.
- tombstone acceleration or update tombstones may be applied in an LSM Tree variant with one or more garbage collection operations that write key-value data to the same tree-level from which it is read and operate similarly to key compaction or key-value compaction in a KVS tree.
- Update tombstones may also be applied to an LSM Tree variant for which it is permitted to ingest tombstones into the child nodes of the root node.
- prefix tombstones may be used in an LSM Tree variant that either has only one node per level (which is common), or that implements a key distribution method for selecting child nodes based on a portion of a key, such as a sub key.
- tombstone delete size may be applied in an LSM Tree variant using tombstone acceleration.
- the techniques for optimizing garbage collection for immutable key-value data may be applied to an LSM Tree variant with a garbage collection operation that does not read or write values in key-value data, similar to key compaction in a KVS tree.
- these garbage collection facilitators improve the efficiency of garbage collection in a KVS tree or similar data structures. For example, tombstone acceleration results in tombstones being written to lower levels of the tree sooner than would otherwise occur when applying key compaction, key-value compaction, or a similar operation, thereby making it possible to eliminate garbage more quickly at all levels of the tree.
- Tombstone acceleration, used in conjunction with key compaction or a similar operation, achieves these results with far less write-amplification than would result from spill compaction.
- prefix tombstones allow a single tombstone record to delete large numbers of related key-value pairs
- update tombstones bring the benefits of tombstone acceleration to update key-value pairs
- tombstone delete size improves accuracy when evaluating tombstone acceleration criteria
- techniques for optimizing garbage collection for immutable key-value data result in a write-amplification of one (1) for the values in key-value data.
- FIG. 20 illustrates an example of a method 2000 for modifying KVS tree operation, according to an embodiment.
- the operations of the method 2000 are implemented with electronic hardware, such as that described throughout this application, including below with respect to FIG. 26 (e.g., circuits).
- the method 2000 covers operations to implement a number of the features discussed above regarding tombstone acceleration, update acceleration (e.g., update tombstones), prefix tombstones, and immutable key-value data in KVS trees.
- a request for a KVS tree is received.
- the request includes a key prefix and a tombstone
- the parameter set has a member in the request that defines the tombstone as a prefix-tombstone
- executing the request on the KVS tree includes writing the prefix-tombstone to a kvset of the KVS tree.
- a prefix-tombstone matches any key with the same prefix as the key prefix of the prefix-tombstone on a KVS tree operation comparing keys.
- the request includes a key
- the parameter set includes a member that specifies tombstone acceleration
- executing the request on the KVS tree includes writing a tombstone in at least one child node specified by performing a spill function on the key.
- the spill function is a function that takes a key (or part of a key) as input and produces a spill value, as mentioned above with respect to FIG. 13 .
- the tombstone is written to all extant child nodes specified by performing the spill function on the key.
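A sketch of spill-directed tombstone writing follows. The spill function here takes one byte of a key hash per level as the child index; the fanout and the use of SHA-256 are illustrative assumptions, not the distribution method specified by the description:

```python
import hashlib

def spill_child(key, level, fanout=4):
    """Assumed spill function: a portion of a hash of the key selects
    the child index at each tree level, deterministically per key."""
    digest = hashlib.sha256(key.encode()).digest()
    return digest[level] % fanout

def accelerate(key, level, children):
    """Write an accelerated tombstone for `key` into the child chosen
    by the spill function, mirroring spill compaction's distribution.

    children: dict mapping child index -> list of entries, a stand-in
    for the new kvsets created in the child nodes.
    """
    idx = spill_child(key, level)
    children.setdefault(idx, []).append((key, "TOMBSTONE"))
    return idx
```

Because the spill function is deterministic, the accelerated tombstone lands on exactly the path that the obsolete key-value pairs for that key will follow, enabling their removal during later compactions in those children.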
- the request includes a tombstone.
- the request includes a value.
- a parameter set for the KVS tree is received.
- the request is executed on the KVS tree by modifying operation of the KVS tree in accordance with the parameter set.
- the request includes a key, a tombstone, and a storage size of a value in the KVS tree corresponding to the key.
- the parameter set has a member that specifies garbage collection statistics storage and executing the request on the KVS tree includes storing the key and the storage size in a data structure for the KVS tree.
- the tombstone is a prefix-tombstone.
- the KVS tree uses key compaction exclusively when the KVS tree is immutable.
- the method 2000 may be extended to store key search statistics in response to the KVS tree being immutable.
- the key search statistics are at least one of a minimum, maximum, average, or mean time to search.
- the key search statistics are a number of kvsets in the root node.
- the method 2000 may be extended to perform key compaction in response to the key search statistics meeting a threshold.
- the key compaction may include resetting the key search statistics in response to at least one of a compaction, an ingest, after a specified number of searches, or after a specified time interval.
- a third member of the parameter set specifies a retention constraint of the KVS tree, the KVS tree performs key compactions on kvsets based on the retention constraint, and the KVS tree removes an oldest kvset when the retention constraint is violated.
- the retention constraint is a maximum number of key-value pairs.
- the retention constraint is a maximum age of a key-value pair.
- the retention constraint is a maximum storage value consumed by key-value pairs.
- performing key compactions on kvsets based on the retention constraint includes grouping contiguous kvsets to produce a set of groups—a summed metric from each member in the set of groups approximating a fraction of the retention constraint—and performing key compaction on each member of the set of groups.
- FIG. 21 is a block diagram illustrating a key search, according to an embodiment.
- the search progresses by starting at the newest kvset in the root node and progressively moving to older kvsets until the key is found or the oldest kvset in the leaf node does not have the key. Due to the determinative nature of parent-to-child key mappings, there will be only one leaf searched, and the oldest kvset in that leaf will have the oldest key entries. Thus, if the illustrated search path is followed and the key is not found, then the key is not in the KVS tree.
- the search stops as soon as the newest key entry for the key is found.
- the search path moves from newest to oldest and stops as soon as a key entry for the key is located.
- This behavior allows the immutability of the kvsets to remain by not requiring an obsolete key-value pair to be immediately removed from the KVS tree. Instead, the newer value, or a tombstone to indicate deletion, is placed in a newer kvset and will be found first, resulting in an accurate response to the query without regard to the older key-pair version still resident in the KVS tree.
- the search for key K may be performed by setting a current node to the root node. If either a key-value pair or a tombstone with key K is found in the current node then the search is complete and either the associated value or an indication of “key not found”, respectively, is returned as the result. If the key K is not found, the current node is set to the child of the node as determined by the key K and the key distribution method used for spill compaction.
- if no such child node exists, the search is complete and an indication of "key not found" is the result. Otherwise, the search for the key K in the current node's kvsets is performed and the process repeats.
- a search for a key K in a KVS tree follows the same path through the KVS tree that every key-value pair or tombstone with key K takes as the result of spill compaction.
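The search procedure above can be sketched as follows. Nodes are modeled as dicts holding a newest-first list of kvsets and a child map keyed by spill value; these representation details, and the `"TOMBSTONE"` sentinel, are assumptions made for the example:

```python
def search(node, key, spill_child):
    """Newest-to-oldest search down the single spill-determined path.

    node: {"kvsets": [newest_dict, ...], "children": {index: node}}.
    spill_child: function mapping a key to a child index.
    Returns the value, or None for "key not found" or a tombstone.
    """
    while node is not None:
        for kvset in node["kvsets"]:              # newest kvset first
            if key in kvset:
                value = kvset[key]                # newest entry wins
                return None if value == "TOMBSTONE" else value
        node = node["children"].get(spill_child(key))
    return None                                   # key not in the tree
```

Stopping at the first (newest) entry found is what lets older, obsolete versions of a key remain in place without affecting query results.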
- FIG. 22 illustrates an example of a method 2200 for performing a key search, according to an embodiment.
- the operations of the method 2200 are implemented with electronic hardware, such as that described throughout this application, including below with respect to FIG. 26 (e.g., circuits).
- a search request including a key is received.
- the root node is selected as the current node.
- the inspection starts with a query to the newest kvset of the current node.
- the method 2200 proceeds to result 2260 and otherwise proceeds to result 2235 .
- the method 2200 proceeds to operation 2245 and otherwise proceeds to decision 2250 .
- the method 2200 proceeds to the result 2260 and otherwise proceeds to the operation 2255 otherwise.
- a typical scan operation may include a search for a range of keys, in which the search specifies multiple keys to bound the range.
- the scan specifies a criterion and expects a result of all keys in the kvs tree that meet the criterion.
- FIG. 23 is a block diagram illustrating a key scan, according to an embodiment.
- the key scan, or pure scan, identifies every kvset in every node of the KVS tree containing a key entry that meets the scan criterion (e.g., falls within a specified range). While the key store of a kvset permits an efficient search for a particular key, ensuring that every key meeting the scan criterion is found entails searching every kvset. However, due to the key-sorted nature of key-value storage in kvsets, the scan may quickly determine, without looking at every key, whether a given kvset holds any keys that meet the criterion.
- the keys are stored in kvsets in key-sorted order.
- a given key may be located in log time and keys within the range (e.g., a highest and lowest key in the range) may also be determined quickly.
- the example kvset meta data discussed above with respect to FIGS. 1-5 may be used to speed scanning even further. For example, if the kvset maintains a minimum and maximum key value contained within the kvset, the scan may quickly determine that no keys in the kvset meet a specified range. Similarly, maintaining a bloom filter of kvset keys may be used to quickly determine that certain keys are not in a given kvset's key store.
- the search-like scan described directly above may be improved by recognizing that a scan visits every kvset in every node.
- the kvsets may be read simultaneously.
- the simultaneous reading of all kvsets may result in a very large buffer (e.g., storage location for returned results).
- This may be mitigated by the ability to quickly determine whether a given kvset has keys that meet the scan criterion (e.g., fall within a range).
- every kvset may be visited, but only those kvsets with keys that meet the criterion are read. This example is illustrated in FIG. 23 .
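This visit-all-but-read-few behavior can be sketched with kvsets carrying the minimum/maximum key metadata described above with respect to FIGS. 1-5. The (min_key, max_key, sorted_keys) tuple representation is an assumption made for the example:

```python
def range_scan(nodes, lo, hi):
    """Range scan that visits every kvset but reads only those whose
    [min_key, max_key] metadata overlaps the requested range.

    nodes: iterable of nodes; each node is a list of kvsets, and each
    kvset is a (min_key, max_key, sorted_keys) tuple (assumed layout).
    Returns the matching keys in sorted order.
    """
    out = []
    for node in nodes:
        for kmin, kmax, keys in node:
            if kmax < lo or kmin > hi:
                continue                     # whole kvset out of range
            out.extend(k for k in keys if lo <= k <= hi)
    return sorted(out)
```

A kvset's bloom filter could be consulted in the same skip test when the criterion names specific keys rather than a range.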
- the spill value is based on the first subkey of the keys.
- a specified prefix includes a value for the first subkey of the keys.
- the prefix scan may proceed by identifying every kvset in every node of the KVS tree containing a key-value pair or tombstone with a key starting with the specified prefix.
- the prefix scan does not visit every node of the KVS tree. Rather, the inspected nodes may be confined to those along the path determined by the spill value of the first subkey value which defines the prefix.
- a last subkey may be used for the spill value to effect a suffix scan.
- a specified suffix includes a value for the last subkey of the keys. Additional varieties of scan may be implemented based on the specific subkey used in the spill value calculation.
- the prefix scan completes by returning the result set.
- the amount of temporary storage capacity required for compaction, as well as the amount of read-amplification and write-amplification, may be proportional to the amount of key-value capacity at the tree-level being compacted—which is exacerbated by the fact that the key-value capacity of tree-levels in an LSM Tree is typically configured to grow exponentially at each tree-level deeper in the tree.
- searching for a key K involves searching only one node per tree-level, which represents only a small fraction of the total keys in the KVS tree.
- in contrast, in a conventional LSM tree, searching for a key K requires searching all keys in each level.
- KVS trees permit finding all keys that start with a specified prefix by searching only one node per tree-level, which represents only a small fraction of the total keys in the KVS tree.
- in contrast, in a conventional LSM tree, finding all keys that start with a specified prefix requires searching all keys in each level.
- FIG. 26 illustrates a block diagram of an example machine 2600 upon which any one or more of the techniques (e.g., methodologies) discussed herein may perform.
- the machine 2600 may operate as a standalone device or may be connected (e.g., networked) to other machines.
- the machine 2600 may operate in the capacity of a server machine, a client machine, or both in server-client network environments.
- the machine 2600 may act as a peer machine in peer-to-peer (P2P) (or other distributed) network environment.
- the machine 2600 may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine.
- the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein, such as cloud computing, software as a service (SaaS), or other computer cluster configurations.
- the hardware of the circuitry may include variably connected physical components (e.g., execution units, transistors, simple circuits, etc.) including a computer readable medium physically modified (e.g., magnetically, electrically, through moveable placement of invariant massed particles, etc.) to encode instructions of the specific operation.
- the instructions enable embedded hardware (e.g., the execution units or a loading mechanism) to create members of the circuitry in hardware via the variable connections to carry out portions of the specific operation when in operation.
- the computer readable medium is communicatively coupled to the other components of the circuitry when the device is operating.
- any of the physical components may be used in more than one member of more than one circuitry.
- execution units may be used in a first circuit of a first circuitry at one point in time and reused by a second circuit in the first circuitry, or by a third circuit in a second circuitry at a different time.
- Machine 2600 may include a hardware processor 2602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 2604 and a static memory 2606 , some or all of which may communicate with each other via an interlink (e.g., bus) 2608 .
- the machine 2600 may further include a display unit 2610 , an alphanumeric input device 2612 (e.g., a keyboard), and a user interface (UI) navigation device 2614 (e.g., a mouse).
- the display unit 2610 , input device 2612 and UI navigation device 2614 may be a touch screen display.
- the machine 2600 may additionally include a storage device (e.g., drive unit) 2616 , a signal generation device 2618 (e.g., a speaker), a network interface device 2620 , and one or more sensors 2621 , such as a global positioning system (GPS) sensor, compass, accelerometer, or other sensor.
- the machine 2600 may include an output controller 2628 , such as a serial (e.g., universal serial bus (USB), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc.) connection to communicate or control one or more peripheral devices (e.g., a printer, card reader, etc.).
- the storage device 2616 may include a machine readable medium 2622 on which is stored one or more sets of data structures or instructions 2624 (e.g., software) embodying or utilized by any one or more of the techniques or functions described herein.
- the instructions 2624 may also reside, completely or at least partially, within the main memory 2604 , within static memory 2606 , or within the hardware processor 2602 during execution thereof by the machine 2600 .
- one or any combination of the hardware processor 2602 , the main memory 2604 , the static memory 2606 , or the storage device 2616 may constitute machine readable media.
- while the machine readable medium 2622 is illustrated as a single medium, the term "machine readable medium" may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) configured to store the one or more instructions 2624 .
- the term "machine readable medium" may include any medium that is capable of storing, encoding, or carrying instructions for execution by the machine 2600 and that cause the machine 2600 to perform any one or more of the techniques of the present disclosure, or that is capable of storing, encoding, or carrying data structures used by or associated with such instructions.
- Non-limiting machine readable medium examples may include solid-state memories, and optical and magnetic media.
- a massed machine readable medium comprises a machine readable medium with a plurality of particles having invariant (e.g., rest) mass. Accordingly, massed machine-readable media are not transitory propagating signals.
- massed machine readable media may include: non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- the instructions 2624 may further be transmitted or received over a communications network 2626 using a transmission medium via the network interface device 2620 utilizing any one of a number of transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.).
- Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the Internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone (POTS) networks, and wireless data networks (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, IEEE 802.16 family of standards known as WiMax®), IEEE 802.15.4 family of standards, peer-to-peer (P2P) networks, among others.
- the network interface device 2620 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more antennas to connect to the communications network 2626 .
- the network interface device 2620 may include a plurality of antennas to wirelessly communicate using at least one of single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) techniques.
- the term "transmission medium" shall be taken to include any intangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine 2600 , and includes digital or analog communications signals or other intangible media to facilitate communication of such software.
- Example 1 is a key-value data structure, organized as a tree, on at least one machine readable medium, the data structure comprising: a multiple of nodes, a node from the multiple of nodes including: a temporally ordered sequence of immutable key-value sets (kvsets); and a determinative mapping for a key-value pair in a kvset of the node to any one child node of the node, the key-value pair including one key and one value, the key being unique in the kvset.
- Example 2 the subject matter of Example 1, wherein the determinative mapping includes a portion of a hash of a portion of the key.
- Example 3 the subject matter of Example 2, wherein the portion of the key is the entire key.
- Example 4 the subject matter of any one or more of Examples 2-3, wherein the hash includes a multiple of non-overlapping portions including the portion of the hash.
- Example 5 the subject matter of Example 4, wherein each of the multiple of non-overlapping portions corresponds to a level of the tree.
- Example 6 the subject matter of Example 5, wherein the portion of the hash is determined from the multiple of non-overlapping portions by a level of the node.
- Example 7 the subject matter of Example 6, wherein a maximum number of child nodes for the node is defined by a size of the portion of the hash.
- Example 8 the subject matter of Example 7, wherein the size of the portion of the hash is a number of bits.
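The determinative mapping of Examples 2-8 can be sketched as follows. This is an illustrative sketch only: the choice of SHA-256 as the hash, the 64-bit spill-value width, and a fixed 3 bits per tree level are assumptions for demonstration, not details taken from the specification.

```python
import hashlib

# Assumption for illustration: 3 bits of the hash per tree level, so the
# maximum number of child nodes per node is 2**3 = 8 (Examples 7-8).
BITS_PER_LEVEL = 3

def spill_value(key: bytes) -> int:
    """Hash the entire key (Example 3) into a 64-bit spill value."""
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def child_index(key: bytes, level: int) -> int:
    """Return the non-overlapping portion of the hash assigned to this
    tree level (Examples 4-6), taken from the most significant bits down."""
    shift = 64 - BITS_PER_LEVEL * (level + 1)
    assert shift >= 0, "tree deeper than the apportioned hash bits"
    return (spill_value(key) >> shift) & ((1 << BITS_PER_LEVEL) - 1)
```

Because the mapping depends only on the key and the level, every kvset entry for a given key follows the same path down the tree, which is what makes spills and lookups deterministic.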
- Example 9 the subject matter of any one or more of Examples 1-8, wherein the kvset includes a key-tree to store key entries of key-value pairs of the kvset.
- Example 10 the subject matter of Example 9, wherein the keys are stored in leaf nodes of the key-tree.
- Example 11 the subject matter of any one or more of Examples 9-10, wherein a maximum key in any subtree of the key-tree is in a rightmost entry of a rightmost child.
- Example 12 the subject matter of any one or more of Examples 9-11, wherein a rightmost edge of a first node is linked to a sub-node, and wherein all keys in a subtree rooted at the sub-node are greater than all keys in the first node.
- Example 13 the subject matter of any one or more of Examples 1-12, wherein key entries of the kvset are stored in a set of key-blocks including a primary key-block and zero or more extension key-blocks, members of the set of key-blocks corresponding to media blocks for a storage medium, each key-block including a header to identify it as a key-block and wherein values are stored in a set of value-blocks, members of the set of value-blocks corresponding to media blocks for the storage medium, each value-block including a header to identify it as a value-block.
- Example 14 the subject matter of Example 13, wherein a value block includes a storage section for one or more values without separation between values.
- Example 15 the subject matter of any one or more of Examples 13-14, wherein the primary key-block includes a list of media block identifications for the one or more extension key-blocks of the kvset.
- Example 16 the subject matter of any one or more of Examples 13-15, wherein the primary key-block includes a list of media block identifications for value-blocks in the set of value blocks.
- Example 17 the subject matter of any one or more of Examples 13-16, wherein the primary key-block includes a copy of a lowest key in a key-tree of the kvset, the lowest key determined by a pre-set sort-order of the tree.
- Example 18 the subject matter of any one or more of Examples 13-17, wherein the primary key-block includes a copy of a highest key in a key-tree of the kvset, the highest key determined by a pre-set sort-order of the tree.
- Example 19 the subject matter of any one or more of Examples 13-18, wherein the primary key-block includes a header to a key-tree of the kvset.
- Example 20 the subject matter of any one or more of Examples 13-19, wherein the primary key-block includes a list of media block identifications for a key-tree of the kvset.
- Example 21 the subject matter of any one or more of Examples 13-20, wherein the primary key-block includes a bloom filter header for a bloom filter of the kvset.
- Example 22 the subject matter of any one or more of Examples 13-21, wherein the primary key-block includes a list of media block identifications for a bloom filter of the kvset.
- Example 23 the subject matter of any one or more of Examples 13-22, wherein the primary key-block includes a set of metrics for the kvset.
- Example 24 the subject matter of Example 23, wherein the set of metrics includes a total number of keys stored in the kvset.
- Example 25 the subject matter of any one or more of Examples 23-24, wherein the set of metrics includes a number of keys with tombstone values stored in the kvset.
- Example 26 the subject matter of any one or more of Examples 23-25, wherein the set of metrics includes a sum of all key lengths for keys stored in the kvset.
- Example 27 the subject matter of any one or more of Examples 23-26, wherein the set of metrics includes a sum of all key values for keys stored in the kvset.
- Example 28 the subject matter of any one or more of Examples 23-27, wherein the set of metrics includes an amount of unreferenced data in value-blocks of the kvset.
- Example 29 the subject matter of any one or more of Examples 1-28, wherein the tree includes a first root in a first computer readable medium of the at least one machine readable medium, and a second root in a second computer readable medium of the at least one computer readable medium; and wherein the second root is the only child to the first root.
- Example 30 the subject matter of Example 29, wherein the first computer readable medium is byte addressable and wherein the second computer readable medium is block addressable.
- Example 31 is a system comprising processing circuitry to: receive a key-value set (kvset) to store in a key-value data structure, organized as a tree, of at least one machine readable medium, the kvset including a mapping of unique keys to values, the keys and the values of the kvset being immutable, nodes of the tree including a temporally ordered sequence of kvsets; and write the kvset to a sequence of kvsets of a root-node of the tree.
- Example 32 the subject matter of Example 31, wherein the processing circuitry is configured to: receive a key and a corresponding value to store in the key-value data structure; place the key and the value in a preliminary kvset, the preliminary kvset being mutable; and write the kvset to the key-value data structure when a metric is reached.
- Example 33 the subject matter of Example 32, wherein the metric is a size of a preliminary root node.
- Example 34 the subject matter of any one or more of Examples 32-33, wherein a rate of writing to the preliminary root node is beyond a threshold, and wherein the processing circuitry is configured to throttle write requests to the key-value data structure.
- Example 35 the subject matter of any one or more of Examples 32-34, wherein the metric is an elapsed time.
- Example 36 the subject matter of any one or more of Examples 31-35, wherein the processing circuitry is configured to: receive a second kvset; write the second kvset to the sequence of kvsets for the root-node; and perform a spill operation on the root node in response to a metric of the root-node exceeding a threshold in response to writing the second kvset.
- Example 37 the subject matter of Example 36, wherein to perform the spill operation the processing circuitry is configured to: select a subset of the sequence of kvsets, the subset including contiguous kvsets including an oldest kvset; calculate a child-mapping for each key in each kvset of the subset of kvsets, the child mapping being a determinative map from a parent node to a child node based on a particular key and a tree-level of the parent node; collect keys and corresponding values into kvsets based on the child-mapping with each kvset mapped to exactly one child node; write the kvsets to a newest position in respective sequences of kvsets in respective child nodes; and remove the subset of kvsets from the root node.
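The spill operation of Example 37 can be sketched as below. The in-memory representation (each kvset as a dict, each node's sequence as a list ordered oldest to newest) and the injected `child_index` callable are assumptions for illustration.

```python
def spill(node_kvsets, children, child_index, count):
    """Spill the `count` oldest contiguous kvsets (index 0 = oldest) of a
    parent node to its children, per Example 37: keys are routed by the
    determinative child-mapping, collected into one new kvset per child,
    written at the newest position of each child's sequence, and the
    spilled subset is removed from the parent."""
    buckets = {}                              # child id -> new kvset (dict)
    for kvset in node_kvsets[:count]:         # oldest -> newest: newer wins
        for key, value in kvset.items():
            buckets.setdefault(child_index(key), {})[key] = value
    for idx, new_kvset in buckets.items():
        children[idx].append(new_kvset)       # newest position in the child
    del node_kvsets[:count]                   # remove the spilled subset
```

For instance, with a toy mapping `lambda k: len(k) % 2`, spilling two kvsets `[{"a": 1, "bb": 2}, {"a": 3}]` routes `{"a": 3}` (the newer entry wins) to one child and `{"bb": 2}` to the other, leaving the parent empty.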
- Example 38 the subject matter of Example 37, wherein to calculate the child-mapping, the processing circuitry is configured to: extract a portion of a key; derive a spill value from the portion of the key; and return a portion of the spill value based on the tree-level of the parent node.
- Example 39 the subject matter of Example 38, wherein the portion of the key is the entire key.
- Example 40 the subject matter of any one or more of Examples 38-39, wherein to derive the spill value from the portion of the key, the processing circuitry is configured to perform a hash of the portion of the key.
- Example 41 the subject matter of any one or more of Examples 38-40, wherein to return the portion of the spill value based on the tree-level of the parent node, the processing circuitry is configured to: apply a pre-set apportionment to the spill value, the pre-set apportionment defining the portions of the spill value that apply to respective levels of the tree; and return the portion of the spill value corresponding to the pre-set apportionment and the tree-level of the parent node.
- Example 42 the subject matter of Example 41, wherein the pre-set apportionment defines a maximum number of child nodes for at least some of the tree-levels.
- Example 43 the subject matter of any one or more of Examples 41-42, wherein the pre-set apportionment defines a maximum depth to the tree.
- Example 44 the subject matter of any one or more of Examples 41-43, wherein the pre-set apportionment defines a sequence of bit-counts, each bit-count specifying a number of bits, the sequence ordered from low tree-levels to high-tree levels such that the spill value portion for the lowest tree-level is equal to a number of bits equal to the first bit-count starting at the beginning of the spill value and the spill value portion for the n-th tree-level is equal to the n-th bit-count in the sequence of bit counts with an offset into the spill value of the sum of bit counts starting at the first bit-count and ending at a n minus one bit-count.
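The offset rule of Example 44 translates directly into code: the portion for level n is `bit_counts[n]` bits wide, starting at an offset equal to the sum of the preceding bit-counts from the beginning (most significant end) of the spill value. The bit-count sequence and spill-value width below are illustrative assumptions.

```python
def spill_portion(spill_value: int, level: int, bit_counts, width: int = 64):
    """Extract the spill-value portion for a tree level per Example 44:
    level 0 takes the first bit_counts[0] bits of the spill value; level n
    takes bit_counts[n] bits at an offset of sum(bit_counts[:n])."""
    offset = sum(bit_counts[:level])          # bits consumed by lower levels
    size = bit_counts[level]
    shift = width - offset - size             # "beginning" = high-order bits
    return (spill_value >> shift) & ((1 << size) - 1)
```

With an 8-bit spill value `0b10110001` and the apportionment `[2, 3, 3]`, levels 0, 1, and 2 receive `0b10`, `0b110`, and `0b001` respectively; the sequence thus fixes both the fan-out per level (Example 42) and, once exhausted, the maximum tree depth (Example 43).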
- Example 45 the subject matter of any one or more of Examples 36-44, wherein the processing circuitry is configured to perform a second spill operation on a child node in response to a metric of the child node exceeding a threshold after operation of the spill operation.
- Example 46 the subject matter of any one or more of Examples 31-45, wherein the processing circuitry is configured to compact a node of the tree.
- Example 47 the subject matter of Example 46, wherein, to compact the node, the processing circuitry is configured to perform a key compaction, the key compaction including the processing circuitry to: select a subset of kvsets from a sequence of kvsets for the node, the subset of kvsets including contiguous kvsets; locate a set of collision keys, members of the set of collision keys including key entries in at least two kvsets in the sequence of kvsets for the node; add a most recent key entry for each member of the set of collision keys to a new kvset; add entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset; and replace the subset of kvsets with the new kvset by writing the new kvset to the node and removing the subset of kvsets.
- Example 48 the subject matter of Example 47, wherein the node has no children, wherein the subset of kvsets includes the oldest kvset, and wherein, to write the most recent key entry for each member of the set of collision keys to the new kvset and to write entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset, the processing circuitry is configured to omit any key entries that include a tombstone.
- Example 49 the subject matter of any one or more of Examples 47-48, wherein the node has no children, wherein the subset of kvsets includes the oldest kvset, and wherein, to write the most recent key entry for each member of the set of collision keys to the new kvset and to write entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset, the processing circuitry is configured to omit any key entries that are expired.
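The key compaction of Examples 47-48 can be sketched as follows. The dict-per-kvset representation and the `TOMBSTONE` sentinel are illustrative assumptions; expiry-based omission (Example 49) is left out for brevity.

```python
TOMBSTONE = object()  # illustrative tombstone sentinel marking a deletion

def key_compact(kvsets, count, is_leaf):
    """Key-compact the `count` oldest contiguous kvsets (index 0 = oldest)
    into one new kvset (Example 47): for keys colliding across the subset
    the most recent entry wins, and the subset is replaced in place."""
    merged = {}
    for kvset in kvsets[:count]:      # iterate oldest -> newest: newer wins
        merged.update(kvset)
    if is_leaf:
        # The node has no children and the subset includes the oldest
        # kvset, so tombstoned entries may be dropped outright (Example 48).
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    kvsets[:count] = [merged]         # replace the subset with the new kvset
    return merged
```

Note why the leaf check matters: on an interior node a tombstone must survive compaction, since an older value for the same key may still exist in a descendant node.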
- Example 50 the subject matter of any one or more of Examples 47-49, wherein, to compact the node, the processing circuitry is configured to perform a value compaction, wherein keys and values in a kvset are stored in separate addressable blocks, and wherein the value compaction includes the processing circuitry to copy values referenced in key entries for the new kvset to new blocks and to delete blocks corresponding to the subset of kvsets.
- Example 51 the subject matter of any one or more of Examples 46-50, wherein the node includes a child node, wherein compacting the node causes a metric to drop below a threshold, and wherein the processing circuitry is configured to perform a hoist compaction on the child node in response to the metric dropping below the threshold.
- Example 52 the subject matter of Example 51, wherein the hoist compaction includes the processing circuitry further configured to: perform a key and value compaction on the child node to produce a new kvset without writing the new kvset to the child node; and write the new kvset to the node in an oldest position for a sequence of kvsets of the node.
- Example 53 the subject matter of any one or more of Examples 46-52, wherein the compaction is performed in response to a trigger.
- Example 54 the subject matter of Example 53, wherein the trigger is an expiration of a time period.
- Example 55 the subject matter of any one or more of Examples 53-54, wherein the trigger is a metric of the node.
- Example 56 the subject matter of Example 55, wherein the metric is a total size of kvsets of the node.
- Example 57 the subject matter of any one or more of Examples 55-56, wherein the metric is a number of kvsets of the node.
- Example 58 the subject matter of any one or more of Examples 55-57, wherein the metric is a total size of unreferenced values.
- Example 59 the subject matter of any one or more of Examples 55-58, wherein the metric is a number of unreferenced values.
- Example 60 the subject matter of any one or more of Examples 31-59, wherein, when a kvset is written to the at least one storage medium, the kvset is immutable.
- Example 61 the subject matter of Example 60, wherein key entries of the kvset are stored in a set of key-blocks including a primary key-block and zero or more extension key-blocks, members of the set of key-blocks corresponding to media blocks for the at least one storage medium, each key-block including a header to identify it as a key-block; and wherein values are stored in a set of value-blocks, members of the set of value-blocks corresponding to media blocks for the at least one storage medium, each value-block including a header to identify it as a value-block.
- Example 62 the subject matter of Example 61, wherein a value block includes a storage section for one or more values without separation between values.
- Example 63 the subject matter of any one or more of Examples 61-62, wherein the primary key-block includes a list of media block identifications for the one or more extension key-blocks of the kvset.
- Example 64 the subject matter of any one or more of Examples 61-63, wherein the primary key-block includes a list of media block identifications for value-blocks in the set of value blocks.
- Example 65 the subject matter of any one or more of Examples 61-64, wherein the primary key-block includes a copy of a lowest key in a key-tree of the kvset, the lowest key determined by a pre-set sort-order of the tree.
- Example 66 the subject matter of any one or more of Examples 61-65, wherein the primary key-block includes a copy of a highest key in a key-tree of the kvset, the highest key determined by a pre-set sort-order of the tree.
- Example 67 the subject matter of any one or more of Examples 61-66, wherein the primary key-block includes a header to a key-tree of the kvset.
- Example 68 the subject matter of any one or more of Examples 61-67, wherein the primary key-block includes a list of media block identifications for a key-tree of the kvset.
- Example 69 the subject matter of any one or more of Examples 61-68, wherein the primary key-block includes a bloom filter header for a bloom filter of the kvset.
- Example 70 the subject matter of any one or more of Examples 61-69, wherein the primary key-block includes a list of media block identifications for a bloom filter of the kvset.
- Example 71 the subject matter of any one or more of Examples 61-70, wherein the primary key-block includes a set of metrics for the kvset.
- Example 72 the subject matter of Example 71, wherein the set of metrics includes a total number of keys stored in the kvset.
- Example 73 the subject matter of any one or more of Examples 71-72, wherein the set of metrics includes a number of keys with tombstone values stored in the kvset.
- Example 74 the subject matter of any one or more of Examples 71-73, wherein the set of metrics includes a sum of all key lengths for keys stored in the kvset.
- Example 75 the subject matter of any one or more of Examples 71-74, wherein the set of metrics includes a count of key values for all keys stored in the kvset.
- Example 76 the subject matter of any one or more of Examples 71-75, wherein the set of metrics includes an amount of unreferenced data in value-blocks of the kvset.
- Example 77 the subject matter of any one or more of Examples 31-76, wherein the processing circuitry is configured to: receive a search request including a search key; traverse the tree until at least one of the entire tree is traversed or a first instance of the search key is found in a kvset of a node of the tree, to traverse the tree including the processing circuitry to: begin at a root-node of the tree; for each node being traversed: examine kvsets of the node from newest kvset to oldest kvset; return a found indication and cease the traversal when the search key is found; and continue the traversal to a child node when the search key is not found, the child node existing and identified by a spill value derived from the search key and a tree-level of the node being traversed.
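The point lookup of Example 77 can be sketched as below. The node representation (a dict with a `"kvsets"` list ordered oldest to newest and a `"children"` map) and the injected `child_index` mapping are assumptions for illustration.

```python
def search(root, key, child_index):
    """Traverse per Example 77: at each node, examine kvsets newest-first;
    on a miss, descend to the single child selected by the determinative
    mapping for this key and tree level, if that child exists."""
    node, level = root, 0
    while node is not None:
        for kvset in reversed(node["kvsets"]):   # newest kvset first
            # A per-kvset bloom filter or min/max-key check (Examples
            # 81-83) could prune this probe; omitted for brevity.
            if key in kvset:
                return kvset[key]                # first (most recent) match
        node = node["children"].get(child_index(key, level))
        level += 1
    return None                                  # entire path traversed: miss
```

Because newer kvsets are examined before older ones and parent nodes before children, the first match found is always the most recent write for the key, so the traversal can stop immediately.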
- Example 78 the subject matter of Example 77, wherein the found indication includes a value corresponding to a key-entry of the search key in an examined kvset.
- Example 79 the subject matter of any one or more of Examples 77-78, optionally including returning a not found indication when the search key is not found after the traversal has ended.
- Example 80 the subject matter of Example 79, wherein the found indication is the same as the not found indication when the key-entry includes a tombstone.
- Example 81 the subject matter of any one or more of Examples 77-80, wherein to examine the kvsets includes, for a given kvset, the processing circuitry to use a bloom filter of the kvset to determine whether the search key might be in the kvset.
- Example 82 the subject matter of any one or more of Examples 77-81, wherein to examine the kvsets includes, for a given kvset, the processing circuitry to determine that the search key is less than or equal to a maximum key value of the kvset.
- Example 83 the subject matter of any one or more of Examples 77-82, wherein to examine the kvsets includes, for a given kvset, the processing circuitry to determine that the search key is greater than or equal to a minimum key value of the kvset.
- Example 84 the subject matter of any one or more of Examples 31-83, wherein the processing circuitry is configured to: receive a scan request including a key criterion; collect keys specified by the key criterion from each kvset of a node set from the tree into a found set; reduce the found set to a result set by keeping key-value pairs that correspond to a most recent entry for a key that is not a tombstone; and return the result set.
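The scan of Example 84 can be sketched as follows. The flat representation (each node as a list of kvset dicts) and the `TOMBSTONE` sentinel are illustrative assumptions; the caller is assumed to supply nodes ordered oldest-first so that newer entries overwrite older ones.

```python
TOMBSTONE = object()  # illustrative tombstone sentinel marking a deletion

def scan(nodes, matches):
    """Scan per Example 84: collect entries matching the key criterion
    from every kvset of the node set into a found set, then reduce it by
    keeping only the most recent, non-tombstone entry per key."""
    found = {}
    for kvsets in nodes:                      # nodes ordered oldest-first
        for kvset in kvsets:                  # kvsets oldest -> newest
            for key, value in kvset.items():
                if matches(key):
                    found[key] = value        # newer entry overwrites older
    return {k: v for k, v in found.items() if v is not TOMBSTONE}
```

A prefix criterion (Example 86) would also let the caller shrink the node set itself, since only nodes on the spill path of the prefix can hold matching keys.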
- Example 85 the subject matter of Example 84, wherein the node set includes every node in the tree.
- Example 86 the subject matter of any one or more of Examples 84-85, wherein the criterion is a key prefix, and wherein the node-set includes each node that corresponds to the key prefix.
- Example 87 the subject matter of Example 86, wherein node correspondence to the key prefix is determined by a portion of a spill value derived from the key prefix, the portion of the spill value determined by a tree-level of a given node.
- Example 88 the subject matter of any one or more of Examples 84-87, wherein the criterion is a range.
- Example 89 is at least one machine readable medium including instructions that, when executed by processing circuitry, cause the machine to perform operations comprising: receiving a key-value set (kvset) to store in a key-value data structure, organized as a tree, of at least one machine readable medium, the kvset including a mapping of unique keys to values, the keys and the values of the kvset being immutable, nodes of the tree including a temporally ordered sequence of kvsets; and writing the kvset to a sequence of kvsets of a root-node of the tree.
- Example 90 the subject matter of Example 89, wherein the operations comprise: receiving a key and a corresponding value to store in the key-value data structure; placing the key and the value in a preliminary kvset, the preliminary kvset being mutable; and writing the kvset to the key-value data structure when a metric is reached.
- Example 91 the subject matter of Example 90, wherein the metric is a size of a preliminary root node.
- Example 92 the subject matter of any one or more of Examples 90-91, wherein a rate of writing to the preliminary root node is beyond a threshold, and wherein the operations comprise throttling write requests to the key-value data structure.
- Example 93 the subject matter of any one or more of Examples 90-92, wherein the metric is an elapsed time.
- Example 94 the subject matter of any one or more of Examples 89-93, wherein the operations comprise: receiving a second kvset; writing the second kvset to the sequence of kvsets for the root-node; and performing a spill operation on the root node in response to a metric of the root-node exceeding a threshold in response to writing the second kvset.
- Example 95 the subject matter of Example 94, wherein the spill operation includes: selecting a subset of the sequence of kvsets, the subset including contiguous kvsets including an oldest kvset; calculating a child-mapping for each key in each kvset of the subset of kvsets, the child mapping being a determinative map from a parent node to a child node based on a particular key and a tree-level of the parent node; collecting keys and corresponding values into kvsets based on the child-mapping with each kvset mapped to exactly one child node; writing the kvsets to a newest position in respective sequences of kvsets in respective child nodes; and removing the subset of kvsets from the root node.
- Example 96 the subject matter of Example 95, wherein calculating the child-mapping includes: extracting a portion of a key; deriving a spill value from the portion of the key; and returning a portion of the spill value based on the tree-level of the parent node.
- Example 97 the subject matter of Example 96, wherein the portion of the key is the entire key.
- Example 98 the subject matter of any one or more of Examples 96-97, wherein deriving the spill value from the portion of the key includes performing a hash of the portion of the key.
- Example 99 the subject matter of any one or more of Examples 96-98, wherein returning the portion of the spill value based on the tree-level of the parent node includes: applying a pre-set apportionment to the spill value, the pre-set apportionment defining the portions of the spill value that apply to respective levels of the tree; and returning the portion of the spill value corresponding to the pre-set apportionment and the tree-level of the parent node.
- Example 100 the subject matter of Example 99, wherein the pre-set apportionment defines a maximum number of child nodes for at least some of the tree-levels.
- Example 101 the subject matter of any one or more of Examples 99-100, wherein the pre-set apportionment defines a maximum depth to the tree.
- Example 102 the subject matter of any one or more of Examples 99-101, wherein the pre-set apportionment defines a sequence of bit-counts, each bit-count specifying a number of bits, the sequence ordered from low tree-levels to high-tree levels such that the spill value portion for the lowest tree-level is equal to a number of bits equal to the first bit-count starting at the beginning of the spill value and the spill value portion for the n-th tree-level is equal to the n-th bit-count in the sequence of bit counts with an offset into the spill value of the sum of bit counts starting at the first bit-count and ending at a n minus one bit-count.
- Example 103 the subject matter of any one or more of Examples 94-102, wherein the operations comprise performing a second spill operation on a child node in response to a metric of the child node exceeding a threshold after operation of the spill operation.
- Example 104 the subject matter of any one or more of Examples 89-103, wherein the operations comprise compacting a node of the tree.
- Example 105 the subject matter of Example 104, wherein compacting the node includes performing a key compaction, the key compaction including: selecting a subset of kvsets from a sequence of kvsets for the node, the subset of kvsets including contiguous kvsets; locating a set of collision keys, members of the set of collision keys including key entries in at least two kvsets in the sequence of kvsets for the node; adding a most recent key entry for each member of the set of collision keys to a new kvset; adding entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset; and replacing the subset of kvsets with the new kvset by writing the new kvset to the node and removing the subset of kvsets.
- Example 106 the subject matter of Example 105, wherein the node has no children, wherein the subset of kvsets includes the oldest kvset, and wherein writing the most recent key entry for each member of the set of collision keys to the new kvset and writing entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset includes omitting any key entries that include a tombstone.
- Example 107 the subject matter of any one or more of Examples 105-106, wherein the node has no children, wherein the subset of kvsets includes the oldest kvset, and wherein writing the most recent key entry for each member of the set of collision keys to the new kvset and writing entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset includes omitting any key entries that are expired.
- Example 108 the subject matter of any one or more of Examples 105-107, wherein compacting the node includes performing a value compaction, wherein keys and values in a kvset are stored in separate addressable blocks, and wherein the value compaction includes copying values referenced in key entries for the new kvset to new blocks and deleting blocks corresponding to the subset of kvsets.
- Example 109 the subject matter of any one or more of Examples 104-108, wherein the node includes a child node, wherein compacting the node causes a metric to drop below a threshold, and wherein the operations comprise performing a hoist compaction on the child node in response to the metric dropping below the threshold.
- Example 110 the subject matter of Example 109, wherein the hoist compaction includes: performing a key and value compaction on the child node to produce a new kvset without writing the new kvset to the child node; and writing the new kvset to the node in an oldest position for a sequence of kvsets of the node.
- Example 111 the subject matter of any one or more of Examples 104-110, wherein the compacting is performed in response to a trigger.
- Example 112 the subject matter of Example 111, wherein the trigger is an expiration of a time period.
- Example 113 the subject matter of any one or more of Examples 111-112, wherein the trigger is a metric of the node.
- Example 114 the subject matter of Example 113, wherein the metric is a total size of kvsets of the node.
- Example 115 the subject matter of any one or more of Examples 113-114, wherein the metric is a number of kvsets of the node.
- Example 116 the subject matter of any one or more of Examples 113-115, wherein the metric is a total size of unreferenced values.
- Example 117 the subject matter of any one or more of Examples 113-116, wherein the metric is a number of unreferenced values.
- Example 118 the subject matter of any one or more of Examples 89-117, wherein, when a kvset is written to the at least one storage medium, the kvset is immutable.
- Example 119 the subject matter of Example 118, wherein key entries of the kvset are stored in a set of key-blocks including a primary key-block and zero or more extension key-blocks, members of the set of key-blocks corresponding to media blocks for the at least one storage medium, each key-block including a header to identify it as a key-block; and wherein values are stored in a set of value-blocks, members of the set of value-blocks corresponding to media blocks for the at least one storage medium, each value-block including a header to identify it as a value-block.
- Example 120 the subject matter of Example 119, wherein a value block includes a storage section for one or more values without separation between values.
- Example 121 the subject matter of any one or more of Examples 119-120, wherein the primary key-block includes a list of media block identifications for the one or more extension key-blocks of the kvset.
- Example 122 the subject matter of any one or more of Examples 119-121, wherein the primary key-block includes a list of media block identifications for value-blocks in the set of value blocks.
- Example 123 the subject matter of any one or more of Examples 119-122, wherein the primary key-block includes a copy of a lowest key in a key-tree of the kvset, the lowest key determined by a pre-set sort-order of the tree.
- Example 124 the subject matter of any one or more of Examples 119-123, wherein the primary key-block includes a copy of a highest key in a key-tree of the kvset, the highest key determined by a pre-set sort-order of the tree.
- Example 125 the subject matter of any one or more of Examples 119-124, wherein the primary key-block includes a header to a key-tree of the kvset.
- Example 126 the subject matter of any one or more of Examples 119-125, wherein the primary key-block includes a list of media block identifications for a key-tree of the kvset.
- Example 127 the subject matter of any one or more of Examples 119-126, wherein the primary key-block includes a bloom filter header for a bloom filter of the kvset.
- Example 128 the subject matter of any one or more of Examples 119-127, wherein the primary key-block includes a list of media block identifications for a bloom filter of the kvset.
- Example 129 the subject matter of any one or more of Examples 119-128, wherein the primary key-block includes a set of metrics for the kvset.
- Example 130 the subject matter of Example 129, wherein the set of metrics includes a total number of keys stored in the kvset.
- Example 131 the subject matter of any one or more of Examples 129-130, wherein the set of metrics includes a number of keys with tombstone values stored in the kvset.
- Example 132 the subject matter of any one or more of Examples 129-131, wherein the set of metrics includes a sum of all key lengths for keys stored in the kvset.
- Example 133 the subject matter of any one or more of Examples 129-132, wherein the set of metrics includes a count of key values for all keys stored in the kvset.
- Example 134 the subject matter of any one or more of Examples 129-133, wherein the set of metrics includes an amount of unreferenced data in value-blocks of the kvset.
- Example 135 the subject matter of any one or more of Examples 89-134, wherein the operations comprise: receiving a search request including a search key; traversing the tree until at least one of the entire tree is traversed or a first instance of the search key is found in a kvset of a node of the tree, traversing the tree including: beginning at a root-node of the tree; for each node being traversed: examining kvsets of the node from newest kvset to oldest kvset; returning a found indication and ceasing the traversal when the search key is found; and continuing the traversal to a child node when the search key is not found, the child node existing and identified by a spill value derived from the search key and a tree-level of the node being traversed.
- Example 136 the subject matter of Example 135, wherein the found indication includes a value corresponding to a key-entry of the search key in an examined kvset.
- Example 137 the subject matter of any one or more of Examples 135-136 optionally include returning a not found indication when the search key is not found after the traversal has ended.
- Example 138 the subject matter of Example 137, wherein the found indication is the same as the not found indication when the key-entry includes a tombstone.
- Example 139 the subject matter of any one or more of Examples 135-138, wherein examining the kvsets includes, for a given kvset, using a bloom filter of the kvset to determine whether the search key might be in the kvset.
- Example 140 the subject matter of any one or more of Examples 135-139, wherein examining the kvsets includes, for a given kvset, determining that the search key is less than or equal to a maximum key value of the kvset.
- Example 141 the subject matter of any one or more of Examples 135-140, wherein examining the kvsets includes, for a given kvset, determining that the search key is greater than or equal to a minimum key value of the kvset.
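Examples 139-141 describe prefilters that let a search skip a kvset without reading its key entries. A rough sketch follows; a plain Python set stands in for the bloom filter (a real bloom filter would answer "maybe present" with occasional false positives, never false negatives), and the class layout is an assumption.

```python
class KvsetIndex:
    """Per-kvset prefilter: min/max key bounds plus a membership test let a
    search skip kvsets that cannot contain the key."""
    def __init__(self, keys):
        self.min_key = min(keys)
        self.max_key = max(keys)
        self.maybe = set(keys)  # stand-in for a bloom filter

    def might_contain(self, key: bytes) -> bool:
        # Cheap range check first (Examples 140-141), then the filter (Example 139).
        return self.min_key <= key <= self.max_key and key in self.maybe
```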
- Example 142 the subject matter of any one or more of Examples 89-141, wherein the operations comprise: receiving a scan request including a key criterion; collecting keys specified by the key criterion from each kvset of a node set from the tree into a found set; reducing the found set to a result set by keeping key-value pairs that correspond to a most recent entry for a key that is not a tombstone; and returning the result set.
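The scan of Example 142 can be sketched as below. The `TOMBSTONE` sentinel and the assumption that the node set is visited so that newer entries are seen first (root before children, and newest kvset first within a node) are illustrative choices, not part of the claims.

```python
TOMBSTONE = object()  # sentinel marking a deleted key

def scan(nodes, predicate):
    """Collect keys matching `predicate` from every kvset of the node set,
    keep only the most recent entry per key, and drop tombstones."""
    found = {}
    for node in nodes:                    # assumed ordered newest-entries-first
        for kvset in node["kvsets"]:      # newest kvset first within a node
            for k, v in kvset.items():
                if predicate(k) and k not in found:
                    found[k] = v          # first hit is the most recent entry
    return {k: v for k, v in found.items() if v is not TOMBSTONE}
```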
- Example 143 the subject matter of Example 142, wherein the node set includes every node in the tree.
- Example 144 the subject matter of any one or more of Examples 142-143, wherein the key criterion is a key prefix, and wherein the node-set includes each node that corresponds to the key prefix.
- Example 145 the subject matter of Example 144, wherein node correspondence to the key prefix is determined by a portion of a spill value derived from the key prefix, the portion of the spill value determined by a tree-level of a given node.
- Example 146 the subject matter of any one or more of Examples 142-145, wherein the criterion is a range.
- Example 147 is a method comprising: receiving a key-value set (kvset) to store in a key-value data structure, organized as a tree, of at least one machine readable medium, the kvset including a mapping of unique keys to values, the keys and the values of the kvset being immutable, nodes of the tree including a temporally ordered sequence of kvsets; and writing the kvset to a sequence of kvsets of a root-node of the tree.
- Example 148 the subject matter of Example 147 optionally includes receiving a key and a corresponding value to store in the key-value data structure; placing the key and the value in a preliminary kvset, the preliminary kvset being mutable; and writing the kvset to the key-value data structure when a metric is reached.
- Example 149 the subject matter of Example 148, wherein the metric is a size of a preliminary root node.
- Example 150 the subject matter of any one or more of Examples 148-149, wherein a rate of writing to the preliminary root node is beyond a threshold, and comprising throttling write requests to the key-value data structure.
- Example 151 the subject matter of any one or more of Examples 148-150, wherein the metric is an elapsed time.
- Example 152 the subject matter of any one or more of Examples 147-151 optionally include receiving a second kvset; writing the second kvset to the sequence of kvsets for the root-node; and performing a spill operation on the root node in response to a metric of the root-node exceeding a threshold in response to writing the second kvset.
- Example 153 the subject matter of Example 152, wherein the spill operation includes: selecting a subset of the sequence of kvsets, the subset including contiguous kvsets including an oldest kvset; calculating a child-mapping for each key in each kvset of the subset of kvsets, the child mapping being a determinative map from a parent node to a child node based on a particular key and a tree-level of the parent node; collecting keys and corresponding values into kvsets based on the child-mapping with each kvset mapped to exactly one child node; writing the kvsets to a newest position in respective sequences of kvsets in respective child nodes; and removing the subset of kvsets from the root node.
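The spill steps of Example 153 can be sketched as follows. The dict-based node layout and the injected `child_map` function are assumptions of the sketch; in the claims the mapping is the determinative child-mapping derived from the key and the parent's tree-level.

```python
def spill(node: dict, n_oldest: int, child_map) -> None:
    """Spill the `n_oldest` contiguous oldest kvsets of `node`: route each
    key to exactly one child via `child_map`, write one new kvset per child
    at the newest position, then remove the spilled kvsets."""
    subset = node["kvsets"][-n_oldest:]   # kvsets ordered newest first, so
                                          # the oldest sit at the end
    per_child = {}
    for kvset in subset:                  # newest-to-oldest within the subset
        for k, v in kvset.items():
            # setdefault keeps the first (newest) entry seen for each key
            per_child.setdefault(child_map(k), {}).setdefault(k, v)
    for idx, new_kvset in per_child.items():
        child = node["children"].setdefault(idx, {"kvsets": [], "children": {}})
        child["kvsets"].insert(0, new_kvset)   # newest position in the child
    node["kvsets"] = node["kvsets"][:-n_oldest]
```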
- Example 154 the subject matter of Example 153, wherein calculating the child-mapping includes: extracting a portion of a key; deriving a spill value from the portion of the key; and returning a portion of the spill value based on the tree-level of the parent node.
- Example 155 the subject matter of Example 154, wherein the portion of the key is the entire key.
- Example 156 the subject matter of any one or more of Examples 154-155, wherein deriving the spill value from the portion of the key includes performing a hash of the portion of the key.
- Example 157 the subject matter of any one or more of Examples 154-156, wherein returning the portion of the spill value based on the tree-level of the parent node includes: applying a pre-set apportionment to the spill value, the pre-set apportionment defining the portions of the spill value that apply to respective levels of the tree; and returning the portion of the spill value corresponding to the pre-set apportionment and the tree-level of the parent node.
- Example 158 the subject matter of Example 157, wherein the pre-set apportionment defines a maximum number of child nodes for at least some of the tree-levels.
- Example 159 the subject matter of any one or more of Examples 157-158, wherein the pre-set apportionment defines a maximum depth to the tree.
- Example 160 the subject matter of any one or more of Examples 157-159, wherein the pre-set apportionment defines a sequence of bit-counts, each bit-count specifying a number of bits, the sequence ordered from low tree-levels to high-tree levels such that the spill value portion for the lowest tree-level is equal to a number of bits equal to the first bit-count starting at the beginning of the spill value and the spill value portion for the n-th tree-level is equal to the n-th bit-count in the sequence of bit counts with an offset into the spill value of the sum of bit counts starting at the first bit-count and ending at a n minus one bit-count.
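The apportionment of Example 160 amounts to slicing the spill value at offsets given by partial sums of the bit-counts. A sketch, where `total_bits` (the spill-value width) and the example bit-counts are hypothetical:

```python
def spill_portion(spill: int, bit_counts, level: int, total_bits: int = 32) -> int:
    """Return the spill-value portion for `level`: bit_counts[level] bits at
    an offset equal to the sum of the preceding bit-counts, counted from the
    most significant end of the spill value."""
    offset = sum(bit_counts[:level])
    width = bit_counts[level]
    shift = total_bits - offset - width
    return (spill >> shift) & ((1 << width) - 1)
```

For instance, with an 8-bit spill value and bit-counts (2, 3, 3), level 0 reads the top two bits, level 1 the next three, and level 2 the final three, so the per-level fanouts are 4, 8, and 8.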
- Example 161 the subject matter of any one or more of Examples 152-160 optionally include performing a second spill operation on a child node in response to a metric of the child node exceeding a threshold after operation of the spill operation.
- Example 162 the subject matter of any one or more of Examples 147-161 optionally include compacting a node of the tree.
- Example 163 the subject matter of Example 162, wherein compacting the node includes performing a key compaction, the key compaction including: selecting a subset of kvsets from a sequence of kvsets for the node, the subset of kvsets including contiguous kvsets; locating a set of collision keys, members of the set of collision keys including key entries in at least two kvsets in the sequence of kvsets for the node; adding a most recent key entry for each member of the set of collision keys to a new kvset; adding entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset; and replacing the subset of kvsets with the new kvset by writing the new kvset to the node and removing the subset of kvsets.
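The key compaction of Example 163 reduces to a newest-wins merge over a contiguous run of kvsets. A minimal sketch; it deliberately keeps tombstones, since Examples 164-165 permit dropping them only when the node has no children and the subset includes the oldest kvset.

```python
def key_compact(kvsets) -> dict:
    """Merge a contiguous run of kvsets (ordered newest-to-oldest) into one
    new kvset, keeping only the most recent entry for each colliding key."""
    merged = {}
    for kvset in kvsets:
        for k, v in kvset.items():
            merged.setdefault(k, v)  # first (newest) entry wins
    return merged
```

In the claimed structure the new kvset then replaces the subset in the node's sequence; values themselves are untouched, which is what distinguishes key compaction from the key-and-value compaction of Example 166.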
- Example 164 the subject matter of Example 163, wherein the node has no children, wherein the subset of kvsets includes the oldest kvset, and wherein writing the most recent key entry for each member of the set of collision keys to the new kvset and writing entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset includes omitting any key entries that include a tombstone.
- Example 165 the subject matter of any one or more of Examples 163-164, wherein the node has no children, wherein the subset of kvsets includes the oldest kvset, and wherein writing the most recent key entry for each member of the set of collision keys to the new kvset and writing entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset includes omitting any key entries that are expired.
- Example 166 the subject matter of any one or more of Examples 163-165, wherein compacting the node includes performing a value compaction, wherein keys and values in a kvset are stored in separate addressable blocks, and wherein the value compaction includes copying values referenced in key entries for the new kvset to new blocks and deleting blocks corresponding to the subset of kvsets.
- Example 167 the subject matter of any one or more of Examples 162-166, wherein the node includes a child node, wherein compacting the node causes a metric to drop below a threshold, and comprising performing a hoist compaction on the child node in response to the metric dropping below the threshold.
- Example 168 the subject matter of Example 167, wherein the hoist compaction includes: performing a key and value compaction on the child node to produce a new kvset without writing the new kvset to the child node; and writing the new kvset to the node in an oldest position for a sequence of kvsets of the node.
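The hoist compaction of Example 168 can be sketched as below, reusing a newest-wins merge for the key-and-value compaction; the dict-based node layout is an assumption of the sketch.

```python
def hoist_compact(parent: dict, child: dict) -> None:
    """Key-and-value-compact the child's kvsets into one new kvset and write
    it to the parent's OLDEST position instead of back to the child."""
    merged = {}
    for kvset in child["kvsets"]:        # newest-to-oldest
        for k, v in kvset.items():
            merged.setdefault(k, v)      # keep the most recent entry
    child["kvsets"] = []
    parent["kvsets"].append(merged)      # oldest position is the end
```

Writing to the oldest position preserves correctness: every entry hoisted from the child is, by construction, older than anything already in the parent.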
- Example 169 the subject matter of any one or more of Examples 162-168, wherein the compacting is performed in response to a trigger.
- Example 170 the subject matter of Example 169, wherein the trigger is an expiration of a time period.
- Example 171 the subject matter of any one or more of Examples 169-170, wherein the trigger is a metric of the node.
- Example 172 the subject matter of Example 171, wherein the metric is a total size of kvsets of the node.
- Example 173 the subject matter of any one or more of Examples 171-172, wherein the metric is a number of kvsets of the node.
- Example 174 the subject matter of any one or more of Examples 171-173, wherein the metric is a total size of unreferenced values.
- Example 175 the subject matter of any one or more of Examples 171-174, wherein the metric is a number of unreferenced values.
- Example 176 the subject matter of any one or more of Examples 147-175, wherein, when a kvset is written to the at least one storage medium, the kvset is immutable.
- Example 177 the subject matter of Example 176, wherein key entries of the kvset are stored in a set of key-blocks including a primary key-block and zero or more extension key-blocks, members of the set of key-blocks corresponding to media blocks for the at least one storage medium, each key-block including a header to identify it as a key-block; and wherein values are stored in a set of value-blocks, members of the set of value-blocks corresponding to media blocks for the at least one storage medium, each value-block including a header to identify it as a value-block.
- Example 178 the subject matter of Example 177, wherein a value block includes a storage section for one or more values without separation between values.
- Example 179 the subject matter of any one or more of Examples 177-178, wherein the primary key-block includes a list of media block identifications for the one or more extension key-blocks of the kvset.
- Example 180 the subject matter of any one or more of Examples 177-179, wherein the primary key-block includes a list of media block identifications for value-blocks in the set of value blocks.
- Example 181 the subject matter of any one or more of Examples 177-180, wherein the primary key-block includes a copy of a lowest key in a key-tree of the kvset, the lowest key determined by a pre-set sort-order of the tree.
- Example 182 the subject matter of any one or more of Examples 177-181, wherein the primary key-block includes a copy of a highest key in a key-tree of the kvset, the highest key determined by a pre-set sort-order of the tree.
- Example 183 the subject matter of any one or more of Examples 177-182, wherein the primary key-block includes a header to a key-tree of the kvset.
- Example 184 the subject matter of any one or more of Examples 177-183, wherein the primary key-block includes a list of media block identifications for a key-tree of the kvset.
- Example 185 the subject matter of any one or more of Examples 177-184, wherein the primary key-block includes a bloom filter header for a bloom filter of the kvset.
- Example 186 the subject matter of any one or more of Examples 177-185, wherein the primary key-block includes a list of media block identifications for a bloom filter of the kvset.
- Example 187 the subject matter of any one or more of Examples 177-186, wherein the primary key-block includes a set of metrics for the kvset.
- Example 188 the subject matter of Example 187, wherein the set of metrics include a total number of keys stored in the kvset.
- Example 189 the subject matter of any one or more of Examples 187-188, wherein the set of metrics include a number of keys with tombstone values stored in the kvset.
- Example 190 the subject matter of any one or more of Examples 187-189, wherein the set of metrics include a sum of all key lengths for keys stored in the kvset.
- Example 191 the subject matter of any one or more of Examples 187-190, wherein the set of metrics include a count of key values for all keys stored in the kvset.
- Example 192 the subject matter of any one or more of Examples 187-191, wherein the set of metrics include an amount of unreferenced data in value-blocks of the kvset.
- Example 193 the subject matter of any one or more of Examples 147-192 optionally include receiving a search request including a search key; traversing the tree until at least one of the entire tree is traversed or a first instance of the search key is found in a kvset of a node of the tree, traversing the tree including: beginning at a root-node of the tree; for each node being traversed: examining kvsets of the node from newest kvset to oldest kvset, returning a found indication and ceasing the traversal when the search key is found; and continuing the traversal to a child node when the search key is not found, the child node existing and identified by a spill value derived from the search key and a tree-level of the node being traversed.
- Example 194 the subject matter of Example 193, wherein the found indication includes a value corresponding to a key-entry of the search key in an examined kvset.
- Example 195 the subject matter of any one or more of Examples 193-194 optionally include returning a not found indication when the search key is not found after the traversal has ended.
- Example 196 the subject matter of Example 195, wherein the found indication is the same as the not found indication when the key-entry includes a tombstone.
- Example 197 the subject matter of any one or more of Examples 193-196, wherein to examine the kvsets includes, for a given kvset, using a bloom filter of the kvset to determine whether the search key might be in the kvset.
- Example 198 the subject matter of any one or more of Examples 193-197, wherein to examine the kvsets includes, for a given kvset, determining that the search key is less than or equal to a maximum key value of the kvset.
- Example 199 the subject matter of any one or more of Examples 193-198, wherein to examine the kvsets includes, for a given kvset, determining that the search key is greater than or equal to a minimum key value of the kvset.
- Example 200 the subject matter of any one or more of Examples 147-199 optionally include receiving a scan request including a key criterion; collecting keys specified by the key criterion from each kvset of a node set from the tree into a found set; reducing the found set to a result set by keeping key-value pairs that correspond to a most recent entry for a key that is not a tombstone; and returning the result set.
- Example 201 the subject matter of Example 200, wherein the node set includes every node in the tree.
- Example 202 the subject matter of any one or more of Examples 200-201, wherein the criterion is a key prefix, and wherein the node-set includes each node that corresponds to the key prefix.
- Example 203 the subject matter of Example 202, wherein node correspondence to the key prefix is determined by a portion of a spill value derived from the key prefix, the portion of the spill value determined by a tree-level of a given node.
- Example 204 the subject matter of any one or more of Examples 200-203, wherein the criterion is a range.
- Example 205 is a system comprising means to perform any method of Examples 147-204.
- Example 206 is at least one machine readable medium including instructions that, when executed by processing circuitry, cause the processing circuitry to perform any method of Examples 147-204.
- Example 207 is a system comprising: means for receiving a key-value set (kvset) to store in a key-value data structure, organized as a tree, of at least one machine readable medium, the kvset including a mapping of unique keys to values, the keys and the values of the kvset being immutable, nodes of the tree including a temporally ordered sequence of kvsets; and means for writing the kvset to a sequence of kvsets of a root-node of the tree.
- Example 208 the subject matter of Example 207 optionally includes means for receiving a key and a corresponding value to store in the key-value data structure; means for placing the key and the value in a preliminary kvset, the preliminary kvset being mutable; and means for writing the kvset to the key-value data structure when a metric is reached.
- Example 209 the subject matter of Example 208, wherein the metric is a size of a preliminary root node.
- Example 210 the subject matter of any one or more of Examples 208-209, wherein a rate of writing to the preliminary root node is beyond a threshold, and comprising means for throttling write requests to the key-value data structure.
- Example 212 the subject matter of any one or more of Examples 207-211 optionally include means for receiving a second kvset; means for writing the second kvset to the sequence of kvsets for the root-node; and means for performing a spill operation on the root node in response to a metric of the root-node exceeding a threshold in response to writing the second kvset.
- Example 213 the subject matter of Example 212, wherein the spill operation includes: means for selecting a subset of the sequence of kvsets, the subset including contiguous kvsets including an oldest kvset; means for calculating a child-mapping for each key in each kvset of the subset of kvsets, the child mapping being a determinative map from a parent node to a child node based on a particular key and a tree-level of the parent node; means for collecting keys and corresponding values into kvsets based on the child-mapping with each kvset mapped to exactly one child node; means for writing the kvsets to a newest position in respective sequences of kvsets in respective child nodes; and means for removing the subset of kvsets from the root node.
- Example 214 the subject matter of Example 213, wherein the means for calculating the child-mapping includes: means for extracting a portion of a key; means for deriving a spill value from the portion of the key; and means for returning a portion of the spill value based on the tree-level of the parent node.
- Example 215 the subject matter of Example 214, wherein the portion of the key is the entire key.
- Example 216 the subject matter of any one or more of Examples 214-215, wherein the means for deriving the spill value from the portion of the key includes means for performing a hash of the portion of the key.
- Example 217 the subject matter of any one or more of Examples 214-216, wherein the means for returning the portion of the spill value based on the tree-level of the parent node includes: means for applying a pre-set apportionment to the spill value, the pre-set apportionment defining the portions of the spill value that apply to respective levels of the tree; and means for returning the portion of the spill value corresponding to the pre-set apportionment and the tree-level of the parent node.
- Example 218 the subject matter of Example 217, wherein the pre-set apportionment defines a maximum number of child nodes for at least some of the tree-levels.
- Example 219 the subject matter of any one or more of Examples 217-218, wherein the pre-set apportionment defines a maximum depth to the tree.
- Example 220 the subject matter of any one or more of Examples 217-219, wherein the pre-set apportionment defines a sequence of bit-counts, each bit-count specifying a number of bits, the sequence ordered from low tree-levels to high-tree levels such that the spill value portion for the lowest tree-level is equal to a number of bits equal to the first bit-count starting at the beginning of the spill value and the spill value portion for the n-th tree-level is equal to the n-th bit-count in the sequence of bit counts with an offset into the spill value of the sum of bit counts starting at the first bit-count and ending at a n minus one bit-count.
- Example 221 the subject matter of any one or more of Examples 212-220 optionally include means for performing a second spill operation on a child node in response to a metric of the child node exceeding a threshold after operation of the spill operation.
- Example 222 the subject matter of any one or more of Examples 207-221 optionally include means for compacting a node of the tree.
- Example 223 the subject matter of Example 222, wherein the means for compacting the node include means for performing a key compaction, the key compaction including: selecting a subset of kvsets from a sequence of kvsets for the node, the subset of kvsets including contiguous kvsets; locating a set of collision keys, members of the set of collision keys including key entries in at least two kvsets in the sequence of kvsets for the node; adding a most recent key entry for each member of the set of collision keys to a new kvset; adding entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset; and replacing the subset of kvsets with the new kvset by writing the new kvset to the node and removing the subset of kvsets.
- Example 224 the subject matter of Example 223, wherein the node has no children, wherein the subset of kvsets includes the oldest kvset, and wherein the means for writing the most recent key entry for each member of the set of collision keys to the new kvset and writing entries for each key in members of the subset of kvsets that are not in the set of collision keys to the new kvset include means for omitting any key entries that include a tombstone.
- Example 226 the subject matter of any one or more of Examples 223-225, wherein the means for compacting the node includes performing a value compaction, wherein keys and values in a kvset are stored in separate addressable blocks, and wherein the value compaction includes copying values referenced in key entries for the new kvset to new blocks and deleting blocks corresponding to the subset of kvsets.
- Example 227 the subject matter of any one or more of Examples 222-226, wherein the node includes a child node, wherein the means for compacting the node causes a metric to drop below a threshold, and comprising performing a hoist compaction on the child node in response to the metric dropping below the threshold.
- Example 228 the subject matter of Example 227, wherein the hoist compaction includes: performing a key and value compaction on the child node to produce a new kvset without writing the new kvset to the child node; and writing the new kvset to the node in an oldest position for a sequence of kvsets of the node.
- Example 229 the subject matter of any one or more of Examples 222-228, wherein the compacting is performed in response to a trigger.
- Example 230 the subject matter of Example 229, wherein the trigger is an expiration of a time period.
- Example 231 the subject matter of any one or more of Examples 229-230, wherein the trigger is a metric of the node.
- Example 232 the subject matter of Example 231, wherein the metric is a total size of kvsets of the node.
- Example 233 the subject matter of any one or more of Examples 231-232, wherein the metric is a number of kvsets of the node.
- Example 234 the subject matter of any one or more of Examples 231-233, wherein the metric is a total size of unreferenced values.
- Example 235 the subject matter of any one or more of Examples 231-234, wherein the metric is a number of unreferenced values.
- Example 236 the subject matter of any one or more of Examples 207-235, wherein, when a kvset is written to the at least one storage medium, the kvset is immutable.
- Example 237 the subject matter of Example 236, wherein key entries of the kvset are stored in a set of key-blocks including a primary key-block and zero or more extension key-blocks, members of the set of key-blocks corresponding to media blocks for the at least one storage medium, each key-block including a header to identify it as a key-block; and wherein values are stored in a set of value-blocks, members of the set of value-blocks corresponding to media blocks for the at least one storage medium, each value-block including a header to identify it as a value-block.
- Example 238 the subject matter of Example 237, wherein a value block includes a storage section for one or more values without separation between values.
- Example 239 the subject matter of any one or more of Examples 237-238, wherein the primary key-block includes a list of media block identifications for the one or more extension key-blocks of the kvset.
- Example 241 the subject matter of any one or more of Examples 237-240, wherein the primary key-block includes a copy of a lowest key in a key-tree of the kvset, the lowest key determined by a pre-set sort-order of the tree.
- Example 242 the subject matter of any one or more of Examples 237-241, wherein the primary key-block includes a copy of a highest key in a key-tree of the kvset, the highest key determined by a pre-set sort-order of the tree.
- Example 243 the subject matter of any one or more of Examples 237-242, wherein the primary key-block includes a header to a key-tree of the kvset.
- Example 244 the subject matter of any one or more of Examples 237-243, wherein the primary key-block includes a list of media block identifications for a key-tree of the kvset.
- Example 245 the subject matter of any one or more of Examples 237-244, wherein the primary key-block includes a bloom filter header for a bloom filter of the kvset.
- Example 246 the subject matter of any one or more of Examples 237-245, wherein the primary key-block includes a list of media block identifications for a bloom filter of the kvset.
- Example 247 the subject matter of any one or more of Examples 237-246, wherein the primary key-block includes a set of metrics for the kvset.
- Example 249 the subject matter of any one or more of Examples 247-248, wherein the set of metrics includes a number of keys with tombstone values stored in the kvset.
- Example 250 the subject matter of any one or more of Examples 247-249, wherein the set of metrics includes a sum of all key lengths for keys stored in the kvset.
- Example 251 the subject matter of any one or more of Examples 247-250, wherein the set of metrics includes a count of key values for all keys stored in the kvset.
- Example 252 the subject matter of any one or more of Examples 247-251, wherein the set of metrics includes an amount of unreferenced data in value-blocks of the kvset.
- Example 253 the subject matter of any one or more of Examples 207-252 optionally include means for receiving a search request including a search key; means for traversing the tree until at least one of the entire tree is traversed or a first instance of the search key is found in a kvset of a node of the tree, traversing the tree including: beginning at a root-node of the tree; and for each node being traversed: examining kvsets of the node from newest kvset to oldest kvset, returning a found indication and ceasing the traversal when the search key is found; and continuing the traversal to a child node when the search key is not found, the child node existing and identified by a spill value derived from the search key and a tree-level of the node being traversed.
- Example 254 the subject matter of Example 253, wherein the found indication includes a value corresponding to a key-entry of the search key in an examined kvset.
- Example 255 the subject matter of any one or more of Examples 253-254 optionally include means for returning a not found indication when the search key is not found after the traversal has ended.
- Example 256 the subject matter of Example 255, wherein the found indication is the same as the not found indication when the key-entry includes a tombstone.
- Example 257 the subject matter of any one or more of Examples 253-256, wherein examining the kvsets includes, for a given kvset, using a bloom filter of the kvset to determine whether the search key might be in the kvset.
- Example 258 the subject matter of any one or more of Examples 253-257, wherein examining the kvsets includes, for a given kvset, determining that the search key is less than or equal to a maximum key value of the kvset.
- Example 259 the subject matter of any one or more of Examples 253-258, wherein examining the kvsets includes, for a given kvset, determining that the search key is greater than or equal to a minimum key value of the kvset.
- Example 260 the subject matter of any one or more of Examples 207-259 optionally include means for receiving a scan request including a key criterion; means for collecting keys specified by the key criterion from each kvset of a node set from the tree into a found set; means for reducing the found set to a result set by keeping key-value pairs that correspond to a most recent entry for a key that is not a tombstone; and means for returning the result set.
- Example 261 the subject matter of Example 260, wherein the node set includes every node in the tree.
- Example 262 the subject matter of any one or more of Examples 260-261, wherein the criterion is a key prefix, and wherein the node-set includes each node that corresponds to the key prefix.
- Example 263 the subject matter of Example 262, wherein node correspondence to the key prefix is determined by a portion of a spill value derived from the key prefix, the portion of the spill value determined by a tree-level of a given node.
- Example 264 the subject matter of any one or more of Examples 260-263, wherein the criterion is a range.
- the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.”
- the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
-
- DID is a unique device identifier for a storage device.
- SID is a stream identifier for a stream on a given storage device.
- TEMPSET is a finite set of temperature values.
- TEMP is an element of TEMPSET.
- FID is a unique forest identifier for a collection of KVS trees.
- TID is a unique tree identifier for a KVS tree. The KVS tree 100 has a TID.
- LNUM is a level number in a given KVS tree, where, for convenience, the root node of the KVS tree is considered to be at tree-level 0, the child nodes of the root node (if any) are considered to be at tree-level 1, and so on. Thus, as illustrated, KVS tree 100 includes tree-levels L0 (including node 110) through L3.
- NNUM is a number for a given node at a given level in a given KVS tree, where, for convenience, NNUM may be a number in the range zero through (NodeCount(LNUM)−1), where NodeCount(LNUM) is the total number of nodes at a tree-level LNUM, such that every node in the KVS tree 100 is uniquely identified by the tuple (LNUM, NNUM). As illustrated in FIG. 1, the complete listing of node tuples, starting at node 110 and progressing top-to-bottom, left-to-right, would be:
- L0 (root): (0,0)
- L1: (1,0), (1,1), (1,2), (1,3), (1,4)
- L2: (2,0), (2,1), (2,2), (2,3)
- L3: (3,0), (3,1), (3,2), (3,3)
- KVSETID is a unique kvset identifier.
- WTYPE is the value KBLOCK or VBLOCK as discussed below.
- WLAST is a Boolean value (TRUE or FALSE) as discussed below.
-
- A) KVSETID of the kvset being written;
- B) DID for the storage device;
- C) FID for the forest to which the KVS tree belongs;
- D) TID for the KVS tree;
- E) LNUM of the node in the KVS tree containing the kvset;
- F) NNUM of the node in the KVS tree containing the kvset;
- G) WTYPE is KBLOCK if the write command is for a key-block for KVSETID on DID, or is VBLOCK if the write command is for a value-block for KVSETID on DID; and
- H) WLAST is TRUE if the write command is the last for a KVSETID on DID, and is FALSE otherwise.
In an example, for each such write command, the tuple (DID, FID, TID, LNUM, NNUM, KVSETID, WTYPE, WLAST)—referred to as a stream-mapping tuple—may be sent to the stream-mapping circuits 230. The stream-mapping circuits 230 may then respond with the stream ID for the storage subsystem 225 to use with the write command 250.
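The stream-mapping flow above can be sketched in Python. The `StreamMapper` class and its internal SID-assignment policy are hypothetical stand-ins for the stream-mapping circuits 230; the example assumes, purely for illustration, an SSCOPE of (TID, LNUM, NNUM) so that writes to the same node on the same device share a stream.

```python
from collections import namedtuple

# Fields follow the stream-mapping tuple in the text:
# (DID, FID, TID, LNUM, NNUM, KVSETID, WTYPE, WLAST)
StreamMappingTuple = namedtuple(
    "StreamMappingTuple",
    ["did", "fid", "tid", "lnum", "nnum", "kvsetid", "wtype", "wlast"])

class StreamMapper:
    """Hypothetical stand-in for the stream-mapping circuits 230."""
    def __init__(self):
        self._assigned = {}   # stream scope -> SID
        self._next_sid = 0

    def stream_id(self, smt: StreamMappingTuple) -> int:
        # Assumed SSCOPE of (TID, LNUM, NNUM) on a given device: writes to
        # the same tree node on the same device reuse one stream.
        sscope = (smt.did, smt.tid, smt.lnum, smt.nnum)
        if sscope not in self._assigned:
            self._assigned[sscope] = self._next_sid
            self._next_sid += 1
        return self._assigned[sscope]

mapper = StreamMapper()
smt = StreamMappingTuple(did=1, fid=7, tid=42, lnum=0, nnum=0,
                         kvsetid=1001, wtype="KBLOCK", wlast=False)
sid = mapper.stream_id(smt)
```

Repeated writes for the same node map to the same stream ID, while a write targeting a different node (for example, a different LNUM) receives a different one.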
-
- A) TSCOPE computed as (FID, TID, LNUM);
- B) TSCOPE computed as (LNUM);
- C) TSCOPE computed as (TID);
- D) TSCOPE computed as (TID, LNUM); or
- E) TSCOPE computed as (TID, LNUM, NNUM).
-
- A) SSCOPE computed as (FID, TID, LNUM, NNUM)
- B) SSCOPE computed as (KVSETID)
- C) SSCOPE computed as (TID)
- D) SSCOPE computed as (TID, LNUM)
- E) SSCOPE computed as (TID, LNUM, NNUM)
- F) SSCOPE computed as (LNUM)
- A) Obtain the number of streams available on DID, referred to as SCOUNT;
- B) Obtain a unique SID for each of the SCOUNT streams on DID; and
- C) For each value TEMP in TEMPSET:
- a) Compute how many of the SCOUNT streams to use for data classified by TEMP in accordance with the configured determiner for TEMP, referred to as TCOUNT; and
- b) Select TCOUNT SIDS for DID not yet entered in the A-SID table 240 and, for each selected TCOUNT SID for DID, create one entry (e.g., row) in A-SID table 240 for (DID, TEMP, SID).
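Steps A-C above can be sketched as follows. The function name and the shape of the configured determiner (`tcount_for`) are assumptions for illustration; the example populates A-SID table rows of the form (DID, TEMP, SID).

```python
def init_a_sid_table(did, scount, sids, tempset, tcount_for):
    """Build A-SID table rows (DID, TEMP, SID) per steps A-C above.

    tcount_for(temp, scount) plays the role of the configured determiner:
    how many of the SCOUNT streams to use for data classified by TEMP.
    """
    assert len(sids) == scount  # one unique SID per available stream on DID
    table = []
    unused = list(sids)         # SIDs for DID not yet entered in the table
    for temp in tempset:
        tcount = tcount_for(temp, scount)
        for _ in range(tcount):
            # Step C-b: create one entry per selected, not-yet-used SID.
            table.append((did, temp, unused.pop(0)))
    return table

# Hypothetical determiner: a quarter of the streams for Hot, the rest Cold.
rows = init_a_sid_table(
    did=1, scount=8, sids=list(range(8)),
    tempset=("Hot", "Cold"),
    tcount_for=lambda temp, n: n // 4 if temp == "Hot" else n - n // 4)
```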
-
- A) Temperature values {Hot, Cold}, with H streams on a given storage device used for data classified as Hot, and C streams on a given storage device used for data classified as Cold.
- B) A temperature assignment method configured with TSCOPE computed as (LNUM) whereby data written to L0 in any KVS tree is assigned a temperature value of Hot, and data written to L1 or greater in any KVS tree is assigned a temperature value of Cold.
- C) An LRU stream assignment method configured with SSCOPE computed as (TID, LNUM).
In this case, the total number of concurrent ingest and compaction operations—operations producing a write—for all KVS trees follows these conditions: concurrent ingest operations for all KVS trees is at most H—because the data for all ingest operations is written to level 0 in a KVS tree and hence will be classified as Hot—and concurrent compaction operations for all KVS trees is at most C—because the data for all spill compactions, and the majority of other compaction operations, is written to level 1 or greater and hence will be classified as Cold.
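The temperature assignment method configured in item B above, with TSCOPE computed as (LNUM), reduces to a one-line classifier; the function name is illustrative only.

```python
def temperature(lnum: int) -> str:
    # TSCOPE computed as (LNUM): ingest writes land in L0 and are Hot;
    # compaction writes land in L1 or greater and are Cold.
    return "Hot" if lnum == 0 else "Cold"
```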
-
- A) All keys in the subtree rooted at an edge key K's child node are less than or equal to K.
- B) The maximum key in any tree or subtree is the right-most entry in the right-most leaf node.
- C) Given a node N with a right-most edge that points to child R, all keys in the subtree rooted at node R are greater than all keys in node N.
-
- A) Level 0: spill value bits 0 through (E(0)−1) specify the child node number for key K;
- B) Level 1: spill value bits E(0) through (E(0)+E(1)−1) specify the child node number for key K; and
- C) Level L (L>1): spill value bits sum(E(0), . . . , E(L−1)) through (sum(E(0), . . . , E(L))−1) specify the child node number for key K.
Level | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Child node count | 2 | 8 | 4 | 16 | 32 | 2 |
Spill value bits | 0 | 1-3 | 4-5 | 6-9 | 10-14 | 15 |
Key K spill value | 0 | 110 | 01 | 1110 | 10001 | 1 |
Child node selected | 0 | 6 | 1 | 14 | 17 | 1 |
Where:
- Level is a level number in the KVS tree;
- Child node count is the number of child nodes configured for all nodes at the specified level;
- Spill value bits is the spill value bit numbers that spill compaction uses for key distribution at the specified level;
- Key K spill value is the binary representation of the given 16-bit spill value for the given key K, specifically 0110011110100011—for clarity, the spill value is segmented into the bits that spill compaction uses for key distribution at the specified level; and
- Child node selected is the child node number that spill compaction selects for any (non-obsolete) key-value pair or tombstone with the given spill value—this includes all (non-obsolete) key-value pairs or tombstones with the given key K, as well as other keys different from key K that may have the same spill value.
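The per-level bit-slicing rule above can be reproduced with a short sketch. It assumes the child node count at each level is a power of two (so E(L) = log2 of the count), takes the spill value as a bit string with bit 0 leftmost, and checks against the worked 16-bit example 0110011110100011.

```python
def child_node(spill_bits: str, level: int, child_counts: list) -> int:
    # E(L) = number of spill-value bits used at level L = log2(child count),
    # assuming each configured child node count is a power of two.
    widths = [c.bit_length() - 1 for c in child_counts]
    start = sum(widths[:level])            # sum(E(0), ..., E(level-1))
    return int(spill_bits[start:start + widths[level]], 2)

spill = "0110011110100011"                 # 16-bit spill value for key K
counts = [2, 8, 4, 16, 32, 2]              # child node count per level
selected = [child_node(spill, lv, counts) for lv in range(6)]
# selected matches the table: [0, 6, 1, 14, 17, 1]
```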
-
- A) Prioritize applying garbage collection operations to those nodes with the most garbage, in particular garbage collection operations that physically remove obsolete key-value pairs and tombstones such as key-value compaction, spill compaction, and hoist compaction. Prioritizing garbage collection operations in this manner increases their efficiency and reduces associated write-amplification; or
- B) Estimate the number of valid key-value pairs and number of obsolete key-value pairs in the KVS tree, and the amount of storage capacity consumed by each category. Such estimates are useful in reporting capacity utilization for the KVS tree.
In some cases it is advantageous to directly compute garbage metrics for a given node in a KVS tree, whereas in other cases it is advantageous to estimate them. Hence techniques for both computing and estimating garbage metrics are described below.
-
- A) Number of key-value pairs
- B) Number of key tombstones
- C) Capacity needed to store all keys for key-value pairs and tombstones
- D) Capacity needed to store all values for key-value pairs
- E) Key size statistics including minimum, maximum, median, and mean
- F) Value size statistics including minimum, maximum, median, and mean
- G) Count of, and capacity consumed by, unreferenced values if the kvset is the result of a key compaction.
- H) Minimum and maximum time-to-live (TTL) value for any key-value pair. A KVS tree may allow the user to specify a TTL value when storing a key-value pair, and the key-value pair will be removed during a compaction operation if its lifetime is exceeded.
-
- A) The count of unreferenced values in the kvset
- B) The bytes of unreferenced values in the kvset
-
- A) The count of unreferenced values in the node
- B) The bytes of unreferenced values in the node
It is clear that, if every kvset in a given node is the result of a key compaction operation, then the key compaction garbage metrics for the node are the sum of the like key compaction garbage metrics from each of the individual kvsets in the node.
-
- A) T=the number of kvsets in the given node
- B) S(j)=a kvset in the given node, where S(1) is the oldest kvset and S(T) is the newest
- C) KVcnt(S(j))=number of key-value pairs in S(j)
- D) NKVcnt=sum(KVcnt(S(j))) for j in range one through T
- E) Kcap(S(j))=capacity needed to store all keys for S(j) in bytes
- F) NKcap=sum(Kcap(S(j))) for j in range one through T
- G) Vcap(S(j))=capacity needed to store all values for S(j) in bytes
- H) NVcap=sum(Vcap(S(j))) for j in range one through T
- I) NKVcap=NKcap+NVcap
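The node-level parameters D, F, H, and I above are straight sums over the kvset statistics. As a sketch (the dictionary keys are illustrative, mirroring KVcnt, Kcap, and Vcap):

```python
def node_metrics(kvsets):
    """Aggregate per-kvset statistics S(1)..S(T) into node parameters.

    Each kvset is represented here as a dict with 'kvcnt', 'kcap', and
    'vcap' entries, standing in for KVcnt(S(j)), Kcap(S(j)), Vcap(S(j)).
    """
    nkvcnt = sum(s["kvcnt"] for s in kvsets)   # NKVcnt
    nkcap = sum(s["kcap"] for s in kvsets)     # NKcap, in bytes
    nvcap = sum(s["vcap"] for s in kvsets)     # NVcap, in bytes
    return {"NKVcnt": nkvcnt, "NKcap": nkcap,
            "NVcap": nvcap, "NKVcap": nkcap + nvcap}

m = node_metrics([{"kvcnt": 10, "kcap": 100, "vcap": 900},
                  {"kvcnt": 5, "kcap": 50, "vcap": 450}])
```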
-
- A) Simple, cumulative, or weighted moving averages of the fraction of obsolete key-value pairs in prior executions of garbage collection operations in the given node; or
- B) Simple, cumulative, or weighted moving averages of the fraction of obsolete key-value pairs in prior executions of garbage collection operations in any node at the same level of the KVS tree as the given node.
In the above examples, garbage collection operations include, but are not limited to, key compaction, key-value compaction, spill compaction, or hoist compaction. Given a node in a KVS tree, historical garbage collection information and kvset statistics provide the information to generate estimated garbage metrics for the node.
-
- A) NKVcnt*NSMA(E) count of obsolete key-value pairs in the node;
- B) NKVcap*NSMA(E) bytes of obsolete key-value data in the node;
- C) NKVcnt−(NKVcnt*NSMA(E)) count of valid key-value pairs in the node; or
- D) NKVcap−(NKVcap*NSMA(E)) bytes of valid key-value data in the node.
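Formulas A-D above can be collected into one helper. This is a sketch only; `nsma_e` stands for the node's moving average NSMA(E) of the fraction of obsolete key-value pairs.

```python
def estimated_garbage(nkvcnt, nkvcap, nsma_e):
    """Estimated garbage metrics for a node, per items A-D above.

    nkvcnt: NKVcnt, count of key-value pairs in the node
    nkvcap: NKVcap, bytes of key-value data in the node
    nsma_e: NSMA(E), moving-average fraction of obsolete key-value pairs
    """
    obs_cnt = nkvcnt * nsma_e          # A) obsolete key-value pairs
    obs_cap = nkvcap * nsma_e          # B) obsolete key-value bytes
    return {"obsolete_count": obs_cnt,
            "obsolete_bytes": obs_cap,
            "valid_count": nkvcnt - obs_cnt,   # C
            "valid_bytes": nkvcap - obs_cap}   # D

est = estimated_garbage(nkvcnt=1000, nkvcap=4096, nsma_e=0.25)
```

The same shape applies with LSMA(E) substituted for NSMA(E) when the level-wide moving average is used instead.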
-
- A) NKVcnt*LSMA(E) count of obsolete key-value pairs in the node;
- B) NKVcap*LSMA(E) bytes of obsolete key-value data in the node;
- C) NKVcnt−(NKVcnt*LSMA(E)) count of valid key-value pairs in the node; or
- D) NKVcap−(NKVcap*LSMA(E)) bytes of valid key-value data in the node.
-
- A) NKVcnt−NBEC count of obsolete key-value pairs in the node;
- B) NKVcap*Fobs bytes of obsolete key-value data in the node;
- C) NBEC count of valid key-value pairs in the node; or
- D) NKVcap−(NKVcap*Fobs) bytes of valid key-value data in the node.
-
- A) KGMOcnt=an estimate of the count of obsolete key-value pairs in the W kvsets+the sum of the count of unreferenced values from each of the W kvsets;
- B) KGMOcap=an estimate of the bytes of obsolete key-value data in the W kvsets+the sum of the bytes of unreferenced values from each of the W kvsets;
- C) KGMVcnt=an estimate of the count of valid key-value pairs in the W kvsets; and
- D) KGMVcap=an estimate of the bytes of valid key-value data in the W kvsets.
- Where the estimated garbage metrics may be generated using one of the techniques discussed above under the assumption that the W kvsets are the only kvsets in the node.
-
- A) EGMOcnt=an estimate of the count of obsolete (garbage) key-value pairs in the (T−W) kvsets;
- B) EGMOcap=an estimate of the bytes of obsolete (garbage) key-value data in the (T−W) kvsets;
- C) EGMVcnt=an estimate of the count of valid key-value pairs in the (T−W) kvsets; and
- D) EGMVcap=an estimate of the bytes of valid key-value data in the (T−W) kvsets.
Where these estimated garbage metrics may be generated using one of the techniques discussed above under the assumption that the (T−W) kvsets are the only kvsets in the node. Given these parameters, the hybrid garbage metrics for the given node may include:
- A) KGMOcnt+EGMOcnt count of obsolete key-value pairs in the node;
- B) KGMOcap+EGMOcap bytes of obsolete key-value data in the node;
- C) KGMVcnt+EGMVcnt count of valid key-value pairs in the node; or
- D) KGMVcap+EGMVcap bytes of valid key-value data in the node.
-
- A) If the retention criterion is W key-value pairs in the KVS tree, and the retention increment is 0.10*W key-value pairs, then key compaction is executed if two or more consecutive kvsets (the merge set) have a combined 0.10*W count of key-value pairs;
- B) If the retention criterion is X bytes of key-value data in the KVS tree, and the retention increment is 0.20*X bytes of key-value data, then key compaction is executed if two or more consecutive kvsets (the merge set) have a combined 0.20*X bytes of key-value data; or
- C) If the retention criterion is Y days of key-value data in the KVS tree, and the retention increment is 0.15*Y days of key-value data, then key compaction is executed if two or more consecutive kvsets (the merge set) have a combined 0.15*Y days of key-value data.
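The trigger in examples A-C above—run key compaction once two or more consecutive kvsets together reach the retention increment—can be sketched as a search for a qualifying merge set. The function name and return convention are illustrative.

```python
def find_merge_set(kvset_counts, increment):
    """Return (start, end) indices of the first run of two or more
    consecutive kvsets whose combined count (key-value pairs, bytes, or
    days, per the configured retention criterion) reaches the retention
    increment, or None if no such merge set exists."""
    n = len(kvset_counts)
    for start in range(n - 1):
        total = kvset_counts[start]
        for end in range(start + 1, n):
            total += kvset_counts[end]
            if total >= increment:
                return (start, end)
    return None

# E.g. retention criterion W with increment 0.10*W = 50 key-value pairs:
merge = find_merge_set([5, 5, 30, 40], 50)
```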
Claims (45)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/428,877 US10725988B2 (en) | 2017-02-09 | 2017-02-09 | KVS tree |
CN201880011122.7A CN110268394B (en) | 2017-02-09 | 2018-02-05 | Method, system and machine readable storage medium for storing and manipulating key value data |
PCT/US2018/016892 WO2018148149A1 (en) | 2017-02-09 | 2018-02-05 | Kvs tree |
KR1020197026327A KR102266756B1 (en) | 2017-02-09 | 2018-02-05 | KVS tree |
TW107104545A TWI682274B (en) | 2017-02-09 | 2018-02-08 | Key-value store tree |
US16/856,920 US20200257669A1 (en) | 2017-02-09 | 2020-04-23 | Kvs tree |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/428,877 US10725988B2 (en) | 2017-02-09 | 2017-02-09 | KVS tree |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/856,920 Continuation US20200257669A1 (en) | 2017-02-09 | 2020-04-23 | Kvs tree |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180225315A1 US20180225315A1 (en) | 2018-08-09 |
US10725988B2 true US10725988B2 (en) | 2020-07-28 |
Family
ID=63037817
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/428,877 Active 2037-11-19 US10725988B2 (en) | 2017-02-09 | 2017-02-09 | KVS tree |
US16/856,920 Abandoned US20200257669A1 (en) | 2017-02-09 | 2020-04-23 | Kvs tree |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/856,920 Abandoned US20200257669A1 (en) | 2017-02-09 | 2020-04-23 | Kvs tree |
Country Status (5)
Country | Link |
---|---|
US (2) | US10725988B2 (en) |
KR (1) | KR102266756B1 (en) |
CN (1) | CN110268394B (en) |
TW (1) | TWI682274B (en) |
WO (1) | WO2018148149A1 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10915546B2 (en) | 2018-10-10 | 2021-02-09 | Micron Technology, Inc. | Counter-based compaction of key-value store tree data block |
US10931651B2 (en) * | 2017-11-21 | 2021-02-23 | Advanced New Technologies Co., Ltd. | Key management |
US10936661B2 (en) | 2018-12-26 | 2021-03-02 | Micron Technology, Inc. | Data tree with order-based node traversal |
US11048755B2 (en) | 2018-12-14 | 2021-06-29 | Micron Technology, Inc. | Key-value store tree with selective use of key portion |
US11100071B2 (en) | 2018-10-10 | 2021-08-24 | Micron Technology, Inc. | Key-value store tree data block spill with compaction |
US11269885B2 (en) * | 2018-01-30 | 2022-03-08 | Salesforce.Com, Inc. | Cache for efficient record lookups in an LSM data structure |
US11334270B2 (en) | 2018-12-14 | 2022-05-17 | Micron Technology, Inc. | Key-value store using journaling with selective data storage format |
US11379431B2 (en) * | 2019-12-09 | 2022-07-05 | Microsoft Technology Licensing, Llc | Write optimization in transactional data management systems |
US11461047B2 (en) * | 2019-12-13 | 2022-10-04 | Samsung Electronics Co., Ltd. | Key-value storage device and operating method |
US20230176758A1 (en) * | 2021-12-03 | 2023-06-08 | Samsung Electronics Co., Ltd. | Two-level indexing for key-value persistent storage device |
US11741073B2 (en) | 2021-06-01 | 2023-08-29 | Alibaba Singapore Holding Private Limited | Granularly timestamped concurrency control for key-value store |
US11755427B2 (en) | 2021-06-01 | 2023-09-12 | Alibaba Singapore Holding Private Limited | Fast recovery and replication of key-value stores |
US11829291B2 (en) | 2021-06-01 | 2023-11-28 | Alibaba Singapore Holding Private Limited | Garbage collection of tree structure with page mappings |
US11892951B2 (en) | 2021-07-16 | 2024-02-06 | Samsung Electronics Co., Ltd | Key packing for flash key value store operations |
Families Citing this family (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10706106B2 (en) | 2017-02-09 | 2020-07-07 | Micron Technology, Inc. | Merge tree modifications for maintenance operations |
US10706105B2 (en) | 2017-02-09 | 2020-07-07 | Micron Technology, Inc. | Merge tree garbage metrics |
US10719495B2 (en) | 2017-02-09 | 2020-07-21 | Micron Technology, Inc. | Stream selection for multi-stream storage devices |
US10725988B2 (en) | 2017-02-09 | 2020-07-28 | Micron Technology, Inc. | KVS tree |
US10187264B1 (en) * | 2017-02-14 | 2019-01-22 | Intuit Inc. | Gateway path variable detection for metric collection |
WO2018200475A1 (en) * | 2017-04-24 | 2018-11-01 | Reniac, Inc. | System and method to accelerate compaction |
US10824610B2 (en) * | 2018-09-18 | 2020-11-03 | Vmware, Inc. | Balancing write amplification and space amplification in buffer trees |
US11099771B2 (en) * | 2018-09-24 | 2021-08-24 | Salesforce.Com, Inc. | System and method for early removal of tombstone records in database |
CN109617669B (en) * | 2018-11-30 | 2022-02-22 | 广州高清视信数码科技股份有限公司 | Authentication method of set top box remote controller based on hash algorithm and terminal equipment |
US11113270B2 (en) | 2019-01-24 | 2021-09-07 | EMC IP Holding Company LLC | Storing a non-ordered associative array of pairs using an append-only storage medium |
CN110032549B (en) | 2019-01-28 | 2023-10-20 | 北京奥星贝斯科技有限公司 | Partition splitting method, partition splitting device, electronic equipment and readable storage medium |
EP3731109B1 (en) * | 2019-04-26 | 2022-07-06 | Datadobi cvba | Versioned backup on object addressable storage system |
KR102714982B1 (en) * | 2019-07-05 | 2024-10-10 | 삼성전자주식회사 | Storage device storing data based on key-value and operating method of the same |
EP3767486B1 (en) * | 2019-07-19 | 2023-03-22 | Microsoft Technology Licensing, LLC | Multi-record index structure for key-value stores |
US11216434B2 (en) | 2019-09-25 | 2022-01-04 | Atlassian Pty Ltd. | Systems and methods for performing tree-structured dataset operations |
CN111026329B (en) * | 2019-11-18 | 2021-04-20 | 华中科技大学 | Key value storage system based on host management tile record disk and data processing method |
KR20210063862A (en) | 2019-11-25 | 2021-06-02 | 에스케이하이닉스 주식회사 | Key-value storage and a system including the same |
CN111295650B (en) * | 2019-12-05 | 2023-05-16 | 支付宝(杭州)信息技术有限公司 | Performing mapping iterations in a blockchain-based system |
WO2020098819A2 (en) * | 2019-12-05 | 2020-05-22 | Alipay (Hangzhou) Information Technology Co., Ltd. | Performing map iterations in a blockchain-based system |
CN111399777B (en) * | 2020-03-16 | 2023-05-16 | 平凯星辰(北京)科技有限公司 | Differential key value data storage method based on data value classification |
CN111400320B (en) * | 2020-03-18 | 2023-06-20 | 百度在线网络技术(北京)有限公司 | Method and device for generating information |
US11599546B2 (en) * | 2020-05-01 | 2023-03-07 | EMC IP Holding Company LLC | Stream browser for data streams |
US11604759B2 (en) | 2020-05-01 | 2023-03-14 | EMC IP Holding Company LLC | Retention management for data streams |
US11340834B2 (en) | 2020-05-22 | 2022-05-24 | EMC IP Holding Company LLC | Scaling of an ordered event stream |
US11921683B2 (en) | 2020-06-08 | 2024-03-05 | Paypal, Inc. | Use of time to live value during database compaction |
CN111444196B (en) * | 2020-06-12 | 2020-10-16 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for generating Hash of global state in block chain type account book |
US11360992B2 (en) | 2020-06-29 | 2022-06-14 | EMC IP Holding Company LLC | Watermarking of events of an ordered event stream |
US11461299B2 (en) | 2020-06-30 | 2022-10-04 | Hewlett Packard Enterprise Development Lp | Key-value index with node buffers |
US11556513B2 (en) | 2020-06-30 | 2023-01-17 | Hewlett Packard Enterprise Development Lp | Generating snapshots of a key-value index |
US11340792B2 (en) | 2020-07-30 | 2022-05-24 | EMC IP Holding Company LLC | Ordered event stream merging |
US11599420B2 (en) | 2020-07-30 | 2023-03-07 | EMC IP Holding Company LLC | Ordered event stream event retention |
US11354444B2 (en) | 2020-09-30 | 2022-06-07 | EMC IP Holding Company LLC | Access control for an ordered event stream storage system |
US11513871B2 (en) | 2020-09-30 | 2022-11-29 | EMC IP Holding Company LLC | Employing triggered retention in an ordered event stream storage system |
US11461240B2 (en) | 2020-10-01 | 2022-10-04 | Hewlett Packard Enterprise Development Lp | Metadata cache for storing manifest portion |
US11755555B2 (en) | 2020-10-06 | 2023-09-12 | EMC IP Holding Company LLC | Storing an ordered associative array of pairs using an append-only storage medium |
US11599293B2 (en) | 2020-10-14 | 2023-03-07 | EMC IP Holding Company LLC | Consistent data stream replication and reconstruction in a streaming data storage platform |
US11354054B2 (en) | 2020-10-28 | 2022-06-07 | EMC IP Holding Company LLC | Compaction via an event reference in an ordered event stream storage system |
CN112235324B (en) * | 2020-12-14 | 2021-03-02 | 杭州字节信息技术有限公司 | Key management system, updating method and reading method based on KeyStore key tree |
US11347568B1 (en) | 2020-12-18 | 2022-05-31 | EMC IP Holding Company LLC | Conditional appends in an ordered event stream storage system |
US11816065B2 (en) | 2021-01-11 | 2023-11-14 | EMC IP Holding Company LLC | Event level retention management for data streams |
US11526297B2 (en) | 2021-01-19 | 2022-12-13 | EMC IP Holding Company LLC | Framed event access in an ordered event stream storage system |
US12099513B2 (en) | 2021-01-19 | 2024-09-24 | EMC IP Holding Company LLC | Ordered event stream event annulment in an ordered event stream storage system |
US11740828B2 (en) | 2021-04-06 | 2023-08-29 | EMC IP Holding Company LLC | Data expiration for stream storages |
US12001881B2 (en) | 2021-04-12 | 2024-06-04 | EMC IP Holding Company LLC | Event prioritization for an ordered event stream |
US11954537B2 (en) | 2021-04-22 | 2024-04-09 | EMC IP Holding Company LLC | Information-unit based scaling of an ordered event stream |
US11513714B2 (en) | 2021-04-22 | 2022-11-29 | EMC IP Holding Company LLC | Migration of legacy data into an ordered event stream |
US20240256511A1 (en) * | 2021-06-01 | 2024-08-01 | Speedb Ltd. | Lsm hybrid compaction |
US11681460B2 (en) | 2021-06-03 | 2023-06-20 | EMC IP Holding Company LLC | Scaling of an ordered event stream based on a writer group characteristic |
US11735282B2 (en) | 2021-07-22 | 2023-08-22 | EMC IP Holding Company LLC | Test data verification for an ordered event stream storage system |
US11853577B2 (en) | 2021-09-28 | 2023-12-26 | Hewlett Packard Enterprise Development Lp | Tree structure node compaction prioritization |
US11971850B2 (en) | 2021-10-15 | 2024-04-30 | EMC IP Holding Company LLC | Demoted data retention via a tiered ordered event stream data storage system |
KR20230070718A (en) | 2021-11-15 | 2023-05-23 | 에스케이하이닉스 주식회사 | Key-value storage identifying tenants and operating method thereof |
CN114416752B (en) * | 2022-03-31 | 2022-07-15 | 南京得瑞芯存科技有限公司 | Data processing method and device of KV SSD |
Citations (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5204958A (en) * | 1991-06-27 | 1993-04-20 | Digital Equipment Corporation | System and method for efficiently indexing and storing a large database with high data insertion frequency |
US5530850A (en) | 1993-10-25 | 1996-06-25 | International Business Machines Corporation | Data storage library array with log-structured file system which allows simultaneous write and garbage collection |
US6597957B1 (en) | 1999-12-20 | 2003-07-22 | Cisco Technology, Inc. | System and method for consolidating and sorting event data |
TW200421114A (en) | 2002-12-19 | 2004-10-16 | Ibm | Method and apparatus for building one or more indexes on data concurrent with manipulation of data |
US20080016066A1 (en) | 2006-06-30 | 2008-01-17 | Tele Atlas North America, Inc. | Adaptive index with variable compression |
TW200822066A (en) | 2006-05-10 | 2008-05-16 | Nero Ag | Apparatus for writing data to a medium |
TW200836084A (en) | 2007-01-05 | 2008-09-01 | Microsoft Corp | Optimizing execution of HD-DVD timing markup |
CN101515298A (en) | 2009-03-30 | 2009-08-26 | 华为技术有限公司 | Inserting method based on tree-shaped data structure node and storing device |
US20120072656A1 (en) | 2010-06-11 | 2012-03-22 | Shrikar Archak | Multi-tier caching |
US20120223889A1 (en) * | 2009-03-30 | 2012-09-06 | Touchtype Ltd | System and Method for Inputting Text into Small Screen Devices |
KR20130018602A (en) | 2011-08-08 | 2013-02-25 | 가부시끼가이샤 도시바 | Memory system including key-value store |
US20130117524A1 (en) | 2008-08-15 | 2013-05-09 | International Business Machines Corporation | Management of recycling bin for thinly-provisioned logical volumes |
US20130218840A1 (en) | 2012-02-17 | 2013-08-22 | Charles Smith | System and method for building a point-in-time snapshot of an eventually-consistent data store |
TW201342088A (en) | 2012-04-02 | 2013-10-16 | Ind Tech Res Inst | Digital content reordering method and digital content aggregator |
US20130306276A1 (en) | 2007-04-23 | 2013-11-21 | David D Duchesneau | Computing infrastructure |
TW201408070A (en) | 2012-04-04 | 2014-02-16 | Intel Corp | A compressed depth cache |
US20140064490A1 (en) | 2012-08-28 | 2014-03-06 | Samsung Electronics Co., Ltd. | Management of encryption keys for broadcast encryption and transmission of messages using broadcast encryption |
US20140074841A1 (en) | 2012-09-10 | 2014-03-13 | Apple Inc. | Concurrent access methods for tree data structures |
US20140082028A1 (en) * | 2011-06-27 | 2014-03-20 | Amazon Technologies, Inc. | System and method for implementing a scalable data storage service |
US20140222870A1 (en) | 2013-02-06 | 2014-08-07 | Lei Zhang | System, Method, Software, and Data Structure for Key-Value Mapping and Keys Sorting |
US20140279944A1 (en) | 2013-03-15 | 2014-09-18 | University Of Southern California | Sql query to trigger translation for maintaining consistency of cache augmented sql systems |
TWI454166B (en) | 2011-09-15 | 2014-09-21 | Fujitsu Ltd | Information management method and information management apparatus |
US20140344287A1 (en) * | 2013-05-16 | 2014-11-20 | Fujitsu Limited | Database controller, method, and program for managing a distributed data store |
US20150127658A1 (en) | 2013-11-06 | 2015-05-07 | International Business Machines Corporation | Key_value data storage system |
US20150244558A1 (en) | 2011-04-13 | 2015-08-27 | Et International, Inc. | Flowlet-based processing with key/value store checkpointing |
US20150254272A1 (en) | 2014-03-05 | 2015-09-10 | Giorgio Regni | Distributed Consistent Database Implementation Within An Object Store |
US20150286695A1 (en) | 2014-04-03 | 2015-10-08 | Sandisk Enterprise Ip Llc | Methods and Systems for Performing Efficient Snapshots in Tiered Data Structures |
CN105095287A (en) | 2014-05-14 | 2015-11-25 | 华为技术有限公司 | LSM (Log Structured Merge) data compact method and device |
US20150347495A1 (en) | 2012-02-03 | 2015-12-03 | Apple Inc. | Enhanced B-Trees with Record Merging |
US20160173445A1 (en) | 2014-12-15 | 2016-06-16 | Palo Alto Research Center Incorporated | Ccn routing using hardware-assisted hash tables |
US9400816B1 (en) | 2013-02-28 | 2016-07-26 | Google Inc. | System for indexing collections of structured objects that provides strong multiversioning semantics |
US20160275094A1 (en) | 2015-03-17 | 2016-09-22 | Cloudera, Inc. | Compaction policy |
US20170017411A1 (en) | 2015-07-13 | 2017-01-19 | Samsung Electronics Co., Ltd. | Data property-based data placement in a nonvolatile memory device |
US20170141791A1 (en) | 2015-11-16 | 2017-05-18 | International Business Machines Corporation | Compression of javascript object notation data using structure information |
US20170212680A1 (en) * | 2016-01-22 | 2017-07-27 | Suraj Prabhakar WAGHULDE | Adaptive prefix tree based order partitioned data storage system |
US20180067975A1 (en) | 2016-01-29 | 2018-03-08 | Hitachi, Ltd. | Computer system and data processing method |
US20180225316A1 (en) | 2017-02-09 | 2018-08-09 | Micron Technology, Inc. | Stream selection for multi-stream storage devices |
US20180225322A1 (en) | 2017-02-09 | 2018-08-09 | Micron Technology, Inc. | Merge tree modifications for maintenance operations |
US20180225321A1 (en) | 2017-02-09 | 2018-08-09 | Micron Technology, Inc. | Merge tree garbage metrics |
WO2018148149A1 (en) | 2017-02-09 | 2018-08-16 | Micron Technology, Inc | Kvs tree |
US20180253386A1 (en) | 2015-12-28 | 2018-09-06 | Huawei Technologies Co., Ltd. | Data processing method and nvme storage device |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100488414B1 (en) * | 2000-12-30 | 2005-05-11 | 한국전자통신연구원 | Node Structuring Method for multiway search tree, and Searching Method by using it |
US8572036B2 (en) * | 2008-12-18 | 2013-10-29 | Datalight, Incorporated | Method and apparatus for fault-tolerant memory management |
US10558705B2 (en) * | 2010-10-20 | 2020-02-11 | Microsoft Technology Licensing, Llc | Low RAM space, high-throughput persistent key-value store using secondary memory |
US9069827B1 (en) * | 2012-01-17 | 2015-06-30 | Amazon Technologies, Inc. | System and method for adjusting membership of a data replication group |
US20130279503A1 (en) * | 2012-02-17 | 2013-10-24 | Rockstar Consortium Us Lp | Next Hop Computation Functions for Equal Cost Multi-Path Packet Switching Networks |
KR101341507B1 (en) * | 2012-04-13 | 2013-12-13 | 연세대학교 산학협력단 | Modified searching method and apparatus for b+ tree |
GB201210234D0 (en) * | 2012-06-12 | 2012-07-25 | Fujitsu Ltd | Reconciliation of large graph-based data storage |
EP2731061A1 (en) * | 2012-11-07 | 2014-05-14 | Fujitsu Limited | Program, method, and database system for storing descriptions |
US10990288B2 (en) * | 2014-08-01 | 2021-04-27 | Software Ag Usa, Inc. | Systems and/or methods for leveraging in-memory storage in connection with the shuffle phase of MapReduce |
US10061629B2 (en) * | 2015-07-22 | 2018-08-28 | Optumsoft, Inc. | Compact binary event log generation |
US11182365B2 (en) * | 2016-03-21 | 2021-11-23 | Mellanox Technologies Tlv Ltd. | Systems and methods for distributed storage of data across multiple hash tables |
WO2019084465A1 (en) * | 2017-10-27 | 2019-05-02 | Streamsimple, Inc. | Streaming microservices for stream processing applications |
-
2017
- 2017-02-09 US US15/428,877 patent/US10725988B2/en active Active
-
2018
- 2018-02-05 WO PCT/US2018/016892 patent/WO2018148149A1/en active Application Filing
- 2018-02-05 CN CN201880011122.7A patent/CN110268394B/en active Active
- 2018-02-05 KR KR1020197026327A patent/KR102266756B1/en active IP Right Grant
- 2018-02-08 TW TW107104545A patent/TWI682274B/en active
-
2020
- 2020-04-23 US US16/856,920 patent/US20200257669A1/en not_active Abandoned
Patent Citations (50)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5204958A (en) * | 1991-06-27 | 1993-04-20 | Digital Equipment Corporation | System and method for efficiently indexing and storing a large database with high data insertion frequency |
US5530850A (en) | 1993-10-25 | 1996-06-25 | International Business Machines Corporation | Data storage library array with log-structured file system which allows simultaneous write and garbage collection |
US6597957B1 (en) | 1999-12-20 | 2003-07-22 | Cisco Technology, Inc. | System and method for consolidating and sorting event data |
TW200421114A (en) | 2002-12-19 | 2004-10-16 | Ibm | Method and apparatus for building one or more indexes on data concurrent with manipulation of data |
TW200822066A (en) | 2006-05-10 | 2008-05-16 | Nero Ag | Apparatus for writing data to a medium |
US20080016066A1 (en) | 2006-06-30 | 2008-01-17 | Tele Atlas North America, Inc. | Adaptive index with variable compression |
TW200836084A (en) | 2007-01-05 | 2008-09-01 | Microsoft Corp | Optimizing execution of HD-DVD timing markup |
US20130306276A1 (en) | 2007-04-23 | 2013-11-21 | David D Duchesneau | Computing infrastructure |
US20130117524A1 (en) | 2008-08-15 | 2013-05-09 | International Business Machines Corporation | Management of recycling bin for thinly-provisioned logical volumes |
US20120223889A1 (en) * | 2009-03-30 | 2012-09-06 | Touchtype Ltd | System and Method for Inputting Text into Small Screen Devices |
US20100246446A1 (en) * | 2009-03-30 | 2010-09-30 | Wenhua Du | Tree-based node insertion method and memory device |
CN101515298A (en) | 2009-03-30 | 2009-08-26 | 华为技术有限公司 | Inserting method based on tree-shaped data structure node and storing device |
US20120072656A1 (en) | 2010-06-11 | 2012-03-22 | Shrikar Archak | Multi-tier caching |
US20150244558A1 (en) | 2011-04-13 | 2015-08-27 | Et International, Inc. | Flowlet-based processing with key/value store checkpointing |
US20140082028A1 (en) * | 2011-06-27 | 2014-03-20 | Amazon Technologies, Inc. | System and method for implementing a scalable data storage service |
KR20130018602A (en) | 2011-08-08 | 2013-02-25 | 가부시끼가이샤 도시바 | Memory system including key-value store |
TWI454166B (en) | 2011-09-15 | 2014-09-21 | Fujitsu Ltd | Information management method and information management apparatus |
US20150347495A1 (en) | 2012-02-03 | 2015-12-03 | Apple Inc. | Enhanced B-Trees with Record Merging |
US20130218840A1 (en) | 2012-02-17 | 2013-08-22 | Charles Smith | System and method for building a point-in-time snapshot of an eventually-consistent data store |
TW201342088A (en) | 2012-04-02 | 2013-10-16 | Ind Tech Res Inst | Digital content reordering method and digital content aggregator |
TW201408070A (en) | 2012-04-04 | 2014-02-16 | Intel Corp | A compressed depth cache |
US20140064490A1 (en) | 2012-08-28 | 2014-03-06 | Samsung Electronics Co., Ltd. | Management of encryption keys for broadcast encryption and transmission of messages using broadcast encryption |
US20140074841A1 (en) | 2012-09-10 | 2014-03-13 | Apple Inc. | Concurrent access methods for tree data structures |
US20140222870A1 (en) | 2013-02-06 | 2014-08-07 | Lei Zhang | System, Method, Software, and Data Structure for Key-Value Mapping and Keys Sorting |
US9400816B1 (en) | 2013-02-28 | 2016-07-26 | Google Inc. | System for indexing collections of structured objects that provides strong multiversioning semantics |
US20140279944A1 (en) | 2013-03-15 | 2014-09-18 | University Of Southern California | Sql query to trigger translation for maintaining consistency of cache augmented sql systems |
US20140344287A1 (en) * | 2013-05-16 | 2014-11-20 | Fujitsu Limited | Database controller, method, and program for managing a distributed data store |
US20150127658A1 (en) | 2013-11-06 | 2015-05-07 | International Business Machines Corporation | Key_value data storage system |
US20150254272A1 (en) | 2014-03-05 | 2015-09-10 | Giorgio Regni | Distributed Consistent Database Implementation Within An Object Store |
US20150286695A1 (en) | 2014-04-03 | 2015-10-08 | Sandisk Enterprise Ip Llc | Methods and Systems for Performing Efficient Snapshots in Tiered Data Structures |
CN105095287A (en) | 2014-05-14 | 2015-11-25 | 华为技术有限公司 | LSM (Log Structured Merge) data compact method and device |
US20160173445A1 (en) | 2014-12-15 | 2016-06-16 | Palo Alto Research Center Incorporated | Ccn routing using hardware-assisted hash tables |
US20160275094A1 (en) | 2015-03-17 | 2016-09-22 | Cloudera, Inc. | Compaction policy |
US20170017411A1 (en) | 2015-07-13 | 2017-01-19 | Samsung Electronics Co., Ltd. | Data property-based data placement in a nonvolatile memory device |
US20170141791A1 (en) | 2015-11-16 | 2017-05-18 | International Business Machines Corporation | Compression of javascript object notation data using structure information |
US20180253386A1 (en) | 2015-12-28 | 2018-09-06 | Huawei Technologies Co., Ltd. | Data processing method and nvme storage device |
US20170212680A1 (en) * | 2016-01-22 | 2017-07-27 | Suraj Prabhakar WAGHULDE | Adaptive prefix tree based order partitioned data storage system |
US20180067975A1 (en) | 2016-01-29 | 2018-03-08 | Hitachi, Ltd. | Computer system and data processing method |
US20180225316A1 (en) | 2017-02-09 | 2018-08-09 | Micron Technology, Inc. | Stream selection for multi-stream storage devices |
US20180225321A1 (en) | 2017-02-09 | 2018-08-09 | Micron Technology, Inc. | Merge tree garbage metrics |
WO2018148149A1 (en) | 2017-02-09 | 2018-08-16 | Micron Technology, Inc | Kvs tree |
WO2018148198A1 (en) | 2017-02-09 | 2018-08-16 | Micron Technology, Inc | Merge tree modifications for maintenance operations |
WO2018148203A1 (en) | 2017-02-09 | 2018-08-16 | Micron Technology, Inc | Stream selection for multi-stream storage devices |
WO2018148151A1 (en) | 2017-02-09 | 2018-08-16 | Micron Technology, Inc | Merge tree garbage metrics |
US20180225322A1 (en) | 2017-02-09 | 2018-08-09 | Micron Technology, Inc. | Merge tree modifications for maintenance operations |
TW201837720A (en) | 2017-02-09 | 2018-10-16 | 美商美光科技公司 | Stream selection for multi-stream storage devices |
TW201841122A (en) | 2017-02-09 | 2018-11-16 | 美商美光科技公司 | Key-value store tree |
TW201841123A (en) | 2017-02-09 | 2018-11-16 | 美商美光科技公司 | Merge tree modifications for maintenance operations |
TW201842454A (en) | 2017-02-09 | 2018-12-01 | 美商美光科技公司 | Merge tree garbage metrics |
CN110268394A (en) | 2017-02-09 | 2019-09-20 | 美光科技公司 | KVS tree |
Non-Patent Citations (45)
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10931651B2 (en) * | 2017-11-21 | 2021-02-23 | Advanced New Technologies Co., Ltd. | Key management |
US11775524B2 (en) | 2018-01-30 | 2023-10-03 | Salesforce, Inc. | Cache for efficient record lookups in an LSM data structure |
US11269885B2 (en) * | 2018-01-30 | 2022-03-08 | Salesforce.Com, Inc. | Cache for efficient record lookups in an LSM data structure |
US11100071B2 (en) | 2018-10-10 | 2021-08-24 | Micron Technology, Inc. | Key-value store tree data block spill with compaction |
US10915546B2 (en) | 2018-10-10 | 2021-02-09 | Micron Technology, Inc. | Counter-based compaction of key-value store tree data block |
US11599552B2 (en) * | 2018-10-10 | 2023-03-07 | Micron Technology, Inc. | Counter-based compaction of key-value store tree data block |
US11048755B2 (en) | 2018-12-14 | 2021-06-29 | Micron Technology, Inc. | Key-value store tree with selective use of key portion |
US11334270B2 (en) | 2018-12-14 | 2022-05-17 | Micron Technology, Inc. | Key-value store using journaling with selective data storage format |
US10936661B2 (en) | 2018-12-26 | 2021-03-02 | Micron Technology, Inc. | Data tree with order-based node traversal |
US11657092B2 (en) | 2018-12-26 | 2023-05-23 | Micron Technology, Inc. | Data tree with order-based node traversal |
US11379431B2 (en) * | 2019-12-09 | 2022-07-05 | Microsoft Technology Licensing, Llc | Write optimization in transactional data management systems |
US11461047B2 (en) * | 2019-12-13 | 2022-10-04 | Samsung Electronics Co., Ltd. | Key-value storage device and operating method |
US11741073B2 (en) | 2021-06-01 | 2023-08-29 | Alibaba Singapore Holding Private Limited | Granularly timestamped concurrency control for key-value store |
US11755427B2 (en) | 2021-06-01 | 2023-09-12 | Alibaba Singapore Holding Private Limited | Fast recovery and replication of key-value stores |
US11829291B2 (en) | 2021-06-01 | 2023-11-28 | Alibaba Singapore Holding Private Limited | Garbage collection of tree structure with page mappings |
US11892951B2 (en) | 2021-07-16 | 2024-02-06 | Samsung Electronics Co., Ltd | Key packing for flash key value store operations |
US20230176758A1 (en) * | 2021-12-03 | 2023-06-08 | Samsung Electronics Co., Ltd. | Two-level indexing for key-value persistent storage device |
US11954345B2 (en) * | 2021-12-03 | 2024-04-09 | Samsung Electronics Co., Ltd. | Two-level indexing for key-value persistent storage device |
Also Published As
Publication number | Publication date |
---|---|
US20200257669A1 (en) | 2020-08-13 |
US20180225315A1 (en) | 2018-08-09 |
KR102266756B1 (en) | 2021-06-22 |
TWI682274B (en) | 2020-01-11 |
CN110268394B (en) | 2023-10-27 |
KR20190111124A (en) | 2019-10-01 |
WO2018148149A1 (en) | 2018-08-16 |
CN110268394A (en) | 2019-09-20 |
TW201841122A (en) | 2018-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200257669A1 (en) | Kvs tree | |
US20200334295A1 (en) | Merge tree garbage metrics | |
US20200334294A1 (en) | Merge tree modifications for maintenance operations | |
US20200349139A1 (en) | Stream selection for multi-stream storage devices | |
US11238098B2 (en) | Heterogenous key-value sets in tree database | |
WO2017204965A1 (en) | Methods and apparatus to provide group-based row-level security for big data platforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MICRON TECHNOLOGY, INC., IDAHO Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MOYER, STEVEN;TOMLINSON, ALEXANDER;GROVES, JOHN M;AND OTHERS;SIGNING DATES FROM 20170208 TO 20170209;REEL/FRAME:041265/0331 |
|
AS | Assignment |
Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: SUPPLEMENT NO. 4 TO PATENT SECURITY AGREEMENT;ASSIGNOR:MICRON TECHNOLOGY, INC.;REEL/FRAME:042405/0909 Effective date: 20170425 |
|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT, ILLINOIS Free format text: SECURITY INTEREST;ASSIGNORS:MICRON TECHNOLOGY, INC.;MICRON SEMICONDUCTOR PRODUCTS, INC.;REEL/FRAME:047540/0001 Effective date: 20180703
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
AS | Assignment |
Owner name: MICRON TECHNOLOGY, INC., IDAHO Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS COLLATERAL AGENT;REEL/FRAME:050702/0451 Effective date: 20190731 |
|
AS | Assignment |
Owner name: MICRON TECHNOLOGY, INC., IDAHO Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:051028/0001 Effective date: 20190731 Owner name: MICRON SEMICONDUCTOR PRODUCTS, INC., IDAHO Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:051028/0001 Effective date: 20190731 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: AWAITING TC RESP, ISSUE FEE PAYMENT VERIFIED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |