CN107330094B - Bloom filter tree structure for dynamically storing key value pairs and key value pair storage method - Google Patents


Info

Publication number
CN107330094B
Authority
CN
China
Prior art keywords
value
node
bloom filter
key
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710542207.5A
Other languages
Chinese (zh)
Other versions
CN107330094A (en)
Inventor
潘海娜
凌纯清
谢鲲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN201710542207.5A
Publication of CN107330094A
Application granted
Publication of CN107330094B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Bloom filter tree structure for dynamically storing key-value pairs and a key-value pair storage method. The Bloom filter tree structure comprises a complete d-ary tree in which each node is a Bloom filter and each leaf node represents a value. The storage size of each node is half that of its parent node, and the root node comprises d × k different hash functions, that is, d hash groups of k hash functions each. In application fields that generate large amounts of data and require key-value pair queries, such as interactive database queries, resource location in high-speed networks, and computer network monitoring, the invention can greatly reduce set query time, reduce resource consumption, process dynamically arriving data, and adapt to the network environment.

Description

Bloom filter tree structure for dynamically storing key value pairs and key value pair storage method
Technical Field
The invention relates to the field of computer networks and computer system storage, in particular to the application field of interactive query with high performance and high throughput, and specifically relates to a storage structure and a storage method of an expandable bloom tree for key value pairs.
Background
In recent years, with the rapid development of computers, the size of the sets handled in databases, networks and other applications has grown geometrically. Storing and querying key-value pairs (key, value) are common tasks in computer systems, so a corresponding key-value pair storage data structure needs to be designed to support fast key-value pair queries. Key-value pair operations occur frequently in network and storage systems, such as the key-value databases MongoDB and CouchDB. Each unique key placed in a key-value pair storage system corresponds to a value. For example, (3, 5) is a key-value pair with key 3 and value 5; after (3, 5) is stored in the key-value pair storage system, the value 5 can be obtained by querying the key 3.
Designing efficient key-value pair storage and query structures presents a significant challenge. In a layer 2 switch, each MAC address is associated with a unique port. When a frame is to be forwarded, the search engine looks up the frame's destination address in the MAC table, so the problem of mapping a MAC address to a port becomes a key-value pair query problem in which the MAC address is the key and the port number to be found is the value. Since MAC addresses are continuously added to the table, the number of elements is not known in advance. If the key-value pairs are stored in a cell structure, a large amount of space is consumed and looking up the value of a given key takes a long time; if a static Bloom filter structure is used to store key-value pairs, only static data can be processed, which is impractical in real applications. Therefore, how to store this information efficiently and query the corresponding key-value pairs quickly in a high-speed computer network is a challenge.
The Bloom filter is a space-efficient data structure with efficient queries; it can represent a data set compactly and meets current requirements for efficient resource interaction and lookup. It was proposed by B. Bloom in 1970 and is widely used in computer systems to represent huge data sets and improve query efficiency. In essence, a Bloom filter maps the elements of a set into a bit vector using k hash functions. While representing the set efficiently, a Bloom filter has a certain false positive rate on element queries (an element that does not belong to the set is mistakenly judged to belong to it), but it has no false negatives (an element that belongs to the set is never judged not to belong to it), and both query and storage efficiency are high.
However, a conventional Bloom filter only supports membership queries, that is, whether an element exists in a set. If the element is a key, only the query of whether the key exists in the set is supported, and (key, value) operations are not. Because a Bloom filter cannot directly store values, a conventional Bloom filter cannot operate on key-value pairs. To support the basic operations on key-value pairs, the traditional Bloom filter must be improved and a new Bloom filter structure designed.
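The mapping of elements into a bit vector by k hash functions can be illustrated with a minimal sketch. The class name, the salted SHA-256 hash family, and the example key are assumptions made for illustration; the patent does not prescribe them.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits and k hash functions (illustrative only)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, kept simple for readability

    def _positions(self, item: str) -> list[int]:
        # Hypothetical hash family: SHA-256 salted with the function index.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add("00:1a:2b:3c:4d:5e")
print(bf.might_contain("00:1a:2b:3c:4d:5e"))  # True: an inserted element is never missed
```

Note that the filter answers only "possibly present" or "definitely absent"; it has no place to keep a value, which is the limitation the following paragraph addresses.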
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a bloom filter tree structure for dynamically storing key-value pairs and a key-value pair storage method, aiming at the defects of the prior art.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a bloom filter tree structure for dynamically storing key-value pairs, comprising a full d-ary tree; each node of each complete d-ary tree is a bloom filter; each leaf node of each full d-ary tree represents a value; the storage unit size of each node is half of that of the parent node of the node, and the root node comprises d × k different hash functions, that is, the root node comprises d hash groups, and each group comprises k hash functions.
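As a rough check on the halving rule, the sketch below computes the root size that fits a given memory budget. It assumes an idealised layout (level l holds d^l nodes of root/2^l bits each, rounding ignored); the function name is hypothetical and the formula is an inference, not something the patent states. With the embodiment's figures of a 2^20-bit budget and tree height 10, it reproduces the 95325-bit root reported there.

```python
import math
from fractions import Fraction


def root_bits_for_budget(total_bits: int, height: int, d: int = 2) -> int:
    """Largest root size such that all levels 0..height fit in total_bits.

    Assumption: level l has d**l nodes, each root / 2**l bits, so level l
    costs root * (d/2)**l bits in total.
    """
    weight = sum(Fraction(d, 2) ** level for level in range(height + 1))
    return math.floor(total_bits / weight)


# For a binary tree (d = 2) every level costs exactly one root's worth of bits,
# so 11 levels share the budget equally.
print(root_bits_for_budget(2**20, height=10))  # 95325
```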
Correspondingly, the invention also provides a method for storing key value pairs of the bloom filter tree structure, which comprises the following steps:
Insertion operation: when a key-value pair (key, value) needs to be inserted, first check whether the value has already been inserted into the Bloom filter tree; if not, perform the operation of adding a new value described below. If the value exists in the Bloom filter tree, the key-value pair is inserted directly: the leaf node corresponding to the value is looked up to obtain the leaf node position code, which determines a unique path from the root node to the leaf node. The two groups of hash functions of the root node are then calculated, that is, the k hash functions h_{i,1}, h_{i,2}, ..., h_{i,k} are applied to the key to compute h_{i,1}(key), h_{i,2}(key), ..., h_{i,k}(key), where i represents the group number of the selected hash functions and i is 1 or 2; the hash values calculated with i = 1 are stored in array A, and the hash values calculated with i = 2 are stored in array B. Then, according to the first bit of the leaf node code, one of the arrays A and B is selected for insertion at the root node, that is, a right shift by zero bits is performed on array A or B; the first bit of the code also determines the Bloom filter in which the next insertion is performed. According to the second bit of the leaf node code, one of the arrays A and B is selected and shifted right by one bit to obtain the insertion position, and the insertion is performed. The operation continues in this way, inserting the key into the Bloom filter at each layer until the key is inserted into the leaf node corresponding to the value;
Query operation: when the value corresponding to a key needs to be queried, the two groups of hash functions of the root node, h_{i,1}(key), h_{i,2}(key), ..., h_{i,k}(key) (i = 1 or 2), are calculated first, and their values are stored in the two arrays A and B respectively. The position units corresponding to the values of arrays A and B are queried at the root node to find the first bit of the code. According to the code bit obtained, the query moves on to the next node, where arrays A and B are shifted right and the corresponding position units are queried to obtain the next code bit. The operation continues in this way until a leaf node is found; finally a complete code is obtained, and the corresponding value is obtained by decoding the leaf node code;
Adding a new value operation: to add a new value to the Bloom filter tree, two cases are distinguished. If the original Bloom filter tree is a binary tree that is not full, a new leaf node is added directly at the end of the tree, and this leaf node represents the newly added value. If the original Bloom filter tree is a full binary tree, a new Bloom filter is added above the root node; its size is 2 times that of the root node of the original Bloom filter tree, and it becomes the root node of the new Bloom filter tree. The original Bloom filter tree becomes the left subtree of the new root node, and a full binary tree one layer lower than the original Bloom filter tree is created as the right subtree of the new root node. All positions set to 1 in the original root node are shifted left by one bit and inserted into the new root node. At this point the two groups of H3 hash functions of the new root node have one more layer than the original two groups of H3 hash functions, that is, one more row is selected from the base matrix, and new leaf nodes can continue to be added at the end of the Bloom filter tree.
In the insertion operation, the method for selecting the array to be inserted from the root node comprises the following steps: if the code value is 0, the A array is selected, and if the code value is 1, the B array is selected.
In the insertion operation, the method for obtaining the next bloom filter to be inserted according to the first bit encoding of the leaf node comprises the following steps: if the encoded value is 0, the left node is operated, and if the encoded value is 1, the right node is operated.
Compared with the prior art, the invention has the following beneficial effects: in application fields that generate large amounts of data and require key-value pair queries, such as interactive database queries, resource location in high-speed networks, and computer network monitoring, the invention can greatly reduce set query time, reduce resource consumption, process dynamically arriving data, and adapt to the network environment.
Drawings
FIG. 1 is a structural diagram of a Bloom tree, which combines Bloom filters with a binary tree structure; each node is a Bloom filter. The root node has two groups of H3 hash functions [formula image BDA0001342109990000041], each group having k hash functions. A leaf node represents a value, and a unique path from the root node to the leaf node can be obtained from the code of the leaf node. Following the leaf node code, if a code bit is 0, the shift operation is performed on the first group of hash functions; if it is 1, the shift operation is performed on the second group of hash functions. In FIG. 1, [formula image BDA0001342109990000042] is obtained from [formula image BDA0001342109990000043], [formula image BDA0001342109990000044] is obtained from [formula image BDA0001342109990000045], [formula image BDA0001342109990000046] is obtained from [formula image BDA0001342109990000047], and [formula image BDA0001342109990000048] is obtained from [formula image BDA0001342109990000049].
FIG. 2 is a diagram illustrating the operation of adding a new value to an underfilled binary tree. When a value which does not exist in a leaf node needs to be inserted, if the binary tree is not a full binary tree, the leaf node can be directly added at the tail of the tree to indicate the inserted value. The leaf node encoded as 11 in the figure is the newly added value.
FIG. 3 is a diagram illustrating the operation of adding a new value to a full binary tree. When a value that does not exist among the leaf nodes needs to be inserted and the binary tree is full, no leaf node can be added to the original tree, so the tree must be expanded upward by one layer, such as the newly added level 3 in FIG. 3. The original binary tree becomes the left subtree of level 3, and a full binary tree with one layer less than the original binary tree is constructed as the right subtree of level 3. Level 3 also has two groups of H3 hash functions [formula image BDA00013421099900000410]. Of these, [formula image BDA00013421099900000411] is formed from [formula image BDA00013421099900000412]: all the positions set in the root node of the original binary tree are shifted left by one bit and inserted into level 3. [Formula image BDA00013421099900000413] is obtained by selecting one more row from the original base matrix [formula image BDA00013421099900000414]. A new non-full binary tree has now been constructed, and a leaf node can be added directly at the end of the tree to represent the inserted value. The leaf node encoded as 100 in the figure is the newly added value.
FIG. 4 shows details of the data used in the implementation environment, including data source, data time, and packet size. The proposed algorithm is compared with two recent research results. Bloom Tree, a search tree based on Bloom filters for multiple-set membership testing, performs key-value pair queries by combining a tree with the Bloom filter structure; the algorithm uses d groups of hash functions on each Bloom filter node and calculates from the root node until the leaf node corresponding to a value is found, and it can only process static data. COMB is a key-value query algorithm based on Bloom filters proposed in the paper 'Fast dynamic multiple-set membership testing using combinatorial bloom filters' in IEEE/ACM Transactions on Networking, 2012; after encoding each value, the algorithm uses the known number of value groups to choose a suitable number of Bloom filters and the number of Bloom filters into which each key is inserted, and at query time all Bloom filters must be queried to obtain the corresponding code.
Fig. 5(a) to 5(f) compare the average processing time of three algorithms, SBFT, Bloom Tree and COMB, when processing 1024 groups of data with a Bloom filter tree of fixed size m = 2^20 bits under different numbers of hash functions. FIG. 5(a) shows the experimental results of a simulation using the data set MAWI 1, FIG. 5(b) the data set MAWI 2, FIG. 5(c) the data set MAWI 3, FIG. 5(d) the data set ClarkNet-HTTP, FIG. 5(e) the data set UMass, and FIG. 5(f) the data set TKN. All results in fig. 5(a) to 5(f) show that the proposed algorithm, SBFT, has the lowest average processing time on every data set. The proposed algorithm only needs to compute the d groups of hash functions at the root node and then performs shift operations, whereas Bloom Tree must compute d groups of hash functions at every node, which takes considerably more time. COMB likewise must compute several hash functions for each Bloom filter, which is time-consuming, but it operates on fewer Bloom filters than Bloom Tree, so its time consumption is lower than that of Bloom Tree.
FIG. 6 is a diagram illustrating memory consumption when the three algorithms process static data. The results show that the three algorithms consume the same amount of memory; that is, the proposed algorithm achieves the lowest processing time without occupying additional memory.
Fig. 7(a) to 7(f) are schematic diagrams of the time the three algorithms take to process data packets as the number of packets increases. Each group uses 3 hash functions. Since both Bloom Tree and COMB operate on a known data volume, their structures must be built before any data packet is inserted. The proposed algorithm can operate on an unknown data volume and processes data as packets arrive, which is a dynamic data processing procedure. Fig. 7(a) shows the experimental results of a simulation using the data set MAWI 1, fig. 7(b) the data set MAWI 2, fig. 7(c) the data set MAWI 3, fig. 7(d) the data set ClarkNet-HTTP, fig. 7(e) the data set UMass, and fig. 7(f) the data set TKN. All results in fig. 7 show that, on every data set, the proposed algorithm processes packets dynamically and consumes less time than Bloom Tree and COMB, which process packets statically and consume more time.
Fig. 8(a) to 8(f) are schematic diagrams of the memory the three algorithms consume when processing data packets as the number of packets increases. Each group uses 3 hash functions. Fig. 8(a) shows the experimental results of a simulation using the data set MAWI 1, fig. 8(b) the data set MAWI 2, fig. 8(c) the data set MAWI 3, fig. 8(d) the data set ClarkNet-HTTP, fig. 8(e) the data set UMass, and fig. 8(f) the data set TKN. All results in fig. 8(a) to 8(f) show that, on every data set, the proposed algorithm processes packets dynamically, building its structure while it processes data, whereas Bloom Tree and COMB process packets statically and build their structures before processing any data, which also shows that their structures can only handle static data.
Detailed Description
In this embodiment, when processing static data, the selected memory size is m = 2^20 bits, of which the root node occupies 95325 bits. An H3 hash function base matrix with 18 rows and 8 columns is selected, and the root node uses the first 16 rows of the base matrix. Each group uses k = 3 hash functions, g = 1024 groups of data are processed, and the tree height is h = 10. FIG. 1 is a diagram of a static Bloom tree structure, with which the key-value pair insertion and query processes can be analyzed.
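The patent names the H3 hash family and a base matrix but does not spell out the computation. The sketch below uses the standard parity formulation of H3 and assumes, as one possible reading, that each selected row of the base matrix yields one output bit and that a newly selected row becomes the lowest-order bit. The function names, the 32-bit row width, and the random seed are illustrative assumptions.

```python
import random


def make_base_matrix(rows: int, key_bits: int, seed: int = 0) -> list[int]:
    """Hypothetical base matrix: one random key_bits-wide mask per row."""
    rng = random.Random(seed)
    return [rng.getrandbits(key_bits) for _ in range(rows)]


def h3_hash(key: int, matrix: list[int], rows_used: int) -> int:
    """Output bit r is the parity of (key AND row r); the first row is the top bit."""
    value = 0
    for row in matrix[:rows_used]:
        parity = bin(key & row).count("1") & 1
        value = (value << 1) | parity  # each extra row appends a low-order bit
    return value


matrix = make_base_matrix(rows=18, key_bits=32)
key = 0xDEADBEEF
# Under this assumption, using one more row and shifting right by one bit
# gives back the hash computed with the original rows.
print(h3_hash(key, matrix, 17) >> 1 == h3_hash(key, matrix, 16))  # True
```

Under this reading, a hash value computed once at the root can be reused at deeper, smaller nodes by shifting alone, which is the property the insertion and query procedures rely on.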
Insert operation, Insert(key, value): the leaf node corresponding to the value is looked up to obtain the position code of the leaf node, which determines a unique path from the root node to the leaf node. The two groups of hash functions of the root node are calculated: the k hash functions h_{i,1}, h_{i,2}, ..., h_{i,k} (where i denotes the group number of the selected hash functions) are applied to the key to compute h_{i,1}(key), h_{i,2}(key), ..., h_{i,k}(key), and their values are stored in the two arrays A and B. Then, according to the first bit of the leaf node code, one of the arrays A and B is selected for insertion at the root node (array A if the code bit is 0, array B if it is 1), which is equivalent to shifting the array right by zero bits (A >> 0 or B >> 0). The first bit of the code also determines the next Bloom filter in which to insert (the left node if the code bit is 0, the right node if it is 1). Then, according to the second bit of the leaf node code, one of the arrays A and B is selected and shifted right by one bit (A >> 1 or B >> 1) to obtain the insertion position. Similar operations continue, inserting the key into the Bloom filter at each layer until the leaf node corresponding to the value has been inserted into. As in FIG. 1, assume that the value in the key-value pair to be inserted is encoded as 01. The two groups of hash functions of the root node are calculated and their values are stored in the two arrays A and B. Since the first bit of the code is 0, array A is selected, shifted right by 0 bits (A >> 0) and inserted into the root node, and the next step moves to the left node. Since the second bit is 1, array B is selected, shifted right by one bit (B >> 1) and inserted into this node, after which the right node is reached. Finally, array B is shifted right by two bits (B >> 2) and inserted into that node, which completes the insertion.
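The walk-through for code 01 can be traced with a short sketch that lists, for each node on the path, which array is used and by how many bits it is shifted. The function name and the hash values in A and B are made up for illustration; nodes are identified by their code prefix, with the empty string standing for the root.

```python
def insertion_plan(code: str, a: list[int], b: list[int]) -> list[tuple[str, list[int]]]:
    """List (node path, bit positions to set) for every node from root to leaf.

    a, b: hash values of the key under group 1 and group 2, computed once at the root.
    """
    plan = []
    path = ""  # "" is the root, "0" its left child, "01" that node's right child
    for level, bit in enumerate(code):
        chosen = a if bit == "0" else b  # code bit 0 selects A, 1 selects B
        plan.append((path, [x >> level for x in chosen]))
        path += bit  # 0 moves to the left child, 1 to the right child
    # At the leaf, the array chosen by the last code bit is shifted once more.
    last = a if code[-1] == "0" else b
    plan.append((path, [x >> len(code) for x in last]))
    return plan


for node, positions in insertion_plan("01", a=[40, 12, 33], b=[21, 58, 7]):
    print(repr(node), positions)
# ''   [40, 12, 33]   A >> 0 at the root
# '0'  [10, 29, 3]    B >> 1 at the left child
# '01' [5, 14, 1]     B >> 2 at the leaf
```

Only the two hash groups at the root are ever computed; every deeper node reuses them through a right shift.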
Query operation, Query(key): first, the two groups of hash functions of the root node, h_{i,1}(key), h_{i,2}(key), ..., h_{i,k}(key) (i = 1 or 2), are calculated, where i denotes the group number of the selected hash functions; the hash values calculated with i = 1 are stored in array A, and the hash values calculated with i = 2 are stored in array B. The position units corresponding to the values of arrays A and B are queried at the root node. An array is satisfied if the values at all of its corresponding positions are 1, and this gives the first bit of the code: if array A is satisfied the code bit is 0, if array B is satisfied the code bit is 1, and if neither is satisfied the key does not exist and the query ends. According to the code bit obtained, the query moves to the next node, where arrays A and B are shifted right (A >> 1, B >> 1) and the corresponding position units are queried to obtain the next code bit. Similar operations continue until a leaf node is found. Finally, a complete code is obtained, and the corresponding value can be obtained from the leaf node code. As illustrated by FIG. 1, assume a key is queried. The two groups of hash functions of the root node are calculated and their values are stored in the two arrays A and B. The values of arrays A and B are each queried in the root node; the position units corresponding to the values in array A are all 1, so the first code bit is recorded as 0 and the next node to operate on is the left node. Arrays A and B are shifted right by one bit (A >> 1, B >> 1) and each queried in this node; the position units corresponding to the values in array B are all 1, so the second code bit is recorded as 1 and the next node to operate on is the right node. Finally, array B is shifted right by two bits (B >> 2) and queried, which serves as the final verification; no code bit needs to be recorded in this step. If the verification passes, the queried code is 01, and the query is completed by looking up the corresponding value in the table.
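Insertion and query can be put together in a compact sketch. Several choices here are assumptions rather than the patent's method: the root size is a power of two so that shifted positions always fit the halved nodes, a salted SHA-256 stands in for the H3 hash groups, nodes are kept in a dictionary keyed by code prefix, and the query follows both children when both arrays are satisfied (the patent text does not say how that case is handled).

```python
import hashlib


class BloomTreeSketch:
    """Binary Bloom-filter tree of fixed depth; nodes are addressed by code prefix."""

    def __init__(self, root_bits_log2: int, depth: int, k: int = 3) -> None:
        assert 1 <= depth < root_bits_log2
        self.b = root_bits_log2  # root has 2**b bits, a node at level l has 2**(b - l)
        self.depth = depth
        self.k = k
        self.nodes: dict[str, set[int]] = {}  # path -> positions whose bit is 1

    def _hashes(self, key: str, group: int) -> list[int]:
        # Stand-in hash family (assumption): salted SHA-256 reduced to b bits.
        out = []
        for j in range(self.k):
            digest = hashlib.sha256(f"{group}:{j}:{key}".encode()).digest()
            out.append(int.from_bytes(digest[:8], "big") % (1 << self.b))
        return out

    def _groups(self, key: str) -> dict[str, list[int]]:
        return {"0": self._hashes(key, 1), "1": self._hashes(key, 2)}  # arrays A, B

    def insert(self, key: str, code: str) -> None:
        assert len(code) == self.depth
        groups = self._groups(key)
        path = ""
        for level, bit in enumerate(code):
            self.nodes.setdefault(path, set()).update(x >> level for x in groups[bit])
            path += bit
        leaf_positions = (x >> self.depth for x in groups[code[-1]])
        self.nodes.setdefault(path, set()).update(leaf_positions)

    def _hit(self, path: str, values: list[int], shift: int) -> bool:
        bits = self.nodes.get(path, set())
        return all((x >> shift) in bits for x in values)

    def query(self, key: str) -> list[str]:
        """Return every leaf code consistent with the filters (extras are false positives)."""
        groups = self._groups(key)
        found: list[str] = []

        def walk(path: str) -> None:
            level = len(path)
            if level == self.depth:
                if self._hit(path, groups[path[-1]], level):  # final check at the leaf
                    found.append(path)
                return
            for bit in "01":
                if self._hit(path, groups[bit], level):
                    walk(path + bit)

        walk("")
        return found


tree = BloomTreeSketch(root_bits_log2=16, depth=2)
tree.insert("00:1a:2b:3c:4d:5e", code="01")
print(tree.query("00:1a:2b:3c:4d:5e"))  # the result includes "01"
```

Decoding the returned code into a value is a table lookup, as in the example above; the sketch stops at the code.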
In this embodiment, when processing dynamic data, the operation of adding a new value distinguishes two cases. If the original Bloom tree is a binary tree that is not full, a new leaf node can be added directly at the end of the tree when a new value needs to be added, and this leaf node represents the newly added value, such as the new leaf node encoded as 11 in FIG. 2. If the original Bloom tree is a full binary tree, a new Bloom filter must be added above the root node; its size is 2 times that of the original root node, and it becomes the new root node. The original Bloom tree becomes the left subtree of the new root node, and a full binary tree one layer lower than the original Bloom tree is created as the right subtree of the new root node. All positions set to 1 in the original root node are shifted left by one bit and inserted into the new root node. At this point the two groups of H3 hash functions of the new root node have one more layer than the original two groups of H3 hash functions, that is, one more row is selected from the base matrix. A new leaf node can then be added at the end of the tree, such as the new leaf node encoded as 100 in FIG. 3.
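The two cases can be sketched in terms of leaf codes. The sketch assumes that values receive leaf indices in arrival order and that a leaf code is the index written in binary with as many bits as the tree is high, which matches the codes 11 and 100 in FIG. 2 and FIG. 3; the class and function names are hypothetical, and the derivation of the new root's hash groups is not modelled.

```python
class LeafCodes:
    """Tracks value -> leaf code for a binary tree that grows upward when full."""

    def __init__(self) -> None:
        self.height = 1  # assumed starting point: a root with two leaves
        self.values: list[str] = []  # list position is the leaf index

    def add_value(self, value: str) -> str:
        if len(self.values) == 2 ** self.height:
            # Full tree: a new root goes on top, so every code gains one bit.
            self.height += 1
        self.values.append(value)
        return self.code_of(value)

    def code_of(self, value: str) -> str:
        index = self.values.index(value)
        return format(index, f"0{self.height}b")


def grow_root(old_root_bits: int) -> int:
    """New root contents: every 1 of the old root moves one position to the left."""
    return old_root_bits << 1


codes = LeafCodes()
print([codes.add_value(v) for v in ["v0", "v1", "v2", "v3", "v4"]])
# ['0', '1', '10', '11', '100']
print(codes.code_of("v1"))         # '001': old leaves now sit in the left subtree
print(bin(grow_root(0b1011)))      # '0b10110'
```

Existing leaves keep their position and simply gain a leading 0, which corresponds to the original tree becoming the left subtree of the new root.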

Claims (3)

1. A method for key-value pair storage of a bloom filter tree structure, the bloom filter tree structure comprising complete d-trees, each node of each complete d-tree being a bloom filter; each leaf node of each full d-ary tree represents a value; the storage unit size of each node is half of that of a father node of the node, and a root node comprises d multiplied by k different hash functions, namely the root node comprises d hash groups, and each group comprises k hash functions; it is characterized by comprising:
inserting operation: when a key-value pair (key, value) needs to be inserted, firstly checking whether the value has already been inserted into the bloom filter tree, and if not, performing the subsequent operation of adding a new value; if the value exists in the bloom filter tree, key-value pair insertion is carried out directly: a leaf node corresponding to the value is searched according to the value, the leaf node position code is obtained, a unique path from a root node to the leaf node is determined, and two groups of hash functions of the root node are calculated, namely the k hash functions h_{i,1}, h_{i,2}, ..., h_{i,k} are applied to the key to calculate h_{i,1}(key), h_{i,2}(key), ..., h_{i,k}(key), wherein i represents the group number of the selected hash function, and i is 1 or 2; the hash values calculated with i = 1 are stored in the array A, and the hash values calculated with i = 2 are stored in the array B; then, according to the first bit code of the leaf node, selecting one group of values in the A and B arrays at the root node for insertion, namely performing a right shift by zero bits on the A or B array, then obtaining the bloom filter which needs to be subjected to the next insertion operation according to the first bit code of the leaf node, and then selecting one group of values in the A and B arrays for a right shift by one bit according to the second bit code of the leaf node to obtain an insertion position for insertion; continuing to perform the operation, and inserting the key into each layer of bloom filter until the key is inserted into the leaf node corresponding to the value;
query operation: when a value corresponding to a key needs to be queried, firstly, the two groups of hash functions h_{i,1}(key), h_{i,2}(key), ..., h_{i,k}(key) of the root node, with i = 1 or 2, are calculated, and their values are stored in the A and B arrays respectively; respectively carrying out a query operation on the position units corresponding to the values of the A and B arrays at the root node, and searching a first bit encoding value; according to the obtained encoding value, switching to the next node to continue the query, at which point a right shift operation is performed on the A and B arrays and the corresponding position units are queried to obtain an encoding value; continuing to perform the above operation until the leaf node is found, finally obtaining a complete encoding value, and performing a decoding operation according to the leaf node encoding to obtain the corresponding value;
adding a new value operation: to add a new value to the bloom filter tree, the following two cases are distinguished: if the original bloom filter tree is an unfilled binary tree, when a new value needs to be added, directly adding a new leaf node at the end of the bloom filter tree, wherein the leaf node represents the newly added value; if the original bloom filter tree is a full binary tree, a new bloom filter is added above the root node, the size of the newly added bloom filter is 2 times that of the root node of the original bloom filter tree, at this moment the new bloom filter becomes the root node of the new bloom filter tree, the original bloom filter tree becomes the left subtree of the new root node, a full binary tree which is one layer lower than the original bloom filter tree is created to serve as the right subtree of the new root node, all positions which are 1 in the original root node are shifted to the left by one bit and inserted into the new root node, and at this moment the two groups of H3 hash functions of the new root node have one more layer than the original two groups of H3 hash functions, namely, one more row is selected in the base matrix, and new leaf nodes are continuously added at the end of the bloom filter tree.
2. The method of claim 1, wherein in the inserting operation, the array to be inserted at the root node is selected as follows: if the encoded value is 0, array A is selected, and if the encoded value is 1, array B is selected.
3. The method according to claim 1, wherein in the inserting operation, the next bloom filter to be inserted into is obtained from the first bit of the leaf node encoding as follows: if the encoded value is 0, the left node is operated on, and if the encoded value is 1, the right node is operated on.
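The selection rules of claims 2 and 3, together with the query operation of claim 1, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: `hash_positions` is a hypothetical SHA-256 stand-in for the H3 hash function groups, each bloom filter is modeled as a Python set of set-bit positions, the number of hash functions `K` and the root size are arbitrary, and the query returns every candidate value rather than a single one because a bloom filter may report false positives on both branches.

```python
import hashlib

K = 3  # hash functions per group (assumed value)


def hash_positions(key, group, root_bits):
    """Hypothetical stand-in for one H3 hash group (0 fills A, 1 fills B)."""
    positions = []
    for j in range(K):
        digest = hashlib.sha256(f"{group}:{j}:{key}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % root_bits)
    return positions


class BloomFilterTree:
    def __init__(self, values, root_bits=1024):
        self.values = list(values)  # leaf i represents values[i]
        self.depth = max(1, (len(self.values) - 1).bit_length())
        self.root_bits = root_bits
        # levels[d][n] holds the set-bit positions of node n at depth d.
        self.levels = [[set() for _ in range(2 ** d)] for d in range(self.depth)]

    def insert(self, key, value):
        code = self.values.index(value)  # leaf encoding as an integer
        arrays = [hash_positions(key, g, self.root_bits) for g in (0, 1)]
        node = 0
        for d in range(self.depth):
            bit = (code >> (self.depth - 1 - d)) & 1
            # Bit 0 selects array A, bit 1 selects array B (claim 2).
            for p in arrays[bit]:
                # Each level down, positions are right-shifted by one more bit.
                self.levels[d][node].add(p >> d)
            # Bit 0 moves to the left child, bit 1 to the right (claim 3).
            node = 2 * node + bit

    def query(self, key):
        arrays = [hash_positions(key, g, self.root_bits) for g in (0, 1)]
        nodes = [0]
        for d in range(self.depth):
            next_nodes = []
            for node in nodes:
                for bit in (0, 1):
                    # A branch is followed only if all its positions are set.
                    if all((p >> d) in self.levels[d][node] for p in arrays[bit]):
                        next_nodes.append(2 * node + bit)
            nodes = next_nodes
        # Decode leaf indices back to values, skipping unused leaves.
        return [self.values[i] for i in nodes if i < len(self.values)]


tree = BloomFilterTree(["red", "green", "blue", "black"])
tree.insert("k1", "blue")
print(tree.query("k1"))
```

Because bits are only ever set, the value inserted for a key always appears among the candidates returned for that key; extra candidates correspond to bloom filter false positives.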

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710542207.5A CN107330094B (en) 2017-07-05 2017-07-05 Bloom filter tree structure for dynamically storing key value pairs and key value pair storage method

Publications (2)

Publication Number Publication Date
CN107330094A (en) 2017-11-07
CN107330094B (en) 2020-06-16

Family

ID=60196387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710542207.5A Active CN107330094B (en) 2017-07-05 2017-07-05 Bloom filter tree structure for dynamically storing key value pairs and key value pair storage method

Country Status (1)

Country Link
CN (1) CN107330094B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021678B (en) * 2017-12-07 2022-05-17 北京理工大学 Key value pair storage structure with compact structure and quick key value pair searching method
CN108717448B (en) * 2018-05-18 2022-02-25 南京大学 Key value pair storage-oriented range query filtering method and key value pair storage system
CN109508326B (en) * 2018-11-22 2020-03-17 北京百度网讯科技有限公司 Method, device and system for processing data
CN110674133B (en) * 2019-09-09 2022-05-24 北京航天自动控制研究所 Compression storage and calculation method for high-dimensional interpolation
CN110851848B (en) * 2019-11-12 2022-03-25 广西师范大学 Privacy protection method for symmetric searchable encryption
CN113794558B (en) * 2021-09-16 2024-02-27 烽火通信科技股份有限公司 L-tree calculation method, device and system in XMS algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105745642A (en) * 2014-03-31 2016-07-06 华为技术有限公司 Device and method for inquiring data
CN106708427A (en) * 2016-11-17 2017-05-24 华中科技大学 Storage method suitable for key value pair data
CN106777003A (en) * 2016-12-07 2017-05-31 安徽大学 A kind of search index method and system towards Key Value storage systems

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10133764B2 (en) * 2015-09-30 2018-11-20 Sandisk Technologies Llc Reduction of write amplification in object store

Also Published As

Publication number Publication date
CN107330094A (en) 2017-11-07

Similar Documents

Publication Publication Date Title
CN107330094B (en) Bloom filter tree structure for dynamically storing key value pairs and key value pair storage method
CN110413611B (en) Data storage and query method and device
Lawder et al. Querying multi-dimensional data indexed using the Hilbert space-filling curve
CN104866502B (en) Data matching method and device
CN101604337B (en) Apparatus and method for hash table storage, searching
CN1838124A (en) Method for rapidly positioning grid + T tree index in mass data memory database
CN109325032B (en) Index data storage and retrieval method, device and storage medium
CN111868710A (en) Random extraction forest index structure for searching large-scale unstructured data
CN111801665A (en) Hierarchical Locality Sensitive Hash (LSH) partition indexing for big data applications
CN106777003B (en) Key-Value storage system oriented index query method and system
US20220005546A1 (en) Non-redundant gene set clustering method and system, and electronic device
CN111625534A (en) Data structure for hash operation and hash table storage and query method based on structure
CN106599091B (en) RDF graph structure storage and index method based on key value storage
CN108460123B (en) High-dimensional data retrieval method, computer device, and storage medium
CN103051543A (en) Route prefix processing, lookup, adding and deleting method
CN108549696B (en) Time series data similarity query method based on memory calculation
Hong et al. Efficient R-tree based indexing scheme for server-centric cloud storage system
CN102045412A (en) Method and equipment for carrying out compressed storage on internet protocol version (IPv)6 address prefix
CN112434085B (en) Roaring Bitmap-based user data statistical method
Feng et al. Real-time SLAM relocalization with online learning of binary feature indexing
CN110083603B (en) Method and system for realizing node path query based on adjacency list
CN114416741A (en) KV data writing and reading method and device based on multi-level index and storage medium
CN114398373A (en) File data storage and reading method and device applied to database storage
CN107294855A (en) A kind of TCP under high-performance calculation network searches optimization method
CN110909027A (en) Hash retrieval method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant