CN108287840B - Data storage and query method based on matrix hash - Google Patents
- Publication number
- CN108287840B
- Authority
- CN
- China
- Prior art keywords
- sub
- key
- tables
- bit
- bloom filter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a data storage and query method based on matrix hashing. The method comprises the following steps: 1) establishing a hash table data structure comprising z sub-tables, z being an even number, whose sizes decrease in arithmetic progression; for $1 \le i \le z/2$, the ith sub-table is combined with the (z-i+1)th sub-table to obtain z/2 sub-tables of equal size; 2) establishing an auxiliary data structure comprising z bloom filters corresponding to the z sub-tables, whose sizes also decrease in arithmetic progression; for $1 \le i \le z/2$, the ith bloom filter is combined with the (z-i+1)th bloom filter to obtain z/2 bloom filters of equal size, and the corresponding bits of these z/2 bloom filters are then stacked to form 1 multi-bit bloom filter; 3) inserting key-value pairs using the hash table data structure and the auxiliary data structure to realize data storage. The invention enables fast updates and fast queries.
Description
Technical Field
The invention belongs to the technical field of memory databases, and particularly relates to a data organization, indexing and storage method based on a matrix hash algorithm.
Background
Compared with disk-based databases, in-memory databases offer higher flexibility and usability, and by data model they can be divided into relational in-memory databases and key-value in-memory databases. Key-value in-memory databases (key-value stores) are flexible, compact, memory-efficient and fast to query, which gives them distinct advantages over relational in-memory databases; they are therefore widely used by large internet companies such as Amazon, Facebook, YouTube, Baidu, Sina and Sohu. A key-value store holds its data as key-value pairs in a hash table, so the hash algorithm is the core technology of the system and a key factor that directly determines system performance and website efficiency.
The practical problem at present is that, with the rapid development of the internet, many internet companies have accumulated large amounts of data. Because the number of key-value pairs is huge and the available memory is limited, collisions become increasingly frequent when new key-value pairs are inserted. Such collisions can cause insertions of new key-value pairs to fail, and updates and lookups of existing key-value pairs to fail, which greatly degrades the performance of the key-value store and can cause significant economic loss to the internet companies that rely on it.
At the same time, clients place ever higher demands on data operations and expect query results quickly. This raises the bar for the responsiveness of internet companies: if a company cannot respond promptly, the user experience suffers greatly.
Both problems are widespread in internet companies that use key-value stores, and hash table designs keep exploring new ideas to address them. First, for the collision problem, existing designs widely use auxiliary data structures (such as bloom filters) to reduce the collision probability. Representative designs include Fast Hash Table (H. Song, S. Dharmapurikar, J. Turner, and J. Lockwood. Fast hash table lookup using extended bloom filter: an aid to network processing. ACM SIGCOMM Computer Communication Review, 35(4): 181-192, 2005), Segmented Hash (S. Kumar and P. Crowley. Segmented hash: an efficient hash table implementation for high performance networking subsystems. ACM ANCS, pages 91-103, 2005) and Peacock Hash (S. Kumar, J. Turner, and P. Crowley. Peacock hashing: deterministic and updatable hashing for high performance networking. IEEE INFOCOM, 2008). For a new key-value pair to be inserted, these designs all use bloom filters to decide which hash table to insert into; colliding key-value pairs are either chained onto a linked list through pointers or discarded. Although these designs use multiple sub-tables to reduce collisions, they still have drawbacks such as low load rates, and their collision rates leave considerable room for reduction.
Second, for the query time problem, typical designs include perfect hashing (Z. J. Czech, G. Havas, and B. S. Majewski. An optimal algorithm for generating minimal perfect hash functions. Information Processing Letters, 43(5): 257-264, 1992) and cuckoo hashing (B. Fan, D. G. Andersen, and M. Kaminsky. MemC3: compact and concurrent MemCache with dumber caching and smarter hashing. NSDI, volume 13, pages 385-398, 2013). However, these hashes are very inefficient at update time and require many hash computations and memory accesses. For example, updating a cuckoo hash table can take approximately 500 hash computations and memory accesses, and even then the update may fail. For these designs, if multiple updates fail, the entire hash table has to be rebuilt, and the rebuild takes a significant amount of time, which is unacceptable in real-world applications.
Disclosure of Invention
To solve the problems of hash table collisions and query time, and to overcome the high collision rate, low memory efficiency and low load rate of conventional hash tables, the invention provides a new hash table design, called "matrix hash", which combines multi-sub-table hashing, bloom filters and bitmaps.
The technical scheme adopted by the invention is as follows:
A data storage method based on matrix hashing, characterized by comprising the following steps:
1) establishing a hash table data structure comprising z sub-tables, where z is an even number and the sub-table sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the ith sub-table with the (z-i+1)th sub-table to obtain z/2 sub-tables of equal size;
2) establishing an auxiliary data structure comprising z bloom filters corresponding to the z sub-tables, where the bloom filter sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the ith bloom filter with the (z-i+1)th bloom filter to obtain z/2 bloom filters of equal size; then stacking the corresponding bits of these z/2 bloom filters to form 1 multi-bit bloom filter;
3) inserting key-value pairs using the hash table data structure and the auxiliary data structure to realize data storage.
Further, each time a new key-value pair is inserted, it is inserted into the sub-table with the smallest load rate.
Further, a linked list is attached to the last sub-table, i.e., the zth sub-table; if no empty bucket can be found for a key-value pair to be inserted, the pair is chained onto this linked list through a pointer.
Furthermore, each sub-table has a corresponding bitmap, and each bit in the bitmap corresponds to one bucket of that sub-table; the bit for an empty bucket is 0 and the bit for a non-empty bucket is 1.
Further, an additional bloom filter $F_{half}$ is added, which records the keys stored in the second half of the sub-tables, i.e., $T_{z/2+1}, \dots, T_z$, to reduce the number of sub-tables queried.
Further, the key-value pairs are inserted as follows:
a) for a given key-value pair with key x, checking through the bitmaps whether the z candidate buckets are empty, and then inserting the key-value pair into the sub-table with the lowest load rate among those whose candidate bucket is empty, to balance the load rates of all sub-tables; supposing the index of the sub-table to be inserted into is i: if $i \le z/2$, updating bloom filter $F_i$ to indicate that key x is in sub-table $T_i$, and updating the corresponding bitmap; if $i > z/2$, updating bloom filter $F_{z-i+1}$ to indicate that x is in sub-table $T_i$, and updating $F_{half}$ and the corresponding bitmap;
b) if the bitmaps show that all z buckets into which key x could be inserted are full, using the kick-out mechanism to insert the key-value pair.
Further, key-value pairs are queried as follows: when querying x, x is first looked up in the multi-bit bloom filter $F_m$ and in $F_{half}$; for $1 \le i \le z/2$, if $F_i$ returns true and $F_{half}$ returns false, sub-table $T_i$ is checked; otherwise, sub-table $T_{z-i+1}$ is checked first and, if there is no match, sub-table $T_i$ is checked; if x cannot be found in the z sub-tables, the linked list of the last sub-table is searched; if x is still not found, x is not in the hash table.
Further, key-value pairs are deleted as follows: when deleting x, the bucket holding x is located with the query operation, the key-value pair is removed from the bucket, and the bit in the bitmap corresponding to that bucket is reset.
The beneficial effects of the invention are: 1) high load rate with few pointers: a large number of key-value pairs is stored in little memory space, and few pointers are used; 2) low collision rate; 3) fast updates: the hash table can be updated with few memory accesses; 4) zero update failures; 5) fast queries: key-value pairs can be found with few memory accesses, and for absent key-value pairs a negative result is returned quickly; 6) practicality: the scheme is easy to implement in hardware.
Drawings
Fig. 1 is an algorithm diagram of matrix hashing.
Fig. 2 is a structural diagram of a multi-bit bloom filter.
Detailed Description
The invention is further illustrated by the following specific examples and the accompanying drawings.
Data structure
The data structure of the matrix hash of the invention combines multiple sub-tables, bloom filters and bitmaps. It consists of a hash table data structure and an auxiliary data structure.
1. Hash table data structure
The size of each sub-table, i.e., the maximum number of elements it can store, decreases in arithmetic progression, so the sizes of the corresponding bloom filters also decrease in arithmetic progression. A simple balancing strategy is used when inserting elements: each new key-value pair is inserted into the sub-table with the lowest load rate, which ensures that the numbers of elements in the sub-tables also decrease in arithmetic progression.
Assume a total of z sub-tables, where z is an even number. For $1 \le i \le z/2$, matrix hashing combines the ith sub-table and the (z-i+1)th sub-table, finally obtaining z/2 sub-tables of equal size; the sizes are equal because, for sizes $s_1 > \dots > s_z$ in arithmetic progression, $s_i + s_{z-i+1} = s_1 + s_z$ for every i. Because the combined sub-tables resemble a matrix, we name this algorithm matrix hashing. To avoid insertion failures, the last sub-table is allowed to have a linked list: if a key-value pair to be inserted ultimately cannot find an empty bucket, it is chained onto a linked list attached to the zth sub-table. Because the zth sub-table is the smallest, the pointers occupy the least memory.
Fig. 1 is an algorithm diagram of matrix hashing: the left side shows 6 sub-tables and 6 bloom filters whose sizes decrease in arithmetic progression, and the middle shows the 3 equal-size sub-tables and 3 equal-size bloom filters obtained after combination. The upper right shows a multi-bit bloom filter formed by combining the three standard bloom filters BF1, BF2 and BF3.
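The pairing argument can be checked numerically. The following is a minimal sketch, not taken from the patent: the function names are hypothetical, and it assumes the sizes are derived from a total bucket budget (βn in the experiments) and a common difference d, with any remainder of the integer division simply dropped.

```python
def subtable_sizes(total_buckets: int, z: int, d: int) -> list[int]:
    """Sizes s_1 > ... > s_z in arithmetic progression with common difference d
    whose sum is (at most) total_buckets. Hypothetical helper for illustration."""
    if z % 2 != 0:
        raise ValueError("z must be even")
    # sum(s_i) = z*s_1 - d*z*(z-1)/2 = total  =>  s_1 = (total + d*z*(z-1)/2) / z
    s1 = (total_buckets + d * z * (z - 1) // 2) // z
    sizes = [s1 - i * d for i in range(z)]
    if sizes[-1] <= 0:
        raise ValueError("common difference too large for this total")
    return sizes


def combine_pairs(sizes: list[int]) -> list[int]:
    """Size of the combined sub-table T_i + T_{z-i+1} for i = 1..z/2."""
    z = len(sizes)
    return [sizes[i] + sizes[z - 1 - i] for i in range(z // 2)]


# beta * n with beta = 1.05 and n = 500,000, 8 sub-tables, difference 5000
sizes = subtable_sizes(525_000, 8, 5_000)
print(sizes)                 # [83125, 78125, 73125, 68125, 63125, 58125, 53125, 48125]
print(combine_pairs(sizes))  # [131250, 131250, 131250, 131250]
```

Every combined pair has the same size, which is what lets the z/2 combined sub-tables (and their bloom filters) share one layout.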
2. Auxiliary data structure
Similarly to the sub-tables, for $1 \le i \le z/2$, matrix hashing combines the ith bloom filter and the (z-i+1)th bloom filter, finally obtaining z/2 standard bloom filters of equal size. The corresponding bits of these z/2 bloom filters are then stacked to form 1 bloom filter, in which each box consists of z/2 bits. We call this a multi-bit bloom filter and denote it $F_m$. In this way, the original z bloom filters of arithmetically decreasing size are combined into 1 multi-bit bloom filter.
Fig. 2 is a structural diagram of a multi-bit bloom filter. As shown, the three bits in a box come from three standard bloom filters of equal size, F1, F2 and F3. Note that the bloom filters are physically combined in on-chip memory, whereas the combination of sub-tables is only conceptual. The multi-bit bloom filter is implemented as follows:
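A minimal sketch of the box layout, assuming each component filter is given as a 0/1 list and following the convention used by the insertion and query steps, where $F_i$ owns the bit of weight $2^{i-1}$ in every box. The function name is hypothetical.

```python
def build_multibit(filters: list[list[int]]) -> list[int]:
    """Stack z/2 equal-length 0/1 bloom filters into one list of boxes:
    bit (i-1) of box j holds bit j of component filter F_i (i = 1..z/2)."""
    m = len(filters[0])
    if any(len(f) != m for f in filters):
        raise ValueError("component filters must have equal size")
    boxes = [0] * m
    for i, f in enumerate(filters, start=1):
        for j, bit in enumerate(f):
            if bit:
                boxes[j] |= 1 << (i - 1)   # OR with 2**(i-1)
    return boxes


F1 = [1, 0, 0, 1]
F2 = [0, 1, 0, 1]
F3 = [1, 1, 0, 0]
boxes = build_multibit([F1, F2, F3])
print([format(b, "03b") for b in boxes])   # ['101', '110', '000', '011']
```

Because one box holds the same position of every component, a single memory read returns all z/2 answers for that hash position.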
Suppose F1, F2 and F3 each have m bits. For F1, first take the most significant bit (a logical AND of the m-bit F1 with $2^{m-1}$) and shift the result left by 2m bits (multiply by $2^{2m}$); then take the next most significant bit (a logical AND of F1 with $2^{m-2}$), shift the result left by 2(m-1) bits (multiply by $2^{2(m-1)}$), and add it to the value obtained from the most significant bit. Proceed in the same way for every bit down to the last one, and denote the accumulated value f1. Applying the same operation to F2 and F3 gives the accumulated values f2 and f3. Finally, a logical OR of f1, f2 shifted right by one bit, and f3 shifted right by two bits yields the multi-bit bloom filter, i.e., $F_m = f_1 \mid (f_2 \gg 1) \mid (f_3 \gg 2)$. Bit j of F1, F2 and F3 thus lands at positions 3j+2, 3j+1 and 3j respectively, so each 3-bit box holds one bit from each filter.
This bloom filter design raises one problem: when a component bloom filter returns true, two corresponding sub-tables must be queried. For example, if x is reported by the ith bloom filter, it must be looked up in sub-table $T_i$ or $T_{z-i+1}$. To reduce the number of sub-tables queried, an additional bloom filter, called $F_{half}$, records the keys stored in the second half of the sub-tables, i.e., $T_{z/2+1}, \dots, T_z$.
Matrix hashing also uses on-chip bitmaps, one per sub-table, in which each bit corresponds to a bucket of that sub-table: the bit for an empty bucket is 0 and the bit for a non-empty bucket is 1.
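The shift-and-accumulate procedure can be sketched as follows for three m-bit filters held as Python integers. This reads the final step as $f_1 \mid (f_2 \gg 1) \mid (f_3 \gg 2)$, which is an interpretation of the text; under it, F1 occupies the high bit of each 3-bit box, whereas the insertion and query steps describe $F_i$ by weight $2^{i-1}$, so the bit order inside a box should be treated as an implementation choice. All names are hypothetical.

```python
def spread(f: int, m: int) -> int:
    """Accumulate the bits of an m-bit filter as described: the bit of weight
    2**j is isolated with AND and shifted left by 2*(j+1) bits."""
    acc = 0
    for j in range(m - 1, -1, -1):        # most significant bit first
        bit = f & (1 << j)                 # AND with 2**j
        acc += bit << (2 * (j + 1))        # 2*m for the MSB, 2*(m-1) for the next, ...
    return acc                             # bit j now sits at position 3j+2


def interleave3(F1: int, F2: int, F3: int, m: int) -> int:
    """F_m = f1 | (f2 >> 1) | (f3 >> 2); box j occupies bits 3j..3j+2."""
    f1, f2, f3 = spread(F1, m), spread(F2, m), spread(F3, m)
    return f1 | (f2 >> 1) | (f3 >> 2)


def boxes_of(fm: int, m: int) -> list[int]:
    """Split F_m into its m 3-bit boxes (F1 high bit, F3 low bit)."""
    return [(fm >> (3 * j)) & 0b111 for j in range(m)]


fm = interleave3(0b1001, 0b1010, 0b0011, 4)
print(fm)               # 3101
print(boxes_of(fm, 4))  # [5, 3, 0, 6]
```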
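A minimal sketch of a per-sub-table bitmap, packing one occupancy bit per bucket into a `bytearray`; the class and helper names are hypothetical. The helper shows the insertion-time check of which of a key's z candidate buckets are still empty, without touching the sub-tables themselves.

```python
class Bitmap:
    """One bit per bucket of a sub-table: 0 = empty, 1 = occupied."""

    def __init__(self, n_buckets: int) -> None:
        self.bits = bytearray((n_buckets + 7) // 8)   # all buckets start empty

    def is_empty(self, b: int) -> bool:
        return ((self.bits[b >> 3] >> (b & 7)) & 1) == 0

    def mark_full(self, b: int) -> None:
        self.bits[b >> 3] |= 1 << (b & 7)

    def mark_empty(self, b: int) -> None:
        self.bits[b >> 3] &= ~(1 << (b & 7)) & 0xFF


def empty_candidates(bitmaps: list[Bitmap], buckets: list[int]) -> list[int]:
    """1-based indices i of sub-tables T_i whose candidate bucket is empty;
    buckets[i-1] is the key's candidate bucket in T_i."""
    return [i for i, (bm, b) in enumerate(zip(bitmaps, buckets), start=1) if bm.is_empty(b)]


bitmaps = [Bitmap(10) for _ in range(4)]
bitmaps[0].mark_full(3)
bitmaps[2].mark_full(7)
print(empty_candidates(bitmaps, [3, 5, 7, 1]))   # [2, 4]
```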
Matrix hash false positive rate derivation
Matrix hashing has two bloom filters: $F_m$ and $F_{half}$. Let n be the number of key-value pairs, and let the z sub-tables be recombined into z/2 sub-tables. Suppose $F_m$ has m boxes, each with z/2 bits, one per combined sub-table. Suppose $F_m$ uses k hash functions, and that $F_m$ and each of its z/2 component bloom filters have the same parameters, so that each component has false positive rate $0.5^k$ (the rate of a bloom filter with k hash functions at its optimal load). The false positive rate of $F_m$, denoted $f(F_m)$, is then:

$$f(F_m) = 1 - \left(1 - 0.5^{k}\right)^{z/2 - 1}$$

If the number of component bloom filters returning true is u+1 (the correct one plus u false positives), the probability is:

$$f(F_m, u) = 0.5^{k u} \left(1 - 0.5^{k}\right)^{z/2 - u - 1}$$

$F_{half}$ also uses k hash functions, so its false positive rate is $f(F_{half}) = 0.5^{k}$. If only $F_m$ returns true and the key exists in only one sub-table, there is no false positive; this happens with probability $(1 - f(F_m))(1 - f(F_{half}))$. If $F_m$ reports the key in u+1 sub-tables, there are u false positives, with probability $f(F_m, u)(1 - f(F_{half}))$. If only $F_{half}$ gives a false positive, the probability is $(1 - f(F_m)) f(F_{half})$. If $F_m$ gives u false positives and $F_{half}$ also gives a false positive, the probability is $f(F_m, u) f(F_{half})$.
For example, when z = 8 and k = 16, the false positive rate of matrix hashing is $1 - (1 - f(F_m))(1 - f(F_{half})) \approx 6.1 \times 10^{-5}$, which is very small.
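The quoted figure can be reproduced with a few lines, assuming $f(F_m) = 1 - (1 - 0.5^k)^{z/2-1}$ and $f(F_{half}) = 0.5^k$ as above; the function name is hypothetical.

```python
def matrix_hash_fpr(z: int, k: int) -> float:
    """Overall false positive rate 1 - (1 - f(F_m)) * (1 - f(F_half))."""
    p = 0.5 ** k                          # one component filter (and F_half)
    f_m = 1 - (1 - p) ** (z // 2 - 1)     # at least one of the other z/2 - 1 components lies
    f_half = p
    return 1 - (1 - f_m) * (1 - f_half)


print(f"{matrix_hash_fpr(8, 16):.2e}")   # 6.10e-05
```

With z = 8 this equals $1 - (1 - 0.5^{16})^4$, i.e. roughly four times the false positive rate of a single 16-hash filter.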
Insertion, query and deletion of key-value pairs
In a key-value store, matrix hashing inserts, queries and deletes key-value pairs as follows:
1. Insertion of key-value pairs
Given a key-value pair with key x, first check through the bitmaps whether the z candidate buckets are empty. The key-value pair is then inserted into the sub-table with the lowest load rate to balance the load rates of all sub-tables. Let i be the index of the sub-table to insert into. If $i \le z/2$, $F_i$ is updated to indicate that x is in sub-table $T_i$, and the corresponding bitmap is updated; if $i > z/2$, $F_{z-i+1}$ is updated to indicate that x is in sub-table $T_i$, and $F_{half}$ and the corresponding bitmap are updated. During insertion, to set the bit of a box that corresponds to $F_i$ to 1, it suffices to OR the z/2 bits of that box with $2^{i-1}$.
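The routing rule can be sketched as a pure function, assuming the bitmap check has already produced one "candidate bucket is empty" flag per sub-table and that load rate means stored elements divided by capacity. The names and the tie-breaking (lowest index wins) are assumptions for illustration.

```python
def route_insert(counts, sizes, empty, z):
    """Choose the sub-table for key x and the filter bits to set.

    counts[i-1], sizes[i-1]: elements stored in / capacity of T_i;
    empty[i-1]: True if x's candidate bucket in T_i is empty (bitmap check).
    Returns (i, component, set_half), or None if every candidate bucket is full.
    """
    candidates = [i for i in range(1, z + 1) if empty[i - 1]]
    if not candidates:
        return None                                    # fall back to kicking
    i = min(candidates, key=lambda t: counts[t - 1] / sizes[t - 1])  # lowest load rate
    if i <= z // 2:
        return i, i, False                             # record x in F_i
    return i, z - i + 1, True                          # record x in F_{z-i+1} and F_half


def set_component(box: int, component: int) -> int:
    """Set the bit of a multi-bit box that belongs to F_component (OR with 2**(component-1))."""
    return box | (1 << (component - 1))


sizes = [80, 70, 60, 50, 40, 30, 20, 10]
counts = [40, 35, 30, 25, 20, 12, 10, 5]
print(route_insert(counts, sizes, [True] * 8, 8))   # (6, 3, True)
```

Here T_6 is the least loaded (12/30), so x goes to T_6 and is recorded in component $F_3$ plus $F_{half}$; the caller would then apply `set_component(box, 3)` to each of x's k boxes.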
If the bitmaps show that all z buckets into which x could be inserted are full, the kick-out mechanism of cuckoo hashing is used, with the bitmaps deciding which key-value pair to kick out. The bitmaps are used to examine the z candidate buckets of x in turn: for the element y already stored in a candidate bucket, check whether any of the remaining z-1 sub-tables has an empty bucket into which y can be inserted. If so, y is kicked out, x is inserted in its place, and y is inserted into its new location. If no such y can be found, a blind kick is performed and the above insertion procedure is repeated. The number of blind kicks is limited to θ; if it exceeds θ, the key-value pair is chained onto the linked list of the last sub-table. By varying θ, matrix hashing can trade off load rate against insertion speed. Because the on-chip bitmaps record globally which buckets in the sub-tables are empty, they significantly reduce the number of kicks.
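The kick-out procedure can be sketched on a simplified model: sub-tables are Python lists holding keys (values omitted), `None` stands in for a 0 bit in the bitmap, and `pos(key, i)` is a hypothetical hash giving a key's candidate bucket in table i. Load balancing and bloom filter maintenance for moved keys are left out, since this sketch only illustrates the bitmap-guided and blind kicks.

```python
import random
from typing import Callable, List, Optional

Table = List[Optional[int]]


def kick_insert(tables: List[Table], pos: Callable[[int, int], int], x: int,
                theta: int, overflow: List[int],
                rng: Optional[random.Random] = None) -> bool:
    """Insert key x with bitmap-guided kicks and at most theta blind kicks.
    Returns True if every key found a bucket, False if one went to overflow."""
    if rng is None:
        rng = random.Random(0)
    z = len(tables)
    cur = x
    for attempt in range(theta + 1):
        # 1) Any empty candidate bucket for cur? (bitmap check)
        for i in range(z):
            b = pos(cur, i)
            if tables[i][b] is None:
                tables[i][b] = cur
                return True
        # 2) Bitmap-guided kick: find an occupant y that has an empty bucket in
        #    one of the other z-1 tables, move y there and put cur in its place.
        for i in range(z):
            b = pos(cur, i)
            y = tables[i][b]
            for j in range(z):
                if j != i and tables[j][pos(y, j)] is None:
                    tables[j][pos(y, j)] = y
                    tables[i][b] = cur
                    return True
        if attempt == theta:
            break
        # 3) Blind kick: evict a random occupant and retry with the evicted key.
        i = rng.randrange(z)
        b = pos(cur, i)
        tables[i][b], cur = cur, tables[i][b]
    overflow.append(cur)    # chained onto the linked list of the last sub-table
    return False


tables = [[None] * 2 for _ in range(2)]
overflow: List[int] = []
for key in [0, 2, 4, 6, 1]:
    kick_insert(tables, lambda k, i: (k * (i + 1)) % 2, key, 0, overflow)
print(tables, overflow)   # [[0, 1], [2, None]] [4, 6]
```

With θ = 0 only bitmap-guided kicks are tried, so each insertion touches at most the z candidate buckets per round before falling back to the linked list.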
2. Query of key-value pairs
To query x, x is first looked up in $F_m$ and $F_{half}$. For $1 \le i \le z/2$, if $F_i$ returns true and $F_{half}$ returns false, sub-table $T_i$ is checked. Otherwise, sub-table $T_{z-i+1}$ is checked first, and if there is no match, sub-table $T_i$ is checked. If x cannot be found in the z sub-tables, the linked list of the last sub-table is searched. If x is still not found, it is not in the hash table.
Note that a query only needs to compute k hash functions, not z × k, because the z/2 bloom filters that make up $F_m$ share identical parameters and hence the same k hash positions. To read the bit of a box that corresponds to $F_i$, AND the z/2 bits of the box with $2^{i-1}$: if the result is 0, the bit of that box for $F_i$ is 0; otherwise it is 1.
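A minimal sketch of the probe order and lookup, assuming the bloom filter step has already produced the set of components i (1..z/2) that report x and a flag for $F_{half}$. Tables hold keys only, and `pos(key, i)` is a hypothetical bucket hash; the names are illustrative, not from the patent.

```python
from typing import Callable, List, Optional, Set, Tuple


def probe_order(hits: Set[int], half_hit: bool, z: int) -> List[int]:
    """Sub-tables (1-based) to probe for x, in order."""
    order: List[int] = []
    for i in sorted(hits):
        if half_hit:
            order += [z - i + 1, i]   # second-half table first, then T_i
        else:
            order.append(i)           # F_half says "not in second half": only T_i
    return order


def lookup(x: int, hits: Set[int], half_hit: bool,
           tables: List[List[Optional[int]]], pos: Callable[[int, int], int],
           overflow: List[int]) -> Optional[Tuple[str, int]]:
    """Return ("table", t) if x is in T_t, ("overflow", z) if it is on the
    linked list of T_z, or None if x is not in the hash table."""
    z = len(tables)
    for t in probe_order(hits, half_hit, z):
        if tables[t - 1][pos(x, t - 1)] == x:
            return ("table", t)
    if x in overflow:
        return ("overflow", z)
    return None


print(probe_order({2}, False, 8), probe_order({2}, True, 8))   # [2] [7, 2]
```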
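The "k hashes, not z × k" point can be sketched as follows: the k box positions are computed once, the k boxes are ANDed together, and each surviving bit i-1 (weight $2^{i-1}$) means component $F_i$ reports the key. Salted BLAKE2b from Python's `hashlib` is an assumed stand-in for the unspecified hash functions, and the names are hypothetical.

```python
import hashlib
from typing import List


def positions(key: bytes, k: int, m: int) -> List[int]:
    """k box indices for key, computed once and shared by all z/2 components."""
    return [int.from_bytes(hashlib.blake2b(s.to_bytes(2, "little") + key,
                                           digest_size=8).digest(), "little") % m
            for s in range(k)]


def components_hit(fm_boxes: List[int], pos_list: List[int], half: int) -> List[int]:
    """Every i in 1..half whose bit (weight 2**(i-1)) is 1 in all k boxes."""
    acc = (1 << half) - 1
    for p in pos_list:
        acc &= fm_boxes[p]            # one box read per hash position
    return [i for i in range(1, half + 1) if acc & (1 << (i - 1))]


m, k, half = 64, 4, 4
fm_boxes = [0] * m
pos_x = positions(b"10.0.0.0/8", k, m)
for p in pos_x:
    fm_boxes[p] |= 1 << (2 - 1)       # record the key in component F_2
print(components_hit(fm_boxes, pos_x, half))   # [2]
```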
3. Deletion of key-value pairs
To delete x, matrix hashing first locates the bucket holding x with the query operation above, then removes the key-value pair from the bucket and resets the bitmap bit corresponding to that bucket.
Experimental data
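A minimal sketch of deletion on the same simplified model used above (keys only, bitmaps as 0/1 lists, hypothetical `pos(key, i)`), taking the probe order produced by the query step. As in the text, only the bucket and its bitmap bit are cleared; the bloom filter bits are left unchanged, since clearing a bit in a standard bloom filter could cause false negatives for other keys.

```python
from typing import Callable, List, Optional


def delete(x: int, probe: List[int], tables: List[List[Optional[int]]],
           bitmaps: List[List[int]], pos: Callable[[int, int], int],
           overflow: List[int]) -> bool:
    """Remove key x; probe holds 1-based sub-table indices from the query step."""
    for t in probe:
        b = pos(x, t - 1)
        if tables[t - 1][b] == x:
            tables[t - 1][b] = None      # remove the key-value pair from its bucket
            bitmaps[t - 1][b] = 0        # reset the bucket's bitmap bit
            return True
    if x in overflow:
        overflow.remove(x)               # linked-list entries have no bitmap bit
        return True
    return False


tables = [[None, 5], [None, None]]
bitmaps = [[0, 1], [0, 0]]
print(delete(5, [1], tables, bitmaps, lambda k, i: k % 2, []), bitmaps)   # True [[0, 0], [0, 0]]
```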
To evaluate matrix hashing against existing hash designs, we used data from a real application: 12 forwarding information bases (FIBs) obtained from www.ripe.net at 8:00 a.m. on 2014-07-08. For each FIB, a synthetic traffic trace was generated uniformly over its prefixes. We use the relevant part of each FIB, namely the prefix and its next hop, with the prefix as the key and the next hop as the value. Let β denote the ratio of the total number of buckets in the hash table to the total number of elements, with 1.05 ≤ β ≤ 10, and let θ denote the threshold on blind kicks. The number of key-value pairs in the FIB is 500,000. Eight sub-tables were created whose sizes differ by 5000 (the common difference), with a total size of βn. Setting θ = 0 means blind kicks are not allowed, only bitmap-guided kicks; each insertion then takes at most 8+1 memory accesses, and an element that finds no empty candidate position in the 8 sub-tables is inserted into the linked list of the last sub-table. The collision rate is the ratio of the number of elements on the last sub-table's linked list to the total number of elements. The bloom filters use 16 hash functions. The experimental results are as follows:
1. Matrix hashing experiments:
1) loading rate and collision rate
With β = 1.05 and θ = 0, the results show that matrix hashing achieves a very high load rate using only 1.05n buckets: the load rates of the 8 sub-tables are well balanced, and the total load rate is 95.19%. The collision rate is about 0.05%, and only a few FIBs have a collision rate above 0.06%.
2) Insertion and query time
With β = 1.05 and θ = 0, all elements of each FIB were inserted into the matrix hash. The results show that the more elements have been inserted, the more memory accesses an insertion requires. Most elements need fewer than 6 memory accesses per insertion, and the number of memory accesses per query is between 1 and 1.0019, with an average of 1.00059.
3) Bitmap kicking and blind kicking
With β = 1.05, the results show that when θ = 5, inserting a key-value pair takes at most 8 × (5+1) + 1 = 49 memory accesses, querying a key-value pair takes fewer than 8 memory accesses, and the linked list of the last sub-table contains no elements. When θ = 0, although blind kicks are not allowed, only a few elements (0.56%) end up on the linked list of the last sub-table, and the worst case for an insertion is 8+1 memory accesses.
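One reading of these worst-case figures (an assumption, since the text gives only the two numbers) is z candidate-bucket accesses in each of θ+1 rounds plus one access for the linked list, which reproduces both quoted cases for z = 8:

$$A_{\max} = z(\theta + 1) + 1, \qquad A_{\max}\big|_{\theta=0} = 8 \cdot 1 + 1 = 9, \qquad A_{\max}\big|_{\theta=5} = 8 \cdot 6 + 1 = 49$$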
4) Collision rate vs beta
With θ = 0, the results show that the larger β is, the lower the collision rate; when β ≥ 1.18, the collision rate approaches 0.
2. Comparison of matrix hashing with other hashes:
The experiments compare matrix hashing with six well-known hash designs: chained hashing, linear probing, double hashing, cuckoo hashing, d-left hashing and peacock hashing. First, insertion failure is defined: for linear probing, double hashing and cuckoo hashing, a collision triggers probing of another bucket, and this probing repeats. The number of probes is limited to 500, so for these three designs an insertion takes at most 500 memory accesses; if the collision persists after 500 probes, the design stops inserting and the element left uninserted at the 500th iteration is discarded, which counts as an insertion failure. For peacock hashing and matrix hashing, the bloom filters use 16 hash functions.
Experiment 1 (β = 1.05, different FIBs)
1) Loading rate
The results show that matrix hashing always has the highest load rate.
2) Insertion time
The results show that matrix hashing requires the fewest memory accesses per insertion of all designs except chained hashing. Chained hashing needs only one or two memory accesses per insertion, so its insertion time is short, but it has significant drawbacks in other respects. Matrix hashing achieves fast insertion thanks to its bloom filters and bitmaps.
3) Finding time
The results show that matrix hashing has the shortest lookup time, owing to its higher load rate and lower false positive rate.
Experiment 2 (different β, FIB rrc00)
1) Loading rate
The results show that matrix hashing always has the highest load rate, chained hashing and double hashing have similar load rates, and peacock hashing reaches a high load rate only when β is large.
2) Insertion time
The results show that matrix hashing requires the fewest memory accesses per insertion of all designs except chained hashing.
3) Finding time
The results show that matrix hashing has the shortest lookup time.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.
Claims (3)
1. A data storage method based on matrix hashing, characterized by comprising the following steps:
1) establishing a hash table data structure comprising z sub-tables, where z is an even number and the sub-table sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the ith sub-table with the (z-i+1)th sub-table to obtain z/2 sub-tables of equal size; each of the z sub-tables corresponds to a bitmap, each bit of which corresponds to one bucket of that sub-table; the bit for an empty bucket is 0 and the bit for a non-empty bucket is 1;
2) establishing an auxiliary data structure comprising z bloom filters corresponding to the z sub-tables, where the bloom filter sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the ith bloom filter with the (z-i+1)th bloom filter to obtain z/2 bloom filters of equal size; then stacking the corresponding bits of these z/2 bloom filters to form 1 multi-bit bloom filter $F_m$; and adding an additional bloom filter $F_{half}$ responsible for recording the (z/2+1)th sub-table through the zth sub-table $T_z$ of the z sub-tables, i.e., $T_{z/2+1}, \dots, T_z$, to reduce the number of sub-tables queried;
the multi-bit bloom filter $F_m$ is implemented as follows:
assuming that the standard bloom filters F1, F2 and F3 each have m bits: for F1, first taking the most significant bit and shifting the result left by 2m bits, then taking the next most significant bit, shifting the result left by 2(m-1) bits and adding it to the value obtained from the most significant bit, and so on for each bit down to the last one, the accumulated value being f1; performing the same operation on F2 and F3 to obtain the accumulated values f2 and f3; and performing a logical OR of f1, f2 shifted right by one bit, and f3 shifted right by two bits to obtain the multi-bit bloom filter;
3) inserting key-value pairs using the hash table data structure and the auxiliary data structure to realize data storage;
wherein, step 3) includes:
inserting each new key-value pair into the sub-table with the lowest load rate among the z sub-tables;
attaching a linked list to the last sub-table, i.e., the zth sub-table, of the z sub-tables;
the key-value pairs are inserted as follows:
a) for a given key-value pair, checking through the bitmaps whether the z candidate buckets are empty, and then inserting the key-value pair into the sub-table with the lowest load rate to balance the load rates of all sub-tables; supposing the index of the sub-table to be inserted into is i: if $1 \le i \le z/2$, updating the ith bloom filter $F_i$ of the z bloom filters to indicate that the key-value pair is in the ith sub-table $T_i$ of the z sub-tables, and updating the corresponding bitmap; if $z/2 < i \le z$, updating the (z-i+1)th bloom filter $F_{z-i+1}$ of the z bloom filters to indicate that the key-value pair is in the ith sub-table $T_i$ of the z sub-tables, and updating $F_{half}$ and the corresponding bitmap;
b) if the bitmaps show that all z buckets into which key x could be inserted are full, inserting the key-value pair using a kick-out mechanism;
wherein step b) is implemented as follows: using the bitmaps to examine the z candidate buckets of x in turn, to determine whether the element y originally stored in a candidate bucket has an empty bucket in the remaining z-1 sub-tables into which y can be inserted; if so, kicking out y, inserting x, and inserting y into its new location; if no such y can be found, performing a blind kick and repeating the above insertion process, the number of blind kicks being limited to θ, and if the number of blind kicks exceeds θ, chaining the key-value pair onto the linked list of the last sub-table.
2. The method of claim 1, wherein key-value pairs are queried as follows: when querying key x, key x is first looked up in the multi-bit bloom filter $F_m$ and in $F_{half}$; for $1 \le i \le z/2$, if $F_i$ returns true and $F_{half}$ returns false, the ith sub-table $T_i$ of the z sub-tables is checked; otherwise, the (z-i+1)th sub-table $T_{z-i+1}$ of the z sub-tables is checked first and, if there is no match, the ith sub-table $T_i$ is checked; if key x cannot be found in the z sub-tables, the linked list of the last sub-table is searched; if it is still not found, key x is not in the hash table.
3. The method of claim 1, wherein key-value pairs are deleted as follows: when deleting key x, the bucket holding key x is first located with the query operation, the key-value pair is then removed from the bucket, and the bit in the bitmap corresponding to that bucket is reset.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710014205.9A CN108287840B (en) | 2017-01-09 | 2017-01-09 | Data storage and query method based on matrix hash |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710014205.9A CN108287840B (en) | 2017-01-09 | 2017-01-09 | Data storage and query method based on matrix hash |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108287840A CN108287840A (en) | 2018-07-17 |
CN108287840B true CN108287840B (en) | 2022-05-03 |
Family
ID=62819334
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710014205.9A Active CN108287840B (en) | 2017-01-09 | 2017-01-09 | Data storage and query method based on matrix hash |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108287840B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108989452A (en) * | 2018-08-07 | 2018-12-11 | 佛山市苔藓云链科技有限公司 | An Internet of Things data transmission device |
CN109471635B (en) * | 2018-09-03 | 2021-09-17 | 中新网络信息安全股份有限公司 | Algorithm optimization method based on Java Set implementation |
CN109597807A (en) * | 2018-10-25 | 2019-04-09 | 阿里巴巴集团控股有限公司 | Data warehouse table processing method and apparatus |
CN109766341B (en) * | 2018-12-27 | 2022-04-22 | 厦门市美亚柏科信息股份有限公司 | Method, device and storage medium for establishing Hash mapping |
CN109800228B (en) * | 2018-12-28 | 2023-03-10 | 深圳竹云科技有限公司 | Method for efficiently and quickly solving hash conflict |
CN111563199B (en) * | 2020-04-26 | 2023-10-10 | 北京奇艺世纪科技有限公司 | Data processing method and device |
CN111552692B (en) * | 2020-04-30 | 2023-04-07 | 南方科技大学 | Plus-minus cuckoo filter |
CN112416933B (en) * | 2020-11-19 | 2022-09-23 | 重庆邮电大学 | High-performance hash table implementation method based on on-chip and off-chip memories |
CN112699323A (en) * | 2021-01-07 | 2021-04-23 | 西藏宁算科技集团有限公司 | Cloud caching system and cloud caching method based on double bloom filters |
CN113342828A (en) * | 2021-07-02 | 2021-09-03 | 广东唯审信息科技有限公司 | Hash table conflict resolution method based on d-dimensional mapping |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104317795A (en) * | 2014-08-28 | 2015-01-28 | 华为技术有限公司 | Two-dimensional filter generation method, query method and device |
CN105027527A (en) * | 2012-12-31 | 2015-11-04 | 华为技术有限公司 | Scalable storage systems with longest prefix matching switches |
CN105468298A (en) * | 2015-11-19 | 2016-04-06 | 中国科学院信息工程研究所 | Key value storage method based on log-structured merged tree |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10810200B2 (en) * | 2015-01-07 | 2020-10-20 | International Business Machines Corporation | Technology for join processing |
- 2017-01-09: CN application CN201710014205.9A (CN108287840B), Active
Also Published As
Publication number | Publication date |
---|---|
CN108287840A (en) | 2018-07-17 |
Similar Documents
Publication | Title |
---|---|
CN108287840B (en) | Data storage and query method based on matrix hash |
US9495398B2 (en) | Index for hybrid database |
US8055681B2 (en) | Data storage method and data storage structure |
CN112000846B (en) | Method for grouping LSM tree indexes based on GPU |
WO2020057272A1 (en) | Index data storage and retrieval methods and apparatuses, and storage medium |
CN104077423A (en) | Consistent hash based structural data storage, inquiry and migration method |
US8352470B2 (en) | Adaptive aggregation: improving the performance of grouping and duplicate elimination by avoiding unnecessary disk access |
CN106599091B (en) | RDF graph structure storage and index method based on key value storage |
WO2021051782A1 (en) | Consensus method, apparatus and device of block chain |
CN116450656B (en) | Data processing method, device, equipment and storage medium |
CN115718819A (en) | Index construction method, data reading method and index construction device |
CN109800228B (en) | Method for efficiently and quickly solving hash conflict |
Khan et al. | Set-based unified approach for attributed graph summarization |
US11782895B2 (en) | Cuckoo hashing including accessing hash tables using affinity table |
CN113867627A (en) | Method and system for optimizing performance of storage system |
Gong et al. | Abc: a practicable sketch framework for non-uniform multisets |
CN116521956A (en) | Graph database query method and device, electronic equipment and storage medium |
Patgiri et al. | Shed more light on bloom filter's variants |
US20210248142A1 (en) | Dual filter histogram optimization |
CN115114294A (en) | Self-adaption method and device of database storage mode and computer equipment |
CN113220214A (en) | Multi-node storage system and data deduplication method thereof |
US20130290378A1 (en) | Adaptive probabilistic indexing with skip lists |
CN111949439B (en) | Database-based data file updating method and device |
Sasaniyan Asl et al. | A Cuckoo Filter Modification Inspired by Bloom Filter |
CN113342828A (en) | Hash table conflict resolution method based on d-dimensional mapping |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |