CN108287840A - Data storage and query method based on matrix hash - Google Patents
Data storage and query method based on matrix hash
- Publication number
- CN108287840A (application CN201710014205.9A)
- Authority
- CN
- China
- Prior art keywords
- sublist
- key
- bloom filter
- hash
- value pair
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a data storage and query method based on matrix hash. The method comprises: 1) establishing a hash table data structure comprising z sublists, where z is even and the sublist sizes decrease in arithmetic progression; for $1 \le i \le z/2$, the i-th sublist and the $(z-i+1)$-th sublist are combined, yielding $z/2$ equal-sized sublists; 2) establishing an auxiliary data structure comprising z Bloom filters corresponding to the z sublists, whose sizes also decrease in arithmetic progression; for $1 \le i \le z/2$, the i-th Bloom filter and the $(z-i+1)$-th Bloom filter are combined, yielding $z/2$ equal-sized Bloom filters, whose corresponding bits are then grouped together to form one multi-bit Bloom filter; 3) inserting key-value pairs using the hash table data structure and the auxiliary data structure, thereby storing the data. The invention achieves fast updates and fast queries.
Description
Technical field
The invention belongs to the technical field of in-memory databases, and in particular relates to a method for organizing, indexing, and storing data based on a matrix hash algorithm.
Background art
In-memory databases are more flexible and easier to use than disk-based databases. By data model, they can be divided into relational in-memory databases and key-value in-memory databases. Key-value stores are flexible and simple, save memory, and answer queries quickly, which gives them distinct advantages over relational in-memory databases. They are therefore widely used by major Internet companies such as Amazon, Facebook, YouTube, Baidu, Sina, and Sohu. A key-value store holds its data as key-value pairs in a hash table, so the hash algorithm is the core technology of a key-value store and directly affects system performance and website efficiency.
The practical problem is that, with the rapid growth of the Internet, many Internet companies have accumulated large amounts of data. Because the number of key-value pairs is enormous and available memory is limited, inserting a new key-value pair frequently causes collisions. Such collisions can cause insertions of new key-value pairs to fail and updates and lookups of existing key-value pairs to fail, which severely degrades the performance of the key-value store and causes considerable economic loss to the companies that rely on it.
At the same time, users place ever higher demands on data operations and expect query results quickly. This puts high demands on the responsiveness of Internet companies; if they cannot respond immediately, user experience suffers significantly.
Both problems are widespread among Internet companies that use key-value stores, and hash table designs keep trying new ideas to address them. For the collision problem, existing hash table designs commonly use an auxiliary data structure, such as a Bloom filter, to reduce the collision probability. Typical designs include the fast hash table (H. Song, S. Dharmapurikar, J. Turner, and J. Lockwood. Fast hash table lookup using extended bloom filter: an aid to network processing. ACM SIGCOMM Computer Communication Review, 35(4):181-192, 2005.), segmented hash (S. Kumar and P. Crowley. Segmented hash: an efficient hash table implementation for high performance networking subsystems. In Proc. ACM ANCS, pages 91-103, 2005.), and peacock hash (S. Kumar, J. Turner, and P. Crowley. Peacock hashing: Deterministic and updatable hashing for high performance networking. In Proc. IEEE INFOCOM, 2008.). For a new key-value pair to be inserted, each of these designs uses Bloom filters to determine which hash table to insert into. Colliding key-value pairs are either attached to a linked list via pointers or discarded. Although these designs reduce collisions by using multiple sublists, they still have drawbacks, such as a low load factor, and their collision rates leave considerable room for reduction.
For the query-time problem, typical designs include perfect hashing (Z. J. Czech, G. Havas, and B. S. Majewski. An optimal algorithm for generating minimal perfect hash functions. Information Processing Letters, 43(5):257-264, 1992.) and cuckoo hashing (B. Fan, D. G. Andersen, and M. Kaminsky. MemC3: Compact and concurrent memcache with dumber caching and smarter hashing. In NSDI, volume 13, pages 385-398, 2013.). Their drawback is that updates are very inefficient and require many hash computations and memory accesses. For example, cuckoo hashing may need nearly 500 hash computations and memory accesses to update the hash table, and even then the update may fail. With these designs, repeated update failures force the entire hash table to be rebuilt. Rebuilding takes a long time, which is unacceptable for real applications.
Summary of the invention
To address hash table collisions and query time, and to overcome the high collision rate, low memory efficiency, and low load factor of existing hash tables, the present invention provides a new hash table design, "matrix hash", that combines multi-sublist hashing, Bloom filters, and bitmaps.
The technical solution adopted by the present invention is as follows:
A data storage method based on matrix hash, characterized by comprising the following steps:
1) establishing a hash table data structure comprising z sublists, where z is even and the sublist sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the i-th sublist with the $(z-i+1)$-th sublist to obtain $z/2$ equal-sized sublists;
2) establishing an auxiliary data structure comprising z Bloom filters corresponding to the z sublists, whose sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the i-th Bloom filter with the $(z-i+1)$-th Bloom filter to obtain $z/2$ equal-sized Bloom filters, and then grouping the corresponding bits of these $z/2$ Bloom filters together to form one multi-bit Bloom filter;
3) inserting key-value pairs using the hash table data structure and the auxiliary data structure, thereby storing the data.
Further, each new key-value pair is inserted into the sublist with the lowest load factor.
Further, a linked list is attached to the last sublist, that is, the z-th sublist; if a key-value pair being inserted cannot find an empty bucket, it is attached to the linked list via a pointer.
Further, each sublist has a corresponding bitmap, and each bit in the bitmap corresponds to one bucket in the sublist; the bit is 0 for an empty bucket and 1 for a non-empty bucket.
Further, an additional Bloom filter $F_{half}$ is added, which records the second half of the sublists, that is, $T_i$ for $z/2 < i \le z$, to reduce the number of sublists queried.
Further, key-value pairs are inserted as follows:
a) for a given key-value pair, first check through the bitmaps whether the z candidate buckets are empty, then insert the key-value pair into the sublist with the lowest load factor, so as to balance the load factors of all sublists; let i be the index of the sublist chosen; if $i \le z/2$, update Bloom filter $F_i$ to indicate that key x is in sublist $T_i$, and update the corresponding bitmap; if $i > z/2$, update Bloom filter $F_{z-i+1}$ to indicate that x is in sublist $T_i$, and update $F_{half}$ and the corresponding bitmap;
b) if the bitmaps show that all z candidate buckets for key x are full, insert the key-value pair using the kick mechanism.
Further, key-value pairs are queried as follows: to query x, first look up x in the multi-bit Bloom filter $F_m$ and in $F_{half}$; if $F_i$ ($1 \le i \le z/2$) returns true and $F_{half}$ returns false, check sublist $T_i$; otherwise, check sublist $T_{z-i+1}$ first and, if there is no match, then check sublist $T_i$; if x cannot be found in any of the z sublists, search the linked list of the last sublist; if it is still not found, x is not in the hash table.
Further, key-value pairs are deleted as follows: to delete x, first locate the bucket containing x using the query operation, then remove the key-value pair from the bucket and reset the bit corresponding to that bucket in the bitmap.
The beneficial effects of the invention are: 1) High load factor with few pointers: a large number of key-value pairs is stored in a small amount of memory, and very few pointers are used. 2) Low collision rate. 3) Fast updates: the hash table can be updated with very few memory accesses. 4) Zero update failures. 5) Fast queries: a key-value pair can be found with very few memory accesses, and for a key-value pair that does not exist, a negative result is returned quickly. 6) Practicality: the design is easy to implement in hardware.
Description of the drawings
Fig. 1 is a schematic diagram of the matrix hash algorithm.
Fig. 2 is a structural diagram of the multi-bit Bloom filter.
Detailed description
The present invention is further described below through specific embodiments and the accompanying drawings.
I. Data structures
The data structure of the "matrix hash" of the present invention combines multi-level sublists, Bloom filters, and bitmaps. It consists of two parts: the hash table data structure and the auxiliary data structure.
1. Hash table data structure
The sizes of the sublists, that is, the maximum number of elements each can store, decrease in arithmetic progression, so the sizes of the Bloom filters corresponding to the sublists also decrease in arithmetic progression. Insertion uses a simple balancing strategy: each new key-value pair is inserted into the sublist with the lowest load factor, which ensures that the number of elements in the sublists also decreases approximately in arithmetic progression.
Suppose there are z sublists in total, where z is even. For $1 \le i \le z/2$, matrix hash combines the i-th sublist with the $(z-i+1)$-th sublist, yielding $z/2$ equal-sized sublists. Because the combined sublists resemble a matrix in shape, we name the algorithm matrix hash. To avoid insertion failures, the last sublist is allowed to carry a linked list: if a key-value pair being inserted cannot find an empty bucket, it is attached to the linked list of the z-th sublist. Because the z-th sublist is the smallest, the pointers occupy the least memory.
Fig. 1 is a schematic diagram of the matrix hash algorithm. On the left are 6 sublists and 6 Bloom filters whose sizes decrease in arithmetic progression; in the middle are the 3 equal-sized sublists and 3 Bloom filters obtained after combination. At the upper right is the multi-bit Bloom filter formed from the three standard Bloom filters BF1, BF2, and BF3.
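The pairing rule can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `sublist_sizes` and `paired_sizes` and the concrete sizes are assumptions chosen for the example, not values from the patent.

```python
def sublist_sizes(z, largest, step):
    """Sizes of z sublists that decrease by a constant step (hypothetical helper)."""
    if z % 2 != 0:
        raise ValueError("z must be even")
    return [largest - j * step for j in range(z)]


def paired_sizes(sizes):
    """Combine sublist i with sublist z-i+1 (1-based) and return the combined sizes."""
    z = len(sizes)
    return [sizes[i] + sizes[z - 1 - i] for i in range(z // 2)]


sizes = sublist_sizes(z=6, largest=60, step=10)  # [60, 50, 40, 30, 20, 10]
print(paired_sizes(sizes))                       # [70, 70, 70]
```

Because the sizes form an arithmetic progression, the i-th and $(z-i+1)$-th terms always sum to the same value, which is why the combined sublists are equal in size.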
2. Auxiliary data structure
As with the hash table, for $1 \le i \le z/2$, matrix hash combines the i-th Bloom filter with the $(z-i+1)$-th Bloom filter, yielding $z/2$ equal-sized standard Bloom filters. The corresponding bits of these $z/2$ Bloom filters are then grouped together to form a single Bloom filter in which each cell consists of $z/2$ bits. We call this the multi-bit Bloom filter and denote it $F_m$. In this way, the original z Bloom filters of arithmetically decreasing size are combined into one multi-bit Bloom filter.
Fig. 2 shows the structure of the multi-bit Bloom filter. As shown, the three bits in a cell come from three equal-sized standard Bloom filters, F1, F2, and F3. Note that the Bloom filters are combined in on-chip memory, so their combination is physical, whereas the combination of sublists is only conceptual. The multi-bit Bloom filter can be constructed as follows:
Suppose F1, F2, and F3 each have m bits. For F1, first take the most significant bit (AND the m bits of F1 with $2^{m-1}$) and shift the result left by 2m bits (multiply it by $2^{2m}$). Then take the next most significant bit (AND the m bits of F1 with $2^{m-2}$), shift the result left by 2(m-1) bits (multiply it by $2^{2(m-1)}$), and add it to the value obtained from the most significant bit. Repeat for each bit down to the last one, accumulating the results; call the accumulated value f1. Apply the same operation to F2 and F3 to obtain f2 and f3. Finally, OR together f1, f2 shifted right by one bit, and f3 shifted right by two bits to obtain the multi-bit Bloom filter, that is, $F_m = f_1 \lor (f_2 \gg 1) \lor (f_3 \gg 2)$.
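The bit-spreading procedure above can be sketched in Python, where arbitrary-precision integers stand in for the on-chip bit arrays. The function names `spread`, `combine`, and `cell` are hypothetical, and the sketch assumes exactly three filters of m bits each, as in the Fig. 2 example.

```python
def spread(bits, m):
    """Place bit j of an m-bit filter at position 3*j + 2, leaving room for two other filters."""
    acc = 0
    for j in range(m - 1, -1, -1):      # from the most significant bit down
        picked = bits & (1 << j)        # AND with 2^j isolates bit j
        acc += picked << (2 * (j + 1))  # shift left by 2*(j+1), i.e. 2m for the top bit
    return acc


def combine(f1, f2, f3, m):
    """Interleave three m-bit filters into one integer made of 3-bit cells."""
    return spread(f1, m) | (spread(f2, m) >> 1) | (spread(f3, m) >> 2)


def cell(fm, j):
    """Return cell j as an integer whose bits are (F1[j], F2[j], F3[j])."""
    return (fm >> (3 * j)) & 0b111


fm = combine(0b1010, 0b0110, 0b0001, m=4)
print([format(cell(fm, j), "03b") for j in range(4)])  # ['001', '110', '010', '100']
```

Each printed cell lists the bit that F1, F2, and F3 hold at the same position, so one memory read of a cell answers the membership test for all three filters at that position.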
This design raises a problem: when a Bloom filter returns true, two sublists have to be queried. For example, if x is reported by the i-th Bloom filter, both sublist $T_i$ and sublist $T_{z-i+1}$ may need to be queried. To reduce the number of sublists queried, an additional Bloom filter, $F_{half}$, is added. It records the keys stored in the second half of the sublists, that is, $T_i$ for $z/2 < i \le z$.
Matrix hash also uses on-chip bitmaps. Each sublist has a corresponding bitmap, and each bit in the bitmap corresponds to one bucket in the sublist. The bit is 0 for an empty bucket and 1 for a non-empty bucket.
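A per-sublist bitmap of this kind might look as follows. This is a minimal sketch; the class name `Bitmap` and its methods are assumptions for illustration and do not come from the patent.

```python
class Bitmap:
    """One bit per bucket: 0 means the bucket is empty, 1 means it is occupied."""

    def __init__(self, n_buckets):
        self.bits = bytearray((n_buckets + 7) // 8)  # all buckets start empty

    def set(self, idx):
        self.bits[idx >> 3] |= 1 << (idx & 7)

    def clear(self, idx):
        self.bits[idx >> 3] &= ~(1 << (idx & 7)) & 0xFF

    def is_occupied(self, idx):
        return bool(self.bits[idx >> 3] & (1 << (idx & 7)))


bm = Bitmap(10)
bm.set(9)
print(bm.is_occupied(9), bm.is_occupied(8))  # True False
```

Checking a candidate bucket then costs a single bit test instead of a read of the bucket itself.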
II. Derivation of the false positive rate of matrix hash
Matrix hash has two Bloom filters: $F_m$ and $F_{half}$. Let n be the number of key-value pairs, and let the z sublists be recombined into $z/2$ sublists. Suppose $F_m$ has m cells, each with $z/2$ bits that correspond to the $z/2$ combined sublists, and suppose $F_m$ uses k hash functions. The false positive rate of $F_m$ equals that of the $z/2$ independent Bloom filters that compose it. Denoting it $f(F_m)$, the formula is as follows:
[formula for $f(F_m)$ not preserved in the source text]
If u+1 Bloom filters return true, the false positive rate is:
$f(F_m, u) = 0.5^{k u} \left(1 - 0.5^{k(z-u-1)}\right)$
$F_{half}$ also has k hash functions, and its false positive rate is $f(F_{half}) = 0.5^{k}$. If only $F_m$ returns true and the key-value pair exists in exactly one sublist, there is no false positive; this event occurs with probability $(1-f(F_m))(1-f(F_{half}))$. If only $F_m$ returns true and reports the key-value pair as present in u+1 sublists, there are u false positives; this event occurs with probability $f(F_m,u)\,(1-f(F_{half}))$. If only $F_{half}$ reports a false positive, the probability is $(1-f(F_m))\,f(F_{half})$. If $F_m$ has u false positives and $F_{half}$ has one, the probability is $f(F_m,u)\,f(F_{half})$.
For example, when z=8 and k=16, the false positive rate of matrix hash is $1-(1-f(F_m))(1-f(F_{half})) \approx 6.1 \times 10^{-5}$, which is very small.
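The two pieces of this derivation that are fully stated in the text can be evaluated directly. In the sketch below, `f_half` implements $f(F_{half}) = 0.5^{k}$ and `combined_false_positive` implements $1-(1-f(F_m))(1-f(F_{half}))$; the function names are hypothetical, and $f(F_m)$ is deliberately left as an input because its closed form is not given here.

```python
def f_half(k):
    """False positive rate of F_half with k hash functions, as stated in the text."""
    return 0.5 ** k


def combined_false_positive(f_m, f_h):
    """Probability that at least one of F_m and F_half reports a false positive."""
    return 1.0 - (1.0 - f_m) * (1.0 - f_h)


print(f_half(16))  # 1.52587890625e-05
```

With k=16, $F_{half}$ alone contributes about $1.5 \times 10^{-5}$, so it accounts for only part of the overall figure quoted in the example.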
III. Insertion, query, and deletion of key-value pairs
In a key-value store, the matrix hash algorithm inserts, queries, and deletes key-value pairs as follows:
1. Insertion of key-value pairs
To insert a given key-value pair with key x, first check through the bitmaps whether the z candidate buckets are empty. Then insert the key-value pair into the sublist with the lowest load factor, so as to balance the load factors of all sublists. Let i be the index of the sublist chosen. If $i \le z/2$, update $F_i$ to indicate that x is in sublist $T_i$, and update the corresponding bitmap. If $i > z/2$, update $F_{z-i+1}$ to indicate that x is in sublist $T_i$, and update $F_{half}$ and the corresponding bitmap. During insertion, to set the bit corresponding to $F_i$ in a cell to 1, OR the $z/2$ bits of that cell with $2^{i-1}$.
If the bitmaps show that all z candidate buckets for x are full, the kick mechanism of cuckoo hashing is used, with the bitmaps determining which key-value pair to kick. The z candidate buckets of x are checked in order through the bitmaps to determine whether an existing element y in a candidate bucket has an empty bucket available in one of the remaining z-1 sublists. If so, y is kicked out, x is inserted in its place, and y is inserted at its new position. If no such y can be found, a blind kick is performed and the insertion procedure above is repeated. The number of blind kicks is limited to θ; once it exceeds θ, the key-value pair is attached to the linked list of the last sublist. By changing θ, matrix hash can trade off load factor against insertion speed. Because the bitmaps keep a global record of the empty and non-empty buckets in the sublists, using on-chip bitmaps greatly reduces the number of kick operations.
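The bookkeeping of the collision-free insertion path can be sketched as below. The helper names are hypothetical, the kick mechanism is omitted, and the sketch assumes that the least-loaded sublist is chosen among those whose candidate bucket is empty, which is one reading of the procedure above.

```python
def choose_sublist(load_factors, candidate_is_empty):
    """Return the 1-based index of the least-loaded sublist whose candidate bucket is empty."""
    free = [i for i, empty in enumerate(candidate_is_empty, start=1) if empty]
    if not free:
        return None  # all z candidates are full: the kick mechanism would take over
    return min(free, key=lambda i: load_factors[i - 1])


def filter_update(i, z):
    """Map sublist index i to (Bloom filter index, whether F_half is also updated)."""
    if i <= z // 2:
        return i, False
    return z - i + 1, True


def set_cell_bit(cell, filter_index):
    """OR the cell with 2^(filter_index - 1) to mark membership in that filter."""
    return cell | (1 << (filter_index - 1))


i = choose_sublist([0.9, 0.2, 0.5, 0.1], [True, True, True, False])
print(i, filter_update(i, z=4), filter_update(3, z=4))  # 2 (2, False) (2, True)
```

In the usage line, sublist 4 has the lowest load but its candidate bucket is full, so sublist 2 is chosen; sublists 2 and 3 share filter $F_2$, and only an insertion into sublist 3 also touches $F_{half}$.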
2. Query of key-value pairs
To query x, first look up x in $F_m$ and $F_{half}$. If $F_i$ ($1 \le i \le z/2$) returns true and $F_{half}$ returns false, check sublist $T_i$. Otherwise, check sublist $T_{z-i+1}$ first and, if there is no match, then check sublist $T_i$. If x cannot be found in any of the z sublists, search the linked list of the last sublist. If it is still not found, x is not in the hash table.
Note that a query needs to compute only k hash functions, not z × k, because the $z/2$ Bloom filters that make up the multi-bit Bloom filter share the same parameters. To read the bit corresponding to $F_i$ in a cell, AND the $z/2$ bits of the cell with $2^{i-1}$. If the result is 0, the bit corresponding to $F_i$ is 0; otherwise it is 1.
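The query-side logic can be illustrated with a short sketch. The names `cell_bit_is_set`, `filter_reports`, and `probe_order` are assumptions; the multi-bit filter is modeled as a list of integer cells, and the k hashed positions are passed in rather than computed.

```python
def cell_bit_is_set(cell, filter_index):
    """AND the cell with 2^(filter_index - 1); a non-zero result means the bit for F_i is 1."""
    return (cell & (1 << (filter_index - 1))) != 0


def filter_reports(cells, positions, filter_index):
    """F_i reports x only if its bit is set in every cell addressed by the k hash positions."""
    return all(cell_bit_is_set(cells[p], filter_index) for p in positions)


def probe_order(i, z, in_half):
    """Sublists to probe once F_i reports x: T_i alone, or T_{z-i+1} and then T_i."""
    if not in_half:
        return [i]
    return [z - i + 1, i]


cells = [0b011, 0b001, 0b111]
print(filter_reports(cells, [0, 2], 2), filter_reports(cells, [0, 1], 2))  # True False
print(probe_order(2, z=8, in_half=False), probe_order(2, z=8, in_half=True))  # [2] [7, 2]
```

The same k positions are reused for every filter index, which is why only k hash computations are needed per query.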
3. Deletion of key-value pairs
To delete x, matrix hash first locates the bucket containing x using the query operation above, then removes the key-value pair from the bucket and resets the bit corresponding to that bucket in the bitmap.
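A minimal sketch of the deletion step, assuming the bucket index has already been found by a query. The function `delete_entry` and the list-based representation of buckets and bitmap are illustrative assumptions only.

```python
def delete_entry(buckets, bitmap, bucket_index, key):
    """Remove key from the bucket located by a prior query and reset its bitmap bit."""
    entry = buckets[bucket_index]
    if entry is None or entry[0] != key:
        return False              # the bucket does not hold this key
    buckets[bucket_index] = None  # remove the key-value pair
    bitmap[bucket_index] = 0      # the bucket is empty again
    return True


buckets = [None, ("10.0.0.0/8", "hop-a")]
bitmap = [0, 1]
print(delete_entry(buckets, bitmap, 1, "10.0.0.0/8"), bitmap)  # True [0, 0]
```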
IV. Experimental data
To evaluate the matrix hash of the present invention against existing hash designs, we used real-world data. We obtained 12 forwarding information bases (FIBs) from www.ripe.net, taken at 8 a.m. on July 8, 2014. For each FIB, we synthetically generated a traffic trace that is uniform over the prefixes. We used the parts of each FIB relevant to us, namely the prefixes and their next hops: the prefix serves as the key and the next hop as the value. We use β to denote the ratio of the total number of buckets in the hash table to the total number of elements, with 1.05 ≤ β ≤ 10, and θ to denote the threshold on blind kicks. The number of key-value pairs in a FIB is 500,000. Eight sublists were built whose sizes differ by 5000, with a total size of β·n. Setting θ = 0 means that blind kicks are not allowed and only bitmap-guided kicks are used. For each insertion, the maximum number of memory accesses is 8+1: if an element finds no empty candidate position in the 8 sublists, it is inserted into the linked list of the last sublist. The collision rate is the ratio of the number of elements on the linked list of the last sublist to the total number of elements. The Bloom filters use 16 hash functions. The experimental results are as follows:
1. Performance of matrix hash:
1) Load factor and collision rate
With β = 1.05 and θ = 0, matrix hash achieves a very high load factor using only 1.05·n memory. The load factors of the 8 sublists are well balanced, and the overall load factor is 95.19%. The collision rate is around 0.05%, and exceeds 0.06% for only a few FIBs.
2) Insertion and query time
With β = 1.05 and θ = 0, all elements of each FIB were inserted into the matrix hash. The results show that the more elements have been inserted, the more memory accesses an insertion requires. For most elements, each insertion needs fewer than 6 memory accesses. The number of memory accesses per query lies between 1 and 1.0019, with a mean of 1.00059.
3) Bitmap kicks and blind kicks
With β = 1.05, the results show that when θ = 5, inserting a key-value pair takes at most 8 × (5+1) + 1 = 49 memory accesses and querying a key-value pair takes fewer than 8 memory accesses; in this case the linked list of the last sublist holds no elements. When θ = 0, blind kicks are not allowed, yet only a few elements (0.56%) end up on the linked list of the last sublist, and the worst case for an insertion is 8+1 memory accesses.
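The two worst-case figures follow the same pattern: z candidate buckets are examined per round, there are θ+1 rounds, and one final access goes to the linked list. A one-line sketch (the function name is hypothetical) reproduces both numbers:

```python
def worst_case_insert_accesses(z, theta):
    """z bucket checks for each of the theta + 1 rounds, plus one access to the linked list."""
    return z * (theta + 1) + 1


print(worst_case_insert_accesses(8, 5), worst_case_insert_accesses(8, 0))  # 49 9
```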
4) Collision rate vs. β
With θ = 0, the results show that the larger β is, the lower the collision rate; when β ≥ 1.18, the collision rate is close to 0.
2. Comparison of matrix hash with other hash designs:
The experiments compare matrix hash with six well-known hash designs: chained hashing, linear probing, double hashing, cuckoo hashing, d-left hashing, and peacock hashing. We first define insertion failure. In linear probing, double hashing, and cuckoo hashing, a collision causes another bucket to be probed, and this probing can repeat indefinitely. We limit the number of probes to 500, so for these three designs the maximum number of memory accesses per insertion is 500. If a collision still remains after 500 probes, these designs give up and discard the element that could not be inserted by the 500th iteration, which counts as an insertion failure. For peacock hashing and matrix hash, the Bloom filters use 16 hash functions.
Experiment 1 (β = 1.05, different FIBs)
1) Load factor
The results show that matrix hash always has the highest load factor.
2) Insertion time
The results show that, apart from chained hashing, matrix hash needs the fewest memory accesses per insertion among all the designs. Chained hashing needs only one or two memory accesses per insertion, so its insertion time is shorter, but its shortcomings in other respects are pronounced. Matrix hash achieves fast insertion thanks to its Bloom filters and bitmaps.
3) Lookup time
The results show that matrix hash has the shortest lookup time, because it has a high load factor and a very small false positive rate.
Experiment 2 (different β, FIB rrc00)
1) Load factor
The results show that matrix hash always has the highest load factor, that the load factors of chained hashing and double hashing are about the same, and that peacock hashing reaches a high load factor only when β is relatively large.
2) Insertion time
The results show that, apart from chained hashing, matrix hash needs the fewest memory accesses per insertion among all the designs.
3) Lookup time
The results show that matrix hash has the shortest lookup time.
The above embodiments merely illustrate the technical solution of the present invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention. The scope of protection of the invention shall be defined by the claims.
Claims (9)
1. A data storage method based on matrix hash, characterized by comprising the following steps:
1) establishing a hash table data structure comprising z sublists, where z is even and the sublist sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the i-th sublist with the $(z-i+1)$-th sublist to obtain $z/2$ equal-sized sublists;
2) establishing an auxiliary data structure comprising z Bloom filters corresponding to the z sublists, whose sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the i-th Bloom filter with the $(z-i+1)$-th Bloom filter to obtain $z/2$ equal-sized Bloom filters, and then grouping the corresponding bits of these $z/2$ Bloom filters together to form one multi-bit Bloom filter;
3) inserting key-value pairs using the hash table data structure and the auxiliary data structure, thereby storing the data.
2. The method of claim 1, characterized in that each new key-value pair is inserted into the sublist with the lowest load factor.
3. The method of claim 1, characterized in that a linked list is attached to the last sublist, that is, the z-th sublist, and a key-value pair being inserted that cannot find an empty bucket is attached to the linked list via a pointer.
4. The method of claim 1, characterized in that each sublist has a corresponding bitmap, and each bit in the bitmap corresponds to one bucket in the sublist; the bit is 0 for an empty bucket and 1 for a non-empty bucket.
5. The method of claim 4, characterized in that an additional Bloom filter $F_{half}$ is added, which records the second half of the sublists, that is, $T_i$ for $z/2 < i \le z$, to reduce the number of sublists queried.
6. The method of claim 5, characterized in that key-value pairs are inserted as follows:
a) for a given key-value pair, first check through the bitmaps whether the z candidate buckets are empty, then insert the key-value pair into the sublist with the lowest load factor, so as to balance the load factors of all sublists; let i be the index of the sublist chosen; if $i \le z/2$, update Bloom filter $F_i$ to indicate that key x is in sublist $T_i$, and update the corresponding bitmap; if $i > z/2$, update Bloom filter $F_{z-i+1}$ to indicate that x is in sublist $T_i$, and update $F_{half}$ and the corresponding bitmap;
b) if the bitmaps show that all z candidate buckets for key x are full, insert the key-value pair using the kick mechanism.
7. The method of claim 6, characterized in that step b) is implemented as follows: check the z candidate buckets of x in order through the bitmaps to determine whether an existing element y in a candidate bucket has an empty bucket available in one of the remaining z-1 sublists; if so, kick out y, insert x in its place, and insert y at its new position; if no such y can be found, perform a blind kick and repeat the insertion procedure above, with the number of blind kicks limited to θ; once the number of blind kicks exceeds θ, attach the key-value pair to the linked list of the last sublist.
8. The method of claim 7, characterized in that key-value pairs are queried as follows: to query x, first look up x in the multi-bit Bloom filter $F_m$ and in $F_{half}$; if $F_i$ ($1 \le i \le z/2$) returns true and $F_{half}$ returns false, check sublist $T_i$; otherwise, check sublist $T_{z-i+1}$ first and, if there is no match, then check sublist $T_i$; if x cannot be found in any of the z sublists, search the linked list of the last sublist; if it is still not found, x is not in the hash table.
9. The method of claim 7, characterized in that key-value pairs are deleted as follows: to delete x, first locate the bucket containing x using the query operation, then remove the key-value pair from the bucket and reset the bit corresponding to that bucket in the bitmap.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710014205.9A CN108287840B (en) | 2017-01-09 | 2017-01-09 | Data storage and query method based on matrix hash |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108287840A (en) | 2018-07-17 |
CN108287840B (en) | 2022-05-03 |
Family ID=62819334
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710014205.9A Active CN108287840B (en) | 2017-01-09 | 2017-01-09 | Data storage and query method based on matrix hash |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108287840B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104317795A (en) * | 2014-08-28 | 2015-01-28 | 华为技术有限公司 | Two-dimensional filter generation method, query method and device |
CN105027527A (en) * | 2012-12-31 | 2015-11-04 | 华为技术有限公司 | Scalable storage systems with longest prefix matching switches |
CN105468298A (en) * | 2015-11-19 | 2016-04-06 | 中国科学院信息工程研究所 | Key value storage method based on log-structured merged tree |
US20160196306A1 (en) * | 2015-01-07 | 2016-07-07 | International Business Machines Corporation | Technology for join processing |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108989452A (en) * | 2018-08-07 | 2018-12-11 | 佛山市苔藓云链科技有限公司 | A kind of data transmission of internet of things device |
CN109471635A (en) * | 2018-09-03 | 2019-03-15 | 中新网络信息安全股份有限公司 | A kind of algorithm optimization method realized based on Java Set set |
CN109471635B (en) * | 2018-09-03 | 2021-09-17 | 中新网络信息安全股份有限公司 | Algorithm optimization method based on Java Set implementation |
CN109597807A (en) * | 2018-10-25 | 2019-04-09 | 阿里巴巴集团控股有限公司 | Number storehouse list processing method and apparatus |
CN109766341A (en) * | 2018-12-27 | 2019-05-17 | 厦门市美亚柏科信息股份有限公司 | A kind of method, apparatus that establishing Hash mapping, storage medium |
CN109766341B (en) * | 2018-12-27 | 2022-04-22 | 厦门市美亚柏科信息股份有限公司 | Method, device and storage medium for establishing Hash mapping |
CN109800228A (en) * | 2018-12-28 | 2019-05-24 | 深圳竹云科技有限公司 | A method of efficiently quickly solving hash conflict |
CN109800228B (en) * | 2018-12-28 | 2023-03-10 | 深圳竹云科技有限公司 | Method for efficiently and quickly solving hash conflict |
CN111563199A (en) * | 2020-04-26 | 2020-08-21 | 北京奇艺世纪科技有限公司 | Data processing method and device |
CN111563199B (en) * | 2020-04-26 | 2023-10-10 | 北京奇艺世纪科技有限公司 | Data processing method and device |
CN111552692A (en) * | 2020-04-30 | 2020-08-18 | 南方科技大学 | Plus-minus cuckoo filter |
CN111552692B (en) * | 2020-04-30 | 2023-04-07 | 南方科技大学 | Plus-minus cuckoo filter |
CN112416933B (en) * | 2020-11-19 | 2022-09-23 | 重庆邮电大学 | High-performance hash table implementation method based on-chip and off-chip memories |
CN112416933A (en) * | 2020-11-19 | 2021-02-26 | 重庆邮电大学 | High-performance hash table implementation method based on-chip and off-chip memories |
CN112699323A (en) * | 2021-01-07 | 2021-04-23 | 西藏宁算科技集团有限公司 | Cloud caching system and cloud caching method based on double bloom filters |
CN113342828A (en) * | 2021-07-02 | 2021-09-03 | 广东唯审信息科技有限公司 | Hash table conflict resolution method based on d-dimensional mapping |
Also Published As
Publication number | Publication date |
---|---|
CN108287840B (en) | 2022-05-03 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108287840A (en) | A kind of data storage and query method based on matrix Hash | |
Li et al. | Packet forwarding in named data networking requirements and survey of solutions | |
CN110083601B (en) | Key value storage system-oriented index tree construction method and system | |
Xia et al. | Refreshing the sky: the compressed skycube with efficient support for frequent updates | |
CN103810237B (en) | Data managing method and system | |
JP6356675B2 (en) | Aggregation / grouping operation: Hardware implementation of hash table method | |
US20140188885A1 (en) | Utilization and Power Efficient Hashing | |
CN109255055A (en) | A kind of diagram data access method and device based on packet associated table | |
CN105574054B (en) | A kind of distributed caching range query method, apparatus and system | |
CN106202548A (en) | Date storage method, lookup method and device | |
CN106294772A (en) | The buffer memory management method of distributed memory columnar database | |
CN102819586A (en) | Uniform Resource Locator (URL) classifying method and equipment based on cache | |
Xiao et al. | Using parallel bloom filters for multiattribute representation on network services | |
US20080133494A1 (en) | Method and apparatus for searching forwarding table | |
CN112000846A (en) | Method for grouping LSM tree indexes based on GPU | |
Hua et al. | Nest: Locality-aware approximate query service for cloud computing | |
CN113157943A (en) | Distributed storage and visual query processing method for large-scale financial knowledge map | |
CN106919691A (en) | Method, device and the searching system retrieved based on web page library | |
CN104391992A (en) | Asset data-oriented data processing system | |
CN106156171A (en) | A kind of enquiring and optimizing method of Virtual asset data | |
CN118227518B (en) | Table entry storage and searching method and device, network equipment and storage medium | |
Skandar et al. | An efficient duplication record detection algorithm for data cleansing | |
Gong et al. | Abc: a practicable sketch framework for non-uniform multisets | |
CN109522242A (en) | A kind of method and apparatus for searching for Cache data | |
JP6006740B2 (en) | Index management device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||