CN108287840A - Data storage and query method based on matrix hash - Google Patents

Data storage and query method based on matrix hash

Info

Publication number
CN108287840A
Authority
CN
China
Prior art keywords
sublist
key
bloom filter
hash
value pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710014205.9A
Other languages
Chinese (zh)
Other versions
CN108287840B (en)
Inventor
杨仝
张梦瑜
李晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-01-09
Filing date
2017-01-09
Publication date
2018-07-17
Application filed by Peking University
Priority to CN201710014205.9A
Publication of CN108287840A
Application granted
Publication of CN108287840B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data storage and query method based on matrix hash. The method comprises: 1) establishing a hash table data structure containing $z$ sub-tables, where $z$ is even and the sub-table sizes decrease in arithmetic progression; for $1 \le i \le z/2$, the $i$-th sub-table and the $(z-i+1)$-th sub-table are combined, yielding $z/2$ sub-tables of equal size; 2) establishing an auxiliary data structure containing $z$ Bloom filters corresponding to the $z$ sub-tables, whose sizes also decrease in arithmetic progression; for $1 \le i \le z/2$, the $i$-th Bloom filter and the $(z-i+1)$-th Bloom filter are combined, yielding $z/2$ Bloom filters of equal size; the corresponding bits of these $z/2$ Bloom filters are then grouped together to form one multi-bit Bloom filter; 3) inserting key-value pairs using the hash table data structure and the auxiliary data structure, thereby storing the data. The invention achieves fast updates and fast queries.

Description

Data storage and query method based on matrix hash
Technical field
The invention belongs to the technical field of in-memory databases, and in particular relates to a data organization, indexing, and storage method based on a matrix hash algorithm.
Background
Compared with disk databases, in-memory databases offer greater flexibility and ease of use. By data model, in-memory databases can be divided into relational in-memory databases and key-value in-memory databases. Key-value in-memory databases (key-value stores) are flexible and simple, save memory, and support fast queries. These advantages over relational in-memory databases have led to wide adoption by major Internet companies such as Amazon, Facebook, YouTube, Baidu, Sina, and Sohu. A key-value storage system holds its data as key-value pairs stored in hash tables, so the hash algorithm, as the core technology of the system, directly affects system performance and website efficiency.
The practical problem at present is that, with the rapid development of the Internet, many Internet companies have accumulated large amounts of data. Because the number of key-value pairs is huge and the available memory is limited, inserting a new key-value pair frequently causes collisions. Such collisions can cause the insertion of new key-value pairs to fail and the update or lookup of existing key-value pairs to fail. This severely degrades the performance of key-value storage systems and causes considerable economic loss to the Internet companies that use them.
Meanwhile, users' demands on data operations keep rising. Users need query results quickly, which places high demands on the responsiveness of Internet companies. A company that cannot respond promptly will significantly harm the user experience.
Both problems are widespread among the major Internet companies that use key-value storage systems, and existing hash table designs keep trying new ideas to solve them. For the collision problem, existing designs commonly reduce the collision probability with an auxiliary data structure such as a Bloom filter. Typical designs include the fast hash table (H. Song, S. Dharmapurikar, J. Turner, and J. Lockwood. Fast hash table lookup using extended bloom filter: an aid to network processing. ACM SIGCOMM Computer Communication Review, 35(4):181-192, 2005.), segmented hash (S. Kumar and P. Crowley. Segmented hash: an efficient hash table implementation for high performance networking subsystems. In Proc. ACM ANCS, pages 91-103, 2005.), and peacock hash (S. Kumar, J. Turner, and P. Crowley. Peacock hashing: Deterministic and updatable hashing for high performance networking. In Proc. IEEE INFOCOM, 2008.). For a new key-value pair to be inserted, these designs all use Bloom filters to determine which hash table to insert it into. Colliding key-value pairs are either attached to a linked list by a pointer or discarded. Although these designs reduce collisions by using multiple sub-tables, they still have drawbacks such as a low load factor, and their collision rate still has considerable room for reduction.
The second problem is query time. Typical designs include perfect hashing (Z. J. Czech, G. Havas, and B. S. Majewski. An optimal algorithm for generating minimal perfect hash functions. Information Processing Letters, 43(5):257-264, 1992.) and cuckoo hashing (B. Fan, D. G. Andersen, and M. Kaminsky. MemC3: Compact and concurrent memcache with dumber caching and smarter hashing. In NSDI, volume 13, pages 385-398, 2013.). Their drawback is that updates are very inefficient and require a large number of hash computations and memory accesses. For example, cuckoo hashing can need nearly 500 hash computations and memory accesses to update the hash table, and even then the update may fail. With these designs, repeated update failures force the entire hash table to be rebuilt. Rebuilding takes a long time, which is unacceptable for real application systems.
Summary of the invention
To address hash table collisions and query time, and to overcome the high collision rate, low memory efficiency, and low load factor of existing hash tables, the present invention provides a new hash table design that combines multi-sub-table hashing, Bloom filters, and bitmaps: "matrix hash".
The technical solution adopted by the invention is as follows:
A data storage method based on matrix hash, characterized by comprising the following steps:
1) Establish a hash table data structure containing $z$ sub-tables, where $z$ is even and the sub-table sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combine the $i$-th sub-table with the $(z-i+1)$-th sub-table to obtain $z/2$ sub-tables of equal size;
2) Establish an auxiliary data structure containing $z$ Bloom filters corresponding to the $z$ sub-tables, where the Bloom filter sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combine the $i$-th Bloom filter with the $(z-i+1)$-th Bloom filter to obtain $z/2$ Bloom filters of equal size; then group the corresponding bits of these $z/2$ Bloom filters together to form one multi-bit Bloom filter;
3) Insert key-value pairs using the hash table data structure and the auxiliary data structure, thereby storing the data.
Further, whenever a new key-value pair is inserted, it is inserted into the sub-table with the lowest load factor.
Further, a linked list is attached to the last sub-table, i.e. the $z$-th sub-table; if a key-value pair to be inserted cannot find an empty bucket, it is attached to the linked list by a pointer.
Further, each sub-table has a corresponding bitmap, and each bit in the bitmap corresponds to one bucket in that sub-table; the bit for an empty bucket is 0 and the bit for a non-empty bucket is 1.
Further, an additional Bloom filter $F_{half}$ is added, which records the second half of the sub-tables, i.e. $T_{z/2+1}, \dots, T_z$, to reduce the number of sub-tables queried.
Further, key-value pairs are inserted as follows:
a) For a given key-value pair, first use the bitmaps to check whether the $z$ candidate buckets are empty, then insert the key-value pair into the sub-table with the lowest load factor so as to balance the load factors of all sub-tables. Let $i$ be the index of the sub-table to insert into. If $1 \le i \le z/2$, update Bloom filter $F_i$ to indicate that key $x$ is in sub-table $T_i$, and update the corresponding bitmap; if $z/2 < i \le z$, update Bloom filter $F_{z-i+1}$ to indicate that $x$ is in sub-table $T_i$, and update $F_{half}$ and the corresponding bitmap;
b) If the bitmaps show that all $z$ candidate buckets for key $x$ are full, insert the key-value pair using the kick-out mechanism.
Further, key-value pairs are queried as follows: to query $x$, first query $x$ in the multi-bit Bloom filter $F_m$ and in $F_{half}$. If $F_i$ ($1 \le i \le z/2$) returns true and $F_{half}$ returns false, check sub-table $T_i$; otherwise, check sub-table $T_{z-i+1}$ first and, if there is no match, then check sub-table $T_i$. If $x$ cannot be found in any of the $z$ sub-tables, search the linked list of the last sub-table. If it is still not found, $x$ is not in the hash table.
Further, key-value pairs are deleted as follows: to delete $x$, first locate the bucket holding $x$ using the query operation, then remove the key-value pair from the bucket and reset the bitmap bit corresponding to that bucket.
The beneficial effects of the invention are: 1) High load factor with few pointers: a large number of key-value pairs are stored in a small memory space, and very few pointers are used. 2) Low collision rate. 3) Fast updates: the hash table can be updated with very few memory accesses. 4) Zero update failures. 5) Fast queries: a key-value pair can be found with very few memory accesses, and for a key-value pair that does not exist, a negative result is returned quickly. 6) Practicality: the design is easy to implement in hardware.
Description of the drawings
Fig. 1 is a schematic diagram of the matrix hash algorithm.
Fig. 2 is a structure diagram of the multi-bit Bloom filter.
Detailed description
The invention is further described below through specific embodiments and the drawings.
I. Data structures
The data structure of the "matrix hash" of the invention combines multiple sub-tables, Bloom filters, and bitmaps. It consists of two parts: the hash table data structure and the auxiliary data structure.
1. Hash table data structure
The sizes of the sub-tables, i.e. the maximum number of elements each can store, decrease in arithmetic progression, so the Bloom filters corresponding to the sub-tables also decrease in arithmetic progression. A simple balancing strategy is used when inserting elements: each new key-value pair is inserted into the sub-table with the lowest load factor, which ensures that the number of elements in each sub-table also decreases approximately in arithmetic progression.
Suppose there are $z$ sub-tables in total, where $z$ is even. For $1 \le i \le z/2$, matrix hash combines the $i$-th sub-table with the $(z-i+1)$-th sub-table, finally obtaining $z/2$ sub-tables of equal size. Because the combined sub-tables resemble a matrix in shape, we name the algorithm matrix hash. To avoid insertion failures, the last sub-table is allowed to carry a linked list: if a key-value pair to be inserted ultimately cannot find an empty bucket, it is attached to the linked list of the $z$-th sub-table. Because the $z$-th sub-table is the smallest, the pointers occupy minimal memory.
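The pairing of sub-tables described above can be sketched in a few lines of Python. This is only an illustration: the function names `subtable_sizes` and `paired_sizes`, and the example sizes, are assumptions introduced here and do not come from the patent text. It shows why pairing the $i$-th and $(z-i+1)$-th entries of a decreasing arithmetic sequence always gives equal combined sizes.

```python
def subtable_sizes(z, largest, step):
    """Sizes of z sub-tables forming a decreasing arithmetic sequence."""
    if z % 2 != 0:
        raise ValueError("z must be even")
    return [largest - j * step for j in range(z)]


def paired_sizes(sizes):
    """Combine sub-table i with sub-table z-i+1 (1-based) and return the combined sizes."""
    z = len(sizes)
    return [sizes[i] + sizes[z - 1 - i] for i in range(z // 2)]


sizes = subtable_sizes(z=6, largest=60, step=10)  # [60, 50, 40, 30, 20, 10]
print(paired_sizes(sizes))                        # [70, 70, 70]
```

Each pair sums to the first size plus the last size, so the $z/2$ combined sub-tables are equal regardless of the common difference.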
Fig. 1 is a schematic diagram of the matrix hash algorithm. On the left are 6 sub-tables and 6 Bloom filters whose sizes decrease in arithmetic progression; in the middle are the 3 equal-sized sub-tables and 3 Bloom filters obtained after combination. On the upper right is the multi-bit Bloom filter formed by combining the three standard Bloom filters BF1, BF2, and BF3.
2. Auxiliary data structure
As with the combination of sub-tables, for $1 \le i \le z/2$ matrix hash combines the $i$-th Bloom filter with the $(z-i+1)$-th Bloom filter, finally obtaining $z/2$ equal-sized standard Bloom filters. The corresponding bits of these $z/2$ Bloom filters are then grouped together to form a single Bloom filter in which each cell consists of $z/2$ bits. We call this a multi-bit Bloom filter and denote it $F_m$. In this way, the original $z$ Bloom filters of arithmetically decreasing size are combined into one multi-bit Bloom filter.
Fig. 2 is a structure diagram of the multi-bit Bloom filter. As shown in the figure, the three bits in a cell come from three equal-sized standard Bloom filters, F1, F2, and F3. Note that the combination of Bloom filters is carried out in on-chip memory and is physical, whereas the combination of sub-tables is only conceptual. The multi-bit Bloom filter is constructed as follows:
Suppose F1, F2, and F3 each have $m$ bits. For F1, first take the most significant bit (logical AND of the $m$ bits of F1 with $2^{m-1}$) and shift the result left by $2m$ bits (i.e. multiply it by $2^{2m}$). Then take the next most significant bit (logical AND of the $m$ bits of F1 with $2^{m-2}$), shift the result left by $2(m-1)$ bits (i.e. multiply it by $2^{2(m-1)}$), and add it to the value obtained from the most significant bit. Continue in the same way for each bit, accumulating the results, down to the last bit; call the accumulated value f1. Apply the same operation to F2 and F3 to obtain f2 and f3. Finally, take f1, f2 shifted right by one bit, and f3 shifted right by two bits, and OR the three together to obtain the multi-bit Bloom filter (that is, f1 | (f2 >> 1) | (f3 >> 2)).
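The bit-spreading procedure just described amounts to interleaving the three filters so that cell $j$ of the result holds bit $j$ of F1, F2, and F3. The following Python sketch follows those steps literally, treating each filter as an $m$-bit integer. The function names `spread_bits`, `combine_three`, and `cell` are assumptions made for illustration, and the sketch covers only the three-filter case from the text.

```python
def spread_bits(value, m):
    """Move bit j of an m-bit integer to position 3*j + 2, as described for F1."""
    acc = 0
    for j in range(m - 1, -1, -1):   # from the most significant bit down
        bit = value & (1 << j)       # isolate bit j (logical AND with 2^j)
        acc += bit << (2 * (j + 1))  # shift left by 2*(j+1) bits and accumulate
    return acc


def combine_three(f1_bits, f2_bits, f3_bits, m):
    """Interleave three m-bit Bloom filters into one filter of m three-bit cells."""
    f1 = spread_bits(f1_bits, m)
    f2 = spread_bits(f2_bits, m)
    f3 = spread_bits(f3_bits, m)
    return f1 | (f2 >> 1) | (f3 >> 2)


def cell(multi, j):
    """Return the three bits stored in cell j, ordered F1, F2, F3 from high to low."""
    return (multi >> (3 * j)) & 0b111


multi = combine_three(0b1010, 0b0110, 0b0001, m=4)
print(bin(multi))                        # 0b100010110001
print([cell(multi, j) for j in range(4)])  # [1, 6, 2, 4]
```

In this layout one memory read of a cell returns the membership bits of all combined filters at once, which is what lets a query consult every filter after computing a single set of hash positions.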
This Bloom filter design raises a problem: when a Bloom filter returns true, the two corresponding sub-tables must both be queried. For example, if $x$ is reported in the $i$-th Bloom filter, the query must look in sub-table $T_i$ or $T_{z-i+1}$. To reduce the number of sub-tables queried, an additional Bloom filter called $F_{half}$ is added, which records the second half of the sub-tables, i.e. $T_{z/2+1}, \dots, T_z$.
In addition, matrix hash uses on-chip bitmaps: each sub-table has a corresponding bitmap, and each bit in the bitmap corresponds to one bucket in that sub-table. The bit for an empty bucket is 0 and the bit for a non-empty bucket is 1.
II. Derivation of the false positive rate of matrix hash
Matrix hash has two Bloom filters: $F_m$ and $F_{half}$. Let $n$ be the number of key-value pairs; the $z$ sub-tables are recombined into $z/2$ sub-tables. Suppose $F_m$ has $m$ cells with $z/2$ bits in each cell, the $z/2$ bits corresponding to the $z/2$ combined sub-tables. Suppose $F_m$ uses $k$ hash functions. The false positive rate of $F_m$ equals that of the $z/2$ independent Bloom filters that compose it, and is denoted $f(F_m)$.
If the number of Bloom filters returning true is $u+1$, the false positive rate is:
$f(F_m, u) = 0.5^{k u} \cdot (1 - 0.5^{k(z-u-1)})$
$F_{half}$ likewise has $k$ hash functions, and its false positive rate is $f(F_{half}) = 0.5^k$. If only $F_m$ returns true and the key-value pair is reported in exactly one sub-table, there is no false positive; the probability of this event is $(1 - f(F_m)) \cdot (1 - f(F_{half}))$. If only $F_m$ returns true and reports the key-value pair in $u+1$ sub-tables, there are $u$ false positives; the probability of this event is $f(F_m, u) \cdot (1 - f(F_{half}))$. If only $F_{half}$ reports one false positive, the probability of this event is $(1 - f(F_m)) \cdot f(F_{half})$. If $F_m$ has $u$ false positives and $F_{half}$ has one, the probability of this event is $f(F_m, u) \cdot f(F_{half})$.
For example, when $z = 8$ and $k = 16$, the false positive rate of matrix hash is $1 - (1 - f(F_m)) \cdot (1 - f(F_{half})) \approx 6.1 \times 10^{-5}$, which is very small.
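The quoted figure can be checked numerically. In the sketch below, $f(F_{half}) = 0.5^k$ is taken from the text, while the value used for $f(F_m)$, namely $(z/2 - 1) \cdot 0.5^k$, is an assumption chosen here because it is consistent with the stated result; it is not a formula given in the text.

```python
k, z = 16, 8

f_half = 0.5 ** k                  # from the text: f(F_half) = 0.5^k
f_m = (z // 2 - 1) * 0.5 ** k      # ASSUMPTION: one term per "other" combined filter

# Overall false positive rate as written in the text.
total = 1 - (1 - f_m) * (1 - f_half)

print(f"f(F_half) = {f_half:.3e}")
print(f"total     = {total:.2e}")  # 6.10e-05
```

Under this assumption the total is roughly $4 \times 0.5^{16}$, which agrees with the stated value of about $6.1 \times 10^{-5}$.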
III. Insertion, query, and deletion of key-value pairs
In a key-value storage system, the matrix hash algorithm inserts, queries, and deletes key-value pairs as follows:
1. Insertion of key-value pairs
For a given key-value pair with key $x$, first use the bitmaps to check whether the $z$ candidate buckets are empty. Then insert the key-value pair into the sub-table with the lowest load factor, so as to balance the load factors of all sub-tables. Let $i$ be the index of the sub-table to insert into. If $1 \le i \le z/2$, update $F_i$ to indicate that $x$ is in sub-table $T_i$, and update the corresponding bitmap; if $z/2 < i \le z$, update $F_{z-i+1}$ to indicate that $x$ is in sub-table $T_i$, and update $F_{half}$ and the corresponding bitmap. During insertion, to set the bit corresponding to $F_i$ in a cell to 1, OR the $z/2$ bits of the cell with $2^{i-1}$.
If the bitmaps show that all $z$ candidate buckets for $x$ are full, the kick-out mechanism of cuckoo hashing is used, with the bitmaps determining which key-value pair to kick out. The $z$ candidate buckets of $x$ are checked in order using the bitmaps, to determine whether the existing element in a candidate bucket, say $y$, has an empty bucket among its remaining $z-1$ sub-tables into which it can be moved. If so, $y$ is kicked out, $x$ is inserted in its place, and $y$ is inserted into its new position. If no such $y$ can be found, a blind kick is performed and the insertion procedure above is repeated. The number of blind kicks is limited to θ; if more than θ blind kicks occur, the key-value pair is attached to the linked list of the last sub-table. By changing the value of θ, RHT4 can trade off load factor against insertion speed. Because the bitmaps keep a global record of the empty and non-empty buckets in the sub-tables, using on-chip bitmaps greatly reduces the number of kick operations.
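The first stage of insertion (bitmap check, then least-loaded sub-table) can be sketched as follows. This is a simplified illustration under several assumptions: the class name `MatrixHashInsertSketch`, the SHA-256-based bucket function, and the plain Python list used as the overflow linked list are all introduced here. The sketch deliberately omits the Bloom filter updates, the bitmap-guided and blind kicks, and updates of existing keys.

```python
import hashlib


class MatrixHashInsertSketch:
    """Bitmap check plus least-loaded placement; no Bloom filters, no kicks."""

    def __init__(self, sizes):
        self.sizes = list(sizes)                          # sub-table sizes, decreasing
        self.tables = [[None] * s for s in self.sizes]    # buckets of each sub-table
        self.bitmaps = [[0] * s for s in self.sizes]      # 0 = empty, 1 = occupied
        self.counts = [0] * len(self.sizes)               # elements per sub-table
        self.overflow = []                                # stands in for the linked list

    def _bucket(self, i, key):
        # Hypothetical per-sub-table hash: salt the key with the sub-table index.
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.sizes[i]

    def insert(self, key, value):
        candidates = [(i, self._bucket(i, key)) for i in range(len(self.sizes))]
        # Consult only the bitmaps to find empty candidate buckets.
        empty = [(i, b) for i, b in candidates if self.bitmaps[i][b] == 0]
        if not empty:
            # All candidates full: the text would try kicks first; here we overflow.
            self.overflow.append((key, value))
            return None
        # Choose the empty candidate whose sub-table has the lowest load factor.
        i, b = min(empty, key=lambda c: self.counts[c[0]] / self.sizes[c[0]])
        self.tables[i][b] = (key, value)
        self.bitmaps[i][b] = 1
        self.counts[i] += 1
        return i


table = MatrixHashInsertSketch([60, 50, 40, 30, 20, 10])
for n in range(150):
    table.insert(f"key{n}", n)
print(table.counts, len(table.overflow))
```

Restricting the choice to sub-tables whose candidate bucket is empty is one reading of "insert into the sub-table with the lowest load factor"; the text does not spell out how ties or full candidates in the least-loaded sub-table are handled before kicking begins.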
2. Query of key-value pairs
To query $x$, first query $x$ in $F_m$ and $F_{half}$. If $F_i$ ($1 \le i \le z/2$) returns true and $F_{half}$ returns false, check sub-table $T_i$. Otherwise, check sub-table $T_{z-i+1}$ first and, if there is no match, then check sub-table $T_i$. If $x$ cannot be found in any of the $z$ sub-tables, search the linked list of the last sub-table. If it is still not found, $x$ is not in the hash table.
Note that during a query only $k$ hash functions need to be computed, not $z \times k$, because the $z/2$ original Bloom filters that make up the multi-bit Bloom filter share the same parameters. To read the bit corresponding to $F_i$ in a cell, AND the $z/2$ bits of the cell with $2^{i-1}$. If the result is 0, the bit corresponding to $F_i$ in the cell is 0; otherwise it is 1.
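The cell operations and the probe order described above can be illustrated with a few small helpers. The function names are assumptions introduced here; the bit convention ($F_i$ mapped to $2^{i-1}$ within a cell) and the probe order are taken from the text.

```python
def set_filter_bit(cell, i):
    """Set the bit for F_i (1-based) in a cell by OR-ing with 2^(i-1)."""
    return cell | (1 << (i - 1))


def filter_bit_is_set(cell, i):
    """Read the bit for F_i in a cell by AND-ing with 2^(i-1)."""
    return (cell & (1 << (i - 1))) != 0


def probe_order(i, z, half_hit):
    """Sub-tables (1-based) to check when F_i reports a hit, for 1 <= i <= z/2."""
    if not half_hit:
        return [i]               # F_half false: the key can only be in T_i
    return [z - i + 1, i]        # otherwise try T_(z-i+1) first, then T_i


cell = set_filter_bit(0b000, 3)
print(bin(cell), filter_bit_is_set(cell, 3), filter_bit_is_set(cell, 1))
print(probe_order(2, 8, half_hit=False), probe_order(2, 8, half_hit=True))
```

With $z = 8$, a hit in $F_2$ therefore costs one sub-table probe when $F_{half}$ is negative and at most two when it is positive.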
3. Deletion of key-value pairs
To delete $x$, RHT1 first locates the bucket holding $x$ using the query operation above, then removes the key-value pair from the bucket and resets the bitmap bit corresponding to that bucket.
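A minimal sketch of the deletion step, assuming the bucket index has already been found by the query operation. The function name `delete_from_bucket` and the list-based table and bitmap are illustrative assumptions. Only the bucket and the bitmap are touched, since the text does not describe clearing Bloom filter bits on deletion.

```python
def delete_from_bucket(table, bitmap, bucket, key):
    """Remove key from the given bucket and reset its bitmap bit; return success."""
    entry = table[bucket]
    if entry is None or entry[0] != key:
        return False          # bucket is empty or holds a different key
    table[bucket] = None      # remove the key-value pair
    bitmap[bucket] = 0        # mark the bucket as empty again
    return True


table = [None, ("10.0.0.0/8", "hop-A"), None]
bitmap = [0, 1, 0]
print(delete_from_bucket(table, bitmap, 1, "10.0.0.0/8"), table, bitmap)
```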
IV. Experimental data
To better evaluate the matrix hash of the invention against existing hash designs, we used data from a real application. We obtained 12 forwarding information bases (FIBs) from the website www.ripe.net, taken at 8 a.m. on 2014.07.08, and for each FIB synthetically generated a uniform traffic trace over its prefixes. We use the parts of each FIB that are relevant to us, namely the prefix and its associated next hop: the prefix serves as the key and the next hop as the value. We use β to denote the ratio of the total number of buckets in the hash table to the total number of elements, with 1.05 ≤ β ≤ 10, and θ to denote the threshold on blind kicks. The number of key-value pairs in a FIB is 500,000. Eight sub-tables are built whose sizes differ by 5000, with a total size of β*n. Setting θ = 0 means that blind kicks are not allowed and only bitmap-guided kicks are used. For each insertion, the maximum number of memory accesses is 8+1: if an element finds no empty candidate position in the 8 sub-tables, it must be inserted into the linked list of the last sub-table. The collision rate is the ratio of the number of elements on the linked list of the last sub-table to the total number of elements. The Bloom filters use 16 hash functions. The experimental results are as follows:
1. Experimental performance of matrix hash:
1) Load factor and collision rate
With β = 1.05 and θ = 0, the results show that matrix hash achieves a very high load factor using only 1.05*n memory. The load factors of the 8 sub-tables are well balanced, and the overall load factor is 95.19%. The collision rate is about 0.05%, with only a few FIBs exceeding 0.06%.
2) Insertion and query time
With β = 1.05 and θ = 0, all elements of each FIB are inserted into the matrix hash. The results show that the more elements have been inserted, the more memory accesses are needed. Most insertions need fewer than 6 memory accesses, and the number of memory accesses per query ranges from 1 to 1.0019, with a mean of 1.00059.
3) Bitmap kicks and blind kicks
With β = 1.05, the results show that when θ = 5, inserting a key-value pair takes at most 8*(5+1)+1 = 49 memory accesses and querying a key-value pair takes fewer than 8 memory accesses, and there are no elements on the linked list of the last sub-table. When θ = 0, although blind kicks are not allowed, only very few elements (0.56%) are on the linked list of the last sub-table. The worst case for insertion is then 8+1 memory accesses.
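The two worst-case figures quoted above follow the same pattern: each of the θ+1 rounds touches all candidate buckets, plus one final access for the linked list. The helper below generalizes this to arbitrary $z$, which is an assumption; the text gives the figures only for 8 sub-tables.

```python
def worst_case_insert_accesses(z, theta):
    """Upper bound on memory accesses per insertion: z per round, theta+1 rounds, plus 1."""
    return z * (theta + 1) + 1


print(worst_case_insert_accesses(8, 5))  # 49
print(worst_case_insert_accesses(8, 0))  # 9
```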
4) Collision rate vs. β
With θ = 0, the results show that the larger β is, the smaller the collision rate; when β ≥ 1.18, the collision rate is close to 0.
2. Comparison of matrix hash with other hash designs:
The experiments compare matrix hash with six well-known hash designs: chained hashing, linear probing, double hashing, cuckoo hashing, d-left hashing, and peacock hashing. We first define insertion failure. For linear probing, double hashing, and cuckoo hashing, when a collision occurs another bucket is probed, and this probing can be repeated indefinitely. We limit the number of repeated probes to 500, which means that for these three designs the maximum number of memory accesses per insertion is 500. If the collision persists beyond 500 probes, these three designs give up, and the element still not inserted after the 500th iteration is discarded, which constitutes an insertion failure. For peacock hashing and matrix hash, the Bloom filters use 16 hash functions.
Experiment 1 (β = 1.05, different FIBs)
1) Load factor
The results show that matrix hash always has the highest load factor.
2) Insertion time
The results show that, apart from chained hashing, matrix hash needs the fewest memory accesses per insertion of all the designs. Chained hashing needs only one or two memory accesses per insertion, so its access time is shorter, but its shortcomings in other respects are pronounced. Matrix hash achieves fast insertion thanks to its Bloom filters and bitmaps.
3) Lookup time
The results show that matrix hash has the shortest lookup time, because it has a high load factor and a very small false positive rate.
Experiment 2 (different β, FIB rrc00)
1) Load factor
The results show that matrix hash always has the highest load factor. The load factors of chained hashing and double hashing are about the same, and peacock hashing reaches a comparably high load factor only when β is relatively large.
2) Insertion time
The results show that, apart from chained hashing, matrix hash needs the fewest memory accesses per insertion of all the designs.
3) Lookup time
The results show that matrix hash has the shortest lookup time.
The above embodiments merely illustrate the technical solution of the invention and do not limit it. Those of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention. The scope of protection of the invention shall be as defined by the claims.

Claims (9)

1. A data storage method based on matrix hash, characterized by comprising the following steps:
1) establishing a hash table data structure containing $z$ sub-tables, where $z$ is even and the sub-table sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the $i$-th sub-table with the $(z-i+1)$-th sub-table to obtain $z/2$ sub-tables of equal size;
2) establishing an auxiliary data structure containing $z$ Bloom filters corresponding to the $z$ sub-tables, where the Bloom filter sizes decrease in arithmetic progression; for $1 \le i \le z/2$, combining the $i$-th Bloom filter with the $(z-i+1)$-th Bloom filter to obtain $z/2$ Bloom filters of equal size; then grouping the corresponding bits of these $z/2$ Bloom filters together to form one multi-bit Bloom filter;
3) inserting key-value pairs using the hash table data structure and the auxiliary data structure, thereby storing the data.
2. The method of claim 1, characterized in that: whenever a new key-value pair is inserted, it is inserted into the sub-table with the lowest load factor.
3. The method of claim 1, characterized in that: a linked list is attached to the last sub-table, i.e. the $z$-th sub-table; if a key-value pair to be inserted cannot find an empty bucket, it is attached to the linked list by a pointer.
4. The method of claim 1, characterized in that: each sub-table has a corresponding bitmap, and each bit in the bitmap corresponds to one bucket in that sub-table; the bit for an empty bucket is 0 and the bit for a non-empty bucket is 1.
5. The method of claim 4, characterized in that: an additional Bloom filter $F_{half}$ is added, which records the second half of the sub-tables, i.e. $T_{z/2+1}, \dots, T_z$, to reduce the number of sub-tables queried.
6. The method of claim 5, characterized in that key-value pairs are inserted as follows:
a) for a given key-value pair, first using the bitmaps to check whether the $z$ candidate buckets are empty, then inserting the key-value pair into the sub-table with the lowest load factor so as to balance the load factors of all sub-tables; letting $i$ be the index of the sub-table to insert into, if $1 \le i \le z/2$, updating Bloom filter $F_i$ to indicate that key $x$ is in sub-table $T_i$, and updating the corresponding bitmap; if $z/2 < i \le z$, updating Bloom filter $F_{z-i+1}$ to indicate that $x$ is in sub-table $T_i$, and updating $F_{half}$ and the corresponding bitmap;
b) if the bitmaps show that all $z$ candidate buckets for key $x$ are full, inserting the key-value pair using the kick-out mechanism.
7. The method of claim 6, characterized in that step b) is implemented as follows: the $z$ candidate buckets of $x$ are checked in order using the bitmaps, to determine whether the existing element $y$ in a candidate bucket has an empty bucket among its remaining $z-1$ sub-tables into which it can be moved; if so, $y$ is kicked out, $x$ is inserted, and $y$ is inserted into its new position; if no such $y$ can be found, a blind kick is performed and the above insertion procedure is repeated; the number of blind kicks is limited to θ, and if more than θ blind kicks occur, the key-value pair is attached to the linked list of the last sub-table.
8. The method of claim 7, characterized in that key-value pairs are queried as follows: to query $x$, first query $x$ in the multi-bit Bloom filter $F_m$ and in $F_{half}$; if $F_i$ ($1 \le i \le z/2$) returns true and $F_{half}$ returns false, check sub-table $T_i$; otherwise, check sub-table $T_{z-i+1}$ first and, if there is no match, then check sub-table $T_i$; if $x$ cannot be found in any of the $z$ sub-tables, search the linked list of the last sub-table; if it is still not found, $x$ is not in the hash table.
9. The method of claim 7, characterized in that key-value pairs are deleted as follows: to delete $x$, first locate the bucket holding $x$ using the query operation, then remove the key-value pair from the bucket and reset the bitmap bit corresponding to that bucket.
CN201710014205.9A 2017-01-09 2017-01-09 Data storage and query method based on matrix hash Active CN108287840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710014205.9A CN108287840B (en) 2017-01-09 2017-01-09 Data storage and query method based on matrix hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710014205.9A CN108287840B (en) 2017-01-09 2017-01-09 Data storage and query method based on matrix hash

Publications (2)

Publication Number Publication Date
CN108287840A (en) 2018-07-17
CN108287840B (en) 2022-05-03

Family

ID=62819334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710014205.9A Active CN108287840B (en) 2017-01-09 2017-01-09 Data storage and query method based on matrix hash

Country Status (1)

Country Link
CN (1) CN108287840B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317795A (en) * 2014-08-28 2015-01-28 华为技术有限公司 Two-dimensional filter generation method, query method and device
CN105027527A (en) * 2012-12-31 2015-11-04 华为技术有限公司 Scalable storage systems with longest prefix matching switches
CN105468298A (en) * 2015-11-19 2016-04-06 中国科学院信息工程研究所 Key value storage method based on log-structured merged tree
US20160196306A1 (en) * 2015-01-07 2016-07-07 International Business Machines Corporation Technology for join processing

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108989452A (en) * 2018-08-07 2018-12-11 佛山市苔藓云链科技有限公司 A kind of data transmission of internet of things device
CN109471635A (en) * 2018-09-03 2019-03-15 中新网络信息安全股份有限公司 A kind of algorithm optimization method realized based on Java Set set
CN109471635B (en) * 2018-09-03 2021-09-17 中新网络信息安全股份有限公司 Algorithm optimization method based on Java Set implementation
CN109597807A (en) * 2018-10-25 2019-04-09 阿里巴巴集团控股有限公司 Number storehouse list processing method and apparatus
CN109766341A (en) * 2018-12-27 2019-05-17 厦门市美亚柏科信息股份有限公司 A kind of method, apparatus that establishing Hash mapping, storage medium
CN109766341B (en) * 2018-12-27 2022-04-22 厦门市美亚柏科信息股份有限公司 Method, device and storage medium for establishing Hash mapping
CN109800228A (en) * 2018-12-28 2019-05-24 深圳竹云科技有限公司 A method of efficiently quickly solving hash conflict
CN109800228B (en) * 2018-12-28 2023-03-10 深圳竹云科技有限公司 Method for efficiently and quickly solving hash conflict
CN111563199A (en) * 2020-04-26 2020-08-21 北京奇艺世纪科技有限公司 Data processing method and device
CN111563199B (en) * 2020-04-26 2023-10-10 北京奇艺世纪科技有限公司 Data processing method and device
CN111552692A (en) * 2020-04-30 2020-08-18 南方科技大学 Plus-minus cuckoo filter
CN111552692B (en) * 2020-04-30 2023-04-07 南方科技大学 Plus-minus cuckoo filter
CN112416933B (en) * 2020-11-19 2022-09-23 重庆邮电大学 High-performance hash table implementation method based on-chip and off-chip memories
CN112416933A (en) * 2020-11-19 2021-02-26 重庆邮电大学 High-performance hash table implementation method based on-chip and off-chip memories
CN112699323A (en) * 2021-01-07 2021-04-23 西藏宁算科技集团有限公司 Cloud caching system and cloud caching method based on double bloom filters
CN113342828A (en) * 2021-07-02 2021-09-03 广东唯审信息科技有限公司 Hash table conflict resolution method based on d-dimensional mapping

Also Published As

Publication number Publication date
CN108287840B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN108287840A (en) A kind of data storage and query method based on matrix Hash
Li et al. Packet forwarding in named data networking requirements and survey of solutions
CN110083601B (en) Key value storage system-oriented index tree construction method and system
Xia et al. Refreshing the sky: the compressed skycube with efficient support for frequent updates
CN103810237B (en) Data managing method and system
JP6356675B2 (en) Aggregation / grouping operation: Hardware implementation of hash table method
US20140188885A1 (en) Utilization and Power Efficient Hashing
CN109255055A (en) A kind of diagram data access method and device based on packet associated table
CN105574054B (en) A kind of distributed caching range query method, apparatus and system
CN106202548A (en) Date storage method, lookup method and device
CN106294772A (en) The buffer memory management method of distributed memory columnar database
CN102819586A (en) Uniform Resource Locator (URL) classifying method and equipment based on cache
Xiao et al. Using parallel bloom filters for multiattribute representation on network services
US20080133494A1 (en) Method and apparatus for searching forwarding table
CN112000846A (en) Method for grouping LSM tree indexes based on GPU
Hua et al. Nest: Locality-aware approximate query service for cloud computing
CN113157943A (en) Distributed storage and visual query processing method for large-scale financial knowledge map
CN106919691A (en) Method, device and the searching system retrieved based on web page library
CN104391992A (en) Asset data-oriented data processing system
CN106156171A (en) A kind of enquiring and optimizing method of Virtual asset data
CN118227518B (en) Table entry storage and searching method and device, network equipment and storage medium
Skandar et al. An efficient duplication record detection algorithm for data cleansing
Gong et al. Abc: a practicable sketch framework for non-uniform multisets
CN109522242A (en) A kind of method and apparatus for searching for Cache data
JP6006740B2 (en) Index management device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant