CN106326475A - High-efficiency static hash table implement method and system - Google Patents


Info

Publication number
CN106326475A
Authority
CN
China
Prior art keywords
value
key
hash
rank
storage
Legal status
Granted
Application number
CN201610793354.5A
Other languages
Chinese (zh)
Other versions
CN106326475B (en)
Inventor
刘燕兵
张春燕
卢毓海
谭建龙
郭莉
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Application filed by Institute of Information Engineering of CAS
Priority to CN201610793354.5A
Publication of CN106326475A
Application granted
Publication of CN106326475B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a high-efficiency static hash table implementation method and system. The method comprises the following steps: 1) setting the hash bucket size hash_bit and generating a plurality of data pairs, with key[i] and value[i] corresponding to the key and the value; 2) according to the key[i] values, constructing the hash table with the rank operation and calculating a C table and a D table; 3) calculating rank(h) from the C table and the D table, and storing the corresponding key[i] and value[i] according to the value of rank(h); 4) according to the key to be queried, determining whether the element exists in the hash table; if it does, querying the corresponding storage position and returning the value, otherwise the access fails; 5) returning the result information based on the result of step 4. The rank-select algorithm is used to construct and access a new type of static hash table, and the method and system can be used in fields such as content filtering and information security.

Description

An efficient static hash table implementation method and system
Technical field
The present invention aims to design a static hash table compression algorithm for fields such as information filtering and information security. A static hash table occupies a relatively large amount of storage, and existing algorithms still leave significant room for compressing it. The invention compresses the static hash table while still supporting access to it.
Background
Lookup tables in data structures are divided into static lookup tables and dynamic lookup tables. A lookup table is searched repeatedly until the required value is found. Static lookup methods mainly include sequential search, binary search, block search and search of static tree tables, while dynamic lookup tables mainly include binary sort trees, balanced binary trees, B-trees and B+ trees. The efficiency of these lookup algorithms depends on the number of comparisons: the more comparisons needed on average, the lower the efficiency, and the fewer, the higher. For fast lookup, a hash table can be used to improve access efficiency.
A hash table, also called a hash map, is a special data structure that stores data as key-value pairs. It accesses a record by mapping its key to a position in the table, which speeds up lookup. The mapping function is called the hash function, and the array that holds the records is called the hash table. The mapping is not necessarily injective, so hash collisions can occur, and many algorithms exist for resolving them. Hash tables are very widely used, and storing data in a hash table for fast lookup is a very common operation. In computer science, hash tables play an important role in routing in peer-to-peer (P2P) networks, database lookup, compressed ordinal indexes and information security.
Hash tables also matter in daily life. For example, when a bank reconciles its front-end data with its back-end data, the corresponding value can be found by key, completing the reconciliation. When an IC card is used on public transport, the card number serves as the key. Swiping the card when boarding is an insertion into the hash table, whose value stores the boarding time and station name. Swiping when alighting is a lookup, which deletes the card's record from the table and computes the travel time and the distance between stations.
Depending on whether dynamic insertion and deletion are supported, hash tables are divided into static and dynamic hash tables. A static hash table supports only lookups and does not support dynamic insertion or deletion. It suits scenarios where data is loaded into the table once in advance and subsequent work consists mainly of fast lookups. Static hash tables fit pattern matching particularly well. Efficient algorithms such as Wu-Manber and Karp-Rabin use hash functions to process the rules and the text to be matched, and they usually load the rules into a hash table once, in advance, before matching begins.
Current hash table algorithms mainly include linear probing hashing, binary search and hashed binary search. These algorithms meet the requirements of a static hash table and locate data efficiently during storage and lookup, but there is still considerable room for improvement in storage space and lookup efficiency. Each is briefly described below.
Linear probing: when the hash address p = H(key) of a key collides, new hash addresses p1, p2, ... are derived from p and computed iteratively until some address pi is free of conflict, and the key and its value are stored at that address. During lookup, H(key) is computed first and the buckets are checked for the key. If it is present, its value is returned.
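The collision handling just described can be illustrated with a minimal C sketch of linear probing, the special case in which the i-th probe address is p_i = (p + i) mod TABLE_SIZE. The table size, the EMPTY_KEY sentinel and the function names are hypothetical choices for the example, not part of the patent.

```c
#include <stdint.h>
#include <stdbool.h>

#define TABLE_SIZE 16u          /* hypothetical size; must exceed the number of keys */
#define EMPTY_KEY  UINT32_MAX   /* hypothetical sentinel: this key value cannot be stored */

typedef struct {
    uint32_t key[TABLE_SIZE];
    uint32_t value[TABLE_SIZE];
} LinearTable;

static void lt_init(LinearTable *t) {
    for (uint32_t i = 0; i < TABLE_SIZE; i++) t->key[i] = EMPTY_KEY;
}

/* H(key) = key mod TABLE_SIZE; on a collision try p+1, p+2, ... (wrapping around). */
static bool lt_insert(LinearTable *t, uint32_t key, uint32_t value) {
    uint32_t p = key % TABLE_SIZE;
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        uint32_t pi = (p + i) % TABLE_SIZE;
        if (t->key[pi] == EMPTY_KEY || t->key[pi] == key) {
            t->key[pi] = key;
            t->value[pi] = value;
            return true;
        }
    }
    return false; /* table full */
}

/* Follow the same probe sequence; reaching an empty slot means the key is absent. */
static bool lt_find(const LinearTable *t, uint32_t key, uint32_t *value) {
    uint32_t p = key % TABLE_SIZE;
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        uint32_t pi = (p + i) % TABLE_SIZE;
        if (t->key[pi] == EMPTY_KEY) return false;
        if (t->key[pi] == key) { *value = t->value[pi]; return true; }
    }
    return false;
}
```

Lookups stay fast only while the table is sparse. This is one reason linear probing trades extra space for speed in the experiments later in this document.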
Binary search: during storage, the keys are sorted. During lookup, the key is located by binary search and its value is returned.
Hashed binary search: the keys are divided among hash buckets, and within each bucket they are stored in sorted order for binary search. During lookup, the hash function first determines the bucket, and binary search then locates the key within that bucket and returns its value.
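Binary search and hashed binary search can be sketched together in C. bsearch_range is plain binary search over a sorted key range, and hb_find first hashes the key to a bucket, then binary-searches only that bucket. The bucket-offset array start[], the HashBinTable struct and all names are assumptions made for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical layout: keys sorted within each bucket; bucket b occupies
   keys[start[b] .. start[b+1]-1], with start[] of length nbuckets + 1. */
typedef struct {
    uint32_t nbuckets;
    const uint32_t *start;
    const uint32_t *keys;
    const uint32_t *values;
} HashBinTable;

/* Plain binary search over the sorted range keys[lo..hi-1]; returns the index or -1. */
static long bsearch_range(const uint32_t *keys, uint32_t lo, uint32_t hi, uint32_t key) {
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (keys[mid] == key) return (long)mid;
        if (keys[mid] < key) lo = mid + 1; else hi = mid;
    }
    return -1;
}

/* Hash to the bucket, then binary-search only inside that bucket. */
static bool hb_find(const HashBinTable *t, uint32_t key, uint32_t *value) {
    uint32_t b = key % t->nbuckets;
    long i = bsearch_range(t->keys, t->start[b], t->start[b + 1], key);
    if (i < 0) return false;
    *value = t->values[i];
    return true;
}
```

Plain binary search over all n keys is the special case nbuckets = 1, with start = {0, n}.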
All of the above hash table algorithms are widely used. They differ in storage and lookup efficiency, and each has its own strengths in space usage and lookup speed. To design a more efficient hash table algorithm that uses less space, the present invention compresses the static hash table with the rank-select algorithm, which improves considerably on the other algorithms in both space and time. The rank-select algorithm is the space-compressing method proposed in Jacobson G. Space-efficient static trees and graphs. 30th Annual Symposium on Foundations of Computer Science, IEEE, 1989: 549-554, for storing tree structures as bit vectors. It is described in detail below. In that paper, rank-select is used mainly to compress adjacency tree structures. As shown in Fig. 1, a tree originally stored with pointers is reduced to a binary string, and the main idea behind this reduction is precisely the rank-select algorithm.
To introduce the rank-select algorithm, first define the parameter rank(m): the number of 1s in a binary string from the first position up to position m. For example, rank(10) = 7 in Fig. 2.
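A direct, O(m)-time reading of this definition is sketched below in C. The sketch assumes 0-based bit positions packed into 64-bit words, with bit p stored at bit p % 64 of word p / 64. Under that convention, counting through the m-th position in 1-based terms means counting bits 0 to m - 1. rank_naive is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

/* Naive rank(m): number of 1s among the first m bits of B (positions 0..m-1).
   Bit p lives at bit (p % 64) of word p / 64. Runs in O(m) time. */
static size_t rank_naive(const uint64_t *B, size_t m) {
    size_t count = 0;
    for (size_t p = 0; p < m; p++)
        count += (B[p >> 6] >> (p & 63)) & 1u;
    return count;
}
```

The C and D tables introduced next exist to replace this linear scan with a constant number of table reads.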
In Fig. 3, the nodes of a binary tree are numbered level by level starting from the root. Black denotes a node that exists and white denotes an empty position. When the tree is stored level by level according to its hierarchy, 8 positions are 1, representing the 8 nodes of the original tree. As the figure shows, the original pointer-based structure uses n bytes to store each node, whereas storing one bit per node greatly reduces the space required.
Mäkinen V, Navarro G. Rank and select revisited and extended. Theoretical Computer Science, 2007, 387(3): 332-347, proves that for a bit vector B of size n, only o(n) additional storage is needed to support the rank operation in O(1) time. The SSE4.2 instruction set provides instructions such as _mm_popcnt_u64 that count the 1 bits of a 64-bit word, so rank can be computed with hardware support, which makes the operation faster. The rank-select algorithm also achieves very good results in compressing sparse matrices.
The rank-select algorithm compresses data effectively and can be turned into a storage structure for a hash table. The following example summarizes how O(1)-time rank works. As shown in Fig. 4, for a bit vector B of n bytes (n*8 bits), the D table and the C table store rank values at granularities of 8 and 32 bits respectively, i.e. the number of 1s before the current position. For example, to find the number of 1s up to position h, the first bit of B[6], compute rank(h) = C[1] + D[1*4+2] + _mm_popcnt_u64(B[6] >> 7) = 6 + 4 + 1 = 11. Here _mm_popcnt_u64 is an SSE4.2 intrinsic that returns the number of 1 bits in its 64-bit argument.
Next, the implementation of the O(1)-time rank operation is shown in Fig. 5, which generalizes the example above. Let the bit vector B have n bits. Each D entry covers s bits and occupies log2 r bits, and each C entry covers r bits and occupies log2 n bits. The D vector therefore occupies (n/s)·log2 r bits in total and the C vector (n/r)·log2 n bits, so the additional space is
$$M = \frac{n}{s}\log_2 r + \frac{n}{r}\log_2 n \qquad (3)$$
To compute the rank of position m, write $m = ir + js + k$ with $i = \lfloor m/r \rfloor$, $0 \le j < r/s$ and $0 \le k < s$. The rank is then given by the following formulas:
$$C[i] = \sum_{k=0}^{ir-1} B[k] \qquad (4)$$
$$D[i,j] = \sum_{k=ir}^{ir+js-1} B[k] \qquad (5)$$
$$\mathrm{rank}(B, m) = C[i] + D[i,j] + \mathrm{rank}(B_{ir+js}, k) \qquad (6)$$
Here $\mathrm{rank}(B_{ir+js}, k)$ is the number of 1s in the k bits of B starting at position $ir + js$, i.e. from that position up to, but not including, position m. Fig. 6 illustrates this.
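Formulas (4) to (6) can be sketched in C with r = 256 and s = 64, the values used later in the text. With these values, the last term rank(B_{ir+js}, k) is a population count of one masked 64-bit word. The RankIndex struct, the flat layout D[i·(r/s) + j] and the function names are assumptions for illustration. The population count is a portable loop; on SSE4.2 hardware, _mm_popcnt_u64 could replace it.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

#define R 256u   /* superblock length r in bits */
#define S 64u    /* block length s in bits: one machine word */

typedef struct {
    const uint64_t *B;  /* bit vector; bit p at bit p % 64 of word p / 64 */
    uint32_t *C;        /* C[i] = 1s in B[0 .. i*R-1]                       (formula 4) */
    uint8_t  *D;        /* D[i*(R/S)+j] = 1s in B[i*R .. i*R+j*S-1]         (formula 5) */
} RankIndex;

static unsigned popcount64(uint64_t x) {
    unsigned c = 0;
    while (x) { x &= x - 1; c++; }  /* clear the lowest set bit */
    return c;
}

/* nwords = number of 64-bit words in B, assumed to be a multiple of R/S. Returns 0 on success. */
static int rank_index_build(RankIndex *ix, const uint64_t *B, size_t nwords) {
    size_t nsuper = nwords / (R / S);
    ix->B = B;
    ix->C = malloc(nsuper * sizeof *ix->C);
    ix->D = malloc(nwords * sizeof *ix->D);
    if (!ix->C || !ix->D) { free(ix->C); free(ix->D); return -1; }
    uint32_t total = 0;
    for (size_t i = 0; i < nsuper; i++) {
        ix->C[i] = total;
        unsigned inside = 0;  /* at most 192 before the last word, so it fits in uint8_t */
        for (size_t j = 0; j < R / S; j++) {
            ix->D[i * (R / S) + j] = (uint8_t)inside;
            inside += popcount64(B[i * (R / S) + j]);
        }
        total += inside;
    }
    return 0;
}

/* Formula (6) for 0 <= m < n: rank(B, m) = C[i] + D[i, j] + rank(B_{i*R+j*S}, k). */
static uint32_t rank_index_query(const RankIndex *ix, size_t m) {
    size_t i = m / R, j = (m % R) / S, k = m % S;
    uint64_t tail = ix->B[m / S] & ((1ULL << k) - 1);  /* the first k bits of word m / S */
    return ix->C[i] + ix->D[i * (R / S) + j] + popcount64(tail);
}
```

Each query reads one C entry, one D entry and one word of B, which is where the O(1) bound comes from.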
In summary, the rank-select algorithm works well in practice for compressing adjacency tree structures and saves space effectively. Static hash tables also need further optimization of space efficiency, so the present invention uses the rank-select algorithm to construct and access a new type of static hash table.
Summary of the invention
The present invention provides an efficient static hash table implementation method and system that uses the rank-select algorithm to construct and access a new type of static hash table.
The invention compresses the static hash table effectively while still allowing direct access. Fig. 7 shows the storage scheme of a traditional hash table, where H is the number of hash buckets and n the number of keys. With 4-byte pointers and 4-byte integers, the table occupies 4H + 8n bytes in total.
The preceding paragraphs describe the O(1)-time rank operation in detail. To make it easy to implement on a computer, the invention takes the computer's storage layout into account and designs a concrete O(1)-time rank implementation. This is also the basic idea behind the rank-based hash table compression scheme. In the experiments, r = 256 and s = 64, each C[i] is stored as an int and each D[i] as a char. The terms log2 n and log2 r are therefore replaced by 32 bits (an int) and 8 bits (a char), and the additional space is:
$$M = \frac{n}{s}\log_2 r + \frac{n}{r}\log_2 n = \frac{n}{s}\cdot 8 + \frac{n}{r}\cdot 32 = \frac{n}{64}\cdot 8 + \frac{n}{256}\cdot 32 = \frac{n}{4}$$
Rank-based hash compression replaces the original pointers with a binary vector B. First, the hash table size hash_bits is set. When a key is stored, the modulus h = key mod hash_bits is computed first, and computing rank(h) then maps the key to its storage unit, as shown in Fig. 8. From the above, the additional space needed is H/4 bits, where H is the number of hash-bucket bits. Since B itself takes H bits and the keys and values take 8n bytes, the total storage is H/8 + H/32 + 8n bytes, far less than before.
To make the static hash table convenient to store, the following structure is defined for later use:
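#include <stdint.h>

typedef struct CB {
    uint32_t C;          /* C table entry: number of 1s in all preceding blocks (rank over r = 256 bits) */
    uint8_t  D[4];       /* D table entries: D[w] = number of 1s in bitmap[0..w-1] (rank over s = 64 bits) */
    uint64_t bitmap[4];  /* 4 x 64 = 256 hash-bucket bits */
} CB;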
Each CB is a structure representing one block of the hash table, and it contains three members. C stores the rank over the fixed length r, and D (four chars, one per bitmap word) stores the rank over the fixed length s. C is an integer and D is of type char. Other types may be used, as long as they have enough bits to hold rank values over r or s bits. For convenient storage, r = 256 and s = 64. bitmap is an array of four unsigned 64-bit integers, and bitmap[i] (i = 0, 1, 2, 3) denotes one of its elements. Since each element occupies 64 bits, one bitmap array occupies 256 bits, which is exactly the rank length r covered by C. Each bitmap element is exactly the rank length s covered by D.
A CB array is created as the hash table. For an element CB[j], its members are written CB[j].C, CB[j].D and CB[j].bitmap[i] (i = 0, 1, 2, 3). For ease of description, C[j] below means the same as CB[j].C, and D[j] the same as CB[j].D.
To describe the invention in detail, this section first introduces the main components and workflow of the system for building and accessing the hash table. It then describes the main procedures for building and querying the table.
In the present invention, the system for building and accessing the hash table mainly comprises the following components, as shown in Fig. 9:
1) System preprocessing component: sets the hash bucket size hash_bit and generates multiple data pairs, with key[i] and value[i] corresponding to the key and the value.
2) Hash table construction component: according to the key[i] values, builds the hash table with the rank operation and computes the C table and the D table.
3) Information storage component: computes rank(h) from the C and D tables, where h = key mod hash_bits, and stores the corresponding key[i] and value[i] according to the value of rank(h).
4) Information access component: given the key to be queried, determines whether the element exists in the hash table. If it does, the component looks up the corresponding storage position and returns its value; otherwise the access fails.
5) Result return component: returns the result information obtained in the previous step.
The above are the components of the system for building and accessing the hash table. To make the building and access process easy to follow and convenient for computer storage, the rank computation is described in natural language as follows:
1) To find how many 1s precede bit i of B, let k = i & 63, i1 = i >> 8 and i2 = (i >> 6) - (i1 << 2). Then i1 is the index into the C table, and i >> 6 is the index into the D table.
2) Assign to e the bits of B from position (i1 << 8) + (i2 << 6) through (i1 << 8) + (i2 << 6) + k - 1.
3) Return C[i1] + D[i >> 6] + _mm_popcnt_u64(e). This is the number of 1s before bit i of B, i.e. the rank(i) operation on B.
The rank computation is used frequently both when building and when accessing the hash table, to fill in and use the C and D tables described above. The hash bucket size introduced below is at least 2^8, and data is stored and accessed with the above rank operation. The invention consists of two processes, building the hash table and querying keys, so the concrete steps of rank-based storage and access are outlined for each.
1. Concrete steps of the rank-based hash table storage algorithm:
1) Split the preprocessed data into a key array and a value array, with key[i] and value[i] corresponding to the key and the value.
2) Import the key values into the bitmaps in one pass. Let num be the total number of key-value pairs and Clength the size of the CB table, which is determined by num. The number of hash buckets, hash_bit, is defined as the product of Clength and 2^8. Each CB entry is allocated a bitmap of 4 elements, each storing 64 bits and initialized to all zeros, as shown in Fig. 10. The contents of the key array are then recorded for the O(1)-time rank operation. First, h = key mod hash_bit is computed, which guarantees that h falls within the hash buckets. Then h is marked at its bucket position with the following formula, until the positions of all keys have been recorded:
q = h & 255
CB[h >> 8].bitmap[q >> 6] |= (1ULL << (q & 63))    (7)
3) Compute and store the C and D arrays. Step 2 has recorded the position of every key according to its h value, so the rank operation described above can fill in the C and D arrays starting from CB[0]. C[i] is the number of 1s in the preceding hash blocks CB[0] to CB[i-1]. CB[i].D[1] is the number of 1s in CB[i].bitmap[0], and so on up to CB[i].D[3], the number of 1s in CB[i].bitmap[0] to CB[i].bitmap[2].
4) Using the C and D tables, compute the rank value of each key with the rank algorithm described above.
5) Use the rank values to count the elements in each hash bucket, recorded in the hierarchical order of the C table, and store the keys and values in rank-value order. If different keys have the same rank value, they fall into the same hash bucket, i.e. a hash collision has occurred. The rank value thus takes on a second meaning: once the keys are sorted by h = key mod hash_bit, it gives each key's position in that order. When key-value pairs are stored, identical rank values indicate a bucket holding two or more elements. For ease of storage, pairs are ordered primarily by rank value and, among equal rank values, in the order in which they arrive.
6) Store the keys and values in an array.
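Steps 5 and 6 amount to a stable counting sort of the pairs by rank value. The C sketch below assumes that each key's bucket rank has already been computed with the rank operation. It uses the 0-based rank from the computation described earlier, so the ranks 1, 2, 3 of Table 2 in the example below become 0, 1, 2. The sketch produces the cumulative idx array of Table 3 and the interleaved key/value array of Table 4. layout_by_rank and its parameters are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* rank[t] : 0-based rank of key t's bucket (number of occupied buckets before it)
   m       : number of occupied buckets
   idx[r]  : cumulative number of pairs in buckets 0..r (as in Table 3)
   kv      : kv[2p] and kv[2p+1] hold the key and value of the p-th stored pair (as in Table 4)
   Returns 0 on success, -1 on allocation failure. */
static int layout_by_rank(const uint32_t *keys, const uint32_t *values, const uint32_t *rank,
                          size_t n, size_t m, uint32_t *idx, uint32_t *kv) {
    uint32_t *cursor = calloc(m, sizeof *cursor);
    if (m && !cursor) return -1;
    memset(idx, 0, m * sizeof *idx);
    for (size_t t = 0; t < n; t++) idx[rank[t]]++;                  /* pairs per bucket */
    for (size_t r = 1; r < m; r++) idx[r] += idx[r - 1];            /* running sum */
    for (size_t r = 0; r < m; r++) cursor[r] = r ? idx[r - 1] : 0;  /* first slot of bucket r */
    for (size_t t = 0; t < n; t++) {  /* stable: input order is kept within a bucket */
        uint32_t p = cursor[rank[t]]++;
        kv[2 * p] = keys[t];
        kv[2 * p + 1] = values[t];
    }
    free(cursor);
    return 0;
}
```

Storing keys at even indices and values at odd indices keeps each pair in adjacent memory, so a successful key comparison during a query is followed by a read from the same cache line.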
2. Concrete steps of the rank-based hash table access algorithm:
1) First compute h as the key to be queried modulo hash_bit.
2) Compute q = h & 255 and test whether CB[h >> 8].bitmap[q >> 6] & (1ULL << (q & 63)) is nonzero, i.e. whether the key's bucket is occupied in the original hash table. If the result is 0, the key is not in the table and the query fails. If it is nonzero, the bucket is occupied and its value must be located.
3) Because of hash collisions, two or more keys of the original table may hit this position. The keys in this bucket are therefore compared in turn with the queried key. If a match is found, its value is returned; otherwise the next key is checked, and when no keys remain the query fails.
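The three query steps can be sketched in C as follows. The StaticHash struct and sh_find are assumptions, as is locating a bucket's key range through the idx array (cumulative pair counts per occupied bucket, as in Table 3 of the example below). The text does not spell out how that range is found. The rank function repeats the three-step computation described earlier so that the block is self-contained.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t C;          /* number of 1s in all preceding blocks */
    uint8_t  D[4];       /* D[w]: number of 1s in bitmap[0..w-1] */
    uint64_t bitmap[4];  /* 256 bucket-occupancy bits */
} CB;

typedef struct {
    const CB *cb;
    uint64_t hash_bits;   /* number of hash buckets (hash_bit in the text) */
    const uint32_t *idx;  /* cumulative pair counts per occupied bucket */
    const uint32_t *kv;   /* keys at even indices, values at odd indices */
} StaticHash;

static unsigned popcount64(uint64_t x) {
    unsigned c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Number of 1s strictly before bit i. */
static uint64_t cb_rank(const CB *cb, uint64_t i) {
    uint64_t i1 = i >> 8, i2 = (i >> 6) & 3, k = i & 63;
    return cb[i1].C + cb[i1].D[i2] + popcount64(cb[i1].bitmap[i2] & ((1ULL << k) - 1));
}

static bool sh_find(const StaticHash *t, uint32_t key, uint32_t *value) {
    uint64_t h = key % t->hash_bits;                        /* step 1 */
    uint64_t q = h & 255;                                   /* step 2: is bucket h occupied? */
    if (!(t->cb[h >> 8].bitmap[q >> 6] & (1ULL << (q & 63))))
        return false;
    uint64_t r = cb_rank(t->cb, h);                         /* 0-based ordinal of bucket h */
    uint32_t begin = r ? t->idx[r - 1] : 0, end = t->idx[r];
    for (uint32_t p = begin; p < end; p++)                  /* step 3: scan colliding keys */
        if (t->kv[2 * p] == key) { *value = t->kv[2 * p + 1]; return true; }
    return false;
}
```

A key whose bucket bit is 0 is rejected after a single memory access. Only keys that land in occupied buckets pay for the rank computation and the short scan.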
The beneficial effects of the present invention are as follows:
The present invention uses the rank-select algorithm to construct and access a new type of static hash table. It compresses the static hash table while still supporting access to it, further improving space efficiency. The scheme can be used in fields such as content filtering and information security.
Brief description of the drawings
Fig. 1 is a schematic diagram of storing an adjacency tree structure as a binary string.
Fig. 2 is an example of the rank operation.
Fig. 3 is an example of storing binary tree nodes with the rank-select algorithm.
Fig. 4 is an example of the O(1)-time rank operation.
Fig. 5 shows the implementation of the O(1)-time rank operation.
Fig. 6 is a schematic diagram of computing the rank of the m-th position.
Fig. 7 is a schematic diagram of the storage scheme of a traditional hash table.
Fig. 8 is a schematic diagram of the compact rank-based hash table storage scheme.
Fig. 9 shows the system components for building and accessing the rank-based static hash table.
Fig. 10 is a schematic diagram of the hash table storage structure.
Fig. 11 shows an example of hash buckets.
Detailed description of the embodiments
The present invention is further described below through specific embodiments.
This section illustrates the concrete steps given in the summary with specific data. It is divided into the rank-based hash table storage process and the rank-based hash table access process.
1. Example of the rank-based hash table storage process:
The storage steps above are illustrated as follows. Suppose the hash bucket size is hash_bit = 2^9 and the data to be stored is as in Table 1 below, which also lists the corresponding h values.
Table 1 Stored data: key, value and h
key   | 1 | 513 | 65 | 257
value | 1 | 2   | 3  | 4
h     | 1 | 1   | 65 | 257
The keys modulo hash_bit are shown in binary at the upper left of Fig. 11, and the corresponding hash table at the bottom of Fig. 11. Hence C[0] = 0, C[1] = 2, C[2] = 3; CB[0].D[0] = 0, CB[0].D[1] = 1, CB[0].D[2] = 2, CB[0].D[3] = 2; CB[1].D[0] = 0, CB[1].D[1] = 1, CB[1].D[2] = 1, CB[1].D[3] = 1. The keys and values are then stored in an array. Sorting by rank(h) shows that two keys map to the same position when rank(h) = 1 (Table 2). During storage the coordinates change as shown in Table 3, where idx is the running sum of the preceding counts, and idx - 1 is half the coordinate of the last key stored for that rank value. The resulting one-dimensional array storing the keys and values is shown in Table 4. There, idx1 is the array index: keys are at even indices and values at odd indices.
Table 2 rank(h) values and their counts
rank(h) | 1 | 2 | 3
count   | 2 | 1 | 1
Table 3 Coordinate idx obtained by accumulating the counts
rank(h) | 1 | 2 | 3
idx     | 2 | 3 | 4
Table 4 One-dimensional array (index idx1) storing keys and values
idx1         | 0 | 1 | 2   | 3 | 4  | 5 | 6   | 7
key or value | 1 | 1 | 513 | 2 | 65 | 3 | 257 | 4
2. Example of the rank-based hash table query process:
Using the hash table built in the storage example, suppose we check whether key = 513 is in the table. First compute h = 1 and q = 1. The bit CB[h >> 8].bitmap[q >> 6] for this bucket is set, so the test succeeds. Since two keys hit this position, idx1[0] and idx1[2] are compared with 513 in turn. Because idx1[2] = 513, value = idx1[3] = 2 is returned and the query succeeds.
Based on the above scheme, the rank-based hash table compression algorithm is compared below with binary search, linear probing hashing and hashed binary search. The test data is 10,000,000 key-value pairs whose keys and values are randomly generated 32-bit unsigned integers, occupying 76.294 MB in total. The queries are 10,000,000 randomly generated 32-bit unsigned integers, with a hit ratio of 1%. The test environment is 64-bit Windows 7 with an Intel Core i5 CPU and 4 GB of memory.
Because the number of hash buckets is adjustable, different bucket counts were tested for each algorithm. The rank-based hash table compression algorithm can use SSE instructions such as _mm_popcnt_u64 to count bits over 64-bit words, so the experiments compare versions with and without the SSE instruction set.
Experiment 1: the rank-based hash table compression algorithm with and without the SSE instruction set
As Tables 5 and 6 show, the extra space of both variants grows with the number of hash buckets, and query speed peaks at 2^29 buckets. For the same number of buckets, the version that uses SSE instructions queries faster than the version without them. This shows that a rank bit operation implemented in hardware outperforms one implemented in software.
Table 5 Rank-based hash table compression algorithm (Rank for SSE)
Hash buckets | Key-value space (MB) | Extra space (MB) | Query speed (10^4 queries/s)
2^24 | 76.294 | 31.265  | 1779
2^25 | 76.294 | 38.023  | 2463
2^26 | 76.294 | 45.478  | 2906
2^27 | 76.294 | 56.802  | 3278
2^28 | 76.294 | 77.490  | 4000
2^29 | 76.294 | 117.838 | 4566
2^30 | 76.294 | 198.014 | 4000
Table 6 Rank-based hash table compression algorithm without SSE (Rank None SSE)
Hash buckets | Key-value space (MB) | Extra space (MB) | Query speed (10^4 queries/s)
2^24 | 76.294 | 31.265  | 744
2^25 | 76.294 | 38.023  | 1303
2^26 | 76.294 | 45.478  | 2000
2^27 | 76.294 | 56.802  | 2785
2^28 | 76.294 | 77.490  | 3367
2^29 | 76.294 | 117.838 | 4255
2^30 | 76.294 | 198.014 | 3773
Experiment 2: binary search, hashed binary search and linear probing hashing
Table 7 Binary search (CBinarySearch)
Key-value space (MB) | Extra space (MB) | Query speed (10^4 queries/s)
76.294 | 0 | 172
Table 8 Hashed binary search (CHashBinarySearch)
Hash buckets | Key-value space (MB) | Extra space (MB) | Query speed (10^4 queries/s)
2^24 | 76.294 | 64  | 1600
2^25 | 76.294 | 128 | 1776
2^26 | 76.294 | 256 | 1883
2^27 | 76.294 | 512 | 1560
Table 9 Linear probing hashing (CLinearProbe)
These results show the following. 1) Binary search needs neither extra space nor a bucket-count parameter, but it queries slowly. 2) Linear probing reaches 35,580,000 queries/s with 2^26 buckets, the best of the three algorithms, but its extra space is large, at 435.706 MB. 3) Hashed binary search reaches 18,830,000 queries/s with 2^26 buckets. This is slower than linear probing, but its extra space is smaller, at 256 MB.
Experiment 3: the rank-based hash table compression algorithm compared with the algorithms of Experiment 2
Table 10 Comparison of the five algorithms
This comparison shows that the rank-based hash table compression algorithm has large advantages in both query speed and space usage, and that its speed and extra space are considerably better than those of the other three algorithms. With SSE instructions and 2^29 hash buckets, it reaches 45,660,000 queries/s while its extra space is only 117.838 MB.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. A person of ordinary skill in the art may modify the technical solution or replace parts of it with equivalents without departing from the spirit and scope of the invention. The scope of protection shall be as defined in the claims.

Claims (9)

1. An efficient static hash table implementation method, characterized by comprising the following steps:
1) setting a hash bucket size hash_bit and generating a plurality of data pairs, where key[i] and value[i] correspond to the keyword and its value;
2) building the hash table from the key[i] values by means of the rank operation, and computing a C table and a D table, where the C table stores rank values for a fixed length r and the D table stores rank values for a fixed length s;
3) computing rank(h) from the C table and the D table, where h = key mod hash_bit, and storing the corresponding key[i] and value[i] according to the value of rank(h);
4) determining, from the key to be queried, whether the element exists in the hash table; if it exists, looking up the corresponding storage location and returning the value; otherwise, the access fails;
5) returning result information according to the result obtained in step 4).
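Step 3) depends on rank(h) being computable in constant time from the C and D tables. The claims leave r and s open; the Python sketch below assumes r = 256 bits (one CB entry, consistent with claim 3) and s = 64 bits (one bitmap word), and defines rank(h) as the number of occupied buckets strictly below h. The function names and this exclusive convention are illustrative assumptions, not the patent's code, and bin(x).count("1") stands in for a hardware population count such as the POPCNT instruction introduced alongside SSE4.2.

```python
WORDS_PER_BLOCK = 4  # assumed: r = 256 bits per C entry, s = 64 bits per D entry


def popcount(x):
    """Count set bits; stands in for a hardware POPCNT instruction."""
    return bin(x).count("1")


def build_rank_tables(words):
    """Return (C, D) for a bitmap stored as a list of 64-bit words.

    C[b]: set bits in all 256-bit blocks before block b (absolute count).
    D[w]: set bits in earlier words of the same block (relative, at most 192).
    """
    c_table, d_table, total = [], [], 0
    for i, word in enumerate(words):
        if i % WORDS_PER_BLOCK == 0:
            c_table.append(total)              # start of a new 256-bit block
        d_table.append(total - c_table[-1])    # offset inside the current block
        total += popcount(word)
    return c_table, d_table


def rank(words, c_table, d_table, h):
    """Number of set bits at positions strictly below h."""
    below_h = (1 << (h & 63)) - 1              # mask of the bits under h in its word
    return (c_table[h >> 8] + d_table[h >> 6]
            + popcount(words[h >> 6] & below_h))


# Usage: buckets 3, 70 and 300 occupied in a 512-bucket bitmap (8 words).
words = [0] * 8
for h in (3, 70, 300):
    words[h >> 6] |= 1 << (h & 63)
c_table, d_table = build_rank_tables(words)
print(rank(words, c_table, d_table, 300))  # 2: buckets 3 and 70 lie below 300
```

Each rank query costs one C lookup, one D lookup and one population count. In this sketch a D entry never exceeds 192 and so would fit in a byte, while a C entry is needed only once per 256 bits, which keeps the directory small relative to the bitmap.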
2. the method for claim 1, it is characterised in that step 3) use following steps to realize Kazakhstan based on rank operation Uncommon table storing process:
3-1) data of pretreatment being divided into key and value array, key [i], value [i] and keyword, key assignments are corresponding;
3-2) the most disposably importing key value in bitmap, the quantity defining Hash bucket is hash_bit, according to time degree O (1) data content of rank operation note key array;Key Yu hash_bit delivery is obtained h, it is ensured that fall in Hash bucket, Then h is stored on Hash bucket correspondence position, according to the size of h value, records the relevant position information of all key values;
3-3) storage calculates C array and D array, utilizes rank operation to start to record C array and D array from Hash table CB [0] Corresponding information;
3-4) utilize C table and D table information, calculate the rank value that each key value is corresponding;
3-5) utilize rank value to record each Hash bucket interior element number, according to the laminated structure record of Hash table C, utilize rank Value is as sequential storage key, value value;
3-6) storage key, value value is in array.
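Steps 3-1) to 3-6) can be sketched end to end as below. This is a minimal Python sketch under stated assumptions: the bitmap is a flat list of 64-bit words rather than CB entries, r and s are taken as 256 and 64 bits, rank(h) counts occupied buckets strictly below h, and the per-bucket element counts of step 3-5) are turned into a hypothetical prefix-sum array offsets (one entry per occupied bucket plus an end marker), so that the keys of the bucket with rank r occupy positions offsets[r] to offsets[r + 1] - 1. The function and field names are illustrative, not taken from the patent.

```python
def build_static_table(keys, values, hash_bit):
    """Sketch of steps 3-1) to 3-6); hash_bit is assumed to be a multiple of 256."""
    # 3-2) mark bucket h = key mod hash_bit for every key.
    words = [0] * (hash_bit // 64)
    hs = [k % hash_bit for k in keys]
    for h in hs:
        words[h >> 6] |= 1 << (h & 63)

    # 3-3) C table per 256-bit block (absolute), D table per word (relative).
    c_table, d_table, total = [], [], 0
    for i, w in enumerate(words):
        if i % 4 == 0:
            c_table.append(total)
        d_table.append(total - c_table[-1])
        total += bin(w).count("1")

    def rank(h):
        below_h = (1 << (h & 63)) - 1
        return (c_table[h >> 8] + d_table[h >> 6]
                + bin(words[h >> 6] & below_h).count("1"))

    # 3-4) rank value of every key; 3-5) number of elements per occupied bucket.
    ranks = [rank(h) for h in hs]
    offsets = [0] * (total + 1)
    for r in ranks:
        offsets[r + 1] += 1
    for r in range(total):
        offsets[r + 1] += offsets[r]           # prefix sums: start of each bucket

    # 3-6) place pairs in rank order; equal ranks keep their input order.
    slot_keys, slot_values = [None] * len(keys), [None] * len(keys)
    cursor = offsets[:-1]                      # copy of the bucket start positions
    for k, v, r in zip(keys, values, ranks):
        slot_keys[cursor[r]], slot_values[cursor[r]] = k, v
        cursor[r] += 1
    return {"words": words, "c": c_table, "d": d_table,
            "offsets": offsets, "keys": slot_keys, "values": slot_values}


# Usage: keys 5, 261 and 517 collide in bucket 5; key 7 is alone in bucket 7.
table = build_static_table([5, 261, 7, 517], ["a", "b", "c", "d"], hash_bit=256)
print(table["offsets"], table["keys"])  # [0, 3, 4] [5, 261, 517, 7]
```

Only occupied buckets get an entry in offsets, so its length follows the number of non-empty buckets rather than hash_bit; the bitmap plus the C and D tables are what map a sparse bucket index onto that dense array.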
3. The method of claim 2, characterized in that in step 3-2) the value of hash_bit is the product of the CB table size Clength and 2^8; four bitmaps are allocated in each hash bucket, each bitmap storing 64 bits of data, and every bit is initialized to 0.
4. The method of claim 3, characterized in that in step 3-2) the position of h is recorded with the following formulas, repeated until the positions of all key values have been recorded:
q = h & 255,
CB[h >> 8].bitmap[q >> 6] |= (1 << (q & 63)).
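Claims 3 and 4 fix the bitmap layout: each CB entry holds four 64-bit words (256 bucket bits), so hash_bit = Clength × 2^8, and bucket h lives in entry h >> 8, word q >> 6, bit q & 63, where q = h & 255. The Python sketch below mirrors that layout; CBEntry, make_cb_table and record_position are hypothetical names. Python integers are unbounded, so the shift needs no special care here, whereas a C version would need a 64-bit constant such as 1ULL, because shifting a plain int by up to 63 bits is undefined behaviour.

```python
class CBEntry:
    """One CB element: four 64-bit bitmap words covering 256 buckets."""
    __slots__ = ("bitmap",)

    def __init__(self):
        self.bitmap = [0, 0, 0, 0]          # claim 3: every bit starts at 0


def make_cb_table(clength):
    """Allocate Clength CB entries and return them with hash_bit = Clength * 2**8."""
    return [CBEntry() for _ in range(clength)], clength << 8


def record_position(cb, h):
    """Claim 4: q = h & 255; CB[h >> 8].bitmap[q >> 6] |= 1 << (q & 63)."""
    q = h & 255
    cb[h >> 8].bitmap[q >> 6] |= 1 << (q & 63)


# Usage: 4 CB entries give 1024 buckets; keys 5 and 1029 share bucket 5.
cb, hash_bit = make_cb_table(4)
for key in (5, 300, 1023, 1029):
    record_position(cb, key % hash_bit)
print(hash_bit, hex(cb[1].bitmap[0]))  # 1024 0x100000000000
```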
5. The method of claim 2, characterized in that in step 3-5), when the key-value pairs are stored and different keys have the same rank value, the primary order is ascending rank value and the secondary order stores the keys with identical rank values consecutively.
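The ordering rule of claim 5 amounts to a stable sort on the rank value. In the minimal Python sketch below, the rank of an occupied bucket is taken as its index among the sorted occupied buckets, which equals the number of occupied buckets below it and therefore matches the bitmap rank; order_by_rank is a hypothetical helper, not the patent's code.

```python
def order_by_rank(keys, values, hash_bit):
    """Claim 5 ordering: primary order by rank, equal ranks kept in input order.

    Returns (ranks, pairs) with pairs listed in storage order.
    """
    occupied = sorted({k % hash_bit for k in keys})
    # rank(h) = number of occupied buckets below h = index of h in `occupied`
    rank_of = {h: r for r, h in enumerate(occupied)}
    pairs = list(zip(keys, values))
    pairs.sort(key=lambda kv: rank_of[kv[0] % hash_bit])   # list.sort is stable
    return [rank_of[k % hash_bit] for k, _ in pairs], pairs


ranks, pairs = order_by_rank([517, 7, 5, 261], ["a", "b", "c", "d"], hash_bit=256)
print(ranks)  # [0, 0, 0, 1]
print(pairs)  # [(517, 'a'), (5, 'c'), (261, 'd'), (7, 'b')]
```

Because ties keep their input order, all keys of one bucket end up adjacent, so a query only has to scan one contiguous run.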
6. the method for claim 1, it is characterised in that step 4) use following steps to realize Kazakhstan based on rank operation Uncommon table access process:
4-1) data key to be inquired about and hash_bit delivery are obtained h;
4-2) calculate q=h&255, it is judged that CB [h>>8] .bitmap [q>>6] and (1<<(q&63)) do with whether computing is 1, i.e. Key value whether is had in original Hash bucket;If this step is judged as 0, not this key value in the most former Hash table, inquire about unsuccessfully;
If this step is judged as 1, in the most former Hash table, there is this key value, then need to find value value;
4-3) in order to prevent hash-collision, in the most former Hash table, there are two and the hit of above key value in this position, then at this Judging whether successively in Hash bucket, containing inquiry data key, if comprising, to return value value, if not comprising, the inquiry next one closes Key word, until keyword is empty, inquires about unsuccessfully.
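The query of steps 4-1) to 4-3) can be sketched as follows. This minimal Python sketch assumes a hypothetical layout: a flat bitmap of 64-bit words, C and D tables for r = 256 and s = 64, and a prefix-sum array offsets marking where each occupied bucket's keys start and end. The patent scans until the keyword is empty; here the scan instead stops at the end of the bucket's run, which is an assumption of this sketch. To stay short, build computes ranks by sorting the occupied buckets, which gives the same values as the C/D rank; build and lookup are hypothetical names.

```python
def popcount(x):
    return bin(x).count("1")


def build(keys, values, hash_bit):
    """Hypothetical builder: bitmap, C/D tables and rank-ordered key/value arrays."""
    words = [0] * (hash_bit // 64)
    for k in keys:
        h = k % hash_bit
        words[h >> 6] |= 1 << (h & 63)
    c_table, d_table, total = [], [], 0
    for i, w in enumerate(words):
        if i % 4 == 0:
            c_table.append(total)
        d_table.append(total - c_table[-1])
        total += popcount(w)
    rank_of = {h: r for r, h in enumerate(sorted({k % hash_bit for k in keys}))}
    order = sorted(range(len(keys)), key=lambda i: rank_of[keys[i] % hash_bit])
    offsets = [0] * (total + 1)
    for k in keys:
        offsets[rank_of[k % hash_bit] + 1] += 1
    for r in range(total):
        offsets[r + 1] += offsets[r]
    return {"hash_bit": hash_bit, "words": words, "c": c_table, "d": d_table,
            "offsets": offsets, "keys": [keys[i] for i in order],
            "values": [values[i] for i in order]}


def lookup(t, key):
    """Steps 4-1) to 4-3); returns the value, or None when the query fails."""
    h = key % t["hash_bit"]                      # 4-1)
    q = h & 255                                  # 4-2) offset inside CB[h >> 8]
    word = t["words"][h >> 6]                    # same word as CB[h >> 8].bitmap[q >> 6]
    if not word & (1 << (q & 63)):
        return None                              # bucket empty: key absent
    r = (t["c"][h >> 8] + t["d"][h >> 6]
         + popcount(word & ((1 << (q & 63)) - 1)))
    for i in range(t["offsets"][r], t["offsets"][r + 1]):   # 4-3) scan the bucket
        if t["keys"][i] == key:
            return t["values"][i]
    return None                                  # bucket occupied, key not stored


t = build([5, 261, 7, 517], ["a", "b", "c", "d"], hash_bit=256)
print(lookup(t, 261), lookup(t, 7), lookup(t, 6), lookup(t, 773))  # b c None None
```

An empty bucket is rejected by a single bit test before any rank is computed, so misses on unoccupied buckets never touch the key array.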
7. An efficient static hash table implementation system, characterized by comprising:
a system preprocessing component, configured to set the hash bucket size hash_bit and generate a plurality of data pairs, where key[i] and value[i] correspond to the keyword and its value;
a hash table building component, configured to build the hash table from the key[i] values by means of the rank operation and to compute a C table and a D table, where the C table stores rank values for a fixed length r and the D table stores rank values for a fixed length s;
a storage information component, configured to compute rank(h) from the C table and the D table, where h = key mod hash_bit, and to store the corresponding key[i] and value[i] according to the value of rank(h);
an access information component, configured to determine, from the key to be queried, whether the element exists in the hash table, and if it exists, to look up the corresponding storage location and return the value, otherwise reporting an access failure;
a result return component, configured to return result information according to the result obtained by the access information component.
8. The system of claim 7, characterized in that the storage information component implements the rank-operation-based hash table storage process through the following steps:
1) dividing the preprocessed data into a key array and a value array, where key[i] and value[i] correspond to the keyword and its value;
2) importing all key values into the bitmap in a single pass, with the number of hash buckets defined as hash_bit, and recording the contents of the key array for the O(1)-time rank operation; taking each key modulo hash_bit to obtain h, which guarantees that h falls within the hash buckets, then marking h at the corresponding hash bucket position, so that the positions of all key values are recorded according to their h values;
3) computing and storing the C array and the D array, using the rank operation to record their entries starting from CB[0] of the hash table;
4) using the C table and the D table to compute the rank value corresponding to each key;
5) using the rank values to record the number of elements in each hash bucket, following the layered structure of the hash table C, and storing the key and value entries in rank order;
6) storing the keys and values in arrays.
9. The system of claim 7, characterized in that the access information component implements the rank-operation-based hash table access process through the following steps:
1) taking the key to be queried modulo hash_bit to obtain h;
2) computing q = h & 255 and checking whether CB[h >> 8].bitmap[q >> 6] AND (1 << (q & 63)) yields a bit equal to 1, i.e., whether the original hash bucket contains a key value; if the bit is 0, the key value is not in the original hash table and the query fails; if the bit is 1, the key value is present in the original hash table, and its value must then be located;
3) to handle hash collisions, where two or more key values in the original hash table hit this position, the keys in this hash bucket are checked in turn for the queried key; if it is found, its value is returned; otherwise the next keyword is checked, until the keyword is empty, in which case the query fails.
CN201610793354.5A 2016-08-31 2016-08-31 Efficient static hash table implementation method and system Active CN106326475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610793354.5A CN106326475B (en) 2016-08-31 2016-08-31 Efficient static hash table implementation method and system

Publications (2)

Publication Number Publication Date
CN106326475A CN106326475A (en) 2017-01-11
CN106326475B CN106326475B (en) 2019-12-27

Family

ID=57786280

Country Status (1)

Country Link
CN (1) CN106326475B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799596A (en) * 2011-05-27 2012-11-28 广州明朝网络科技有限公司 Key word filtering method and system based on network application
CN104881439A (en) * 2015-05-11 2015-09-02 中国科学院信息工程研究所 Method and system for space-efficient multi-pattern matching
CN105359142A (en) * 2014-05-23 2016-02-24 华为技术有限公司 Hash join method, device and database management system

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766258A (en) * 2017-09-27 2018-03-06 精硕科技(北京)股份有限公司 Memory storage method and apparatus, memory lookup method and apparatus
CN110413215B (en) * 2018-04-28 2023-11-07 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for obtaining access rights
CN110413215A (en) * 2018-04-28 2019-11-05 伊姆西Ip控股有限责任公司 For obtaining the method, equipment and computer program product of access authority
CN110928483B (en) * 2018-09-19 2021-04-09 华为技术有限公司 Data storage method, data acquisition method and equipment
CN110928483A (en) * 2018-09-19 2020-03-27 华为技术有限公司 Data storage method, data acquisition method and equipment
CN111241146B (en) * 2018-11-29 2023-09-19 北京数安鑫云信息技术有限公司 Method and system for counting TopK-Frequency information
CN111241146A (en) * 2018-11-29 2020-06-05 北京数安鑫云信息技术有限公司 Method and system for counting TopK-Frequency information
WO2020107484A1 (en) * 2018-11-30 2020-06-04 华为技术有限公司 Acl rule classification method, lookup method and device
CN109582598B (en) * 2018-12-13 2023-05-02 武汉中元华电软件有限公司 Preprocessing method for realizing efficient hash table searching based on external storage
CN109582598A (en) * 2018-12-13 2019-04-05 武汉中元华电软件有限公司 A kind of preprocess method for realizing efficient lookup Hash table based on external storage
CN110457535A (en) * 2019-08-14 2019-11-15 广州虎牙科技有限公司 Hash bucket lookup method, Hash table storage, Hash table lookup method and device
CN111177476A (en) * 2019-12-05 2020-05-19 北京百度网讯科技有限公司 Data query method and device, electronic equipment and readable storage medium
CN111177476B (en) * 2019-12-05 2023-08-18 北京百度网讯科技有限公司 Data query method, device, electronic equipment and readable storage medium
CN111694559A (en) * 2020-05-21 2020-09-22 北京云杉世纪网络科技有限公司 Method and device for realizing hash table in GC program language
CN111694559B (en) * 2020-05-21 2023-07-21 北京云杉世纪网络科技有限公司 Method and device for implementing hash table in GC program language
CN113448996A (en) * 2021-06-11 2021-09-28 成都三零嘉微电子有限公司 High-speed searching method for IPSec security policy database
CN113448996B (en) * 2021-06-11 2022-09-09 成都三零嘉微电子有限公司 High-speed searching method for IPSec security policy database

Also Published As

Publication number Publication date
CN106326475B (en) 2019-12-27

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant