CN106326475A - High-efficiency static hash table implement method and system - Google Patents
- Publication number
- CN106326475A (application CN201610793354.5A)
- Authority
- CN
- China
- Prior art keywords
- value
- key
- hash
- rank
- storage
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a high-efficiency static hash table implementation method and system. The method comprises the steps of: 1) setting the hash bucket size hash_bit and generating a plurality of data pairs, where key[i] and value[i] correspond to the keyword and the value; 2) according to the key[i] values, constructing the hash table using the rank operation and computing the C table and the D table; 3) according to the C table and D table, computing rank(h) and storing the corresponding key[i] and value[i] based on the rank(h) value; 4) according to the key to be queried, determining whether the element exists in the hash table; if so, querying and returning the value at the corresponding storage position, otherwise the access fails; 5) returning the result information based on the result of step 4. The rank-select algorithm is used to construct and access the new static hash table, and the method and system can be used in fields such as content filtering and information security.
Description
Technical field
The present invention aims to design a compression algorithm for static hash tables, for use in fields such as information filtering and information security. Static hash tables occupy a relatively large amount of storage, and existing algorithms leave significant room for optimization in compressing them. The present invention aims to compress static hash tables while still supporting access to them.
Background art
Lookup tables in data structures are divided into static lookup tables and dynamic lookup tables. A lookup table is searched repeatedly until the required value is found. Static lookup methods mainly include sequential search, binary search, block search and search of static tree tables, while dynamic lookup structures mainly include binary sort trees, balanced binary trees, B-trees and B+ trees. The efficiency of these lookup algorithms depends on the number of comparisons: the more comparisons a lookup needs on average, the lower its efficiency, and the fewer it needs, the higher its efficiency. To locate data quickly, a hash table can be used to improve access efficiency.
A hash table (also called a hash map) is a special data structure that stores data as key-value pairs. It accesses a record by mapping its key to a position in the table, which speeds up lookup. The mapping function is called the hash function, and the array that holds the records is called the hash table. The mapping is not necessarily injective, so hash collisions can occur; many data-structure techniques exist for resolving them. Hash tables are very widely used, and storing data in a hash table for fast lookup is one of the most common operations. In computer science, hash tables play an important role in routing in peer-to-peer (P2P) networks, database lookup, compressed ordinal indexing and information security.
Hash tables also play an important role in everyday life. For example, when a bank reconciles front-end data against back-end data, the corresponding value can be found from the key, completing the reconciliation of the front-end and back-end records. When an IC card is used on public transport, the card number serves as the key: swiping the card on boarding is an insertion into the hash table, storing the boarding time and station name in the value; swiping the card on alighting is a lookup in the hash table, after which the card's entry is deleted and the travel time and distance are computed.
Depending on whether dynamic insertion and deletion are supported, hash tables are divided into static hash tables and dynamic hash tables. A static hash table supports only query operations and does not support dynamic insertion or deletion. Static hash tables suit scenarios in which the data is loaded into the table once, after which the main work is fast lookup. In pattern matching, static hash tables fit the application background of certain algorithms very well: efficient algorithms such as Wu-Manber and Karp-Rabin both use hash functions to process the rules and the text to be matched, and the rules are usually loaded into a hash table in advance, with matching performed afterwards.
Current hash table algorithms mainly include the linear probing hash algorithm, the binary search algorithm and the hash binary search algorithm.
These algorithms meet the requirements of a static hash table and can locate data efficiently during storage and querying, but there is still considerable room for improvement in storage space and search efficiency. The idea behind each algorithm is briefly described below.
Linear probing: when the hash address p = H(key) of a keyword key collides with an occupied address, a new hash address p1 is derived from p (for linear probing, the next address in sequence), and so on iteratively until a hash address pi without a collision is found; the keyword and its value are then stored at that address. During lookup, H(key) is computed first and the same probe sequence is followed to check whether key is present; if it is, its value is returned.
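For illustration only, a static linear-probing table might look like the following C sketch. The names (`LinearTable`, `lp_build`, `lp_find`, `lp_hash`), the multiplicative hash, the power-of-two slot count and the use of key 0 as an empty-slot marker are all assumptions, not details from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Static open-addressing table with linear probing (illustrative sketch).
 * Assumptions: key 0 marks an empty slot, and the slot count is a power of
 * two larger than the number of keys, so every probe sequence ends. */
typedef struct {
    uint32_t *keys;
    uint32_t *vals;
    uint32_t mask;                      /* slot count - 1 */
} LinearTable;

/* Hypothetical hash: multiply by an odd constant, keep the low bits. */
static uint32_t lp_hash(uint32_t key, uint32_t mask) {
    return (key * 2654435769u) & mask;
}

static void lp_build(LinearTable *t, uint32_t nslots,
                     const uint32_t *keys, const uint32_t *vals, size_t n) {
    assert(nslots > n && (nslots & (nslots - 1)) == 0);
    t->keys = calloc(nslots, sizeof *t->keys);
    t->vals = calloc(nslots, sizeof *t->vals);
    t->mask = nslots - 1;
    for (size_t i = 0; i < n; i++) {
        uint32_t p = lp_hash(keys[i], t->mask);
        while (t->keys[p] != 0)         /* collision: probe the next slot */
            p = (p + 1) & t->mask;
        t->keys[p] = keys[i];
        t->vals[p] = vals[i];
    }
}

/* Returns 1 and stores the value in *val if key is present, else 0. */
static int lp_find(const LinearTable *t, uint32_t key, uint32_t *val) {
    uint32_t p = lp_hash(key, t->mask);
    while (t->keys[p] != 0) {           /* an empty slot ends the probe sequence */
        if (t->keys[p] == key) {
            *val = t->vals[p];
            return 1;
        }
        p = (p + 1) & t->mask;
    }
    return 0;
}
```

Because the table is static, no deletion markers are needed: an empty slot reliably terminates an unsuccessful search.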
Binary search: during storage, the entries are sorted by key; during lookup, binary search locates the key and thereby its value.
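A minimal C sketch of the sort-then-search scheme, assuming 32-bit keys and values held in a hypothetical `Pair` array; the standard library `qsort` performs the one-time sort, and a hand-written lower-bound loop performs each lookup:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t key, val; } Pair;

static int pair_cmp(const void *a, const void *b) {
    uint32_t x = ((const Pair *)a)->key, y = ((const Pair *)b)->key;
    return (x > y) - (x < y);           /* avoids overflow of x - y */
}

/* Storage: sort the pairs by key once. */
static void bs_build(Pair *pairs, size_t n) {
    qsort(pairs, n, sizeof *pairs, pair_cmp);
}

/* Lookup: binary search over the sorted keys; returns 1 on a hit. */
static int bs_find(const Pair *pairs, size_t n, uint32_t key, uint32_t *val) {
    size_t lo = 0, hi = n;              /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pairs[mid].key < key) lo = mid + 1;
        else hi = mid;
    }
    if (lo < n && pairs[lo].key == key) {
        *val = pairs[lo].val;
        return 1;
    }
    return 0;
}
```

No space beyond the pairs themselves is needed, but each query costs O(log n) comparisons.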
Hash binary search: the keys are partitioned into different hash buckets (as in chained addressing), and the entries within each bucket are stored in sorted order so that binary search can be used. During lookup, the hash function first determines the bucket, and binary search then locates the key within that bucket and thereby its value.
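The following C sketch shows one way to lay out hash binary search. It assumes the bucket index is `key % nbuckets` and uses a counting-sort layout in which `start[b]` marks where bucket b begins in a single contiguous pair array; the type and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t key, val; } Pair;

typedef struct {
    uint32_t nbuckets;
    uint32_t *start;   /* pairs of bucket b occupy [start[b], start[b+1]) */
    Pair *pairs;       /* grouped by bucket, sorted by key inside each bucket */
} HashBinTable;

static int pair_cmp(const void *a, const void *b) {
    uint32_t x = ((const Pair *)a)->key, y = ((const Pair *)b)->key;
    return (x > y) - (x < y);
}

static void hb_build(HashBinTable *t, uint32_t nbuckets, const Pair *in, size_t n) {
    t->nbuckets = nbuckets;
    t->start = calloc((size_t)nbuckets + 1, sizeof *t->start);
    t->pairs = malloc(n * sizeof *t->pairs);
    for (size_t i = 0; i < n; i++)                 /* count pairs per bucket */
        t->start[in[i].key % nbuckets + 1]++;
    for (uint32_t b = 0; b < nbuckets; b++)        /* prefix sums give bucket starts */
        t->start[b + 1] += t->start[b];
    uint32_t *fill = malloc((size_t)nbuckets * sizeof *fill);
    for (uint32_t b = 0; b < nbuckets; b++) fill[b] = t->start[b];
    for (size_t i = 0; i < n; i++)                 /* scatter pairs into buckets */
        t->pairs[fill[in[i].key % nbuckets]++] = in[i];
    free(fill);
    for (uint32_t b = 0; b < nbuckets; b++)        /* sort each bucket by key */
        qsort(t->pairs + t->start[b], t->start[b + 1] - t->start[b],
              sizeof *t->pairs, pair_cmp);
}

static int hb_find(const HashBinTable *t, uint32_t key, uint32_t *val) {
    uint32_t b = key % t->nbuckets;
    size_t lo = t->start[b], hi = t->start[b + 1];
    while (lo < hi) {                              /* binary search inside bucket b */
        size_t mid = lo + (hi - lo) / 2;
        if (t->pairs[mid].key < key) lo = mid + 1;
        else hi = mid;
    }
    if (lo < t->start[b + 1] && t->pairs[lo].key == key) {
        *val = t->pairs[lo].val;
        return 1;
    }
    return 0;
}
```

The `start` array plays the role of per-bucket pointers, which is why this scheme needs extra space proportional to the number of buckets.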
All of the above hash table algorithms are widely used. They differ in storage and lookup efficiency, and each has its own strengths in space usage and lookup speed. To design a more efficient hash table algorithm that further reduces the space occupied by the hash table, the present invention uses the rank-select algorithm to compress static hash tables; compared with the other algorithms, it achieves large improvements in both space and time. The rank-select algorithm is a space-compressing algorithm proposed for storing tree structures in bit vectors in "Jacobson G. Space-efficient static trees and graphs[C]//30th Annual Symposium on Foundations of Computer Science. IEEE, 1989: 549-554." The rank-select algorithm is described in detail below. In that paper it was mainly used to compress adjacency tree structures: as shown in Fig. 1, the original pointer-based tree structure is reduced to a binary string, and the main idea relies on the rank-select algorithm.
To introduce the rank-select algorithm, first define rank(m) as the number of 1s in a binary string from the first position up to position m. For example, rank(10) = 7 in Fig. 2.
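As a baseline, rank can be computed by a direct scan. This sketch assumes 0-based positions, counts position m itself, and stores bit i in bit `i % 64` of word `i / 64` (the same layout used by formula (7) later); Fig. 2 may number positions differently, and the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i of B is bit (i % 64) of word B[i / 64]. */
static int get_bit(const uint64_t *B, size_t i) {
    return (int)((B[i >> 6] >> (i & 63)) & 1u);
}

/* rank(m): number of 1s in positions 0..m inclusive, by a plain O(m) scan. */
static size_t rank_naive(const uint64_t *B, size_t m) {
    size_t count = 0;
    for (size_t i = 0; i <= m; i++)
        count += (size_t)get_bit(B, i);
    return count;
}
```

This costs O(m) time per query; the C and D tables described next reduce it to O(1).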
In Fig. 3, the nodes of a binary tree are labeled level by level starting from the root; black indicates that a node exists and white that it is empty. When the nodes are stored level by level according to the tree's hierarchy, 8 positions are 1, representing the 8 nodes of the original structure. As the figure shows, the original binary tree structure uses n bytes to store one node; storing one bit per node instead greatly reduces the space occupied.
The paper "Mäkinen V, Navarro G. Rank and select revisited and extended[J]. Theoretical Computer Science, 2007, 387(3): 332-347." proves that for a bit vector B of size n, rank can be supported in O(1) time with only o(n) additional space. SSE4.2 intrinsics such as _mm_popcnt_u64 count the 1 bits of a 64-bit word, implementing the rank operation in hardware and making it faster. In addition, the rank-select algorithm achieves notable results in compressing sparse matrices.
The rank-select algorithm compresses data effectively and can be adapted to the storage structure of a hash table. To summarize the O(1)-complexity rank operation, an example is used to explain the idea. As shown in Fig. 4, for a bit vector B of n×8 bits, the D table and the C table store rank values at block lengths of 8 and 32 bits respectively, i.e., the number of 1s before the start of the current block. For example, to find the number of 1s up to position h, the first bit of B[6], compute rank(h) = C[1] + D[1×4+2] + _mm_popcnt_u64(B[6] >> 7) = 6 + 4 + 1 = 11. Here _mm_popcnt_u64 is an intrinsic of the SSE4.2 instruction set that counts the 1 bits of its 64-bit argument; in this case it counts the 1s from the start of B[6] up to the current position.
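The two-level lookup of the Fig. 4 example can be sketched in C as follows. Since the figure's bit values are not reproduced in the text, the vector in the usage below is constructed only so that C[1] = 6 and D[6] = 4; the most-significant-bit-first order inside each byte is inferred from the `B[6] >> 7` term, and the function names are hypothetical. A small loop stands in for `_mm_popcnt_u64` so the code builds without SSE4.2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed toy layout: B is a byte array, bit 0 of a byte is its most
 * significant bit, r = 32 bits (4 bytes) per C entry, s = 8 bits per D entry. */
enum { BYTES_PER_C = 4 };

static unsigned popcount8(uint8_t x) {
    unsigned c = 0;
    for (; x; x &= (uint8_t)(x - 1)) c++;  /* clear the lowest set bit each step */
    return c;
}

/* C[i]: 1s before byte 4*i.  D[4*i + j]: 1s from byte 4*i up to (not
 * including) byte 4*i + j, i.e. relative to the start of the C block. */
static void build_tables(const uint8_t *B, size_t nbytes, uint32_t *C, uint8_t *D) {
    uint32_t total = 0;
    uint8_t inblock = 0;
    for (size_t b = 0; b < nbytes; b++) {
        if (b % BYTES_PER_C == 0) { C[b / BYTES_PER_C] = total; inblock = 0; }
        D[b] = inblock;
        inblock = (uint8_t)(inblock + popcount8(B[b]));
        total += popcount8(B[b]);
    }
}

/* Inclusive rank of bit p: C[i] + D[i*4 + j] + popcount(B[byte] >> (7 - offset)). */
static uint32_t rank_fig4(const uint8_t *B, const uint32_t *C, const uint8_t *D, size_t p) {
    size_t byte = p >> 3, off = p & 7;
    size_t i = byte / BYTES_PER_C, j = byte % BYTES_PER_C;
    return C[i] + D[i * BYTES_PER_C + j] + popcount8((uint8_t)(B[byte] >> (7 - off)));
}
```

With B = {0x07, 0x07, 0, 0, 0x03, 0x03, 0x80, 0}, querying the first bit of B[6] (p = 48) reproduces the arithmetic 6 + 4 + 1 = 11.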
Next, the implementation of the O(1) rank operation is described, as shown in Fig. 5, which generalizes the rank operation of the example. Suppose the bit vector B has n bits. Each D entry covers a block of s bits and occupies $\log_2 r$ bits; each C entry covers a block of r bits and occupies $\log_2 n$ bits. The D vector therefore occupies $\frac{n}{s}\log_2 r$ bits in total and the C vector $\frac{n}{r}\log_2 n$ bits, so the additional space is
$$\frac{n}{s}\log_2 r + \frac{n}{r}\log_2 n \ \text{bits}.$$
To compute the rank of the m-th position, write $m = i \cdot r + j \cdot s + k$, where $0 \le j < r/s$ and $0 \le k < s$. Then
$$\mathrm{Rank}(B, m) = C[i] + D[i, j] + \mathrm{rank}(B_{i r + j s}, k) \qquad (6)$$
where $\mathrm{rank}(B_{i r + j s}, k)$ is the number of 1s from position $i \cdot r + j \cdot s$ up to position m, as illustrated in Fig. 6.
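A worked instance of formula (6) with the parameters used later in the implementation (r = 256, s = 64); the position m = 300 is chosen arbitrarily for illustration. With these powers of two, i, j and k reduce to shifts and masks, matching the index arithmetic of the rank steps described later:

```latex
\begin{aligned}
i &= \lfloor m / 256 \rfloor = m \gg 8, \quad
j = \lfloor (m \bmod 256) / 64 \rfloor = (m \gg 6) \mathbin{\&} 3, \quad
k = m \bmod 64 = m \mathbin{\&} 63,\\
m = 300 &\;\Rightarrow\; i = 1,\ j = 0,\ k = 44:\quad
\mathrm{Rank}(B, 300) = C[1] + D[1, 0] + \mathrm{rank}(B_{256}, 44).
\end{aligned}
```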
In summary, the rank-select algorithm works well in practice for compressing adjacency list structures and can reduce space effectively. Static hash tables also need further optimization of space efficiency, so the present invention uses the rank-select algorithm to construct and access a new kind of static hash table.
Summary of the invention
The present invention provides an efficient static hash table implementation method and system that use the rank-select algorithm to construct and access a new kind of static hash table.
The present invention compresses static hash tables effectively while supporting direct access. Fig. 7 shows the storage layout of a traditional hash table, where H is the number of hash buckets and n is the number of keywords. With a pointer occupying 4 bytes and an integer occupying 4 bytes, the total space is 4H + 8n bytes.
The foregoing describes the O(1) rank operation in detail. To make it easy to implement on a computer, the present invention adapts it to the computer's storage structure and designs a concrete O(1) rank implementation, which is also the basic idea of the rank-based hash table compression scheme. In the experiments r = 256 and s = 64; each C[i] is an int and each D[i] is a char, so the additional space for an n-bit vector is $\frac{n}{256}\cdot 32 + \frac{n}{64}\cdot 8 = \frac{n}{4}$ bits.
The rank-based hash compression algorithm replaces the original pointers with a binary vector B. First the size hash_bits of the hash table must be set; when a key is stored, h = key mod (hash_bits) is computed first, and rank(h) is then computed to map the key to a storage unit, as shown in Fig. 8. From the foregoing, the additional space for the rank tables is H/4 bits (H being the number of bits in the hash-bucket vector), so the total storage is $\frac{H}{8} + \frac{H}{32} + 8n = \frac{5H}{32} + 8n$ bytes, much less than before.
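To simplify storage of the static hash table, the following structure is defined for later use:
struct CB
{
    int C;                          /* rank at the start of this 256-bit block */
    unsigned char D[4];             /* rank at the start of each 64-bit word, relative to the block */
    unsigned long long bitmap[4];   /* 4 x 64 = 256 hash-bucket bits */
};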
Each CB is a structure representing one block of the hash table and contains three members. C stores the rank for blocks of fixed length r, and D stores the rank for blocks of fixed length s; C is an integer and D is of type char (other types may be used for C and D, as long as they have enough bits to hold the rank values for blocks of length r or s). For convenient storage, r = 256 and s = 64. bitmap is an array of 4 unsigned 64-bit integers, and bitmap[i] (i = 0, 1, 2, 3) denotes one of its elements. Because each element occupies 64 bits, one bitmap array occupies 256 bits, which is the block length r of the rank stored in C, and each bitmap element is exactly the block length s of the rank stored in D.
The hash table is built as an array of CB. For an element CB[j], its members are CB[j].C, CB[j].D and CB[j].bitmap[i] (i = 0, 1, 2, 3). To simplify the description of the C and D tables, C[j] below is equivalent to CB[j].C and D[j] is equivalent to CB[j].D.
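To check the space accounting, the CB structure can be written out in C and measured. This sketch assumes a 32-bit C field and four 8-bit D fields per block (one D value per 64-bit word, as the storage steps index CB[i].D[1] to CB[i].D[3]), and 4-byte bucket pointers for the traditional layout of Fig. 7; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* One CB block covers r = 256 hash-bucket bits. */
typedef struct {
    int32_t C;           /* number of 1s in all earlier blocks */
    uint8_t D[4];        /* number of 1s before each 64-bit word inside this block */
    uint64_t bitmap[4];  /* 4 x 64 = 256 bucket-occupancy bits */
} CB;

/* Traditional layout (Fig. 7): 4-byte bucket pointers plus 8-byte key-value pairs. */
static uint64_t traditional_bytes(uint64_t H, uint64_t n) {
    return 4 * H + 8 * n;
}

/* CB array alone (bitmap + C + D) for H buckets, H a multiple of 256. */
static uint64_t cb_array_bytes(uint64_t H) {
    return (H / 256) * sizeof(CB);
}
```

Under these assumptions each 256-bucket block costs 40 bytes, i.e. 5H/32 bytes for H buckets. With H = 2^26, the bucket pointers of the traditional layout take 256 MiB while the CB array takes 10 MiB; the 8n bytes of key-value storage are common to both.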
To describe the invention in detail, this section first introduces the main components and the workflow of the system for building and accessing the hash table, and then describes the main processes of building and accessing the hash table.
In the present invention, the system for building and accessing the hash table mainly comprises the following components, as shown in Fig. 9:
1) System preprocessing component: sets the hash bucket size hash_bit and generates multiple data pairs, with key[i] and value[i] corresponding to the keyword and the value.
2) Hash table building component: according to the key[i] values, builds the hash table using the rank operation and computes the C table and D table.
3) Information storage component: according to the C table and D table, computes rank(h), where h = key mod (hash_bits), and stores the corresponding key[i] and value[i] according to the value of rank(h).
4) Information access component: according to the key to be queried, determines whether the element exists in the hash table; if it does, queries the corresponding storage position and returns the value; otherwise, the access fails.
5) Result return component: returns the result information obtained in the previous step.
The components of the system for building and accessing the hash table have been described above. To make the building and access processes easier to understand and to suit computer storage, the rank computation is described in natural language as follows:
1) To find how many 1s precede the i-th bit of B: assign i & 63 to k, assign i >> 8 to i1, and assign (i >> 6) - (i1 << 2) to i2. Then i1 is the index into the C table, and i >> 6 is the index into the D table.
2) Assign to e the bits of B from index (i1 << 8) + (i2 << 6) through (i1 << 8) + (i2 << 6) + k - 1.
3) Return the sum of C[i1], D[i >> 6] and _mm_popcnt_u64(e); this is the number of 1s before the i-th bit of B, i.e., rank(i) of B.
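The three steps above translate directly into C. This is a sketch, not the patent's own code: it uses the CB structure with D stored as four bytes per block (so the global index `D[i >> 6]` becomes `CB[i1].D[i2]`), and a portable SWAR popcount in place of `_mm_popcnt_u64` (on SSE4.2 hardware the intrinsic from `<nmmintrin.h>` could be substituted); `cb_rank` and `popcount64` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t C;
    uint8_t D[4];
    uint64_t bitmap[4];
} CB;

/* Portable 64-bit popcount (SWAR bit-counting). */
static int popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Number of 1s strictly before bit i of the bucket bitmap (steps 1-3). */
static uint32_t cb_rank(const CB *cb, uint64_t i) {
    uint64_t k  = i & 63;                 /* offset inside the 64-bit word */
    uint64_t i1 = i >> 8;                 /* C-table index (256-bit block) */
    uint64_t i2 = (i >> 6) - (i1 << 2);   /* word inside the block, 0..3 */
    /* e: the k bits of the word that precede position i */
    uint64_t e = cb[i1].bitmap[i2] & ((UINT64_C(1) << k) - 1);
    return (uint32_t)cb[i1].C + cb[i1].D[i2] + (uint32_t)popcount64(e);
}
```

For the bucket bits of the Fig. 11 example (set at positions 1, 65 and 257), `cb_rank` returns 0, 1 and 2 for those positions.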
The rank computation is used frequently in both building and accessing the hash table, including the computation of the C and D tables described above. In what follows, the number of hash buckets is at least 2^8, and the rank operation above is used throughout to store and access data. The invention comprises two processes, building the hash table and querying keywords, so the concrete steps of rank-based hash table storage and access are outlined for each.
1. Concrete steps of the rank-based hash table storage algorithm:
1) Split the preprocessed data into key and value arrays, so that key[i] and value[i] are the keyword and its value.
2) Import all key values into the bitmap in a single pass. Suppose there are num key-value pairs in total, and let Clength be the size of the CB table; the number of hash buckets, hash_bit, is defined as the product of Clength and 2^8. Each CB element is allocated a bitmap of 4 elements, each storing 64 bits, all initialized to 0, as shown in Fig. 10. The contents of the key array are recorded so that they can be located with the O(1) rank operation. First compute h = key mod hash_bit, which guarantees that h falls within the hash buckets; then record h at its position in the buckets with the formulas below, until the positions of all keys have been recorded:
q = h & 255
CB[h >> 8].bitmap[q >> 6] |= (1ULL << (q & 63))    (7)
3) Compute and store the C and D arrays. Since step 2 has recorded the position of every key according to its h value, the rank operation above can be used to fill in C and D starting from CB[0]: C[i] is the number of 1s in blocks CB[0] through CB[i-1], CB[i].D[1] is the number of 1s in CB[i].bitmap[0], …, and CB[i].D[3] is the number of 1s in CB[i].bitmap[0] through CB[i].bitmap[2].
4) Using the C and D tables, compute the rank value of each key with the rank operation described above.
5) Use the rank values to record the number of elements in each hash bucket in compact form, and use the rank value as the order in which keys and values are stored. Different keys with the same rank value fall into the same hash bucket, i.e., a hash collision has occurred. The rank value therefore has a second meaning: it is the position of h = key mod (hash_bit) after sorting. When key-value pairs are stored, identical rank values indicate that the bucket contains two or more elements; for ease of storage, the primary order is ascending rank value, and pairs with the same rank value are stored consecutively.
6) Store the keys and values in an array.
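The six storage steps can be sketched end to end in C. Assumptions beyond the text: the names `RankTable`, `rt_build` and `cb_rank` are hypothetical; rank here is exclusive (the number of occupied buckets before h, so it starts at 0 where Table 2 starts at 1); `idx` carries a leading 0 so that the bucket with rank r occupies pairs `idx[r]` to `idx[r+1] - 1`; and allocations are neither checked nor freed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int32_t C; uint8_t D[4]; uint64_t bitmap[4]; } CB;

typedef struct {
    uint32_t hash_bit;   /* number of buckets, a multiple of 256 */
    CB *cb;              /* hash_bit / 256 blocks */
    uint32_t *idx;       /* pairs of bucket rank r: idx[r] .. idx[r+1]-1 */
    uint32_t *kv;        /* interleaved: kv[2t] = key, kv[2t+1] = value */
} RankTable;

static int popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Exclusive rank of bucket h; (h >> 6) & 3 equals (h >> 6) - (i1 << 2). */
static uint32_t cb_rank(const CB *cb, uint32_t h) {
    uint32_t i1 = h >> 8, i2 = (h >> 6) & 3;
    uint64_t e = cb[i1].bitmap[i2] & ((UINT64_C(1) << (h & 63)) - 1);
    return (uint32_t)cb[i1].C + cb[i1].D[i2] + (uint32_t)popcount64(e);
}

static void rt_build(RankTable *t, uint32_t hash_bit,
                     const uint32_t *key, const uint32_t *value, size_t num) {
    assert(hash_bit % 256 == 0);
    uint32_t nblocks = hash_bit >> 8;
    t->hash_bit = hash_bit;
    t->cb = calloc(nblocks, sizeof *t->cb);
    for (size_t i = 0; i < num; i++) {          /* step 2: formula (7) */
        uint32_t h = key[i] % hash_bit, q = h & 255;
        t->cb[h >> 8].bitmap[q >> 6] |= UINT64_C(1) << (q & 63);
    }
    uint32_t total = 0;                          /* step 3: C and D tables */
    for (uint32_t b = 0; b < nblocks; b++) {
        uint32_t inblock = 0;
        t->cb[b].C = (int32_t)total;
        for (int w = 0; w < 4; w++) {
            t->cb[b].D[w] = (uint8_t)inblock;
            inblock += (uint32_t)popcount64(t->cb[b].bitmap[w]);
        }
        total += inblock;
    }
    t->idx = calloc((size_t)total + 1, sizeof *t->idx);
    for (size_t i = 0; i < num; i++)             /* steps 4-5: count per rank */
        t->idx[cb_rank(t->cb, key[i] % hash_bit) + 1]++;
    for (uint32_t r = 0; r < total; r++)         /* prefix sums give offsets */
        t->idx[r + 1] += t->idx[r];
    uint32_t *fill = malloc(((size_t)total + 1) * sizeof *fill);
    for (uint32_t r = 0; r <= total; r++) fill[r] = t->idx[r];
    t->kv = malloc(2 * num * sizeof *t->kv);
    for (size_t i = 0; i < num; i++) {           /* step 6: equal ranks keep input order */
        uint32_t slot = fill[cb_rank(t->cb, key[i] % hash_bit)]++;
        t->kv[2 * slot] = key[i];
        t->kv[2 * slot + 1] = value[i];
    }
    free(fill);
}
```

Run on the data of Table 1 with hash_bit = 2^9, this produces the C and D values listed in the storage example and the array of Table 4.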
2. Concrete steps of the rank-based hash table access algorithm:
1) First compute h = key mod hash_bit for the key to be queried.
2) Compute q = h & 255 and test whether CB[h >> 8].bitmap[q >> 6] & (1ULL << (q & 63)) is nonzero, i.e., whether any key occupies this bucket in the original hash table. If the result is 0, the key is not in the original hash table and the query fails; otherwise a key with this hash value exists and the value must be located.
3) Because of hash collisions, two or more keys may hit this position in the original hash table, so the keys in this bucket are checked in turn for the queried key. If it is found, its value is returned; otherwise the next keyword is checked, until no keywords remain, in which case the query fails.
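A C sketch of the three query steps, with the tables of the worked example hard-coded in the usage below. `rt_find` is a hypothetical name, and the exclusive-rank `idx` convention (a leading 0, so the bucket with rank r spans `idx[r]` to `idx[r+1] - 1`) is an assumption layered on Table 3:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t C; uint8_t D[4]; uint64_t bitmap[4]; } CB;

static int popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* kv stores keys and values interleaved (as in Table 4). Returns 1 on a hit. */
static int rt_find(const CB *cb, const uint32_t *idx, const uint32_t *kv,
                   uint32_t hash_bit, uint32_t key, uint32_t *value) {
    uint32_t h = key % hash_bit;                          /* step 1 */
    uint32_t q = h & 255;
    uint64_t word = cb[h >> 8].bitmap[q >> 6];
    if (!(word & (UINT64_C(1) << (q & 63))))              /* step 2: empty bucket */
        return 0;
    /* exclusive rank of h: occupied buckets before h */
    uint32_t r = (uint32_t)cb[h >> 8].C + cb[h >> 8].D[q >> 6]
               + (uint32_t)popcount64(word & ((UINT64_C(1) << (q & 63)) - 1));
    for (uint32_t t = idx[r]; t < idx[r + 1]; t++)        /* step 3: scan the bucket */
        if (kv[2 * t] == key) {
            *value = kv[2 * t + 1];
            return 1;
        }
    return 0;
}
```

Querying 513 returns 2, as in the query example, while 1025 reaches the same occupied bucket but fails after scanning both of its keys.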
The beneficial effects of the present invention are as follows:
The present invention uses the rank-select algorithm to construct and access a new kind of static hash table. It compresses static hash tables, supports access to them and further optimizes space efficiency. The scheme can be used in fields such as content filtering and information security.
Brief description of the drawings
Fig. 1 is a schematic diagram of storing an adjacency tree structure as a binary string.
Fig. 2 is an example of the rank operation.
Fig. 3 is an example of storing binary tree nodes with the rank-select algorithm.
Fig. 4 is an example of the O(1) rank operation.
Fig. 5 shows the implementation of the O(1) rank operation.
Fig. 6 is a schematic diagram of the rank operation for the m-th position.
Fig. 7 is a schematic diagram of the storage layout of a traditional hash table.
Fig. 8 is a schematic diagram of the compact hash table storage layout based on the rank operation.
Fig. 9 is a diagram of the system components for building and accessing a static hash table based on the rank operation.
Fig. 10 is a schematic diagram of the hash table storage structure.
Fig. 11 shows an example of hash buckets.
Detailed description of the invention
The present invention is further described below through specific embodiments.
This section illustrates the concrete steps from the summary of the invention on specific data to be stored, divided into a rank-based hash table storage process and a rank-based hash table access process.
1. Example of the rank-based hash table storage process:
The storage steps above are illustrated as follows. Assume the number of hash buckets hash_bit = 2^9; the data to be stored are shown in Table 1 below, together with the corresponding h values.
Table 1 Stored data: key, value and h
key | 1 | 513 | 65 | 257 |
value | 1 | 2 | 3 | 4 |
h | 1 | 1 | 65 | 257 |
The keys after the modulo operation are shown in binary in the upper left of Fig. 11, and the corresponding hash table is shown at the bottom of Fig. 11. Thus C[0] = 0, C[1] = 2, C[2] = 3; CB[0].D[0] = 0, CB[0].D[1] = 1, CB[0].D[2] = 2, CB[0].D[3] = 2, CB[1].D[0] = 0, CB[1].D[1] = 1, CB[1].D[2] = 1, CB[1].D[3] = 1. When the keys and values are stored in the array, sorting by rank(h) shows that two keys map to the same position when rank(h) = 1, as shown in Table 2. The resulting coordinates are shown in Table 3, where idx is the running sum of the counts so far, and idx - 1 is half the array index of the last key stored for that rank. The one-dimensional array storing the keys and values is shown in Table 4, where idx1 is the array index; keys are at even indices and values at odd indices.
Table 2 rank(h) values and their counts
rank(h) | 1 | 2 | 3 |
count | 2 | 1 | 1 |
Table 3 Coordinate idx obtained by accumulating the counts
rank(h) | 1 | 2 | 3 |
idx | 2 | 3 | 4 |
Table 4 Keys and values stored in the one-dimensional array (index idx1)
idx1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
key or value | 1 | 1 | 513 | 2 | 65 | 3 | 257 | 4 |
2. Example of the rank-based hash table query process:
Using the hash table built in the storage example, suppose we query whether key = 513 is in the hash table. First compute h = 1 and q = 1; the bit test on CB[h >> 8].bitmap[q >> 6] yields 1. Since two keys hit this position, the keys at idx1 = 0 and idx1 = 2 are compared with 513; since the key at idx1 = 2 is 513, the value at idx1 = 3, namely 2, is returned and the query succeeds.
Based on the above scheme, the rank-based hash table compression algorithm is compared below with binary search, linear probing hashing and hash binary search. The test data are 10,000,000 key-value pairs whose keys and values are randomly generated 32-bit unsigned integers, occupying 76.294 MB in total. The query data are 10,000,000 randomly generated 32-bit unsigned integers, with the hit ratio set to 1%. The test environment is 64-bit Windows 7 on an Intel Core i5 CPU with 4 GB of memory.
Because the number of hash buckets is an adjustable parameter, different bucket counts were tested for each algorithm. The rank-based hash table compression algorithm can use SSE instructions such as _mm_popcnt_u64 to perform 64-bit bit counts, so this experiment compares versions of the algorithm with and without the SSE instruction set.
Experiment 1: rank-based hash table compression with and without the SSE instruction set
As shown in Tables 5 and 6, the additional space of both versions grows with the number of hash buckets, and query speed peaks at 2^29 buckets. For the same number of buckets, the version using SSE is clearly faster than the version without SSE, which shows that the rank bit operation implemented in hardware outperforms the software implementation.
Table 5 Rank-based hash table compression algorithm (Rank for SSE)
Hash buckets | Key-value space (MB) | Additional space (MB) | Query speed (10^4 queries/s) |
2^24 | 76.294 | 31.265 | 1779 |
2^25 | 76.294 | 38.023 | 2463 |
2^26 | 76.294 | 45.478 | 2906 |
2^27 | 76.294 | 56.802 | 3278 |
2^28 | 76.294 | 77.490 | 4000 |
2^29 | 76.294 | 117.838 | 4566 |
2^30 | 76.294 | 198.014 | 4000 |
Table 6 Rank-based hash table compression algorithm without SSE (Rank None SSE)
Hash buckets | Key-value space (MB) | Additional space (MB) | Query speed (10^4 queries/s) |
2^24 | 76.294 | 31.265 | 744 |
2^25 | 76.294 | 38.023 | 1303 |
2^26 | 76.294 | 45.478 | 2000 |
2^27 | 76.294 | 56.802 | 2785 |
2^28 | 76.294 | 77.490 | 3367 |
2^29 | 76.294 | 117.838 | 4255 |
2^30 | 76.294 | 198.014 | 3773 |
Experiment 2: binary search, hash binary search and linear probing hashing
Table 7 Binary search (CBinarySearch)
Key-value space (MB) | Additional space (MB) | Query speed (10^4 queries/s) |
76.294 | 0 | 172 |
Table 8 Hash binary search (CHashBinarySearch)
Hash buckets | Key-value space (MB) | Additional space (MB) | Query speed (10^4 queries/s) |
2^24 | 76.294 | 64 | 1600 |
2^25 | 76.294 | 128 | 1776 |
2^26 | 76.294 | 256 | 1883 |
2^27 | 76.294 | 512 | 1560 |
Table 9 Linear probing hashing (CLinearProbe)
From the results for these three algorithms: 1) binary search needs neither additional space nor a bucket-count parameter, but its queries are slow; 2) linear probing reaches 35,580,000 queries/s with 2^26 buckets, the fastest of the three, but its additional space is large, 435.706 MB; 3) hash binary search reaches 18,830,000 queries/s with 2^26 buckets, slower than linear probing, but with less additional space, 256 MB.
Experiment 3: comparison of the rank-based hash table compression algorithm with the algorithms of Experiment 2
Table 10 Experimental comparison of the five hash table algorithms
The comparison shows that the rank-based hash table compression algorithm has large advantages in both query speed and space usage, with both its speed and its additional space well ahead of the other three algorithms. With the SSE instruction set and 2^29 buckets, it reaches 45,660,000 queries/s while using only 117.838 MB of additional space.
The above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Persons of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall be as defined in the claims.
Claims (9)
1. An efficient static hash table implementation method, characterized by comprising the following steps:
1) setting the hash bucket size hash_bit and generating multiple data pairs, with key[i] and value[i] corresponding to the keyword and the value;
2) according to the key[i] values, building the hash table using the rank operation and computing a C table and a D table, wherein the C table stores rank values for blocks of fixed length r and the D table stores rank values for blocks of fixed length s;
3) computing rank(h) according to the C table and D table, wherein h = key mod (hash_bits), and storing the corresponding key[i] and value[i] according to the value of rank(h);
4) determining, according to the key to be queried, whether the element exists in the hash table; if it exists, querying the corresponding storage position and returning the value; otherwise the access fails;
5) returning result information according to the result of step 4).
2. the method for claim 1, it is characterised in that step 3) use following steps to realize Kazakhstan based on rank operation
Uncommon table storing process:
3-1) data of pretreatment being divided into key and value array, key [i], value [i] and keyword, key assignments are corresponding;
3-2) the most disposably importing key value in bitmap, the quantity defining Hash bucket is hash_bit, according to time degree O
(1) data content of rank operation note key array;Key Yu hash_bit delivery is obtained h, it is ensured that fall in Hash bucket,
Then h is stored on Hash bucket correspondence position, according to the size of h value, records the relevant position information of all key values;
3-3) storage calculates C array and D array, utilizes rank operation to start to record C array and D array from Hash table CB [0]
Corresponding information;
3-4) utilize C table and D table information, calculate the rank value that each key value is corresponding;
3-5) utilize rank value to record each Hash bucket interior element number, according to the laminated structure record of Hash table C, utilize rank
Value is as sequential storage key, value value;
3-6) storage key, value value is in array.
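Steps 3-1) to 3-6) can be sketched roughly as below. This is an illustration under several assumptions, not the patented implementation:

- Keys are integers.
- The bitmap is a flat list of 64-bit words.
- For brevity, a single-level directory (set bits before each word) stands in for the two-level C/D tables.
- Each occupied bucket's slice of the key and value arrays is delimited by a hypothetical `offsets` array.

Pairs with equal rank keep their input order, which is one way to satisfy the ordering rule of claim 5.

```python
# Sketch of the rank-based storage process (claim 2, steps 3-1 to 3-6).
# Assumptions: integer keys, h = key % hash_bit, a flat list of 64-bit
# bitmap words, and a one-level rank directory instead of C/D tables.

def popcount(x: int) -> int:
    return bin(x).count("1")


def build_storage(keys: list[int], values: list, hash_bit: int):
    # 3-1) keys[i] and values[i] form one pair
    words = [0] * ((hash_bit + 63) // 64)
    # 3-2) import every key into the bitmap: mark bucket h = key mod hash_bit
    hs = [k % hash_bit for k in keys]
    for h in hs:
        words[h >> 6] |= 1 << (h & 63)
    # 3-3) rank directory: number of set bits before each word
    before, total = [], 0
    for w in words:
        before.append(total)
        total += popcount(w)

    def rank(h: int) -> int:
        # 3-4) set bits strictly before position h = index of h's bucket
        return before[h >> 6] + popcount(words[h >> 6] & ((1 << (h & 63)) - 1))

    ranks = [rank(h) for h in hs]
    # 3-5) count elements per occupied bucket, indexed by rank
    counts = [0] * total
    for r in ranks:
        counts[r] += 1
    offsets = [0] * (total + 1)   # bucket r spans offsets[r]:offsets[r + 1]
    for r in range(total):
        offsets[r + 1] = offsets[r] + counts[r]
    # 3-6) place pairs in ascending rank order; equal ranks keep input order
    out_keys: list = [None] * len(keys)
    out_vals: list = [None] * len(keys)
    cursor = offsets[:-1]         # slice copy: next free slot per bucket
    for k, v, r in zip(keys, values, ranks):
        out_keys[cursor[r]] = k
        out_vals[cursor[r]] = v
        cursor[r] += 1
    return words, before, offsets, out_keys, out_vals
```

For example, with hash_bit = 16, keys 5 and 21 both land in bucket 5 (rank 0). They end up adjacent at the front of the key array, followed by the pairs of buckets 8 and 13.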
3. The method of claim 2, characterized in that in step 3-2) the value of hash_bit is the product of Clength, the size of the CB table, and 2^8; each hash bucket entry of CB is allocated four bitmaps, each storing 64 bits of data, and every bit is initialized to 0.
4. The method of claim 3, characterized in that in step 3-2) the position of each h is recorded according to the following formulas, until the positions of all key values have been recorded in turn:
q = h & 255,
CB[h >> 8].bitmap[q >> 6] |= (1 << (q & 63)).
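Claims 3 and 4 fix the block layout. Each CB entry holds four 64-bit words, i.e. 4 × 64 = 256 = 2^8 bits, so hash_bit = Clength × 2^8. Bucket h therefore maps to entry h >> 8, word q >> 6 and bit q & 63. The Python sketch below mirrors the two formulas directly; the function names are hypothetical. One caveat applies if the formula is transcribed into C: the literal `1` must be 64-bit (`1ULL` or `UINT64_C(1)`), because shifting a 32-bit `int` by 31 or more is undefined behavior.

```python
# Sketch of the CB block layout from claims 3 and 4 (names are hypothetical).
# Each CB entry is four 64-bit bitmap words = 256 = 2**8 bits, so
# hash_bit = Clength * 2**8 and bucket h lives in entry h >> 8.

def make_cb(clength: int) -> list[list[int]]:
    """Allocate Clength entries of four zeroed 64-bit words (claim 3)."""
    return [[0, 0, 0, 0] for _ in range(clength)]


def set_bucket(cb: list[list[int]], h: int) -> None:
    """Claim 4: q = h & 255; CB[h >> 8].bitmap[q >> 6] |= 1 << (q & 63)."""
    q = h & 255                          # offset of h inside its entry
    cb[h >> 8][q >> 6] |= 1 << (q & 63)  # word q >> 6, bit q & 63


def bucket_occupied(cb: list[list[int]], h: int) -> bool:
    """Test the same bit that set_bucket sets."""
    q = h & 255
    return bool((cb[h >> 8][q >> 6] >> (q & 63)) & 1)
```

With Clength = 2 the table covers hash_bit = 512 buckets. For example, h = 64 sets bit 0 of word 1 in entry 0, and h = 511 sets bit 63 of word 3 in entry 1.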
5. The method of claim 2, characterized in that in step 3-5), when key-value pairs are stored and different keys have the same rank value, the primary order is ascending rank value and the secondary order stores the keys with the same rank value consecutively.
6. the method for claim 1, it is characterised in that step 4) use following steps to realize Kazakhstan based on rank operation
Uncommon table access process:
4-1) data key to be inquired about and hash_bit delivery are obtained h;
4-2) calculate q=h&255, it is judged that CB [h>>8] .bitmap [q>>6] and (1<<(q&63)) do with whether computing is 1, i.e.
Key value whether is had in original Hash bucket;If this step is judged as 0, not this key value in the most former Hash table, inquire about unsuccessfully;
If this step is judged as 1, in the most former Hash table, there is this key value, then need to find value value;
4-3) in order to prevent hash-collision, in the most former Hash table, there are two and the hit of above key value in this position, then at this
Judging whether successively in Hash bucket, containing inquiry data key, if comprising, to return value value, if not comprising, the inquiry next one closes
Key word, until keyword is empty, inquires about unsuccessfully.
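The access process of steps 4-1) to 4-3) can be sketched in Python as follows. The sketch relies on assumptions not taken from the claims:

- The bitmap is a flat word list; word h >> 6 is the same word as CB[h >> 8].bitmap[(h & 255) >> 6].
- A single-level rank directory replaces the C/D tables.
- A hypothetical `offsets` array bounds each bucket's entries, where the claim instead scans until an empty keyword.

The constructor is a compact builder included only so the example runs on its own.

```python
# Sketch of the rank-based lookup (claim 6, steps 4-1 to 4-3).
# Assumptions: integer keys, flat 64-bit bitmap words, a one-level rank
# directory, and an offsets array delimiting each occupied bucket.

def popcount(x: int) -> int:
    return bin(x).count("1")


class RankHashTable:
    def __init__(self, pairs: dict, hash_bit: int):
        self.hash_bit = hash_bit
        self.words = [0] * ((hash_bit + 63) // 64)
        for k in pairs:                       # mark every occupied bucket
            h = k % hash_bit
            self.words[h >> 6] |= 1 << (h & 63)
        self.before, total = [], 0            # set bits before each word
        for w in self.words:
            self.before.append(total)
            total += popcount(w)
        # Stable sort by bucket rank, so each bucket's pairs are contiguous.
        items = sorted(pairs.items(), key=lambda kv: self._rank(kv[0] % hash_bit))
        self.keys = [k for k, _ in items]
        self.vals = [v for _, v in items]
        self.offsets = [0] * (total + 1)      # bucket r: offsets[r]:offsets[r+1]
        for k, _ in items:
            self.offsets[self._rank(k % hash_bit) + 1] += 1
        for r in range(total):
            self.offsets[r + 1] += self.offsets[r]

    def _rank(self, h: int) -> int:
        """Set bits strictly before bucket h."""
        w = h >> 6
        return self.before[w] + popcount(self.words[w] & ((1 << (h & 63)) - 1))

    def get(self, key: int):
        h = key % self.hash_bit                       # 4-1) bucket index
        bit = (self.words[h >> 6] >> (h & 63)) & 1    # 4-2) bucket occupied?
        if not bit:
            return None                               # query fails
        r = self._rank(h)
        for i in range(self.offsets[r], self.offsets[r + 1]):  # 4-3) scan
            if self.keys[i] == key:
                return self.vals[i]
        return None                                   # collision, key absent
```

Under this sketch, a query for an empty bucket fails after one bit test at step 4-2. Only occupied buckets pay for the rank computation and the collision scan.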
7. An efficient static hash table implementation system, characterized by comprising:
a system preprocessing component, configured to set the hash bucket size hash_bit and generate a plurality of data pairs, where key[i] and value[i] correspond to the keyword and the value, respectively;
a hash table building component, configured to build the hash table from the key[i] values by means of rank operations and to compute a C table and a D table, where the C table stores rank values for fixed-length intervals of length r and the D table stores rank values for fixed-length intervals of length s;
an information storage component, configured to compute rank(h) from the C table and the D table, where h = key mod hash_bit, and to store the corresponding key[i] and value[i] according to the value of rank(h);
an information access component, configured to determine, from the key to be queried, whether the element exists in the hash table, and if it exists, to query the corresponding storage position and return the value, and otherwise to report an access failure;
a result return component, configured to return result information according to the result obtained by the information access component.
8. The system of claim 7, characterized in that the information storage component implements the rank-based hash table storage process through the following steps:
1) dividing the preprocessed data into a key array and a value array, where key[i] and value[i] correspond to the keyword and its value, respectively;
2) first importing all key values into the bitmap in a single pass: the number of hash buckets is defined as hash_bit, and the data content of the key array is recorded so that rank operations take O(1) time; each key is taken modulo hash_bit to obtain h, which guarantees that h falls within a hash bucket, and h is then marked at its corresponding position in the hash buckets, so that the position information of all key values is recorded according to their h values;
3) computing and storing the C array and the D array: starting from hash table entry CB[0], the corresponding C-array and D-array information is recorded by means of rank operations;
4) using the C table and D table information to compute the rank value corresponding to each key value;
5) using the rank values to record the number of elements in each hash bucket according to the layered structure of hash table C, and storing the key and value values in order of rank value;
6) storing the key and value values in arrays.
9. The system of claim 7, characterized in that the information access component implements the rank-based hash table access process through the following steps:
1) taking the key to be queried modulo hash_bit to obtain h;
2) computing q = h & 255 and determining whether CB[h >> 8].bitmap[q >> 6] AND (1 << (q & 63)) is nonzero, i.e., whether the bit is 1 and the original hash bucket holds a key value; if the bit is 0, this key value is not in the original hash table and the query fails; if the bit is 1, a key value is present at this position in the original hash table and its value must be found;
3) to handle hash collisions, where two or more key values hit this position in the original hash table, checking the entries in this hash bucket one by one for the queried key; if it is found, its value is returned; otherwise, the next keyword is checked, until an empty keyword is reached, in which case the query fails.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610793354.5A CN106326475B (en) | 2016-08-31 | 2016-08-31 | Efficient static hash table implementation method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106326475A | 2017-01-11 |
CN106326475B | 2019-12-27 |
Family ID: 57786280
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610793354.5A Active CN106326475B (en) | 2016-08-31 | 2016-08-31 | Efficient static hash table implementation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106326475B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766258A (en) * | 2017-09-27 | 2018-03-06 | 精硕科技(北京)股份有限公司 | Memory storage method and apparatus, memory lookup method and apparatus |
CN109582598A (en) * | 2018-12-13 | 2019-04-05 | 武汉中元华电软件有限公司 | A kind of preprocess method for realizing efficient lookup Hash table based on external storage |
CN110413215A (en) * | 2018-04-28 | 2019-11-05 | 伊姆西Ip控股有限责任公司 | For obtaining the method, equipment and computer program product of access authority |
CN110457535A (en) * | 2019-08-14 | 2019-11-15 | 广州虎牙科技有限公司 | Hash bucket lookup method, Hash table storage, Hash table lookup method and device |
CN110928483A (en) * | 2018-09-19 | 2020-03-27 | 华为技术有限公司 | Data storage method, data acquisition method and equipment |
CN111177476A (en) * | 2019-12-05 | 2020-05-19 | 北京百度网讯科技有限公司 | Data query method and device, electronic equipment and readable storage medium |
WO2020107484A1 (en) * | 2018-11-30 | 2020-06-04 | 华为技术有限公司 | Acl rule classification method, lookup method and device |
CN111241146A (en) * | 2018-11-29 | 2020-06-05 | 北京数安鑫云信息技术有限公司 | Method and system for counting TopK-Frequency information |
CN111694559A (en) * | 2020-05-21 | 2020-09-22 | 北京云杉世纪网络科技有限公司 | Method and device for realizing hash table in GC program language |
CN113448996A (en) * | 2021-06-11 | 2021-09-28 | 成都三零嘉微电子有限公司 | High-speed searching method for IPSec security policy database |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102799596A (en) * | 2011-05-27 | 2012-11-28 | 广州明朝网络科技有限公司 | Key word filtering method and system based on network application |
CN104881439A (en) * | 2015-05-11 | 2015-09-02 | 中国科学院信息工程研究所 | Method and system for space-efficient multi-pattern matching |
CN105359142A (en) * | 2014-05-23 | 2016-02-24 | 华为技术有限公司 | Hash join method, device and database management system |
2016-08-31: CN201610793354.5A (CN106326475B), status: Active
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766258A (en) * | 2017-09-27 | 2018-03-06 | 精硕科技(北京)股份有限公司 | Memory storage method and apparatus, memory lookup method and apparatus |
CN110413215B (en) * | 2018-04-28 | 2023-11-07 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer program product for obtaining access rights |
CN110413215A (en) * | 2018-04-28 | 2019-11-05 | 伊姆西Ip控股有限责任公司 | For obtaining the method, equipment and computer program product of access authority |
CN110928483B (en) * | 2018-09-19 | 2021-04-09 | 华为技术有限公司 | Data storage method, data acquisition method and equipment |
CN110928483A (en) * | 2018-09-19 | 2020-03-27 | 华为技术有限公司 | Data storage method, data acquisition method and equipment |
CN111241146B (en) * | 2018-11-29 | 2023-09-19 | 北京数安鑫云信息技术有限公司 | Method and system for counting TopK-Frequency information |
CN111241146A (en) * | 2018-11-29 | 2020-06-05 | 北京数安鑫云信息技术有限公司 | Method and system for counting TopK-Frequency information |
WO2020107484A1 (en) * | 2018-11-30 | 2020-06-04 | 华为技术有限公司 | Acl rule classification method, lookup method and device |
CN109582598B (en) * | 2018-12-13 | 2023-05-02 | 武汉中元华电软件有限公司 | Preprocessing method for realizing efficient hash table searching based on external storage |
CN109582598A (en) * | 2018-12-13 | 2019-04-05 | 武汉中元华电软件有限公司 | A kind of preprocess method for realizing efficient lookup Hash table based on external storage |
CN110457535A (en) * | 2019-08-14 | 2019-11-15 | 广州虎牙科技有限公司 | Hash bucket lookup method, Hash table storage, Hash table lookup method and device |
CN111177476A (en) * | 2019-12-05 | 2020-05-19 | 北京百度网讯科技有限公司 | Data query method and device, electronic equipment and readable storage medium |
CN111177476B (en) * | 2019-12-05 | 2023-08-18 | 北京百度网讯科技有限公司 | Data query method, device, electronic equipment and readable storage medium |
CN111694559A (en) * | 2020-05-21 | 2020-09-22 | 北京云杉世纪网络科技有限公司 | Method and device for realizing hash table in GC program language |
CN111694559B (en) * | 2020-05-21 | 2023-07-21 | 北京云杉世纪网络科技有限公司 | Method and device for implementing hash table in GC program language |
CN113448996A (en) * | 2021-06-11 | 2021-09-28 | 成都三零嘉微电子有限公司 | High-speed searching method for IPSec security policy database |
CN113448996B (en) * | 2021-06-11 | 2022-09-09 | 成都三零嘉微电子有限公司 | High-speed searching method for IPSec security policy database |
Also Published As
Publication number | Publication date |
---|---|
CN106326475B (en) | 2019-12-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106326475A (en) | High-efficiency static hash table implement method and system | |
Bentley et al. | Decomposable searching problems I. Static-to-dynamic transformation | |
CN101404032B (en) | Video retrieval method and system based on contents | |
US8219550B2 (en) | Methods and systems for implementing approximate string matching within a database | |
CN102053992B (en) | Clustering method and system | |
CN103902702A (en) | Data storage system and data storage method | |
CN108897761A (en) | A kind of clustering storage method and device | |
CN103577440A (en) | Data processing method and device in non-relational database | |
CN105975587A (en) | Method for organizing and accessing memory database index with high performance | |
US7020782B2 (en) | Size-dependent hashing for credit card verification and other applications | |
CN102591855A (en) | Data identification method and data identification system | |
CN103902701A (en) | Data storage system and data storage method | |
US8028000B2 (en) | Data storage structure | |
CN105117442A (en) | Probability based big data query method | |
CN103914456A (en) | Data storage method and system | |
CN104486777A (en) | Method and device for processing data | |
CN105159950A (en) | Mass data real-time sorting and inquiring method and system | |
CN100476824C (en) | Method and system for storing element and method and system for searching element | |
CN101751475B (en) | Method for compressing section records and device therefor | |
CN105357247A (en) | Multi-dimensional cloud resource interval finding method based on hierarchical cloud peer-to-peer network | |
CN112434031A (en) | Uncertain high-utility mode mining method based on information entropy | |
CN108280226A (en) | Data processing method and relevant device | |
CN106844541A (en) | A kind of on-line analytical processing method and device | |
CN106845787A (en) | A kind of data method for automatically exchanging and device | |
CN110221778A (en) | Processing method, system, storage medium and the electronic equipment of hotel's data |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||