CN106326475A

CN106326475A - High-efficiency static hash table implement method and system

Info

Publication number: CN106326475A
Application number: CN201610793354.5A
Authority: CN
Inventors: 刘燕兵; 张春燕; 卢毓海; 谭建龙; 郭莉
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2016-08-31
Filing date: 2016-08-31
Publication date: 2017-01-11
Anticipated expiration: 2036-08-31
Also published as: CN106326475B

Abstract

The invention relates to an efficient static hash table implementation method and system. The method comprises the steps of: 1) setting the hash bucket size hash_bit and generating a plurality of data pairs, with key[i] and value[i] corresponding to the keyword and the value; 2) constructing the hash table from the key[i] values using the rank operation, and calculating the C table and the D table; 3) calculating rank(h) from the C table and the D table, and storing the corresponding key[i] and value[i] according to the value of rank(h); 4) determining, for a key to be queried, whether the element exists in the hash table; if so, looking up the corresponding storage position and returning the value, otherwise the access fails; 5) returning the result information based on the result of step 4. The rank-select algorithm is used to construct and access the new static hash table, and the method and system can be used in fields such as content filtering and information security.

Description

An efficient static hash table implementation method and system

Technical field

The present invention is directed to a compression algorithm for static hash tables, for use in fields such as content filtering and information security. Because a static hash table occupies a relatively large amount of storage, and existing algorithms still leave significant room for optimizing its compression, the invention aims to compress a static hash table while still supporting access to it.

Background technology

Lookup tables in data structures are divided into static lookup tables and dynamic lookup tables. A lookup repeatedly examines the data in the table until the required value is found. Static lookup mainly includes sequential search, binary search, block search and search over static tree tables; dynamic lookup mainly includes binary search trees, balanced binary trees, B-trees and B+ trees. The efficiency of these lookup algorithms depends on the number of comparisons: the more comparisons a search needs on average, the lower its efficiency, and the fewer it needs, the higher its efficiency. To locate data quickly, a hash table can be used to improve access efficiency.

A hash table, also called a hash map, stores data as key-value pairs and is a special kind of data structure. It accesses a record by mapping its key to a position in the table, which speeds up lookup. The mapping function is called the hash function, and the array holding the records is called the hash table. The mapping is not necessarily injective, so hash collisions may occur; many algorithms exist to resolve them. Hash tables are applied very widely, and storing data in a hash table for fast lookup is among the most common operations. In computing practice, hash tables play a major role in routing in peer-to-peer (P2P) networks, database lookup, compressed ordinal indexes and information security.

In real life, hash tables also play an important role. For example, when a bank reconciles front-office data against back-office data, the corresponding value can be found from the key, completing the reconciliation of the two sets of data. When an IC card is used on public transport, the card number serves as the key: swiping the card on boarding is recorded as an insertion into the hash table, with the boarding time and stop name stored in the value; swiping the card on alighting is recorded as a lookup, after which the card's entry is deleted from the hash table and the travel time and distance are calculated.

Depending on whether dynamic insertion and deletion are supported, hash tables are divided into static hash tables and dynamic hash tables. A static hash table supports only query operations and does not support dynamic insertion or deletion. It suits cases where the data are loaded into the table once in advance and the subsequent work consists mainly of fast lookups. In pattern matching, static hash tables fit the application background of several algorithms very well: efficient algorithms such as Wu-Manber and Karp-Rabin use a hash function to process the rules and the text to be matched, typically loading the rules into the hash table once in advance and only then performing the matching.

Existing hash table algorithms mainly include linear probing hashing, binary search and hashed binary search. These algorithms meet the needs of a static hash table and locate data efficiently during storage and query, but they leave considerable room for improvement in both storage space and lookup efficiency. The idea of each algorithm is briefly described below.

Linear probing algorithm: when the hash address p obtained from the hash function H(key) for a keyword key is in conflict, a new hash address p1 is derived from p, and so on iteratively, until some hash address pi has no conflict; the keyword and its value are then stored at that hash address. On lookup, the hash function H(key) is first used to determine whether the keyword key exists in the hash bucket; if it does, the value is returned.

Binary search algorithm: on storage, the data are sorted by key; on lookup, binary search is used to find the key and hence the value.

Hashed binary search algorithm: the chained addresses are divided into different hash buckets, and on storage the entries within each bucket are stored according to the binary search scheme. On lookup, the hash function first determines the hash bucket, and binary search within that bucket then finds the key and hence the value.
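The three baseline schemes described above can be sketched as follows. This is a minimal illustration, not the implementations measured in the experiments: the hash function is assumed to be a plain modulo, keys are assumed to be distinct integers, and all function names are hypothetical. The linear probing table must have more slots than stored pairs, otherwise insertion never finds a free slot.

```python
from bisect import bisect_left


def build_linear_probe(pairs, capacity):
    """Open addressing: on a conflict at address p, try p + 1, p + 2, ... (mod capacity)."""
    slots = [None] * capacity              # requires capacity > len(pairs)
    for key, value in pairs:
        p = key % capacity                 # H(key), assumed here to be a plain modulo
        while slots[p] is not None:
            p = (p + 1) % capacity
        slots[p] = (key, value)
    return slots


def lookup_linear_probe(slots, key):
    p = key % len(slots)
    for _ in range(len(slots)):
        entry = slots[p]
        if entry is None:                  # reached a free slot: the key is absent
            return None
        if entry[0] == key:
            return entry[1]
        p = (p + 1) % len(slots)
    return None


def build_sorted(pairs):
    """Binary search scheme: keys kept sorted, values in the same order."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return [k for k, _ in ordered], [v for _, v in ordered]


def lookup_sorted(table, key):
    keys, values = table
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return values[pos]
    return None


def build_hash_binary(pairs, num_buckets):
    """Hashed binary search: one sorted run per hash bucket."""
    groups = [[] for _ in range(num_buckets)]
    for key, value in pairs:
        groups[key % num_buckets].append((key, value))
    return [build_sorted(group) for group in groups]


def lookup_hash_binary(buckets, key):
    return lookup_sorted(buckets[key % len(buckets)], key)


pairs = [(1, "a"), (513, "b"), (65, "c"), (257, "d")]
probe = build_linear_probe(pairs, capacity=8)
hashed = build_hash_binary(pairs, num_buckets=64)
print(lookup_linear_probe(probe, 513), lookup_hash_binary(hashed, 513))
```

All four sample keys collide under both moduli, so the example exercises the probe sequence and the in-bucket binary search rather than only the direct hit.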

The hash table algorithms above are all widely used in practice; their storage and lookup efficiency differ, and each has its own merits in terms of space occupied and lookup efficiency. To design a more efficient hash table algorithm that further reduces the space occupied by the table, the present invention uses the rank-select algorithm to compress a static hash table; this algorithm improves considerably on the other algorithms in both space and time. The rank-select algorithm is a space-compressing algorithm proposed for storing tree structures as bit vectors in the 1989 document "Jacobson G. Space-efficient static trees and graphs[C]//Foundations of Computer Science, 1989., 30th Annual Symposium on. IEEE, 1989: 549-554.". It is described in detail below. In that document the rank-select algorithm serves mainly to compress adjacency tree structures: as shown in Fig. 1, a tree structure originally stored with pointers is reduced to storage as a binary string, and the main idea behind this is precisely the rank-select algorithm.

To introduce the rank-select algorithm, the parameter rank(m) is defined first: the number of 1s from the first position of the binary string up to position m. For example, rank(10) = 7 in Fig. 2.
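The definition of rank(m) can be illustrated with a direct, linear-time count. The bit string below is made up for illustration (the string of Fig. 2 is not reproduced in the text); it is merely chosen so that rank(10) = 7 as in the example. Positions are assumed to be 1-based and inclusive.

```python
def rank(bits, m):
    """Count the 1s among positions 1..m of a string of '0'/'1' characters."""
    return bits[:m].count("1")


bits = "1101101011"      # hypothetical bit string, not the one from Fig. 2
print(rank(bits, 10))
```

This naive version scans m characters per call; the C and D tables introduced later exist precisely to avoid that scan.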

In Fig. 3, the nodes of the binary tree are labelled in order starting from the root, level by level according to the height of the tree; black indicates that a node exists and white that the node is empty. The nodes are stored level by level following the hierarchy of the tree. It can be seen that 8 nodes are 1, representing the 8 pieces of node information in the original matrix. As the figure shows, the original binary tree structure uses n bytes to store one node, so storing by bit greatly reduces the space occupied.

The document "Mäkinen V, Navarro G. Rank and select revisited and extended[J]. Theoretical Computer Science, 2007, 387(3): 332-347." proves that, for a bit vector B of size n, only o(n) additional storage is needed to realize the rank operation with O(1) time complexity. The SSE instruction set contains instructions such as _mm_popcnt_u64 that support rank bit operations on 64-bit words, so the rank operation is realized in hardware and runs faster. In addition, the rank-select algorithm can achieve striking results in compressing sparse matrices.

The rank-select algorithm compresses data effectively and can be turned into a storage structure for hash tables. To summarize the rank operation of O(1) complexity, an example is introduced here to illustrate the idea. As shown in Fig. 4, for a bit vector B of n*8 bits, the D table and the C table store the rank for segments of length 8 and 32 respectively, that is, the number of 1s before the position reached so far. For example, to query the number of 1s up to position h, the first position in B[6]: rank(h) = C[1] + D[1*4+2] + _mm_popcnt_u64(B[6] >> 7) = 6 + 4 + 1 = 11. Here "_mm_popcnt_u64" denotes a built-in instruction of the SSE4.2 instruction set and gives the number of 1s from the current position back to the initial position.

Next, the concrete realization of the rank operation with O(1) time complexity is introduced, as shown in Fig. 5, which generalizes the rank operation of the example. The hash table stores n key-value pairs in total. Each D entry covers a length of s bits and occupies log₂r bits; each C entry covers a length of r bits and occupies log₂n bits. The D vector therefore occupies (n/s)·log₂r bits in total and the C vector (n/r)·log₂n bits in total, so the extra space is

M = \frac{n}{s} \log_{2} r + \frac{n}{r} \log_{2} n \qquad (3)

When calculating the rank value of the m-th position, where m = i*r + j*s + k with 0 ≤ k < s, the following equations can be used:

C[i] = \sum_{k=0}^{ir-1} B[k] \qquad (4)

D[i, j] = \sum_{k=ir}^{ir+js-1} B[k] \qquad (5)

rank(B, m) = C[i] + D[i, j] + rank(B_{ir+js}, k) \qquad (6)

Here the last term of equation (6), rank(B_{ir+js}, k), denotes the number of 1s from position i*r + j*s to the m-th position; a schematic diagram is given in Fig. 6.
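Equations (4) to (6) can be checked with a short sketch. It assumes a 0-indexed bit vector held as a Python list, r a multiple of s, and reads rank(B, m) as the number of 1s in B[0..m-1], which is what the summation limits of (4) and (5) imply; the function names and the sample vector are hypothetical.

```python
def build_tables(B, r, s):
    """B: list of 0/1 values; r must be a multiple of s."""
    assert r % s == 0
    C, D = [], []
    total = 0
    for block_start in range(0, len(B), r):
        C.append(total)                                   # equation (4)
        row, within = [], 0
        for sub_start in range(block_start, min(block_start + r, len(B)), s):
            row.append(within)                            # equation (5)
            within += sum(B[sub_start:sub_start + s])
        D.append(row)
        total += within
    return C, D


def rank(B, C, D, m, r, s):
    """Number of 1s in B[0:m], for 0 <= m < len(B), via equation (6)."""
    i, rest = divmod(m, r)
    j, k = divmod(rest, s)
    start = i * r + j * s
    return C[i] + D[i][j] + sum(B[start:start + k])       # last term: count over k bits


B = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1]   # made-up bit vector
C, D = build_tables(B, r=8, s=4)
print(C, D, rank(B, C, D, 13, r=8, s=4))
```

The final `sum` over at most s bits stands in for a hardware population count; with s equal to the machine word size that term becomes a single instruction, which is what makes the whole query constant time.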

In summary, the rank-select algorithm works well in practice for compressing adjacency list structures and compresses space effectively. A static hash table likewise needs its space efficiency optimized further, so the present invention uses the rank-select algorithm to realize the construction of, and access to, a new static hash table.

Summary of the invention

The present invention provides an efficient static hash table implementation method and system that use the rank-select algorithm to realize the construction of, and access to, a new static hash table.

The present invention compresses a static hash table effectively while still allowing direct access. Fig. 7 shows the storage layout of a traditional hash table, where H denotes the size of the hash bucket and n the number of keywords. With a pointer occupying 4 bytes and an integer occupying 4 bytes, the total space occupied is 4H + 8n bytes.

The foregoing describes the rank operation of O(1) complexity in detail. To ease implementation on a computer, the present invention, taking the storage organization of the computer into account, designs a concrete implementation of the O(1) rank operation; this is also the basic idea behind the rank-based hash table compression scheme. In the experiments r = 256 and s = 64 are taken, C[i] is represented by an int and D[i] by a char, so the extra space occupied, in bits, is:

M = \frac{n}{s} \log_{2} r + \frac{n}{r} \log_{2} n = \frac{n}{s} * 8 + \frac{n}{r} * 32 = \frac{n}{64} * 8 + \frac{n}{256} * 32 = \frac{n}{4}
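The arithmetic of this formula can be reproduced directly. The sketch below assumes, as stated above, one 8-bit D entry per s = 64 bits and one 32-bit C entry per r = 256 bits; the function name and the sample size n are hypothetical.

```python
def extra_bits(n, r=256, s=64, c_bits=32, d_bits=8):
    """Auxiliary bits for an n-bit vector: one D entry per s bits, one C entry per r bits."""
    return (n // s) * d_bits + (n // r) * c_bits


n = 1 << 24                                  # sample bit-vector length
overhead_mb = extra_bits(n) / 8 / 2**20      # bits -> bytes -> MB
print(extra_bits(n), overhead_mb)
```

Both terms contribute n/8 bits, which is why the C and D tables together add a quarter of the bit vector's own size.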

The rank-based hash compression algorithm replaces the original pointers with a binary vector B. The size hash_bits of the hash table must be set first; when a keyword key is stored, the modulo operation h = key mod (hash_bits) is performed first. The value of rank(h) is then calculated, which maps the key to its storage cell, as shown in Fig. 8. From the above, the extra space required is H/4 bits (H being the size of the hash bucket in bits), so the total storage is … bytes, which is much less than before.

To make the static hash table convenient to store, the following structure is defined for later use:

CB
{
    C
    D
    bitmap[4]
}

Each CB is a structure representing the hash table and contains three variables. The C table stores the rank for a fixed length r, and the D table stores the rank for a fixed length s; the C table is of integer type and the D table of char type (other types may also be used for the D table and the C table, provided they have enough bits to hold the rank over a fixed length of r or s bits). For convenient storage on a computer, r = 256 and s = 64 are set. bitmap is an array of 4 unsigned longs, and bitmap[i] (i = 0, 1, 2, 3) denotes an element of bitmap. Since each unsigned long occupies 64 bits, one bitmap array occupies 256 bits, which is exactly the length covered by the rank stored in the C table, and each bitmap element is exactly the length covered by the rank stored in the D table.

A hash table CB array is set up. For an element CB[j] of the hash table, the values in the structure are written CB[j].C, CB[j].D and CB[j].bitmap[i] (i = 0, 1, 2, 3). For ease of describing the C table and the D table, CB[j].C and C[j] are used interchangeably below, as are CB[j].D and D[j].
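The CB structure and the CB array might be modelled as below. This is only a sketch: Python integers stand in for the int, char and unsigned long fields, and D is assumed to hold four entries per CB (one per bitmap word), which is how the later worked example indexes it as CB[i].D[0] to CB[i].D[3]. The helper `make_table` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CB:
    C: int = 0                                                         # ones before this 256-bit block
    D: List[int] = field(default_factory=lambda: [0, 0, 0, 0])         # ones before each word, within the block
    bitmap: List[int] = field(default_factory=lambda: [0, 0, 0, 0])    # four 64-bit words, all bits 0


def make_table(clength):
    """Allocate a CB array of `clength` blocks, i.e. clength * 256 hash buckets."""
    return [CB() for _ in range(clength)]


table = make_table(2)
table[0].bitmap[1] |= 1 << 1       # mark bucket 65 (word 1, bit 1) as occupied
print(table[0])
```

Using `default_factory` gives every CB its own lists, so setting a bit in one block cannot leak into another.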

To describe the invention in detail, this section first introduces the main components of the system for building and accessing the hash table and their workflow, and then the main processes of building and accessing the hash table.

In the present invention, the system for building and accessing the hash table mainly comprises the following components, as shown in Fig. 9:

1) System preprocessing component: sets the hash bucket size hash_bit and generates a plurality of data pairs, with key[i] and value[i] corresponding to the keyword and the value.

2) Hash table building component: builds the hash table from the key[i] values using the rank operation, and calculates the C table and the D table.

3) Information storage component: calculates rank(h) from the C table and the D table, where h = key mod (hash_bits), and stores the corresponding key[i] and value[i] according to the value of rank(h).

4) Information access component: determines, for a key to be queried, whether the element exists in the hash table; if it does, looks up the corresponding storage position and returns the value, otherwise the access fails.

5) Information return component: returns the result information according to the result of the previous step.

The components of the system for building and accessing the hash table have been described above. To make the building and access process easier to understand and convenient for computer storage, the calculation of the rank operation, which can be expressed in pseudo-code, is described in natural language as follows:

1) To query how many 1s the variable B contains before the i-th bit, the index i is first ANDed with 63 and the result assigned to k; then i shifted right by 8 bits is assigned to i₁; then i shifted right by 6 bits, minus i₁ shifted left by 2 bits, is assigned to i₂. Here i₁ is the index into the C table, and i shifted right by 6 bits is the index into the D table.

2) The segment of the variable B that starts at index ((i₁ << 8) + (i₂ << 6)) and ends at index ((i₁ << 8) + (i₂ << 6) + k - 1) is assigned to e.

3) Finally, the sum of C[i₁], D[i >> 6] and _mm_popcnt_u64(e) is returned; this is the number of 1s before the i-th bit of the variable B, i.e. the rank(i) operation on B.
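Steps 1) to 3) might be written out as follows. The sketch makes several assumptions that the text does not fix: the bit vector is held as a flat list of 64-bit words, bit p of the vector is bit (p & 63) of word (p >> 6) (consistent with the shift used when keys are recorded), D is a flat table indexed by word, and `bin(x).count("1")` stands in for _mm_popcnt_u64. The helper `build_C_D` and the sample bit positions are hypothetical.

```python
def popcount(x):
    """Software stand-in for _mm_popcnt_u64."""
    return bin(x).count("1")


def build_C_D(words):
    """C[b]: ones before 256-bit block b; D[w]: ones inside w's block before word w."""
    C, D = [], []
    total = within = 0
    for w, word in enumerate(words):
        if w % 4 == 0:                 # a new 256-bit block starts every 4 words
            C.append(total)
            within = 0
        D.append(within)
        within += popcount(word)
        total += popcount(word)
    return C, D


def rank(words, C, D, i):
    """Number of 1s strictly before bit i (steps 1 to 3)."""
    k = i & 63                         # offset inside the 64-bit word
    i1 = i >> 8                        # index into the C table
    i2 = (i >> 6) - (i1 << 2)          # word index inside the 256-bit block
    e = words[(i1 << 2) + i2] & ((1 << k) - 1)   # the k bits just below position i
    return C[i1] + D[i >> 6] + popcount(e)


positions = [0, 5, 63, 64, 200, 255, 256, 300, 511]   # made-up set bits
words = [0] * 8                                        # 512 bits in total
for p in positions:
    words[p >> 6] |= 1 << (p & 63)
C, D = build_C_D(words)
print(C, D, rank(words, C, D, 257))
```

Note that this rank counts the bits before position i, whereas the worked storage example later numbers occupied buckets from 1; the two differ only by whether the bit at position i itself is included.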

The rank calculation is used frequently both in building and in accessing the hash table. Given the concrete implementation of the C table and the D table described above, the hash bucket size considered below is no less than 2⁸, and the rank operation above is used throughout to store and access data. The invention is divided into two processes, building the hash table and querying a keyword, so the concrete steps of rank-based hash table storage and access are outlined for each.

1. Concrete steps of the rank-based hash table storage algorithm:

1) The preprocessed data are divided into a key array and a value array; key[i] and value[i] correspond to the keyword and its value.

2) All key values are imported into the bitmap in one pass. Let the key-value pairs total num elements, from which the size Clength of the CB table is determined. The number of hash buckets is defined as hash_bit, whose value is the product of the CB table size Clength and 2⁸. Each hash bucket is allocated a bitmap of size 4 (i.e. a bitmap with 4 elements); each bitmap element stores 64 bits of data and is initialized with every bit set to 0, as shown in Fig. 10. The contents of the key array are recorded according to the rank operation of O(1) time complexity. First, key is taken modulo hash_bit to obtain h, which guarantees that it falls within the hash buckets; h is then stored at the corresponding hash bucket position, its position being recorded by the following formula, until the positions of all key values have been recorded in turn.

q = h & 255

CB[h >> 8].bitmap[q >> 6] |= (1 << (q & 63))    (7)

3) The C array and the D array are calculated and stored. Since step 2 amounts to recording the position information of all key values according to the size of h, the rank operation described earlier can be used to record the corresponding information of the C array and the D array starting from the hash table entry CB[0], where C[i] denotes the number of 1s in the preceding hash buckets up to CB[i-1], CB[i].D[1] denotes the number of 1s in CB[i].bitmap[0], ..., and CB[i].D[3] denotes the number of 1s in CB[i].bitmap[0] to CB[i].bitmap[2].

4) Using the information in the C table and the D table, the rank value corresponding to each key value is calculated with the rank algorithm described above.

5) The rank values are used to record the number of elements in each hash bucket, following the layered structure of the hash table C, and the rank value serves as the order in which key and value are stored. If different keys have the same rank value, they fall into the same hash bucket, i.e. a hash collision has occurred. The rank value now carries a second meaning: it sorts the entries by h = key mod (hash_bit) and represents the position in that ordering. When key-value pairs are stored, identical rank values indicate that the hash bucket holds 2 or more elements; for ease of storage, the primary order is by rank value, and as the secondary order the pairs with the same rank value are stored one after another.

6) The key and value data are stored in the array.
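Steps 2), 5) and 6) can be sketched as below. The first function applies equation (7) to every key; the second lays the pairs out in bucket order. Two simplifications are assumed: the pairs are ordered by sorting on h, which yields the same order as sorting on rank(h) because rank is monotone in h, and the cumulative counts are kept in a list named idx with keys at even and values at odd coordinates, as in the worked example later in the text. Function names and sample data are hypothetical.

```python
def record_keys(keys, hash_bit):
    """Step 2: set bit h of the bitmaps for every key (equation (7))."""
    assert hash_bit % 256 == 0
    bitmaps = [[0, 0, 0, 0] for _ in range(hash_bit >> 8)]   # one 4-word bitmap per CB
    for key in keys:
        h = key % hash_bit
        q = h & 255
        bitmaps[h >> 8][q >> 6] |= 1 << (q & 63)
    return bitmaps


def layout_pairs(keys, values, hash_bit):
    """Steps 5 and 6: return (idx, flat), pairs ordered by bucket, collisions adjacent."""
    order = sorted(range(len(keys)), key=lambda t: keys[t] % hash_bit)  # stable sort
    idx, flat, prev_h = [], [], None
    for t in order:
        h = keys[t] % hash_bit
        if h == prev_h:
            idx[-1] += 1                      # same bucket: a collision
        else:
            idx.append(len(flat) // 2 + 1)    # running total of pairs, including this one
            prev_h = h
        flat += [keys[t], values[t]]
    return idx, flat


keys, values = [258, 3, 259, 7], [10, 20, 30, 40]
bitmaps = record_keys(keys, hash_bit=256)
idx, flat = layout_pairs(keys, values, hash_bit=256)
print(bitmaps, idx, flat)
```

In a C implementation the shifted constant in equation (7) would need to be a 64-bit value (for example `1ULL`), since shifting a plain `int` by up to 63 bits overflows; Python integers do not have this issue.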

2. Concrete steps of the rank-based hash table access algorithm:

1) The key to be queried is first taken modulo hash_bit to obtain h.

2) q = h & 255 is calculated, and it is determined whether the bit selected by ANDing CB[h>>8].bitmap[q>>6] with (1<<(q&63)) is 1, i.e. whether the original hash bucket holds a key value. If the result is 0, the key is not in the hash table and the query fails; if the result is 1, the hash table holds a key at this position and the value must then be found.

3) To handle hash collisions, i.e. the case where two or more key values in the original hash table hit this position, the keys in this hash bucket are checked one by one against the queried key. If the key is found, its value is returned; if not, the next keyword is examined, until no keyword remains, in which case the query fails.
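The storage and access procedures above can be combined into one small end-to-end sketch. It is an illustration under stated assumptions rather than the patented implementation: the bitmap is a flat list of 64-bit words (word h >> 6 corresponds to CB[h >> 8].bitmap[(h & 255) >> 6]), rank counts the bits before position h, the bucket boundaries are kept in a hypothetical list `start`, and keys are assumed to be distinct non-negative integers.

```python
def popcount(x):
    return bin(x).count("1")          # stand-in for _mm_popcnt_u64


class RankHashTable:
    def __init__(self, pairs, hash_bit):
        assert hash_bit % 256 == 0
        self.hash_bit = hash_bit
        self.words = [0] * (hash_bit >> 6)
        for key, _ in pairs:                               # record every occupied bucket
            h = key % hash_bit
            self.words[h >> 6] |= 1 << (h & 63)
        self.C, self.D = [], []
        total = within = 0
        for w, word in enumerate(self.words):              # build the C and D tables
            if w % 4 == 0:
                self.C.append(total)
                within = 0
            self.D.append(within)
            within += popcount(word)
            total += popcount(word)
        ordered = sorted(pairs, key=lambda kv: kv[0] % hash_bit)
        self.start = [0] * (total + 1)                     # start[x]: first pair of x-th occupied bucket
        for key, _ in ordered:
            self.start[self._rank(key % hash_bit) + 1] += 1
        for x in range(total):
            self.start[x + 1] += self.start[x]
        self.flat = [item for kv in ordered for item in kv]   # key at 2*slot, value at 2*slot + 1

    def _rank(self, h):
        """Number of occupied buckets strictly before bucket h."""
        low = self.words[h >> 6] & ((1 << (h & 63)) - 1)
        return self.C[h >> 8] + self.D[h >> 6] + popcount(low)

    def get(self, key):
        h = key % self.hash_bit                            # access step 1
        if not (self.words[h >> 6] >> (h & 63)) & 1:       # access step 2: bucket is empty
            return None
        x = self._rank(h)
        for slot in range(self.start[x], self.start[x + 1]):   # access step 3: scan collisions
            if self.flat[2 * slot] == key:
                return self.flat[2 * slot + 1]
        return None


table = RankHashTable([(10, "x"), (266, "y"), (300, "z")], hash_bit=256)
print(table.get(266), table.get(11), table.get(522))
```

A miss is detected in one of two places: by the single bit test when the bucket is empty, or after scanning the (usually very short) run of colliding keys when the bucket is occupied by other keys.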

The beneficial effects of the present invention are as follows:

The present invention uses the rank-select algorithm to realize the construction of, and access to, a new static hash table. It compresses the static hash table while still supporting access to it, and further optimizes space efficiency. The scheme can be used in fields such as content filtering and information security.

Brief description of the drawings

Fig. 1 is a schematic diagram of storing an adjacency tree structure as a binary string.

Fig. 2 is an example of the rank operation.

Fig. 3 is an example of storing binary tree nodes using the rank-select algorithm.

Fig. 4 is an example of the rank operation with O(1) time complexity.

Fig. 5 shows the realization of the rank operation with O(1) time complexity.

Fig. 6 is a schematic diagram of the rank operation for calculating the m-th position.

Fig. 7 is a schematic diagram of the storage layout of a traditional hash table.

Fig. 8 is a schematic diagram of the compact rank-based hash table storage layout.

Fig. 9 is a diagram of the system components for building and accessing the rank-based static hash table.

Fig. 10 is a schematic diagram of the hash table storage structure.

Fig. 11 shows an example of hash buckets.

Detailed description of the invention

The present invention is further described below by way of specific embodiments.

This section works through the concrete steps given in the summary of the invention using specific data to be stored, and is divided into the rank-based hash table storage process and the rank-based hash table access process.

1. Example of the rank-based hash table storage process:

The storage steps above are illustrated with an example. Assume the hash bucket size hash_bit = 2⁹ and the data to be stored as shown in Table 1, which also lists the corresponding h values.

Table 1 Data stored in the hash table: key, value and h

key	1	513	65	257
value	1	2	3	4
h	1	1	65	257

After the modulo operation, the key data are shown in binary at the upper left of Fig. 11, and the corresponding hash table at the bottom of Fig. 11. Thus C[0] = 0, C[1] = 2, C[2] = 3; CB[0].D[0] = 0, CB[0].D[1] = 1, CB[0].D[2] = 2, CB[0].D[3] = 2, CB[1].D[0] = 0, CB[1].D[1] = 1, CB[1].D[2] = 1, CB[1].D[3] = 1. The key and value data are then stored in an array. Sorting by rank(h) shows that two keys map to the same position when rank(h) = 1, see Table 2. The change of coordinates during storage is shown in Table 3, where idx is obtained by accumulating the preceding counts, so that the last key stored for a given rank value lies at coordinate 2(idx - 1). The one-dimensional array storing key and value is shown in Table 4, where idx1 is the coordinate index of the array; keys are at even coordinates and values at odd coordinates.

Table 2 rank(h) values and their counts

rank(h)	1	2	3
count	2	1	1

Table 3 Coordinates idx obtained by accumulation

rank(h)	1	2	3
idx	2	3	4

Table 4 One-dimensional array idx1 storing key and value

idx1	0	1	2	3	4	5	6	7
key or value	1	1	513	2	65	3	257	4
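The numbers in this example can be reproduced with a short script. It is a sketch under assumptions: the whole bit vector is held in one Python integer for brevity, rank(h) is taken inclusively (counting the bit at h itself) because that is what yields the values 1, 2, 3 of Table 2, and the helper `ones` is hypothetical.

```python
from collections import Counter
from itertools import accumulate

keys, values, hash_bit = [1, 513, 65, 257], [1, 2, 3, 4], 1 << 9
hs = [k % hash_bit for k in keys]                        # h row of Table 1

B = 0                                                    # bit h is set when bucket h is occupied
for h in hs:
    B |= 1 << h


def ones(lo, hi):
    """Number of set bits of B in positions lo .. hi - 1."""
    return bin((B >> lo) & ((1 << (hi - lo)) - 1)).count("1")


C = [ones(0, 256 * i) for i in range(hash_bit // 256 + 1)]
D = [[ones(256 * i, 256 * i + 64 * j) for j in range(4)] for i in range(hash_bit // 256)]

rank = {h: ones(0, h + 1) for h in hs}                   # inclusive rank, as in Table 2
count = Counter(rank[h] for h in hs)                     # Table 2
idx = list(accumulate(count[x] for x in sorted(count)))  # Table 3

pairs = sorted(zip(keys, values), key=lambda kv: rank[kv[0] % hash_bit])
idx1 = [item for kv in pairs for item in kv]             # Table 4
print(C, D, dict(count), idx, idx1)
```

The extra entry C[2] = 3 is the total number of occupied buckets; it acts as an end marker one past the last 256-bit block.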

2. Example of the rank-based hash table query process:

With the hash table built as in the storage example, suppose key = 513 is to be looked up. First h = 1 and q = 1 are calculated, and the bit test on CB[h>>8].bitmap[q>>6] gives 1. Since two key values hit this position, idx1[0] and idx1[2] are compared with 513 in turn; since idx1[2] = 513, value = idx1[3] = 2 is returned and the query succeeds.
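This query can be traced in code using the values of Tables 2 to 4 as literal data. The sketch assumes the bitmaps that result from storing h = 1, 65 and 257, and looks the rank up in a small dictionary instead of recomputing it from the C and D tables; the names `bitmaps`, `rank_of` and `query` are hypothetical.

```python
bitmaps = [[0b10, 0b10, 0, 0], [0b10, 0, 0, 0]]   # CB[0].bitmap and CB[1].bitmap
rank_of = {1: 1, 65: 2, 257: 3}                   # rank(h), Table 2
idx = [2, 3, 4]                                   # Table 3
idx1 = [1, 1, 513, 2, 65, 3, 257, 4]              # Table 4


def query(key, hash_bit=1 << 9):
    h = key % hash_bit
    q = h & 255
    if not (bitmaps[h >> 8][q >> 6] >> (q & 63)) & 1:   # bucket empty: query fails
        return None
    r = rank_of[h]
    first = idx[r - 2] if r > 1 else 0                  # pairs of earlier buckets end here
    for slot in range(first, idx[r - 1]):               # compare each key in the bucket
        if idx1[2 * slot] == key:
            return idx1[2 * slot + 1]
    return None


print(query(513), query(65), query(2))
```

For key = 513 the loop visits slots 0 and 1, i.e. coordinates idx1[0] and idx1[2], exactly the two comparisons described in the text.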

Based on the design above, the rank-based hash table compression algorithm is compared below with the binary search algorithm, the linear probing hash algorithm and the hashed binary search algorithm. The test data are 10 million key-value pairs in which key and value are randomly generated 32-bit unsigned integers, occupying 76.294 MB of storage in total. The data to be queried are 10 million randomly generated 32-bit unsigned integers, with the hit ratio set to 1%. The test environment is a 64-bit Windows 7 operating system with an Intel i5 CPU and 4 GB of memory.
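The stated data volume follows from simple arithmetic, assuming that 1 MB here means 2²⁰ bytes and that each pair is stored as two 4-byte integers; the variable names are illustrative.

```python
pairs = 10_000_000
bytes_per_pair = 4 + 4                       # 32-bit key + 32-bit value
size_mb = pairs * bytes_per_pair / 2**20     # bytes -> MB (2**20 bytes)
expected_hits = pairs // 100                 # 10 million queries at a 1% hit ratio
print(round(size_mb, 3), expected_hits)
```

This is the "key-value space" figure that appears unchanged in every row of the result tables, since it does not depend on the bucket count.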

Since the bucket size of a hash table is an adjustable parameter, different bucket sizes were also tested for the different algorithms in the experiments. The rank-based hash table compression algorithm can use instructions of the SSE instruction set such as _mm_popcnt_u64, which support rank bit operations on 64-bit words; the experiments compare the algorithm both with and without the SSE instruction set.

Experiment 1: the rank-based hash table compression algorithm with and without the SSE instruction set

As Tables 5 and 6 show, the extra space of both variants grows as the number of hash table buckets increases, and the query speed peaks when the hash table has 2²⁹ buckets. For the same bucket size, the rank-based hash table compression algorithm using the SSE instruction set has a clear advantage in query speed over the variant that does not use it. This shows that the rank bit operation realized in hardware outperforms the one designed in software.

Table 5 Rank-based hash table compression algorithm (Rank for SSE)

Hash table bucket size	Key-value space (MB)	Extra space (MB)	Query speed (10,000 queries/s)
2²⁴	76.294	31.265	1779
2²⁵	76.294	38.023	2463
2²⁶	76.294	45.478	2906
2²⁷	76.294	56.802	3278
2²⁸	76.294	77.490	4000
2²⁹	76.294	117.838	4566
2³⁰	76.294	198.014	4000

Table 6 Rank-based hash table compression algorithm (Rank None SSE)

Hash table bucket size	Key-value space (MB)	Extra space (MB)	Query speed (10,000 queries/s)
2²⁴	76.294	31.265	744
2²⁵	76.294	38.023	1303
2²⁶	76.294	45.478	2000
2²⁷	76.294	56.802	2785
2²⁸	76.294	77.490	3367
2²⁹	76.294	117.838	4255
2³⁰	76.294	198.014	3773

Experiment 2: binary search, hashed binary search and linear probing hashing

Table 7 Binary search algorithm (CBinarySearch)

Key-value space (MB)	Extra space (MB)	Query speed (10,000 queries/s)
76.294	0	172

Table 8 Hashed binary search algorithm (CHashBinarySearch)

Hash table bucket size	Key-value space (MB)	Extra space (MB)	Query speed (10,000 queries/s)
2²⁴	76.294	64	1600
2²⁵	76.294	128	1776
2²⁶	76.294	256	1883
2²⁷	76.294	512	1560

Table 9 Linear probing hash algorithm (CLinearProbe)

From the three experiments above it can be seen that: 1) the binary search algorithm needs neither extra space nor a hash bucket size parameter, but its query speed is slow; 2) with a hash bucket size of 2²⁶, the linear probing algorithm reaches a query speed of 35.58 million queries per second, the best of the three algorithms, but its extra space is large, reaching 435.706 MB; 3) with a hash bucket size of 2²⁶, the hashed binary search algorithm reaches a query speed of 18.83 million queries per second, slower than linear probing, but with less extra space, namely 256 MB.

Experiment 3: comparison of the rank-based hash table compression algorithm with the algorithms of Experiment 2

Table 10 Experimental comparison of the 5 hash compression algorithms

The comparison above shows that the rank-based hash table compression algorithm has a large advantage in both query speed and space occupied: its speed and extra space are far better than those of the other three algorithms. Using the SSE instruction set and with a hash bucket size of 2²⁹, the rank-based hash table compression algorithm reaches a query speed of 45.66 million queries per second while occupying only 117.838 MB of extra space.

The above embodiments merely illustrate the technical solution of the present invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention, and the scope of protection of the invention shall be as defined in the claims.

Claims

1. An efficient static hash table implementation method, characterized by comprising the following steps:

1) setting the hash bucket size hash_bit and generating a plurality of data pairs, with key[i] and value[i] corresponding to the keyword and the value;

2) building the hash table from the key[i] values using the rank operation, and calculating the C table and the D table, wherein the C table stores the rank for a fixed length r and the D table stores the rank for a fixed length s;

3) calculating rank(h) from the C table and the D table, where h = key mod (hash_bits), and storing the corresponding key[i] and value[i] according to the value of rank(h);

4) determining, for a key to be queried, whether the element exists in the hash table; if it does, looking up the corresponding storage position and returning the value, otherwise the access fails;

5) returning the result information according to the result of step 4).

2. The method of claim 1, characterized in that step 3) realizes the rank-based hash table storage process by the following steps:

3-1) dividing the preprocessed data into a key array and a value array, key[i] and value[i] corresponding to the keyword and its value;

3-2) importing all key values into the bitmap in one pass, defining the number of hash buckets as hash_bit, and recording the contents of the key array according to the rank operation of O(1) time complexity; taking key modulo hash_bit to obtain h, which guarantees that it falls within the hash buckets, then storing h at the corresponding hash bucket position, and recording the position information of all key values according to the size of h;

3-3) calculating and storing the C array and the D array, using the rank operation to record the corresponding information of the C array and the D array starting from the hash table entry CB[0];

3-4) calculating the rank value corresponding to each key value using the information in the C table and the D table;

3-5) using the rank values to record the number of elements in each hash bucket, following the layered structure of the hash table C, and using the rank value as the order in which key and value are stored;

3-6) storing the key and value data in the array.

3. The method of claim 2, characterized in that, in step 3-2), the value of hash_bit is the product of the CB table size Clength and 2⁸; each hash bucket is allocated a bitmap of size 4, each bitmap element stores 64 bits of data, and every bit is initialized to 0.

4. The method of claim 3, characterized in that step 3-2) records the position of h according to the following formulas, until the positions of all key values have been recorded in turn:

q = h & 255,

CB[h>>8].bitmap[q>>6] |= (1<<(q&63)).
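The two formulas above can be illustrated with a minimal sketch. It assumes each CB entry is modelled as a plain list of four 64-bit words; the helper names make_cb, mark and mark_keys are hypothetical and are not taken from the patent.

```python
WORDS_PER_BLOCK = 4      # four bitmaps per block (claim 3)
BITS_PER_WORD = 64       # each bitmap holds 64 bits (claim 3)


def make_cb(clength):
    """Allocate `clength` blocks, each holding four 64-bit words initialised to 0."""
    return [[0] * WORDS_PER_BLOCK for _ in range(clength)]


def mark(cb, h):
    """Set the bit for slot h: q = h & 255; CB[h>>8].bitmap[q>>6] |= 1 << (q & 63)."""
    q = h & 255
    cb[h >> 8][q >> 6] |= 1 << (q & 63)


def mark_keys(keys, clength):
    """Mark every key's slot, with hash_bit = Clength * 2**8 as in claim 3."""
    hash_bit = clength * 256
    cb = make_cb(clength)
    for key in keys:
        mark(cb, key % hash_bit)
    return cb


# 517 % 512 == 5, so keys 5 and 517 share one slot; 300 lands in the second block.
cb = mark_keys([5, 300, 517], clength=2)
```

Python integers do not overflow, so 1 << (q & 63) is safe here. If the formula is ported to C, the shifted constant would need a 64-bit type (for example 1ULL), since shifting a 32-bit int by 32 or more positions is undefined.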

5. The method of claim 2, characterized in that, in step 3-5), when storing key-value pairs, if different keys have the same rank value, storage is ordered primarily by rank value, and keys sharing the same rank value are stored one after another as the secondary order.
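The ordering rule of claim 5 can be sketched as a stable sort by rank value. The rank values are taken as given inputs here, and the start offset array is a hypothetical helper: the claims say only that the number of elements in each bucket is recorded, not how.

```python
def store_by_rank(keys, values, ranks):
    """Lay out key/value pairs in rank order.

    Returns (start, stored_keys, stored_values), where the pairs sharing
    rank r occupy stored_keys[start[r]:start[r + 1]]. `start` is an assumed
    helper array derived from the per-bucket element counts.
    """
    n_ranks = max(ranks) + 1 if ranks else 0
    counts = [0] * n_ranks
    for r in ranks:
        counts[r] += 1
    start = [0] * (n_ranks + 1)
    for r in range(n_ranks):
        start[r + 1] = start[r] + counts[r]
    # sorted() is stable, so pairs with equal rank keep their input order.
    order = sorted(range(len(keys)), key=lambda i: ranks[i])
    return start, [keys[i] for i in order], [values[i] for i in order]


# Keys 5 and 517 share rank 0 (they collide); 70 has rank 1 and 300 has rank 2.
start, stored_keys, stored_values = store_by_rank(
    keys=[300, 5, 70, 517], values=["d", "a", "c", "b"], ranks=[2, 0, 1, 0]
)
```

Because colliding keys end up adjacent, a query only has to scan one contiguous run of the stored arrays.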

6. the method for claim 1, it is characterised in that step 4) use following steps to realize Kazakhstan based on rank operation Uncommon table access process:

4-1) taking the key to be queried modulo hash_bit to obtain h;

4-2) computing q = h & 255 and testing whether the bitwise AND of CB[h>>8].bitmap[q>>6] and (1<<(q&63)) is nonzero, i.e., whether the corresponding bit is 1 and a key value is present at that position of the original hash bucket; if the bit is 0, the key value is not in the original hash table and the query fails;

if the bit is 1, a key value is present at that position of the original hash table, and the value must then be located;

4-3) to handle hash collisions, where two or more key values in the original hash table hit this position, checking the entries in this hash bucket one by one for the queried key; if an entry matches, returning its value; if not, checking the next keyword, until the keyword is empty, in which case the query fails.
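Steps 4-1 to 4-3 can be sketched as below, under stated assumptions. The table is hand-built for illustration; naive_rank is a plain scan standing in for the constant-time rank derived from the C and D tables; and the scan over colliding keys is bounded by a hypothetical start offset array rather than by an empty-keyword sentinel as described in step 4-3.

```python
def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def naive_rank(cb, h):
    """Set bits strictly before slot h (linear scan, stand-in for the C/D tables)."""
    b, q = h >> 8, h & 255
    w, bit = q >> 6, q & 63
    total = sum(popcount(word) for block in cb[:b] for word in block)
    total += sum(popcount(word) for word in cb[b][:w])
    return total + popcount(cb[b][w] & ((1 << bit) - 1))


def lookup(table, key):
    """Return the value stored for `key`, or None if the query fails."""
    cb, start, stored_keys, stored_values = table
    h = key % (len(cb) * 256)                      # step 4-1: h = key mod hash_bit
    q = h & 255
    if not (cb[h >> 8][q >> 6] >> (q & 63)) & 1:   # step 4-2: bit is 0, key absent
        return None
    r = naive_rank(cb, h)
    for i in range(start[r], start[r + 1]):        # step 4-3: scan colliding keys
        if stored_keys[i] == key:
            return stored_values[i]
    return None


# Hand-built table: one block (hash_bit = 256); keys 5 and 261 collide in slot 5.
table = (
    [[1 << 5, 1 << 6, 0, 0]],   # cb: slots 5 and 70 are occupied
    [0, 2, 3],                  # start offsets per rank (assumed helper)
    [5, 261, 70],               # stored keys, in rank order
    ["a", "b", "c"],            # stored values, aligned with the keys
)
```

A key such as 517 also hashes to slot 5 and passes the bit test, but it matches no stored key in that run, so the query fails only after the collision scan.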

7. An efficient static hash table implementation system, characterized by comprising:

a system preprocessing component, configured to set the hash bucket size hash_bit and to generate a plurality of data pairs, with key[i] and value[i] corresponding to the keyword and the value;

a hash table construction component, configured to build the hash table from the key[i] values using the rank operation and to compute a C table and a D table, wherein the C table stores the rank operation results for a fixed length r, and the D table stores the rank operation results for a fixed length s;

an information storage component, configured to compute rank(h) from the C table and the D table, where h = key mod (hash_bits), and to store the corresponding key[i] and value[i] according to the value of rank(h);

an information access component, configured to determine, from the key to be queried, whether the element exists in the hash table; if it exists, to look up the corresponding storage position and return the value; otherwise the access fails;

an information return component, configured to return result information according to the result obtained by the information access component.

8. The system of claim 7, characterized in that the information storage component implements the rank-based hash table storage process by the following steps:

1) splitting the preprocessed data into a key array and a value array, where key[i] and value[i] correspond to the keyword and its value;

2) first importing all key values into the bitmap in a single pass, defining the number of hash buckets as hash_bit, and recording the contents of the key array using a rank operation of O(1) time complexity; taking key modulo hash_bit to obtain h, which guarantees that h falls within a hash bucket, then storing h at the corresponding position of the hash bucket, and recording the position information of all key values according to the value of h;

3) computing and storing the C array and the D array, using the rank operation to record the corresponding information of the C array and the D array, starting from hash table entry CB[0];

4) using the information in the C table and the D table to compute the rank value corresponding to each key value;

5) using the rank values to record the number of elements in each hash bucket, recording them according to the layered structure of hash table C, and using the rank value as the order in which key and value are stored;

6) storing the key and value entries in arrays.

9. The system of claim 7, characterized in that the information access component implements the rank-based hash table access process by the following steps:

1) taking the key to be queried modulo hash_bit to obtain h;

2) computing q = h & 255 and testing whether the bitwise AND of CB[h>>8].bitmap[q>>6] and (1<<(q&63)) is nonzero, i.e., whether the corresponding bit is 1 and a key value is present at that position of the original hash bucket; if the bit is 0, the key value is not in the original hash table and the query fails; if the bit is 1, a key value is present at that position of the original hash table, and the value must then be located;

3) to handle hash collisions, where two or more key values in the original hash table hit this position, checking the entries in this hash bucket one by one for the queried key; if an entry matches, returning its value; if not, checking the next keyword, until the keyword is empty, in which case the query fails.