CN108021678A - Compact key-value pair storage structure and fast key-value pair lookup method - Google Patents


Info

Publication number
CN108021678A
Authority
CN
China
Prior art keywords
key
value
layer
value pair
bloom filter
Legal status
Granted
Application number
CN201711287661.7A
Other languages
Chinese (zh)
Other versions
CN108021678B (en)
Inventor
嵩天
魏煜
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology BIT
Priority to CN201711287661.7A
Publication of CN108021678A
Application granted
Publication of CN108021678B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/2443 Stored procedures
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management

Abstract

The present invention relates to a compact key-value pair storage structure and a fast key-value pair lookup method, in the field of real-time big data processing and key-value query technology. It is based on a structure that combines layered hash tables with Bloom filters. The first layer is stored using a 2-left hashing data structure, each subsequent hash-table layer serves as a supplementary table that stores the data that collided in the layer above, and Bloom filters record the colliding data. The result of "the Bloom filter determines whether the searched key exists in the current set" is used as an index to speed up lookup. This improves storage space utilization, keeps the structure compact, and meets the requirement for fast lookup on software platforms. The invention effectively addresses several problems of current high-speed, large-scale key-value storage and lookup: uncertain lookup time, non-constant lookup speed, and software implementations too slow to meet application requirements.

Description

Compact key-value pair storage structure and fast key-value pair lookup method
Technical field
The present invention relates to a compact key-value pair storage structure and a fast key-value pair lookup method. More particularly, it relates to a key-value pair storage and lookup method based on multi-level hash tables and Bloom filters (BF), in the field of real-time big data processing and key-value query technology.
Background
As the Internet grows in scale and network traffic keeps increasing, the computer field places ever higher demands on big-data storage and lookup speed. Key-value lookup is widely used across computing, including big data processing and high-speed networks.
However, the performance of key-value lookup methods is severely limited by hardware resources. With the trend toward network function virtualization, adapting to software platforms has become more important and better suits the development of the Internet. A software platform may run several tasks at once, so a storage and lookup method cannot monopolize an entire cache. It must carefully control the size of the structure placed in the cache and avoid frequent replacement. Most existing methods, however, are tailored to hardware platforms and fall short in lookup speed and cache efficiency.
Existing key-value lookup methods fall mainly into hash-table lookup, tree-based lookup, and Bloom-filter-based lookup. The hash table is the most traditional approach: it aims to perform insertion, deletion, and lookup in constant time O(1), but its significant drawback is the need to handle hash collisions, which prevents every operation from completing in O(1). A design built solely on a hash table therefore cannot meet current key-value lookup speed requirements, and reducing hash collisions requires more storage space, which lowers space utilization.
Methods based on the Bloom filter data structure rely on its simple, compact structure. The main approaches combine multiple Bloom filters, combine a Bloom filter with data encoding, or use data structures derived from the Bloom filter. These approaches change the structure of the original Bloom filter and improve space utilization somewhat, but they destroy its original simplicity and compactness and therefore fail to deliver the benefits a Bloom filter should provide.
The present invention provides a compact and fast key-value lookup method together with a compact key-value pair storage structure. It improves on the space utilization of the traditional hash table method and completes key-value lookups with deterministic time complexity. The proposed storage structure and lookup method address the following two problems:
(1) Multi-level hash table design: the first layer of the multi-level hash table uses a 2-left hashing data structure, which keeps the structure compact and space utilization high while reducing hash collisions;
(2) Multi-layer Bloom filters store the colliding data of the corresponding hash-table layers, and the result of testing whether the searched key belongs to the current layer's collision set serves as an index, which speeds up lookup and reduces the number of memory accesses.
Summary of the invention
The purpose of the invention is to overcome the insufficient compactness of existing key-value storage structures and to solve current problems in large-scale key-value storage and lookup, such as uncertain lookup time and software implementations too slow to meet application requirements, while making reasonable use of the cache and minimizing data replacement. To this end, it proposes a compact key-value pair storage structure and a fast key-value pair lookup method.
The idea of the invention is a structure that combines layered hash tables with Bloom filters. The first layer is stored using a 2-left hashing data structure; each subsequent hash-table layer acts as a supplementary table storing the data that collided in the layer above, and Bloom filters record the colliding data. The result of "the Bloom filter determines whether the searched key exists in the current set" is used as an index to speed up lookup, which improves storage space utilization, keeps the structure compact, and meets the software platform's requirement for fast lookup. The invention comprises a compact key-value pair storage structure and a fast key-value pair lookup method.
The compact key-value pair storage structure (hereafter, the storage structure) comprises λ layers of hash tables. Starting from the second-layer hash table, the total number of memory blocks in each layer decreases geometrically; the preferred common ratio is 10.
The first-layer hash table is a 2-left hashing hash table with two sublists, left and right, each containing the same number of memory blocks. The remaining layers have no sublists.
Each memory block contains N storage cells, and each storage cell is M bits in size.
The product M*N does not exceed the cache-line size of the computer that uses the storage structure; preferably, N is 8 and M is 32 or 64 bits.
In each memory block, the first storage cell records how many cells of the block are occupied, and the remaining cells store key-value pairs.
A key-value pair consists of a key and a value. What is actually stored is the value and the key's "check value". The stored value field may hold the value itself or a pointer to where the value is stored; both are referred to as the value below. The key's check value can be computed with a common checksum algorithm, such as a CRC cyclic redundancy check or MD5.
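The block layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the check value is taken here as CRC-32 (the standard library's `zlib.crc32`) truncated to 16 bits, and the names `check_value`, `Block`, `put` and `get` are hypothetical.

```python
import zlib

N = 8  # storage cells per block (the preferred N from the text)


def check_value(key: bytes) -> int:
    """Check value of a key: CRC-32 truncated to 16 bits (the width is an assumption)."""
    return zlib.crc32(key) & 0xFFFF


class Block:
    """One memory block: cells[0] counts occupied cells,
    cells[1..N-1] hold (check_value, value) pairs."""

    def __init__(self) -> None:
        self.cells = [0] + [None] * (N - 1)

    def count(self) -> int:
        return self.cells[0]

    def is_full(self) -> bool:
        # Cell 0 is the counter, so at most N - 1 pairs fit.
        return self.count() == N - 1

    def occupied(self):
        """Yield the (check, value) pairs stored in the occupied cells."""
        for i in range(1, self.count() + 1):
            yield self.cells[i]

    def has_check(self, check: int) -> bool:
        return any(c == check for c, _ in self.occupied())

    def get(self, check: int):
        """Return the value stored under this check value, or None."""
        for c, v in self.occupied():
            if c == check:
                return v
        return None

    def put(self, check: int, value) -> bool:
        """Write into the first free cell; refuse if full or the check value is taken."""
        if self.is_full() or self.has_check(check):
            return False
        self.cells[self.count() + 1] = (check, value)
        self.cells[0] += 1
        return True
```

A block refuses a write when it already holds N-1 pairs or when the check value is already present. These are the two conflict conditions the insertion process relies on.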
Every hash-table layer except the last has a corresponding Bloom filter. Starting from the first-layer Bloom filter, the space occupied by each layer's Bloom filter decreases geometrically; the preferred common ratio is 10.
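The following sketch shows a per-layer Bloom filter with geometrically shrinking sizes. The text specifies neither the number of hash functions nor how they are derived, so k = 3 and double hashing over a SHA-256 digest are assumptions. The 5000-byte first filter and the ratio of 10 match the embodiment described later. `BloomFilter` and `geometric_sizes` are hypothetical names.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter of size_bytes bytes with k bit positions per key.
    The k = 3 default and SHA-256 double hashing are illustrative choices."""

    def __init__(self, size_bytes: int, k: int = 3) -> None:
        self.m = size_bytes * 8          # number of bits in the filter
        self.k = k
        self.bits = bytearray(size_bytes)

    def _positions(self, key: bytes):
        d = hashlib.sha256(key).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1   # odd step for double hashing
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: bytes) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


def geometric_sizes(first: int, ratio: int, count: int) -> list:
    """Sizes of successive layers, each smaller than the previous by a common ratio."""
    return [first // ratio ** i for i in range(count)]


# One filter per layer except the last: 5000, 500 and 50 bytes for a 4-layer table.
filters = [BloomFilter(s) for s in geometric_sizes(5000, 10, 3)]
```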
The fast key-value pair lookup method comprises a key-value pair data storage process and a key-value pair search process.
The key-value pair data storage process is the key-value pair insertion process, hereafter the insertion process; the key-value pair search process is hereafter the search process.
The insertion process comprises the following steps:
Step 1: set the number of hash-table layers and the number of memory blocks in each layer, and set the size of the Bloom filter corresponding to each layer;
Step 2: set the number N of storage cells in each memory block of the hash tables and the size M bits of each storage cell;
Preferably, step 2 ensures that the memory-block size of M*N bits is equal to or less than the computer's cache-line size. Accessing a block then loads the whole block into the cache, which avoids repeated cache replacement when different cells of the block are accessed and thus shortens lookup time;
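The cache-line condition is the inequality N·M ≤ 8 × (cache-line size in bytes). The sketch below checks it. It also illustrates a related point the text leaves implicit: a block no larger than a cache line is fetched in a single line fill only if it does not straddle a line boundary. This holds when the table starts on a line boundary and the block size divides the line size. The 64-byte line size is an assumption (common on current x86-64 and many ARM cores), and the function names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: a common cache-line size; check the target CPU


def block_fits_line(n_cells: int, m_bits: int, line_bytes: int = CACHE_LINE_BYTES) -> bool:
    """True if a block of n_cells cells of m_bits each is no larger than one cache line."""
    return n_cells * m_bits <= line_bytes * 8


def straddles_lines(offset: int, size: int, line_bytes: int = CACHE_LINE_BYTES) -> bool:
    """True if the byte range [offset, offset + size) touches more than one cache line."""
    return offset // line_bytes != (offset + size - 1) // line_bytes


# N = 8 with M = 32 or 64 bits gives 32- or 64-byte blocks, both within a 64-byte line.
assert block_fits_line(8, 32) and block_fits_line(8, 64)
```

A 32-byte block at offset 32 stays inside one line, but the same block at offset 48 spans two lines and would need two fills.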
Step 3: insert the data into the first-layer hash table, i.e. the 2-left hashing table. If no conflict occurs, the key-value pair is inserted successfully; go to step 6. If a conflict occurs, the key is to be recorded in the first layer's corresponding Bloom filter; perform step 4;
The inserted data is the value field of the key-value pair, together with the key's check value;
Step 3 in detail: hash the key of the key-value pair twice, and use the two hash results to select one memory block in each of the left and right sublists of the 2-left hashing table;
The hash functions of every layer are fixed; in particular, the two hash functions of the first layer are fixed;
Either of the following two situations indicates an insertion conflict, in which case go to step 4:
3.1A Both memory blocks are full, i.e. both hold the same number of occupied cells, N-1, which is the maximum number of key-value pairs a block can store; an insertion conflict occurs; go to step 4;
3.1B Some occupied cell in either of the two memory blocks holds a check value identical to the check value of the key-value pair being inserted; an insertion conflict occurs; go to step 4;
When neither 3.1A nor 3.1B holds, the insertion does not conflict, and one of the following two situations applies:
3.2A The two memory blocks have different numbers of occupied cells: save the data to be inserted, i.e. the key's value and the key's check value, in the first free cell of the block with fewer occupied cells; go to step 6;
3.2B The two memory blocks have the same number of occupied cells: save the key's value and the key's check value in the first free cell of the left memory block; go to step 6;
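Rules 3.1A to 3.2B amount to a small decision function. In the sketch below, blocks are plain Python lists of (check value, value) pairs with the counter cell left implicit. The salted BLAKE2b `bucket` function stands in for the two fixed first-layer hash functions, which the text does not specify. All names are hypothetical.

```python
import hashlib

CAPACITY = 7  # N - 1 pairs per block when N = 8 (cell 0 holds the counter)


def bucket(key: bytes, salt: bytes, n_blocks: int) -> int:
    """Hypothetical fixed hash for one sublist: salted BLAKE2b reduced mod n_blocks."""
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % n_blocks


def two_left_insert(left: list, right: list, check: int, value) -> str:
    """Apply rules 3.1A/3.1B/3.2A/3.2B to the two candidate blocks.

    left and right are lists of (check, value) pairs already in each block.
    Returns "left" or "right" for the block written, or "conflict".
    """
    # 3.1B: the check value already occurs in either block.
    if any(c == check for c, _ in left + right):
        return "conflict"
    # 3.1A: both blocks are full.
    if len(left) == CAPACITY and len(right) == CAPACITY:
        return "conflict"
    # 3.2A: the block with fewer occupied cells wins; 3.2B: a tie goes left.
    if len(left) <= len(right):
        left.append((check, value))
        return "left"
    right.append((check, value))
    return "right"
```

For example, with 2 occupied cells on the left, 4 on the right, and no matching check value, the pair goes to the left block under rule 3.2A. A full left block paired with a partly filled right block sends the pair right.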
Step 4: determine whether the current layer is the last layer. If it is, go to step 7. Otherwise, add the current key to the Bloom filter set corresponding to the current layer and go to step 5;
Step 5: move to the next hash-table layer and compute one hash, whose result determines a memory block. Check whether the block conflicts with the current key. If the insertion conflicts, go to step 4. If it does not, save the data to be inserted, i.e. the key's value and the key's check value, in the first free cell of the block and go to step 6;
The two situations in which the insertion conflicts and step 4 must follow are:
5.1A The memory block is full, i.e. it holds N-1 occupied cells, the maximum number of key-value pairs it can store; an insertion conflict occurs; go to step 4;
5.1B Some occupied cell in the memory block holds a check value identical to that of the key-value pair being inserted; an insertion conflict occurs; go to step 4;
Step 6: the insertion succeeds, i.e. the key-value insertion function call succeeds and returns 1; the insertion process ends;
Step 7: the insertion fails, i.e. the key-value insertion function call fails and returns 0; the insertion process ends;
This completes the insertion process, from step 1 to step 6 or step 7.
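Steps 3 to 7 can be combined into one cascading insert. The sketch below is an illustration under assumptions, not the patent's code. Python sets stand in for the Bloom filters, so membership is exact. Blocks are lists of (check value, value) pairs without the counter cell. The hash and check functions (`h`, `check_of`) and the class name `LayeredStore` are hypothetical.

```python
import hashlib
import zlib

N = 8
CAP = N - 1  # pairs per block; cell 0 of a real block holds the counter


def h(key: bytes, salt: bytes, n: int) -> int:
    """Hypothetical fixed hash: salted BLAKE2b reduced modulo n."""
    d = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(d, "little") % n


def check_of(key: bytes) -> int:
    """16-bit check value from CRC-32 (the text allows CRC or MD5)."""
    return zlib.crc32(key) & 0xFFFF


class LayeredStore:
    def __init__(self, sublist_blocks: int, lower_blocks: list) -> None:
        # Layer 1: a 2-left table with two equal sublists of blocks.
        self.left = [[] for _ in range(sublist_blocks)]
        self.right = [[] for _ in range(sublist_blocks)]
        # Layers 2..λ: one plain table (a list of blocks) per layer.
        self.lower = [[[] for _ in range(n)] for n in lower_blocks]
        # One filter per layer except the last; sets stand in for Bloom filters.
        self.filters = [set() for _ in lower_blocks]

    def insert(self, key: bytes, value) -> int:
        chk = check_of(key)
        # Step 3: the two candidate blocks of the 2-left table.
        l = self.left[h(key, b"left", len(self.left))]
        r = self.right[h(key, b"right", len(self.right))]
        taken = any(c == chk for c, _ in l + r)                     # rule 3.1B
        if not taken and not (len(l) == CAP and len(r) == CAP):     # rule 3.1A
            (l if len(l) <= len(r) else r).append((chk, value))     # 3.2A / 3.2B
            return 1                                                # step 6
        layer = 0  # 0-based index of the layer where the conflict happened
        while True:
            # Step 4: fail at the last layer, otherwise record the collision.
            if layer == len(self.lower):
                return 0                                            # step 7
            self.filters[layer].add(key)
            # Step 5: one hash into the next layer's table.
            table = self.lower[layer]
            block = table[h(key, b"layer%d" % (layer + 2), len(table))]
            if len(block) < CAP and all(c != chk for c, _ in block):  # 5.1A / 5.1B
                block.append((chk, value))
                return 1                                            # step 6
            layer += 1
```

With tiny tables (one block per sublist and one per lower layer), there are 28 cells in total. Inserting 40 distinct keys therefore overflows, the later keys are recorded in both filters, and at least 12 inserts return 0.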
Part II, the key-value pair search process (hereafter the search process), looks up the value corresponding to a given key. Its steps are as follows:
Step 8: test the given key against the first-layer Bloom filter set and, depending on whether the key is in the set, go to step 9 or step 10:
8.1 If the key is not in the first-layer Bloom filter set, perform step 9;
8.2 If the key is in the first-layer Bloom filter set, perform step 10;
Step 9: compute the two hashes of the 2-left hashing table; their results select one memory block in each of the left and right sublists. Compare the check value of every non-empty cell in the two blocks with the check value computed from the current key, and proceed according to whether they match:
9.1 If a match is found, the value saved in that cell is the value of the searched key; go to step 13;
9.2 If no match is found, the key has no value in this storage structure; go to step 14;
Step 10: move to the next-layer Bloom filter and test the given key against its set, proceeding as follows:
10.1 If the key is in this Bloom filter set, go to step 10 or step 11 depending on whether the current Bloom filter belongs to the last Bloom filter layer:
10.1A If the current Bloom filter is not in the last Bloom filter layer, go back to step 10;
10.1B If the current Bloom filter is in the last Bloom filter layer, go to step 11;
10.2 If the key is not in this layer's Bloom filter set, enter the hash table of the layer corresponding to the current Bloom filter and go to step 12;
Step 11: enter the last-layer hash table and go to step 12;
Step 12: look up the check value of the key in the current-layer hash table. Compute one hash of the key; its result selects a memory block in the hash table. Traverse the non-empty cells of the block, comparing each check value with the current key's check value, and go to step 13 or step 14 depending on whether a match is found:
12.1 If a match is found, i.e. an equal check value exists, go to step 13;
12.2 If no match is found, go to step 14;
Step 13: the lookup function call succeeds and returns the value saved in the current storage cell; the lookup ends;
Step 14: the lookup function call fails and returns 0; the lookup ends;
This completes the fast key-value pair lookup method, from step 8 to step 13 or step 14.
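The search process has two parts. First, the filters are walked to choose a layer (steps 8, 10 and 11). Second, that layer's candidate blocks are scanned by check value (steps 9 and 12 to 14). The sketch below keeps filters and block access abstract: any container supporting `in` works as a filter. `choose_layer`, `lookup` and the `candidate_blocks` callback are hypothetical names.

```python
def choose_layer(key, filters) -> int:
    """Steps 8, 10, 11: return the 0-based index of the hash-table layer to probe.
    filters[i] belongs to layer i; the last layer has no filter."""
    for i, f in enumerate(filters):
        if key not in f:
            return i           # first filter that does not report the key
    return len(filters)        # every filter reports the key: probe the last layer


def lookup(key, check: int, filters, candidate_blocks):
    """Steps 9 and 12-14. candidate_blocks(layer, key) returns the blocks to scan:
    two blocks for the 2-left first layer, one block for any other layer."""
    layer = choose_layer(key, filters)
    for block in candidate_blocks(layer, key):
        for c, v in block:     # compare the check values of the occupied cells
            if c == check:
                return v       # step 13: success
    return 0                   # step 14: failure, signalled by 0 as in the text
```

A real Bloom filter can report false positives, so the layer chosen this way can in principle be deeper than the layer that actually holds the key. This sketch does not handle that case.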
Beneficial effects
Compared with existing methods, the compact key-value pair storage structure and fast key-value pair lookup method of the invention have the following advantages:
1. The method uses layered hash tables with a 2-left hashing table as the first layer, which keeps data storage compact. Because Bloom filters record only the conflicting data, the space they occupy is effectively reduced. Together, these solve the problem of low storage space utilization;
2. Existing methods are slow at lookup and poorly suited to software platforms. By contrast, this method fits the cache size of software platforms and improves lookup speed, ensuring real-time performance.
Brief description of the drawings
Fig. 1 is a flow diagram of the compact key-value pair fast lookup method (insertion process) according to an embodiment of the invention;
Fig. 2 is a flow diagram of the compact key-value pair fast lookup method (search process) according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the compact key-value pair storage structure of the invention;
Fig. 4 is a schematic diagram of the insertion process using 2-left hashing in Embodiment 1 of the fast key-value pair lookup method;
Fig. 5 is a schematic diagram of the insertion process using 2-left hashing in Embodiment 2 of the fast key-value pair lookup method;
Fig. 6 is a schematic diagram of the search process in Embodiment 2 of the fast key-value pair lookup method.
Embodiments
To make the objectives, technical solution, and advantages of the invention clearer, specific embodiments are described in detail below with reference to the accompanying drawings. These embodiments are implemented on the basis of the technical solution of the invention and give detailed implementations and specific operating procedures, but the scope of protection of the invention is not limited to them.
The invention is a compact key-value pair storage structure and fast key-value pair lookup method based on layered hash tables and Bloom filters. Its steps fall into two parts: Fig. 1 shows Part I, the key-value pair insertion process, and Fig. 2 shows Part II, the key-value pair search process.
Embodiment 1
The steps of Part I, the key-value pair insertion process, are:
Step 1: set the number of hash-table layers and the number of memory blocks in each layer, and set the size of the Bloom filter corresponding to each layer;
In this embodiment, as shown in Fig. 3, the hash structure has 4 layers. The first-layer 2-left hashing table contains 204800 memory blocks, the second-layer hash table contains 500, the third-layer hash table contains 50, and the fourth-layer hash table contains 5;
The first-layer Bloom filter is set to 5000 bytes, and the lower layers decrease to 500 bytes and 50 bytes in turn;
Step 2: set the number N of storage cells in each memory block of the hash tables and the size M bits of each storage cell;
In this embodiment, N is 8 and M is 32 bits, so each memory block is 8*32 = 256 bits. Each storage cell contains a value part and a check-value part, each 16 bits;
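The 32-bit cell of this embodiment (a 16-bit value plus a 16-bit check value) and the 256-bit block can be packed as shown below. The text does not say which half holds the check value, so placing it in the high 16 bits is an assumption, as is little-endian byte order. The function names are hypothetical.

```python
import struct


def pack_cell(check: int, value: int) -> int:
    """Pack a 16-bit check value and a 16-bit value into one 32-bit cell.
    Putting the check value in the high half is an assumption."""
    if not (0 <= check < 1 << 16 and 0 <= value < 1 << 16):
        raise ValueError("check and value must each fit in 16 bits")
    return (check << 16) | value


def unpack_cell(cell: int) -> tuple:
    """Split a 32-bit cell back into (check, value)."""
    return cell >> 16, cell & 0xFFFF


def pack_block(pairs) -> bytes:
    """Serialize one block: cell 0 holds the pair count, then up to 7 packed cells."""
    pairs = list(pairs)
    if len(pairs) > 7:
        raise ValueError("a block holds at most N - 1 = 7 pairs")
    cells = [len(pairs)] + [pack_cell(c, v) for c, v in pairs]
    cells += [0] * (8 - len(cells))          # unused cells are zero-filled
    return struct.pack("<8I", *cells)        # eight little-endian uint32 values = 32 bytes
```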
The first layer is a 2-left hashing data structure made of two sublists, as shown in Fig. 4. Each sublist consists of the same number of memory blocks, 102400 in this embodiment;
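The embodiment's parameters determine the memory footprint directly. The arithmetic below is derived only from the figures above: N = 8, M = 32, 2 × 102400 first-layer blocks, 500/50/5 lower-layer blocks, and 5000/500/50-byte filters.

```python
N, M = 8, 32                                   # cells per block, bits per cell
BLOCK_BYTES = N * M // 8                       # 256 bits = 32 bytes

table_blocks = [2 * 102400, 500, 50, 5]        # layer 1 is two sublists of 102400 blocks
bloom_bytes = [5000, 500, 50]                  # filters for layers 1-3

total_blocks = sum(table_blocks)
table_bytes = total_blocks * BLOCK_BYTES
capacity = total_blocks * (N - 1)              # cell 0 of each block is the counter
first_layer_share = table_blocks[0] / total_blocks

print(table_bytes, sum(bloom_bytes), capacity, round(first_layer_share, 4))
# prints: 6571360 5550 1437485 0.9973
```

This gives about 6.3 MiB of hash-table memory, 5550 bytes of Bloom filters, and room for 1,437,485 pairs, with 99.73% of that capacity in the first layer. That share refers to capacity, not to how full the tables actually become.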
Step 3: insert the data into the first-layer hash table and proceed according to whether a conflict occurs;
In this embodiment, as shown in Fig. 4, the key key1 of the key-value pair (key1, value1) is hashed twice, as Hash1(key1) and Hash2(key1). Hash1 is the hash function used by the left sublist and Hash2 the one used by the right sublist, and the two results select one memory block in each of the left and right sublists of the 2-left hashing table;
Check the first cell of each of the two memory blocks. In the figure, the two blocks have 2 and 4 occupied cells respectively and there is no check-value conflict. value1 and the check value of key1 are therefore stored in the first free cell of the left-sublist block, and the counter in its first cell is incremented by 1. The storing step is complete; go to step 4;
Step 4: the insertion succeeds; the function returns 1 and the insertion ends;
The steps of Part II, the key-value pair search process, are:
Step 5: test the given key against the first-layer Bloom filter set and proceed according to whether the key is in the set:
In this embodiment, key1 is not in the first-layer Bloom filter set, so step 6 is performed;
Step 6: look up the value corresponding to key1 in the first-layer 2-left hashing structure and proceed according to the result;
In this embodiment, Hash1(key1) and Hash2(key1) are first computed for key1, selecting one memory block in each of the two sublists of the 2-left hashing table;
The check code of key1 is computed and compared with all check values in the two blocks. An identical check value is found in the left block, which means the lookup succeeds; go to step 7;
Step 7: return the found value1; the lookup ends.
Embodiment 2
Part I, the key-value pair insertion process, with reference to Figs. 3 and 5, comprises the following steps:
Step 1: set the number of hash-table layers and the number of memory blocks in each layer, and set the size of the Bloom filter corresponding to each layer;
In this embodiment, the settings are the same as in Embodiment 1;
Step 2: set the number N of storage cells in each memory block of the hash tables and the size M bits of each storage cell;
In this embodiment, the settings are the same as in Embodiment 1;
Step 3: insert the data into the first-layer hash table and proceed according to whether a conflict occurs;
In this embodiment, the key key2 of the key-value pair (key2, value2) is hashed twice, as Hash1(key2) and Hash2(key2). As shown in Fig. 5, both corresponding blocks are full with 7 occupied cells each, so an insertion conflict occurs, shown by dashed line 1 in Fig. 3; go to step 4;
Step 4: determine whether the current layer is the last layer and proceed according to the result;
The current layer is not the last layer of the hash structure, so the current key is added to the Bloom filter set of the current layer, as shown by solid line 2 in Fig. 3; then go to step 5;
Step 5: move to the next hash-table layer and compute one hash, whose result determines a memory block; proceed according to whether the block conflicts with the current key:
In this embodiment, the second-layer hash table is entered;
In the second-layer hash table, one hash of key2 is computed, and its result determines a memory block;
The current block stores 3 entries and is not full, so the check value of key2 is compared with the check values stored in the block. An identical check code is found, so key2 also conflicts in the second-layer hash table, shown by dashed line 3 in Fig. 3;
Step 4 is then repeated: key2 is added to the second-layer Bloom filter, as shown by solid line 4 in Fig. 3;
The third-layer hash table is entered and key2 is inserted there: one hash of key2 maps it to a memory block of the third-layer hash table;
The current block is not full, and the check value of key2 is unique within it, so value2 and the check code of key2 are stored in the first empty cell, as shown by solid line 5 in Fig. 3; go to step 6;
Step 6: the insertion succeeds; the function returns 1 and the insertion ends;
Part II, the key-value pair search process, with reference to Fig. 6, comprises the following steps:
Step 7: look up key2 in the first-layer Bloom filter set and proceed according to whether it is found;
In this embodiment, key2 is found in the first-layer Bloom filter, as shown by solid line 1 in Fig. 6; go to step 8;
Step 8: move to the next-layer Bloom filter and test the given key against its set, proceeding according to whether the key is in the set;
In this embodiment, key2 is in the second-layer Bloom filter set, which is not the last one, as shown by solid line 2 in Fig. 6. Step 8 is therefore repeated: the third-layer Bloom filter is entered and key2 is tested against its set;
As shown by dashed line 3 in Fig. 6, key2 is not in the third-layer Bloom filter set; go to step 9;
Step 9: enter the hash table of the layer corresponding to the current Bloom filter and perform step 10;
In this embodiment, the third-layer hash table is entered;
Step 10: look up the value corresponding to the key in the current hash table and proceed according to the result;
In this embodiment, one hash is computed in the third-layer hash table, its result determines a memory block, and the block is searched for a check value identical to the check value of key2;
An identical check value is found in this embodiment; the value saved in that storage cell is value2 or a pointer to value2;
Step 11: value2 is returned and the lookup ends.
Conclusion
The invention consists mainly of two parts: the compact key-value pair storage structure and the fast key-value pair lookup method, the latter divided into an insertion process and a search process. Storing data in multi-layer hash tables, in particular with a 2-left hashing table as the first layer, keeps the key-value storage structure compact. Recording in Bloom filters the data that conflicts in each corresponding hash table reduces the number of memory accesses and improves cache utilization. The time cost of a key-value lookup is close to constant because more than 95% of the data is stored in the 2-left hashing table, where a lookup needs 1.5 memory accesses on average. Because the Bloom filters are small enough to stay in the cache, querying them generates no memory accesses, so lookup is very fast, which achieves the goal of accelerating key-value lookup in software implementations.
This completes the storage and fast lookup method for key-value pairs based on a first-layer 2-left hashing table, with Bloom filters recording the hash-collision entries.
Although the invention has been described with reference to the accompanying drawings and embodiments, it will be apparent to those skilled in the art that improvements can be made without departing from the principle of the invention, and these should also be regarded as falling within the scope of protection of the invention.

Claims (10)

1. A compact key-value pair storage structure and fast key-value pair lookup method, characterized in that: the core idea is a structure combining layered hash tables and Bloom filters; the first layer is stored using a 2-left hashing data structure; each hash-table layer serves as a supplementary table storing the data that collided in the layer above; Bloom filters record the colliding data; and the result of "the Bloom filter determines whether the searched key exists in the current set" is used as an index to speed up lookup, improving storage space utilization so that the structure is compact and meets the software platform's requirement for fast lookup.
2. A compact key-value pair storage structure, characterized in that: the storage structure comprises λ layers of hash tables; starting from the second layer, the total number of storage blocks in each layer's hash table decreases geometrically, preferably by a factor of 10 per layer;
wherein the first-layer hash table uses 2-left hashing and has two sub-tables, a left one and a right one, each containing the same number of storage blocks; the remaining layers have no sub-tables;
wherein each storage block contains N storage cells, each of size M bits;
wherein M*N does not exceed the cache line size of the computer using the storage structure; preferably N is 8 and M is 32 or 64 bits;
in each storage block, the first storage cell records the number of occupied storage cells in that block, and the remaining storage cells hold key-value pairs;
wherein a key-value pair comprises a key and a value, and what is actually stored is the value together with the check value of the key; the stored value may be the value itself or a pointer to the location where the value is stored, and both are referred to as the value below; the check value of the key may be computed with a common checksum algorithm, such as a CRC (cyclic redundancy check) or MD5;
every hash table except the last layer has a corresponding Bloom Filter, and starting from the first-layer Bloom Filter, the space occupied by each layer's Bloom Filter decreases geometrically, preferably by a factor of 10 per layer.
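As an illustration of the sizing rules in claim 2, the following Python sketch computes per-layer block counts and Bloom Filter sizes for a common ratio of 10, and shows one possible way to pack a value and a CRC-32 check value into a 64-bit cell. The helper names (`plan_layers`, `pack_cell`, `unpack_cell`), the separate size parameter for layer 2 (the claim does not say how layer 2 relates to layer 1), and the 32/32-bit split of a cell are assumptions, not part of the patent.

```python
import zlib
from collections import namedtuple

# layer: 1-based index; blocks: total storage blocks; bloom_bits: filter size (0 = none)
LayerPlan = namedtuple("LayerPlan", "layer blocks bloom_bits")


def plan_layers(num_layers, layer1_blocks_per_side, layer2_blocks, bloom1_bits, ratio=10):
    """Hypothetical sizing helper following claim 2 (common ratio 10 by default)."""
    plans = []
    for k in range(1, num_layers + 1):
        if k == 1:
            # 2-left layer: the left and right sub-tables hold the same number of blocks.
            blocks = 2 * layer1_blocks_per_side
        else:
            # From layer 2 on, block counts shrink geometrically.
            blocks = max(1, layer2_blocks // ratio ** (k - 2))
        # Every layer except the last has a Bloom Filter; sizes shrink from layer 1 on.
        bloom_bits = 0 if k == num_layers else max(1, bloom1_bits // ratio ** (k - 1))
        plans.append(LayerPlan(k, blocks, bloom_bits))
    return plans


def check_value(key: bytes) -> int:
    # CRC-32 of the key as its check value (the claim also allows e.g. MD5).
    return zlib.crc32(key)


def pack_cell(value: int, check: int) -> int:
    # Assumed 64-bit cell: high 32 bits = check value, low 32 bits = value or pointer.
    return (check << 32) | (value & 0xFFFFFFFF)


def unpack_cell(cell: int):
    # Inverse of pack_cell: returns (value, check).
    return cell & 0xFFFFFFFF, cell >> 32
```

For example, `plan_layers(4, 10_000, 1_000, 800_000)` yields block counts 20000, 1000, 100 and 10, and Bloom Filter sizes of 800000, 80000, 8000 and 0 bits.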
3. A fast key-value pair lookup method, characterized in that it comprises a key-value pair data storage process and a key-value pair search process;
wherein the key-value pair data storage process is the key-value pair insertion process, referred to as the insertion process, and the key-value pair search process is referred to as the search process;
wherein the insertion process comprises the following steps:
Step 1: set the number of hash-table layers and the number of storage blocks in each layer's hash table, and set the size of the Bloom Filter corresponding to each layer;
Step 2: set the number N of storage cells in each storage block of the hash tables, and set the size of each storage cell to M bits;
Step 3: insert the data into the first-layer hash table, i.e. the 2-left hashing table; if no collision occurs, the insertion succeeds and the process goes to Step 6; if a collision occurs, record the key in the first-layer Bloom Filter and perform Step 4;
Step 4: determine whether the current layer is the last layer; if it is, go to Step 7; if the current layer is not the last layer of the hash-table structure, add the current key to the Bloom Filter set of the current layer and go to Step 5;
Step 5: move to the next-layer hash table, perform one hash operation, and use its result to select one storage block; then determine whether the storage block collides with the current key: if the insertion collides, go to Step 4; if it does not, insert the data, i.e. save the value of the key and the check value of the key into the first free storage cell of the storage block, and go to Step 6;
Step 6: the data insertion succeeds, i.e. the key-value insertion function call succeeds; the function returns 1 and the insertion process ends;
Step 7: the data insertion fails, i.e. the key-value insertion function call fails; the function returns 0 and the insertion process ends;
Thus the insertion process runs from Step 1 through Step 6 or Step 7;
Part II, the key-value pair search process, i.e. the process of finding, for a given key, the corresponding value of the key-value pair (referred to as the search process), comprises the following steps:
Step 8: look up the given key in the first-layer Bloom Filter set; depending on whether the key is in the set, go to Step 10 or Step 9;
Step 9: perform two hash calculations in the 2-left hashing table, whose results select one storage block in the left sub-table and one in the right sub-table; compare the check value in each non-empty cell of the two storage blocks with the check value computed from the current key, and depending on whether an equal value exists, go to Step 13 or Step 14;
Step 10: move to the next-layer Bloom Filter and look up the given key in its set; depending on whether the key is in that Bloom Filter set, proceed as follows:
10.1 If the key is in the Bloom Filter set, determine whether the layer of the current Bloom Filter is the last layer that has a Bloom Filter, and go to Step 11 or Step 10 accordingly;
10.2 If the key is not in this layer's Bloom Filter set, enter the hash table of the layer corresponding to the current Bloom Filter and go to Step 12;
Step 11: enter the last-layer hash table and go to Step 12;
Step 12: look up the check value corresponding to the key in the current-layer hash table: perform one hash operation on the key, use its result to select one storage block in the hash table, traverse the check value of each non-empty cell in the block, and compare it with the check value of the current key; depending on whether an equal value exists, go to Step 13 or Step 14;
Step 13: the lookup function call succeeds; the function returns the value saved in the current storage cell and the lookup ends;
Step 14: the lookup function call fails; the function returns 0 and the lookup ends;
Thus the fast key-value pair lookup method runs from Step 1 through Step 13 or Step 14.
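The insertion process (Steps 3 to 7) and the search process (Steps 8 to 14) can be sketched together in Python as below. This is a minimal in-memory model under stated assumptions, not the patent's implementation: blocks are Python lists rather than packed cache-line arrays, the table hashes are seeded SHA-256, the check value is CRC-32, each Bloom Filter sets 3 bits per key, and the names `LayeredStore`, `insert` and `lookup` are hypothetical. Following Steps 6, 7, 13 and 14, `insert` returns 1 or 0 and `lookup` returns the stored value or 0, so a stored value of 0 cannot be told apart from a miss.

```python
import hashlib
import zlib

N = 8  # cells per block (claim 2 preference); cell 0 holds the occupancy count

def _h(key: bytes, seed: int, mod: int) -> int:
    # Deterministic seeded hash; SHA-256 is an illustrative choice.
    digest = hashlib.sha256(seed.to_bytes(4, "little") + key).digest()
    return int.from_bytes(digest[:8], "little") % mod

class BloomFilter:
    def __init__(self, bits: int, seed: int, k: int = 3):
        self.bits, self.seed, self.k = bits, seed, k
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, key):
        return [_h(key, self.seed + i, self.bits) for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.array[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key):
        return all((self.array[p >> 3] >> (p & 7)) & 1 for p in self._positions(key))

def new_block():
    return [0] + [None] * (N - 1)

def find(block, check):
    # Index of the occupied cell holding `check`, or 0 if none does.
    for i in range(1, block[0] + 1):
        if block[i][0] == check:
            return i
    return 0

def put(block, check, value):
    # Write (check value, value) into the first free cell and bump the count.
    block[0] += 1
    block[block[0]] = (check, value)

class LayeredStore:
    """Layer 1 is 2-left; lower_blocks gives block counts for layers 2..λ."""
    def __init__(self, layer1_side, lower_blocks, bloom_bits):
        self.left = [new_block() for _ in range(layer1_side)]
        self.right = [new_block() for _ in range(layer1_side)]
        self.lower = [[new_block() for _ in range(n)] for n in lower_blocks]
        self.num_layers = 1 + len(lower_blocks)
        # One Bloom Filter per layer except the last.
        if not lower_blocks or len(bloom_bits) != len(lower_blocks):
            raise ValueError("need >= 2 layers and one filter per non-last layer")
        self.blooms = [BloomFilter(b, seed=100 * (i + 1)) for i, b in enumerate(bloom_bits)]

    def _pair(self, key):
        # Layer 1 (2-left): one candidate block in each sub-table.
        return (self.left[_h(key, 1, len(self.left))],
                self.right[_h(key, 2, len(self.right))])

    def _block(self, layer, key):
        # Layers 2..λ: one hash selects one block.
        table = self.lower[layer - 2]
        return table[_h(key, 10 + layer, len(table))]

    def insert(self, key: bytes, value) -> int:
        check = zlib.crc32(key)
        lb, rb = self._pair(key)
        both_full = lb[0] == N - 1 and rb[0] == N - 1             # 3.1A
        if not (both_full or find(lb, check) or find(rb, check)):  # 3.1B
            put(lb if lb[0] <= rb[0] else rb, check, value)       # 3.2A / 3.2B
            return 1                                              # Step 6
        for layer in range(1, self.num_layers):
            self.blooms[layer - 1].add(key)      # Steps 3-4: record the collision
            block = self._block(layer + 1, key)  # Step 5
            if not (block[0] == N - 1 or find(block, check)):
                put(block, check, value)
                return 1                         # Step 6
        return 0                                 # Step 7: the last layer collided too

    def lookup(self, key: bytes):
        check = zlib.crc32(key)
        if key not in self.blooms[0]:            # 8.1
            for block in self._pair(key):        # Step 9
                i = find(block, check)
                if i:
                    return block[i][1]           # Step 13
            return 0                             # Step 14
        layer = 2                                # 8.2 -> Step 10
        while layer < self.num_layers and key in self.blooms[layer - 1]:
            layer += 1                           # 10.1A: descend further
        block = self._block(layer, key)          # 10.2 or Step 11, then Step 12
        i = find(block, check)
        return block[i][1] if i else 0           # Step 13 / Step 14
```

Because a Bloom Filter can return false positives, a filter hit at the layer that actually holds a key (including layer 1 for a key stored there) would send the lookup one layer too deep; the steps as written do not handle this case, so the sketch does not either.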
4. The fast key-value pair lookup method according to claim 3, characterized in that: the preferred scheme for Step 2 is to ensure that the storage block size, M*N bits, is equal to or less than the computer's cache line size, so that accessing a storage block loads the whole block into the cache; this avoids repeated cache replacements when different cells of the block are accessed, thereby speeding up lookup.
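A quick check of the claim 4 condition, sketched in Python under the assumption of a 64-byte (512-bit) cache line, which is typical on x86-64 but should be confirmed for the target machine. The second helper covers a point the claim leaves implicit: a block is fetched with a single cache-line fill only if it also does not straddle a line boundary. Both function names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed cache-line size; check the target CPU


def blocks_per_cache_line(n_cells: int, cell_bits: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Whole blocks that fit in one cache line; 0 means the claim 4 condition fails."""
    block_bits = n_cells * cell_bits
    line_bits = line_bytes * 8
    return line_bits // block_bits


def block_in_one_line(block_index: int, n_cells: int, cell_bits: int,
                      base_addr: int = 0, line_bytes: int = CACHE_LINE_BYTES) -> bool:
    """True if block `block_index` of a contiguous array at base_addr lies in one line."""
    block_bytes = n_cells * cell_bits // 8
    start = base_addr + block_index * block_bytes
    end = start + block_bytes - 1
    return start // line_bytes == end // line_bytes
```

With the preferred N = 8 and M = 64, a block is exactly 512 bits, one full 64-byte line; with M = 32, two blocks share a line. If the block array does not start on a line boundary (for example at offset 32), the first block straddles two lines.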
5. The fast key-value pair lookup method according to claim 3, characterized in that: in Step 3, the inserted data are the value part of the key-value pair, and the check value of the key is stored as well; the detailed procedure of Step 3 is: perform two hash operations on the key of the key-value pair and use the two results to select one storage block in each of the left and right sub-tables of the 2-left hashing table;
wherein the hash operation of each hash-table layer is fixed, so the two hash operations of the first layer are likewise fixed;
when either of the following two cases occurs, the insertion collides and the process must go to Step 4:
3.1A All storage cells in both storage blocks are occupied, i.e. both blocks hold the same number of occupied cells, N-1, which equals the maximum number of key-value pairs a block can store; an insertion collision has occurred, go to Step 4;
3.1B Among the check values saved in the storage cells of the two storage blocks, there is a cell whose check value equals the check value of the data (key-value pair) to be inserted; an insertion collision has occurred, go to Step 4;
when neither collision case 3.1A nor 3.1B applies, i.e. the insertion does not collide, there are the following two cases:
3.2A The two storage blocks have different numbers of occupied cells: save the data to be inserted, i.e. the value of the key and the check value of the key, into the first free storage cell of the block with fewer occupied cells, and go to Step 6;
3.2B The two storage blocks have the same number of occupied cells: save the data to be inserted, i.e. the value of the key and the check value of the key, into the first free storage cell of the left storage block, and go to Step 6.
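The first-layer rule of claim 5 (collision cases 3.1A/3.1B and placement cases 3.2A/3.2B) can be sketched in Python as follows. Blocks are modeled as lists with the occupancy count in cell 0 and `(check, value)` tuples after it; the seeded SHA-256 bucket hash, the CRC-32 check value, and the function names are illustrative assumptions.

```python
import hashlib
import zlib

N = 8  # cells per block; cell 0 is the occupancy count, so N - 1 pairs fit


def new_block():
    return [0] + [None] * (N - 1)


def bucket(key: bytes, tag: bytes, size: int) -> int:
    # A fixed hash per sub-table (claim 5 only requires it to be fixed); SHA-256 is assumed.
    digest = hashlib.sha256(tag + key).digest()
    return int.from_bytes(digest[:8], "little") % size


def has_check(block, check: int) -> bool:
    return any(block[i][0] == check for i in range(1, block[0] + 1))


def two_left_insert(left, right, key: bytes, value) -> bool:
    """True: stored (go to Step 6). False: collision (go to Step 4)."""
    check = zlib.crc32(key)
    lb = left[bucket(key, b"L", len(left))]
    rb = right[bucket(key, b"R", len(right))]
    if lb[0] == N - 1 and rb[0] == N - 1:             # 3.1A: both blocks full
        return False
    if has_check(lb, check) or has_check(rb, check):  # 3.1B: same check value present
        return False
    target = lb if lb[0] <= rb[0] else rb             # 3.2A: fewer occupied; 3.2B: tie -> left
    target[0] += 1
    target[target[0]] = (check, value)                # first free cell
    return True
```

Ties go to the left block (3.2B), so the first insertion into empty tables always lands in the left sub-table, and re-inserting the same key is reported as a collision by 3.1B.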
6. The fast key-value pair lookup method according to claim 3, characterized in that: in Step 5, the two cases in which the insertion collides and the process must go to Step 4 are as follows:
5.1A All storage cells in the storage block are occupied, i.e. the number of occupied cells is N-1, equal to the maximum number of key-value pairs the block can store; an insertion collision has occurred;
5.1B Among the check values saved in the storage cells of the storage block, there is a cell whose check value equals the check value of the data (key-value pair) to be inserted; an insertion collision has occurred.
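For a block in layer 2 or below, the two collision cases of claim 6 reduce to a short check followed by the Step 5 write. A minimal Python sketch, using an assumed list-based block layout (count in cell 0, `(check, value)` tuples after it); the function name is hypothetical.

```python
N = 8  # cells per block; cell 0 holds the occupancy count


def lower_layer_insert(block, check: int, value) -> bool:
    """Step 5 on a layer-2+ block. False means a collision: go back to Step 4."""
    occupied = block[0]
    if occupied == N - 1:                                          # 5.1A: block full
        return False
    if any(block[i][0] == check for i in range(1, occupied + 1)):  # 5.1B: same check value
        return False
    block[occupied + 1] = (check, value)                           # first free cell
    block[0] = occupied + 1
    return True
```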
7. The fast key-value pair lookup method according to claim 3, characterized in that: in Step 8, going to Step 10 or Step 9 depending on whether the key is in the set is specifically:
8.1 If the key is not in the first-layer Bloom Filter set, perform Step 9;
8.2 If the key is in the first-layer Bloom Filter set, perform Step 10.
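Step 8 only needs a Bloom Filter membership test. The claims do not fix the filter's hash functions or the number of bits set per key, so the following Python sketch assumes k = 3 positions derived from one SHA-256 digest by double hashing; `BloomFilter` and `step8_next` are illustrative names.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom Filter; size, k and hashing are illustrative assumptions."""

    def __init__(self, bits: int, k: int = 3):
        self.bits, self.k = bits, k
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: k positions from two 64-bit halves of one SHA-256 digest.
        d = hashlib.sha256(key).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1
        return [(h1 + i * h2) % self.bits for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.array[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: bytes) -> bool:
        # Never a false negative; false positives are possible.
        return all((self.array[p >> 3] >> (p & 7)) & 1 for p in self._positions(key))


def step8_next(layer1_filter: BloomFilter, key: bytes) -> int:
    """8.1: key absent -> Step 9 (search 2-left). 8.2: key present -> Step 10."""
    return 10 if key in layer1_filter else 9
```

Since a Bloom Filter has no false negatives, every key that collided in layer 1 during insertion is routed to Step 10.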
8. The fast key-value pair lookup method according to claim 3, characterized in that: in Step 9, comparing the check value in each non-empty cell of the two storage blocks with the check value computed from the current key, and going to Step 13 or Step 14 depending on whether an equal value exists, is specifically:
9.1 If an equal value exists, the value saved in that cell is the value corresponding to the searched key; go to Step 13;
9.2 If no equal value exists, the value corresponding to the key is not in this storage structure; go to Step 14.
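Cases 9.1 and 9.2 amount to scanning the occupied cells of both candidate blocks. In the Python sketch below, blocks use an assumed list layout (count in cell 0, then `(check, value)` tuples), and success and failure map to Step 13 (return the value) and Step 14 (return 0) as defined in claim 3; the function name is hypothetical.

```python
def step9_lookup(left_block, right_block, check: int):
    """Search both 2-left candidate blocks for a cell whose check value equals `check`."""
    for block in (left_block, right_block):
        for i in range(1, block[0] + 1):   # cell 0 holds the occupancy count
            stored_check, value = block[i]
            if stored_check == check:
                return value               # 9.1 -> Step 13: found
    return 0                               # 9.2 -> Step 14: not in the structure
```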
9. The fast key-value pair lookup method according to claim 3, characterized in that: in 10.1, going to Step 11 or Step 10 depending on whether the layer of the current Bloom Filter is the last layer that has a Bloom Filter is specifically:
10.1A If the layer of the current Bloom Filter is not the last such layer, go to Step 10;
10.1B If the layer of the current Bloom Filter is the last such layer, go to Step 11.
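Taken together, Steps 8, 10 and 11 select exactly one hash table to search: the first layer whose Bloom Filter does not contain the key, or the last layer if every filter contains it. The Python sketch below reads "last layer" in 10.1 as the last layer that has a Bloom Filter (layer λ-1, since the last hash table has none); that reading, the function name and the `in_filter` callback are assumptions.

```python
from typing import Callable


def select_layer(in_filter: Callable[[int], bool], num_layers: int) -> int:
    """Return the 1-based layer whose hash table Step 9 or Step 12 should search.

    in_filter(k) reports whether the key is in layer k's Bloom Filter;
    layers 1..num_layers-1 have filters and the last layer has none.
    """
    layer = 1
    while layer < num_layers and in_filter(layer):
        layer += 1  # 8.2 / 10.1A: the key collided here, so go one layer deeper
    return layer    # 8.1 / 10.2: first filter miss; 10.1B / Step 11: last layer
```

A result of 1 corresponds to the 2-left search of Step 9; any other result is the table searched in Step 12.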
10. The fast key-value pair lookup method according to claim 3, characterized in that: in Step 12, traversing the check value of each non-empty cell in the storage block, comparing it with the check value of the current key, and going to Step 13 or Step 14 depending on whether an equal value exists, is specifically:
12.1 If an equal value exists, i.e. an equal check value has been found, go to Step 13;
12.2 If no equal value exists, i.e. no equal check value is found, go to Step 14.
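Step 12 on a packed block can be sketched in Python with a flat array of unsigned 64-bit cells. The layout is an assumption for illustration: cell 0 of each block holds the occupancy count, and each remaining cell stores the key's 32-bit check value in its high half and a 32-bit value (or pointer index) in its low half; the function name is hypothetical.

```python
from array import array

N = 8  # cells per block; cell 0 of each block holds the occupancy count


def step12_scan(table: array, block_index: int, check: int) -> int:
    """Scan one block of a flat array of unsigned 64-bit cells for `check`."""
    base = block_index * N
    occupied = table[base]                 # cell 0: number of occupied cells
    for i in range(base + 1, base + occupied + 1):
        cell = table[i]
        if cell >> 32 == check:            # assumed layout: check value in the high 32 bits
            return cell & 0xFFFFFFFF       # 12.1 -> Step 13: return the stored value
    return 0                               # 12.2 -> Step 14: lookup fails
```

The block index passed in would come from the single fixed hash of the key taken modulo the layer's block count, as Step 12 describes.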
CN201711287661.7A 2017-12-07 2017-12-07 Key value pair storage structure with compact structure and quick key value pair searching method Active CN108021678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711287661.7A CN108021678B (en) 2017-12-07 2017-12-07 Key value pair storage structure with compact structure and quick key value pair searching method

Publications (2)

Publication Number Publication Date
CN108021678A 2018-05-11
CN108021678B 2022-05-17

Family

ID=62078968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711287661.7A Active CN108021678B (en) 2017-12-07 2017-12-07 Key value pair storage structure with compact structure and quick key value pair searching method

Country Status (1)

Country Link
CN (1) CN108021678B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684325A (en) * 2018-11-07 2019-04-26 天津大学 A kind of efficient RDF data storage inquiry system
CN114422478A (en) * 2021-12-21 2022-04-29 福建星网智慧科技有限公司 Method for mapping GEMPORT ID and MAC address and storage device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120197851A1 (en) * 2011-01-27 2012-08-02 Quantum Corporation Considering multiple lookups in bloom filter decision making
CN104115146A (en) * 2012-02-14 2014-10-22 阿尔卡特朗讯公司 Method for storing and searching tagged content items in a distributed system
CN104579974A (en) * 2014-12-22 2015-04-29 湖南大学 Hash Bloom filter (HBF) for name lookup in NDN and data forwarding method
CN105574076A (en) * 2015-11-27 2016-05-11 湖南大学 Key value pair storage structure based on Bloom Filter and method
CN107330094A (en) * 2017-07-05 2017-11-07 湖南大学 The Bloom Filter tree construction and key-value pair storage method of dynamic memory key-value pair

Also Published As

Publication number Publication date
CN108021678B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
US7558802B2 (en) Information retrieving system
CN110083601B (en) Key value storage system-oriented index tree construction method and system
US9563477B2 (en) Performing concurrent rehashing of a hash table for multithreaded applications
US6240418B1 (en) Database apparatus
CN102521334B (en) Data storage and query method based on classification characteristics and balanced binary tree
CN112669155B (en) Block chain-based transaction distribution execution method, device server and storage medium
CN109416694A (en) The key assignments storage system effectively indexed including resource
CN107545021A (en) A kind of date storage method and device
US20100057647A1 (en) Accommodating learned clauses in reconfigurable hardware accelerator for boolean satisfiability solver
CN107391600A (en) Method and apparatus for accessing time series data in internal memory
Li et al. ASLM: Adaptive single layer model for learned index
CN107153643A (en) Tables of data connection method and device
CN115617712A (en) LRU replacement algorithm based on set associative Cache
CN108021678A (en) A kind of compact-sized key-value pair storage organization and quick key-value pair lookup method
Zhang et al. COLIN: a cache-conscious dynamic learned index with high read/write performance
CN113779154B (en) Construction method and application of distributed learning index model
US7953721B1 (en) Integrated search engine devices that support database key dumping and methods of operating same
CN108804571B (en) Data storage method, device and equipment
CN113392040B (en) Address mapping method, device and equipment
CN109492020A (en) A kind of data cache method, device, electronic equipment and storage medium
CN111444036B (en) Data relevance perception erasure code memory replacement method, equipment and memory system
CN114238226A (en) NVM (non volatile memory) local file management system and method based on SIMD (single instruction multiple data) instruction
CN110413617B (en) Method for dynamically adjusting hash table group according to size of data volume
CN111949439B (en) Database-based data file updating method and device
CN117729176B (en) Method and device for aggregating application program interfaces based on network address and response body

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant