CN116719813A - Hash table processing method and device and electronic equipment - Google Patents

Hash table processing method and device and electronic equipment Download PDF

Info

Publication number
CN116719813A
CN116719813A CN202310630033.3A CN202310630033A CN116719813A CN 116719813 A CN116719813 A CN 116719813A CN 202310630033 A CN202310630033 A CN 202310630033A CN 116719813 A CN116719813 A CN 116719813A
Authority
CN
China
Prior art keywords
hash table
value pair
key value
key
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310630033.3A
Other languages
Chinese (zh)
Inventor
冯丹
胡燏翀
肖仁智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202310630033.3A priority Critical patent/CN116719813A/en
Publication of CN116719813A publication Critical patent/CN116719813A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a processing method, a device and electronic equipment of a hash table, wherein the hash table comprises a persistent data layer, a plurality of hash buckets, each hash bucket stores a plurality of key value pairs, and a bloom filter is arranged on a volatile filter layer; wherein, the key of each key value pair stored in the hash bucket is inserted into the bloom filter and is mapped with the data of N bits in the bloom filter, the data bits mapped by the keys of any two key value pairs in the hash bucket in the bloom filter can not be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key; when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, if the corresponding key does not exist, returning a searching failure or deleting failure indication, so as to reduce the access cost to a persistent data layer and improve the negative query performance of the hash table. The invention improves the processing performance of the hash table.

Description

Hash table processing method and device and electronic equipment
Technical Field
The invention belongs to the field of computer storage, and in particular relates to a hash table processing method, a hash table processing device and electronic equipment.
Background
Persistent memory (Persistent Memory, PM) provides a new way to build large-scale, low-latency memory and storage systems. PM, such as phase change memory and Intel Optane DC persistent memory modules (Data Center Persistent Memory Module, DCPMM), provide desirable characteristics, such as high capacity, high performance, non-volatility, and byte addressing capability. PM is expected to replace DRAM to become a next generation candidate memory, and fills the performance gap between a Hard Disk Drive (HDD) and a dynamic random access memory (Dynamic Random Access Memory, DRAM), and enriches the storage layers. The DCPMM single device capacity of intel release is up to 256GB, with a maximum single device capacity of 8TB (512 GB/DIMM 16 DIMM), which can provide a high throughput, low latency and low total cost of ownership (Total Cost of Ownership, TCO) real-time processing system for various applications.
Hash indexes are widely used in memory index structures of key-value storage systems, such as Memcached and Redis, due to their constant level of fast lookup performance. One unique weakness of hash indexes is that if multiple entries have key collisions (i.e., hash collisions), then the hash table needs to be re-hashed or resized if it is full. As PM evolves, many researchers have been working on designing efficient PM-based persistent hash indexes, such as Level hash, CCEH, PCLHT, SOFT, clevel, and Dash.
Positive searches find existing elements in the hash table, while negative searches find elements that do not exist in the hash table. However, the PM-based hash index described above still has poor negative search performance. Our study of their negative search analysis shows that the negative query throughput of a persistent hash index is only 45.1% -79.8% of its positive query throughput. Negative queries in PM-based hash index systems can result in large PM accesses, which can significantly reduce system performance, especially when handling workloads where frequent queries do not have elements present, and thus improving negative query performance is also important. And when the data load is too large, the average barrel chain length of the hash barrel is too long, so that the positive query throughput of the hash table is too large, and the system performance is affected.
In summary, the processing performance of the existing hash table needs to be improved.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a processing method, a device and electronic equipment of a hash table, and aims to solve the problem that the processing performance of the existing hash table is to be improved.
To achieve the above object, in a first aspect, the present invention provides a method for processing a hash table, where the hash table includes a plurality of hash buckets, each of which stores a plurality of key value pairs, the method including the steps of:
setting a bloom filter in the volatile filter layer; the key of each key value pair stored in the hash bucket is inserted into a bloom filter and is mapped with N bits of data in the bloom filter, wherein N is an integer greater than 1; the data bits mapped by keys of any two key-value pairs in the hash bucket in the bloom filter cannot be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key;
when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, if the corresponding key does not exist, returning a searching failure or deleting failure indication, so as to reduce the access cost to a persistent data layer when the key value pair does not exist in the hash table, and improve the negative query performance of the hash table.
In one possible implementation, the method further includes the steps of:
when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, and if the corresponding key exists, searching or deleting the corresponding key in the hash table according to the corresponding key value pair.
In one possible implementation, the method further includes the steps of:
when a certain key value pair needs to be inserted into the hash table, judging whether the key value pair to be inserted exists in the hash table, and if so, returning an insertion failure indication;
if the hash table does not have the key value pair to be inserted, calculating a corresponding head barrel number according to the key, and locking the hash barrel inner range corresponding to the head barrel number;
searching whether an idle key value pair slot exists in the locked hash bucket or not, if so, inserting the key value pair into the idle slot, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to finish the key value pair insertion; the merging refreshing mode is used for reducing the refreshing times of cache lines and reducing the insertion delay cost of the hash table.
In one possible implementation manner, searching whether an idle key value pair slot exists in the locked hash bucket, if not, adopting a head insertion method to allocate a new chain bucket from a nonvolatile memory space and assign a temporary bucket pointer, setting the first slot position of the new chain bucket as a position to be inserted with the key value pair, inserting the key value pair to be inserted, and adopting a merging refreshing mode to persist the inserted key value pair to the nonvolatile memory; and then assigning the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and persisting the next bucket pointer of the head bucket into the nonvolatile memory by using a cache line refreshing instruction.
In one possible implementation, the method further includes the steps of:
and dynamically setting the initial head barrel length of the hash table according to the data load of the persistent data layer, wherein when the data load is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
In a second aspect, the present invention provides a processing apparatus for a hash table, the hash table including a persistent data layer, including a plurality of hash buckets, each hash bucket storing a plurality of key-value pairs, the apparatus comprising:
a bloom filter setting unit for setting a bloom filter in the volatile filter layer; the key of each key value pair stored in the hash bucket is inserted into a bloom filter and is mapped with N bits of data in the bloom filter, wherein N is an integer greater than 1; the data bits mapped by keys of any two key-value pairs in the hash bucket in the bloom filter cannot be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key;
and the hash table negative query unit is used for searching corresponding keys in the bloom filter when the value in a certain key value pair in the hash table is required to be searched or deleted, returning a search failure or deletion failure indication if the corresponding key value pair does not exist in the hash table, reducing the access cost to the persistent data layer when the key value pair does not exist in the hash table, and improving the negative query performance of the hash table.
In one possible implementation, the apparatus further includes:
the hash table inserting unit is used for firstly judging whether a key value pair to be inserted exists in the hash table or not when a certain key value pair needs to be inserted in the hash table, and returning an insertion failure indication if the key value pair to be inserted exists in the hash table; if the hash table does not have the key value pair to be inserted, calculating a corresponding head barrel number according to the key, and locking the hash barrel inner range corresponding to the head barrel number; searching whether an idle key value pair slot exists in the locked hash bucket or not, if so, inserting the key value pair into the idle slot, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to finish the key value pair insertion; the merging refreshing mode is used for reducing the refreshing times of cache lines and reducing the insertion delay cost of the hash table.
In one possible implementation manner, the hash table inserting unit is configured to search whether an idle key value pair slot exists in the locked hash bucket, if not, allocate a new chain bucket from the nonvolatile memory space by using a head insertion method and assign the new chain bucket to the temporary bucket pointer, set the first slot of the new chain bucket as a position where a key value pair is to be inserted, insert the key value pair to be inserted, and persistence the inserted key value pair to the nonvolatile memory by adopting a merging refreshing mode; and then assigning the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and persisting the next bucket pointer of the head bucket into the nonvolatile memory by using a cache line refreshing instruction.
In one possible implementation, the apparatus further includes:
the head barrel length setting unit is used for dynamically setting the initial head barrel length of the hash table according to the data load capacity of the persistent data layer, when the data load capacity is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
In a third aspect, the present invention provides an electronic device comprising: at least one memory for storing a program; at least one processor for executing a memory-stored program, which when executed is adapted to carry out the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the invention provides a computer program product which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In general, the above technical solutions conceived by the present invention have the following beneficial effects compared with the prior art:
the invention provides a processing method, a device and electronic equipment of a hash table, when key value pairs which do not exist in the hash table are required to be searched or deleted, the traditional method is that negative inquiry is carried out in the hash table of a persistent data layer, all the key value pairs in the hash table are searched once, and when the key value pairs do not exist, the searching cost is large and unnecessary. The invention can rapidly locate the non-existing key through the bloom filter positioned in the volatile filter layer, and avoid the search of the non-existing key value pair in the persistent data layer, thereby improving the negative query performance and the deletion operation of the non-existing key value pair. According to the invention, when the hash table of the persistent data layer inserts key value pairs, cache line refreshing times are reduced through a head insertion method and a combined refreshing operation, and the insertion performance is improved. The invention reduces the average chain barrel length of each head barrel by setting the relatively longer head barrel length, thereby improving the positive inquiry performance.
Drawings
FIG. 1 is a diagram of a nonvolatile memory structure according to an embodiment of the present invention;
FIG. 2 is a diagram of a hash bucket architecture provided by an embodiment of the present invention;
FIG. 3 is a mapping relationship diagram of bloom filters and keys provided by an embodiment of the present invention;
FIG. 4 is a flowchart of a hash table processing method according to an embodiment of the present invention;
FIG. 5 is a hash bucket concurrency control diagram based on lock provided by an embodiment of the present invention;
FIG. 6 is a key-value pair insertion flow chart of a hash table provided by an embodiment of the present invention;
FIG. 7 is a flow chart of a hash table lookup provided by an embodiment of the present invention;
FIG. 8 is a flowchart for deleting hash tables provided by an embodiment of the present invention;
fig. 9 is a diagram of a hash table processing apparatus according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present invention, unless otherwise indicated, the meaning of "a plurality" means two or more.
Aiming at the defects and improvement demands of the prior art, the invention provides a processing method of a hash table to reduce negative search to accelerate a persistent hash index, and aims to improve the negative query performance of the persistent hash index, provide high multithreading expandability and simultaneously ensure low data consistency overhead.
The architecture diagram of the nonvolatile memory provided by the embodiment of the invention is shown in fig. 1: the hash table may be a conventional chain hash or other hash types such as a scalable hash (extendible hashing). The volatile filter layer (Volatile Filter Layer, VFL) is a DRAM-based bloom filter that acts to reduce negative lookup operations on hash tables in the PM, greatly enhancing the negative lookup performance of hash tables and reducing the PM access overhead that is more expensive than DRAM. The persistent data layer (Persistent Data Layer, PDL) is a hash table (may be a chain hash or other hash tables such as an extensible hash) based on a persistent memory, where, taking the chain hash as an example, the initial header Bucket length of the chain hash is set to N, where N may be dynamically set according to a scenario, and the initial header Bucket is divided into a header Bucket (Head Bucket) and a chain Bucket (chain Bucket), where each of the header Bucket and the chain Bucket is a multi-slot Bucket. In general, the longer the initial head barrel length, the better, so that the smaller the average barrel chain length of each corresponding head barrel, and the relatively smaller the positive query overhead. However, when the length of the head barrel is too long and the load capacity is relatively small, the memory space is wasted, so that a person skilled in the art can set the length of the head barrel according to the actual situation to integrate the memory and the forward query cost.
In addition, the persistent data layer needs to ensure multi-threaded lock-based concurrency correctness and low data consistency overhead at system failure, while the VFL needs to quickly recover from the consistent persistent data layer at system failure.
Fig. 2 is a mapping relationship diagram of bloom filter and key provided in an embodiment of the present invention. As shown in fig. 2, y 1 、y 2 Y 3 Representing three keys, each key mapped to three bits of the bloom filter, threeThe bits are spread out in different locations. As can be seen from fig. 3, one bit can be mapped by at least one key, and the three bits of different key maps do not overlap completely. If the key is present, the data of the corresponding mapped three bits is set to 1, and if the key is not present, the data of the corresponding bits is set to 0.
To increase cache line efficiency, the size of the header and chain buckets may be an integer multiple of the 64 byte cache line size, may be 64 bytes in size, may be 128 bytes or 256 bytes, and the like. Here, a header bucket and a chain bucket with a size of 64 bytes are illustrated, for example, as shown in fig. 3. Where Token represents a tag, KV represents a key pair, next BP represents the Next Bucket Pointer (BP), and Dummy is the extra complement space added for 64 byte cache line alignment. A flag value of 0 indicates that the corresponding key-value pair slot is not used, and a value of 1 indicates that the corresponding key-value pair slot is occupied. The KV may be a fixed-length key-value pair size, or an 8-byte pointer to a variable-length key-value pair may be stored, where the fixed-length is taken as an example, and the key and value sizes are both 8 bytes.
Fig. 4 is a flowchart of a hash table processing method provided by an embodiment of the present invention, where the hash table includes a persistent data layer, and includes a plurality of hash buckets, each of which stores a plurality of key-value pairs, and the method includes the following steps:
s101, setting a bloom filter in a volatile filter layer; the key of each key value pair stored in the hash bucket is inserted into a bloom filter and is mapped with N bits of data in the bloom filter, wherein N is an integer greater than 1; the data bits mapped by keys of any two key-value pairs in the hash bucket in the bloom filter cannot be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key;
s102, when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, if the corresponding key does not exist, returning a searching failure or deleting failure indication, so that access cost to a persistent data layer is reduced when the key value pair does not exist in the hash table, and negative query performance of the hash table is improved.
In one example, FIG. 5 is a hash bucket concurrency control diagram based on a lock provided by an embodiment of the present invention, and it can be seen that read operations employ shared read locks and write operations employ exclusive write locks, each lock size containing a range of 256 64 byte header buckets.
In one example, FIG. 6 is a key-value pair insertion flow chart of a hash table provided by an embodiment of the present invention; the method comprises the following steps:
(1) And calculating a hash value and a head bucket number according to a given key, and finding a bucket pointer corresponding to the chain hash in the persistent data layer.
(2) Before inserting the key value pairs, judging whether the key value pairs to be inserted exist in the chain hash of the nonvolatile memory. If yes, executing (3); otherwise, executing the step (4).
(3) The hash table has key value pairs to be inserted, and the insertion failure is directly returned.
(4) And according to the calculated specific head barrel number, locking the lock range of the head barrel by adopting an intelligent mutual exclusion lock.
(5) Searching whether an idle key value pair slot exists in the initial head barrel and the first chain barrel (if yes), and executing the step (6) if the idle slot exists; if not, step (7) is performed.
(6) And modifying the key value pair slot position value and the key into the value and the key of the key value pair to be inserted in sequence, and setting the mark of the slot position as 1. The modifications are all persisted to nonvolatile memory at a bucket granularity of 64 byte cache line size using a merge refresh approach. At this point the insertion was successful.
(7) A new chain bucket is distributed from a nonvolatile memory space by adopting a head insertion method and assigned to a temporary bucket pointer, a first slot position of the chain bucket is set as a position to be inserted with a key value pair, the value of the slot position, a key and a corresponding flag bit are modified in the same way, and the modification is persisted into PM in a combined cache line refreshing mode based on 8 byte atom update.
(8) And setting a memory fence instruction, assigning the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and using a cache line refreshing instruction to persist the next bucket pointer of the head bucket into the nonvolatile memory.
(9) Setting a memory barrier instruction and inserting keys of the key value pair to be inserted into a bloom filter located in the volatile filter layer. Returning to successful insertion.
In one example, FIG. 7 is a flow diagram of a lookup of hash tables provided by an embodiment of the present invention; the method comprises the following steps:
(1) Searching whether the bloom filter contains a key to be searched or not, and if not, executing the step (2); if present, performing step (3);
(2) The key value pair to be searched does not exist, and the blank is directly returned;
(3) Calculating a hash value and a head bucket number according to a given key, finding a bucket pointer corresponding to chain hash in a persistent data layer, and locking by adopting a shared read lock;
(4) It is determined whether the bucket pointer is empty. If the air is available, the loop ends to execute the step (2); otherwise, executing the step (5);
(5) Sequentially traversing a plurality of slots in the barrel, and executing the step (6) if the corresponding mark is found to be 1 and the key value of the key value to the slot is equal to the key to be searched; otherwise, executing the step (7);
(6) There are key-value pairs to be found, and the corresponding values are returned.
(7) The updated bucket pointer points to the next bucket pointer. And (4) executing the step (4).
In one example, fig. 8 is a flowchart of hash table deletion provided in an embodiment of the present invention; the method comprises the following steps:
(1) Searching whether the bloom filter contains a key to be searched or not, and if not, executing the step (2); if present, performing step (3);
(2) The key value pair to be deleted does not exist, and deletion failure is directly returned;
(3) Calculating a hash value and a head bucket number according to a given key, finding a bucket pointer corresponding to chain hash in a persistent data layer, and locking by adopting an intelligent mutual exclusion lock;
(4) It is determined whether the bucket pointer is empty. If the air is available, the loop ends to execute the step (2); otherwise, executing the step (5);
(5) Sequentially traversing a plurality of slots in the barrel, and executing the step (6) if the corresponding mark is found to be 1 and the key value of the key value to the slot is equal to the key to be searched; otherwise, executing the step (7);
(6) And if the key value pair to be deleted exists, setting the mark of the corresponding key value pair slot to 0, persisting the key value pair to a nonvolatile memory through a cache line refreshing instruction, and returning that the deletion is successful.
(7) The updated bucket pointer points to the next bucket pointer. And (4) executing the step (4).
Fig. 9 is a schematic diagram of a hash table processing apparatus according to an embodiment of the present invention, as shown in fig. 9, including:
a bloom filter setting unit 910 for setting a bloom filter in the volatile filter layer; the key of each key value pair stored in the hash bucket is inserted into a bloom filter and is mapped with N bits of data in the bloom filter, wherein N is an integer greater than 1; the data bits mapped by keys of any two key-value pairs in the hash bucket in the bloom filter cannot be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key;
the hash table negative query unit 920 is configured to, when a value in a certain key value pair in the hash table needs to be searched or deleted, search a corresponding key in the bloom filter, and if the corresponding key does not exist, return a search failure or deletion failure indication to reduce access overhead to the persistent data layer when the key value pair does not exist in the hash table, thereby improving the negative query performance of the hash table.
A hash table inserting unit 930, configured to, when a certain key value pair needs to be inserted into the hash table, first determine whether a key value pair to be inserted already exists in the hash table, and if so, return an insertion failure indication; if the hash table does not have the key value pair to be inserted, calculating a corresponding head barrel number according to the key, and locking the hash barrel inner range corresponding to the head barrel number; searching whether an idle key value pair slot exists in the locked hash bucket or not, if so, inserting the key value pair into the idle slot, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to finish the key value pair insertion; the merging refreshing mode is used for reducing the refreshing times of cache lines and reducing the insertion delay cost of the hash table.
A hash table inserting unit 930, configured to find whether there is an idle key value pair slot in the locked hash bucket, if not, allocate a new chain bucket from the nonvolatile memory space by using a head insertion method and assign a temporary bucket pointer, set the first slot of the new chain bucket as a position to be inserted with a key value pair, insert the key value pair to be inserted, and persist the inserted key value pair to the nonvolatile memory by using a merging and refreshing method; and then assigning the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and persisting the next bucket pointer of the head bucket into the nonvolatile memory by using a cache line refreshing instruction.
The header length setting unit 940 is configured to dynamically set an initial header length of the hash table according to a data load amount of the persistent data layer, where the initial header length is relatively longer and an average bucket chain length corresponding to each header is relatively shorter when the data load amount is relatively larger, so as to improve a positive query performance of the hash table.
It should be understood that, the foregoing apparatus is used to perform the method in the foregoing embodiment, and corresponding program modules in the apparatus implement principles and technical effects similar to those described in the foregoing method, and reference may be made to corresponding processes in the foregoing method for the working process of the apparatus, which are not repeated herein.
Based on the method in the above embodiment, the embodiment of the invention provides an electronic device. The apparatus may include: at least one memory for storing programs and at least one processor for executing the programs stored by the memory. Wherein the processor is adapted to perform the method described in the above embodiments when the program stored in the memory is executed.
Based on the method in the above embodiment, the embodiment of the present invention provides a computer-readable storage medium storing a computer program, which when executed on a processor, causes the processor to perform the method in the above embodiment.
Based on the method in the above embodiments, an embodiment of the present invention provides a computer program product, which when run on a processor causes the processor to perform the method in the above embodiments.
It is to be appreciated that the processor in embodiments of the invention may be a central processing unit (centralprocessing unit, CPU), other general purpose processor, digital signal processor (digital signalprocessor, DSP), application specific integrated circuit (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, transistor logic device, hardware components, or any combination thereof. The general purpose processor may be a microprocessor, but in the alternative, it may be any conventional processor.
The method steps in the embodiments of the present invention may be implemented by hardware, or may be implemented by executing software instructions by a processor. The software instructions may be comprised of corresponding software modules that may be stored in random access memory (random access memory, RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable programmable PROM (EPROM), electrically erasable programmable EPROM (EEPROM), registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present invention are merely for ease of description and are not intended to limit the scope of the embodiments of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A method of processing a hash table, the hash table comprising a plurality of hash buckets, each hash bucket storing a plurality of key-value pairs, the method comprising the steps of:
setting a bloom filter in the volatile filter layer; the key of each key value pair stored in the hash bucket is inserted into a bloom filter and is mapped with N bits of data in the bloom filter, wherein N is an integer greater than 1; the data bits mapped by keys of any two key-value pairs in the hash bucket in the bloom filter cannot be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key;
when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, if the corresponding key does not exist, returning a searching failure or deleting failure indication, so as to reduce the access cost to a persistent data layer when the key value pair does not exist in the hash table, and improve the negative query performance of the hash table.
2. The method of claim 1, further comprising the step of:
when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, and if the corresponding key exists, searching or deleting the corresponding key in the hash table according to the corresponding key value pair.
3. The method of claim 1, further comprising the step of:
when a certain key value pair needs to be inserted into the hash table, judging whether the key value pair to be inserted exists in the hash table, and if so, returning an insertion failure indication;
if the hash table does not have the key value pair to be inserted, calculating a corresponding head barrel number according to the key, and locking the hash barrel inner range corresponding to the head barrel number;
searching whether an idle key value pair slot exists in the locked hash bucket or not, if so, inserting the key value pair into the idle slot, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to finish the key value pair insertion; the merging refreshing mode is used for reducing the refreshing times of cache lines and reducing the insertion delay cost of the hash table.
4. A method according to claim 3, characterized in that whether there is a free key value pair slot is searched in the locked hash bucket, if not, a new chain bucket is allocated from the nonvolatile memory space by head insertion and assigned to the temporary bucket pointer, the first slot of the new chain bucket is set as the position of the key value pair to be inserted, the key value pair to be inserted is inserted, and the inserted key value pair is persisted to the nonvolatile memory by adopting a merging refreshing mode; and then assigning the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and persisting the next bucket pointer of the head bucket into the nonvolatile memory by using a cache line refreshing instruction.
5. The method according to any one of claims 1 to 4, further comprising the step of:
and dynamically setting the initial head barrel length of the hash table according to the data load of the persistent data layer, wherein when the data load is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
6. A processing apparatus for a hash table, the hash table comprising a persistent data layer including a plurality of hash buckets, each hash bucket storing a plurality of key-value pairs, the apparatus comprising:
a bloom filter setting unit for setting a bloom filter in the volatile filter layer; the key of each key value pair stored in the hash bucket is inserted into a bloom filter and is mapped with N bits of data in the bloom filter, wherein N is an integer greater than 1; the data bits mapped by keys of any two key-value pairs in the hash bucket in the bloom filter cannot be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key;
and the hash table negative query unit is used for searching corresponding keys in the bloom filter when the value in a certain key value pair in the hash table is required to be searched or deleted, returning a search failure or deletion failure indication if the corresponding key value pair does not exist in the hash table, reducing the access cost to the persistent data layer when the key value pair does not exist in the hash table, and improving the negative query performance of the hash table.
7. The apparatus as recited in claim 6, further comprising:
the hash table inserting unit is used for firstly judging whether a key value pair to be inserted exists in the hash table or not when a certain key value pair needs to be inserted in the hash table, and returning an insertion failure indication if the key value pair to be inserted exists in the hash table; if the hash table does not have the key value pair to be inserted, calculating a corresponding head barrel number according to the key, and locking the hash barrel inner range corresponding to the head barrel number; searching whether an idle key value pair slot exists in the locked hash bucket or not, if so, inserting the key value pair into the idle slot, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to finish the key value pair insertion; the merging refreshing mode is used for reducing the refreshing times of cache lines and reducing the insertion delay cost of the hash table.
8. The apparatus of claim 7, wherein the hash table insertion unit is configured to search whether there is an empty key value pair slot in the locked hash bucket, if not, allocate a new chain bucket from the nonvolatile memory space by using a head insertion method and assign the new chain bucket to the temporary bucket pointer, set a first slot of the new chain bucket as a location where a key value pair is to be inserted, insert the key value pair to be inserted, and persist the inserted key value pair to the nonvolatile memory by using a merge refresh method; and then assigning the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and persisting the next bucket pointer of the head bucket into the nonvolatile memory by using a cache line refreshing instruction.
9. The apparatus according to any one of claims 6 to 8, further comprising:
the head barrel length setting unit is used for dynamically setting the initial head barrel length of the hash table according to the data load capacity of the persistent data layer, when the data load capacity is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
10. An electronic device, comprising:
at least one memory for storing a program;
at least one processor for executing the memory-stored program, which processor is adapted to perform the method according to any of claims 1-5, when the memory-stored program is executed.
CN202310630033.3A 2023-05-26 2023-05-26 Hash table processing method and device and electronic equipment Pending CN116719813A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310630033.3A CN116719813A (en) 2023-05-26 2023-05-26 Hash table processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310630033.3A CN116719813A (en) 2023-05-26 2023-05-26 Hash table processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116719813A true CN116719813A (en) 2023-09-08

Family

ID=87867130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310630033.3A Pending CN116719813A (en) 2023-05-26 2023-05-26 Hash table processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116719813A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117539408A (en) * 2024-01-09 2024-02-09 华中科技大学 Integrated index system for memory and calculation and key value pair memory system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117539408A (en) * 2024-01-09 2024-02-09 华中科技大学 Integrated index system for memory and calculation and key value pair memory system
CN117539408B (en) * 2024-01-09 2024-03-12 华中科技大学 Integrated index system for memory and calculation and key value pair memory system

Similar Documents

Publication Publication Date Title
US10706101B2 (en) Bucketized hash tables with remap entries
US8788543B2 (en) Scalable, concurrent resizing of hash tables
JP6764359B2 (en) Deduplication DRAM memory module and its memory deduplication method
JP6356675B2 (en) Aggregation / grouping operation: Hardware implementation of hash table method
KR101599177B1 (en) Data migration for composite non-volatile storage device
US7805427B1 (en) Integrated search engine devices that support multi-way search trees having multi-column nodes
US9495398B2 (en) Index for hybrid database
KR20170112952A (en) Optimized hopscotch multiple hash tables for efficient memory in-line deduplication application
CN110888886B (en) Index structure, construction method, key value storage system and request processing method
WO2016187974A1 (en) Storage space management method and apparatus
US8086641B1 (en) Integrated search engine devices that utilize SPM-linked bit maps to reduce handle memory duplication and methods of operating same
JP6893805B2 (en) Memory module duplicate memory removal method and DRAM memory module for that purpose
CN110377531B (en) Persistent memory storage engine device based on log structure and control method
US10565190B2 (en) Index tree search method and computer
WO2022095685A1 (en) Persistent memory key value system and operation method therefor
US10936500B1 (en) Conditional cache persistence in database systems
CN116719813A (en) Hash table processing method and device and electronic equipment
CN109165321B (en) Consistent hash table construction method and system based on nonvolatile memory
US7987205B1 (en) Integrated search engine devices having pipelined node maintenance sub-engines therein that support database flush operations
US10013442B2 (en) Database value identifier hash map
US7953721B1 (en) Integrated search engine devices that support database key dumping and methods of operating same
CN113138859A (en) General data storage method based on shared memory pool
WO2016187975A1 (en) Internal memory defragmentation method and apparatus
WO2023029982A1 (en) Method and system for memory allocation
CN114207602A (en) Reducing requests using probabilistic data structures

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination