CN114579558A - Method, device and system for managing hash table and computer storage medium - Google Patents

Publication number
CN114579558A
Authority
CN
China
Prior art keywords
hash
bucket
buckets
value pair
hash bucket
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011362317.1A
Other languages
Chinese (zh)
Inventor
左鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to CN202011362317.1A
Publication of CN114579558A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Abstract

The application discloses a method, an apparatus, and a system for managing a hash table, and a computer storage medium, and belongs to the technical field of data storage. The hash table includes primary hash buckets, which are addressable by a hash function, and backup hash buckets, which are not addressable by a hash function. In response to receiving a write request to write a first key-value pair, a first device uses at least two hash functions to address at least two first primary hash buckets corresponding to the key in the first key-value pair. The first device determines, in the at least two first primary hash buckets and their backup hash buckets, a target empty slot for the first key-value pair, and then writes the first key-value pair into the target empty slot. An insert therefore only needs to address the primary hash buckets corresponding to the key and place the key-value pair in an empty slot of a primary hash bucket or one of its backup hash buckets. No existing key-value pair has to be moved, which improves the storage performance of the device.

Description

Hash table management method, device and system and computer storage medium
Technical Field
The present application relates to the field of data storage technologies, and in particular, to a method, an apparatus, and a system for managing a hash table, and a computer storage medium.
Background
A hash table is a structure that stores data as key-value pairs. A hash function maps a key to a position in the hash table, and the record is accessed at that position, which makes lookups fast.
Currently, servers often store data in a hopscotch hash table (hopscotch hashing). A hopscotch hash table consists of a sequence of consecutive hash buckets. To insert a new key-value pair, the server uses a hash function to calculate the hash bucket position corresponding to the key, and then traverses the L hash buckets immediately following that position to find an empty bucket, where L is a positive integer. If one of the L hash buckets is empty, the new key-value pair is inserted into it. Otherwise, the server keeps probing forward until it finds an empty bucket, then iteratively moves earlier key-value pairs into the empty bucket until an empty bucket appears among the L hash buckets, and finally inserts the new key-value pair into that empty bucket.
Because an insert into a hopscotch hash table can force existing key-value pairs to be moved, and in the worst case the moves can span the entire hash table, the storage performance of the server can be poor.
Disclosure of Invention
The application provides a method, an apparatus, and a system for managing a hash table, and a computer storage medium, which can address the poor storage performance of hash tables in existing servers.
In a first aspect, a method for managing a hash table is provided. The hash table comprises primary hash buckets and backup hash buckets. A primary hash bucket is addressable by a hash function, a backup hash bucket is not addressable by a hash function, and each backup hash bucket is located between two primary hash buckets and shared by them. The method comprises the following steps: in response to receiving a write request to write a first key-value pair, a first device uses at least two hash functions to address at least two first primary hash buckets corresponding to the key in the first key-value pair. The first device determines, in the at least two first primary hash buckets and the backup hash buckets of the at least two first primary hash buckets, a target empty slot for the first key-value pair. The first device then writes the first key-value pair into the target empty slot.
In this application, when a device needs to insert into the hash table, it only needs to address the primary hash buckets corresponding to the key and then place the key-value pair in an empty slot of a primary hash bucket or one of its backup hash buckets. No existing key-value pair has to be moved, which improves the storage performance of the device. Using several hash functions to address several primary hash buckets for the key, and then storing the key-value pair in one of those primary hash buckets or in a backup hash bucket of one of them, balances the load across the hash table.
Optionally, the first device determines the target empty slot as follows: the first device first determines a target hash bucket, which contains one or more empty slots, among the at least two first primary hash buckets and their backup hash buckets; the first device then determines the target empty slot from the unlocked empty slots in the target hash bucket.
Optionally, the first device determines the target hash bucket as follows: if any of the at least two first primary hash buckets contains an empty slot, the first device determines the target hash bucket among the at least two first primary hash buckets. Otherwise, the first device determines the target hash bucket among the backup hash buckets of the at least two first primary hash buckets.
Optionally, to determine the target hash bucket among the at least two first primary hash buckets, the first device selects the primary hash bucket with the largest number of empty slots.
In this application, the first device selects, among the primary hash buckets corresponding to the key, the one with the largest number of empty slots as the target hash bucket, and then writes the key-value pair into an empty slot of that bucket. This balances the load across the hash table and improves its storage performance.
Optionally, to determine the target hash bucket among the backup hash buckets of the at least two first primary hash buckets, the first device selects the backup hash bucket with the largest number of empty slots.
In this application, the first device selects, among the backup hash buckets of the primary hash buckets corresponding to the key, the one with the largest number of empty slots as the target hash bucket, and then writes the key-value pair into an empty slot of that bucket. This balances the load across the hash table and improves its storage performance.
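The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a bucket is modelled as a list whose empty slots are `None`, slot locking is ignored, and the names `empty_count`, `choose_target_bucket` and `insert` are hypothetical rather than taken from the application.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: a bucket is a list of slots; None marks an empty slot.
Bucket = List[Optional[Tuple[str, str]]]


def empty_count(bucket: Bucket) -> int:
    """Number of empty slots in a bucket."""
    return sum(1 for slot in bucket if slot is None)


def choose_target_bucket(primaries: Sequence[Bucket],
                         backups: Sequence[Bucket]) -> Optional[Bucket]:
    """Prefer the primary bucket with the most empty slots; if every
    primary bucket is full, fall back to the backup bucket with the most
    empty slots; return None if all candidate buckets are full."""
    for candidates in (primaries, backups):
        best = max(candidates, key=empty_count, default=None)
        if best is not None and empty_count(best) > 0:
            return best
    return None


def insert(primaries: Sequence[Bucket], backups: Sequence[Bucket],
           key: str, value: str) -> bool:
    """Write the pair into the first empty slot of the target bucket.
    No existing key-value pair is moved."""
    target = choose_target_bucket(primaries, backups)
    if target is None:
        return False
    target[target.index(None)] = (key, value)
    return True
```

Because the fallback to backup buckets happens only when every addressed primary bucket is full, the sketch never relocates a stored pair, which is the property the text contrasts with hopscotch hashing.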
Optionally, the hash table is stored in a second device. In this case, the first device addresses the at least two first primary hash buckets as follows: in response to receiving the write request to write the first key-value pair, the first device uses the at least two hash functions to calculate at least two first hash positions corresponding to the key in the first key-value pair. The first device then sends at least two one-sided remote direct memory access (RDMA) first read requests to the second device, each of which reads the primary hash bucket at one first hash position together with the backup hash bucket of that primary hash bucket. Accordingly, the first device writes the first key-value pair into the target empty slot as follows: the first device sends a one-sided RDMA compare-and-swap (CAS) request to the second device to lock the target empty slot. Once the target empty slot is locked successfully, the first device sends a one-sided RDMA write request to the second device to write the first key-value pair into the target empty slot.
In this application, when the first device needs to insert into the hash table stored in the second device, it first calculates the hash positions corresponding to the key, each of which holds one primary hash bucket. It then reads the primary hash buckets corresponding to the key and their backup hash buckets into local memory using one-sided RDMA, writes the key-value pair into an empty slot of a primary or backup hash bucket locally, and writes the key-value pair back to the second device using one-sided RDMA. The first device thus performs the insert on the hash table in the second device through one-sided RDMA. Because the first device reads only the primary hash buckets corresponding to the key and their backup hash buckets, it needs to read little data locally, few transmission resources are occupied between the first device and the second device, and the insert executes efficiently.
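The read, lock and write sequence can be illustrated with a small in-process simulation. This is a sketch under assumptions, not an RDMA implementation: `RemoteTable` stands in for the second device's memory, its `read`, `compare_and_swap` and `write` methods are hypothetical names that merely mimic one-sided READ, CAS and WRITE operations, and a `threading.Lock` models the atomicity of CAS. How and when a slot's lock is released is not described in this passage, so the sketch leaves the flag set.

```python
import threading
from typing import List, Optional, Tuple

UNLOCKED, LOCKED = 0, 1


class RemoteTable:
    """Stand-in for the second device's memory (hypothetical API)."""

    def __init__(self, num_slots: int) -> None:
        self.flags: List[int] = [UNLOCKED] * num_slots
        self.pairs: List[Optional[Tuple[str, str]]] = [None] * num_slots
        self._atomic = threading.Lock()  # models the atomicity of CAS

    def read(self, start: int, count: int):
        """Return local copies of the flags and pairs in a slot range."""
        end = start + count
        return self.flags[start:end], self.pairs[start:end]

    def compare_and_swap(self, slot: int, expected: int, new: int) -> int:
        """Return the value found; the swap happened iff it equals expected."""
        with self._atomic:
            old = self.flags[slot]
            if old == expected:
                self.flags[slot] = new
            return old

    def write(self, slot: int, pair: Tuple[str, str]) -> None:
        self.pairs[slot] = pair


def remote_insert(table: RemoteTable, start: int, count: int,
                  key: str, value: str) -> Optional[int]:
    """Insert into the slot range [start, start + count) and return the
    slot index used, or None if no slot could be locked."""
    flags, pairs = table.read(start, count)        # 1. read buckets locally
    for offset in range(count):
        if pairs[offset] is None and flags[offset] == UNLOCKED:
            slot = start + offset
            # 2. lock the target empty slot with CAS
            if table.compare_and_swap(slot, UNLOCKED, LOCKED) == UNLOCKED:
                table.write(slot, (key, value))    # 3. write the pair
                return slot
    return None
```

If the CAS fails because another writer locked the slot after the local read, the sketch simply moves on to the next unlocked empty slot in its local copy; the application may handle that case differently.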
Optionally, in response to receiving a read request to read a second key-value pair, the first device uses the at least two hash functions to address at least two second primary hash buckets corresponding to the key in the second key-value pair. The first device looks up the second key-value pair in the at least two second primary hash buckets and the backup hash buckets of the at least two second primary hash buckets.
In this application, when a device needs to query the hash table, it only needs to address the primary hash buckets corresponding to the key and then look up the key-value pair in those primary hash buckets and their backup hash buckets, so queries are efficient.
Optionally, the hash table is stored in the second device. In this case, the first device addresses the at least two second primary hash buckets as follows: in response to receiving the read request to read the second key-value pair, the first device uses the at least two hash functions to calculate at least two second hash positions corresponding to the key in the second key-value pair. The first device then sends at least two one-sided RDMA second read requests to the second device, each of which reads the primary hash bucket at one second hash position together with the backup hash bucket of that primary hash bucket.
In this application, when the first device needs to query the hash table stored in the second device, it first calculates the hash positions corresponding to the key, each of which holds one primary hash bucket. It then reads the primary hash buckets corresponding to the key and their backup hash buckets into local memory using one-sided RDMA and looks up the key-value pair in those buckets. The first device thus performs the query on the hash table in the second device through one-sided RDMA. Because the first device reads only the primary hash buckets corresponding to the key and their backup hash buckets, it needs to read and search little data locally, few transmission resources are occupied between the first device and the second device, and the query executes efficiently.
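The lookup path can be sketched as follows. The sketch assumes that the buckets have already been fetched into a local list and that the caller supplies, for each hash function, the index of the addressed primary bucket and the indices of its backup buckets; the name `lookup` and this calling convention are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, str]
Bucket = List[Optional[Pair]]  # None marks an empty slot


def lookup(table: List[Bucket],
           candidates: List[Tuple[int, List[int]]],
           key: str) -> Optional[str]:
    """Search only the addressed primary buckets and their backups.

    candidates holds one entry per hash function: the index of the
    primary bucket it addresses and the indices of that bucket's backups.
    """
    for primary, backups in candidates:
        for index in [primary, *backups]:
            for slot in table[index]:
                if slot is not None and slot[0] == key:
                    return slot[1]
    return None
```

The number of buckets examined depends only on the number of hash functions and the number of backup buckets per primary bucket, not on the size of the table.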
Optionally, the primary hash buckets and the backup hash buckets in the hash table are arranged alternately; or the hash table includes a plurality of hash bucket groups arranged in sequence, each comprising two primary hash buckets and a backup hash bucket located between them.
A backup hash bucket adjacent to a primary hash bucket is a backup hash bucket of that primary hash bucket.
In a second aspect, a method for managing a hash table is provided. The hash table comprises primary hash buckets and backup hash buckets. A primary hash bucket is addressable by a hash function, a backup hash bucket is not addressable by a hash function, and each backup hash bucket is located between two primary hash buckets and shared by them. The method comprises the following steps: in response to receiving a read request to read a first key-value pair, a first device uses at least two hash functions to address at least two first primary hash buckets corresponding to the key in the first key-value pair. The first device looks up the first key-value pair in the at least two first primary hash buckets and the backup hash buckets of the at least two first primary hash buckets.
Optionally, the hash table is stored in a second device. In this case, the first device addresses the at least two first primary hash buckets as follows: in response to receiving the read request to read the first key-value pair, the first device uses the at least two hash functions to calculate at least two first hash positions corresponding to the key in the first key-value pair. The first device then sends at least two one-sided RDMA first read requests to the second device, each of which reads the primary hash bucket at one first hash position together with the backup hash bucket of that primary hash bucket.
Optionally, in response to receiving a write request to write a second key-value pair, the first device uses the at least two hash functions to address at least two second primary hash buckets corresponding to the key in the second key-value pair. The first device determines, in the at least two second primary hash buckets and the backup hash buckets of the at least two second primary hash buckets, a target empty slot for the second key-value pair. The first device then writes the second key-value pair into the target empty slot.
Optionally, the first device determines the target empty slot as follows: the first device first determines a target hash bucket, which contains one or more empty slots, among the at least two second primary hash buckets and their backup hash buckets; the first device then determines the target empty slot from the unlocked empty slots in the target hash bucket.
Optionally, the first device determines the target hash bucket as follows: if any of the at least two second primary hash buckets contains an empty slot, the first device determines the target hash bucket among the at least two second primary hash buckets. Otherwise, the first device determines the target hash bucket among the backup hash buckets of the at least two second primary hash buckets.
Optionally, to determine the target hash bucket among the at least two second primary hash buckets, the first device selects the primary hash bucket with the largest number of empty slots.
Optionally, to determine the target hash bucket among the backup hash buckets of the at least two second primary hash buckets, the first device selects the backup hash bucket with the largest number of empty slots.
Optionally, the hash table is stored in the second device. In this case, the first device addresses the at least two second primary hash buckets as follows: in response to receiving the write request to write the second key-value pair, the first device uses the at least two hash functions to calculate at least two second hash positions corresponding to the key in the second key-value pair. The first device then sends at least two one-sided RDMA second read requests to the second device, each of which reads the primary hash bucket at one second hash position together with the backup hash bucket of that primary hash bucket.
Accordingly, the first device writes the second key-value pair into the target empty slot as follows: the first device sends a one-sided RDMA CAS request to the second device to lock the target empty slot. Once the target empty slot is locked successfully, the first device sends a one-sided RDMA write request to the second device to write the second key-value pair into the target empty slot.
Optionally, the primary hash buckets and the backup hash buckets in the hash table are arranged alternately; or the hash table includes a plurality of hash bucket groups arranged in sequence, each comprising two primary hash buckets and a backup hash bucket located between them.
In a third aspect, an apparatus for managing a hash table is provided. The apparatus comprises a plurality of functional modules that interact to implement the method of the first aspect and its embodiments described above. The functional modules can be implemented based on software, hardware or a combination of software and hardware, and the functional modules can be combined or divided arbitrarily based on specific implementation.
In a fourth aspect, an apparatus for managing a hash table is provided. The apparatus comprises a plurality of functional modules that interact to implement the method of the second aspect and its embodiments described above. The functional modules can be implemented based on software, hardware or a combination of software and hardware, and the functional modules can be combined or divided arbitrarily based on specific implementation.
In a fifth aspect, a hash table management system is provided, including a first device and a second device. The second device stores a hash table that comprises primary hash buckets and backup hash buckets, where a primary hash bucket is addressable by a hash function, a backup hash bucket is not addressable by a hash function, and each backup hash bucket is located between two primary hash buckets and shared by them.
The first device is configured to, in response to receiving a write request to write a first key-value pair, use at least two hash functions to calculate at least two first hash positions corresponding to the key in the first key-value pair. The first device is configured to send at least two one-sided RDMA first read requests to the second device, each of which reads the primary hash bucket at one first hash position together with the backup hash bucket of that primary hash bucket. The first device is configured to determine, in the hash buckets that were read, a target empty slot for the first key-value pair. The first device is configured to send a one-sided RDMA atomic CAS request to the second device to lock the target empty slot. The first device is configured to, once the target empty slot is locked successfully, send a one-sided RDMA write request to the second device to write the first key-value pair into the target empty slot. And/or, the first device is configured to, in response to receiving a read request to read a second key-value pair, use the at least two hash functions to calculate at least two second hash positions corresponding to the key in the second key-value pair. The first device is further configured to send at least two one-sided RDMA second read requests to the second device, each of which reads the primary hash bucket at one second hash position together with the backup hash bucket of that primary hash bucket. The first device is further configured to look up the second key-value pair in the hash buckets that were read.
In a sixth aspect, there is provided an apparatus comprising: a processor and a memory;
the memory for storing a computer program, the computer program comprising program instructions;
the processor is configured to invoke the computer program to implement the method in the first aspect and the embodiments thereof or to implement the method in the second aspect and the embodiments thereof.
In a seventh aspect, a computer storage medium is provided, which has instructions stored thereon, which when executed by a processor, implement the method of the first aspect and its embodiments or implement the method of the second aspect and its embodiments.
In an eighth aspect, a chip is provided, which comprises programmable logic circuits and/or program instructions, and when the chip is run, implements the method of the first aspect and its embodiments or implements the method of the second aspect and its embodiments.
The technical solutions provided in this application bring at least the following benefits:
The application provides a new hash table structure that includes primary hash buckets and backup hash buckets adjacent to the primary hash buckets. When a device needs to insert into the hash table, it only needs to address the primary hash buckets corresponding to the key and then place the key-value pair in an empty slot of a primary hash bucket or one of its backup hash buckets. No existing key-value pair has to be moved, which improves the storage performance of the device. Using several hash functions to address several primary hash buckets for the key, and then storing the key-value pair in one of those primary hash buckets or in a backup hash bucket of one of them, balances the load across the hash table. When a device needs to query the hash table, it only needs to address the primary hash buckets corresponding to the key and then look up the key-value pair in those primary hash buckets and their backup hash buckets, so queries are efficient. In addition, one device can perform insert and/or query operations on a hash table stored in another device through one-sided RDMA, which occupies few transmission resources between the two devices and executes efficiently.
Drawings
Fig. 1 is a schematic structural diagram of a hash table according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of another hash table according to an embodiment of the present application;
Fig. 3 is a flowchart of a method for managing a hash table according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a hash table management system according to an embodiment of the present application;
Fig. 5 is a flowchart of an insert operation on an RDMA-oriented hash table according to an embodiment of the present application;
Fig. 6 is a flowchart of a query operation on an RDMA-oriented hash table according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a hash table management apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of another hash table management apparatus according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a hash table management apparatus according to another embodiment of the present application;
Fig. 10 is a schematic structural diagram of a hash table management apparatus according to another embodiment of the present application;
Fig. 11 is a block diagram of an apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
An embodiment of the application provides a storage structure for a hash table. The hash table includes primary hash buckets and backup hash buckets (collectively referred to as hash buckets). A primary hash bucket is addressable by a hash function, and a backup hash bucket is not addressable by a hash function. Each backup hash bucket is located between two primary hash buckets and shared by them. In this embodiment, a primary hash bucket may use a backup hash bucket adjacent to it as its own backup hash bucket.
Optionally, each hash bucket in the hash table provided in this embodiment includes a plurality of slots, and the number of slots in a primary hash bucket may be the same as or different from the number of slots in a backup hash bucket. Each slot in a hash bucket stores one key-value pair, which includes a key and a value. Each slot may also be provided with a flag bit that serves as a lock. For example, a flag bit of 1 indicates that the slot is locked, and a flag bit of 0 indicates that the slot is not locked. Once a slot is locked, devices other than the one that locked it can no longer operate on the slot.
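A minimal Python model of this slot and bucket structure might look as follows. The class and field names (`Slot`, `Bucket`, `locked`, `unlocked_empty_slots`) are assumptions made for illustration; the application does not prescribe a concrete memory layout here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    key: Optional[str] = None
    value: Optional[str] = None
    locked: int = 0  # flag bit: 1 = locked, 0 = not locked

    def is_empty(self) -> bool:
        return self.key is None


@dataclass
class Bucket:
    slots: List[Slot] = field(default_factory=list)

    @classmethod
    def with_slots(cls, n: int) -> "Bucket":
        """Create a bucket with n empty, unlocked slots."""
        return cls([Slot() for _ in range(n)])

    def unlocked_empty_slots(self) -> List[int]:
        """Indices of slots that are both empty and not locked."""
        return [i for i, s in enumerate(self.slots)
                if s.is_empty() and s.locked == 0]
```

Primary and backup buckets can share this one type, since the text allows them to have the same or different slot counts.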
In one possible implementation, the primary hash buckets and the backup hash buckets in the hash table are arranged alternately. Optionally, Fig. 1 is a schematic structural diagram of a hash table provided in an embodiment of the present application. As shown in Fig. 1, the hash table includes N hash buckets, denoted hash bucket 1 to hash bucket N, where N is an odd number greater than 1. The odd-numbered hash buckets are primary hash buckets, that is, hash bucket 1, hash bucket 3, hash bucket 5, …, and hash bucket N; the even-numbered hash buckets are backup hash buckets, that is, hash bucket 2, hash bucket 4, hash bucket 6, …, and hash bucket N-1.
In the hash table shown in FIG. 1, hash bucket 1 and hash bucket 3 share hash bucket 2, hash bucket 3 and hash bucket 5 share hash bucket 4, …, and hash bucket N-2 and hash bucket N share hash bucket N-1. That is, the standby hash buckets for hash bucket 1 include hash bucket 2, the standby hash buckets for hash bucket 3 include hash bucket 2 and hash bucket 4, …, and the standby hash bucket for hash bucket N includes hash bucket N-1, i.e., the other primary hash buckets except hash bucket 1 and hash bucket N each have two standby hash buckets.
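For the alternating layout of Fig. 1, the backup buckets of a primary bucket follow directly from its number. The helper below is a small sketch that uses the same 1-based numbering as the text; the function names are illustrative.

```python
from typing import List


def is_primary(i: int) -> bool:
    """In the alternating layout, odd-numbered buckets are primary."""
    return i % 2 == 1


def backups_of(i: int, n: int) -> List[int]:
    """Backup buckets of primary bucket i in a table of n buckets
    (n odd and greater than 1, buckets numbered from 1)."""
    if not 1 <= i <= n or not is_primary(i):
        raise ValueError("i must be a primary bucket number in 1..n")
    return [j for j in (i - 1, i + 1) if 1 <= j <= n]
```

With n = 7, `backups_of(1, 7)` gives `[2]`, `backups_of(3, 7)` gives `[2, 4]`, and `backups_of(7, 7)` gives `[6]`, matching the description that only the first and last primary buckets have a single backup bucket.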
In another possible implementation, the hash table includes a plurality of hash bucket groups arranged in sequence, and each hash bucket group includes two primary hash buckets and a standby hash bucket located between the two primary hash buckets. Optionally, fig. 2 is a schematic structural diagram of another hash table provided in the embodiment of the present application. As shown in fig. 2, the hash table includes M hash buckets, denoted as hash bucket 1 to hash bucket M, where M is an integer multiple of 3. Hash bucket 1, hash bucket 2, and hash bucket 3 form a hash bucket group, in which hash bucket 1 and hash bucket 3 are primary hash buckets and hash bucket 2 is a standby hash bucket shared by hash bucket 1 and hash bucket 3; hash bucket 4, hash bucket 5, and hash bucket 6 form a hash bucket group, in which hash bucket 4 and hash bucket 6 are primary hash buckets and hash bucket 5 is a standby hash bucket shared by hash bucket 4 and hash bucket 6; …; and hash bucket M-2, hash bucket M-1, and hash bucket M form a hash bucket group, in which hash bucket M-2 and hash bucket M are primary hash buckets and hash bucket M-1 is a standby hash bucket shared by hash bucket M-2 and hash bucket M.
In the hash table shown in fig. 2, the standby hash bucket for hash bucket 1 and hash bucket 3 is hash bucket 2, the standby hash bucket for hash bucket 4 and hash bucket 6 is hash bucket 5, …, and the standby hash bucket for hash bucket M-2 and hash bucket M is hash bucket M-1, i.e., each primary hash bucket has exactly one standby hash bucket.
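The grouped layout of fig. 2 can be sketched in the same way. The function name `group_standby` is hypothetical; the sketch only encodes the rule that, within each group of three consecutive buckets, the middle bucket is the standby bucket of the two outer buckets.

```python
def group_standby(primary: int, m: int) -> int:
    """Return the single standby bucket of a primary bucket (1-based)."""
    if m < 3 or m % 3 != 0:
        raise ValueError("m must be a positive integer multiple of 3")
    position = (primary - 1) % 3  # 0: first primary, 1: standby, 2: second primary
    if not 1 <= primary <= m or position == 1:
        raise ValueError("primary must be the first or third bucket of a group")
    return primary + 1 if position == 0 else primary - 1
```

With M = 9, hash bucket 7 is the first bucket of the group (7, 8, 9), so its standby bucket is hash bucket 8.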
Based on the hash table provided by the embodiment of the application, the embodiment of the application also provides a management method for the hash table. In the method for managing the hash table, at least two hash functions are used for addressing at least two main hash buckets corresponding to a keyword, that is, at least two hash functions are used for respectively hashing the keyword into at least two main hash buckets. One hash function is used for hashing the key into one main hash bucket, and different hash functions are used for hashing the same key into different hash buckets.
The management method of the hash table provided by the embodiment of the application mainly comprises an insertion operation flow and a query operation flow of the hash table. Optionally, fig. 3 is a schematic flowchart of a method for managing a hash table according to an embodiment of the present application. Steps 301 to 303 show an insertion operation flow of the hash table; steps 304 to 305 show the flow of the hash table lookup operation. As shown in fig. 3, the method includes:
Step 301, in response to receiving a write request requesting to write a first key-value pair, the first device addresses at least two first primary hash buckets corresponding to the key in the first key-value pair by using at least two hash functions.
The write request includes a first key-value pair that includes a key and a value. The at least two hash functions include two or more hash functions, each hash function for addressing a master hash bucket corresponding to the key.
Optionally, the at least two hash functions provided in the embodiment of the present application have different addressing ranges in the hash table. For example, the at least two hash functions include two hash functions, one addressing the first half of the hash table and the other addressing the second half of the hash table. As another example, the at least two hash functions include three hash functions, the first addressing the first third of the hash table, the second addressing the middle third, and the third addressing the last third. The number and the design of the hash functions used for addressing in the hash table are not limited in the embodiment of the application.
The embodiment of the present application takes an example in which two hash functions are used to address two main hash buckets corresponding to a keyword in a hash table respectively. Optionally, the at least two hash functions comprise a first hash function and a second hash function. After receiving a write request containing a first key value pair, the first device calculates a hash value 1 corresponding to a keyword in the first key value pair by using a first hash function, and then addresses a first main hash bucket in a hash table according to the hash value 1; and calculating a hash value 2 corresponding to the key in the first key value pair by adopting a second hash function, and then addressing another first primary hash bucket in the hash table according to the hash value 2. Wherein one hash value corresponds to one hash position in the hash table. Optionally, the first hash function is used to address the first half of the hash table and the second hash function is used to address the second half of the hash table.
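The two-function addressing described above can be sketched as follows for the alternating layout of fig. 1. The application does not specify the hash functions; the salted SHA-256 digests, the helper `_digest`, and the function `address_primaries` are assumptions made purely for illustration.

```python
import hashlib
from typing import Tuple


def _digest(key: bytes, salt: bytes) -> int:
    """Hypothetical hash: the first 8 bytes of a salted SHA-256 digest."""
    return int.from_bytes(hashlib.sha256(salt + key).digest()[:8], "big")


def address_primaries(key: bytes, n: int) -> Tuple[int, int]:
    """Return two primary buckets (1-based) for a key in an n-bucket table."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd number greater than 1")
    primaries = list(range(1, n + 1, 2))  # odd-numbered buckets only
    half = len(primaries) // 2
    first_half, second_half = primaries[:half], primaries[half:]
    # The first function addresses the first half, the second the second half,
    # so the two results are always different buckets.
    bucket_1 = first_half[_digest(key, b"h1") % len(first_half)]
    bucket_2 = second_half[_digest(key, b"h2") % len(second_half)]
    return bucket_1, bucket_2
```

Because the two addressing ranges do not overlap, the same key is never hashed to the same primary bucket twice.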
Step 302, the first device determines a target empty slot to write the first key-value pair in the at least two first primary hash buckets and a backup hash bucket of the at least two first primary hash buckets.
Optionally, the backup hash buckets of the at least two first primary hash buckets include all backup hash buckets of respective first primary hash buckets of the at least two first primary hash buckets. For example, referring to the hash table as shown in fig. 1 or fig. 2, the at least two first primary hash buckets corresponding to the keys in the first key value pair include hash bucket 3 and hash bucket 7. In the hash table shown in fig. 1, the spare hash buckets of the at least two first primary hash buckets include hash bucket 2, hash bucket 4, hash bucket 6, and hash bucket 8, and step 302 is: the first device determines a target empty slot in hash bucket 2, hash bucket 3, hash bucket 4, hash bucket 6, hash bucket 7, and hash bucket 8 to write the first key-value pair. In the hash table shown in fig. 2, the spare hash buckets of the at least two first primary hash buckets include hash bucket 2 and hash bucket 8, and step 302 is: the first device determines a target empty slot to write the first key-value pair in hash bucket 2, hash bucket 3, hash bucket 7, and hash bucket 8.
In this embodiment of the present application, after obtaining at least two first main hash buckets and standby hash buckets thereof corresponding to a keyword, a first device first queries whether the keyword exists in the at least two first main hash buckets and standby hash buckets thereof, if so, returns an indication that a duplicate keyword exists, and if not, then executes step 302. Optionally, the implementation process of step 302 includes the following steps 3021 to 3022:
in step 3021, the first device determines a target hash bucket among the at least two first primary hash buckets and the spare hash buckets of the at least two first primary hash buckets, the target hash bucket including one or more empty slots.
In a first possible case, the implementation procedure of step 3021 includes: in response to at least one of the at least two first primary hash buckets including an empty slot, the first device determines the target hash bucket among the at least two first primary hash buckets.
Optionally, the first device determines, as the target hash bucket, the hash bucket with the largest number of empty slots in the at least two first master hash buckets. For example, continuing with the example above, if the number of empty slots in hash bucket 3 is greater than the number of empty slots in hash bucket 7, then the first device determines hash bucket 3 as the target hash bucket; if the number of empty slots in the hash bucket 3 is less than the number of empty slots in the hash bucket 7, the first device determines the hash bucket 7 as a target hash bucket; if the number of empty slots in hash bucket 3 is equal to the number of empty slots in hash bucket 7, the first device determines hash bucket 3 or hash bucket 7 as the target hash bucket.
In the embodiment of the application, the first device determines the main hash bucket with the largest number of empty slots in the plurality of main hash buckets corresponding to the keyword as the target hash bucket, and subsequently writes the key value pair in the empty slot of the target hash bucket, so that load balance of the hash table can be realized, and the storage performance of the hash table is better.
Alternatively, in response to the existence of a hash bucket including an empty slot in the at least two first master hash buckets, the first device may also determine any hash bucket including an empty slot in the at least two first master hash buckets as the target hash bucket.
In a second possible case, the implementation procedure of step 3021 includes: in response to none of the at least two first primary hash buckets including an empty slot, the first device determines the target hash bucket among the standby hash buckets of the at least two first primary hash buckets.
Optionally, the first device determines, as the target hash bucket, a hash bucket with the largest number of empty slots in the spare hash buckets of the at least two first primary hash buckets. For example, continuing with the above example, the number of empty slots of hash bucket 3 and hash bucket 7 are both 0, and assuming that the hash table is as shown in fig. 2, if the number of empty slots in hash bucket 2 is greater than the number of empty slots in hash bucket 8, the first device determines hash bucket 2 as the target hash bucket; if the number of empty slots in the hash bucket 2 is less than that of empty slots in the hash bucket 8, the first device determines the hash bucket 8 as a target hash bucket; if the number of empty slots in hash bucket 2 is equal to the number of empty slots in hash bucket 8, the first device determines hash bucket 2 or hash bucket 8 as the target hash bucket.
In the embodiment of the application, the first device determines the standby hash bucket with the largest number of empty slots as the target hash bucket in the standby hash buckets of the main hash buckets corresponding to the keywords, and subsequently writes the key value pairs in the empty slots of the target hash bucket, so that load balance of the hash table can be realized, and the storage performance of the hash table is better.
Alternatively, in response to none of the at least two first primary hash buckets including an empty slot, the first device may determine, as the target hash bucket, any standby hash bucket of the at least two first primary hash buckets that includes an empty slot.
In a third possible case, the implementation procedure of step 3021 includes: in response to at least one hash bucket among the at least two first primary hash buckets and their standby hash buckets including an empty slot, the first device determines, as the target hash bucket, any one of these hash buckets that includes an empty slot.
Optionally, when no hash bucket in the at least two first main hash buckets includes an empty slot, and no hash bucket in the standby hash buckets of the at least two first main hash buckets includes an empty slot, that is, neither the at least two first main hash buckets nor the standby hash buckets thereof has an empty slot, the first device determines that the first key value pair cannot be written into the hash table. Further, the first device may return a prompt of insertion failure for the write request requesting to write the first key-value pair, and/or issue a capacity expansion prompt for the hash table.
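The first and second cases of step 3021, together with the failure case, can be sketched as below. The table representation (a dictionary mapping a bucket number to a list of slots, with `None` marking an empty slot) and the names `empty_count` and `choose_target_bucket` are assumptions for illustration only.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Entry = Optional[Tuple[str, Any]]  # None marks an empty slot


def empty_count(bucket: List[Entry]) -> int:
    """Count the empty slots in one bucket."""
    return sum(1 for entry in bucket if entry is None)


def choose_target_bucket(table: Dict[int, List[Entry]],
                         primaries: Sequence[int],
                         standbys: Sequence[int]) -> Optional[int]:
    """Prefer the emptiest primary bucket, then the emptiest standby bucket."""
    for candidates in (primaries, standbys):
        best = max(candidates, key=lambda b: empty_count(table[b]), default=None)
        if best is not None and empty_count(table[best]) > 0:
            return best
    return None  # no empty slot anywhere: the insertion fails
```

When two buckets have the same number of empty slots, `max` returns the first one, which is one of the choices the text permits.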
In step 3022, the first device determines the target empty slot from the unlocked empty slot in the target hash bucket.
Optionally, after determining the target hash bucket, the first device searches for empty slots in the target hash bucket in sequence, and if the searched empty slot is locked, continues to search for a next empty slot until an empty slot that is not locked is searched and determined as the target empty slot. If all empty slots in the target hash bucket are locked, the first device determines that the first key value pair cannot be written into the hash table; or, the first device may also replace the target hash bucket, determine, as a new target hash bucket, one of the at least two first primary hash buckets and the backup hash buckets of the at least two first primary hash buckets, except for the target hash bucket, that includes an empty slot, and then determine, in the new target hash bucket, the target empty slot.
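The sequential search of step 3022 can be sketched as follows. Each slot is modelled as a hypothetical `(flag, entry)` tuple, where a flag of 1 means locked and an entry of `None` means empty; this representation is an assumption, not the slot format of the application.

```python
from typing import Any, List, Optional, Tuple

SlotState = Tuple[int, Optional[Tuple[str, Any]]]  # (lock flag, entry)


def find_target_slot(bucket: List[SlotState]) -> Optional[int]:
    """Return the index of the first empty slot that is not locked."""
    for index, (flag, entry) in enumerate(bucket):
        if entry is None and flag == 0:
            return index
    return None  # every empty slot is locked, or the bucket is full
```

A return value of `None` corresponds to the situation in which the first device either reports failure or switches to another target hash bucket.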
Optionally, in step 302, after addressing at least two first primary hash buckets corresponding to the keywords in the first key value pair and the standby hash buckets of the at least two first primary hash buckets, the first device may also directly search for empty slots in the at least two first primary hash buckets and the standby hash buckets thereof, so as to obtain the target empty slot. For example, the first device may sequentially search for empty slots in the at least two first primary hash buckets and the standby hash buckets thereof according to the arrangement order of the at least two first primary hash buckets and the standby hash buckets thereof in the hash table. The embodiment of the present application does not limit the manner of obtaining the target empty slot from the at least two first main hash buckets and the standby hash buckets thereof.
Step 303, the first device writes the first key-value pair into the target empty slot.
After determining the target empty slot, the first device locks the target empty slot atomically, for example, by setting the flag bit of the target empty slot to 1. The first device then writes the first key-value pair into the target empty slot. After writing the first key-value pair into the target empty slot, the first device unlocks the slot, for example, by setting the flag bit of the slot back to 0.
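The lock, write, and unlock sequence of step 303 can be sketched with a compare-and-swap on the flag bit. The `Slot` class and `write_pair` function are hypothetical; a `threading.Lock` stands in for the hardware atomicity that a real implementation would rely on, and the re-check that the slot is still empty is an added safeguard not stated in the text.

```python
import threading
from typing import Any, Optional, Tuple


class Slot:
    """One slot with a lock flag (0 = unlocked, 1 = locked) and an entry."""

    def __init__(self) -> None:
        self.flag = 0
        self.entry: Optional[Tuple[str, Any]] = None
        self._guard = threading.Lock()  # stand-in for an atomic instruction

    def compare_and_swap_flag(self, expected: int, new: int) -> bool:
        """Atomically set the flag to `new` if it currently equals `expected`."""
        with self._guard:
            if self.flag != expected:
                return False
            self.flag = new
            return True


def write_pair(slot: Slot, key: str, value: Any) -> bool:
    """Lock the slot, write the key-value pair, then unlock the slot."""
    if not slot.compare_and_swap_flag(0, 1):
        return False  # another writer holds the lock
    try:
        if slot.entry is not None:  # assumption: re-check emptiness under lock
            return False
        slot.entry = (key, value)
        return True
    finally:
        slot.compare_and_swap_flag(1, 0)  # unlock in every case
```

A writer that fails to acquire the flag leaves the slot untouched and can move on to another empty slot.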
The embodiment of the application provides a new hash table structure, in which the hash table includes primary hash buckets and standby hash buckets adjacent to the primary hash buckets. When a device needs to perform an insertion operation on the hash table, it only needs to address the primary hash buckets corresponding to the key and then insert the key-value pair into an empty slot of a primary hash bucket or of its standby hash bucket; no existing key-value pair needs to be moved, which improves the storage performance of the device. In addition, load balancing of the hash table can be achieved by using a plurality of hash functions to address a plurality of primary hash buckets corresponding to the key and then storing the key-value pair in one of those primary hash buckets or in a standby hash bucket of one of them.
In step 304, in response to receiving a read request requesting to read the second key value pair, the first device addresses at least two second primary hash buckets corresponding to the keys in the second key value pair using at least two hash functions.
The read request includes a key used to index the second key-value pair requested to be read by the read request, i.e., the key in the second key-value pair. The implementation manner of this step can refer to the related explanation of step 301, and details of the embodiment of the present application are not repeated herein.
Step 305, the first device queries the at least two second primary hash buckets and the spare hash buckets of the at least two second primary hash buckets for second key-value pairs.
Optionally, the first device first queries the at least two second primary hash buckets for the second key-value pair; if the second key-value pair is not found in the at least two second primary hash buckets, the first device queries the standby hash buckets of the at least two second primary hash buckets. If the first device finds the second key-value pair in neither the at least two second primary hash buckets nor their standby hash buckets, it determines that the second key-value pair is not stored in the hash table. Further, the first device may return, for the read request, a prompt indicating that the query failed or that the second key-value pair does not exist.
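The query order of step 305 (primary hash buckets first, standby hash buckets second) can be sketched as follows. The table representation and the name `lookup` are assumptions for illustration, and the sketch uses `None` to signal an absent key, so it assumes that stored values are never `None`.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Entry = Optional[Tuple[str, Any]]  # None marks an empty slot


def lookup(table: Dict[int, List[Entry]], key: str,
           primaries: Sequence[int], standbys: Sequence[int]) -> Optional[Any]:
    """Search the primary buckets first, then their standby buckets."""
    for bucket_id in list(primaries) + list(standbys):
        for entry in table[bucket_id]:
            if entry is not None and entry[0] == key:
                return entry[1]
    return None  # the key is not stored in the hash table
```

Only the buckets addressed for the key and their standby buckets are scanned, which is the source of the query efficiency noted in the text.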
In the embodiment of the application, when the device needs to perform query operation on the hash table, only the main hash bucket corresponding to the keyword needs to be addressed, and then the key value pair is queried in the main hash bucket and the standby hash bucket thereof, so that the query efficiency is high.
The order of steps of the management method for the hash table provided by the embodiment of the application can be properly adjusted, and the steps can be correspondingly increased or decreased according to the situation. Step 301 to step 303 (the hash table inserting operation flow) and step 304 to step 305 (the hash table querying operation flow) do not have a step precedence relationship or a step association relationship, and the first device may only execute step 301 to step 303, or may only execute step 304 to step 305, or may execute both step 301 to step 303 and step 304 to step 305. Any method that can be easily modified by those skilled in the art within the technical scope of the present disclosure is also intended to be covered by the present disclosure.
In the above embodiment, the hash table is stored in the first device, and the first device performs the insertion operation and/or the query operation on the hash table; that is, the first device manages a local hash table. The first device may be a server. The embodiment of the application can also implement management of a remote hash table by the first device, that is, the hash table is stored in the second device, and the first device performs an insertion operation and/or a query operation on the hash table in the second device. In the embodiment of the application, the first device performs the insertion operation and/or the query operation on the hash table in the second device by using one-sided remote direct memory access (RDMA). Because the first device uses one-sided RDMA, these operations are completed without the assistance of a processor in the second device.
Optionally, fig. 4 is a schematic structural diagram of a management system of a hash table provided in an embodiment of the present application. As shown in fig. 4, the management system includes: a first device 101 and a second device 102, wherein the second device 102 stores a hash table. The first device 101 may be a server, and the second device 102 may also be a server. The first device 101 is connected with the second device 102 through an RDMA network, and the first device 101 performs a query operation and/or an insert operation on a hash table stored in the second device 102 through the RDMA network.
In an optional embodiment of the present application, the first device 101 is configured to perform an insertion operation on a hash table in the second device 102, and the specific implementation process includes: the first device 101 is configured to, in response to receiving a write request requesting to write a first key value pair, calculate at least two first hash positions corresponding to the keywords in the first key value pair by using at least two hash functions, respectively. The first device 101 is configured to send at least two unilateral RDMA first read requests to the second device 102, where each first read request is configured to read a primary hash bucket of a first hash location and a backup hash bucket of the primary hash bucket. The first device 101 is configured to determine a target empty slot in the read hash bucket to write the first key-value pair. The first device 101 is configured to send a unilateral RDMA CAS request to the second device 102, the CAS request being configured to lock a target empty slot. The first device 101 is configured to send a single-sided RDMA write request to the second device 102 in response to a successful lock on the target empty slot, the write request to write the first key-value pair in the target empty slot.
In the embodiment of the application, when the first device needs to perform an insertion operation on the hash table in the second device, the hash positions corresponding to the keywords are calculated first, each hash position is provided with one main hash bucket, then the main hash bucket and the standby hash bucket corresponding to the keywords are read locally in a unilateral RDMA manner, key value pairs are written into empty slots of the main hash bucket or the standby hash bucket locally, and then written back to the second device in a unilateral RDMA manner, so that the insertion operation of the first device on the hash table in the second device in a unilateral RDMA manner is realized. Because the first device pointedly reads the main hash bucket corresponding to the keyword and the standby hash bucket thereof, the amount of data which needs to be read locally by the first device is small, transmission resources which need to be occupied between the first device and the second device are small, and the execution efficiency of the inserting operation is high.
Optionally, fig. 5 is a schematic diagram of an operation flow of inserting an RDMA-oriented hash table according to an embodiment of the present application. As shown in fig. 5, the process includes:
Step 501, in response to receiving a write request requesting to write a first key-value pair, the first device calculates at least two first hash positions corresponding to the key in the first key-value pair by using at least two hash functions.
For the explanation of this step, reference may be made to the related content of step 301, and details of the embodiment of this application are not repeated herein.
Step 502, the first device sends at least two unilateral RDMA first read requests to the second device, where each first read request is used to read a primary hash bucket at a first hash position and a backup hash bucket of the primary hash bucket.
Optionally, the first device concurrently sends the at least two one-sided RDMA first read requests to the second device. Since a primary hash bucket and its standby hash bucket are stored in adjacent positions, one read request can read a primary hash bucket together with its standby hash bucket.
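Because adjacent buckets occupy a contiguous memory region, the region covered by one read request can be computed from the bucket number. The sketch below assumes the alternating layout of fig. 1, fixed-size buckets, and a table stored contiguously from a base address; the name `read_range` and the parameters `bucket_bytes` and `base` are hypothetical.

```python
from typing import Tuple


def read_range(primary: int, n: int, bucket_bytes: int, base: int = 0) -> Tuple[int, int]:
    """Return (offset, length) covering a primary bucket and its neighbours."""
    if not 1 <= primary <= n or primary % 2 == 0:
        raise ValueError("primary must be an odd bucket number in 1..n")
    first = max(primary - 1, 1)  # left standby bucket, if any
    last = min(primary + 1, n)   # right standby bucket, if any
    offset = base + (first - 1) * bucket_bytes
    length = (last - first + 1) * bucket_bytes
    return offset, length
```

For 64-byte buckets and N = 9, the read for hash bucket 3 covers hash buckets 2 to 4, that is, 192 bytes starting at offset 64.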
Step 503, the first device determines, in the read hash buckets, a target empty slot into which the first key-value pair is to be written.
In the embodiment of the present application, the master hash bucket read at the first hash position is referred to as a first master hash bucket. For the explanation of this step, reference may be made to the related content of the step 302, and the description of the embodiment of this application is not repeated herein.
Step 504, the first device sends a single-sided RDMA CAS request to the second device.
The CAS request is used to lock the target empty slot.
Step 505, in response to the target empty slot being successfully locked, the first device sends a one-sided RDMA write request to the second device, where the write request is used for writing the first key-value pair into the target empty slot.
The implementation process of step 505 includes: in response to the target empty slot being successfully locked, the first device locally writes the first key-value pair into the target empty slot, and then writes the slot containing the first key-value pair back to the second device by using a one-sided RDMA write request. The write request is also used to lock down the slot written back to the second device (i.e., the aforementioned target empty slot).
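Steps 502 to 505 can be sketched against an in-process stand-in for the second device's memory. `SimulatedRemoteTable` is not a real RDMA API; its `read`, `cas_flag`, and `write` methods merely imitate one-sided READ, CAS, and WRITE operations. The sketch issues the reads sequentially rather than concurrently, takes the first unlocked empty slot, and assumes that the written-back slot carries a cleared flag by analogy with the unlock in step 303; the text's wording on the lock state carried by the write request is ambiguous.

```python
import copy
from typing import Any, List, Optional, Sequence, Tuple


class SimulatedRemoteTable:
    """In-process stand-in for remote memory; not a real RDMA interface."""

    def __init__(self, num_slots: int) -> None:
        self.slots = [{"flag": 0, "entry": None} for _ in range(num_slots)]

    def read(self, start: int, count: int) -> List[dict]:
        """Imitates a one-sided READ of a contiguous run of slots."""
        return copy.deepcopy(self.slots[start:start + count])

    def cas_flag(self, index: int, expected: int, new: int) -> bool:
        """Imitates a one-sided CAS on the lock flag of one slot."""
        if self.slots[index]["flag"] != expected:
            return False
        self.slots[index]["flag"] = new
        return True

    def write(self, index: int, slot: dict) -> None:
        """Imitates a one-sided WRITE of one whole slot."""
        self.slots[index] = copy.deepcopy(slot)


def remote_insert(remote: SimulatedRemoteTable,
                  ranges: Sequence[Tuple[int, int]],
                  key: str, value: Any) -> Optional[int]:
    """ranges holds one (start, count) slot range per read request."""
    for start, count in ranges:                      # step 502
        local = remote.read(start, count)
        for offset, slot in enumerate(local):        # step 503
            if slot["entry"] is None and slot["flag"] == 0:
                index = start + offset
                if remote.cas_flag(index, 0, 1):     # step 504
                    # Step 505; the cleared flag is an assumption (see above).
                    remote.write(index, {"flag": 0, "entry": (key, value)})
                    return index
    return None  # no unlocked empty slot in any range that was read
```

If the CAS fails because another writer locked the slot in the meantime, the loop simply continues with the next candidate slot.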
In another optional embodiment of the present application, the first device 101 is configured to perform a query operation on a hash table in the second device 102, and the specific implementation process includes: the first device 101 is configured to, in response to receiving a read request requesting to read the second key value pair, respectively calculate at least two second hash positions corresponding to the keywords in the second key value pair by using at least two hash functions. The first device 101 is configured to send at least two unilateral RDMA second read requests to the second device 102, where each second read request is configured to read a primary hash bucket of a second hash location and a backup hash bucket of the primary hash bucket. The first device 101 is configured to query the read hash bucket for the second key-value pair.
In the embodiment of the application, when the first device needs to perform query operation on the hash table in the second device, the hash positions corresponding to the keywords are calculated, each hash position is provided with one main hash bucket, then the main hash bucket and the standby hash bucket corresponding to the keywords are read locally in a single-sided RDMA manner, and then key value pairs are queried in the main hash bucket and the standby hash bucket corresponding to the keywords, so that query operation of the first device on the hash table in the second device in a single-sided RDMA manner is realized. Because the first device pointedly reads the main hash bucket and the standby hash bucket corresponding to the keyword, the amount of data which needs to be read locally by the first device is small, transmission resources which need to be occupied between the first device and the second device are small, the amount of data which needs to be inquired by the first device is small, and the execution efficiency of the inquiry operation is high.
Optionally, fig. 6 is a schematic diagram of a query operation flow of an RDMA-oriented hash table according to an embodiment of the present application. As shown in fig. 6, the process includes:
Step 601, in response to receiving a read request requesting to read the second key-value pair, the first device calculates at least two second hash positions corresponding to the key in the second key-value pair by using at least two hash functions.
For the explanation of this step, reference may be made to the related content of step 301, and details of the embodiment of this application are not repeated herein.
Step 602, the first device sends at least two unilateral RDMA second read requests to the second device, where each second read request is used to read a primary hash bucket of a second hash position and a backup hash bucket of the primary hash bucket.
Optionally, the first device concurrently sends the at least two one-sided RDMA second read requests to the second device. Since a primary hash bucket and its standby hash bucket are stored in adjacent positions, one read request can read a primary hash bucket together with its standby hash bucket.
In the embodiment of the present application, the master hash bucket read at the second hash position is referred to as a second master hash bucket.
Step 603, the first device queries the read hash bucket for the second key-value pair.
Optionally, each time the first device receives one primary hash bucket and its standby hash bucket in response to a second read request, the first device queries the received hash buckets for the key. If the key is found, the first device obtains the value corresponding to the key and does not process hash buckets received subsequently. If the slot in which the key is located is locked, the first device resends the one-sided RDMA read request to read the hash bucket containing that slot again, so as to obtain the second key-value pair.
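The retry behaviour of step 603 can be sketched as follows. The `read` argument is any callable that imitates a one-sided RDMA read and returns a list of slot dictionaries; this interface, the name `remote_lookup`, and the `max_retries` bound are assumptions, since the text does not state how many times the read is resent.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def remote_lookup(read: Callable[[int, int], List[dict]],
                  ranges: Sequence[Tuple[int, int]],
                  key: str, max_retries: int = 3) -> Optional[Any]:
    """Query each read range for the key, re-reading while its slot is locked."""
    for start, count in ranges:
        for _ in range(max_retries + 1):
            slots = read(start, count)
            hit = next((s for s in slots
                        if s["entry"] is not None and s["entry"][0] == key), None)
            if hit is None:
                break  # key is not in this range; try the next range
            if hit["flag"] == 0:
                return hit["entry"][1]  # found and unlocked: stop processing
            # The slot is locked by a writer: loop to resend the read.
    return None
```

Once the key is found in an unlocked slot, the remaining ranges are not processed, matching the early exit described above.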
In summary, in the management method of the hash table provided in the embodiment of the present application, a new hash table structure is provided, where the hash table includes a primary hash bucket and a standby hash bucket adjacent to the primary hash bucket, when a device needs to perform an insertion operation on the hash table, only the primary hash bucket corresponding to a keyword needs to be addressed, and then a key value pair is inserted into an empty slot of the primary hash bucket or the standby hash bucket, and a key value pair does not need to be moved, so that the storage performance of the device is improved. The load balance of the hash table can be realized by adopting a plurality of hash functions to address a plurality of main hash buckets corresponding to the keywords and then storing key value pairs in one of the main hash buckets or the standby hash bucket of one of the main hash buckets. When the device needs to perform query operation on the hash table, the device only needs to address the main hash bucket corresponding to the keyword, and then queries the key value pair in the main hash bucket and the standby hash bucket, so that the query efficiency is high. In addition, the insertion operation and/or query operation of the hash table in another device by one device in a unilateral RDMA mode are realized, transmission resources occupied between the two devices in the process are less, and the operation execution efficiency is higher.
Fig. 7 is a schematic structural diagram of a management apparatus for a hash table according to an embodiment of the present application. The hash table comprises a main hash bucket and a standby hash bucket, wherein the main hash bucket can be addressed by a hash function, the standby hash bucket cannot be addressed by the hash function, and the standby hash bucket is located between the two main hash buckets and is shared by the two main hash buckets. As shown in fig. 7, the apparatus 70 includes:
a first addressing module 701, configured to, in response to receiving a write request requesting to write a first key-value pair, address, using at least two hash functions, at least two first primary hash buckets corresponding to keys in the first key-value pair.
A determining module 702 is configured to determine a target empty slot to write the first key-value pair in the at least two first primary hash buckets and the spare hash buckets of the at least two first primary hash buckets.
A writing module 703, configured to write the first key-value pair into the target empty slot.
Optionally, the determining module 702 is configured to: determine a target hash bucket among the at least two first primary hash buckets and the standby hash buckets of the at least two first primary hash buckets, the target hash bucket including one or more empty slots; and determine the target empty slot from the unlocked empty slots in the target hash bucket.
Optionally, the determining module 702 is configured to: in response to at least one of the at least two first primary hash buckets including an empty slot, determine the target hash bucket among the at least two first primary hash buckets; or, in response to none of the at least two first primary hash buckets including an empty slot, determine the target hash bucket among the standby hash buckets of the at least two first primary hash buckets.
Optionally, the determining module 702 is configured to: in response to at least one of the at least two first primary hash buckets including an empty slot, determine, as the target hash bucket, the hash bucket with the largest number of empty slots among the at least two first primary hash buckets.
Optionally, the determining module 702 is configured to: determine, as the target hash bucket, the hash bucket with the largest number of empty slots among the standby hash buckets of the at least two first primary hash buckets.
Optionally, the apparatus 70 is applied to a first device, the hash table is stored in a second device, and the first addressing module 701 is configured to: in response to receiving a write request requesting to write a first key-value pair, calculate at least two first hash positions corresponding to the key in the first key-value pair by using at least two hash functions; and then send at least two one-sided RDMA first read requests to the second device, each first read request being used to read the primary hash bucket at a first hash position and the standby hash bucket of that primary hash bucket. Accordingly, the writing module 703 is configured to: send a one-sided RDMA CAS request to the second device, the CAS request being used to lock the target empty slot; and in response to the target empty slot being successfully locked, send a one-sided RDMA write request to the second device, the write request being used to write the first key-value pair into the target empty slot.
Optionally, as shown in fig. 8, the apparatus 70 further comprises:
a second addressing module 704, configured to, in response to receiving a read request requesting to read a second key-value pair, address at least two second primary hash buckets corresponding to the key in the second key-value pair by using at least two hash functions; and a query module 705, configured to query the second key-value pair in the at least two second primary hash buckets and the standby hash buckets of the at least two second primary hash buckets.
Optionally, the apparatus 70 is applied to a first device, a hash table is stored in a second device, and the second addressing module 704 is configured to: in response to receiving a read request for requesting to read the second key value pair, at least two second hash positions corresponding to the keywords in the second key value pair are respectively calculated by adopting at least two hash functions; and sending at least two unilateral RDMA second read requests to the second device, wherein each second read request is used for reading a main hash bucket of a second hash position and a standby hash bucket of the main hash bucket.
Optionally, the primary hash buckets and the backup hash buckets in the hash table are arranged alternately, or the hash table includes a plurality of hash bucket groups arranged in sequence, and each hash bucket group includes two primary hash buckets and a backup hash bucket located between the two primary hash buckets.
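For the grouped arrangement, the mapping from a hash position to physical bucket indexes can be sketched as below. The sketch assumes that buckets are stored contiguously, that each group is laid out as [primary, backup, primary], and that hash positions number the primary buckets only; the function names locate and read_range are hypothetical.

```python
def locate(primary_index):
    """Map the index of a primary bucket (counting primaries only) to
    (physical index of that primary bucket, physical index of its backup bucket),
    assuming groups laid out as [primary, backup, primary]."""
    group, side = divmod(primary_index, 2)
    base = 3 * group
    backup = base + 1
    primary = base if side == 0 else base + 2
    return primary, backup


def read_range(primary_index):
    """Contiguous range (start, count) covering a primary bucket and its backup,
    so that both can be fetched with a single read."""
    primary, backup = locate(primary_index)
    return min(primary, backup), 2
```

Because the backup bucket is physically adjacent to both primary buckets that share it, a primary bucket and its backup always form one contiguous two-bucket range under this layout, which is consistent with each read request fetching both buckets at once.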
In summary, in the hash table management apparatus provided in this embodiment of the present application, when a device needs to insert into the hash table, the first addressing module only needs to address the primary hash buckets corresponding to the key, and the writing module then inserts the key-value pair into an empty slot of one of those primary hash buckets or of its backup hash bucket. No existing key-value pair needs to be moved, which improves the storage performance of the device. Addressing a plurality of primary hash buckets for the key with a plurality of hash functions, and then storing the key-value pair in one of those primary hash buckets or in the backup hash bucket of one of them, balances the load of the hash table. When the device needs to query the hash table, the second addressing module only needs to address the primary hash buckets corresponding to the key, and the query module then searches those primary hash buckets and their backup hash buckets for the key-value pair, so query efficiency is high. In addition, one device can perform insertion and/or query operations on a hash table stored in another device through one-sided RDMA, which occupies fewer transmission resources between the two devices and makes the operations more efficient.
Fig. 9 is a schematic structural diagram of another hash table management apparatus according to an embodiment of the present application. The hash table includes a primary hash bucket that can be addressed by a hash function and a backup hash bucket that cannot be addressed by the hash function, the backup hash bucket being located between and shared by the two primary hash buckets. As shown in fig. 9, the apparatus 90 includes:
a first addressing module 901, configured to, in response to receiving a read request requesting to read a first key-value pair, address at least two first primary hash buckets corresponding to keys in the first key-value pair by using at least two hash functions.
A query module 902 is configured to query the at least two first primary hash buckets and the backup hash buckets of the at least two first primary hash buckets for the first key-value pair.
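The query path can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the application: two salted BLAKE2b digests stand in for the "at least two hash functions", the table uses the grouped [primary, backup, primary] layout, and the sizes NUM_GROUPS and SLOTS_PER_BUCKET are arbitrary.

```python
import hashlib

NUM_GROUPS = 4        # assumed table size: 4 groups of [primary, backup, primary]
SLOTS_PER_BUCKET = 2  # assumed bucket width
table = [[None] * SLOTS_PER_BUCKET for _ in range(3 * NUM_GROUPS)]


def hash_positions(key: bytes):
    """Two hash positions for a key; salted BLAKE2b is an illustrative choice."""
    positions = []
    for salt in (b"h1", b"h2"):
        digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
        positions.append(int.from_bytes(digest, "big") % (2 * NUM_GROUPS))
    return positions


def buckets_for(primary_index):
    """Physical indexes of a primary bucket and of its shared backup bucket."""
    group, side = divmod(primary_index, 2)
    primary = 3 * group + (0 if side == 0 else 2)
    return primary, 3 * group + 1


def lookup(key: bytes):
    """Search only the addressed primary buckets and their backup buckets."""
    for position in hash_positions(key):
        for bucket_index in buckets_for(position):
            for slot in table[bucket_index]:
                if slot is not None and slot[0] == key:
                    return slot[1]
    return None
```

With two hash functions, a lookup inspects at most four buckets (two primary buckets and their two backup buckets), regardless of the table size.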
Optionally, as shown in fig. 10, the apparatus 90 further includes: a second addressing module 903, configured to, in response to receiving a write request requesting to write a second key value pair, address, using at least two hash functions, at least two second primary hash buckets corresponding to the keys in the second key value pair. A determining module 904 for determining a target empty slot to write the second key-value pair in the at least two second primary hash buckets and the spare hash buckets of the at least two second primary hash buckets. A writing module 905, configured to write the second key-value pair into the target empty slot.
In summary, in the hash table management apparatus provided in this embodiment of the present application, when a device needs to insert into the hash table, the second addressing module only needs to address the primary hash buckets corresponding to the key, and the writing module then inserts the key-value pair into an empty slot of one of those primary hash buckets or of its backup hash bucket. No existing key-value pair needs to be moved, which improves the storage performance of the device. Addressing a plurality of primary hash buckets for the key with a plurality of hash functions, and then storing the key-value pair in one of those primary hash buckets or in the backup hash bucket of one of them, balances the load of the hash table. When the device needs to query the hash table, the first addressing module only needs to address the primary hash buckets corresponding to the key, and the query module then searches those primary hash buckets and their backup hash buckets for the key-value pair, so query efficiency is high. In addition, one device can perform insertion and/or query operations on a hash table stored in another device through one-sided RDMA, which occupies fewer transmission resources between the two devices and makes the operations more efficient.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 11 is a block diagram of a device provided in an embodiment of the present application. The device may be a server. As shown in fig. 11, the device 110 includes: a processor 1101 and a memory 1102.
A memory 1102 for storing a computer program comprising program instructions;
a processor 1101, configured to invoke the computer program, to implement the actions performed by the first device in the above method embodiments.
Optionally, the device 110 further comprises a communication bus 1103 and a communication interface 1104.
The processor 1101 includes one or more processing cores, and the processor 1101 executes various functional applications and data processing by running a computer program.
The memory 1102 may be used to store computer programs. Optionally, the memory may store an operating system and an application program unit required for at least one function. The operating system may be an operating system such as Real Time eXecutive (RTX), LINUX, UNIX, WINDOWS, or OS X.
There may be a plurality of communication interfaces 1104, and the communication interface 1104 is used for communication with other devices. For example, in an embodiment of the present application, the communication interface 1104 of the first device may be used to send a one-sided RDMA request to the second device.
The memory 1102 and the communication interface 1104 are connected to the processor 1101 by a communication bus 1103, respectively.
The embodiment of the present application further provides a computer storage medium, where instructions are stored on the computer storage medium, and when the instructions are executed by a processor, the instructions implement the actions performed by the first device in the foregoing method embodiments. Optionally, the computer storage medium is a non-volatile computer-readable storage medium.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
In the embodiments of the present application, the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The term "and/or" in this application merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
The above description is only exemplary of the present application and is not intended to limit the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (25)

1. A method of managing a hash table, the hash table comprising a primary hash bucket addressable by a hash function and a backup hash bucket not addressable by the hash function, the backup hash bucket being located between and shared by two primary hash buckets; the method comprises the following steps:
in response to receiving a write request requesting to write a first key-value pair, a first device addresses, using at least two hash functions, at least two first primary hash buckets corresponding to the key in the first key-value pair;
the first device determining a target empty slot in which to write the first key-value pair in the at least two first primary hash buckets and a backup hash bucket of the at least two first primary hash buckets;
the first device writes the first key-value pair to the target empty slot.
2. The method of claim 1, wherein the first device determining a target empty slot for writing the first key-value pair in the at least two first primary hash buckets and a spare hash bucket of the at least two first primary hash buckets comprises:
the first device determining a target hash bucket among the at least two first primary hash buckets and a backup hash bucket of the at least two first primary hash buckets, the target hash bucket including one or more empty slots;
the first device determines the target empty slot from the unlocked empty slots in the target hash bucket.
3. The method of claim 2, wherein the first device determining a target hash bucket among the at least two first primary hash buckets and a backup hash bucket of the at least two first primary hash buckets comprises:
in response to a hash bucket including an empty slot existing among the at least two first primary hash buckets, the first device determines the target hash bucket among the at least two first primary hash buckets; or,
in response to an absence of a hash bucket of the at least two first primary hash buckets comprising an empty slot, the first device determines the target hash bucket in a spare hash bucket of the at least two first primary hash buckets.
4. The method of claim 3, wherein the first device determining the target hash bucket among the at least two first primary hash buckets comprises:
the first device determines the hash bucket with the largest number of empty slots among the at least two first primary hash buckets as the target hash bucket.
5. The method of claim 3 or 4, wherein the first device determining the target hash bucket among the backup hash buckets of the at least two first primary hash buckets comprises:
the first device determines the hash bucket with the largest number of empty slots among the backup hash buckets of the at least two first primary hash buckets as the target hash bucket.
6. The method of any of claims 1 to 5, wherein the hash table is stored in a second device, and wherein the first device, in response to receiving a write request requesting to write a first key-value pair, addresses at least two first primary hash buckets corresponding to keys in the first key-value pair using at least two hash functions, comprising:
in response to receiving a write request requesting to write a first key-value pair, the first device calculates, using the at least two hash functions respectively, at least two first hash positions corresponding to the key in the first key-value pair;
the first device sends at least two one-sided remote direct memory access (RDMA) first read requests to the second device, wherein each first read request is used for reading the primary hash bucket at a first hash position and the backup hash bucket of that primary hash bucket;
the first device writing the first key-value pair to the target empty slot comprises:
the first device sends a one-sided RDMA compare-and-swap (CAS) request to the second device, the CAS request being used to lock the target empty slot;
in response to the target empty slot being successfully locked, the first device sends a one-sided RDMA write request to the second device, the write request being used to write the first key-value pair into the target empty slot.
7. The method of any of claims 1 to 6, further comprising:
the first device, in response to receiving a read request for requesting reading of a second key value pair, addresses at least two second primary hash buckets corresponding to keywords in the second key value pair by using the at least two hash functions;
the first device queries the second key-value pairs in the at least two second primary hash buckets and a backup hash bucket of the at least two second primary hash buckets.
8. The method of claim 7, wherein the hash table is stored in a second device, and wherein the first device, in response to receiving a read request requesting to read a second key-value pair, addresses at least two second primary hash buckets corresponding to keys in the second key-value pair using the at least two hash functions, comprises:
in response to receiving a read request requesting to read a second key-value pair, the first device calculates, using the at least two hash functions respectively, at least two second hash positions corresponding to the key in the second key-value pair;
the first device sends at least two one-sided RDMA second read requests to the second device, wherein each second read request is used for reading the primary hash bucket at a second hash position and the backup hash bucket of that primary hash bucket.
9. The method according to any one of claims 1 to 8, wherein the primary hash buckets and the backup hash buckets in the hash table are arranged alternately, or the hash table includes a plurality of hash bucket groups arranged in sequence, and each hash bucket group includes two primary hash buckets and a backup hash bucket located between the two primary hash buckets.
10. A method of managing a hash table, the hash table comprising a primary hash bucket addressable by a hash function and a backup hash bucket not addressable by the hash function, the backup hash bucket being located between and shared by two primary hash buckets; the method comprises the following steps:
in response to receiving a read request requesting to read a first key-value pair, a first device addresses, using at least two hash functions, at least two first primary hash buckets corresponding to the key in the first key-value pair;
the first device queries the first key-value pairs in the at least two first primary hash buckets and a backup hash bucket of the at least two first primary hash buckets.
11. The method of claim 10, further comprising:
the first device, in response to receiving a write request for requesting to write a second key value pair, addresses at least two second primary hash buckets corresponding to the keywords in the second key value pair by using the at least two hash functions;
determining, by the first device, a target empty slot in which to write the second key-value pair in the at least two second primary hash buckets and a spare hash bucket of the at least two second primary hash buckets;
the first device writes the second key-value pair to the target empty slot.
12. A management apparatus of a hash table, the hash table comprising a main hash bucket and a spare hash bucket, the main hash bucket being addressable by a hash function, the spare hash bucket being unaddressable by the hash function, the spare hash bucket being located between and shared by two main hash buckets; the device comprises:
a first addressing module, configured to, in response to receiving a write request requesting to write a first key-value pair, address, using at least two hash functions, at least two first primary hash buckets corresponding to the key in the first key-value pair;
a determining module, configured to determine a target empty slot, into which the first key-value pair is to be written, in the at least two first primary hash buckets and the backup hash buckets of the at least two first primary hash buckets;
a writing module, configured to write the first key-value pair into the target empty slot.
13. The apparatus of claim 12, wherein the determining module is configured to:
determining a target hash bucket among the at least two first primary hash buckets and a backup hash bucket of the at least two first primary hash buckets, the target hash bucket including one or more empty slots;
determining the target empty slot from the unlocked empty slots in the target hash bucket.
14. The apparatus of claim 13, wherein the determining module is configured to:
determining the target hash bucket among the at least two first primary hash buckets in response to a hash bucket including an empty slot existing among the at least two first primary hash buckets; or,
determining the target hash bucket in a spare hash bucket of the at least two first primary hash buckets in response to an absence of a hash bucket of the at least two first primary hash buckets comprising an empty slot.
15. The apparatus of claim 14, wherein the determining module is configured to:
determining the hash bucket with the largest number of empty slots among the at least two first primary hash buckets as the target hash bucket in response to a hash bucket including an empty slot existing among the at least two first primary hash buckets.
16. The apparatus of claim 14 or 15, wherein the determining module is configured to:
determining the hash bucket with the largest number of empty slots among the backup hash buckets of the at least two first primary hash buckets as the target hash bucket in response to none of the at least two first primary hash buckets including an empty slot.
17. The apparatus according to any one of claims 12 to 16, wherein the apparatus is applied to a first device, wherein the hash table is stored in a second device, and wherein the first addressing module is configured to:
in response to receiving a write request requesting to write a first key-value pair, calculating, using the at least two hash functions respectively, at least two first hash positions corresponding to the key in the first key-value pair;
sending at least two one-sided remote direct memory access (RDMA) first read requests to the second device, wherein each first read request is used for reading the primary hash bucket at a first hash position and the backup hash bucket of that primary hash bucket;
the writing module is configured to:
sending a one-sided RDMA compare-and-swap (CAS) request to the second device, the CAS request being used to lock the target empty slot;
in response to the target empty slot being successfully locked, sending a one-sided RDMA write request to the second device, the write request being used to write the first key-value pair into the target empty slot.
18. The apparatus of any one of claims 12 to 17, further comprising:
a second addressing module, configured to, in response to receiving a read request requesting to read a second key value pair, address, using the at least two hash functions, at least two second primary hash buckets corresponding to the keywords in the second key value pair;
a query module to query the second key-value pairs in the at least two second primary hash buckets and the spare hash buckets of the at least two second primary hash buckets.
19. The apparatus of claim 18, wherein the apparatus is applied to a first device, wherein the hash table is stored in a second device, and wherein the second addressing module is configured to:
in response to receiving a read request requesting to read a second key-value pair, calculating, using the at least two hash functions respectively, at least two second hash positions corresponding to the key in the second key-value pair;
sending at least two one-sided RDMA second read requests to the second device, wherein each second read request is used for reading the primary hash bucket at a second hash position and the backup hash bucket of that primary hash bucket.
20. The apparatus according to any one of claims 12 to 19, wherein the primary hash buckets and the backup hash buckets in the hash table are arranged alternately, or the hash table includes a plurality of hash bucket groups arranged in sequence, and each hash bucket group includes two primary hash buckets and a backup hash bucket located between the two primary hash buckets.
21. A management apparatus of a hash table, the hash table comprising a main hash bucket and a spare hash bucket, the main hash bucket being addressable by a hash function, the spare hash bucket being unaddressable by the hash function, the spare hash bucket being located between and shared by two main hash buckets; the device comprises:
a first addressing module, configured to, in response to receiving a read request requesting to read a first key-value pair, address, using at least two hash functions, at least two first primary hash buckets corresponding to the key in the first key-value pair;
a query module to query the first key-value pairs in the at least two first primary hash buckets and a backup hash bucket of the at least two first primary hash buckets.
22. The apparatus of claim 21, further comprising:
a second addressing module, configured to, in response to receiving a write request requesting to write a second key value pair, address, using the at least two hash functions, at least two second primary hash buckets corresponding to the keywords in the second key value pair;
a determining module configured to determine a target empty slot, into which the second key-value pair is written, in the at least two second primary hash buckets and a backup hash bucket of the at least two second primary hash buckets;
a writing module, configured to write the second key-value pair into the target empty slot.
23. A hash table management system, comprising: a first device and a second device, wherein a hash table is stored in the second device, the hash table comprises a primary hash bucket and a backup hash bucket, the primary hash bucket can be addressed by a hash function, the backup hash bucket cannot be addressed by the hash function, and the backup hash bucket is located between two primary hash buckets and shared by the two primary hash buckets;
the first device is configured to, in response to receiving a write request requesting to write a first key-value pair, calculate, using at least two hash functions respectively, at least two first hash positions corresponding to the key in the first key-value pair,
the first device is configured to send at least two one-sided remote direct memory access (RDMA) first read requests to the second device, each of the first read requests being used to read the primary hash bucket at a first hash position and the backup hash bucket of that primary hash bucket,
the first device is configured to determine, in the read hash buckets, a target empty slot into which the first key-value pair is to be written,
the first device is configured to send a one-sided RDMA compare-and-swap (CAS) request to the second device, the CAS request being used to lock the target empty slot,
the first device is configured to send a one-sided RDMA write request to the second device in response to the target empty slot being successfully locked, the write request being used to write the first key-value pair into the target empty slot;
and/or,
the first device is configured to, in response to receiving a read request requesting to read a second key-value pair, calculate, using the at least two hash functions respectively, at least two second hash positions corresponding to the key in the second key-value pair,
the first device is configured to send at least two one-sided RDMA second read requests to the second device, each second read request being used to read the primary hash bucket at a second hash position and the backup hash bucket of that primary hash bucket,
and the first device is configured to query for the second key-value pair in the read hash buckets.
24. An apparatus, comprising: a processor and a memory;
the memory for storing a computer program, the computer program comprising program instructions;
the processor is configured to invoke the computer program to implement the hash table management method according to any one of claims 1 to 11.
25. A computer storage medium having stored thereon instructions which, when executed by a processor, carry out a method of managing a hash table according to any one of claims 1 to 11.
CN202011362317.1A 2020-11-28 2020-11-28 Method, device and system for managing hash table and computer storage medium Pending CN114579558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011362317.1A CN114579558A (en) 2020-11-28 2020-11-28 Method, device and system for managing hash table and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011362317.1A CN114579558A (en) 2020-11-28 2020-11-28 Method, device and system for managing hash table and computer storage medium

Publications (1)

Publication Number Publication Date
CN114579558A true CN114579558A (en) 2022-06-03

Family

ID=81766692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011362317.1A Pending CN114579558A (en) 2020-11-28 2020-11-28 Method, device and system for managing hash table and computer storage medium

Country Status (1)

Country Link
CN (1) CN114579558A (en)

Similar Documents

Publication Publication Date Title
CN110113420B (en) NVM-based distributed message queue management system
US8161244B2 (en) Multiple cache directories
US7373514B2 (en) High-performance hashing system
US10599677B2 (en) Methods and systems of splitting database indexes and digests
CN102088484B (en) Write lock method of distributed file system and a system thereof
CN111400307B (en) Persistent hash table access system supporting remote concurrent access
US11269772B2 (en) Persistent memory storage engine device based on log structure and control method thereof
CN105320654A (en) Dynamic bloom filter and element operating method based on same
CN108762668B (en) Method and device for processing write conflict
US20230305724A1 (en) Data management method and apparatus, computer device, and storage medium
CN110858162A (en) Memory management method and device and server
US20180373634A1 (en) Processing Node, Computer System, and Transaction Conflict Detection Method
CN109165321B (en) Consistent hash table construction method and system based on nonvolatile memory
CN108762915B (en) Method for caching RDF data in GPU memory
CN114936188A (en) Data processing method and device, electronic equipment and storage medium
CN113779154B (en) Construction method and application of distributed learning index model
CN107908713B (en) Distributed dynamic rhododendron filtering system based on Redis cluster and filtering method thereof
US11403273B1 (en) Optimizing hash table searching using bitmasks and linear probing
CN116719813A (en) Hash table processing method and device and electronic equipment
CN114579558A (en) Method, device and system for managing hash table and computer storage medium
CN107832121B (en) Concurrency control method applied to distributed serial long transactions
US7185029B1 (en) Method and apparatus for maintaining, and updating in-memory copies of the first and second pointers to reference the new versions of the first and second control structures that indicate available and allocated portions of usable space in the data file
CN112380004B (en) Memory management method, memory management device, computer readable storage medium and electronic equipment
CN114490540A (en) Data storage method, medium, device and computing equipment
CN112637327B (en) Data processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination