CN115309745A - Key value pair storage method, device, equipment and medium - Google Patents


Info

Publication number
CN115309745A
Authority
CN
China
Prior art keywords
key
value pair
bucket
buckets
bucket group
Legal status
Pending
Application number
CN202210976474.4A
Other languages
Chinese (zh)
Inventor
夏文
陈祺
胡浩
李诗逸
邓才
Current Assignee
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202210976474.4A
Publication of CN115309745A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a key-value pair storage method, device, equipment and medium in the field of computer technology. The method comprises: determining a key-value pair bucket in a key-value pair bucket group, and judging whether the remaining capacity of the bucket is smaller than the capacity occupied by the key-value pair to be stored; if it is, comparing the pre-acquired rehash count of the key-value pair virtual bucket group with the extension count of the local hash table; if the rehash count is less than the extension count, determining a key-value pair transfer bucket group and transferring the historical key-value pairs belonging to the virtual bucket group to the transfer bucket group; and determining a target key-value pair bucket and storing the key-value pair to be stored in it. The method and device improve key-value pair storage efficiency, improve the read performance of the index under read-intensive and read-skewed workloads, and reduce the overhead of maintaining index perfection.

Description

Key value pair storage method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a key value pair storage method, apparatus, device, and medium.
Background
At present, the emergence of persistent memory (PMEM) has significantly improved the performance of conventional hash indexes. Most existing PMEM-oriented hash indexes gain performance by exploiting hardware characteristics such as non-uniform memory access (NUMA), cache-granularity alignment and read-write asymmetry; only a few studies focus on the algorithms and structures of the hash index itself, with common techniques including fingerprint acceleration, active concurrency control and load balancing. Moreover, existing work usually neglects read performance and may even sacrifice it to improve write performance, so existing index schemes cannot fully exploit the performance advantages of PMEM in large-scale read-intensive and read-skewed scenarios. In these scenarios, the read throughput of the index is severely limited by the overhead of hash collisions: a collision forces a query to make multiple additional PMEM accesses, so single-query latency grows linearly with the number of probes, especially under negative-query workloads (where the queried key-value pairs are not in the hash table).
Therefore, a problem to be solved in this field is how to fully exploit the hardware characteristics of PMEM and the inherent advantages of a perfect hash index during key-value pair storage, so as to improve key-value pair storage efficiency, improve the read performance of the index under read-intensive and read-skewed workloads, and reduce the overhead of maintaining index perfection.
Disclosure of Invention
In view of this, an object of the present invention is to provide a key value pair storage method, device, apparatus, and medium, which can fully utilize hardware characteristics of PMEM and inherent advantages of perfect hash index, thereby improving efficiency of key value pair storage, improving reading performance of index under read intensive and read skew scenarios, and reducing overhead of maintaining index perfection. The specific scheme is as follows:
In a first aspect, the present application discloses a key-value pair storage method, including:
determining a key-value pair bucket in a key-value pair bucket group, and judging whether the remaining capacity of the key-value pair bucket is smaller than the capacity occupied by the key-value pair to be stored;
if the remaining capacity of the key-value pair bucket is smaller than the capacity occupied by the key-value pair to be stored, comparing the pre-acquired rehash count of the key-value pair virtual bucket group with the extension count of the local hash table;
if the rehash count is less than the extension count, determining a key-value pair transfer bucket group, and transferring the historical key-value pairs belonging to the key-value pair virtual bucket group to the key-value pair transfer bucket group; and
determining a target key-value pair bucket, and storing the key-value pair to be stored in the target key-value pair bucket.
Optionally, before the determining of a key-value pair bucket in the key-value pair bucket group, the method further includes:
acquiring key-value pair storage information, and computing a unit number and a guide index from the key of the key-value pair in the storage information;
determining, based on the unit number and the guide index, the location of the key-value pair to be stored in the key-value pair virtual bucket group.
Optionally, the determining of a key-value pair bucket in the key-value pair bucket group includes:
inputting the unit number and the guide index into a preset mapper to obtain a layer index and a bucket offset;
determining, based on the layer index and the bucket offset, the layer of the key-value pair bucket group and the position of the key-value pair bucket, thereby obtaining the key-value pair bucket.
Optionally, the transferring of the historical key-value pairs belonging to the key-value pair virtual bucket group to the key-value pair transfer bucket group includes:
screening out, from all historical key-value pairs belonging to the key-value pair virtual bucket group, target historical key-value pairs to be transferred; and
transferring the target historical key-value pairs to the key-value pair transfer bucket group.
Optionally, the screening out of target historical key-value pairs to be transferred from all historical key-value pairs belonging to the key-value pair virtual bucket group includes:
determining the unit number of the key-value pair virtual bucket group, and acquiring the unit numbers of all historical key-value pairs in the key-value pair virtual bucket group; and
judging whether the unit number of the key-value pair virtual bucket group matches the unit number of each historical key-value pair, and taking each historical key-value pair whose unit number does not match as a target historical key-value pair to be transferred.
Optionally, after the comparing of the rehash count with the extension count, the method further includes:
if the rehash count is equal to the extension count, acquiring the remaining capacity of all key-value pair buckets in the key-value pair bucket group; and
screening out a key-value pair transfer bucket from the key-value pair bucket group based on the capacity occupied by the key-value pair to be stored and the remaining capacity of all key-value pair buckets in the group, so that the historical key-value pairs belonging to the key-value pair bucket are transferred to the key-value pair transfer bucket.
Optionally, the key-value pair storage method further includes:
if the capacity occupied by the key-value pair to be stored is larger than the remaining capacity of every key-value pair bucket in the key-value pair bucket group, expanding the key-value pair bucket group according to a preset expansion method to obtain a new key-value pair bucket group, incrementing the rehash count and the extension count, and then returning to the step of determining a key-value pair bucket in the key-value pair bucket group.
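The branching in the first-aspect claims can be summarized as a small decision function. This is a sketch, not the applicant's implementation: capacities are plain integers, the LD/GD names are taken from the detailed description, `choose_action` and `InsertAction` are hypothetical names, and treating LD > GD as an error is an assumption (the claims only cover LD < GD and LD = GD).

```python
from enum import Enum
from typing import Sequence


class InsertAction(Enum):
    """Outcome of one insertion attempt (names are hypothetical)."""
    STORE = "store"                # mapped bucket has room: store directly
    LOCAL_REHASH = "local_rehash"  # LD < GD: transfer history pairs to a transfer bucket group
    DISPLACE = "displace"          # LD == GD: move history pairs to a bucket in the same group
    EXPAND = "expand"              # LD == GD and no bucket in the group has room


def choose_action(home_remaining: int, needed: int, ld: int, gd: int,
                  group_remaining: Sequence[int]) -> InsertAction:
    """Select the next step of the insertion flow.

    home_remaining: remaining capacity of the mapped key-value pair bucket
    needed: capacity occupied by the key-value pair to be stored
    ld, gd: rehash count (LD) and extension count (GD)
    group_remaining: remaining capacity of every bucket in the bucket group
    """
    if ld > gd:
        raise ValueError("rehash count cannot exceed extension count (assumed invariant)")
    if home_remaining >= needed:
        return InsertAction.STORE
    if ld < gd:
        return InsertAction.LOCAL_REHASH
    # ld == gd: a transfer bucket exists only if some bucket can absorb the pair
    if any(r >= needed for r in group_remaining):
        return InsertAction.DISPLACE
    return InsertAction.EXPAND
```

After `STORE` is unreachable for the new pair, every non-`STORE` outcome ends by re-running the bucket lookup, which is why the claims loop back to "determining a key-value pair bucket" after expansion.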
In a second aspect, the present application discloses a key-value pair storage device, comprising:
a key-value pair bucket determining module, configured to determine a key-value pair bucket in the key-value pair bucket group and judge whether the remaining capacity of the key-value pair bucket is smaller than the capacity occupied by the key-value pair to be stored;
a judging module, configured to compare the pre-acquired rehash count of the key-value pair virtual bucket group with the extension count of the local hash table if the remaining capacity of the key-value pair bucket is smaller than the capacity occupied by the key-value pair to be stored;
a historical key-value pair transfer module, configured to determine a key-value pair transfer bucket group if the rehash count is smaller than the extension count, and transfer the historical key-value pairs belonging to the key-value pair virtual bucket group to the key-value pair transfer bucket group; and
a target key-value pair storage module, configured to determine a target key-value pair bucket and store the key-value pair to be stored in the target key-value pair bucket.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the aforementioned key-value pair storage method.
In a fourth aspect, the present application discloses a computer storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the key-value pair storage method disclosed above.
The key-value pair storage method comprises: determining a key-value pair bucket in a key-value pair bucket group, and judging whether the remaining capacity of the bucket is smaller than the capacity occupied by the key-value pair to be stored; if it is, comparing the pre-acquired rehash count of the key-value pair virtual bucket group with the extension count of the local hash table; if the rehash count is less than the extension count, determining a key-value pair transfer bucket group and transferring the historical key-value pairs belonging to the virtual bucket group to the transfer bucket group; and determining a target key-value pair bucket and storing the key-value pair to be stored in it. By combining extendible hashing with perfect hashing, the method takes the appropriate action according to how the rehash count of the virtual bucket group compares with the extension count of the hash table. Introducing perfect hashing eliminates the extra query overhead caused by hash collisions, which unlocks the read performance of the index and fully exploits both the hardware characteristics of PMEM and the inherent advantages of a perfect hash index. As a result, key-value pair storage efficiency is improved, index read performance under read-intensive and read-skewed workloads is improved, and the overhead of maintaining index perfection is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a key-value pair storage method disclosed herein;
FIG. 2 is a block diagram of a key-value pair storage system according to the present disclosure;
FIG. 3 is a detailed flowchart of a key-value pair storage method disclosed in the present application;
FIG. 4 is a schematic diagram illustrating a specific flow of a key-value pair storage method disclosed in the present application;
FIG. 5 is a schematic diagram illustrating a specific flow of a key-value pair storage method disclosed in the present application;
FIG. 6 is a schematic diagram illustrating a specific flow of a key-value pair storage method disclosed in the present application;
FIG. 7 is a schematic diagram of a key-value pair storage device according to the present disclosure;
FIG. 8 is a block diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Referring to FIG. 1, an embodiment of the present invention discloses a key-value pair storage method, which specifically includes:
Step S11: Determine a key-value pair bucket in the key-value pair bucket group, and judge whether the remaining capacity of the key-value pair bucket is smaller than the capacity occupied by the key-value pair to be stored.
In this embodiment, before the key-value pair bucket in the key-value pair bucket group is determined, the method further includes: acquiring key-value pair storage information, and computing a unit number and a guide index from the key of the key-value pair; determining, based on the unit number and the guide index, the location of the key-value pair to be stored in the key-value pair virtual bucket group; then inputting the unit number and the guide index into a preset mapper to obtain a layer index and a bucket offset; and determining, based on the layer index and the bucket offset, the layer of the key-value pair bucket group and the position of the key-value pair bucket, thereby obtaining the key-value pair bucket.
It should be understood that the present application provides a scalable perfect hash index for a hybrid PMEM-DRAM (dynamic random access memory) storage architecture. A perfect hash index usually consists of an index structure and a key-value table, and the position of a key-value pair in the key-value table is determined by looking up the index. Through one hash, one mapping and one persist operation, each key-value pair is placed in one-to-one correspondence with a unique physical bucket (with no hash collision), so a query can be completed with a single PMEM access. As shown in FIG. 2, the system consists of three parts: an index structure in DRAM, a hash table in PMEM, and functional components. In the first part, the invention uses a data structure called a unit (cell), also shown in FIG. 2, as the fast index structure. To reduce expensive data movement, each unit maintains three types of metadata: a lock field for concurrency control; LD, which records how many times the unit has been rehashed; and the guide array (GA), which records the displacement applied to moved key-value pairs. In the second part, the invention organizes the physical buckets in layers; each layer contains a fixed number of physical buckets and is referenced by a layer pointer. Each entry of the layer pointer array points to the first physical bucket of its layer, and the layer pointer array also maintains GD, which records the number of times the hash table has been extended and works with LD to control overflow handling. Bucket metadata occupies the first 48 bytes of each bucket and contains fields that support the basic operations (insert, query and delete).
Specifically, a 4-byte version lock provides concurrency control for key-value pair operations; the used-slot field counts the occupied slots; the least significant 8 bits of each key are used as a fingerprint to accelerate search; and the bitmap field records whether the key-value pair in each slot is valid. The metadata also stores the unit number and guide index of each key-value pair so that its virtual bucket can be located, and during failure recovery the status field indicates whether the physical bucket is in a consistent state. Padding bytes at the end of the metadata keep it aligned with the PMEM access granularity. In the third part, the hashing and modulo components compute, respectively, the unit number and the guide index that locate a unique virtual bucket, and the mapper component builds the mapping from virtual buckets to physical buckets.
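The three-part layout described above can be modelled roughly as follows. This is a sketch that uses Python objects in place of raw DRAM and PMEM memory; the slot count `SLOTS`, the encoding of the status field, and the use of a simple version counter instead of a real 4-byte version lock are assumptions, and the exact 48-byte byte layout is not reproduced.

```python
from dataclasses import dataclass, field
from threading import Lock
from typing import List

SLOTS = 8  # hypothetical slots per bucket; the text does not give this number


@dataclass
class Unit:
    """DRAM index unit: lock, local depth (LD) and guide array (GA)."""
    guide_len: int                                    # G
    ld: int = 0                                       # times this unit has been rehashed
    lock: Lock = field(default_factory=Lock)          # concurrency control
    ga: List[int] = field(default_factory=list)       # displacement per guide index

    def __post_init__(self) -> None:
        if not self.ga:
            self.ga = [0] * self.guide_len            # nothing has been moved yet


@dataclass
class LayerTable:
    """Layer pointer array of the PMEM hash table plus the global depth (GD)."""
    buckets_per_layer: int                            # B
    gd: int = 0                                       # times the table has been extended
    layers: List[list] = field(default_factory=list)  # each entry stands for one layer


@dataclass
class BucketMeta:
    """Simplified bucket header: version, used slots, fingerprints, bitmap, locators."""
    version: int = 0                                  # stand-in for the 4-byte version lock
    used: int = 0                                     # number of occupied slots
    bitmap: int = 0                                   # bit i set => slot i holds a valid pair
    fingerprints: List[int] = field(default_factory=lambda: [0] * SLOTS)
    units: List[int] = field(default_factory=lambda: [0] * SLOTS)   # unit number per slot
    guides: List[int] = field(default_factory=lambda: [0] * SLOTS)  # guide index per slot
    status: int = 0                                   # hypothetical: 0 means consistent

    def insert(self, key: int, unit: int, guide: int) -> int:
        """Claim the first free slot, record the pair's metadata, return the slot."""
        for slot in range(SLOTS):
            if not (self.bitmap >> slot) & 1:
                self.fingerprints[slot] = key & 0xFF  # least significant 8 bits of the key
                self.units[slot] = unit
                self.guides[slot] = guide
                self.bitmap |= 1 << slot
                self.used += 1
                self.version += 1                     # bump so readers can detect the change
                return slot
        raise OverflowError("bucket is full")

    def delete(self, slot: int) -> None:
        """Invalidate a slot by clearing its bitmap bit."""
        if (self.bitmap >> slot) & 1:
            self.bitmap &= ~(1 << slot)
            self.used -= 1
            self.version += 1
```

Deletion only clears a bitmap bit, so the slot can be reused by a later insert without moving other pairs.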
Specifically, the mapper takes as input the unit number and the guide index adjusted by the recorded displacement, and outputs a layer index and a bucket offset, which together identify one physical bucket; in this way each virtual bucket is mapped to a unique physical bucket. Note that one physical bucket may serve multiple virtual buckets, whereas each virtual bucket maps to exactly one physical bucket. That is, the key-value pair storage information is first obtained and its key k is determined; the unit number c and the guide index g that index the virtual bucket are then calculated by the following formulas:
$c = \mathrm{hash}(k) \bmod N$
$g = k \bmod G$
where N is the number of units and G is the length of the guide array.
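The two formulas above can be sketched directly. The patent does not name its hash function, so `mix64` (the SplitMix64 finalizer) is an assumed stand-in, and `locate_virtual_bucket` is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """Stand-in for hash(): the SplitMix64 finalizer (the patent does not name its hash)."""
    x &= MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def locate_virtual_bucket(k: int, n_units: int, guide_len: int) -> tuple:
    """Return (c, g) with c = hash(k) mod N and g = k mod G."""
    c = mix64(k) % n_units   # unit number
    g = k % guide_len        # guide index
    return c, g
```

For example, with N = 8 and G = 3, the key k = 14 always yields g = 2, while c depends on the chosen hash function.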
The mapper then maps the virtual bucket to a unique physical bucket: it receives the unit number c and the guide index g as inputs and outputs the layer index l and the bucket offset o according to the following formulas:
[Formula image BDA0003798636270000071: computation of the layer index l]
$g' = (g + GA[g]) \bmod G$
$o = \mathrm{hash}_{g'}(c) \bmod B$
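where GA is the guide array of unit c, B is the number of physical buckets per layer, GD (global depth) is the number of times the hash table has been extended, and LD (local depth) is the number of times unit c has been rehashed.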
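A minimal sketch of the mapper, under several stated assumptions. It computes g′ as (g + GA[g]) mod G, reading the displacement as applied to the guide index (the mapper is said to receive the guide index plus the recorded displacement); this is an interpretation. The hash family `seeded_hash` is a hypothetical BLAKE2b-based stand-in for hash_{g′}, and because the layer-index formula survives only as an image, the layer index comes from a caller-supplied placeholder `layer_of` rather than from the patent's formula.

```python
import hashlib
from typing import Callable, Sequence, Tuple


def seeded_hash(seed: int, x: int) -> int:
    """Hypothetical hash family hash_seed(x) built on BLAKE2b (not specified by the patent)."""
    digest = hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def mapper(c: int, g: int, ga: Sequence[int], guide_len: int,
           buckets_per_layer: int,
           layer_of: Callable[[int], int]) -> Tuple[int, int]:
    """Map virtual bucket (c, g) to a physical bucket (layer index l, bucket offset o).

    ga is the guide array of unit c. layer_of is a placeholder for the layer-index
    formula, which is only available as an image in the publication.
    """
    g_prime = (g + ga[g]) % guide_len                 # guide index shifted by its displacement
    o = seeded_hash(g_prime, c) % buckets_per_layer   # o = hash_{g'}(c) mod B
    l = layer_of(c)                                   # placeholder, not the patent's formula
    return l, o
```

Because g′ selects which member of the hash family is applied to c, changing GA[g] redirects a virtual bucket to a different physical bucket without rehashing any key.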
Step S12: If the remaining capacity of the key-value pair bucket is smaller than the capacity occupied by the key-value pair to be stored, compare the pre-acquired rehash count of the key-value pair virtual bucket group with the extension count of the local hash table.
Step S13: If the rehash count is less than the extension count, determine a key-value pair transfer bucket group, and transfer the historical key-value pairs in the key-value pair virtual bucket group to the key-value pair transfer bucket group.
In this embodiment, if the rehash count is less than the extension count, a key-value pair transfer bucket group is determined; target historical key-value pairs to be transferred are then screened out from all historical key-value pairs belonging to the key-value pair virtual bucket group and transferred to the key-value pair transfer bucket group.
After the key-value pair transfer bucket group is determined, the screening proceeds as follows: the unit number of the key-value pair virtual bucket group is determined, and the unit numbers of all historical key-value pairs in the group are acquired; each historical key-value pair whose unit number differs from that of the virtual bucket group is taken as a target historical key-value pair to be transferred.
In this embodiment, when the remaining capacity of the key-value pair bucket is less than the capacity occupied by the key-value pair to be stored and the rehash count is less than the extension count, that is, when the physical bucket is full and LD < GD, a local rehash is performed. Unlike the full-table rehashing of previous work, the invention rehashes at the granularity of a physical bucket group: during the rehash phase, only the physical bucket group that caused the insertion failure is processed.
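A minimal sketch of the local rehash and the unit-number screening, assuming the historical pairs of the overflowing group are available as a list and that `sharing_units` lists the units currently pointing to that group (as cells 0 and 2 do in the FIG. 4 example). Only this group is touched; the rest of the table is left alone. `Entry` and `local_rehash` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entry:
    key: int
    value: object
    unit: int  # unit number recorded in the bucket metadata


def local_rehash(ld: Dict[int, int], sharing_units: List[int], gd: int,
                 group_unit: int, history: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Split one virtual bucket group; return (pairs that stay, pairs to transfer)."""
    if any(ld[u] >= gd for u in sharing_units):
        raise ValueError("local rehash requires LD < GD for every sharing unit")
    for u in sharing_units:
        ld[u] += 1  # each unit now maps to its own physical bucket group
    stay = [e for e in history if e.unit == group_unit]
    move = [e for e in history if e.unit != group_unit]  # target historical pairs
    return stay, move
```

The LD check runs before any increment, so a rejected call leaves the unit metadata unchanged.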
Step S14: Determine a target key-value pair bucket, and store the key-value pair to be stored in the target key-value pair bucket.
After the target historical key-value pairs have been transferred to the key-value pair transfer bucket group, the calculation steps above are repeated to obtain the target key-value pair bucket, and the key-value pair to be stored is written to it.
Referring to FIG. 3, an embodiment of the present invention discloses a key-value pair storage method, which may specifically include:
Step S21: Determine a key-value pair bucket in the key-value pair bucket group, and judge whether the remaining capacity of the key-value pair bucket is smaller than the capacity occupied by the key-value pair to be stored.
Step S22: If the remaining capacity of the key-value pair bucket is smaller than the capacity occupied by the key-value pair to be stored, compare the pre-acquired rehash count of the key-value pair virtual bucket group with the extension count of the local hash table.
Step S23: If the rehash count is equal to the extension count, acquire the remaining capacity of all key-value pair buckets in the key-value pair bucket group, and screen out a key-value pair transfer bucket from the group based on the capacity occupied by the key-value pair to be stored and the remaining capacity of all buckets in the group, so that the historical key-value pairs belonging to the key-value pair bucket can be transferred to the key-value pair transfer bucket.
Step S24: Determine a target key-value pair bucket, and store the key-value pair to be stored in the target key-value pair bucket.
In this embodiment, if the capacity occupied by the key-value pair to be stored is greater than the remaining capacity of every key-value pair bucket in the key-value pair bucket group, the group is expanded according to a preset expansion method to obtain a new key-value pair bucket group, the rehash count and the extension count are incremented, and the method returns to the step of determining a key-value pair bucket in the key-value pair bucket group.
For example, the key-value pair bucket group may be expanded by doubling, in which case the units and the layer pointer array are expanded as well. After a successful expansion, the new layer pointers point to the new physical bucket layers in order, and the rehash count of the key-value pair virtual bucket group and the extension count of the hash table (that is, LD and GD) are incremented. Once LD and GD change, the mapping between the units that store key-value pairs and the physical bucket groups changes accordingly. The preset expansion method in this application includes, but is not limited to, doubling.
In this embodiment, if the capacity occupied by the key-value pair to be stored is greater than the remaining capacity of every key-value pair bucket in the group, the insertion has failed. After the expansion, the target bucket of that key-value pair is already determined (the bucket at the same position in a different layer), so the mapping between the key-value pair and its target bucket does not need to be re-established. This avoids unnecessary computation, and the key-value pair whose insertion failed is then reinserted.
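One way to picture the doubling expansion is the sketch below. It is hypothetical in several respects: buckets are Python lists, the caller passes in `failing_units` (the units whose LD is incremented), and letting each new unit start from a copy of its counterpart's LD follows the usual extendible-hashing directory-doubling convention rather than anything stated in this text. Bucket offsets are deliberately not recomputed, in line with the observation that the failed pair's target bucket is already determined.

```python
from dataclasses import dataclass, field
from typing import List


def empty_layer(b: int) -> List[list]:
    """One layer of b empty physical buckets (each bucket is a list of entries)."""
    return [[] for _ in range(b)]


@dataclass
class Table:
    buckets_per_layer: int                              # B
    unit_ld: List[int]                                  # LD of each DRAM unit
    layers: List[List[list]] = field(default_factory=list)
    gd: int = 0                                         # GD, number of extensions

    def double(self, failing_units: List[int]) -> None:
        """Doubling expansion, one possible 'preset expansion method'."""
        # Double the layer pointer array: new pointers reference new bucket layers in order.
        n_new = max(1, len(self.layers))
        self.layers.extend(empty_layer(self.buckets_per_layer) for _ in range(n_new))
        # Double the unit array; new units copy their counterpart's LD (assumption).
        self.unit_ld.extend(list(self.unit_ld))
        # Record the extension and the rehash of the units whose insertion failed.
        self.gd += 1
        for u in failing_units:
            self.unit_ld[u] += 1
```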
As shown in FIG. 4, the operation proceeds as follows. First, LD is incremented by 1 in all cells (cell 0 and cell 2) that point to virtual bucket group No. 1. Once a cell's LD is incremented, the cell maps to a physical bucket group in a different layer, so key-value pairs hashed to cell 2 are now mapped to the first layer of physical buckets (that is, the target key-value pair buckets). The historical key-value pairs belonging to virtual bucket group No. 0 are then traversed and their unit numbers are read: a pair whose unit number equals that of the virtual bucket group (cell 0 in the figure) is skipped; otherwise, the pair is transferred to the physical bucket group corresponding to the other cell (cell 2 in the figure).
Next, the remaining capacity of all key-value pair buckets in the key-value pair bucket group is obtained and compared with the occupied capacity of the key-value pair to be stored, and a target key-value pair bucket is selected from these buckets according to their remaining capacity, so that the historical key-value pairs belonging to the full key-value pair bucket can be transferred to it. As shown in fig. 5, when inserting a key-value pair into physical bucket 2 fails, a free bucket (physical bucket No. 0 in fig. 5, i.e., the target key-value pair bucket) is found in the physical bucket group corresponding to the unit. The historical key-value pairs of virtual bucket No. 2 are moved to physical bucket 0, and the key-value pair to be stored is stored in that key-value pair bucket. After the move, the index array is updated so that the moved key-value pairs can be located again on the next access: GA[2] is atomically increased by (0 - 2) mod 3 = 1.
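The GA[2] update follows from a displacement-style mapping in which virtual bucket v lives in physical bucket (v + GA[v]) mod n for a group of n buckets. That mapping is inferred from the (0 - 2) mod 3 arithmetic above and is an assumption, as is the first-fit choice of free bucket; all function names are hypothetical.

```python
def physical_bucket(ga: list[int], v: int, group_size: int) -> int:
    """Assumed mapping: virtual bucket v lives in physical bucket (v + GA[v]) mod n."""
    return (v + ga[v]) % group_size


def relocate(ga: list[int], v: int, target: int, group_size: int) -> int:
    """Redirect virtual bucket v to physical bucket `target`; return the increment."""
    delta = (target - physical_bucket(ga, v, group_size)) % group_size
    ga[v] += delta    # a real PMEM index would apply this as one atomic update
    return delta


def pick_free_bucket(remaining: list[int], needed: int, full: int):
    """First-fit choice of a physical bucket (other than `full`) with room for `needed`."""
    for b, free in enumerate(remaining):
        if b != full and free >= needed:
            return b
    return None
```

For the fig. 5 case, with three buckets and GA[2] = 0, redirecting virtual bucket 2 to physical bucket 0 adds (0 - 2) mod 3 = 1 to GA[2], after which (2 + 1) mod 3 = 0. Only the guide entry changes, so other virtual buckets in the group are unaffected.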
As shown in fig. 5, the basic operations of the hash table of the present application (insertion, query, and deletion) proceed as follows. First, hashing and modulo are performed: the key-value pair storage information is obtained, and the key of the key-value pair is hashed to obtain a unit number and a guide index. The location of the key-value pair is determined from the unit number and the guide index, which identifies a unique virtual bucket. Mapping is then performed, that is, the virtual bucket is mapped to a unique physical bucket: the unit number and the guide index are input into a preset mapper to obtain a layer index and a bucket offset, and the location of the key-value pair bucket is determined from the layer index and the bucket offset. An insertion writes the key-value pair into this physical bucket and persists it. A query searches the same physical bucket; the invention uses SIMD instructions to accelerate the search and comparison, so that the whole bucket can be searched with a single comparison. If the key-value pair is not in this physical bucket, it is not in the hash table. To delete a key-value pair, the invention first queries the hash table for it.
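Pure Python cannot issue SIMD instructions, so the sketch below uses SWAR ("SIMD within a register") as a stand-in to show how all slots of a bucket can be compared in one pass of word-wide operations. It assumes, hypothetically, that each bucket keeps a 1-byte fingerprint per slot; the text does not say what is compared. A fingerprint hit must still be confirmed against the full key.

```python
LO7 = 0x7F7F7F7F7F7F7F7F      # low 7 bits of every byte
ONES = 0x0101010101010101     # 0x01 in every byte
MASK64 = (1 << 64) - 1


def pack(fingerprints: list[int]) -> int:
    """Pack up to eight 1-byte fingerprints into one 64-bit word (slot i -> byte i)."""
    word = 0
    for i, fp in enumerate(fingerprints):
        word |= (fp & 0xFF) << (8 * i)
    return word


def match_slots(word: int, fp: int) -> list[int]:
    """Return every slot whose fingerprint equals fp, using whole-word operations."""
    x = word ^ (fp * ONES)            # matching bytes become 0x00
    y = (x & LO7) + LO7               # per byte: high bit set iff low 7 bits non-zero
    z = ~(y | x | LO7) & MASK64       # 0x80 exactly in the all-zero bytes
    return [i for i in range(8) if (z >> (8 * i + 7)) & 1]
```

This exact zero-byte test avoids the false positives of the shorter `(x - ONES) & ~x` trick. On real hardware, a byte-wise compare plus a move-mask instruction plays the same role on wider registers.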
If the key-value pair is found, the corresponding bit of the bitmap is set to 0 to mark the key-value pair at that position as invalid, and the bucket metadata is then persisted to complete the deletion.
Further, as shown in fig. 6, the present application expands the unit array, the layer pointers, and the physical bucket layers by doubling. After the expansion succeeds, the new layer pointers point in turn to the newly generated physical bucket layers. As in the local rehash operation, the LD and GD values of a unit are atomically incremented, and once they change, the mapping between the unit and the physical bucket groups changes according to the formula above. The physical bucket group in which the insertion failed is then rehashed. Unlike local rehashing, the destination physical bucket of each key-value pair is already determined by its pre-expansion hash (it is the same physical bucket in a different layer), so the mapping between key-value pairs and physical buckets does not need to be re-established. This avoids unnecessary computation, after which the key-value pair whose insertion failed is reinserted. The present invention achieves approximately 2.21 times higher read throughput than the most advanced prior-art schemes, with a 99th-percentile tail latency only 1/3 of theirs.
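The bitmap-based deletion can be sketched with a single bucket class. The slot count, the class name `Bucket`, and the persist ordering noted in the comments are assumptions for illustration. The point is that deletion only clears one bit, and lookups ignore slots whose bit is 0.

```python
class Bucket:
    """A physical bucket with a validity bitmap (sketch; slot count is assumed)."""

    SLOTS = 8

    def __init__(self) -> None:
        self.bitmap = 0                      # bit i == 1 -> slot i holds a live pair
        self.keys = [None] * self.SLOTS
        self.values = [None] * self.SLOTS

    def insert(self, key, value) -> bool:
        for i in range(self.SLOTS):
            if not (self.bitmap >> i) & 1:
                self.keys[i], self.values[i] = key, value
                # Assumed PMEM ordering: persist the slot before publishing it.
                self.bitmap |= 1 << i
                return True
        return False                          # bucket full: caller must rehash or expand

    def find(self, key):
        for i in range(self.SLOTS):
            if (self.bitmap >> i) & 1 and self.keys[i] == key:
                return i
        return None

    def delete(self, key) -> bool:
        i = self.find(key)
        if i is None:
            return False
        self.bitmap &= ~(1 << i)              # clear the bit: the slot is now invalid
        # ...then persist the bucket metadata (the bitmap).
        return True
```

Because the stale key stays in the slot until it is overwritten, a delete costs one bitmap write plus one persist of the metadata.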
In this embodiment, the key-value pair bucket in the key-value pair bucket group is determined, and it is judged whether its remaining capacity is smaller than the occupied capacity of the key-value pair to be stored. If so, the pre-obtained rehashing count of the key-value pair virtual bucket group is compared with the extension count of the local hash table. If the rehashing count equals the extension count, the remaining capacity of all key-value pair buckets in the bucket group is obtained, and key-value pair transfer buckets are selected from the bucket group based on the occupied capacity of the key-value pair to be stored and those remaining capacities, so that the historical key-value pairs belonging to the key-value pair bucket are transferred to the transfer buckets. A target key-value pair bucket is then determined, and the key-value pair to be stored is stored in it. The method combines extendible hashing with perfect hashing and chooses the operation to perform by comparing the rehashing count of the virtual bucket group with the extension count of the local hash table. Introducing perfect hashing eliminates the extra query overhead caused by hash collisions and frees up the read performance of the index, making full use of the hardware characteristics of PMEM and the inherent advantages of a perfect hash index. This improves the storage efficiency of key-value pairs, improves the read performance of the index under read-intensive and skewed-read workloads, and reduces the overhead of maintaining the completeness of the index.
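The branch selection described in this embodiment and in claims 1, 6 and 7 can be summarised as a small decision function. Capacities are abstract units, and the names `Action` and `choose_action` are hypothetical. The sketch only picks the branch; it does not perform it.

```python
from enum import Enum


class Action(Enum):
    STORE = "store in the key-value pair bucket"
    LOCAL_REHASH = "split the virtual bucket group (LD < GD)"
    DISPLACE = "move historical pairs to a bucket with room (LD == GD)"
    EXPAND = "double the table, increment LD and GD, then retry"


def choose_action(bucket_free: int, needed: int, ld: int, gd: int,
                  group_free: list[int]) -> Action:
    if bucket_free >= needed:                       # enough room: store directly
        return Action.STORE
    if ld < gd:                                     # rehashing count < extension count
        return Action.LOCAL_REHASH
    if any(free >= needed for free in group_free):  # some bucket in the group has room
        return Action.DISPLACE
    return Action.EXPAND                            # no bucket can take the pair
```

LD never exceeds GD in an extendible-hash directory, so the two comparisons cover every case once the bucket is full.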
Referring to fig. 7, an embodiment of the present invention discloses a key-value pair storage device, which may specifically include:
a key-value pair bucket determining module 11, configured to determine the key-value pair bucket in a key-value pair bucket group and judge whether the remaining capacity of the key-value pair bucket is smaller than the occupied capacity of the key-value pair to be stored;
a judging module 12, configured to compare the pre-obtained rehashing count of the key-value pair virtual bucket group with the extension count of the local hash table if the remaining capacity of the key-value pair bucket is smaller than the occupied capacity of the key-value pair to be stored;
a historical key-value pair transfer module 13, configured to determine a key-value pair transfer bucket group if the rehashing count is smaller than the extension count, and to transfer the historical key-value pairs belonging to the key-value pair virtual bucket group to the key-value pair transfer bucket group;
and a target key-value pair storage module 14, configured to determine a target key-value pair bucket and store the key-value pair to be stored in the target key-value pair bucket.
In this embodiment, the key-value pair bucket in the key-value pair bucket group is determined, and it is judged whether its remaining capacity is smaller than the occupied capacity of the key-value pair to be stored. If so, the pre-acquired rehashing count of the key-value pair virtual bucket group is compared with the extension count of the local hash table. If the rehashing count is smaller than the extension count, a key-value pair transfer bucket group is determined, and the historical key-value pairs belonging to the key-value pair virtual bucket group are transferred to it. A target key-value pair bucket is then determined, and the key-value pair to be stored is stored in it. The method combines extendible hashing with perfect hashing and chooses the operation to perform by comparing the rehashing count of the virtual bucket group with the extension count of the local hash table. Introducing perfect hashing eliminates the extra query overhead caused by hash collisions and frees up the read performance of the index, making full use of the hardware characteristics of PMEM and the inherent advantages of a perfect hash index. This improves the storage efficiency of key-value pairs, improves the read performance of the index under read-intensive and skewed-read workloads, and reduces the overhead of maintaining the completeness of the index.
In some embodiments, the key-value pair bucket determining module 11 may specifically include:
an information acquisition module, configured to obtain the key-value pair storage information and perform a calculation on the key of the key-value pair in that information to obtain a unit number and a guide index;
and a position information determining module, configured to determine the location of the key-value pair to be stored in the key-value pair virtual bucket group based on the unit number and the guide index.
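One way to derive both values from a single key hash is sketched below. The hash function (BLAKE2b truncated to 64 bits), the bit split, and the name `unit_and_guide` are all assumptions; the text only states that the key is hashed and reduced by a modulo.

```python
import hashlib


def unit_and_guide(key: bytes, gd: int, buckets_per_group: int) -> tuple[int, int]:
    """Derive a unit number and a guide index from one 64-bit hash (sketch).

    Assumes gd <= 32 so the unit bits and the guide bits do not overlap.
    """
    h = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "little")
    unit = h & ((1 << gd) - 1)                    # low GD bits pick the unit
    guide = (h >> 32) % buckets_per_group         # high bits pick the virtual bucket
    return unit, guide
```

In this sketch the guide index does not depend on GD, so doubling the unit array changes only which unit a key falls into, and the new unit agrees with the old one in its low bits.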
In some embodiments, the key-value pair bucket determining module 11 may specifically include:
a mapper output module, configured to input the unit number and the guide index into a preset mapper to obtain a layer index and a bucket offset;
and a key-value pair bucket determining module, configured to determine the layer number of the key-value pair bucket group and the location of the key-value pair bucket based on the layer index and the bucket offset, so as to obtain the key-value pair bucket.
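The text does not give the mapper's formula, so the following is a hypothetical layout consistent with its description: one new physical bucket layer is appended per doubling, and raising a unit's LD moves it to a group in a different layer. The function name and the layout are assumptions, and the guide-array displacement is omitted for brevity.

```python
def mapper(unit: int, ld: int, guide: int, buckets_per_group: int) -> tuple[int, int]:
    """Hypothetical mapper: (unit number, guide index) -> (layer index, bucket offset).

    The unit's low `ld` bits give its physical bucket group. Layer 0 holds
    group 0 and layer k >= 1 holds groups 2**(k-1) .. 2**k - 1, so each
    doubling appends one new layer and existing layers never move.
    """
    group = unit & ((1 << ld) - 1)
    layer = group.bit_length()                      # 0 for group 0, else k with 2**(k-1) <= group < 2**k
    first = 0 if layer == 0 else 1 << (layer - 1)   # first group id stored in this layer
    return layer, (group - first) * buckets_per_group + guide
```

For example, with three buckets per group, unit 2 with LD = 1 maps to layer 0, but after its LD is raised to 2 it maps to layer 2. Existing layers are never reallocated when the table grows.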
In some specific embodiments, the history key-value pair transferring module 13 may specifically include:
a screening module, configured to select the target historical key-value pairs to be transferred from all the historical key-value pairs belonging to the key-value pair virtual bucket group;
and a transfer module, configured to transfer the target historical key-value pairs to the key-value pair transfer bucket group.
In some specific embodiments, the history key-value pair transferring module 13 may specifically include:
a unit number determining module, configured to determine the unit number of the key-value pair virtual bucket group and obtain the unit numbers of all historical key-value pairs in that group;
and a judging module, configured to judge whether the unit number of the key-value pair virtual bucket group is consistent with the unit number of each historical key-value pair and, if it is not, to take that historical key-value pair as a target historical key-value pair to be transferred.
In some specific embodiments, the determining module 12 may specifically include:
a remaining capacity obtaining module, configured to obtain the remaining capacity of all the key-value pair buckets in the key-value pair bucket group if the rehashing count equals the extension count;
and a transfer storage module, configured to select key-value pair transfer buckets from the key-value pair bucket group based on the occupied capacity of the key-value pair to be stored and the remaining capacity of all the key-value pair buckets in the group, so as to transfer the historical key-value pairs belonging to the key-value pair bucket to the key-value pair transfer buckets.
In some specific embodiments, the target key-value pair storage module 14 may specifically include:
an expansion module, configured to, if the occupied capacity of the key-value pair to be stored is greater than the remaining capacity of every key-value pair bucket in the key-value pair bucket group, expand the key-value pair bucket group according to a preset expansion method to obtain a new key-value pair bucket group, increment the rehashing count and the extension count, and then return to the step of determining the key-value pair bucket in the key-value pair bucket group.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 20 may specifically include at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 stores a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the key-value pair storage method executed by the electronic device and disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol that can be applied to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22, as a carrier for storing resources, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it include an operating system 221, a computer program 222, data 223, and so on, and the storage may be transient or permanent.
The operating system 221 manages and controls the hardware devices of the electronic device 20 and the computer program 222, so that the processor 21 can operate on and process the data 223 in the memory 22; it may be Windows, Unix, Linux, or the like. In addition to the computer program that performs the key-value pair storage method executed by the electronic device 20 and disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for other specific tasks. The data 223 may include data received by the key-value pair storage device from external devices, data collected by the input/output interface 25, and the like.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Further, an embodiment of the present application further discloses a computer-readable storage medium, in which a computer program is stored, and when the computer program is loaded and executed by a processor, the steps of the key-value pair storage method disclosed in any of the foregoing embodiments are implemented.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The key value pair storage method, device, apparatus and storage medium provided by the present invention are described in detail above, and a specific example is applied in the present document to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A key-value pair storage method, comprising:
determining a key-value pair bucket in a key-value pair bucket group, and judging whether the remaining capacity of the key-value pair bucket is smaller than the occupied capacity of a key-value pair to be stored;
if the remaining capacity of the key-value pair bucket is smaller than the occupied capacity of the key-value pair to be stored, comparing a pre-obtained rehashing count of a key-value pair virtual bucket group with an extension count of a local hash table;
if the rehashing count is smaller than the extension count, determining a key-value pair transfer bucket group, and transferring the historical key-value pairs belonging to the key-value pair virtual bucket group to the key-value pair transfer bucket group;
and determining a target key-value pair bucket, and storing the key-value pair to be stored in the target key-value pair bucket.
2. The key-value pair storage method of claim 1, further comprising, before determining the key-value pair bucket in the key-value pair bucket group:
obtaining key-value pair storage information, and performing a calculation on the key of the key-value pair in the key-value pair storage information to obtain a unit number and a guide index;
determining location information of the key-value pair to be stored in the key-value pair virtual bucket group based on the unit number and the guide index.
3. The key-value pair storage method of claim 2, wherein determining the key-value pair bucket in the key-value pair bucket group comprises:
inputting the unit number and the guide index into a preset mapper to obtain a layer index and a bucket offset;
determining the layer number of the key-value pair bucket group and the location of the key-value pair bucket based on the layer index and the bucket offset, so as to obtain the key-value pair bucket.
4. The key-value pair storage method according to claim 1, wherein transferring the historical key-value pairs belonging to the key-value pair virtual bucket group to the key-value pair transfer bucket group comprises:
selecting target historical key-value pairs to be transferred from all the historical key-value pairs belonging to the key-value pair virtual bucket group;
and transferring the target historical key-value pairs to the key-value pair transfer bucket group.
5. The key-value pair storage method according to claim 4, wherein the screening out the target historical key-value pair to be transferred from all the historical key-value pairs belonging to the key-value pair virtual bucket group comprises:
determining the unit number of the key-value pair virtual bucket group, and acquiring the unit numbers of all historical key-value pairs in the key-value pair virtual bucket group;
and judging whether the unit number of the key-value pair virtual bucket group is consistent with the unit number of the historical key-value pair, and if the unit number of the key-value pair virtual bucket group is not consistent with the unit number of the historical key-value pair, taking the historical key-value pair as the target historical key-value pair to be transferred.
6. The key-value pair storage method according to any one of claims 1 to 5, further comprising, after comparing the rehashing count with the extension count:
if the rehashing count is equal to the extension count, obtaining the remaining capacity of all the key-value pair buckets in the key-value pair bucket group;
and selecting key-value pair transfer buckets from the key-value pair bucket group based on the occupied capacity of the key-value pair to be stored and the remaining capacity of all the key-value pair buckets in the key-value pair bucket group, so as to transfer the historical key-value pairs belonging to the key-value pair bucket to the key-value pair transfer buckets.
7. The key-value pair storage method according to claim 6, further comprising:
and if the occupied capacity of the key-value pair to be stored is greater than the remaining capacity of every key-value pair bucket in the key-value pair bucket group, expanding the key-value pair bucket group according to a preset expansion method to obtain a new key-value pair bucket group, incrementing the rehashing count and the extension count, and then returning to the step of determining the key-value pair bucket in the key-value pair bucket group.
8. A key-value pair storage device, comprising:
the key-value pair bucket determining module is used for determining a key-value pair bucket in a key-value pair bucket group and judging whether the remaining capacity of the key-value pair bucket is smaller than the occupied capacity of a key-value pair to be stored;
the judging module is used for comparing a pre-acquired rehashing count of a key-value pair virtual bucket group with an extension count of a local hash table if the remaining capacity of the key-value pair bucket is smaller than the occupied capacity of the key-value pair to be stored;
the historical key-value pair transfer module is used for determining a key-value pair transfer bucket group if the rehashing count is smaller than the extension count, and transferring the historical key-value pairs belonging to the key-value pair virtual bucket group to the key-value pair transfer bucket group;
and the target key-value pair storage module is used for determining a target key-value pair bucket and storing the key-value pair to be stored in the target key-value pair bucket.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the key-value pair storage method of any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements a key-value pair storage method as claimed in any one of claims 1 to 7.
CN202210976474.4A 2022-08-15 2022-08-15 Key value pair storage method, device, equipment and medium Pending CN115309745A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210976474.4A CN115309745A (en) 2022-08-15 2022-08-15 Key value pair storage method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115309745A true CN115309745A (en) 2022-11-08

Family

ID=83862925

Country Status (1)

Country Link
CN (1) CN115309745A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination