CN117539408B

CN117539408B - Storage-computing integrated index system and key-value pair storage system

Info

Publication number: CN117539408B
Application number: CN202410030995.XA
Authority: CN
Inventors: 吴兵; 冯丹; 童薇; 刘景宁; 罗弘杰
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2024-01-09
Filing date: 2024-01-09
Publication date: 2024-03-12
Anticipated expiration: 2044-01-09
Also published as: CN117539408A

Abstract

The invention discloses a storage-computing integrated index system and a key-value pair storage system, belonging to the intersection of storage and computing. The index system comprises a controller and a plurality of array clusters operable in parallel. Each array cluster includes a plurality of content-addressable cross-point arrays; in each array, every row stores one key and a valid flag bit. The controller performs index operations on the key-value pair data stored in the arrays, including an insert operation: for key-value pair data [k_i, v_i] to be inserted, a row that does not store a valid key is searched for in the array cluster mapped by hash bucket B_i, the bucket corresponding to key k_i; if the search succeeds, key k_i is stored in the allocated row, the valid flag bit is set to valid, and the value v_i is written into an array row; otherwise, an operation failure is returned. The invention completes the index operations on key-value pairs within the memory, improves the parallelism of index operations, and reduces index operation latency.

Description

Storage-computing integrated index system and key-value pair storage system

Technical Field

The invention belongs to the field of storage and calculation intersection, and particularly relates to a storage and calculation integrated index system and a key value pair storage system.

Background

Key-value storage systems organize data as simple key-value pairs, each data item having a unique key and associated data (the value), and are widely used in high-performance, low-latency, scalable and highly concurrent applications. The data index is an important component of a high-performance key-value storage system: it provides efficient data operations, namely fast insertion, query, update and deletion of the value corresponding to a given key. Each index structure performs data operations in its own way, but index structures can generally be divided into two broad categories: 1) tree index structures, in which the index is organized as a multi-level tree that is traversed level by level from the root to a leaf node to operate on the target data, such as the B+ tree, log-structured merge tree, skip list and radix tree; 2) hash index structures, which apply a hash function to a given key to obtain an index value directly and then locate the data corresponding to that index value in a linear table. Tree indexes require traversing nodes in a multi-level tree, and their query time complexity is typically O(log N), where N is the amount of stored data. Hash indexes, owing to their flat storage structure (a linear table), can in theory provide query operations in O(1) time. However, a hash index is inevitably affected by hash collisions, in which the hash function maps several keys to the same position of the linear table; whatever collision handling method is adopted, for example separate chaining or linear probing, additional data accesses and comparisons are needed to resolve the collision.

Both the node traversal of a tree index and the collision handling of a hash index cause more frequent comparisons and accesses, which reduces index operation efficiency and degrades the performance of the key-value pair storage system. Although researchers have proposed novel index structures such as learned indexes, these are generally still organized as trees and still suffer from reduced index operation efficiency caused by multiple rounds of computation, comparison and access, so the performance of key-value pair storage systems still needs to be improved.

Disclosure of Invention

Aiming at the defects and improvement demands of the prior art, the invention provides a storage and calculation integrated indexing system and a key value pair storage system, and aims to finish the indexing operation of key value pair data in a memory by utilizing the parallel data comparison capability of a cross point array so as to improve the parallelism of the indexing operation and reduce the time delay of the indexing operation.

To achieve the above object, according to one aspect of the present invention, there is provided a storage-computing integrated indexing system including: a controller and a plurality of array clusters;

each array cluster comprises a plurality of arrays; the array is a cross-point array consisting of resistive memory cells and is content addressable; in the array, each row is used for storing one key and a valid flag bit; when the value of the valid flag bit is f_v, the corresponding row stores a valid key, and when the value of the valid flag bit is f_i, the corresponding row does not store a valid key; f_v ≠ f_i; initially, the valid flag bit of every row is f_i; the plurality of array clusters can operate in parallel;

the controller is used for performing index operations on the key-value pair data stored in the arrays; the index operations include: an insert operation; the insert operation includes:

(I1) for key-value pair data [k_i, v_i] to be inserted, searching the array cluster mapped by hash bucket B_i for a row whose valid flag bit is f_i; if the search succeeds, proceeding to step (I2); otherwise, returning an operation failure;

hash bucket B_i is the hash bucket in the hash table corresponding to the hash value of key k_i; in the hash table, each hash bucket is mapped to one array cluster and stores the addresses of the arrays in that array cluster;

(I2) storing key k_i of the key-value pair data [k_i, v_i] in the allocated row, then setting the valid flag bit to f_v, and writing the value v_i into one array row.
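For illustration only, steps (I1) and (I2) can be modeled in software. The following Python sketch is a minimal model under stated assumptions, not the hardware implementation: the class `Array`, its method `match_flag`, and the function `insert` are hypothetical names, the flag values follow the 1/0 encoding used in the embodiment, and the list comprehension stands in for the parallel comparison that the content-addressable array performs in one step.

```python
F_VALID, F_INVALID = 1, 0  # f_v and f_i, using the embodiment's 1/0 encoding


class Array:
    """Software model of one content-addressable array (hypothetical class)."""

    def __init__(self, rows):
        self.keys = [None] * rows
        self.flags = [F_INVALID] * rows   # initially no row holds a valid key
        self.values = [None] * rows       # values could also live in another array

    def match_flag(self, flag):
        """Stand-in for the parallel compare: one match bit per row."""
        return [int(f == flag) for f in self.flags]


def insert(cluster, key, value):
    """Steps (I1)-(I2) for the array cluster mapped by the key's hash bucket."""
    for array in cluster:                 # (I1) search the mapped cluster
        hits = array.match_flag(F_INVALID)
        if 1 in hits:
            row = hits.index(1)           # first free row, as in the embodiment
            array.keys[row] = key         # (I2) store the key,
            array.flags[row] = F_VALID    # mark the row valid,
            array.values[row] = value     # and write the value
            return True
    return False                          # no free row: the insert fails
```

In this model a cluster is simply a list of `Array` objects; once every row in the cluster is valid, `insert` returns `False`, corresponding to the returned operation failure.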

Further, the index operations further include: a query operation; the query operation includes:

(S1) for a key k_s to be queried, sequentially reading the array addresses stored in hash bucket B_s and searching the corresponding arrays for a row L_s that stores key k_s and whose valid flag bit is f_v; if the search succeeds, proceeding to step (S2); otherwise, returning that the data does not exist;

hash bucket B_s is the hash bucket in the hash table corresponding to the hash value of key k_s;

(S2) reading the value v_s from the value-storing row corresponding to row L_s, and returning it.

Further, the index operations further include: an update operation; the update operation includes:

(U1) for key-value pair data [k_u, v_u] to be updated, sequentially reading the array addresses stored in hash bucket B_u and searching the corresponding arrays for a row L_u that stores key k_u and whose valid flag bit is f_v; if the search succeeds, proceeding to step (U2); otherwise, returning that the data does not exist;

hash bucket B_u is the hash bucket in the hash table corresponding to the hash value of key k_u;

(U2) updating the value stored in the value-storing row corresponding to row L_u to v_u.

Further, the index operations further include: a delete operation; the delete operation includes:

(D1) for a key k_d to be deleted, sequentially reading the array addresses stored in hash bucket B_d and searching the corresponding arrays for a row L_d that stores key k_d and whose valid flag bit is f_v; if the search succeeds, proceeding to step (D2); otherwise, returning that the data does not exist;

hash bucket B_d is the hash bucket in the hash table corresponding to the hash value of key k_d;

(D2) setting the valid flag bit of row L_d to f_i.
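The query, update and delete operations share the same lookup step and differ only in what is done with the matching row. A minimal Python sketch of that shared structure follows; it is an illustration under assumptions, not the claimed hardware. Each array is modeled as a list of `[key, flag, value]` rows, the names `find_row`, `query`, `update` and `delete` are hypothetical, and the inner loop stands in for the array's parallel comparison on the key and valid flag bit.

```python
F_VALID, F_INVALID = 1, 0  # f_v and f_i, using the embodiment's 1/0 encoding


def find_row(cluster, key):
    """Steps (S1)/(U1)/(D1): locate the valid row that stores `key`.

    `cluster` is a list of arrays, in the order their addresses appear in
    the hash bucket; each array is a list of [key, flag, value] rows.
    """
    for array in cluster:
        for row in array:
            if row[0] == key and row[1] == F_VALID:
                return row
    return None


def query(cluster, key):
    row = find_row(cluster, key)
    return None if row is None else row[2]   # (S2) read the stored value


def update(cluster, key, value):
    row = find_row(cluster, key)
    if row is None:
        return False                         # data does not exist
    row[2] = value                           # (U2) overwrite the value
    return True


def delete(cluster, key):
    row = find_row(cluster, key)
    if row is None:
        return False                         # data does not exist
    row[1] = F_INVALID                       # (D2) clear only the valid flag
    return True
```

Note that `delete` leaves the key bits in place and only clears the flag, so a later lookup for the same key no longer matches and the row becomes available to a subsequent insert.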

Further, the controller is further configured to perform a hash expansion operation; the hash expansion operation comprises the following steps:

(E1) expanding the capacity of the hash table to 2N, and determining the correspondence between each hash bucket and the array clusters; N is the capacity of the current hash table, and N is a power of 2;

(E2) determining, for each valid key stored in the array clusters, its corresponding hash bucket in the new hash table, and moving the valid key to the array cluster mapped by that hash bucket;

for any valid key k, the sequence number of the hash bucket corresponding to its hash value in the original hash table is denoted i; it is judged whether the log₂N-th bit of the hash value is 0; if so, the hash bucket sequence number corresponding to valid key k in the new hash table is determined to be i; otherwise, it is determined to be i+N.
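The rule above follows from writing the hash value as h = q·2N + b·N + i, where i = h % N and b is the bit at position log₂N, so that h % (2N) = i + b·N. The Python sketch below checks this numerically. It assumes bits are numbered from 0 at the least significant end, which is the numbering under which the rule agrees with hash(k) % (2N); the function name `new_bucket` is hypothetical.

```python
def new_bucket(hash_value, n):
    """Bucket index after doubling a table of capacity n (n a power of 2)."""
    if n <= 0 or n & (n - 1) != 0:
        raise ValueError("n must be a power of 2")
    i = hash_value % n                       # bucket in the original table
    bit_pos = n.bit_length() - 1             # log2(n) for a power of 2
    bit = (hash_value >> bit_pos) & 1        # the single bit that decides the move
    return i if bit == 0 else i + n


# The one-bit test agrees with recomputing hash % (2N) for every value tried.
for n in (1, 2, 8, 64):
    for h in range(4 * n):
        assert new_bucket(h, n) == h % (2 * n)
```

Because only one bit per key is examined, the destination can be decided without recomputing the hash function or the remainder.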

Further, the insert operation further includes:

while storing key k_i in the allocated row, storing the high (H - log₂N) bits of the hash value of key k_i into one array row as hash valid bits; for keys stored in the same array, their hash valid bits are stored in the same array and aligned by column; H denotes the length of the hash value;

in the hash expansion operation, the log₂N-th bit of the hash value of a valid key is obtained by:

reading, from the array, the column in which the hash valid bit corresponding to the log₂N-th bit of the hash value is located.

Further, in the hash table, the size of each hash bucket is the same as the size of one CPU cache line.

Further, in the hash table, different hash buckets map to different array clusters.

According to still another aspect of the present invention, there is provided a key value pair storage system including the above-mentioned integrated index system of the present invention.

In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:

(1) The storage-computing integrated index system provided by the invention is built on the cooperation of a hash table with cross-point arrays of resistive memory cells, and uses the content-addressable function (parallel data comparison) of the arrays to perform index operations in place: the key of a key-value pair is mapped to a hash bucket by the hash function, and the specific operation on the data is then completed by the in-place index operation of the arrays. Overall, the invention can effectively improve index operation performance.

(2) In a preferred technical scheme of the invention, during hash expansion the destination of each data item is determined from a single corresponding bit of its hash value, based on the capacity relation between the hash tables before and after expansion. The destinations of all row data in an array can therefore be determined within the index system and the data moved within the memory without CPU participation, which reduces data movement between the CPU and the memory, improves the efficiency of hash expansion, reduces the impact of expansion on the index function, and further improves the overall index performance of the index system.

(3) In a further preferred technical scheme of the invention, hash valid bits, which indicate the destination of each valid key in an array at each hash expansion, are determined in advance and stored in the array aligned by column. During hash expansion, the destinations of the valid keys in an array can be determined by reading in parallel, from the array, the column of the hash valid bit selected by the capacity of the current hash table. The whole process is completed within the index system without CPU participation in the data movement, which further improves hash expansion efficiency.

(4) In a further preferred embodiment of the present invention, in the hash table, the capacity of each hash bucket is the same as one CPU cache line, so that the cache utilization of the CPU can be improved.

(5) In a further preferred scheme of the invention, different hash buckets are mapped to different array clusters in the hash table, so that parallelism of a hardware bottom layer can be maximized, and overall index operation performance is further improved.

Drawings

Fig. 1 is a schematic diagram of a conventional resistive random access memory.

Fig. 2 is a schematic diagram of a conventional cross-point array structure composed of a resistive random access memory.

FIG. 3 is a schematic diagram of a content addressable process of an array in accordance with an embodiment of the present invention.

Fig. 4 is a schematic hardware architecture of a unified memory index system according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of a hardware and software collaborative hash index according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of an inserting operation according to an embodiment of the present invention.

FIG. 7 is a schematic diagram of a query operation according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating an update operation according to an embodiment of the present invention.

Fig. 9 is a schematic diagram of a deletion operation in an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.

In order to solve the technical problem that existing indexing approaches for key-value pair data incur frequent comparisons and accesses and thereby reduce index operation efficiency, the invention provides a storage-computing integrated index system and a key-value pair storage system.

Before explaining the technical scheme of the invention in detail, a brief description is given below of a resistive random access memory cell and a cross-point array structure formed by the resistive random access memory cell.

The structure of the resistive random access memory unit is shown in fig. 1, and the basic structure of the resistive random access memory unit consists of a resistive material, a bottom electrode and a top electrode. The resistance change material can be metal oxide, and the like, and the resistance can realize reversible conversion between a high resistance state and a low resistance state under the action of an external electric field, so that the writing of data is completed. By applying a small voltage, the data reading can be accomplished by measuring the current flowing or the converted voltage without changing the state of the cell.

As shown in FIG. 2, in a cross-point array of resistive memory cells, computation voltages can be applied to the word lines (rows) of the array, and the result of the matrix-vector multiplication between the input voltage vector and the conductance matrix stored in the array is obtained by reading the currents flowing on the bit lines (columns). The complexity of the matrix-vector multiplication implemented in this way is only O(1), so the structure has been widely studied for accelerating applications dominated by matrix-vector multiplication, such as neural networks and graph computation. The cross-point array can also implement parallel data comparison, also known as the content-addressable function, which determines the rows of data stored in the array that are identical to the input data. FIG. 3 is a schematic diagram of one possible content-addressable scheme. In this implementation, two resistive memory cells form one content-addressable unit, and the two corresponding columns form one input comparison unit. The content stored in a content-addressable unit is determined by the mapping shown at the bottom of FIG. 3; for example, if the logical value to be stored is 1, the data actually stored in the content-addressable unit is (1, 0). With the help of the input registers, each 1-bit input is converted into inputs on two bit lines, again according to the mapping shown at the bottom of FIG. 3; for example, an input logical value of "0" is converted into (0, 1) at the input comparison unit. Driving several such bit-line pairs compares a multi-bit input with the data stored in every row in parallel. If a row of data stored in the array matches (is identical to) the given input data, the sense amplifier (SA) of that row outputs a "1" signal; otherwise it outputs "0".

It should be noted that FIG. 3 shows only one possible way of implementing the content-addressable function with a resistive cross-point array; other implementations, other nonvolatile memories, and even conventional memories such as DRAM and SRAM may also be used.
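The matching behavior can be mimicked in software for illustration. The Python sketch below is a simplified model, not the circuit of FIG. 3. Only two entries of the mapping are stated in the text (stored 1 → (1, 0), input 0 → (0, 1)); the sketch assumes the remaining entries are symmetric (stored 0 → (0, 1), input 1 → (1, 0)) and assumes that a row matches when the summed per-cell products equal the key width. The function names `encode` and `cam_search` are hypothetical.

```python
def encode(bits):
    """Map each logical bit to a cell pair: 1 -> (1, 0), 0 -> (0, 1).

    The 0 -> (0, 1) storage mapping is an assumption; the text gives only
    the stored-1 and input-0 cases.
    """
    out = []
    for b in bits:
        out.extend((1, 0) if b else (0, 1))
    return out


def cam_search(stored_rows, query_bits):
    """Return one match bit per row, mimicking the sense-amplifier outputs."""
    drive = encode(query_bits)               # inputs applied to the bit lines
    width = len(query_bits)
    matches = []
    for row_bits in stored_rows:
        cells = encode(row_bits)             # states of the cells in this row
        current = sum(c * d for c, d in zip(cells, drive))
        matches.append(1 if current == width else 0)  # every bit must agree
    return matches
```

In hardware all rows are evaluated at once; the loop over `stored_rows` here is only a sequential stand-in for that parallelism.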

In practical applications, the key-value pair data is generally represented by [ key, value ], where key represents a key, value represents a value corresponding to the key, and in the following embodiments, a similar representation manner will be adopted, specifically, [ k, v ] is used to represent the key-value pair data, and k and v represent a key and a value, respectively.

The following are examples.

Example 1:

A storage-computing integrated indexing system, as shown in FIG. 4, comprising: a controller and a plurality of array clusters;

multiple array clusters may operate in parallel; each array cluster includes a plurality of arrays; the array is a cross-point array consisting of resistive memory cells and is content addressable; in the array, each row is used for storing one key and a valid flag bit; the valid flag bit is used for indicating whether the corresponding row stores a valid key; in this embodiment, when the value of the valid flag bit is 1, it indicates that the corresponding row stores a valid key, and when the value of the valid flag bit is 0, it indicates that the corresponding row does not store a valid key; at the initial time, the valid flag bit of each row is 0.

Optionally, as shown in fig. 4, in this embodiment, the array specifically uses a memory architecture similar to a DRAM memory to construct the integrated index hardware, so as to avoid a complex on-chip interconnection structure. Wherein the input registers and sense amplifiers required for the content addressable function can be shared with the input registers and sense amplifiers used for the storage function. It is only necessary to adapt the sense amplifier to support 2 reference value comparisons for normal read operations and content addressable operations, respectively. It should be noted that the hardware organization form shown in fig. 4 is only an alternative implementation manner of the embodiment of the present invention, and in other embodiments of the present invention, the organization of the array may be completed by using other organization forms such as an H tree.

The controller is used for performing index operations on the key-value pair data stored in the arrays; in this embodiment, the index operations specifically include an insert operation, a query operation, an update operation, and a delete operation. These operations are all completed within the arrays.

In this embodiment, the keys and the valid flag bits thereof are stored in one array row, so that parallel comparison of the input keys and valid flag bits can be completed. It is easy to understand that in the initialized state, all rows of the array are not inserted with data, and the valid flag bit of each row is 0; after deleting a certain line of data, the valid flag bit is also 0.

As shown in FIG. 5, on top of the storage-computing integrated index hardware, this embodiment designs a hash table in which software and hardware cooperate. Each hash bucket in the hash table is mapped to one array cluster and stores the addresses of the arrays in that array cluster; optionally, in this embodiment, the sequence number of an array within its array cluster is used as the address of that array, and each array address occupies 8 B. The capacity of each hash bucket equals the size of one CPU cache line, so one hash bucket can hold several array addresses and fill exactly one CPU cache line, which improves the cache utilization of the CPU. To maximize the parallelism of the underlying hardware, in this embodiment different hash buckets are mapped to different array clusters; since different array clusters can operate in parallel, system performance is further improved.

In this embodiment, when a key-value pair is mapped to the hash table, the sequence number of the hash bucket it maps to is hash(k) % N, where hash(k) denotes the hash value of key k, N denotes the current capacity of the hash table, and % denotes the remainder operation.
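As a rough illustration of the bucket layout, the Python sketch below packs array addresses into one cache-line-sized bucket. The 8 B address size comes from this embodiment; the 64-byte cache line is an assumption (a common line size, not stated in the text), and the names `pack_bucket` and `unpack_bucket` are hypothetical. Under that assumption one bucket holds 64 / 8 = 8 array addresses.

```python
import struct

CACHE_LINE_BYTES = 64   # assumption: common cache line size, not from the text
ARRAY_ADDR_BYTES = 8    # from the embodiment: each array address occupies 8 B
ADDRS_PER_BUCKET = CACHE_LINE_BYTES // ARRAY_ADDR_BYTES


def pack_bucket(array_addrs):
    """Pack array addresses into one cache-line-sized hash bucket."""
    if len(array_addrs) != ADDRS_PER_BUCKET:
        raise ValueError(f"a bucket holds exactly {ADDRS_PER_BUCKET} addresses")
    # "<8Q": eight little-endian unsigned 64-bit integers, 64 bytes in total
    return struct.pack(f"<{ADDRS_PER_BUCKET}Q", *array_addrs)


def unpack_bucket(raw):
    """Recover the list of array addresses from a packed bucket."""
    return list(struct.unpack(f"<{ADDRS_PER_BUCKET}Q", raw))
```

Reading one such bucket brings every array address of the mapped cluster into the cache with a single line fill, which is the cache-utilization benefit described above.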
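A short Python sketch of this mapping follows. The text does not name a hash function, so `hash64`, built here from the first 8 bytes of SHA-256, is purely an illustrative stand-in; `bucket_index` is likewise a hypothetical name.

```python
import hashlib


def hash64(key: bytes) -> int:
    """Stand-in hash function: first 8 bytes of SHA-256 as an unsigned integer."""
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def bucket_index(key: bytes, n: int) -> int:
    """Bucket sequence number hash(k) % N for a table of capacity N."""
    return hash64(key) % n
```

When N is a power of 2, as in this embodiment, `hash64(key) % n` equals `hash64(key) & (n - 1)`, that is, the low log₂N bits of the hash value select the bucket.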

Based on the designed hash table, in this embodiment, the key k of the key value pair data [ k, v ] is mapped to a hash bucket after a hash function, and then a corresponding operation can be performed on an array in the array cluster mapped by the hash bucket. Similar to the conventional hash table-based indexing system, the hash value hash (k) of key k is calculated and the corresponding hash bucket is located in the hash table, which is completed by the CPU.

In this embodiment, the insert operation includes steps (I1) and (I2) described above.

FIG. 6 shows an example of an insert operation in which the key to be inserted is k_i = 0011. The left side of FIG. 6 shows the search for empty rows within the array, i.e. rows whose valid flag bit is 0, and the right side of FIG. 6 shows the data being written to the empty row found. It should be noted that if several empty rows are found in the array, any one of them may be selected; in this embodiment, the first empty row is selected. If there is no empty row, an operation failure is returned. Furthermore, the value v can be stored using plain storage cells rather than content-addressable units, because it does not take part in data comparison. The value v_i may be stored in the same array as key k_i and its valid flag bit, or in a different array; once the row number matching the key has been obtained, only the corresponding row of the array holding the value needs to be operated on.

In this embodiment, the query operation includes:

(S1) for a key k_s to be queried, sequentially reading the array addresses stored in hash bucket B_s and searching the corresponding arrays for a row L_s that stores key k_s and whose valid flag bit is f_v; if the search succeeds, proceeding to step (S2); otherwise, returning that the data does not exist;

hash bucket B_s is the hash bucket in the hash table corresponding to the hash value of key k_s;

FIG. 7 shows an example of a query operation in which the key to be queried is k_s = 1010; the left side of FIG. 7 shows the search for the row storing k_s, and the right side of FIG. 7 shows the required value v_s being read from the row of the value-storing array corresponding to that row.

In this embodiment, the update operation includes steps (U1) and (U2) described above, where:

hash bucket B_u is the hash bucket in the hash table corresponding to the hash value of key k_u;

FIG. 8 shows an example of an update operation in which the key of the key-value pair data to be updated is k_u = 0100; the left side of FIG. 8 shows the search for the row storing k_u, and the right side of FIG. 8 shows the update of the value-storing row corresponding to that row.

For the delete operation, only the valid flag bit of the matching row needs to be cleared. Accordingly, in this embodiment, the delete operation locates row L_d as in step (D1) described above and then performs step (D2):

hash bucket B_d is the hash bucket in the hash table corresponding to the hash value of key k_d;

(D2) setting the valid flag bit of row L_d to f_i.

FIG. 9 shows an example of a delete operation in which the key of the key-value pair data to be deleted is k_d = 1010; the left side of FIG. 9 shows the search for the row storing k_d, and the right side of FIG. 9 shows the valid flag bit of that row being cleared.

The embodiment designs the corresponding hash table based on the content addressable array, so that in-situ indexing operation can be realized in the array, a large number of CPU memory access operations are omitted, the indexing operation steps are effectively reduced, and the indexing operation efficiency is improved. And the parallelism of the bottom hardware is fully utilized, and the overall performance of the system is further improved.

As the amount of key-value pair data to be indexed increases, the hash table gradually fills up, and when little space remains in the hash table, hash expansion is required. Hash expansion involves data movement; to further improve the performance of the index system, this embodiment performs that movement within the memory. On expansion, the computation of the index value changes from hash(k) % N to hash(k) % (2N). In this embodiment, the capacity N of the hash table is a power of 2, so data originally stored in the i-th hash bucket moves either to the i-th hash bucket or to the (i+N)-th hash bucket of the new hash table, depending on the log₂N-th bit of the hash value hash(k). In this embodiment, this bit is used as a hash valid bit: if the bit is 0, then after the index value is computed as hash(k) % (2N), the array address of key k is still located in the i-th hash bucket; otherwise, it is located in the (i+N)-th hash bucket. On this basis, the controller is further configured to perform a hash expansion operation, which comprises steps (E1) and (E2) described above.

In fact, the high (H - log₂N) bits of a key's hash value serve as hash valid bits that indicate the destination of each valid key in the array at each hash expansion, where H denotes the length of the hash value. To make this information easy to use, in this embodiment the hash valid bits are stored in the array when the data is inserted. Using the parallel read function of the array, a given hash valid bit can be read for all rows of the array at once, so the destination of the data in every row can be determined within the memory and the data movement completed there. Accordingly, in this embodiment, the insert operation further includes:

while storing key k_i in the allocated row, storing the high (H - log₂N) bits of the hash value of key k_i into one array row as hash valid bits; for keys stored in the same array, their hash valid bits are stored in the same array and aligned by column;

In this embodiment, the in-memory data movement during hash expansion can be performed on all valid keys stored in one array by supporting a command of the form mv(ori_addr, to_addr0, to_addr1, bit_pos), where ori_addr is the original array address and bit_pos designates the hash valid bit corresponding to the bit_pos-th bit of the key's hash value. to_addr0 and to_addr1 are the array addresses to which data in the original array must be moved when that hash valid bit is 0 and 1 respectively; they are determined from the array clusters mapped by the i-th and (i+N)-th hash buckets of the new hash table. When the hash valid bit is 0, the valid key is moved from the array at ori_addr to the array at to_addr0; when the hash valid bit is 1, the valid key is moved from the array at ori_addr to the array at to_addr1.

Through the above hash expansion operation, hash expansion can be completed by in-memory data movement alone, without CPU participation, which further improves hash expansion efficiency.
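The effect of such a command can be modeled in software as follows. This Python sketch is only a behavioral illustration under assumptions: `memory` is a hypothetical dictionary from array address to a list of rows, each row is a dictionary with hypothetical fields `key`, `valid` and `hash`, and for simplicity each row keeps its full hash value rather than only the high (H - log₂N) bits. Destination arrays are modeled as growable lists, so free-row allocation in the destination is not modeled.

```python
def mv(memory, ori_addr, to_addr0, to_addr1, bit_pos):
    """Move every valid row of one array according to one hash valid bit."""
    source = memory[ori_addr]
    # Column read: bit `bit_pos` of every row, obtained in one parallel read
    # in hardware and modeled here as a list.
    column = [(row["hash"] >> bit_pos) & 1 for row in source]
    for row, bit in zip(source, column):
        if not row["valid"]:
            continue                          # deleted rows are not moved
        dest = to_addr1 if bit else to_addr0
        if dest == ori_addr:
            continue                          # row is already in place
        memory[dest].append(dict(row))        # copy the row to its destination
        row["valid"] = 0                      # free the source row
```

For a table of capacity N being doubled, `bit_pos` would be log₂N (counting from 0 at the least significant end), so that rows whose bit is 0 go to the array cluster of bucket i and rows whose bit is 1 go to the array cluster of bucket i+N.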

Example 2:

A key-value pair storage system comprising the storage-computing integrated index system provided in Embodiment 1 above.

In a key-value pair storage system, the index system defines how key-value pair data is stored and completes the storage of that data. Based on the storage-computing integrated index system provided in Embodiment 1 above, the key-value pair storage system provided in this embodiment can provide efficient data operations, and its performance is effectively improved.

It should be understood that this embodiment further includes other modules that cooperate with the index system, such as modules for monitoring and management, security and permission control, load balancing, and backup, which improve availability and reliability; these modules are implemented in the same way as in conventional key-value pair storage systems and are not described again here.

It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A storage-computing integrated indexing system, comprising: a controller and a plurality of array clusters;

the plurality of array clusters can operate in parallel; each array cluster includes a plurality of arrays; the array is a cross-point array consisting of resistive memory cells and is content addressable; each row in the array is used for storing one key and the valid flag bit thereof; when the value of the valid flag bit is f_v, the corresponding row stores a valid key, and when the value of the valid flag bit is f_i, the corresponding row does not store a valid key; f_v ≠ f_i; initially, the valid flag bit of every row is f_i;

the controller is used for performing index operations on the key-value pair data stored in the arrays; the index operations include: an insert operation; the insert operation includes:

hash bucket B_i is the hash bucket in the hash table corresponding to the hash value of key k_i; in the hash table, each hash bucket is mapped to one array cluster and stores the addresses of the arrays in that array cluster;

(I2) storing key k_i of the key-value pair data [k_i, v_i] in the allocated row, then setting the valid flag bit to f_v, and writing the value v_i into one array row.

2. The storage-computing integrated indexing system of claim 1, wherein the index operations further comprise: a query operation; the query operation includes:

hash bucket B_s is the hash bucket in the hash table corresponding to the hash value of key k_s;

3. The storage-computing integrated indexing system of claim 1, wherein the index operations further comprise: an update operation; the update operation includes:

hash bucket B_u is the hash bucket in the hash table corresponding to the hash value of key k_u;

4. The storage-computing integrated indexing system of claim 1, wherein the index operations further comprise: a delete operation; the delete operation includes:

hash bucket B_d is the hash bucket in the hash table corresponding to the hash value of key k_d;

(D2) setting the valid flag bit of row L_d to f_i.

5. The storage-computing integrated indexing system of any one of claims 1 to 4, wherein the controller is further configured to perform a hash expansion operation; the hash expansion operation comprises the following steps:

6. The storage-computing integrated indexing system of claim 5, wherein the insert operation further comprises:

in the hash expansion operation, the log₂N-th bit of the hash value of a valid key is obtained by:

7. The storage-computing integrated indexing system of any one of claims 1 to 4, wherein, in the hash table, the size of each hash bucket is the same as the size of one CPU cache line.

8. The storage-computing integrated indexing system of any one of claims 1 to 4, wherein, in the hash table, different hash buckets map to different array clusters.

9. A key-value pair storage system comprising the storage-computing integrated indexing system of any one of claims 1 to 8.