CN117369709A - Data storage management method and device, storage medium and system - Google Patents

Data storage management method and device, storage medium and system

Info

Publication number
CN117369709A
Authority
CN
China
Prior art keywords
ordered
index
hash
members
ordered set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210768866.1A
Other languages
Chinese (zh)
Inventor
杨俊�
陈宬
卢冕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN202210768866.1A
Publication of CN117369709A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F12/0238 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C16/00 Erasable programmable read-only memories
    • G11C16/02 Erasable programmable read-only memories electrically programmable
    • G11C16/06 Auxiliary circuits, e.g. for writing into memory
    • G11C16/10 Programming or data input circuits
    • G11C16/102 External programming circuits, e.g. EPROM programmers; In-circuit programming or reprogramming; EPROM emulators
    • G11C16/105 Circuits or methods for updating contents of nonvolatile memory, especially with 'security' features to ensure reliable replacement, i.e. preventing that old data is lost before new data is reliably written

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a data storage management method and device, a storage medium, and a system, wherein the method comprises the following steps: acquiring an ordered set; writing an ordered index of the ordered set into a persistent storage device, wherein the ordered index comprises members in the ordered set and scores of the members, and the members in the ordered index are ordered according to the scores of the members; and writing a hash index of the ordered set to a dynamic random access memory, wherein the hash index includes a pointer to a location in the persistent storage where the member is located.

Description

Data storage management method and device, storage medium and system
Technical Field
The present application relates generally to the field of data persistence and, more particularly, to a data storage management method and apparatus, a storage medium, and a system that utilize a persistent storage device (Persistent Memory, PMem) and dynamic random access memory (Dynamic Random Access Memory, DRAM).
Background
In order to preserve as much data as possible when a device loses power and restarts, the prior art has ported conventional in-memory data structures to non-volatile memory (NVM) to implement data persistence. In view of the significant performance improvement of PMem and its advantage of byte-addressable access, the prior art selects PMem among NVMs to implement this persistence. However, such porting usually moves the entire data structure, and since PMem performs somewhat worse than DRAM, the performance of the data structure as a whole is often reduced to some extent. In addition, since PMem performs worse on writes than on reads, and a single data modification, insertion, or deletion may cause multiple write operations (write amplification), even more performance may be lost when implementing the above data persistence in PMem.
In addition, an ordered set is a set composed of ordered elements of the same kind; it contains members and a score for each member. Besides the read and write operations of an unordered set, the ordered set therefore also supports score-based operations such as searching, sorting, and modifying, and it generally uses two indexes to manage its data. As a result, however, every modification, insertion, or deletion of data requires modifying both indexes, so performance is severely affected when an ordered set is persisted in PMem.
Disclosure of Invention
Exemplary embodiments of the present application provide a method and apparatus for persisting an ordered set to address at least the above-mentioned problems of the prior art.
According to a first aspect of embodiments of the present disclosure, there is provided a data storage management method, the method including: acquiring an ordered set; writing an ordered index of the ordered set into a persistent storage device, wherein the ordered index comprises members in the ordered set and scores of the members, and the members in the ordered index are ordered according to the scores of the members; and writing a hash index of the ordered set to a dynamic random access memory, wherein the hash index includes a pointer to a location in the persistent storage where the member is located.
Optionally, the writing the ordered index of the ordered set to persistent storage includes: writing the ordered index into the persistent storage device in the form of one of a tree, a skip list, a singly linked list, and a doubly linked list.
Optionally, the writing the hash index of the ordered set to the dynamic random access memory includes: creating an empty hash index in the dynamic random access memory; and traversing each hash bucket in the hash indexes of the ordered set in turn, and writing the pointer corresponding to the member in each hash bucket into the corresponding hash bucket in the newly built hash index.
Optionally, the method further comprises: in response to a received member query request being a first type of query, which queries by a target score, querying the target member using the ordered index; in response to the received member query request being a second type of query, which queries by a target member, querying the target member using the hash index; and feeding back the query result.
Optionally, said querying the target member using the ordered index includes: the target member is queried by searching the ordered index based on the target score.
Optionally, the querying the target member using the hash index includes: calculating the hash value of the target member; searching the hash index based on the hash value of the target member; and in response to searching the hash index for the hash bucket corresponding to the hash value, reading an address corresponding to the hash value from the hash bucket, and searching the persistent storage device according to the address corresponding to the hash value to query the target member.
Optionally, the method further comprises: processing the ordered set in response to receiving a processing operation for the ordered set; when the processing operation is a first type operation, the ordered index is adjusted according to the scores of all the members in the processed ordered set, wherein the first type operation is only used for modifying the scores of the members in the ordered set; and when the processing operation is a second type operation, the ordered index and the hash index are adjusted according to each member in the processed ordered set, wherein the second type operation comprises a member inserting operation, a member deleting operation and a member modifying operation aiming at the ordered set.
Optionally, the adjusting the ordered index according to the score of each member in the processed ordered set includes: and modifying the sequence of the members in the ordered index according to the scores of the members in the processed ordered set.
Optionally, the adjusting the ordered index and the hash index according to each member in the processed ordered set includes: calculating the score and hash value of each member after processing; modifying the sequence of the members in the ordered index according to the scores of the members after processing; modifying the hash index according to the hash value of each member after processing and the address of the member in the ordered index after modification.
Optionally, when a device using the method is powered up, the hash index is reconstructed in the dynamic random access memory based on all members contained in the ordered index.
Optionally, reconstructing the hash index in the dynamic random access memory includes: and calculating a hash value of each member by traversing each member contained in the ordered index, and reconstructing the hash index in the dynamic random access memory according to each member and the hash value of each member.
According to a second aspect of embodiments of the present disclosure, there is provided a data storage management apparatus, the apparatus comprising: an acquisition unit configured to acquire an ordered set; and a processing unit configured to: writing an ordered index of an ordered set into a persistent storage device, wherein the ordered index comprises members in the ordered set and scores of the members, and the members in the ordered index are ordered according to the scores of the members; and writing a hash index of the ordered set to a dynamic random access memory, wherein the hash index includes a pointer to a location in the persistent storage where the member is located.
Optionally, the processing unit is configured to write the ordered index to the persistent storage in one of a tree, a skip list, a singly linked list, and a doubly linked list.
Optionally, the processing unit is configured to: creating an empty hash index in the dynamic random access memory; and traversing each hash bucket in the hash indexes of the ordered set in turn, and writing the pointer corresponding to the member in each hash bucket into the corresponding hash bucket in the newly built hash index.
Optionally, the processing unit is further configured to: in response to a received member query request being a first type of query, which queries by a target score, query the target member using the ordered index; in response to the received member query request being a second type of query, which queries by a target member, query the target member using the hash index; and feed back the query result.
Optionally, the processing unit is configured to query the target member with the ordered index by: searching the ordered index based on the target score to find the target member.
Optionally, the processing unit is configured to query the target member with the hash index by: calculating the hash value of the target member; searching the hash index based on the hash value of the target member; and in response to the hash index searching the hash bucket corresponding to the hash value, reading an address corresponding to the hash value from the hash bucket, and searching the persistent storage device according to the address corresponding to the hash value to inquire the target member.
Optionally, the processing unit is further configured to: processing the ordered set in response to receiving a processing operation for the ordered set; when the processing operation is a first type operation, the ordered index is adjusted according to the scores of all the members in the processed ordered set, wherein the first type operation is only used for modifying the scores of the members in the ordered set; and when the processing operation is a second type operation, the ordered index and the hash index are adjusted according to each member in the processed ordered set, wherein the second type operation comprises a member inserting operation, a member deleting operation and a member modifying operation aiming at the ordered set.
Optionally, the processing unit is further configured to adjust the ordered index according to the score of each member in the processed ordered set by: modifying the sequence of the members in the ordered index according to the scores of the members in the processed ordered set.
Optionally, the processing unit is further configured to adjust the ordered index and the hash index according to each member in the processed ordered set by: calculating the score and hash value of each member after processing; modifying the sequence of the members in the ordered index according to the scores of the members after processing; modifying the hash index according to the hash value of each member after processing and the address of the member in the ordered index after modification.
Optionally, the processing unit is further configured to: when the device is powered up, the hash index is reconstructed in the dynamic random access memory based on all members contained in the ordered index.
Optionally, the processing unit is further configured to reconstruct the hash index in the dynamic random access memory by: and calculating a hash value of each member by traversing each member contained in the ordered index, and reconstructing the hash index in the dynamic random access memory according to each member and the hash value of each member.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform a data storage management method as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform a data storage management method as described above.
According to the technical solution for persisting an ordered set in the exemplary embodiments of the application, the hash index and the ordered index of the ordered set are stored in the dynamic random access memory and the persistent storage device, respectively. In other words, PMem, which provides persistence, is combined with high-performance DRAM to realize a high-performance persistent ordered set based on PMem. The ordered index stored in the persistent storage device is sorted by the score of each member in the ordered set, while the hash index stored in the dynamic random access memory uses the calculated hash value of each member as its key and points to the data in the ordered index by means of pointers. Because only the ordered index is persisted and the hash index is not, the time that would be spent persisting the hash table is saved and write performance is greatly improved.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
These and/or other aspects and advantages of the present application will become more apparent and more readily appreciated from the following detailed description of the embodiments of the present application, taken in conjunction with the accompanying drawings, wherein:
FIG. 1 is a flow chart illustrating a data storage management method according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram illustrating storage of an ordered set in dynamic random access memory and persistent storage according to an exemplary embodiment of the present application;
FIGS. 3(a) to 3(h) are diagrams illustrating a process of recovering a hash index at the time of power-off restart according to an exemplary embodiment of the present application;
FIG. 4 is a block diagram illustrating a data storage management device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments will be described below in order to explain the present invention by referring to the figures.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The embodiments described in the examples below are not representative of all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, in this disclosure, "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any several of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Before starting the description of the present disclosure, some of the matters or terms involved in the description of the present disclosure are explained so that the present disclosure can be more easily understood:
NVM refers broadly to memory that retains data in the event of power failure. Flash memory, for example, is a widely known type of NVM, also referred to as first-generation NVM. The present invention refers instead to new-generation NVM (STT-RAM, PCM, ReRAM, 3D XPoint), also referred to as a persistent storage device (PMem), which differs essentially from first-generation NVM in that: (1) its performance is greatly improved and is closer to that of the DRAM used for computer main memory; (2) it removes the restriction, imposed by physical characteristics, that accesses must be addressed in multiples of a fixed number of bytes (for Flash, typically 4 KB (4 x 1024 bytes)).
FIG. 1 is a flowchart illustrating a data storage management method according to an exemplary embodiment of the present application.
As shown in FIG. 1, in step S110, an ordered set is acquired.
In the exemplary embodiment of the present application, the ordered set may be obtained from a local memory or from a network, but the present application is not limited thereto. As described in the background section, an ordered set is a set of ordered elements of the same kind that contains members and a score for each member, and the members and their scores can be managed using an ordered index and a hash index. For example, Table 1 below shows an ordered set user, ranking, containing the members kris, mike, frank, tim, martin, and tom, with scores of 1, 91, 200, 220, 250, and 251, respectively; the members of the ordered set are sorted by score.
TABLE 1
It should be noted that the scores of the members in the ordered set may be calculated in various ways and may be custom designed for the actual scenario, which is not particularly limited in the embodiments of the present disclosure. For example, the score of a member in the ordered set may be obtained using the zscore command of Redis (Remote Dictionary Server).
In step S120, the ordered index of the ordered set is written into the persistent storage device, where the ordered index includes the members in the ordered set and the scores of the members, and the members in the ordered index are ordered according to the scores of the members. In an exemplary embodiment of the present application, the step of writing the ordered index of the ordered set to the persistent storage includes: the ordered index is written to the persistent storage in one of a tree, a skip list, a singly linked list, and a doubly linked list. In other words, the form of the ordered index of the ordered set may include, but is not limited to, any of the following: tree, skip list, singly linked list, and doubly linked list. In an actual scene, there may be other expression forms, and there is no particular limitation to this.
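As a minimal sketch of what "ordered according to the scores of the members" means, the following Python snippet builds an in-memory stand-in for the ordered index from the example members and scores given above. It is an illustration only: the names `build_ordered_index` and `user_ranking` are hypothetical, a sorted Python list is used in place of the tree, skip list, or linked list forms, and nothing here touches real persistent memory.

```python
import bisect
from typing import Dict, List, Tuple


def build_ordered_index(ordered_set: Dict[str, int]) -> List[Tuple[int, str]]:
    """Return (score, member) entries sorted by score.

    A sorted list is only a stand-in for the persisted ordered index.
    """
    index: List[Tuple[int, str]] = []
    for member, score in ordered_set.items():
        # insort keeps the list sorted after every insertion
        bisect.insort(index, (score, member))
    return index


# Example members and scores from the description, in arbitrary input order
user_ranking = {"tom": 251, "kris": 1, "martin": 250,
                "mike": 91, "tim": 220, "frank": 200}
ordered_index = build_ordered_index(user_ranking)
```

Whatever order the members arrive in, the resulting entries are arranged by ascending score, which is the property the later score-based queries rely on.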
As shown in FIG. 2, the ordered index is written to the persistent storage device in the form of a persistent skip list. For example, this process may be implemented as follows: the ordered index is built into an ordered linked list, which serves as the bottom-layer linked list (i.e., the layer-1 linked list) of the persistent skip list; some members are then selected according to a certain algorithm (e.g., random selection) from the members of that list other than the members with the largest and smallest scores, and the selected members and their scores form another ordered linked list (i.e., the layer-2 linked list); a pointer field is added to each selected member, and the pointer in that field points to the same member in the layer-1 linked list; new linked lists are built repeatedly in a similar manner until no member other than the members with the largest and smallest scores can be selected. In the constructed persistent skip list, the next pointer of each node in the bottom-layer linked list is a pointer of a first format, and the next pointer of each node in the linked lists of the other layers is a pointer of a second format. The first format includes at least an address part addr and a dirty bit dirtyflag, where the dirty bit dirtyflag is used to determine whether to perform a flush operation on the pointer of the first format; the second format includes only the address part. For example, in FIG. 2, the bottom-layer linked list of the persistent skip list consists of a head node, n intermediate nodes, and a tail node, where each intermediate node includes a key-value pair (a score s and a member m), a next pointer, and a height, and the n intermediate nodes are connected in order of their scores. The next pointer of each of the head node, the intermediate nodes, and the tail node in the bottom-layer linked list is a pointer of the first format, and the next pointer of each node in the upper-layer linked lists is a pointer of the second format. In addition, in the bottom-layer linked list, the 1st node after the head node includes the key-value pair (s_1, m_1), the 2nd node includes (s_2, m_2), the 3rd node includes (s_3, m_3), and the n-th node includes (s_n, m_n); the next pointer of the 1st node points to the 2nd node, the next pointer of the 2nd node points to the 3rd node, and the next pointer of the (n-1)-th node points to the n-th node, so that the connection order of the n nodes corresponds to the ordering of the scores s_1, s_2, s_3, ..., s_n. The present application is not limited thereto, and any method by which an ordered index can be persisted in a persistent storage device may be applied to the present application.
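The role of the first-format pointer can be pictured with a small Python sketch. The text only states that the dirty bit decides whether a flush is performed; when the bit is set and cleared is an assumption made here for illustration, as are the names `BottomPointer`, `Node`, `link`, and `flush_dirty`. List positions stand in for PMem addresses, and the upper layers (whose pointers carry only an address) are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BottomPointer:
    """First-format pointer: an address part plus a dirty flag."""
    addr: Optional[int] = None   # position of the next node (stand-in for a PMem address)
    dirty: bool = False          # assumed: set whenever addr is rewritten


@dataclass
class Node:
    score: int
    member: str
    next: BottomPointer = field(default_factory=BottomPointer)


def link(nodes: List[Node]) -> None:
    """Chain nodes that are already sorted by score and mark each written pointer dirty."""
    for i in range(len(nodes) - 1):
        nodes[i].next.addr = i + 1
        nodes[i].next.dirty = True


def flush_dirty(nodes: List[Node]) -> int:
    """Flush only the pointers whose dirty flag is set; return how many were flushed."""
    flushed = 0
    for node in nodes:
        if node.next.dirty:
            node.next.dirty = False  # stand-in for a real cache-line flush to PMem
            flushed += 1
    return flushed
```

Under these assumptions, linking three nodes dirties two pointers, a first call to `flush_dirty` flushes exactly those two, and a second call flushes nothing, which is the saving the dirty flag is meant to provide.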
Here, only the ordered index is persisted in the persistent storage device, while the hash index in the dynamic random access memory described below does not need to be persisted. A modification therefore needs to be persisted only for the ordered index in the persistent storage device (e.g., the persistent skip list), whereas the hash index only needs to be modified in the dynamic random access memory. This saves the time that would be spent persisting the hash table and greatly improves write performance.
In step S130, a hash index of the ordered set is written to the dynamic random access memory, wherein the hash index includes a pointer for pointing to a location in the persistent storage where the member is located.
Specifically, an empty hash index is first created in the dynamic random access memory; then each hash bucket in the hash index of the ordered set is traversed in turn, and the data in the traversed hash bucket is copied into the corresponding hash bucket of the newly created hash index, i.e., the pointers in the traversed hash bucket are written into the corresponding hash bucket of the newly created hash index. In this way, members in the ordered index can be looked up through the hash index, and data lookups benefit from the performance advantages of both the hash index and the ordered index.
As shown in FIG. 2, after the hash index is written into the dynamic random access memory, the hash index includes a plurality of hash buckets, where each hash bucket consists of one linked list and corresponds to one hash value. In each node of each linked list other than the head node and the tail node, the value field is a pointer to the position in the persistent storage device of a certain member in the ordered index. For example, in FIG. 2, the 1st hash bucket contains only one intermediate node, whose value field points to member m_1 in the ordered index. Similarly, the 2nd hash bucket contains two intermediate nodes, where the value field of the 1st intermediate node points to member m_2 in the ordered index. Any intermediate node included in the hash index includes a pointer to a member in the ordered index.
Furthermore, the method shown in fig. 1 may further comprise: querying the target member with an ordered index or a hash index in response to receiving the query request for the target member; and feeds back the query result.
The query request may be a first type of query using the scores of the members or may be a second type of query using the members (i.e., querying using the member names). If the query request is a first type of query, the target member is queried using the ordered index, and if the query request is a second type of query, the target member is queried using the hash index.
In exemplary embodiments of the present disclosure, the ordered index may be used to query the target member, for example, when a user wants to know whether a member with a certain score exists in the ordered set, or which member has a certain score. Specifically, the step of querying the target member using the ordered index includes: searching the ordered index based on the score corresponding to the target member to find the target member.
For example, as shown in FIG. 2, if the user wants to find the target member m_3 with score s_3 in the ordered set, the ordered index can be searched based on s_3. Since the ordered index in FIG. 2 is stored in the form of a skip list (i.e., a persistent skip list), it can be searched quickly using the skip-list search method. Since s_3 is included in the ordered index in FIG. 2, the target member m_3 can be found; at this point, a query result may be fed back indicating that the target member m_3 is present in the ordered index, and the query result may also carry the found target member m_3. If the user wants to find the target member m_k with score s_k in the ordered set, the ordered index is searched based on s_k; if s_k is not found, there is no m_k in the ordered index, and a query result may be fed back indicating that the target member m_k is not present in the ordered index.
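The first type of query can be sketched in Python as follows. Note the substitution: instead of a skip-list search, this sketch runs a binary search over a sorted list of (score, member) entries, which gives the same answer for a score lookup but is not the persistent skip list of FIG. 2. The function name `members_with_score` and the sample data layout are hypothetical.

```python
import bisect
from typing import List, Tuple


def members_with_score(index: List[Tuple[int, str]], target_score: int) -> List[str]:
    """Return every member whose score equals target_score (empty list if none).

    `index` must be sorted by score; it stands in for the ordered index.
    """
    # (target_score,) sorts before any (target_score, member) entry,
    # so bisect_left lands on the first entry with that score, if any.
    pos = bisect.bisect_left(index, (target_score,))
    found: List[str] = []
    while pos < len(index) and index[pos][0] == target_score:
        found.append(index[pos][1])
        pos += 1
    return found


sample_index = [(1, "kris"), (91, "mike"), (200, "frank"),
                (220, "tim"), (250, "martin"), (251, "tom")]
```

An empty result corresponds to the "target member is not present" feedback, and a non-empty result can be returned to the caller together with the members found.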
In another exemplary embodiment of the present disclosure, the target member may be queried using a hash index, for example, when a user wants to confirm whether the target member is present in the ordered set. Specifically, the step of querying the target member using the hash index includes: calculating a hash value of the target member; searching a hash index based on the hash value of the target member; in response to searching the hash bucket in the hash index for the hash value, reading an address corresponding to the hash value from the hash bucket, and searching the persistent storage device for the target member in accordance with the address corresponding to the hash value.
For example, as shown in FIG. 2, if the target member is m k Then first calculate m k Hash value h of (2) k Then based on the hash value h k Searching the hash index of fig. 2, in response to the hash value h being searched in the hash index k A corresponding hash bucket, for example, a hash bucket of line 3 in the hash index in fig. 2, reads an address corresponding to the hash value from the hash bucket, reads each of the intermediate nodes in turn due to the existence of a plurality of intermediate nodes in the hash bucket, searches for a target member according to the read address, searches for a persistent storage according to the address included in a certain intermediate node, and if the target member is queried according to the address search persistent storage included in the certain intermediate nodeThe member stops reading addresses in subsequent ones of the plurality of intermediate nodes and returns a query result indicating that the target member is present in the ordered index, but if the hash index does not find the hash value h k And if the target member is not queried by the corresponding hash bucket or the persistent storage device according to the address search included in each of the plurality of intermediate nodes, returning a query result which indicates that the target member does not exist in the ordered index.
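The second type of query can be sketched in the same spirit. Everything named here is an assumption for illustration: `NUM_BUCKETS`, `bucket_of`, `build_hash_index`, and `find_member` are hypothetical, CRC32 is an arbitrary choice of hash function, Python lists play the role of the bucket chains, and a list position plays the role of the pointer into persistent storage.

```python
import zlib
from typing import List, Optional, Tuple

NUM_BUCKETS = 4  # hypothetical bucket count

# Stand-in for the ordered index held in persistent storage
PERSISTED: List[Tuple[int, str]] = [(1, "kris"), (91, "mike"), (200, "frank"),
                                    (220, "tim"), (250, "martin"), (251, "tom")]


def bucket_of(member: str) -> int:
    """Map a member name to a bucket number using a deterministic hash."""
    return zlib.crc32(member.encode("utf-8")) % NUM_BUCKETS


def build_hash_index(persisted: List[Tuple[int, str]]) -> List[List[int]]:
    """Create the bucket array; each bucket stores addresses, not member data."""
    buckets: List[List[int]] = [[] for _ in range(NUM_BUCKETS)]
    for address, (_score, member) in enumerate(persisted):
        buckets[bucket_of(member)].append(address)
    return buckets


def find_member(buckets: List[List[int]], persisted: List[Tuple[int, str]],
                target: str) -> Optional[Tuple[int, str]]:
    """Walk the target's bucket, follow each address, and stop at the first match."""
    for address in buckets[bucket_of(target)]:
        score, member = persisted[address]  # follow the pointer into "PMem"
        if member == target:
            return score, member
    return None  # empty bucket, or no entry in the bucket matched
```

Members that collide in one bucket are told apart only by following each stored address and comparing the member found there, which mirrors reading the intermediate nodes in turn.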
In the above two embodiments, for a query or read operation on the data, the ordered index or the hash index may be selected according to the target of the operation so that the data is found in the most efficient manner, and the performance advantages of both indexes can thus be obtained.
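Selecting an index according to the target of the query can be illustrated with a small dispatcher. The sketch below is only an assumption-laden illustration: the ordered index is modeled as a Python list of `(score, member)` tuples sorted by score, the hash index as a dictionary from member to list position, and the request format (`{"type": ..., "value": ...}`) and function names are hypothetical.

```python
import bisect

# Ordered index stand-in: (score, member) tuples sorted by score.
ordered_index = [(1.5, "bob"), (2.0, "carol"), (3.0, "alice")]

# Hash index stand-in: member -> position in the ordered index.
hash_index = {member: pos for pos, (_score, member) in enumerate(ordered_index)}


def query_by_score(target_score: float) -> list:
    """First-type query: binary-search the ordered index by score."""
    pos = bisect.bisect_left(ordered_index, (target_score,))
    found = []
    while pos < len(ordered_index) and ordered_index[pos][0] == target_score:
        found.append(ordered_index[pos][1])
        pos += 1
    return found


def query_by_member(target_member: str) -> bool:
    """Second-type query: look the member up through the hash index."""
    return target_member in hash_index


def handle_query(request: dict):
    """Route the request to the index suited to its target."""
    if request["type"] == "score":
        return query_by_score(request["value"])
    if request["type"] == "member":
        return query_by_member(request["value"])
    raise ValueError("unknown query type")


print(handle_query({"type": "score", "value": 2.0}))
print(handle_query({"type": "member", "value": "bob"}))
```

A score query benefits from the sorted order, while a membership query avoids scanning the ordered index altogether.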
In addition, on the basis of any of the foregoing embodiments, the data storage management method provided by the embodiment of the disclosure may further include: processing the ordered set in response to receiving a processing operation for the ordered set; when the processing operation is a first type operation, adjusting the ordered index according to the scores of the members in the processed ordered set, wherein the first type operation is only used for modifying the scores of the members in the ordered set; when the processing operation is a second type operation, the ordered index and the hash index are adjusted according to each member in the processed ordered set, wherein the second type operation comprises a member inserting operation, a member deleting operation and a member modifying operation aiming at the ordered set.
In an exemplary embodiment of the present disclosure, the step of adjusting the ordered index according to the score of each member in the processed ordered set includes: and modifying the sequence of the members in the ordered index according to the scores of the members in the processed ordered set.
In particular, since the first type of operation is only used to modify the scores of the members in the ordered set, it may affect the order of the members in the ordered index, which is sorted by score, but it does not affect the hash index, so only the ordered index needs to be adjusted. For example, as shown in FIG. 2, if the ordered index is sorted by score in ascending order, then when the processing operation modifies the score s_1 of member m_1 and the modified s_1 becomes greater than s_2 but smaller than s_3, the order of members m_1 and m_2 in the ordered index needs to be adjusted. That is, the addresses pointed to by the pointers of the relevant nodes in the ordered index shown in FIG. 2 are modified, thereby changing the order of members m_1 and m_2. Since this modification involves only the ordered index and not the hash index, the time that would otherwise be spent operating on the hash table is avoided.
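A score-only update of this kind can be sketched as follows. The sketch assumes, purely for illustration, that the ordered index is a singly linked list whose nodes live at fixed addresses (a simplification of the bottom level of a skip list) and that the hash index maps each member directly to its node address; `HEAD`, `make_list`, `update_score`, and `members_in_order` are hypothetical names.

```python
HEAD = 0  # address of a sentinel head node (illustrative)


def make_list(items):
    """Build a linked list of nodes at addresses 1, 2, 3, ... in score order."""
    nodes = {HEAD: {"member": None, "score": float("-inf"), "next": None}}
    prev = HEAD
    for addr, (member, score) in enumerate(items, start=1):
        nodes[addr] = {"member": member, "score": score, "next": None}
        nodes[prev]["next"] = addr
        prev = addr
    return nodes


def update_score(nodes, addr, new_score):
    """Change a node's score by relinking pointers; the node's address stays put."""
    prev = HEAD
    while nodes[prev]["next"] != addr:  # find the predecessor and unlink
        prev = nodes[prev]["next"]
    nodes[prev]["next"] = nodes[addr]["next"]
    nodes[addr]["score"] = new_score
    prev = HEAD  # find the new position (ascending by score) and relink
    while (nodes[prev]["next"] is not None
           and nodes[nodes[prev]["next"]]["score"] <= new_score):
        prev = nodes[prev]["next"]
    nodes[addr]["next"] = nodes[prev]["next"]
    nodes[prev]["next"] = addr


def members_in_order(nodes):
    out, cur = [], nodes[HEAD]["next"]
    while cur is not None:
        out.append(nodes[cur]["member"])
        cur = nodes[cur]["next"]
    return out


nodes = make_list([("m1", 1.0), ("m2", 2.0), ("m3", 3.0)])
hash_index = {"m1": 1, "m2": 2, "m3": 3}  # member -> node address (in DRAM)
before = dict(hash_index)

update_score(nodes, hash_index["m1"], 2.5)  # m1 moves between m2 and m3
print(members_in_order(nodes))
print(hash_index == before)
```

Because each node keeps its address and only the `next` pointers change, every pointer held in the hash index remains valid and the hash index needs no update.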
In another exemplary embodiment of the present disclosure, the step of adjusting the ordered index and the hash index according to each member of the processed ordered set includes: calculating the score and hash value of each member after processing; modifying the order of the members in the ordered index according to the scores of the members after processing; and modifying the hash index according to the hash value of each member after processing and the address of the member in the modified ordered index.
In particular, since the second type of operation includes a member insertion operation, a member deletion operation, and a member modification operation for the ordered set, it may involve both the members in the ordered set and their corresponding scores. For example, if the second type of operation is a member modification operation in which member m_2 is modified to obtain m_2', the score s_2' and the hash value h_2' of the modified member m_2' need to be calculated. The order of the members in the ordered index is then adjusted according to the score s_2' and the scores of the other members, and the hash index is searched for the hash bucket corresponding to the hash value h_2'. If such a hash bucket exists, a pointer to the address of the modified member in the persistent storage device is added to that bucket, and the pointer corresponding to the pre-modification member m_2 is deleted from the hash bucket corresponding to the hash value h_2. Similarly, if the second type of operation is a member deletion operation or a member insertion operation for the ordered set, both the ordered index and the hash index need to be modified.
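The second type of operation can be sketched as below. This is a simplified illustration and not the disclosed implementation: the persistent ordered index is stood in for by a dictionary of records that is sorted on read, a member modification is modeled as a deletion followed by an insertion, and the class `OrderedSetSketch`, the function `member_hash`, and the constant `NUM_BUCKETS` are hypothetical.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count


def member_hash(member: str) -> int:
    """Hypothetical hash function; the source does not name one."""
    return zlib.crc32(member.encode("utf-8")) % NUM_BUCKETS


class OrderedSetSketch:
    def __init__(self):
        self.storage = {}     # addr -> (member, score); persistent stand-in
        self.hash_index = {}  # hash value -> [addr]; kept only in DRAM
        self.next_addr = 1

    def find(self, member):
        """Return the address of `member`, or None if it is absent."""
        for addr in self.hash_index.get(member_hash(member), []):
            if self.storage[addr][0] == member:
                return addr
        return None

    def insert(self, member, score):
        addr = self.next_addr
        self.next_addr += 1
        self.storage[addr] = (member, score)  # 1) update the ordered data first
        self.hash_index.setdefault(member_hash(member), []).append(addr)  # 2) then the hash index
        return addr

    def delete(self, member):
        addr = self.find(member)
        if addr is None:
            return False
        del self.storage[addr]  # 1) ordered data first
        bucket = self.hash_index[member_hash(member)]
        bucket.remove(addr)     # 2) then drop the stale pointer
        if not bucket:
            del self.hash_index[member_hash(member)]
        return True

    def modify(self, old_member, new_member, new_score):
        """Member modification, modeled here as delete followed by insert."""
        if not self.delete(old_member):
            return False
        self.insert(new_member, new_score)
        return True

    def ordered_members(self):
        # Sorting on read stands in for the maintained order of the ordered index.
        return [m for m, _s in sorted(self.storage.values(), key=lambda r: r[1])]


s = OrderedSetSketch()
for member, score in [("m1", 1.0), ("m2", 2.0), ("m3", 3.0)]:
    s.insert(member, score)
s.modify("m2", "m2_prime", 0.5)
print(s.ordered_members())
```

In each operation the stand-in for the persistent ordered index is updated before the in-memory hash index, mirroring the order of updates described in the text.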
Furthermore, in the above-described embodiments, only the ordered index is persisted in the persistent storage device, while the hash index is stored in the dynamic random access memory. For member modification, insertion, deletion, and similar operations, only the ordered index (e.g., the persistent skip list) in the persistent storage device needs to be persisted; the hash index is modified in the dynamic random access memory without being persisted. In other words, not every operation on the ordered set has to persist both indexes. Compared with prior-art schemes that persist both the ordered index and the hash index, this saves the time spent persisting the hash table and greatly improves write performance.
In addition, when the ordered set is processed in response to a processing operation, data consistency is ensured by the persistent ordered index (for example, the persistent skip list) even if an unexpected power failure occurs while the ordered index in the persistent storage device is being modified. The hash index is modified only after the persistent ordered index has been updated (for example, by member modification, member insertion, or member deletion). If an unexpected power failure occurs during this process, all data has already been correctly saved in the persistent storage device, so the hash index can be reconstructed after the restart, ensuring that subsequent lookups do not lose data. Briefly, when the device restarts after a power failure and recovers the persistent ordered set, the recovery of the entire persistent ordered set can be completed by traversing all members in the ordered index (e.g., the persistent skip list) in the persistent storage device, recomputing their hash values, and building the hash index in the dynamic random access memory. That is, when a device using the method is powered up, the hash index is reconstructed in the dynamic random access memory based on all members contained in the ordered index, thereby restoring the ordered set. The process of recovering the persistent ordered set after a power failure is described below with reference to FIGS. 3(a) to 3(h).
In an exemplary embodiment of the present application, the step of reconstructing the hash index in the dynamic random access memory includes: the hash value of each member is calculated by traversing each member contained in the ordered index and the hash index is reconstructed therefrom based on each member and the hash value of each member.
In particular, if an unexpected power failure occurs, the ordered index of the ordered set can be read directly from the persistent storage device at restart without additional operations, mainly because the persistent storage device retains data when power is lost. In the following description, the ordered index is assumed to be stored in the persistent storage device in the form of a persistent skip list, but the present application is not limited thereto, and the ordered index may also be persisted in the persistent storage device in the form of a tree, a singly linked list, or a doubly linked list. Thereafter, the hash index is reconstructed according to the following procedure:
First, as shown in FIG. 3(a), the traversal starts from the head node of the persistent skip list, and an empty hash index (i.e., hash table) is created in the dynamic random access memory.
Then, as shown in FIGS. 3(b) to 3(g), the data in all intermediate nodes of the bottom-level linked list of the persistent skip list is traversed in sequence. For the node currently being traversed, the hash value of its member is calculated, the corresponding hash bucket is found in the hash index, and a pointer to the position of that member in the persistent skip list is inserted. In this process, one pointer is added to the hash index each time an intermediate node of the bottom-level linked list is traversed.
Finally, as shown in FIG. 3(h), the traversal stops when the tail node of the bottom-level linked list of the persistent skip list is reached. At this point all members in the ordered index have been traversed, the hash index in the dynamic random access memory has been completely and correctly recovered, and the recovered ordered set can begin to accept read and write requests normally.
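The three recovery steps above can be sketched as follows. The sketch assumes, for illustration only, that the bottom-level linked list of the persistent skip list is represented as a dictionary of nodes keyed by address with sentinel head and tail entries; `member_hash`, `NUM_BUCKETS`, `HEAD`, `TAIL`, and `rebuild_hash_index` are hypothetical names.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count
HEAD, TAIL = "head", "tail"  # sentinel markers for the bottom-level list


def member_hash(member: str) -> int:
    """Hypothetical hash function; the source does not name one."""
    return zlib.crc32(member.encode("utf-8")) % NUM_BUCKETS


def rebuild_hash_index(nodes: dict) -> dict:
    """Rebuild the DRAM hash index by walking the persisted bottom-level list."""
    hash_index = {}  # step 1: create an empty hash index
    addr = nodes[HEAD]["next"]
    while addr != TAIL:  # step 3: stop at the tail node
        node = nodes[addr]
        # Step 2: hash the member and add one pointer (its address) to the bucket.
        hash_index.setdefault(member_hash(node["member"]), []).append(addr)
        addr = node["next"]
    return hash_index


# Stand-in for the persisted bottom-level linked list, in ascending score order.
nodes = {
    HEAD: {"next": 0x10},
    0x10: {"member": "bob", "score": 1.5, "next": 0x20},
    0x20: {"member": "carol", "score": 2.0, "next": 0x30},
    0x30: {"member": "alice", "score": 3.0, "next": TAIL},
}

rebuilt = rebuild_hash_index(nodes)
print(sorted(addr for bucket in rebuilt.values() for addr in bucket))
```

The rebuild reads each persisted node exactly once and writes only to memory, so its cost grows linearly with the number of members.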
Therefore, after a power failure and restart, the hash index can be reconstructed from the persistent ordered index in the persistent storage device. The hash index therefore does not need to be persisted, which reduces reads and writes to the persistent storage device. At the same time, the high performance of the dynamic random access memory can be fully utilized to obtain lookup performance close to that of the original in-memory ordered set, and better lookup performance than the original approach of persisting the whole ordered set (i.e., both the hash index and the ordered index) in PMem. In addition, once the hash index has been reconstructed, the addresses stored in it point accurately to the members of the ordered set in the persistent storage device, and the hash value of a target member can be found in the reconstructed hash index during a lookup, so no data is lost.
FIG. 4 is a block diagram of a data storage management apparatus 400 according to an exemplary embodiment of the present disclosure.
Referring to FIG. 4, the apparatus 400 may include an acquisition unit 410 and a processing unit 420. Specifically, the acquisition unit 410 may be configured to acquire an ordered set. The processing unit 420 may be configured to: write an ordered index of the ordered set into a persistent storage device, and write a hash index of the ordered set into a dynamic random access memory, wherein the ordered index comprises the members in the ordered set and the scores of the members, the members in the ordered index are ordered according to their scores, and the hash index comprises pointers to the positions of the members in the persistent storage device.
In an exemplary embodiment of the present disclosure, processing unit 420 is configured to write the ordered index to the persistent storage in one of a tree, a skip list, a singly linked list, and a doubly linked list.
In an exemplary embodiment of the present disclosure, the processing unit 420 is configured to: creating an empty hash index in the dynamic random access memory; and traversing each hash bucket in the hash indexes of the ordered set in turn, and writing the pointer corresponding to the member in each hash bucket into the corresponding hash bucket in the newly built hash index.
In an exemplary embodiment of the present disclosure, the processing unit 420 is further configured to: in response to a received member query request being a first type of query, which queries by a target score, query the target member using the ordered index; in response to the received member query request being a second type of query, which queries by a target member, query the target member using the hash index; and feed back the query result.
The processing unit 420 is configured to query the target member with the ordered index by: searching the ordered index based on the target score to find the target member.
The processing unit 420 is configured to query the target member with the hash index by: calculating the hash value of the target member; searching the hash index based on the hash value of the target member; and, in response to finding the hash bucket corresponding to the hash value in the hash index, reading an address corresponding to the hash value from the hash bucket and searching the persistent storage device according to that address to query the target member.
The processing unit 420 is further configured to: processing the ordered set in response to receiving a processing operation for the ordered set; when the processing operation is a first type operation, the ordered index is adjusted according to the scores of all the members in the processed ordered set, wherein the first type operation is only used for modifying the scores of the members in the ordered set; and when the processing operation is a second type operation, the ordered index and the hash index are adjusted according to each member in the processed ordered set, wherein the second type operation comprises a member inserting operation, a member deleting operation and a member modifying operation aiming at the ordered set.
The processing unit 420 is further configured to adjust the ordered index according to the score of each member in the processed ordered set by: modifying the order of the members in the ordered index according to the scores of the members in the processed ordered set.
The processing unit 420 is further configured to adjust the ordered index and the hash index according to each member of the processed ordered set by: calculating the score and hash value of each member after processing; modifying the sequence of the members in the ordered index according to the scores of the members after processing; modifying the hash index according to the hash value of each member after processing and the address of the member in the ordered index after modification.
The processing unit 420 is further configured to: when the device is powered up, the hash index is reconstructed in the dynamic random access memory based on all members contained in the ordered index.
The processing unit 420 is further configured to reconstruct the hash index in the dynamic random access memory by: and calculating a hash value of each member by traversing each member contained in the ordered index, and reconstructing the hash index in the dynamic random access memory according to each member and the hash value of each member.
Since the data storage management method shown in fig. 1 may be performed by the apparatus 400 shown in fig. 4, any relevant details concerning the operations performed by the units in fig. 4 may be found in the corresponding descriptions with respect to fig. 1 to 3.
Data storage management methods and apparatuses, storage media, and systems according to exemplary embodiments of the present application have been described above with reference to fig. 1 to 4. However, it should be understood that: the apparatus shown in the figures may be configured as software, hardware, firmware, or any combination thereof, respectively, that performs the specified functions. For example, the apparatus may correspond to an application specific integrated circuit, to a pure software code, or to a module in which software is combined with hardware. Furthermore, one or more functions implemented by the apparatus may also be uniformly performed by components in a physical entity device (e.g., a processor, a client, a server, or the like).
It should be appreciated that the data storage management method according to an exemplary embodiment of the present application may be implemented by instructions recorded on a computer-readable storage medium, for example, according to an exemplary embodiment of the present application, a computer-readable storage medium storing instructions may be provided that, when executed by at least one computing device, cause the at least one computing device to perform the data storage management method described above.
The instructions stored in the computer-readable storage medium described above may be run in an environment deployed in a computer device, such as a client, host, proxy, server, etc.
On the other hand, the data storage management apparatus according to the exemplary embodiment of the present invention may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium, such as a storage medium, so that the processor can perform the corresponding operations by reading and executing the corresponding program code or code segments.
For example, according to an exemplary embodiment of the present application, a system may be provided that includes at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the data storage management method described above.
In particular, the above-described system may be deployed in a server or client, as well as on a node in a distributed network environment. Furthermore, the system may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the above set of instructions. In addition, the system may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). Additionally, all components of the system may be connected to each other via a bus and/or a network.
Here, the system is not necessarily a single system, but may be any device or aggregate of circuits capable of executing the above-described instructions (or instruction set) alone or in combination. The system may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the system, the at least one computing device may include a Central Processing Unit (CPU), a Graphics Processor (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example and not limitation, the at least one computing device may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like. The computing device may execute instructions or code stored in one of the storage devices, wherein the storage devices may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The storage device may be integrated with the computing device, for example, with RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage devices may include stand-alone devices, such as external disk drives, storage arrays, or other storage devices usable by any database system. The storage device and the computing device may be operatively coupled or may communicate with each other, such as through an I/O port, network connection, or the like, such that the computing device is capable of reading instructions stored in the storage device.
The foregoing description of exemplary embodiments of the invention is illustrative only and not exhaustive, and the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention shall be subject to the scope of the claims.

Claims (10)

1. A method of data storage management, the method comprising:
acquiring an ordered set;
writing an ordered index of the ordered set into a persistent storage device, wherein the ordered index comprises members in the ordered set and scores of the members, and the members in the ordered index are ordered according to the scores of the members; and
writing a hash index of the ordered set to a dynamic random access memory, wherein the hash index includes a pointer for pointing to a location of the member in the persistent storage device.
2. The method of claim 1, wherein the writing the ordered index of the ordered set to the persistent storage device comprises: writing the ordered index into the persistent storage device in the form of one of a tree, a skip list, a singly linked list, and a doubly linked list.
3. The method of claim 1, wherein writing the hash index of the ordered set to dynamic random access memory comprises:
creating an empty hash index in the dynamic random access memory;
and traversing each hash bucket in the hash indexes of the ordered set in turn, and writing the pointer corresponding to the member in each hash bucket into the corresponding hash bucket in the newly built hash index.
4. The method of claim 1, wherein the method further comprises:
in response to a received member query request being a first type of query, which queries by a target score, querying the target member using the ordered index;
in response to the received member query request being a second type of query, which queries by a target member, querying the target member using the hash index; and
feeding back the query result.
5. The method of claim 1, wherein the method further comprises:
processing the ordered set in response to receiving a processing operation for the ordered set;
when the processing operation is a first type operation, the ordered index is adjusted according to the scores of all the members in the processed ordered set, wherein the first type operation is only used for modifying the scores of the members in the ordered set;
And when the processing operation is a second type operation, the ordered index and the hash index are adjusted according to each member in the processed ordered set, wherein the second type operation comprises a member inserting operation, a member deleting operation and a member modifying operation aiming at the ordered set.
6. The method of claim 1, wherein the method further comprises:
when a device using the method is powered up, the hash index is reconstructed in the dynamic random access memory based on all members contained in the ordered index.
7. The method of claim 6, wherein reconstructing the hash index in the dynamic random access memory comprises:
and calculating a hash value of each member by traversing each member contained in the ordered index, and reconstructing the hash index in the dynamic random access memory according to each member and the hash value of each member.
8. A data storage management apparatus, the apparatus comprising:
an acquisition unit configured to acquire an ordered set; and
a processing unit configured to:
write an ordered index of the ordered set into a persistent storage device, wherein the ordered index comprises members in the ordered set and scores of the members, and the members in the ordered index are ordered according to the scores of the members; and
write a hash index of the ordered set to a dynamic random access memory, wherein the hash index includes a pointer for pointing to a location of the member in the persistent storage device.
9. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of any of claims 1-7.
10. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of any of claims 1-7.
CN202210768866.1A 2022-06-30 2022-06-30 Data storage management method and device, storage medium and system Pending CN117369709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210768866.1A CN117369709A (en) 2022-06-30 2022-06-30 Data storage management method and device, storage medium and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210768866.1A CN117369709A (en) 2022-06-30 2022-06-30 Data storage management method and device, storage medium and system

Publications (1)

Publication Number Publication Date
CN117369709A true CN117369709A (en) 2024-01-09

Family

ID=89391577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210768866.1A Pending CN117369709A (en) 2022-06-30 2022-06-30 Data storage management method and device, storage medium and system

Country Status (1)

Country Link
CN (1) CN117369709A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination