CN114185815A - Method, equipment and system for realizing memory key value storage - Google Patents


Info

Publication number
CN114185815A
CN114185815A CN202111498954.6A CN202111498954A CN114185815A CN 114185815 A CN114185815 A CN 114185815A CN 202111498954 A CN202111498954 A CN 202111498954A CN 114185815 A CN114185815 A CN 114185815A
Authority
CN
China
Legal status
Pending
Application number
CN202111498954.6A
Other languages
Chinese (zh)
Inventor
徐宁
谢娜
付钰
齐学成
胡卉芪
Current Assignee
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Application filed by CCB Finetech Co Ltd
Priority to CN202111498954.6A
Publication of CN114185815A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F12/0238 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
    • G06F12/0253 Garbage collection, i.e. reclamation of unreferenced memory
    • G06F12/0261 Garbage collection, i.e. reclamation of unreferenced memory using reference counting


Abstract

The embodiments of the application provide a method, device, and system for implementing in-memory key-value storage. The method includes: in a distributed storage system built on non-volatile memory (NVM) and Remote Direct Memory Access (RDMA), performing the following operations according to a control instruction: writing the log records in the volatile cache of the primary device into the NVM of the primary device; sending the log records to a backup device over RDMA; and writing the log records into the system memory of the primary device. By building the distributed cache system on a combination of RDMA and NVM, the persisted log records are written synchronously into the primary device's NVM and its system memory (DRAM), and are sent over RDMA to the backup device (which writes them into its own NVM), thereby achieving high availability and high performance for persistent in-memory key-value storage.

Description

Method, equipment and system for realizing memory key value storage
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, device, and system for implementing in-memory key-value storage.
Background
Caches and distributed caches are commonly deployed at performance-critical points of a system; if a cache node fails, the back-end system (such as a database) may be overwhelmed by the resulting traffic. Primary/backup replication of the cache can solve this problem, but with existing cache replication schemes, the network alone often cannot deliver memory-level access speed, which slows down the entire cache system.
High availability is usually achieved through primary/backup data replication. Typical configurations are one primary with one backup or one primary with multiple backups, and read/write requests are processed by the primary device. Because all data must be copied in this mode, writes are amplified, and with insufficient network performance, in-memory key-value storage performance usually suffers greatly. Furthermore, slow network I/O and storage I/O tend to become serious performance bottlenecks. No existing implementation addresses these issues well.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method, device, and system for implementing in-memory key-value storage that achieve high availability of persistent in-memory key-value storage.
To achieve the above object, a first aspect of the present application provides a method for implementing in-memory key-value storage, including: in a distributed storage system built on non-volatile memory (NVM) and remote direct memory access (RDMA), performing the following operations according to a control instruction: writing the log records in the volatile cache of the primary device into the NVM of the primary device; sending the log records to a backup device over RDMA; and writing the log records into the system memory of the primary device.
In this embodiment of the present application, optionally, writing the log records in the volatile cache of the primary device into the NVM of the primary device includes: appending the log records in batches to the log system, starting at the position pointed to by the log system's tail pointer; advancing the tail pointer to point to the new end of the log; and, when the log system needs to be rebuilt, scanning the log system from the position pointed to by the tail pointer to rebuild it.
Optionally, writing the log records into the system memory of the primary device includes: when the size of the logged data exceeds a preset number of bytes, storing a pointer to the logged data in a hash index table in the primary device's system memory; otherwise, storing the logged data itself in the hash index table.
Optionally, after the log records are sent to the backup device over RDMA, the backup device performs the following operations: writing the log records into the NVM of the backup device; and writing the log records into the system memory of the backup device.
Optionally, the method further includes: creating a new volatile cache region in the volatile cache of the primary device; and writing new log records into the newly created volatile cache region according to the control instruction.
A second aspect of the present application provides a primary device for implementing in-memory key-value storage, where the primary device belongs to a distributed storage system configured with non-volatile memory (NVM) and remote direct memory access (RDMA), and includes a controller configured to perform the following operations according to a control instruction: writing the log records in the volatile cache of the primary device into the NVM of the primary device; sending the log records to a backup device over RDMA; and writing the log records into the system memory of the primary device.
In this embodiment of the present application, optionally, writing the log records in the volatile cache of the primary device into the NVM of the primary device includes: appending the log records in batches to the log system, starting at the position pointed to by the log system's tail pointer; advancing the tail pointer to point to the new end of the log; and, when the log system needs to be rebuilt, scanning the log system from the position pointed to by the tail pointer to rebuild it.
Optionally, writing the log records into the system memory of the primary device includes: when the size of the logged data exceeds a preset number of bytes, storing a pointer to the logged data in a hash index table in the primary device's system memory; otherwise, storing the logged data itself in the hash index table.
Optionally, the controller of the primary device is further configured to: create a new volatile cache region in the volatile cache of the primary device; and write new log records into the newly created volatile cache region according to the control instruction.
A third aspect of the present application provides a backup device for implementing in-memory key-value storage, where the backup device belongs to a distributed storage system configured with non-volatile memory (NVM) and remote direct memory access (RDMA), and includes a controller configured to perform the following operations according to a control instruction: writing the received log records into the NVM of the backup device; and writing the log records into the system memory of the backup device.
In this embodiment, optionally, the controller of the backup device is further configured to write the log records in the backup device's NVM back to the primary device according to the control instruction.
A fourth aspect of the present application provides a system for implementing in-memory key-value storage, which includes any one of the above primary devices for implementing in-memory key-value storage and at least one of the above backup devices for implementing in-memory key-value storage.
A fifth aspect of the present application provides a computer program product including a computer program that, when executed by a processor, implements any of the above methods for implementing in-memory key-value storage.
With the above technical solution, the distributed cache system is built on a combination of RDMA and NVM: the persisted log records are written synchronously into the primary device's NVM and its system memory (DRAM), and are sent over RDMA to the backup device (which writes them into its own NVM), thereby achieving high availability and high performance for persistent in-memory key-value storage.
Additional features and advantages of embodiments of the present application will be described in detail in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the embodiments of the disclosure, but are not intended to limit the embodiments of the disclosure. In the drawings:
FIG. 1 is a schematic flowchart of a method for implementing in-memory key-value storage according to an embodiment of the present invention;
FIGS. 2A and 2B are schematic diagrams of the two configuration modes of persistent memory;
FIG. 3 is a framework diagram of a distributed storage system built on NVM and RDMA;
FIGS. 4A and 4B are schematic diagrams comparing the throughput and latency of remote PMem access with those of local reads and writes;
FIG. 5 is a schematic structural diagram of a system for implementing in-memory key-value storage according to an embodiment of the present invention;
FIGS. 6A and 6B are schematic diagrams of the optimized write throughput and latency results.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the specific embodiments described herein are only used for illustrating and explaining the embodiments of the present application and are not used for limiting the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that if directional indications (such as up, down, left, right, front, and back … …) are referred to in the embodiments of the present application, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are changed accordingly.
In addition, descriptions such as "first" and "second" in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. Technical solutions of the various embodiments may be combined with each other, but only where a person skilled in the art can realize the combination; when a combination is contradictory or cannot be realized, it should be considered not to exist and is not within the protection scope of the present application.
In current distributed cache systems without high-availability support, when a cache node fails, the disk-based back-end database cannot meet cache-level access requirements and is easily overwhelmed, causing larger failures. With high-availability support, on the other hand, current network latency, network throughput, and disk performance cannot deliver the high performance that caching requires.
Therefore, building on recent advances in networking and storage, the embodiments of the application address the bottleneck of performance-critical cache systems by introducing new hardware into the primary/backup design of the cache system and by carefully tuning the transmission mechanism and the data storage mechanism (an append-only storage system based on the "log is the database" principle) to the characteristics of that hardware, providing a high-performance, highly available distributed cache. Detailed embodiments follow.
FIG. 1 is a flowchart of a method for implementing in-memory key-value storage according to an embodiment of the present invention. Referring to FIG. 1, the method may include the following steps:
Step S110: in a distributed storage system built on non-volatile memory (NVM) and remote direct memory access (RDMA), perform the following operations (steps S11 to S13) according to a control instruction, so as to achieve high availability of persistent in-memory key-value storage.
Non-Volatile Memory (NVM) is memory whose stored data is retained when power is removed (for example, on power failure).
The embodiment of the invention adopts a Remote Direct Memory Access (RDMA) network to solve the network transmission bottleneck.
RDMA networks provide high-bandwidth, low-latency access to the memory of remote machines through kernel bypass, zero-copy networking, and bypassing of the remote CPU.
RDMA supports two kinds of verbs: Memory Verbs and Message Verbs. Memory Verbs, also called one-sided verbs (control instructions), namely read and write, do not involve the remote CPU; the remote CPU is bypassed entirely. Message Verbs, also called two-sided verbs (control instructions), namely send and receive, provide two-sided messaging in user space. They require the remote CPU to participate: the remote machine must post a receive operation before the sender initiates the send operation.
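The difference between the two verb classes can be illustrated with a toy model of the remote side of a connection. This is a minimal sketch, not libibverbs or any real RDMA API: the class `SimulatedRemoteNode` and its methods are hypothetical names, and the "NIC" is just a `bytearray`. It only shows who does the work: one-sided `rdma_write`/`rdma_read` touch registered memory without the remote CPU, while `send` fails unless the remote CPU has posted a receive buffer first.

```python
from collections import deque


class SimulatedRemoteNode:
    """Toy model of the remote side of an RDMA connection (hypothetical, not libibverbs)."""

    def __init__(self, region_size: int):
        self.registered_memory = bytearray(region_size)  # memory region exposed to the NIC
        self.recv_queue = deque()                        # receive buffers posted by the remote CPU
        self.remote_cpu_actions = 0                      # counts work done by the remote CPU

    # One-sided Memory Verbs: the remote CPU is bypassed.
    def rdma_write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.registered_memory):
            raise IndexError("write outside the registered memory region")
        self.registered_memory[offset:offset + len(data)] = data

    def rdma_read(self, offset: int, length: int) -> bytes:
        return bytes(self.registered_memory[offset:offset + length])

    # Two-sided Message Verbs: the remote CPU participates.
    def post_recv(self, capacity: int) -> None:
        self.remote_cpu_actions += 1
        self.recv_queue.append(bytearray(capacity))

    def send(self, message: bytes) -> bytes:
        if not self.recv_queue:
            # Real reliable-connection RDMA signals a receiver-not-ready (RNR) condition here.
            raise RuntimeError("no receive posted on the remote side")
        buf = self.recv_queue.popleft()
        if len(message) > len(buf):
            raise ValueError("message exceeds posted receive buffer")
        buf[:len(message)] = message
        self.remote_cpu_actions += 1  # remote CPU handles the receive completion
        return bytes(buf[:len(message)])
```

In this model, a write followed by a read leaves `remote_cpu_actions` at zero, whereas every send costs the remote side a `post_recv` plus completion handling, which is why the one-sided verbs suit the latency-critical replication path.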
Embodiments of the present invention may use, for example, Intel Optane persistent memory (Optane DCPMM), which provides byte-addressable, ultra-low-latency, high-density persistent media. It can be configured in two modes: Memory Mode and App Direct mode. Memory Mode directly expands memory capacity but does not support persistence; in App Direct mode, applications access the persistent memory (PMem) directly through load and store instructions.
Different Optane DCPMM configurations result in different hardware topologies among the CPU cache, DRAM, Optane DCPMM, and the RDMA NIC (RNIC). Referring to FIG. 2A, RDMA verbs actually write data into the CPU's L3 cache via the PCIe controller and the I/O controller, using Direct Cache Access (DCA) technology. In Memory Mode, RDMA verbs see PMem only as ordinary memory, without any persistence guarantee, because the DRAM then acts as a cache that is transparent to software and managed by the integrated memory controller (IMC). Referring to FIG. 2B, in App Direct mode, RDMA verbs can see both the DRAM and the PMem, and data can be moved freely between the two through RDMA verbs.
Based on how RDMA and PMem work, the embodiment of the invention preferably builds the distributed storage system from RDMA and NVM, so that data transmission and persistence take on the order of hundreds of nanoseconds or less, which can greatly improve the availability of primary/backup cache replication.
In a distributed storage system built on NVM and RDMA, data can be stored persistently according to control instructions. Control instructions, such as a store instruction (save) or a query instruction (select), cause the CPU to perform the corresponding write or read operations.
The embodiment of the invention primarily addresses high availability of in-memory storage, so the save instruction is handled first. FIG. 3 is a framework diagram of a distributed storage system built on NVM and RDMA. Referring to FIG. 3, after the primary device (PRIMARY) receives a save instruction, its CPU may perform the following operations (steps S11 to S13):
Step S11: write the log records in the volatile cache (volatile memory buffer) of the primary device into the NVM of the primary device.
Redis offers two persistence modes (RDB snapshots and the append-only file, AOF); the embodiment of the invention adopts the append-only mode, which records every write command by appending it to a file on disk. Append-only storage is widely used in disk-based storage systems because the access granularity of SSDs/HDDs is coarse (e.g., 4 KB), so data can be buffered in megabytes (MB) before being flushed to disk. In related research, existing persistent data stores are mostly designed for disks or for emulated NVM and cannot simply be migrated to real NVM hardware. Accordingly, the embodiment of the present application is preferably based on a redo-log mechanism, optimizes the distributed storage system with pipelined batching, and preferably uses the log (a redo-only log) as the only persisted copy of the versioned data; that is, the log is the database.
Each key-value entry in the log has the format LogID, length, Key, Value.
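The record format can be made concrete with a small encoder and decoder. This is a minimal sketch under stated assumptions: the text names a single "length" field, which alone cannot separate Key from Value, so this sketch assumes a fixed little-endian header of an 8-byte LogID followed by a 4-byte key length and a 4-byte value length. The field widths and the helper names `encode_record` and `decode_record` are hypothetical.

```python
import struct

# Assumed header layout: LogID (u64), key length (u32), value length (u32), little-endian.
HEADER = struct.Struct("<QII")


def encode_record(log_id: int, key: bytes, value: bytes) -> bytes:
    """Serialize one key-value log record: header, then key bytes, then value bytes."""
    return HEADER.pack(log_id, len(key), len(value)) + key + value


def decode_record(buf, offset: int = 0):
    """Decode the record starting at offset; return (log_id, key, value, next_offset)."""
    log_id, klen, vlen = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    key = bytes(buf[start:start + klen])
    value = bytes(buf[start + klen:start + klen + vlen])
    return log_id, key, value, start + klen + vlen
```

Because `decode_record` returns the offset of the next record, a reader can walk a log of back-to-back records without any separate directory, which is what the append-only design relies on.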
Preferably, step S11 may include: appending the log records in batches to the log system, starting at the position pointed to by the log system's tail pointer; advancing the tail pointer to point to the new end of the log; and, when the log system needs to be rebuilt, scanning the log system from the position pointed to by the tail pointer to rebuild it.
For example, the redo log of the log system on NVM guarantees crash consistency through its tail pointer. After each batch of log records is written, the tail pointer is advanced to the new end of the log, and only then is the in-memory index updated. During recovery, only the redo log needs to be scanned, starting from the tail end.
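The tail-pointer protocol can be sketched on a simulated NVM region. This is a minimal sketch, assuming the same hypothetical `<QII` record header as above and a plain `bytearray` in place of PMem; the class name `RedoLog` is illustrative. On real hardware each record would be flushed (e.g., with clwb or non-temporal stores) and fenced before the tail moves, and the tail pointer itself would live in NVM; here both are ordinary Python state. The text says recovery scans "from the tail pointer"; this sketch interprets the tail pointer as the boundary of committed data and replays records from the start of the log up to it, so bytes past the tail (a torn, uncommitted batch) are ignored. A backward scan would also be possible but would need per-record footers not shown here.

```python
import struct

# Assumed record layout: LogID (u64), key length (u32), value length (u32), then key and value.
HEADER = struct.Struct("<QII")


class RedoLog:
    """Toy append-only redo log on a simulated NVM region (a bytearray)."""

    def __init__(self, capacity: int):
        self.nvm = bytearray(capacity)  # stands in for the PMem log area
        self.tail = 0                   # tail pointer: end of committed records
        self.next_id = 0                # next LogID to assign
        self.index = {}                 # DRAM index: key -> offset of latest record

    def append_batch(self, items):
        """Append (key, value) pairs as one batch starting at the tail pointer."""
        pos, log_id, placed = self.tail, self.next_id, []
        for key, value in items:
            record = HEADER.pack(log_id, len(key), len(value)) + key + value
            if pos + len(record) > len(self.nvm):
                raise MemoryError("log region full")
            self.nvm[pos:pos + len(record)] = record  # real PMem: store, then flush
            placed.append((key, pos))
            pos += len(record)
            log_id += 1
        # Real PMem: fence here so the records are durable before the tail moves.
        self.tail, self.next_id = pos, log_id       # commit point for the whole batch
        for key, offset in placed:                  # index is updated only after commit
            self.index[key] = offset

    def read(self, key: bytes) -> bytes:
        offset = self.index[key]
        _, klen, vlen = HEADER.unpack_from(self.nvm, offset)
        start = offset + HEADER.size + klen
        return bytes(self.nvm[start:start + vlen])

    @classmethod
    def rebuild(cls, nvm: bytearray, tail: int) -> "RedoLog":
        """Recover by replaying committed records in [0, tail); bytes past tail are ignored."""
        log = cls(0)
        log.nvm, log.tail = nvm, tail
        pos = 0
        while pos < tail:
            log_id, klen, vlen = HEADER.unpack_from(nvm, pos)
            key = bytes(nvm[pos + HEADER.size:pos + HEADER.size + klen])
            log.index[key] = pos          # later records supersede earlier ones
            log.next_id = log_id + 1
            pos += HEADER.size + klen + vlen
        return log
```

Moving the tail only after the whole batch is in place makes the batch all-or-nothing: a crash mid-batch leaves the old tail, so recovery never sees a partially written record.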
Step S12: send the log records to a backup device over RDMA.
After the log records in the volatile memory buffer have been persisted locally on the primary device in step S11 (that is, written into the primary device's NVM), the primary device concurrently transmits the persisted log records to multiple backup devices.
Preferably, the method may further include: creating a new volatile cache region in the volatile cache (volatile memory buffer) of the primary device; and writing new log records into the newly created volatile cache region according to the control instruction.
For example, in the pipeline mechanism, once the log records have been persisted locally on the primary device, the primary side immediately continues to process new requests, such as a new save instruction or a read instruction (from the backup device), using another, newly created volatile memory buffer to batch log records, while the primary device concurrently transmits the persisted log records to multiple backup devices.
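The buffer-swapping pipeline can be sketched with one background sender per backup device. This is a minimal sketch, not the patent's engine: `Primary`, `Replica`, `save`, `flush`, and `wait_replication` are hypothetical names, Python lists stand in for NVM logs, dicts stand in for the DRAM indexes, and a thread pool stands in for concurrent RDMA transfers. Each replica gets its own single-threaded executor so that batches reach a given replica in order; replica names are assumed unique. The text does not say whether the primary waits for backup acknowledgements before answering clients, so this sketch leaves that decision to the caller through `wait_replication`.

```python
from concurrent.futures import ThreadPoolExecutor


class Replica:
    """Stand-in for a backup device (REPLICA)."""

    def __init__(self, name: str):
        self.name = name
        self.nvm_log = []   # stands in for the replica's NVM redo log
        self.dram = {}      # stands in for the replica's in-memory index

    def apply(self, batch):
        self.nvm_log.extend(batch)      # persist the batch first
        for key, value in batch:        # then update system memory
            self.dram[key] = value
        return self.name


class Primary:
    """Stand-in for the primary device (PRIMARY) with a swappable volatile buffer."""

    def __init__(self, replicas, batch_size: int = 4):
        self.replicas = replicas
        self.batch_size = batch_size
        self.buffer = []        # active volatile memory buffer
        self.nvm_log = []       # stands in for the primary's NVM redo log
        self.dram = {}          # stands in for the primary's in-memory index
        # One single-threaded sender per replica keeps batches in order per replica.
        self.senders = {r.name: ThreadPoolExecutor(max_workers=1) for r in replicas}
        self.in_flight = []     # replication futures not yet acknowledged

    def save(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []     # fresh buffer: new saves need not wait
        self.nvm_log.extend(batch)               # S11: persist locally
        for r in self.replicas:                  # S12: ship to all replicas concurrently
            self.in_flight.append(self.senders[r.name].submit(r.apply, batch))
        for key, value in batch:                 # S13: update system memory
            self.dram[key] = value

    def wait_replication(self):
        acks = [f.result() for f in self.in_flight]
        self.in_flight.clear()
        return acks

    def close(self):
        for executor in self.senders.values():
            executor.shutdown(wait=True)
```

Swapping `self.buffer` for a new empty list is the whole trick: the batch being persisted and replicated is no longer reachable by `save`, so new writes accumulate in the fresh buffer while the old one is still in transit.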
Preferably, after the log records are sent to a backup device (REPLICA) over RDMA, the backup device may perform the following operations: writing the log records into the NVM of the backup device; and writing the log records into the system memory of the backup device.
The system architecture of the backup device (REPLICA) is essentially the same as that of the primary device (PRIMARY). The PRIMARY sends the log records in its volatile memory buffer to the backup device over RDMA; the REPLICA then performs the corresponding write operation, writing the log records into its NVM and, at the same time, into its system memory (DRAM).
When the log records sent over RDMA from the primary device's volatile memory buffer are written into the NVM log system of the backup device, they are appended in batches, which may include: appending the log records in batches, starting at the position pointed to by the tail pointer of the backup device's log system; advancing the tail pointer to point to the new end of the log; and, when the log system needs to be rebuilt, scanning the log system from the position pointed to by the tail pointer to rebuild it.
Step S13: write the log records into the system memory (DRAM) of the primary device.
In parallel with steps S11 and S12, the persisted log records are also stored in the primary device's system memory (DRAM), which enables high availability of persistent in-memory key-value storage.
Preferably, step S13 may include: when the size of the logged data exceeds a preset number of bytes, storing a pointer to the logged data in the hash index table (Hash Index) in the primary device's system memory; otherwise, storing the logged data itself in the Hash Index.
However, PMem performs worse than volatile memory for small reads and writes. The embodiment of the invention solves this with a differentiated Hash Index, which caches hot, small entries in system memory and distinguishes between storing a pointer to an entry and caching the entry itself. In system memory, the Hash Index stores an 8-byte pointer for a large entry (e.g., logged data larger than the preset 256 bytes) and caches a small entry (e.g., logged data smaller than the preset 256 bytes) directly.
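The differentiated Hash Index can be sketched as a dict whose entries are either an inline copy of a small value or a reference into the NVM log. This is a minimal sketch, assuming the 256-byte example threshold from the text; `DifferentiatedHashIndex`, `put`, and `get` are hypothetical names, a `bytearray` stands in for NVM, and the large-entry reference is modeled as an (offset, length) pair rather than the raw 8-byte pointer the text describes (in a real layout the length could instead be read from the record header).

```python
PRESET_BYTES = 256  # example threshold from the text


class DifferentiatedHashIndex:
    """Toy DRAM hash index: small values cached inline, large values referenced by pointer."""

    def __init__(self, nvm: bytearray):
        self.nvm = nvm      # simulated NVM holding the value bytes of the log
        self.table = {}     # key -> ("inline", value) or ("ptr", offset, length)

    def put(self, key: bytes, value: bytes, nvm_offset: int) -> None:
        """Index a value whose bytes already live in NVM at nvm_offset."""
        if len(value) > PRESET_BYTES:
            # Large entry: keep only a reference (an 8-byte pointer in the patent).
            self.table[key] = ("ptr", nvm_offset, len(value))
        else:
            # Small (often hot) entry: cache the bytes in DRAM so reads skip PMem.
            self.table[key] = ("inline", bytes(value))

    def get(self, key: bytes) -> bytes:
        entry = self.table[key]
        if entry[0] == "inline":
            return entry[1]
        _, offset, length = entry
        return bytes(self.nvm[offset:offset + length])  # one PMem read for large values
```

The trade-off is DRAM space against PMem accesses: small values cost little DRAM to cache and are exactly the ones PMem serves least efficiently, while large values would consume DRAM quickly but are read from PMem at close to full access granularity.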
Correspondingly, since the backup device normally serves only read-only requests, in step S12 the backup device (REPLICA) writes the log records into its system memory in the same way as in step S13. That is, when the size of the logged data exceeds the preset number of bytes, a pointer to the logged data is stored in the Hash Index in the backup device's system memory; otherwise, the logged data itself is stored in the Hash Index.
The cache-line size of current CPUs is 64 bytes, whereas the access granularity of PMem is 256 bytes; consequently, although PMem is byte-addressable, the bandwidth for small reads and writes is poor. A study of Memcached workloads at Facebook shows that more than 40% of entries are smaller than 13 bytes and more than 70% are smaller than 300 bytes, which also produces a large number of small log entries. This wastes a significant share of PMem access bandwidth. Based on this analysis and on practical evaluations, for small writes, remote access to PMem via RDMA write has much higher latency than a local write (clwb/ntstore). As shown in FIGS. 4A and 4B, for a 256-byte write, the latency of an RDMA write is 20.5 times that of a local write (clwb/ntstore). The differentiated Hash Index therefore addresses this problem well when the log records are written into the system memory of the primary and backup devices.
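The bandwidth waste can be quantified with simple arithmetic on the 256-byte PMem access granularity. This is a minimal sketch assuming each write starts on a 256-byte boundary (an unaligned write can straddle two internal blocks and touch even more); the function names are hypothetical. Write amplification here means bytes PMem accesses internally divided by bytes the application wrote.

```python
import math

PMEM_GRANULARITY = 256  # bytes per internal PMem access (from the text)


def pmem_bytes_touched(write_size: int) -> int:
    """Bytes PMem accesses internally for one write, assuming an aligned start."""
    return math.ceil(write_size / PMEM_GRANULARITY) * PMEM_GRANULARITY


def write_amplification(write_size: int) -> float:
    """Ratio of internally accessed bytes to bytes actually written."""
    return pmem_bytes_touched(write_size) / write_size
```

For a 13-byte entry, `write_amplification(13)` is 256 / 13, roughly 19.7, so less than 6% of the accessed bandwidth carries useful data; a single 64-byte cache line gives an amplification of 4.0, and only writes in multiples of 256 bytes reach 1.0.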
FIG. 5 is a schematic structural diagram of a system for implementing in-memory key-value storage according to an embodiment of the present invention. Those skilled in the art will appreciate that the structure shown in FIG. 5 is only a block diagram of the parts relevant to the disclosed solution and does not limit the computing devices to which it applies; a particular computing device may include more or fewer components than shown, combine certain components, or arrange the components differently.
Referring to FIG. 5, the system for implementing in-memory key-value storage includes a primary device (PRIMARY) for implementing in-memory key-value storage and at least one backup device (REPLICA) for implementing in-memory key-value storage.
The primary device (PRIMARY) belongs to a distributed storage system configured with non-volatile memory (NVM) and remote direct memory access (RDMA), and includes a controller configured to perform the following operations according to a control instruction, so as to achieve high availability of persistent in-memory key-value storage: writing the log records in the volatile cache of the primary device into the NVM of the primary device; sending the log records to a backup device over RDMA; and writing the log records into the system memory of the primary device.
Each of the at least one backup device (REPLICA) belongs to a distributed storage system configured with non-volatile memory (NVM) and remote direct memory access (RDMA), and includes a controller configured to perform the following operations according to a control instruction, so as to achieve high availability of persistent in-memory key-value storage: writing the received log records into the NVM of the backup device; and writing the log records into the system memory of the backup device.
Preferably, an embodiment of the present invention provides a redesigned FaRPC Engine (RPC engine), whose execution flow is as follows:
1) The log records in the primary device's volatile buffer (Buff) are persisted; for example, the primary device (PRIMARY) starts the persistence processing upon receiving a control instruction (e.g., a save instruction).
2) The log records in the primary device's volatile buffer (Buff) are written into the primary device's NVM in batches.
Preferably, writing the log records in the volatile cache of the primary device into the NVM of the primary device includes: appending the log records in batches to the log system, starting at the position pointed to by the log system's tail pointer; advancing the tail pointer to point to the new end of the log; and, when the log system needs to be rebuilt, scanning the log system from the position pointed to by the tail pointer to rebuild it.
The redo log of the log system on NVM guarantees crash consistency through its tail pointer.
3) A new volatile buffer (Buff) is created to execute further control instructions, while the log records in the primary device's previous volatile buffer (Buff) are synchronously sent to the at least one backup device (REPLICA).
Preferably, the controller of the primary device (PRIMARY) is further configured to: create a new volatile cache region (Buff) in the volatile cache of the primary device; and write new log records into the newly created volatile cache region (Buff) according to the control instruction. The new volatile buffer (Buff) can serve other control instructions, such as a new save instruction, to batch new log records.
4) Corresponding to 3), preferably, the controller of the backup device (REPLICA) is further configured to write the log records in the backup device's NVM back to the primary device (PRIMARY) according to a control instruction. When the PRIMARY needs to read log records from the at least one REPLICA, the log records may be read from the log system in the REPLICA's NVM and written into the primary device's volatile buffer (Buff).
5) In parallel with 1), 2), and 3), the log records are written into the system memory of the primary device and the system memory of the backup device.
Preferably, writing the log records into the system memory of the primary device includes: when the size of the logged data exceeds the preset number of bytes, storing a pointer to the logged data in the hash index table in the primary device's system memory; otherwise, storing the logged data itself in the hash index table. The log records are written into the backup device's system memory in the same way, and the details are not repeated here.
A fifth aspect of the embodiments of the present invention provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the method for implementing memory key value storage.
It should be noted that the primary device, the backup device, the system, and the computer program product for implementing memory key value storage provided in the embodiments of the present invention correspond to the method for implementing memory key value storage provided in the embodiments of the present invention; for the detailed technical explanation and effects, refer to the method embodiments, which are not repeated here.
Taking Redis as an example, the embodiment of the present invention can improve write throughput and latency as shown in fig. 6A and fig. 6B, where fig. 6A illustrates the throughput improvement under 100% output, and fig. 6B illustrates the throughput improvement under 50% output and 50% input.
Therefore, the embodiment of the invention builds a distributed cache system that combines RDMA and NVM: log records to be persisted are written synchronously into both the NVM of the primary device and the DRAM system memory of the primary device, and are sent through RDMA to the backup device (where they are written into the NVM of the backup device), thereby achieving high availability and high performance for persistent memory key value storage. On top of the append-only log in the NVM, a differentiated hash index in system memory speeds up reads of small, hot data. Moreover, a pipelining mechanism avoids blocking between the primary and backup copies: by creating a new volatile buffer (buff), the primary device (PRIMARY) continues to accept new writes while the data copies are being sent synchronously, which further improves the high availability of persistent memory key value storage.
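The buffer-swapping pipeline between the primary and backup copies can be sketched as a minimal illustration, assuming a single-process model. `Primary` and `Replica` are hypothetical classes, RDMA is replaced by an ordinary method call made on a background thread, Python lists stand in for the NVM logs, and "synchronous" replication is modeled as the caller waiting on the returned futures before acknowledging a batch. None of this is taken from the embodiment itself.

```python
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import List


class Replica:
    """Hypothetical backup device (REPLICA); `nvm_log` models its NVM log."""

    def __init__(self) -> None:
        self.nvm_log: List[bytes] = []

    def receive(self, batch: List[bytes]) -> None:
        # In the described system the batch would arrive over RDMA;
        # here it is an ordinary method call made from a sender thread.
        self.nvm_log.extend(batch)


class Primary:
    """Hypothetical primary device with a swappable volatile buffer."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.buffer: List[bytes] = []   # current volatile buffer
        self.nvm_log: List[bytes] = []  # stands in for the primary's NVM log
        # One single-worker sender per replica keeps batches in order per replica,
        # while different replicas are served in parallel.
        self.senders = [ThreadPoolExecutor(max_workers=1) for _ in replicas]

    def write(self, record: bytes) -> None:
        self.buffer.append(record)

    def flush(self) -> List[Future]:
        """Seal the current buffer, open a new one, persist and replicate the batch."""
        sealed, self.buffer = self.buffer, []  # new writes now land in the new buffer
        self.nvm_log.extend(sealed)            # local persistence step
        return [s.submit(r.receive, sealed)
                for s, r in zip(self.senders, self.replicas)]

    def close(self) -> None:
        for s in self.senders:
            s.shutdown(wait=True)


if __name__ == "__main__":
    primary = Primary([Replica(), Replica()])
    primary.write(b"set k1 v1")
    pending = primary.flush()    # batch 1 is replicated in the background
    primary.write(b"set k2 v2")  # accepted immediately into the new buffer
    wait(pending)                # acknowledge batch 1 once every replica has it
    primary.close()
```

Because the sealed list is never mutated after the swap, the sender threads can read it without locking while the writer keeps appending to the fresh buffer.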
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (13)

1. A method for implementing a memory key value store, the method comprising:
performing, according to a control instruction, the following operations in a distributed storage system built on a non-volatile memory (NVM) and remote direct memory access (RDMA):
writing a log record in a volatile cache of a primary device into an NVM of the primary device;
sending the log record to a backup device over the RDMA; and
writing the log record into a system memory of the primary device.
2. The method of claim 1, wherein writing a log record in a volatile cache of a primary device to an NVM of the primary device comprises:
writing the log records into the log system in the NVM in batches in append mode, starting from the position pointed to by a tail pointer of the log system;
adjusting the tail pointer of the log system to point to the new tail end; and
when the log system needs to be rebuilt, scanning the log system from the position pointed to by the tail pointer to rebuild it.
3. The method of claim 1, wherein writing the log record to a system memory of a primary device comprises:
when the size of the data in the log record exceeds a preset number of bytes, storing a pointer to the data in a hash index table in the system memory of the primary device;
otherwise, storing the data of the log record in the hash index table.
4. The method for implementing memory key value storage according to claim 1, wherein after the log record is sent to the backup device over the RDMA, the backup device performs the following:
writing the log record into an NVM of the backup device; and
writing the log record into a system memory of the backup device.
5. The method of claim 1, wherein the method for implementing a memory key-value store further comprises:
creating a new volatile cache region in the volatile cache of the primary device; and
writing a new log record into the created new volatile cache region according to the control instruction.
6. A primary device for implementing memory key value storage, the primary device being a distributed storage system configured with a non-volatile memory (NVM) and remote direct memory access (RDMA), the primary device comprising a controller configured to perform the following operations according to a control instruction:
writing the log record in the volatile cache of the primary device into the NVM of the primary device;
sending the log record to a backup device over the RDMA; and
writing the log record into a system memory of the primary device.
7. The primary device for implementing memory key value storage according to claim 6, wherein the writing the log record in the volatile cache of the primary device to the NVM of the primary device comprises:
writing the log records into the log system in the NVM in batches in append mode, starting from the position pointed to by a tail pointer of the log system;
adjusting the tail pointer of the log system to point to the new tail end; and
when the log system needs to be rebuilt, scanning the log system from the position pointed to by the tail pointer to rebuild it.
8. The primary device for implementing memory key-value storage as claimed in claim 6, wherein said writing the log record to the system memory of the primary device comprises:
when the size of the data in the log record exceeds a preset number of bytes, storing a pointer to the data in a hash index table in the system memory of the primary device;
otherwise, storing the data of the log record in the hash index table.
9. The primary device for implementing a memory key-value store according to claim 6, wherein the controller of the primary device is further configured to:
create a new volatile cache region in the volatile cache of the primary device; and
write a new log record into the created new volatile cache region according to the control instruction.
10. A backup device for implementing memory key value storage, wherein the backup device is a distributed storage system configured with a non-volatile memory (NVM) and remote direct memory access (RDMA), the backup device comprising a controller configured to perform the following operations according to a control instruction:
writing the received log record into the NVM of the backup device; and
writing the log record into a system memory of the backup device.
11. The backup device for implementing memory key value storage as claimed in claim 10, wherein the controller of the backup device is further configured to:
writing the log records in the NVM of the backup device into the primary device according to the control instruction.
12. A system for implementing a memory key-value store, wherein the system for implementing a memory key-value store comprises the primary device for implementing a memory key-value store according to any one of claims 6 to 9 and at least one backup device for implementing a memory key-value store according to any one of claims 10 to 11.
13. A computer program product comprising a computer program, characterized in that the computer program realizes the method for realizing memory key-value storage according to any one of claims 1-5 when executed by a processor.
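The append-and-rebuild steps of claims 2 and 7 can be sketched as a minimal illustration under several assumptions. The record layout (a header with key and value lengths, plus a footer holding the total record length) and the class name `AppendOnlyLog` are hypothetical; a `bytearray` stands in for the NVM log, and no cache-line flush or memory fence is modeled. The claims say the rebuild scans "from the position pointed to by the tail pointer"; this sketch reads that as a backward scan from the persisted tail, so the newest record for each key is seen first. That is one possible interpretation, not the claimed implementation.

```python
import struct
from typing import Dict, Iterable, List, Tuple

HEADER = struct.Struct("<II")  # hypothetical: key length, value length
FOOTER = struct.Struct("<I")   # hypothetical: total record length, for backward scans


class AppendOnlyLog:
    """Append-only log on a bytearray; `tail` models the persisted tail pointer."""

    def __init__(self) -> None:
        self.buf = bytearray()  # stands in for the NVM log region
        self.tail = 0           # offset just past the last complete record

    def append_batch(self, records: Iterable[Tuple[bytes, bytes]]) -> List[int]:
        """Append a batch starting at the tail, then move the tail once."""
        pos = self.tail
        value_offsets = []
        for key, value in records:
            total = HEADER.size + len(key) + len(value) + FOOTER.size
            rec = HEADER.pack(len(key), len(value)) + key + value + FOOTER.pack(total)
            self.buf[pos:pos + total] = rec
            value_offsets.append(pos + HEADER.size + len(key))
            pos += total
        # On real NVM the batch would be flushed and fenced before this update.
        self.tail = pos
        return value_offsets

    def rebuild(self) -> Dict[bytes, bytes]:
        """Rebuild key -> latest value by walking backwards from the tail."""
        index: Dict[bytes, bytes] = {}
        pos = self.tail
        while pos > 0:
            (total,) = FOOTER.unpack_from(self.buf, pos - FOOTER.size)
            start = pos - total
            klen, vlen = HEADER.unpack_from(self.buf, start)
            key_start = start + HEADER.size
            key = bytes(self.buf[key_start:key_start + klen])
            value = bytes(self.buf[key_start + klen:key_start + klen + vlen])
            index.setdefault(key, value)  # newest record wins: keep the first hit
            pos = start
        return index
```

Bytes beyond the tail (for example, a batch interrupted before the tail pointer was updated) are never visited by the rebuild, which is why the tail is moved only after the whole batch has been written.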
CN202111498954.6A 2021-12-09 2021-12-09 Method, equipment and system for realizing memory key value storage Pending CN114185815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111498954.6A CN114185815A (en) 2021-12-09 2021-12-09 Method, equipment and system for realizing memory key value storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111498954.6A CN114185815A (en) 2021-12-09 2021-12-09 Method, equipment and system for realizing memory key value storage

Publications (1)

Publication Number Publication Date
CN114185815A true CN114185815A (en) 2022-03-15

Family

ID=80542922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111498954.6A Pending CN114185815A (en) 2021-12-09 2021-12-09 Method, equipment and system for realizing memory key value storage

Country Status (1)

Country Link
CN (1) CN114185815A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117255101A (en) * 2023-11-16 2023-12-19 苏州元脑智能科技有限公司 Data processing method, device, equipment and medium of distributed storage system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117255101A (en) * 2023-11-16 2023-12-19 苏州元脑智能科技有限公司 Data processing method, device, equipment and medium of distributed storage system
CN117255101B (en) * 2023-11-16 2024-02-20 苏州元脑智能科技有限公司 Data processing method, device, equipment and medium of distributed storage system

Similar Documents

Publication Publication Date Title
CA2893304C (en) Data storage method, data storage apparatus, and storage device
US10606803B2 (en) Data cloning in memory-based file systems
US9959074B1 (en) Asynchronous in-memory data backup system
CN109582223B (en) Memory data migration method and device
CN111708738B (en) Method and system for realizing interaction of hadoop file system hdfs and object storage s3 data
US11240306B2 (en) Scalable storage system
US10606746B2 (en) Access request processing method and apparatus, and computer system
US11099768B2 (en) Transitioning from an original device to a new device within a data storage array
US11677633B2 (en) Methods and systems for distributing topology information to client nodes
WO2019089057A1 (en) Scalable storage system
CN114185815A (en) Method, equipment and system for realizing memory key value storage
CN112988680B (en) Data acceleration method, cache unit, electronic device and storage medium
US11068299B1 (en) Managing file system metadata using persistent cache
US11288238B2 (en) Methods and systems for logging data transactions and managing hash tables
CN117059147A (en) Tracking memory modifications at cache line granularity
EP3136245B1 (en) Computer
US12007942B2 (en) Methods and systems for seamlessly provisioning client application nodes in a distributed system
WO2020231392A1 (en) Distributed virtual file system with shared page cache
US11586353B2 (en) Optimized access to high-speed storage device
CN113448722A (en) Mapping method of process memory and instance processing method based on serverless architecture
WO2023093091A1 (en) Data storage system, smart network card, and computing node
JP6157158B2 (en) Information processing apparatus, control method thereof, and program
JP2019174858A (en) Storage apparatus, replication system, replication method and program
CN112084123A (en) Data processing method and device and data processing system
JP2019144932A (en) Information processing device, information processing system, information processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination