WO2023151545A1 - Data storage system, data storage method and apparatus, and related device - Google Patents

Data storage system, data storage method and apparatus, and related device

Info

Publication number
WO2023151545A1
Authority
WO
WIPO (PCT)
Prior art keywords
target data
cache area
data
storage system
key
Prior art date
Application number
PCT/CN2023/074690
Other languages
French (fr)
Chinese (zh)
Inventor
李楚
叶茂
冯永刚
王�锋
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2023151545A1 publication Critical patent/WO2023151545A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0891 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1081 Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]

Definitions

  • the present application relates to the technical field of data storage, and in particular to a data storage system, data storage method, device and related equipment.
  • a data storage system refers to a system that organizes and allocates storage space for storage devices and is responsible for storing and retrieving data.
  • the data storage system can provide data read and write services in a multi-device mode.
  • one of the devices in the data storage system may provide a data access interface externally, and receive new data to be stored through the data access interface.
  • Normally, after the device caches the new data locally, it feeds back an acknowledgment (ack) message to the data sender indicating that the data was written successfully, and then writes the new data in the local cache to other devices in the data storage system for persistent storage, so as to reduce the response delay of the data storage system when storing data.
  • A data storage system, a data storage method, an apparatus, a computing device, a storage medium, and a computer program product are provided, so that the data storage system has a relatively low response delay and high reliability of data storage.
  • In a first aspect, an embodiment of the present invention provides a data storage system. The data storage system includes a first device and a second device; the first device includes a first cache area, and the second device includes a second cache area.
  • The first device is configured to receive a data processing request. The data processing request may be used to request writing of target data, and may be sent to the first device by another device or by an application program on the first device. The first device can write the target data in the data processing request into the second cache area based on remote direct memory access (RDMA), and write the target data into the local first cache area; the second device is configured to persistently store the target data written into the second cache area.
  • Once the target data has been written into both the first cache area and the second cache area, the data processing request is completed. Since RDMA does not require the participation of the CPU, and this solution does not need to wait for the second device to persistently store the target data before considering the data processing request processed, the response delay of the data storage system for storing the target data is relatively low, and can be close to that of storing the data only in the cache area of the first device.
  • The target data is stored in the cache areas of both the first device and the second device, so that when the cache of one device fails, the data storage system can avoid losing the target data by relying on the cache of the other device, which improves the reliability of data stored in the data storage system. The first aspect therefore provides a data storage system with low latency and improved data reliability.
  • The first device is further configured to generate a response message indicating that the data processing request succeeded, after the target data is written into the first cache area and the second cache area and before the second device persistently stores the target data in the second cache area.
  • At that point the data storage system regards the execution of the data processing request as completed and can generate a response message to the data processing request. The response message can be used to inform the application or other device that sent the data processing request that the target data has been stored, so the response delay of the data storage system for storing the target data is low, and the data storage system can persistently store the target data after feeding back the response message.
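  • The ordering described above (write to both cache areas, acknowledge, persist later) can be illustrated with a minimal sketch. All class and method names here are hypothetical, plain dictionaries stand in for the cache areas and the persistent storage, and an ordinary assignment stands in for the RDMA write.

```python
from dataclasses import dataclass, field


@dataclass
class SecondDevice:
    """Hypothetical stand-in for the second device."""
    cache: dict = field(default_factory=dict)        # second cache area
    persistent: dict = field(default_factory=dict)   # persistent storage area

    def persist_pending(self) -> None:
        # Runs later, off the request path: move cached entries to
        # persistent storage and free them from the second cache area.
        for key in list(self.cache):
            self.persistent[key] = self.cache.pop(key)


@dataclass
class FirstDevice:
    """Hypothetical stand-in for the first device."""
    peer: SecondDevice
    cache: dict = field(default_factory=dict)        # first cache area

    def handle_write(self, key: str, data: bytes) -> str:
        # Assumption: this assignment stands in for the RDMA write
        # into the second cache area.
        self.peer.cache[key] = data
        self.cache[key] = data
        # Both cache writes are done, so the request is acknowledged
        # before the second device has persisted anything.
        return "ack"


second = SecondDevice()
first = FirstDevice(peer=second)
reply = first.handle_write("file-a", b"hello")
persisted_at_ack = "file-a" in second.persistent   # state at acknowledgment time
second.persist_pending()                            # persistence happens afterwards
```

  • In this sketch the acknowledgment is returned while the data exists only in the two caches; persistence is a separate, later step on the second device.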
  • The first device is further configured to evict the target data from the first cache area after determining that the target data has been persistently stored. In this way, the storage space occupied by the target data in the first cache area can be released once persistent storage of the target data is guaranteed, so that the first device can later use the released space to store other newly written data, realizing recycling of storage resources in the first cache area.
  • The second device is further configured to evict the target data from the second cache area after determining that the target data has been persistently stored. In this way, the storage space occupied by the target data in the second cache area can be released, so that subsequent data from the first device can be cached in the released space, and the storage resources of the second cache area can be recycled.
  • When writing the target data into the second cache area, the first device is specifically configured to generate a write-ahead log (WAL) including the target data, and then write the WAL into the second cache area based on the RDMA technology.
  • the second device can persistently store the target data in the WAL by replaying the log.
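  • A minimal sketch of generating WAL records and replaying them into persistent storage follows. The record layout (LSN, key, payload), the JSON encoding, and the names `WalRecord` and `replay` are assumptions made for illustration; the source does not specify a log format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    lsn: int        # log sequence number, increasing per record (assumed)
    key: str        # identifier of the target data
    payload: str    # the target data itself

    def encode(self) -> bytes:
        return json.dumps(
            {"lsn": self.lsn, "key": self.key, "payload": self.payload}
        ).encode()

    @staticmethod
    def decode(raw: bytes) -> "WalRecord":
        return WalRecord(**json.loads(raw))


def replay(log: list[bytes], store: dict[str, str], from_lsn: int = 0) -> int:
    """Apply records with lsn > from_lsn in order; return the last LSN applied."""
    last = from_lsn
    for raw in log:
        rec = WalRecord.decode(raw)
        if rec.lsn <= last:
            continue  # already replayed
        store[rec.key] = rec.payload
        last = rec.lsn
    return last


# The list stands in for WAL records sitting in the second cache area.
second_cache = [
    WalRecord(1, "a", "v1").encode(),
    WalRecord(2, "a", "v2").encode(),
    WalRecord(3, "b", "w").encode(),
]
persistent: dict[str, str] = {}
replayed_lsn = replay(second_cache, persistent)
```

  • Replaying in LSN order means a later write to the same key overwrites the earlier one, and the returned LSN records how far persistence has progressed.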
  • the first device is specifically configured to write the WAL including the target data into the first cache area.
  • the target data may be acquired according to the WAL by playing back a log.
  • the first device may also directly write the target data into the first cache area.
  • When the first device writes the target data into the first cache area, it is specifically configured to generate a key-value pair, where the key in the key-value pair is the identifier of the target data and the value in the key-value pair is the target data, so that the first device can write the key-value pair into the first cache area.
  • the first device can generate an index in the first cache area based on the target data, so that when the target data needs to be read, the first device can quickly find the target data from the local first cache area by traversing the index.
  • When the target data is written into the first cache area as a key-value pair, the value in the key-value pair also corresponds to a first log sequence number (LSN) of the WAL. The first device is further configured to obtain a second LSN corresponding to the WAL that has been played back by the second device, and when the first LSN is equal to the second LSN, the first device may remove the key-value pair from the first cache area.
  • The first device can use the LSN corresponding to the WAL to identify which target data in the first cache area has been persistently stored, and release the storage space occupied by that target data in the first cache area, so that the first device can continue to write other data and the storage resources in the first cache area are recycled. Data that the LSN shows has not yet been persisted can remain in the first cache area, which improves the reliability of data storage in the data storage system.
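  • The LSN-based release of the first cache area can be sketched as follows. The source states the condition as the first LSN being equal to the second LSN; this sketch uses `<=` on the assumption that LSNs increase monotonically and the second device replays in order, which includes the equality case. The names `Entry`, `FirstCache`, and `evict_persisted` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    data: bytes
    lsn: int   # first LSN: the LSN of the WAL record carrying this data


class FirstCache:
    """Hypothetical key-value view of the first cache area."""

    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}

    def put(self, key: str, data: bytes, lsn: int) -> None:
        self.entries[key] = Entry(data, lsn)

    def evict_persisted(self, replayed_lsn: int) -> list[str]:
        """Remove entries whose LSN is covered by the second device's replayed LSN."""
        evicted = [k for k, e in self.entries.items() if e.lsn <= replayed_lsn]
        for k in evicted:
            del self.entries[k]
        return evicted


cache = FirstCache()
cache.put("a", b"1", lsn=1)
cache.put("b", b"2", lsn=2)
cache.put("c", b"3", lsn=3)
evicted = cache.evict_persisted(replayed_lsn=2)   # second LSN reported by the second device
```

  • Entry "c" stays in the first cache area because its LSN is beyond what the second device has replayed, so it is still protected by the cache copy.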
  • The first device is further configured to receive a data read request that includes an identifier of the target data, so that the first device can check, according to the identifier, whether the first cache area includes the target data; when the first cache area includes the target data, the first device may feed back the found target data. In this way, the first device can quickly find the target data to be read in the local cache area without remotely accessing the second device, which effectively improves the efficiency with which the data storage system feeds back target data.
  • The first device is further configured to: when the target data is not included in the first cache area, for example if the target data was evicted from the first cache area because persistent storage had been completed, request the target data from the second device, so that the first device can subsequently feed back the target data.
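  • The read path described above (local lookup first, remote request on a miss) can be sketched as follows. The names `SecondDevice`, `fetch`, and `read` are hypothetical, and a dictionary lookup stands in for the request sent to the second device.

```python
class SecondDevice:
    """Hypothetical second device holding persistently stored data."""

    def __init__(self, stored: dict) -> None:
        self.stored = dict(stored)
        self.remote_requests = 0

    def fetch(self, key: str):
        # Counts how often the first device had to go remote.
        self.remote_requests += 1
        return self.stored.get(key)


def read(key: str, first_cache: dict, second: SecondDevice):
    """Return (data, source); look in the local first cache area before going remote."""
    if key in first_cache:
        return first_cache[key], "local"
    return second.fetch(key), "remote"


second = SecondDevice({"old": b"persisted"})
first_cache = {"new": b"cached"}
hit = read("new", first_cache, second)    # served from the first cache area
miss = read("old", first_cache, second)   # evicted locally, fetched from the second device
```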
  • The first device is specifically configured to write the target data into the second cache area based on the unilateral (one-sided) RDMA technology, which can further reduce the delay in writing the target data from the first device to the second device, so that the data storage system responds efficiently when storing the target data.
  • the first device may also write the target data into the second cache area by using a bilateral RDMA technology.
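  • The difference between the unilateral (one-sided) and bilateral (two-sided) styles can be illustrated with a toy model. This sketch uses no real RDMA library or verbs; `Target`, `one_sided_write`, and `two_sided_send` are hypothetical names, and the model only shows that a one-sided write lands in registered memory without target-side software handling it, whereas a two-sided transfer needs the target to process the message.

```python
from collections import deque


class Target:
    """Toy model of the device that owns the second cache area."""

    def __init__(self, size: int) -> None:
        self.region = bytearray(size)   # memory assumed registered for remote access
        self.inbox = deque()            # messages awaiting the target's software
        self.cpu_handled = 0            # how many messages target software processed

    def poll(self) -> None:
        # Two-sided path: target software consumes each message and copies it.
        while self.inbox:
            offset, data = self.inbox.popleft()
            self.region[offset:offset + len(data)] = data
            self.cpu_handled += 1


def one_sided_write(target: Target, offset: int, data: bytes) -> None:
    # Models bytes being placed directly into registered memory;
    # no target-side handler runs.
    target.region[offset:offset + len(data)] = data


def two_sided_send(target: Target, offset: int, data: bytes) -> None:
    # Models a send that the target must receive and process.
    target.inbox.append((offset, data))


t = Target(size=8)
one_sided_write(t, 0, b"AB")
handled_after_one_sided = t.cpu_handled
two_sided_send(t, 2, b"CD")
region_before_poll = bytes(t.region)
t.poll()
```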
  • The data storage system may specifically be a distributed file system, where the first device implements a client in the distributed file system, the second device implements a server in the distributed file system, and the second device persistently stores the target data in a file format.
  • In a second aspect, an embodiment of the present invention provides a data storage method. The data storage method is applied to a data storage system that includes a first device and a second device; the first device includes a first cache area, and the second device includes a second cache area. The method includes: the first device receives a data processing request, where the data processing request includes target data; and the first device writes the target data into the second cache area based on RDMA, and writes the target data into the first cache area.
  • The method further includes: after the target data is written into the first cache area and the second cache area, and before the second device persistently stores the target data in the second cache area, the first device generates a response message indicating that the data processing request succeeded.
  • The method further includes: after determining that the target data has been persistently stored, the first device evicts the target data from the first cache area.
  • The first device writing the target data into the second cache area based on remote direct memory access (RDMA) includes: the first device generates a WAL including the target data; and the first device writes the WAL into the second cache area based on RDMA.
  • The first device writing the target data into the first cache area includes: the first device writes the WAL into the first cache area.
  • the first device writes the target data into the first cache area, including: the first device generates a key-value pair, the key in the key-value pair is the identifier of the target data, and the value in the key-value pair is the target data; the first device writes the key-value pair into the first cache area.
  • The value in the key-value pair also corresponds to a first log sequence number (LSN) of the WAL, and the method further includes: the first device obtains a second LSN corresponding to the WAL that has been played back by the second device; and when the first LSN is equal to the second LSN, the first device evicts the key-value pair from the first cache area.
  • The method further includes: the first device receives a data read request, where the data read request includes the identifier of the target data; the first device checks, according to the identifier of the target data, whether the first cache area includes the target data; and when the first cache area includes the target data, the first device feeds back the target data.
  • the method further includes: when the first cache area does not include the target data, the first device requests the second device for the target data.
  • the first device writing the target data into the second cache area based on remote direct memory access includes: the first device writing the target data into the second cache area based on unilateral RDMA.
  • The data storage system includes a distributed file system, the first device implements a client in the distributed file system, the second device implements a server in the distributed file system, and the second device persistently stores the target data in a file format.
  • For the technical effects of the second aspect and of each implementation of the second aspect, refer to the technical effects of the first aspect and of the corresponding implementations of the first aspect; details are not repeated here.
  • In a third aspect, an embodiment of the present invention provides a storage device. The storage device serves as the first device in a data storage system that further includes a second device. The storage device includes a first cache area and a processor, where the first cache area is used to cache data, and the processor is used to execute the following method by running a computer program: receiving a data processing request, where the data processing request includes target data; writing the target data into the second cache area based on remote direct memory access (RDMA), so that the target data in the second cache area is persistently stored; and writing the target data into the first cache area.
  • The processor is further configured to: after the target data is written into the first cache area and the second cache area, and before the target data in the second cache area is persistently stored, generate a response message indicating that the data processing request succeeded.
  • the processor is configured to: delete the target data in the first cache area after determining that the target data is persistently stored.
  • the processor is configured to: generate a write-ahead log WAL including target data; and write the WAL into the second cache area based on RDMA.
  • The processor is configured to: write the WAL into the first cache area; or generate a key-value pair, where the key in the key-value pair is an identifier of the target data and the value in the key-value pair is the target data, and write the key-value pair into the first cache area.
  • The value in the key-value pair also corresponds to a first log sequence number (LSN) of the WAL, and the processor is further configured to: obtain a second LSN corresponding to the WAL that has been played back by the second device; and when the first LSN is equal to the second LSN, evict the key-value pair from the first cache area.
  • The processor is further configured to: receive a data read request, where the data read request includes an identifier of the target data; check, according to the identifier of the target data, whether the first cache area includes the target data; when the first cache area includes the target data, feed back the target data; and when the first cache area does not include the target data, request the target data from the second device.
  • the processor is configured to: write the target data into the second cache area based on unilateral RDMA.
  • The data storage system includes a distributed file system, the first device implements a client in the distributed file system, the second device implements a server in the distributed file system, and the second device persistently stores the target data in a file format.
  • The storage device provided in the third aspect corresponds to the first device in the data storage system provided in the first aspect. For the technical effects of the third aspect and of each implementation of the third aspect, refer to the technical effects of the first aspect and of the corresponding implementations of the first aspect; details are not described here.
  • An embodiment of the present invention provides a computing device, including a processor and a memory. The memory is used to store instructions, and when the computing device is running, the processor executes the instructions stored in the memory, so that the computing device performs the data storage method executed by the first device in the second aspect or in any implementation of the second aspect.
  • the memory may be integrated in the processor, or independent of the processor.
  • The computing device may also include a bus, through which the processor is connected to the memory. The memory may include a readable memory and a random access memory.
  • An embodiment of the present invention also provides a readable storage medium that stores a program or instructions which, when run on a computer, cause the data storage method executed by the first device in the second aspect or in any implementation of the second aspect to be performed.
  • An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the data storage method executed by the first device in the second aspect or in any implementation of the second aspect.
  • FIG. 1 is a schematic structural diagram of an exemplary data storage system provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an exemplary distributed file system provided by an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a data storage method provided by an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a data reading method provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a data storage device provided by an embodiment of the present invention.
  • The data storage system 100 may include a device 101 and a device 102, and data interaction may be performed between the device 101 and the device 102, such as data communication over the Hypertext Transfer Protocol (HTTP).
  • The device 101 can provide a file access interface, such as a portable operating system interface (POSIX), so that the device 103 connected to the data storage system 100, or the application 1011 in the device 101, can delete, read, write, and modify files in the data storage system 100 through the file access interface.
  • The device 101 may include a cache area 1, and data may be cached through the cache area 1.
  • The device 102 includes a cache area 2, and the device 102 can cache data through the cache area 2. Further, the device 102 may also include a persistent storage area, so that the device 102 may write the data in the cache area 2 into the persistent storage area for persistent storage.
  • The cache area 2 can be implemented by non-volatile memory (NVM) such as read-only memory (ROM), flash memory, or storage class memory (SCM), or by volatile memory such as random-access memory (RAM), which is not limited in this embodiment.
  • the persistent storage area can be realized by a hard disk drive (HDD) or other storage media that can store data for a long time without power supply.
  • When the device 103 or the application 1011 writes new data (hereinafter referred to as target data) to the data storage system 100, if the device 101 only writes the target data into the local cache area 1, although this can reduce the storage capacity of the data storage system 100
  • an embodiment of the present invention provides a data storage method, aiming at improving the reliability of data storage while making the data storage system 100 have a smaller response time delay.
  • The device 101 not only writes the target data into the local cache area 1, but also writes the target data into the cache area 2 of the device 102 based on RDMA technology, so that the target data is stored in the caches of the device 101 and the device 102 at the same time; the device 102 can then persistently store the target data in the cache area 2.
  • Once the target data has been written into the cache area 1 and the cache area 2, the processing of the data processing request is completed. The data processing request does not have to wait until the device 102 has persistently stored the target data before it is considered processed, which makes the response delay of the data storage system 100 for storing the target data low; the delay can be close to that of storing the data only in the cache area 1 of the device 101.
  • The target data is stored in the cache areas of the device 101 and the device 102 respectively, so that when the cache of the target data fails on the device 101 side, the data storage system 100 can still prevent the target data from being lost by relying on the cache on the device 102 side, which improves the reliability of data storage in the data storage system 100. In this way, the data storage system 100 has a small response delay while the reliability of data storage is also high.
  • the system architecture shown in FIG. 1 is only used as an example, and is not intended to limit its specific implementation to this example.
  • the persistent storage area can also be deployed outside the data storage system 100, that is, the device 102 can remotely send the target data in the cache area 2 to the persistent storage area for persistent storage, etc. This embodiment does not limit it.
  • the data storage system shown in FIG. 1 may specifically be the distributed file system 200 shown in FIG. 2 .
  • the distributed file system 200 includes a client 201 and a server 202 .
  • The client 201 may be an application program provided by the distributed file system 200, which may be implemented by the device 101 in FIG. 1.
  • The device 101 where the client is located can also run the application 1011 shown in FIG. 1, and the application 1011 can create, delete, read, write, and modify files in the distributed file system 200 through the file access interface provided by the client 201.
  • The server 202 can be implemented by one or more devices 102 in FIG. 1. The one or more devices 102 can be integrated with a memory, so that the memory provides the server 202 with a cache area 2 that can be used to cache data at high speed; further, the memory integrated on the device 102 can also provide a persistent storage area for the server 202, which can be used for persistent storage of data.
  • The device 102 may include multiple memories 1, which may form a buffer pool; the device 102 may also include multiple memories 2, which may form a persistent storage pool.
  • The memory 1 can be non-volatile memory (NVM) such as read-only memory (ROM), flash memory, or storage class memory (SCM), or the memory 1 may be volatile memory such as random-access memory (RAM).
  • The memory 2 may be a hard disk drive (HDD), a solid-state drive (SSD), or another device that can be used for persistent storage of data. In this embodiment, specific implementation manners of the memory 1 and the memory 2 are not limited.
  • the data storage system shown in FIG. 1 may also be applicable to other application scenarios except FIG. 2 , which is not limited in this embodiment.
  • the system structure shown in FIG. 1 may be applicable to an application scenario of centralized storage or an application scenario of distributed storage.
  • One or more devices 102 can form a central node; data is stored centrally in this central node, and all data processing services of the entire data storage system 100 are centrally deployed on this central node.
  • A disk-control separation architecture can be adopted between the device 102 and the memory 1 implementing the cache area 2, that is, the device 102 and the memory 1 are deployed independently; or an integrated disk-control architecture can be used between the device 102 and the memory 1, that is, the device 102
  • data in the data storage system 100 may be distributed and stored on multiple independent storage nodes.
  • The device 102 can be integrated with the memory 1 so that the device 102 has both computing capability and storage capability; a virtual machine may or may not be created on the device 102.
  • a storage-computing separation architecture may be adopted between the device 102 and the memory 1 implementing the cache area 2, that is, the device 102 and the memory 1 are deployed independently and communicate through a network.
  • The memory 1 may include one or more different storage media, which is not limited in this embodiment.
  • FIG. 3 is a schematic flowchart of a data storage method in an embodiment of the present invention. The method can be applied to the data storage system 100 shown in FIG. 1; in actual application, the method may also be applied to other applicable data storage systems.
  • the following uses the data storage system 100 shown in FIG. 1 as an example for illustration. The method may specifically include:
  • S301: The device 101 receives a data processing request, where the data processing request includes target data.
  • When the application 1011 or the device 103 writes the target data into the data storage system 100, or modifies data stored in the data storage system 100 into the target data, it can generate a data processing request including the target data and send the data processing request to the device 101 through the data access interface (such as POSIX). The device 101 can then parse the received data processing request to obtain the target data to be stored by the data storage system 100. In practical applications, the data processing request may also carry operation indication information for the target data, such as a write operation or a modification operation, so that the device 101 can obtain an operation record for the target data based on the data processing request.
  • S302: The device 101 writes the target data into the cache area 1 in the device 101.
  • S303: The device 101 writes the target data in the data processing request to the cache area 2 in the device 102 based on RDMA.
  • The device 101 may execute step S302 first and then step S303; or execute step S302 and step S303 at the same time; or execute step S303 first and then step S302. This embodiment does not limit the order.
  • the device 101 may write the target data into the local cache area 1 and the cache area 2 in the device 102 .
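  • Because the two cache writes may run in either order or at the same time, one way to picture the request handling is to issue both writes concurrently and report success only after both have finished. The sketch below uses Python threads for this; the function names are hypothetical and a dictionary assignment stands in for the RDMA write to the cache area 2.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def write_local(cache1: dict, key: str, data: bytes) -> str:
    cache1[key] = data            # write into cache area 1 on the device 101
    return "cache area 1"


def write_remote(cache2: dict, key: str, data: bytes) -> str:
    cache2[key] = data            # assumption: stands in for the RDMA write to cache area 2
    return "cache area 2"


def handle_request(key: str, data: bytes, cache1: dict, cache2: dict) -> str:
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(write_local, cache1, key, data),
            pool.submit(write_remote, cache2, key, data),
        ]
        wait(futures)             # block until both writes have completed
    for f in futures:
        f.result()                # re-raise if either write failed
    return "success"


cache_area_1: dict = {}
cache_area_2: dict = {}
status = handle_request("k", b"v", cache_area_1, cache_area_2)
```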
  • The device 101 can write the data directly through remote direct memory access (RDMA) technology, that is, write the target data directly into the cache area 2 over the network without intervention by the operating system of the device 102. This not only reduces the impact of data writing on the performance of the operating system of the device 102, but also shortens the time taken to write data into the cache area 2.
  • The response delay for the data storage system 100 to cache the target data is relatively small. It is worth noting that once the device 101 has successfully written the target data into the cache area 1 and the cache area 2, the device 101 can feed back to the application 1011 or the device 103 that the data processing request has been processed, that is, that storage of the target data is complete. Specifically, the device 101 can generate a response message indicating that the data processing request succeeded and send it to the application 1011 or the device 103, so that the data storage system 100 can notify the application 1011 or the device 103 that the target data has been stored before the target data is actually persistently stored.
  • the response efficiency of the data storage system 100 to the application 1011 or the device 103 can be effectively improved and the response delay can be reduced.
  • the data storage system 100 may complete the persistent storage of the target data in a future time period.
  • the device 101 may write the target data into the cache area 2 based on a one-sided RDMA technology, so as to further reduce the time delay for the device 101 to write data into the cache area 2, thereby further reducing the overall time delay for the data storage system 100 to respond to the data processing request.
  • the target data is stored in the caches of the device 101 and the device 102 at the same time, which makes it possible for the data storage system 100 to use the target data in the cache of the other party to avoid data loss when there is a problem with one of the caches.
  • the reliability of data storage in the data storage system 100 can be improved.
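  • As a rough illustration of the write path described above (a sketch, not the patent's implementation), the following Python code models the two cache areas as in-memory dictionaries and acknowledges a request once both cache writes have succeeded, leaving persistence for later. The class and method names are hypothetical, and the RDMA write is stood in for by an ordinary method call.

```python
class Device102:
    """Hypothetical stand-in for the device 102: cache area 2 plus persistent storage."""

    def __init__(self):
        self.cache_area_2 = {}
        self.persistent = {}

    def rdma_write(self, key, data):
        # Assumption: a real system would issue an RDMA write; this is a local call.
        self.cache_area_2[key] = data

    def persist(self):
        # Performed later, after the request has already been acknowledged.
        self.persistent.update(self.cache_area_2)
        self.cache_area_2.clear()


class Device101:
    """Hypothetical stand-in for the device 101 with its local cache area 1."""

    def __init__(self, peer):
        self.cache_area_1 = {}
        self.peer = peer

    def handle_write(self, key, data):
        self.cache_area_1[key] = data      # write into the local cache area 1
        self.peer.rdma_write(key, data)    # write into the remote cache area 2
        return "ok"                        # acknowledged before persistent storage


peer = Device102()
dev = Device101(peer)
ack = dev.handle_write("file-a", b"hello")
```

  • In this sketch the acknowledgement is returned while the persistent store is still empty; a copy of the data exists in both cache areas, which is what lets either one cover for a failure of the other.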
  • the device 102 may support multiple devices 101 to read and write data, for example, the device 102 may support multiple clients 201 to read and write data. Therefore, the device 102 may pre-determine the corresponding cache areas 2 for multiple different devices 101 , and the cache areas 2 corresponding to different devices 101 do not overlap. In this way, problems such as data overwriting when different devices 101 write data into the cache of the device 102 can be avoided.
  • The device 101 may apply to the device 102 for cache space in advance, for example by sending a cache allocation request including the identifier of the device 101 to the device 102, so as to request the device 102 to allocate the corresponding cache area 2 and feed back the address space of the cache area 2 to the device 101. In this way, the device 101 can subsequently write the target data directly into the cache area 2 in the device 102 based on the address space.
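  • The allocation of non-overlapping cache areas can be sketched as follows. This is a minimal illustration under stated assumptions: the allocator class, the bump-pointer strategy, and the device identifiers are all hypothetical, and the text does not specify how the device 102 actually lays out its cache.

```python
class CacheAllocator:
    """Hypothetical allocator on the device 102 handing out disjoint regions of its cache."""

    def __init__(self, total_size):
        self.total_size = total_size
        self.next_offset = 0
        self.regions = {}  # device identifier -> (offset, length)

    def allocate(self, device_id, length):
        # A device that already holds a region gets the same region back.
        if device_id in self.regions:
            return self.regions[device_id]
        if self.next_offset + length > self.total_size:
            raise MemoryError("cache space exhausted")
        region = (self.next_offset, length)
        self.regions[device_id] = region
        self.next_offset += length  # the next region starts where this one ends
        return region


allocator = CacheAllocator(total_size=1024)
region_a = allocator.allocate("client-a", 256)
region_b = allocator.allocate("client-b", 256)
```

  • Because each region starts where the previous one ends, writes from different devices cannot land on the same addresses, which is the property the text relies on to avoid data overwriting.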
  • the device 101 may first write the target data and the operation indication information for the target data into a write-ahead log (WAL), so that the device 101 can write the WAL including the target data into the local cache area 1, and write the WAL including the target data directly into the cache area 2 in the device 102 through two-sided RDMA technology or one-sided RDMA technology.
  • the device 101 writes the WAL including the target data into the cache area 1, and writes the target data directly into the cache area 2 through RDMA technology (either one-sided or two-sided RDMA).
  • the device 101 may directly write the target data into the local cache area 1, and write the WAL including the target data into the cache area 2 through the RDMA technology.
  • the device 101 directly writes the target data into the cache area 1, and directly writes the target data into the cache area 2 through RDMA technology.
  • the device 101 writes the target data directly into the cache area 2 through the RDMA technology, and generates a key-value pair according to the target data, where the key in the key-value pair is the identifier of the target data and the value is the target data. The device 101 can then write the key-value pair into the local cache area 1 as a data query index, so that the target data can be queried later.
  • the device 101 can generate a corresponding key-value pair for each piece of data, so that the device 101 can generate and save a corresponding index based on multiple key-value pairs, such as an index with a tree structure.
  • when the device 101 saves the key-value pair in the local cache area 1, the device 101 can add the key-value pair to the index to update the index, so that the device 101 can subsequently find the data that the application 1011 needs to read by traversing the index.
  • the device 101 writes the WAL including the target data into the cache area 2 through the RDMA technology, and writes the key-value pairs generated according to the target data into the cache area 1 .
  • the above-mentioned implementation methods of caching the target data in the device 101 and the device 102 are only for some exemplary illustrations. In practical applications, the device 101 can also realize the caching of the target data in the device 101 and the device 102 in other ways. The embodiment does not limit this.
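  • One of the combinations above (WAL into the cache area 2, key-value pair into the cache area 1) might look roughly like the following sketch. The record layout, the field names, and the use of a plain dictionary in place of a tree-structured index are assumptions made for illustration, and the RDMA write is replaced by a list append.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class WalRecord:
    lsn: int       # log serial number, monotonically increasing
    op: str        # operation indication, e.g. "write" or "modify"
    key: str       # identifier of the target data
    value: bytes   # the target data itself


_lsn_counter = count(1)
cache_area_1 = {}  # key -> (value, lsn): key-value pairs serving as a lookup index
cache_area_2 = []  # stand-in for the remote cache area 2 holding WAL records


def store(op, key, value):
    """Generate a WAL record, place it in cache area 2, and index the data locally."""
    record = WalRecord(next(_lsn_counter), op, key, value)
    cache_area_2.append(record)               # would be an RDMA write in practice
    cache_area_1[key] = (value, record.lsn)   # local key-value pair
    return record


first = store("write", "file-a", b"v1")
second = store("modify", "file-a", b"v2")
```

  • Keeping the LSN next to the value in the local key-value pair is a convenience chosen here so that later eviction decisions can be made per key.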
  • the device 101 may generate multiple WALs within a period of time, for example, the application 1011 continuously sends multiple data within the period of time, or multiple processes/threads of the application 1011 send multiple data in parallel.
  • the device 101 may write the multiple WALs into the cache area 2 one by one, that is, each time a WAL is generated, the device 101 writes the WAL into the cache area 2 of the device 102 .
  • the device 101 can also write multiple WALs into the cache area 2 in batches.
  • the device 101 can write 10 WALs in batches to the cache area 2 each time, so as to reduce the number of writes to the cache area 2.
  • by contrast, when WALs are written one by one, the number of communications between the device 101 and the device 102 equals the number of WALs.
  • when writing multiple WALs in batches, the device 101 can periodically send multiple WALs in batches to the cache area 2 of the device 102; for example, the device 101 can write the multiple WALs generated sequentially within 1 second into the cache area 2 in one batch. Alternatively, when multiple processes/threads of the application 1011 send multiple pieces of data to be stored, the device 101 can generate multiple WALs in parallel and write the WALs generated in parallel to the device 102 in batches.
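  • The effect of batching on the number of remote writes can be sketched as below. The batcher class is hypothetical; the batch size of 10 follows the example in the text, and the periodic trigger is only indicated by a comment rather than implemented with a timer.

```python
class WalBatcher:
    """Hypothetical batcher: collects WALs and sends them in groups of `batch_size`."""

    def __init__(self, send, batch_size=10):
        self.send = send              # callable taking a list of WALs (one remote write)
        self.batch_size = batch_size
        self.pending = []

    def add(self, wal):
        self.pending.append(wal)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Could also be invoked by a periodic timer, e.g. once per second.
        if self.pending:
            self.send(self.pending)
            self.pending = []


sent_batches = []
batcher = WalBatcher(sent_batches.append, batch_size=10)
for i in range(25):
    batcher.add(f"wal-{i}")
batcher.flush()  # send whatever is left over
```

  • Here 25 WALs reach the remote side in 3 writes instead of 25, at the cost of the last few WALs waiting until the final flush.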
  • the device 101 may encapsulate multiple operations performed inside the data storage system 100 into a single semantic, thereby further reducing the number of WALs sent by the device 101 to the device 102 .
  • the operation instruction issued by the application 1011 instructs to perform numerical scaling on all data belonging to the user according to the preset scaling ratio. That is, multiple scaling operations are performed.
  • device 101 can generate a single semantic-level log for the multiple scaling operations performed on the multiple data, and write the semantic-level log into the cache area 2 of device 102, which is different from the multiple scaling operations performed separately
  • the number of logs generated by the device 101 can be effectively reduced, so that the number of logs sent by the device 101 to the device 102 can be effectively reduced.
  • the device 102 when playing back the log, can play back multiple operations according to the semantics recorded in the log.
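  • A semantic-level log of this kind can be sketched as a single record that is expanded into many operations at playback time. The log format, the field names, and the in-memory data layout below are assumptions for illustration only.

```python
def replay(log, store):
    """Replay one semantic-level log entry against `store` (owner -> {name: number})."""
    if log["op"] == "scale":
        user_data = store[log["user"]]
        # One log entry expands into one scaling operation per data item.
        for name in user_data:
            user_data[name] *= log["ratio"]
    else:
        raise ValueError(f"unknown operation: {log['op']}")


store = {"user-1": {"a": 1, "b": 2, "c": 3}, "user-2": {"x": 10}}
semantic_log = {"op": "scale", "user": "user-1", "ratio": 2}
replay(semantic_log, store)
```

  • A single log entry is transferred here, yet three values are scaled on playback, while data belonging to other users is left untouched.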
  • the device 101 may also determine, according to the received data processing request, whether the operation on the target data is a legal operation. When the operation is determined to be legal, the device 101 executes the operation and stores the target data in the cache area 1 and the cache area 2; when the operation is determined to be illegal, the device 101 may refuse to perform the operation and refuse to store the target data. For example, when the operation on the target data exceeds the scope of the operation authority of the application 1011, the device 101 may determine that the operation is illegal, and therefore refuse to perform the operation and refuse to store the target data.
  • the operation on the target data is the operation of creating a file
  • the device 101 may also determine that the operation is illegal and refuse to execute the operation of creating the file.
  • the device 101 After the device 101 writes the target data into the local cache area 1 and the cache area 2 of the device 102, it can return a feedback result of successful operation to the application 1011 to indicate to the application 1011 that the target data has been successfully stored in the data storage system 100. In this way, for the application 1011, the response delay of the data storage system 100 for storing the target data is small, which meets the requirement of the application 1011 for low-latency response.
  • S304 The device 102 persistently stores the target data in the cache area 2 .
  • the address space in the cache area 2 is smaller than the address space used by the device 102 for persistently storing data; therefore, the device 102 can flush the data stored in the cache area 2 to the persistent storage area.
  • the device 102 can obtain the target data by playing back the WAL, and write it into the memory 2 .
  • the device 102 may periodically scan the cache area 2 corresponding to each device 101, and when there is data in the cache area 2, the device 102 may persistently store the data in the memory 2. Alternatively, when the amount of data stored in the cache area 2 reaches a preset threshold, the device 102 may persistently store the data in the cache area 2 into the memory 2; when the amount of data stored in the cache area 2 has not reached the preset threshold, the device 102 may skip persistent storage in the current scan cycle and persist the data later, once it determines that the amount of data in the cache area 2 has reached the preset threshold.
  • the device 101 can, after writing the target data into the cache area 2, notify the device 102 to persistently store the data in the cache area 2.
  • the device 101 may periodically notify the device 102 to persistently store the data in the cache area 2 .
  • the device 101 may not need to send a notification of persistently stored data to the device 102 in this period.
  • the data storage system 100 may also trigger the device 102 to persistently store the data in the cache area 2 in other ways, which is not limited in this embodiment.
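  • The threshold-based trigger among the options above can be sketched as a single scan cycle over the per-device cache areas. The function name, the use of an entry count as the "amount of data", and the list-based persistent store are assumptions made for illustration.

```python
def scan_and_persist(cache_areas, persistent, threshold):
    """One scan cycle: flush each cache area whose entry count has reached `threshold`."""
    flushed = []
    for device_id, entries in cache_areas.items():
        if len(entries) >= threshold:
            persistent.extend(entries)  # persistently store the cached data
            entries.clear()             # the cache area can now be reused
            flushed.append(device_id)
        # Below the threshold: skip this cycle and check again on the next scan.
    return flushed


cache_areas = {"client-a": ["w1", "w2", "w3"], "client-b": ["w4"]}
persistent = []
flushed = scan_and_persist(cache_areas, persistent, threshold=3)
```

  • In this run only the cache area of "client-a" is flushed; the single entry of "client-b" stays cached until a later scan finds the threshold reached.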
  • the device 101 and/or the device 102 may delete the target data in the cache area 1, so as to release the storage space occupied by the target data in the cache area 1.
  • the device 101 may record a first log serial number (LSN) corresponding to the target data when generating the WAL.
  • the LSN of the WAL is generally a monotonically increasing positive integer, that is, for multiple WALs generated sequentially, the LSNs corresponding to the multiple WALs increase gradually.
  • the device 102 may feed back the second LSN corresponding to the played back WAL to the device 101 .
  • the device 101 can determine, based on the received second LSN, that all WALs with LSNs smaller than the second LSN have completed playback, so that all data included in the WALs with LSNs smaller than the second LSN have been written into the memory 2 .
  • the device 101 can compare the first LSN with the second LSN. When the first LSN is equal to the second LSN, this indicates that the WAL including the target data has been played back and the target data has been persistently stored in the memory 2. At this time, the device 101 can eliminate the target data from the cache area 1, for example by eliminating the WAL including the target data, and release the storage space occupied by the target data in the cache area 1, so that the device 101 can store more other data in the cache area 1.
  • data written by the application 1011 multiple times may be stored in the cache area 1, and the device 101 may generate a WAL including the data each time the data written by the application 1011 is stored.
  • device 101 can eliminate one or more WALs whose LSN is less than or equal to the second LSN, and WALs whose LSN is greater than the second LSN can continue to be stored in cache area 1 without being eliminated.
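  • The LSN comparison described above can be sketched as follows, with the cache area 1 modeled as a mapping from LSN to WAL payload. The function name and the data layout are hypothetical; only the rule itself (eliminate WALs whose LSN is less than or equal to the second LSN, keep the rest) comes from the text.

```python
def evict_replayed(cache_area_1, second_lsn):
    """Drop every WAL whose LSN is <= the LSN reported as played back by the device 102."""
    evicted = [lsn for lsn in cache_area_1 if lsn <= second_lsn]
    for lsn in evicted:
        del cache_area_1[lsn]
    return evicted


# Cache area 1 modeled as LSN -> WAL payload.
cache_area_1 = {1: b"a", 2: b"b", 3: b"c", 4: b"d"}
evicted = evict_replayed(cache_area_1, second_lsn=2)
```

  • Because LSNs increase monotonically, a single reported LSN is enough to decide the fate of every cached WAL without tracking each one individually.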
  • the value in the key-value pair may correspond to the first LSN of the WAL; the first LSN may be recorded in the local cache by the device 101 when generating the WAL.
  • after the device 102 persistently stores the target data in the cache area 2 to the memory 2, it may feed back the second LSN corresponding to the played back WAL to the device 101.
  • device 101 can compare the size between the first LSN corresponding to the value in the key-value pair and the second LSN corresponding to the WAL that device 102 has played back.
  • when the first LSN is not greater than the second LSN, this indicates that the target data has been persistently stored in the memory 2 by the device 102, so that the device 101 can eliminate the key-value pair from the cache area 1 to release the storage space occupied by the key-value pair in the cache area 1.
  • when the first LSN is greater than the second LSN, this indicates that the target data has not been persistently stored, and the key-value pair can continue to be stored in the cache area 1.
  • the device 101 may also determine whether the target data has been persistently stored in other ways, for example by using a gradually increasing timestamp or serial number to determine whether to eliminate the target data from the cache area 1, which is not limited in this embodiment.
  • the device 102 may also delete the target data in the cache area 2 after the target data is persistently stored, so as to reclaim the storage space occupied by the target data in the cache area 2 .
  • the target data is stored in the cache area 2 in the form of WAL
  • after the device 102 completes the playback of the WAL and persistently stores the target data, it can eliminate the WALs whose LSN is smaller than the above-mentioned second LSN and release the storage space they occupy, so that the released storage space can be used to store other WALs newly written by the device 101.
  • the device 102 after the device 102 eliminates the WAL whose LSN is smaller than the above-mentioned second LSN, it can feed back the address information corresponding to the released storage space to the device 101, so that the subsequent device 101 can write the newly generated WAL into the released storage according to the address information. space.
  • The following takes the application 1011 reading the target data stored in the data storage system 100 as an example to illustrate the data reading process.
  • Referring to FIG. 4, a schematic flow diagram of a data reading method is shown. The method may specifically include:
  • S401 The device 101 receives a data read request sent by the application 1011, where the data read request includes an identifier of target data.
  • S402 The device 101 searches the local cache area 1 for the target data according to the identifier of the target data. If the target data is found, step S406 is executed; if it is not found, step S403 is executed.
  • S403 The device 101 requests the device 102 to feed back the target data.
  • the data storage system 100 will first cache the target data in the local cache area 1 of the device 101 . Therefore, the device 101 may first search from the local cache area 1 whether the target data is still stored in the local cache. In this way, if the device 101 finds the target data, the device 101 can directly feed back the target data in the cache area 1 to the application 1011 , thereby effectively reducing the time delay for the data storage system 100 to respond to the application 1011 .
  • the data stored in the local cache area 1 of the device 101 has a certain timeliness, that is, the data may be eliminated by the device 101 after being stored in the cache area 1 for a period of time. Therefore, when the device 101 does not find the target data in the cache area 1, this indicates that the target data has been persistently stored by the device 102, so the device 101 can request the target data from the device 102.
  • this embodiment provides the following exemplary implementation manners of searching for target data in cache area 1:
  • the target data is directly stored in the cache area 1, and the device 101 can search the target data from the cache area 1 according to the identifier of the target data by traversing the data in the cache area 1.
  • the target data is stored in the cache area 1 in the form of a WAL, and the device 101 can determine, by traversing each WAL in the cache area 1, whether there is a WAL containing the target data according to the identifier of the target data, so that the device 101 can obtain the target data by playing back the WAL.
  • the target data is stored in the cache area 1 in the form of a key-value pair, and the device 101 can use the identifier of the target data as a key to traverse the pre-built index and find the matching key-value pair; the value in that key-value pair is the target data being searched for.
  • the device 101 may also search for target data from the local cache area 1 in other ways, which is not limited in this embodiment.
  • S404 The device 102 searches for the target data from the persistent storage area.
  • the specific process of device 102 searching for target data from the persistent storage area may refer to the implementation manner of device 101 searching for target data in cache area 1, which will not be described in detail here in this embodiment.
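  • The read path described above (local cache area 1 first, then the device 102) can be sketched as follows. The function signature is hypothetical, the cache area 1 is modeled as a dictionary of key-value pairs, and the request to the device 102 is replaced by a lookup in a local dictionary standing in for its persistent storage area.

```python
def read(key, cache_area_1, fetch_from_device_102):
    """Return (data, source), preferring the local cache area 1 over the device 102."""
    if key in cache_area_1:
        return cache_area_1[key], "cache area 1"
    # Not cached locally: the data is assumed to have been persisted by the device 102.
    return fetch_from_device_102(key), "device 102"


cache_area_1 = {"file-a": b"cached"}
persistent_store = {"file-a": b"cached", "file-b": b"persisted"}

hit = read("file-a", cache_area_1, persistent_store.__getitem__)
miss = read("file-b", cache_area_1, persistent_store.__getitem__)
```

  • The second value in each result is only there to make the source of the data visible in the example; a real read path would return the data alone.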
  • the embodiment of the present invention also provides a computing device.
  • the computing device 500 may include a communication interface 510 and a processor 520 .
  • the computing device 500 may further include a memory 530 .
  • the memory 530 may be set inside the computing device 500 , and may also be set outside the computing device 500 .
  • each action performed by the first device in the embodiments shown in FIG. 3 and FIG. 4 may be implemented by the processor 520 .
  • the processor 520 may obtain the data processing request through the communication interface 510, and use it to implement any method executed in FIG. 3 and FIG. 4 .
  • each step of the processing flow can be completed through an integrated logic circuit of hardware in the processor 520 or through instructions in the form of software, so as to implement the methods executed in FIG. 3 and FIG. 4.
  • the program codes executed by the processor 520 to implement the above methods may be stored in the memory 530 .
  • the memory 530 is connected to the processor 520, such as a coupled connection.
  • Some features of the embodiments of the present invention may be implemented/supported by the processor 520 executing program instructions or software codes in the memory 530 .
  • the software components being loaded on the memory 530 can be summarized functionally or logically, for example, the data writing module 502, the eliminating module 503, and the searching module 504 shown in FIG. 5 .
  • the function of the communication module 501 can be realized by the communication interface 510 .
  • Any communication interface involved in the embodiments of the present invention may be a circuit, a bus, a transceiver or any other device that can be used for information exchange.
  • for example, for the communication interface 510 in the computing device 500, the other device may be a device connected to the computing device 500.
  • the processor involved in the embodiment of the present invention may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or Execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the methods disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
  • the coupling in the embodiments of the present invention is an indirect coupling or communication connection between devices, units, or modules, which may be in electrical, mechanical, or other forms, and is used for information exchange between the devices, units, or modules.
  • the processor may operate in conjunction with the memory.
  • the memory may be a nonvolatile memory, such as a hard disk or a solid state disk, or a volatile memory, such as a random access memory.
  • a memory is, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the embodiment of the present invention does not limit the specific connection medium among the communication interface, the processor, and the memory.
  • the memory, the processor, and the communication interface can be connected through a bus.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • an embodiment of the present invention also provides a storage device.
  • the storage device is used as the first device in the data storage system, and the data storage system further includes a second device.
  • the data storage system can be, for example, the system shown in FIG. 1 or FIG. 2.
  • the storage device includes a first cache area (such as the cache area 1 in the above embodiment) and a processor, the first cache area is used for caching data, and the processor is used to execute the following method by running a computer program:
  • the processor executes the method performed by the device 101 provided in any one or more of the foregoing embodiments through a computer program.
  • an embodiment of the present invention also provides a computer storage medium in which a software program is stored. When the software program is read and executed by one or more processors, the method executed by the data storage system 100 provided in any one or more of the above embodiments can be implemented.
  • the computer storage medium may include: various media capable of storing program codes such as U disk, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk.
  • an embodiment of the present invention also provides a chip, which includes a processor configured to implement the functions of the data storage system 100 involved in the above embodiments, for example, to implement the methods executed in FIG. 3 and FIG. 4.
  • the chip further includes a memory for storing the program instructions and data necessary for the processor.
  • the chip may consist of chips, or may include chips and other discrete devices.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means implement the function specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Abstract

Provided is a data storage system comprising a first device and a second device, the first device comprising a first cache area and the second device comprising a second cache area. The first device is used for receiving a data processing request, writing target data in the data processing request into the second cache area on the basis of RDMA, and writing the target data into the first cache area; after the target data is successfully written into the first cache area and the second cache area, the data processing request is completed in the data storage system. The second device is used for persistent storage of the target data written into the second cache area. In this way, the response time delay of storing the target data by the data storage system is relatively low, and the target data can be prevented from being lost by means of the double cache, such that the data storage reliability of the data storage system can be improved. In addition, the present application further discloses a corresponding data storage method and apparatus, and a related device.

Description

A data storage system, data storage method, apparatus, and related device
Technical field
The present application relates to the technical field of data storage, and in particular to a data storage system, a data storage method, an apparatus, and a related device.
Background
A data storage system is a system that organizes and allocates the storage space of storage devices and is responsible for storing and retrieving data. In some practical application scenarios, the data storage system can provide data read and write services in a multi-device mode. Specifically, one of the devices in the data storage system may provide a data access interface externally and receive new data to be stored through that interface. Normally, after the device caches the new data locally, it feeds back to the data sender an acknowledgement (ack) message indicating that the data was written successfully, and later writes the new data in the local cache to other devices in the data storage system for persistent storage, thereby reducing the response delay of the data storage system when storing data.
However, when the device that provides the data access interface fails, or the cache of that device fails, the data stored in the local cache is lost because it has not yet been persistently stored, which reduces the reliability of the data storage system.
Summary
A data storage system, a data storage method, an apparatus, a computing device, a storage medium, and a computer program product are provided, so that the data storage system has a relatively small response delay while also storing data with relatively high reliability.
In a first aspect, an embodiment of the present invention provides a data storage system that includes a first device and a second device, where the first device includes a first cache area and the second device includes a second cache area. The first device is configured to receive a data processing request. The data processing request may, for example, request writing of target data, and may be sent to the first device by another device or by an application on the first device. The first device can write the target data in the data processing request into the second cache area based on RDMA and write the target data into the local first cache area. The second device is configured to persistently store the target data written into the second cache area.
After the first device writes the target data into the local first cache area and also writes it into the second cache area of the second device through RDMA technology, processing of the data processing request is complete. Because RDMA does not require CPU participation, and because this solution does not require the second device to persistently store the target data before the data processing request is considered processed, the response delay for the data storage system to store the target data is low, close to the delay of storing the data only in the cache area of the first device. At the same time, the target data is stored in the cache areas of both the first device and the second device, so that when the cache of one device fails, the data storage system can still avoid losing the target data by using the cache of the other device, which improves the reliability of data storage in the data storage system. The first aspect therefore provides a data storage system that combines low latency with improved data reliability.
In a possible implementation, the first device is further configured to generate a response message indicating that the data processing request succeeded, after the target data has been written into the first cache area and the second cache area and before the second device persistently stores the target data in the second cache area. In this way, once the target data has been successfully written into both cache areas, the storage system regards the data processing request as completed and can generate the response message, which is used to inform the application or other device that sent the data processing request that the target data has been stored. This gives the data storage system a low response delay for storing the target data, and the data storage system can persistently store the target data after feeding back the response message.
在一种可能的实施方式中,第一设备还用于在确定目标数据被持久化存储后,淘汰第一缓存区域中的目标数据。如此,可以在保证目标数据被持久化存储的情况下,释放该目标数据在第一缓存区域中所占用的存储空间,以便后续第一设备可以利用被释放的存储空间存储其它新写入的数据,实现第一缓存区域的存储资源的循环利用。In a possible implementation manner, the first device is further configured to, after determining that the target data is persistently stored, eliminate the target data in the first cache area. In this way, the storage space occupied by the target data in the first cache area can be released under the condition that the target data is guaranteed to be persistently stored, so that the first device can use the released storage space to store other newly written data later. , realizing recycling of storage resources in the first cache area.
In a possible implementation, the second device is further configured to evict the target data from the second cache area after determining that the target data has been persistently stored. In this way, the storage space that the target data occupied in the second cache area is released, so that other data from the first device can subsequently be cached in the released space, and the storage resources of the second cache area are reused.
In a possible implementation, when writing the target data into the second cache area, the first device is specifically configured to generate a WAL that includes the target data and then write the WAL into the second cache area based on RDMA. In this way, the second device can subsequently replay the log to persistently store the target data contained in the WAL.
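By way of example only, generating a WAL record that carries the target data and later replaying it can be sketched as follows. The record layout (JSON with `lsn`, `op`, `id`, and `data` fields), the function names, and the use of a list for the second cache area are assumptions made for illustration; the text does not specify a log format.

```python
import json

def make_wal_record(lsn, data_id, data, op="write"):
    # Hypothetical record layout: the text only requires that the WAL
    # include the target data.
    return json.dumps({"lsn": lsn, "op": op, "id": data_id, "data": data}).encode()

def replay(wal_records, persistent_store):
    # Apply records in order and report the LSN of the last record replayed.
    last_lsn = 0
    for raw in wal_records:
        record = json.loads(raw.decode())
        persistent_store[record["id"]] = record["data"]
        last_lsn = record["lsn"]
    return last_lsn

# The first device generates the records; appending to this list stands in
# for the RDMA write into the second cache area.
cache_area_2 = [
    make_wal_record(1, "file-1", "v1"),
    make_wal_record(2, "file-1", "v2"),
]

# The second device later replays the log into persistent storage.
persistent_store = {}
replayed_lsn = replay(cache_area_2, persistent_store)
```

Replaying in order means a later record for the same identifier overwrites the earlier one, so the persistent store ends with the most recent value.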
In a possible implementation, the first device is specifically configured to write the WAL that includes the target data into the first cache area. In this way, when the first device subsequently reads the target data from the local first cache area, it can obtain the target data from the WAL by replaying the log. Optionally, the first device may instead write the target data directly into the first cache area.
In a possible implementation, when writing the target data into the first cache area, the first device is specifically configured to generate a key-value pair in which the key is an identifier of the target data and the value is the target data, and then write the key-value pair into the first cache area. In this way, the first device can build an index in the first cache area based on the target data, so that when the target data needs to be read, the first device can quickly find it in the local first cache area by traversing the index.
In a possible implementation, when the target data is written into the first cache area as a key-value pair, the value in the key-value pair further corresponds to a first LSN of the WAL. The first device is then further configured to obtain a second LSN corresponding to the WAL that the second device has already replayed, and, when the first LSN is equal to the second LSN, the first device may evict the key-value pair from the first cache area. In this way, the first device can use the LSN corresponding to the WAL to identify which target data in the first cache area has been persistently stored, and can release the storage space occupied by that target data so that the first device can continue to write other data, reusing the storage resources of the first cache area. Data that the LSN shows has not yet been persisted can remain in the first cache area, which improves the reliability with which that data is stored in the data storage system.
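The LSN comparison described above can be illustrated with a short sketch. It assumes, purely for illustration, that the first cache area is a dictionary mapping each key to a `(value, first_lsn)` tuple and that the replayed (second) LSN has already been obtained from the second device; the function name is hypothetical. The comparison uses equality, as stated in the text.

```python
def evict_persisted(cache_area_1, replayed_lsn):
    # Collect keys whose first LSN equals the LSN the second device has replayed.
    evicted = [key for key, (_, lsn) in cache_area_1.items() if lsn == replayed_lsn]
    for key in evicted:
        del cache_area_1[key]   # release the space held by persisted data
    return evicted

# Hypothetical cache contents: key -> (target data, first LSN of its WAL).
cache_area_1 = {"a": ("data-a", 7), "b": ("data-b", 8)}
evicted = evict_persisted(cache_area_1, replayed_lsn=7)
```

Here only the entry whose LSN matches the replayed LSN is removed; the entry with LSN 8 has not yet been confirmed as persisted and stays in the cache.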
In a possible implementation, the first device is further configured to receive a data read request that includes the identifier of the target data, so that the first device can use the identifier to check whether the first cache area contains the target data and, when it does, return the target data that was found. In this way, the first device can quickly find the target data to be read in its local cache area without having to access the second device remotely, which effectively improves the efficiency with which the data storage system returns the target data.
In a possible implementation, the first device is further configured to request the target data from the second device when the first cache area does not contain the target data, for example because the target data has already been persistently stored and was evicted from the first cache area, so that the first device can then return the target data.
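As a minimal sketch of the read path described in the two preceding implementations, the lookup below checks the local cache first and falls back to the second device only on a miss. The dictionary cache, the `fetch_from_second_device` callable, and the returned source label are illustrative assumptions rather than part of the described system.

```python
def read(data_id, cache_area_1, fetch_from_second_device):
    # Look in the local first cache area first.
    if data_id in cache_area_1:
        return cache_area_1[data_id], "local"
    # Miss (for example, the entry was evicted after persistence):
    # request the data from the second device instead.
    return fetch_from_second_device(data_id), "remote"

# Hypothetical state: "x" is still cached locally, "y" lives only remotely.
cache_area_1 = {"x": b"cached"}
second_device_data = {"x": b"cached", "y": b"persisted"}

hit = read("x", cache_area_1, second_device_data.__getitem__)
miss = read("y", cache_area_1, second_device_data.__getitem__)
```

The second element of each result only records which path served the read, which makes the fallback visible in the example.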
In a possible implementation, the first device is specifically configured to write the target data into the second cache area based on one-sided RDMA. This further reduces the latency with which the first device writes the target data to the second device and thus improves the responsiveness of the data storage system when storing the target data. Optionally, the first device may instead write the target data into the second cache area through two-sided RDMA.
In a possible implementation, the data storage system may specifically be a distributed file system, in which the first device implements a client of the distributed file system, the second device implements a server of the distributed file system, and the second device persistently stores the target data in a file format.
In a second aspect, an embodiment of the present invention provides a data storage method applied to a data storage system. The data storage system includes a first device and a second device, the first device includes a first cache area, and the second device includes a second cache area. The method includes: the first device receives a data processing request that includes target data; and the first device writes the target data into the second cache area based on RDMA and writes the target data into the first cache area.
In a possible implementation, the method further includes: after the target data is written into the first cache area and into the second cache area, and before the second device persistently stores the target data in the second cache area, the first device generates a response message indicating that the data processing request succeeded.
In a possible implementation, the method further includes: after determining that the target data has been persistently stored, the first device evicts the target data from the first cache area.
In a possible implementation, the first device writing the target data into the second cache area based on remote direct memory access (RDMA) includes: the first device generates a WAL that includes the target data; and the first device writes the WAL into the second cache area based on RDMA.
In a possible implementation, the first device writing the target data into the first cache area includes: the first device writes the WAL into the first cache area.
In a possible implementation, the first device writing the target data into the first cache area includes: the first device generates a key-value pair in which the key is an identifier of the target data and the value is the target data; and the first device writes the key-value pair into the first cache area.
In a possible implementation, the value in the key-value pair further corresponds to a first log sequence number (LSN) of the WAL, and the method further includes: the first device obtains a second LSN corresponding to the WAL that the second device has already replayed; and when the first LSN is equal to the second LSN, the first device evicts the key-value pair from the first cache area.
In a possible implementation, the method further includes: the first device receives a data read request that includes the identifier of the target data; the first device checks, according to the identifier of the target data, whether the first cache area contains the target data; and when the first cache area contains the target data, the first device returns the target data.
In a possible implementation, the method further includes: when the first cache area does not contain the target data, the first device requests the target data from the second device.
In a possible implementation, the first device writing the target data into the second cache area based on remote direct memory access (RDMA) includes: the first device writes the target data into the second cache area based on one-sided RDMA.
In a possible implementation, the data storage system includes a distributed file system, the first device implements a client of the distributed file system, the second device implements a server of the distributed file system, and the second device persistently stores the target data in a file format.
Because the data storage method provided in the second aspect corresponds to the data storage system provided in the first aspect, for the technical effects of the second aspect and its implementations, refer to the technical effects of the first aspect and the corresponding implementations of the first aspect. Details are not repeated here.
In a third aspect, an embodiment of the present invention provides a storage device that serves as the first device in a data storage system, where the data storage system further includes a second device. The storage device includes a first cache area and a processor. The first cache area is configured to cache data. The processor is configured to run a computer program to perform the following method: receiving a data processing request that includes target data; writing the target data into the second cache area based on remote direct memory access (RDMA), so that the target data in the second cache area is persistently stored; and writing the target data into the first cache area.
In a possible implementation, the processor is further configured to: after the target data is written into the first cache area and into the second cache area, and before the target data in the second cache area is persistently stored, generate a response message indicating that the data processing request succeeded.
In a possible implementation, the processor is configured to: after determining that the target data has been persistently stored, evict the target data from the first cache area.
In a possible implementation, the processor is configured to: generate a write-ahead log (WAL) that includes the target data; and write the WAL into the second cache area based on RDMA.
In a possible implementation, the processor is configured to: write the WAL into the first cache area; or generate a key-value pair in which the key is an identifier of the target data and the value is the target data, and write the key-value pair into the first cache area.
In a possible implementation, the value in the key-value pair further corresponds to a first log sequence number (LSN) of the WAL, and the processor is further configured to: obtain a second LSN corresponding to the WAL that the second device has already replayed; and when the first LSN is equal to the second LSN, evict the key-value pair from the first cache area.
In a possible implementation, the processor is further configured to: receive a data read request that includes the identifier of the target data; according to the identifier of the target data, first check whether the first cache area contains the target data; when the first cache area contains the target data, return the target data; and when the first cache area does not contain the target data, request the target data from the second device.
In a possible implementation, the processor is configured to: write the target data into the second cache area based on one-sided RDMA.
In a possible implementation, the data storage system includes a distributed file system, the first device implements a client of the distributed file system, the second device implements a server of the distributed file system, and the second device persistently stores the target data in a file format.
Because the storage device provided in the third aspect corresponds to the first device in the data storage system provided in the first aspect, for the technical effects of the third aspect and its implementations, refer to the technical effects of the first aspect and the corresponding implementations of the first aspect. Details are not repeated here.
In a fourth aspect, an embodiment of the present invention provides a computing device that includes a processor and a memory. The memory is configured to store instructions. When the computing device runs, the processor executes the instructions stored in the memory, so that the computing device performs the data storage method performed by the first device in the second aspect or in any implementation of the second aspect. It should be noted that the memory may be integrated in the processor or may be independent of the processor. The computing device may further include a bus, through which the processor is connected to the memory. The memory may include a readable memory and a random access memory.
In a fifth aspect, an embodiment of the present invention further provides a readable storage medium that stores a program or instructions. When the program or instructions are run on a computer, the data storage method performed by the first device in the second aspect or in any implementation of the second aspect is performed.
In a sixth aspect, an embodiment of the present invention further provides a computer program product containing instructions. When the computer program product is run on a computer, the computer performs the data storage method performed by the first device in the second aspect or in any implementation of the second aspect.
In addition, for the technical effects of any implementation of the second aspect to the sixth aspect, refer to the technical effects of the corresponding implementations of the first aspect. Details are not repeated here.
Description of Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some of the embodiments recorded in this application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a schematic structural diagram of an exemplary data storage system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an exemplary distributed file system according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a data storage method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a data reading method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data storage apparatus according to an embodiment of the present invention.
Detailed Description
FIG. 1 is a schematic structural diagram of an exemplary data storage system. As shown in FIG. 1, the data storage system 100 may include a device 101 and a device 102, and the device 101 and the device 102 may exchange data, for example by communicating through the Hypertext Transfer Protocol (HTTP).
The device 101 may provide a file access interface, such as a portable operating system interface (POSIX), so that a device 103 connected to the data storage system 100, or an application 1011 in the device 101, can delete, read, write, and modify files in the data storage system 100 through the file access interface. In addition, the device 101 may include a cache area 1 and may cache data in the cache area 1.
The device 102 includes a cache area 2 and may cache data in the cache area 2. Further, the device 102 may also include a persistent storage area, so that the device 102 can write the data in the cache area 2 into the persistent storage area for persistent storage. For example, the cache area 2 may be implemented by a non-volatile memory (NVM) such as a read-only memory (ROM), a flash memory, or a storage class memory (SCM), or the cache area 2 may be implemented by a volatile memory such as a random-access memory (RAM); this embodiment imposes no limitation in this respect. The persistent storage area may be implemented by a hard disk drive (HDD) or another storage medium that can retain data for a long time without power, where the hard disk may be a magnetic disk, a solid-state drive (SSD), an SCM disk, or the like.
When the device 103 or the application 1011 writes new data (hereinafter referred to as target data) to the data storage system 100, if the device 101 writes the target data only into the local cache area 1, the response latency of the data storage system 100 for storing the target data is reduced. However, if the device 101 or the cache area 1 fails, the target data sent by the device 103 or the application 1011 may be lost, which lowers the reliability with which the data storage system 100 stores data.
On this basis, an embodiment of the present invention provides a data storage method that aims to give the data storage system 100 a low response latency while improving the reliability of data storage. In a specific implementation, for the target data sent by the device 103 or the application 1011, the device 101 not only writes it into the local cache area 1 but also writes it into the cache area 2 of the device 102 based on RDMA, so that the target data is held in the caches of both the device 101 and the device 102; the device 102 can subsequently store the target data in the cache area 2 persistently. After the device 101 writes the target data into the local cache area 1 and writes it into the cache area 2 of the device 102 through RDMA, processing of the data processing request is complete. Because RDMA does not require CPU involvement, and because the data processing request can be treated as complete without waiting for the device 102 to persistently store the target data, the response latency of the data storage system 100 for storing the target data is low and can approach the latency of storing the data only in the cache area 1 of the device 101. In addition, the target data is stored in the cache areas of both the device 101 and the device 102, so that if the cache on the device 101 side fails, the data storage system 100 can still avoid losing the target data by using the cache on the device 102 side, which improves the reliability with which the data storage system 100 stores data. The data storage system 100 therefore has a low response latency and, at the same time, high data storage reliability.
It should be noted that the system architecture shown in FIG. 1 is only an example and does not limit the specific implementation to that example. For instance, in other possible system architectures, the persistent storage area may be deployed outside the data storage system 100; that is, the device 102 may send the target data in the cache area 2 to a remote persistent storage area for persistent storage. This embodiment imposes no limitation in this respect.
For example, the data storage system shown in FIG. 1 may specifically be the distributed file system 200 shown in FIG. 2. As shown in FIG. 2, the distributed file system 200 includes a client 201 and a server 202. The client 201 may be an application program that the distributed file system 200 exposes externally and may be implemented by the device 101 in FIG. 1; the device 101 on which the client 201 runs also includes the cache area 1. Further, the device 101 on which the client runs may also run the application 1011 shown in FIG. 1, and the application 1011 may create, delete, read, write, and modify files in the distributed file system 200 through the file access interface provided by the client 201.
The server 202 may be implemented by one or more devices 102 in FIG. 1. Memory may be integrated on the one or more devices 102 to provide the server 202 with the cache area 2, which can be used to cache data at high speed; further, the memory integrated on the device 102 may also provide the server 202 with a persistent storage area for persistently storing data. Specifically, the device 102 may include a plurality of memories 1, which may form a cache pool, and may also include a plurality of memories 2, which may form a persistent storage pool. For example, the memory 1 may be a non-volatile memory (NVM) such as a read-only memory (ROM), a flash memory, or a storage class memory (SCM), or the memory 1 may be a volatile memory such as a random-access memory (RAM). The memory 2 may be a device usable for persistent data storage, such as a hard disk drive (HDD) or a solid-state drive (SSD). This embodiment does not limit the specific implementation of the memory 1 or the memory 2.
In practice, the data storage system shown in FIG. 1 may also be used in application scenarios other than that of FIG. 2; this embodiment imposes no limitation in this respect. For example, the system structure shown in FIG. 1 is applicable to centralized storage scenarios and to distributed storage scenarios.
In a centralized storage scenario, one or more devices 102 may form a central node. Data is stored centrally in the central node, and all data processing services of the entire data storage system 100 are deployed on the central node. In this case, the device 102 and the memory 1 that implements the cache area 2 may use a disk-controller separated architecture, in which the device 102 and the memory 1 are deployed independently. Alternatively, they may use an integrated disk-controller architecture, in which the device 102 has slots, and the memory 1 is placed into the device 102 through a slot and deployed as an integral part of the device 102.
In a distributed storage scenario, the data in the data storage system 100 may be distributed across multiple independent storage nodes. In this case, the device 102 may be deployed integrally with the memory 1, so that the device 102 has both computing capability and storage capability; a virtual machine may or may not be created on the device 102. Alternatively, the device 102 and the memory 1 that implements the cache area 2 may use a storage-compute separated architecture, in which the device 102 and the memory 1 are deployed independently and communicate over a network. In addition, the memory 1 may include one or more different storage media; this embodiment imposes no limitation in this respect.
To make the foregoing objectives, features, and advantages of the present invention clearer and easier to understand, various non-limiting implementations of the embodiments of the present invention are described below by way of example with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained on the basis of the embodiments of the present invention and the foregoing content fall within the protection scope of the present invention.
FIG. 3 is a schematic flowchart of a data storage method according to an embodiment of the present invention. The method may be applied to the data storage system 100 shown in FIG. 1 and, in practice, may also be applied to other suitable data storage systems. For ease of understanding and description, the following uses the data storage system 100 shown in FIG. 1 as an example. The method may specifically include the following steps:
S301: The device 101 receives a data processing request, where the data processing request includes target data.
When the application 1011 or the device 103 writes target data into the data storage system 100, or modifies data already stored in the data storage system 100 into the target data, it may generate a data processing request that includes the target data and send the request to the device 101 through a data access interface (such as POSIX). The device 101 may then parse the received data processing request to obtain the target data to be stored by the data storage system 100. In practice, the data processing request may also carry operation indication information for the target data, such as a write operation or a modification operation, so that the device 101 can obtain an operation record for the target data from the data processing request.
S302: The device 101 writes the target data into the cache area 1 in the device 101.
S303: The device 101 writes the target data in the data processing request into the cache area 2 in the device 102 based on RDMA.
The device 101 may perform step S302 first and then step S303; or the device 101 may perform step S302 and step S303 at the same time; or the device 101 may perform step S303 first and then step S302. This embodiment does not limit the order.
After receiving the target data, the device 101 may write the target data into the local cache area 1 and into the cache area 2 in the device 102. When writing the target data into the cache area 2, the device 101 may write the data directly by means of remote direct memory access (RDMA), that is, the target data is written over the network directly into the cache area 2 without intervention by the operating system of the device 102. This reduces the impact of the data write on the performance of the operating system of the device 102, and it also shortens the time needed to write the data into the cache area 2.
Because the device 101 (the client) needs little time to write the target data into the cache area 1 and the cache area 2, the response latency of the data storage system 100 for caching the target data is small. Note that once the device 101 has successfully written the target data into the cache area 1 and the cache area 2, the device 101 may report to the application 1011 or the device 103 that the data processing request has been processed, that is, that storage of the target data is complete. Specifically, the device 101 may generate a response message indicating that the data processing request succeeded and send the response message to the application 1011 or the device 103. The data storage system 100 can therefore notify the application 1011 or the device 103 that the target data has been stored before the target data is actually persisted. This effectively improves the response efficiency of the data storage system 100 toward the application 1011 or the device 103 and reduces the response latency. In practice, the data storage system 100 may complete the persistent storage of the target data at a later time. In a further possible implementation, the device 101 may write the target data into the cache area 2 based on one-sided RDMA, which further reduces the latency of writing data into the cache area 2 and thus the overall latency with which the data storage system 100 responds to the data processing request.
Moreover, the target data is stored in the caches of both the device 101 and the device 102. When the cache of either one fails, the data storage system 100 can use the target data in the cache of the other to avoid data loss, which improves the reliability with which the data storage system 100 stores data.
In an actual application scenario, the device 102 may serve data reads and writes from multiple devices 101; for example, the device 102 may support multiple clients 201 reading and writing data. The device 102 may therefore determine in advance a corresponding cache area 2 for each of the different devices 101, such that the cache areas 2 of different devices 101 do not overlap. This avoids problems such as data being overwritten when different devices 101 write into the cache of the device 102. As an implementation example, before sending target data to the device 102, the device 101 may request cache space from the device 102 in advance, for example by sending the device 102 a cache allocation request that includes the identifier of the device 101. The device 102 then allocates a corresponding cache area 2 to the device 101 and returns the address space of the cache area 2 to the device 101. The device 101 can subsequently write the target data directly into the cache area 2 in the device 102 based on this address space.
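The allocation exchange described above can be sketched as follows. This is only an illustrative model, assuming a simple bump allocator over a fixed-size cache; the class name `RegionAllocator` and its fields are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field


@dataclass
class RegionAllocator:
    """Hypothetical model of device 102 handing out disjoint ranges of its cache."""
    capacity: int
    next_free: int = 0
    regions: dict = field(default_factory=dict)  # device id -> (offset, size)

    def allocate(self, device_id: str, size: int) -> tuple:
        """Return the (offset, size) address range reserved for device_id."""
        if device_id in self.regions:
            # A repeated request from the same device gets the same region back.
            return self.regions[device_id]
        if self.next_free + size > self.capacity:
            raise MemoryError("no cache space left for a new cache area 2")
        region = (self.next_free, size)
        self.regions[device_id] = region
        self.next_free += size  # later regions start after this one, so none overlap
        return region
```

Because each new region starts where the previous one ends, two devices 101 can never be given overlapping address ranges in this sketch.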
This embodiment provides the following six ways of caching the target data in the device 101 and the device 102:
In a first possible implementation, before writing the target data into the local cache area 1, the device 101 may first write the target data and the operation indication information for the target data into a write-ahead log (write-ahead logging, WAL). The device 101 may then write the WAL including the target data into the local cache area 1 and, using two-sided RDMA or one-sided RDMA, write the WAL including the target data directly into the cache area 2 in the device 102.
In a second possible implementation, the device 101 writes the WAL including the target data into the cache area 1, and writes the target data itself directly into the cache area 2 through RDMA (either one-sided RDMA or two-sided RDMA).
In a third possible implementation, the device 101 may write the target data directly into the local cache area 1, and write the WAL including the target data into the cache area 2 through RDMA.
In a fourth possible implementation, the device 101 writes the target data directly into the cache area 1, and writes the target data directly into the cache area 2 through RDMA.
In a fifth possible implementation, the device 101 writes the target data directly into the cache area 2 through RDMA and generates a key-value pair from the target data, where the key is the identifier of the target data and the value is the target data. The device 101 may use the key-value pair as a data query index and write it into the local cache area 1, so that the target data can later be looked up through the key-value pair. In practice, for multiple pieces of data sent by the application 1011, the device 101 may generate a key-value pair for each piece of data, and may build and save a corresponding index from the multiple key-value pairs, for example an index with a tree structure. When the device 101 saves a key-value pair in the local cache area 1, it may add the key-value pair to the index to update the index, so that the device 101 can later find the data that the application 1011 needs to read by traversing the index.
In a sixth possible implementation, the device 101 writes the WAL including the target data into the cache area 2 through RDMA, and writes the key-value pair generated from the target data into the cache area 1.
Note that the above ways of caching the target data in the device 101 and the device 102 are only examples. In practice, the device 101 may also cache the target data in the device 101 and the device 102 in other ways, which is not limited in this embodiment.
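As an illustration of the sixth implementation listed above (key-value pair in the cache area 1, WAL in the cache area 2), a minimal sketch might look like the following. It does not use real RDMA: the class `RemoteRegion` is a hypothetical in-process stand-in for the cache area 2, and the JSON record layout, the length prefix and all names are assumptions made for the example.

```python
import json
import struct


class RemoteRegion:
    """Hypothetical stand-in for cache area 2; a real system would write via RDMA."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.tail = 0  # next free offset in the region

    def write(self, payload: bytes) -> int:
        """Append a length-prefixed record and return the offset it was written at."""
        framed = struct.pack("<I", len(payload)) + payload
        if self.tail + len(framed) > len(self.buf):
            raise MemoryError("cache area 2 region is full")
        offset = self.tail
        self.buf[offset:offset + len(framed)] = framed
        self.tail += len(framed)
        return offset

    def records(self):
        """Decode every record in the region, as device 102 might before replay."""
        pos = 0
        while pos < self.tail:
            (length,) = struct.unpack_from("<I", self.buf, pos)
            pos += 4
            yield json.loads(bytes(self.buf[pos:pos + length]))
            pos += length


class Device101:
    def __init__(self, remote: RemoteRegion):
        self.cache_area_1 = {}  # key -> (lsn, value)
        self.remote = remote
        self.next_lsn = 1

    def put(self, key: str, value: str) -> int:
        lsn = self.next_lsn
        self.next_lsn += 1
        wal = json.dumps({"lsn": lsn, "op": "write", "key": key, "value": value})
        self.cache_area_1[key] = (lsn, value)  # key-value pair kept locally
        self.remote.write(wal.encode())        # WAL record placed in cache area 2
        return lsn  # the request could be acknowledged here, before persistence
```

In this sketch the request is considered complete as soon as both writes return, which mirrors the point made earlier that the response can be sent before the data is persisted.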
In general, the device 101 may generate multiple WALs within a period of time, for example when the application 1011 sends multiple pieces of data in succession during that period, or when multiple processes or threads of the application 1011 send multiple pieces of data in parallel. The device 101 may write the multiple WALs into the cache area 2 one by one, that is, each time a WAL is generated, the device 101 writes it into the cache area 2 in the device 102. Alternatively, the device 101 may write multiple WALs into the cache area 2 in batches, for example 10 WALs at a time, which reduces the number of communications between the device 101 and the device 102 needed to write the WALs into the cache area 2. When writing in batches, the device 101 may periodically send multiple WALs to the cache area 2 of the device 102 as a batch, for example all WALs generated in sequence within one second. Or, when multiple processes or threads of the application 1011 send multiple pieces of data in parallel, the device 101 may generate multiple WALs in parallel and write the WALs generated in parallel to the device 102 as a batch.
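The effect of batching on the number of transfers can be sketched as below. The function name `send_in_batches` and the `send` callback are hypothetical; `send` stands for whatever transfer mechanism (for example an RDMA write) is actually used.

```python
def send_in_batches(wals: list, batch_size: int, send) -> int:
    """Send WAL records in groups of batch_size and return the number of sends."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    sends = 0
    for start in range(0, len(wals), batch_size):
        send(wals[start:start + batch_size])  # one transfer per batch
        sends += 1
    return sends
```

With a batch size of 1 this degenerates to the one-by-one case; larger batch sizes trade a little added delay per record for fewer transfers.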
As an implementation example, the device 101 may encapsulate multiple operations performed inside the data storage system 100 into a single semantic, which further reduces the number of WALs that the device 101 sends to the device 102. Specifically, suppose an operation instruction issued by the application 1011 indicates that all data belonging to a user is to be scaled numerically by a preset ratio. After receiving the operation instruction, the device 101 may scale each of the user's pieces of data, that is, perform multiple scaling operations. The device 101 may then generate a single semantic-level log for the multiple scaling operations performed on the multiple pieces of data and write this semantic-level log into the cache area 2 of the device 102. Compared with generating a separate log for each scaling operation, this effectively reduces the number of logs that the device 101 generates and therefore the number of logs that it sends to the device 102. Correspondingly, when replaying the log, the device 102 can replay the multiple operations according to the semantic recorded in the log.
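To make the idea of a semantic-level log concrete, the following sketch replays one record that stands for many per-item scaling operations. The record fields (`op`, `user`, `factor`) and the layout of `store` are assumptions chosen for the example, not a format defined by the source.

```python
def replay_semantic_log(record: dict, store: dict) -> int:
    """Apply one semantic-level record to store and return how many items changed.

    store maps an item key to a (owner, numeric value) pair.
    """
    if record["op"] != "scale_user_data":
        raise ValueError(f"unsupported semantic operation: {record['op']}")
    changed = 0
    for key, (owner, value) in list(store.items()):
        if owner == record["user"]:
            store[key] = (owner, value * record["factor"])  # one replayed operation
            changed += 1
    return changed
```

A single record such as `{"op": "scale_user_data", "user": "u1", "factor": 2}` is enough for the replaying side to reproduce every individual scaling operation for that user.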
Further, before caching the target data, the device 101 may determine from the received data processing request whether the operation on the target data is legal. Only when the operation is determined to be legal does the device 101 perform the operation and store the target data in the cache area 1 and the cache area 2; when the operation is determined to be illegal, the device 101 may refuse to perform the operation and refuse to store the target data. For example, when the operation on the target data exceeds the operation permissions of the application 1011, the device 101 may determine that the operation is illegal and refuse to perform the operation and to store the target data. As another example, when the operation on the target data is a file creation operation, if the device 101 determines that the name of the file to be created from the target data is the same as the name of another file, the device 101 may also determine that the operation is illegal and refuse to create the file.
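The two example checks above (permission scope and duplicate file name) could be expressed as a small predicate. The function name, the operation strings and the parameters are hypothetical and only illustrate the decision, not an actual interface of the system.

```python
def is_legal_operation(op: str, target_name: str,
                       allowed_ops: set, existing_files: set) -> bool:
    """Return True if the requested operation may be executed and cached."""
    if op not in allowed_ops:
        return False  # outside the permissions granted to the application
    if op == "create" and target_name in existing_files:
        return False  # a file with the same name already exists
    return True
```

Only requests for which this check returns `True` would go on to be written into the cache area 1 and the cache area 2.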
In practice, after writing the target data into the local cache area 1 and the cache area 2 of the device 102, the device 101 may return a success result to the application 1011 to indicate that the target data has been successfully stored in the data storage system 100. For the application 1011, the response latency of the data storage system 100 for storing the target data is therefore small, which meets the low-latency response requirement of the application 1011.
S304: The device 102 persistently stores the target data in the cache area 2.
In general, the address space of the cache area 2 is smaller than the address space that the device 102 uses for persistent data storage, so the device 102 may flush the data stored in the cache area 2 to the persistent storage area. When the data stored in the cache area 2 is a WAL that includes the target data, the device 102 may obtain the target data by replaying the WAL and write it into the memory 2.
In one possible implementation, the device 102 may periodically scan the cache area 2 corresponding to each device 101 and, when a cache area 2 contains data, persist that data to the memory 2. Alternatively, the device 102 may persist the data in a cache area 2 to the memory 2 only when the amount of data stored in that cache area 2 reaches a preset threshold. When the amount of data has not reached the preset threshold, the device 102 may skip persisting it in the current scan period and persist it later, once the amount of data in the cache area 2 is determined to have reached the preset threshold.
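Both scan policies described above can be captured by one selection function, sketched below under the assumption that device 102 tracks the number of bytes used in each cache area 2. The function name and the `used_bytes` mapping are hypothetical.

```python
def select_regions_to_flush(used_bytes: dict, threshold: int = 1) -> list:
    """Return the ids of the devices whose cache area 2 should be persisted now.

    With the default threshold of 1, any region holding data is selected;
    a larger threshold defers small regions to a later scan period.
    """
    limit = max(threshold, 1)  # an empty region is never selected
    return sorted(dev for dev, used in used_bytes.items() if used >= limit)
```

A periodic task on the device 102 could call this once per scan period and persist only the regions it returns.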
In another possible implementation, because the data in the cache area 2 is written by the device 101, the device 101 may notify the device 102 to persist the data in the cache area 2 as soon as it has written the target data into the cache area 2. Alternatively, the device 101 may periodically notify the device 102 to persist the data in the cache area 2. If the device 101 has not written any data into the cache area 2 during a period, it need not send the device 102 a persistence notification in that period.
In practice, the data storage system 100 may also trigger the device 102 to persist the data in the cache area 2 in other ways, which is not limited in this embodiment.
It can be understood that in actual application scenarios the storage capacity of the cache area 1 and the cache area 2 is limited, which makes it difficult for the device 101 and the device 102 to hold a large amount of data. Therefore, in a further possible implementation, after the target data has been persisted to the memory 2, the device 101 and/or the device 102 may evict the target data in the cache area 1 to release the storage space that the target data occupies in the cache area 1.
As an implementation example, when a WAL including the target data is stored in the cache area 1, the device 101 may record, when generating the WAL, a first log serial number (LSN) corresponding to the target data. The LSN of a WAL is usually a monotonically increasing positive integer, that is, for multiple WALs generated in sequence, the corresponding LSNs increase step by step. After the device 102 has persisted the target data in the cache area 2 to the memory 2, it may return to the device 101 a second LSN corresponding to the WAL that has been replayed. From the received second LSN, the device 101 can determine that all WALs whose LSN is smaller than the second LSN have been replayed, so that the data included in those WALs has been written into the memory 2. The device 101 may then compare the first LSN with the second LSN. When the first LSN is equal to the second LSN, the WAL including the target data has been replayed and the target data has been persisted to the memory 2. At this point the device 101 may evict the target data from the cache area 1, for example by evicting the WAL that includes the target data, and release the storage space that the target data occupies in the cache area 1, so that the device 101 can store other data in the cache area 1.
In practice, the cache area 1 may hold data written by the application 1011 on multiple occasions, and the device 101 may generate a WAL including the data each time it stores data written by the application 1011. The device 101 may then evict the one or more WALs whose LSN is less than or equal to the second LSN, while WALs whose LSN is greater than the second LSN may remain in the cache area 1 without being evicted.
In another implementation example, when the target data is stored in the cache area 1 as a key-value pair and a WAL including the target data is stored in the cache area 2, the value in the key-value pair may correspond to the first LSN of the WAL, which the device 101 may record in its local cache when generating the WAL. After the device 102 has persisted the target data in the cache area 2 to the memory 2, it may return to the device 101 the second LSN corresponding to the WAL that has been replayed. The device 101 may then compare the first LSN corresponding to the value in the key-value pair with the second LSN corresponding to the WAL replayed by the device 102. When the first LSN is equal to the second LSN, the target data has been persisted to the memory 2 by the server side (the device 102), so the device 101 may evict the key-value pair from the cache area 1 to release the storage space that the key-value pair occupies in the cache area 1. When the first LSN is greater than the second LSN, the target data has not yet been persisted, and the key-value pair may remain in the cache area 1.
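The LSN comparison used for eviction can be sketched as follows. The sketch assumes that each entry in the cache area 1 carries the LSN of its WAL and applies the "less than or equal to the second LSN" rule stated for WALs above; the function name and the cache layout are hypothetical.

```python
def evict_persisted(cache_area_1: dict, replayed_lsn: int) -> list:
    """Remove entries whose LSN is <= replayed_lsn and return the evicted keys.

    cache_area_1 maps a key to a (lsn, value) pair; replayed_lsn plays the
    role of the second LSN reported back by device 102.
    """
    evicted = [key for key, (lsn, _) in cache_area_1.items() if lsn <= replayed_lsn]
    for key in evicted:
        del cache_area_1[key]  # free the space held by already persisted data
    return evicted
```

Entries with a larger LSN stay in place, since the corresponding data has not yet been reported as persisted.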
Note that the above implementation examples are only illustrative. In practice, the device 101 may also determine in other ways whether the target data has been persisted, for example by attaching increasing timestamps or sequence numbers to the target data to decide whether to evict the target data from the cache area 1, which is not limited in this embodiment.
In addition, after the target data has been persisted, the device 102 may also evict the target data from the cache area 2 to reclaim the storage space that the target data occupies in the cache area 2. For example, when the target data is stored in the cache area 2 in the form of a WAL, after the device 102 has replayed the WAL and persisted the target data, it may evict the WALs whose LSN is smaller than the second LSN and release the storage space they occupy, so that the released space can be used to store other WALs newly written by the device 101. After evicting the WALs whose LSN is smaller than the second LSN, the device 102 may return the address information of the released storage space to the device 101, so that the device 101 can subsequently write newly generated WALs into the released storage space according to that address information.
The above embodiments are described using the storage of target data in the device 101 and the device 102 as an example. In general, the application 1011 or the device 103 may also read target data already stored in the data storage system 100. The data reading process is described below with reference to FIG. 4, taking as an example the application 1011 reading target data stored in the data storage system 100. FIG. 4 is a schematic flowchart of a data reading method, which may specifically include:
S401: The device 101 receives a data read request sent by the application 1011, where the data read request includes an identifier of the target data.
S402: The device 101 searches, according to the identifier of the target data, whether the target data is stored in the local cache area 1. If the target data is found, step S406 is performed; if it is not found, step S403 is performed.
S403: The device 101 requests the device 102 to return the target data.
It can be understood that, when storing target data, the data storage system 100 first caches the target data in the local cache area 1 of the device 101. The device 101 may therefore first check whether the target data is still held in the local cache area 1. If the device 101 finds the target data, it can return the target data in the cache area 1 directly to the application 1011, which effectively reduces the latency with which the data storage system 100 responds to the application 1011. In addition, data stored in the local cache area 1 of the device 101 is kept only for a limited time, that is, data may be evicted by the device 101 after it has been in the cache area 1 for a while. Therefore, when the device 101 does not find the target data in the cache area 1, this indicates that the target data has already been persisted by the device 102, and the device 101 may request the target data from the device 102.
As some examples, this embodiment provides the following ways of searching for the target data in the cache area 1:
In a first implementation example, the target data is stored directly in the cache area 1, and the device 101 may find the target data in the cache area 1 according to the identifier of the target data by traversing the data in the cache area 1.
In a second implementation example, the target data is stored in the cache area 1 in the form of a WAL. The device 101 may traverse the WALs in the cache area 1 and determine, according to the identifier of the target data, whether there is a WAL that includes the target data; if so, the device 101 may obtain the target data by replaying that WAL.
In a third implementation example, the target data is stored in the cache area 1 in the form of a key-value pair. The device 101 may use the identifier of the target data as the key and traverse the pre-built index to find the key-value pair that includes this key; the value in that key-value pair is the target data being sought.
In practice, the device 101 may also search for the target data in the local cache area 1 in other ways, which is not limited in this embodiment.
S404: The device 102 searches for the target data in the persistent storage area.
For the specific process by which the device 102 searches for the target data in the persistent storage area, refer to the way in which the device 101 searches for the target data in the cache area 1; details are not repeated here.
S405: The device 102 returns the found target data to the device 101.
S406: The device 101 returns the target data to the application 1011.
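Steps S401 to S406 can be summarized in a short sketch. It assumes the key-value form of the cache area 1 and models the request to the device 102 as a plain callback; the function name `read_target_data` and the `fetch_from_device_102` parameter are hypothetical.

```python
def read_target_data(data_id: str, cache_area_1: dict, fetch_from_device_102):
    """Return (data, source) for data_id, preferring the local cache area 1."""
    if data_id in cache_area_1:
        # S402 hit: answer directly from the local cache, then go to S406.
        return cache_area_1[data_id], "cache area 1"
    # S403 to S405: the data was evicted locally, so ask device 102 for it.
    return fetch_from_device_102(data_id), "device 102"
```

The second element of the returned pair is included only to make the chosen path visible; an actual implementation would just return the data to the application 1011.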
The data storage method and the data reading method provided by the present invention have been described in detail above with reference to FIG. 1 to FIG. 4. The computing device provided by the present invention is described below with reference to FIG. 5.
Based on the same inventive concept as the above method, an embodiment of the present invention further provides a computing device. As shown in FIG. 5, the computing device 500 may include a communication interface 510 and a processor 520. Optionally, the computing device 500 may further include a memory 530, which may be located inside or outside the computing device 500. For example, each action performed by the first device in the embodiments shown in FIG. 3 and FIG. 4 may be implemented by the processor 520. The processor 520 may obtain a data processing request through the communication interface 510 and is configured to implement any of the methods performed in FIG. 3 and FIG. 4. During implementation, each step of the processing flow may be completed by an integrated logic circuit of hardware in the processor 520 or by instructions in the form of software. For brevity, details are not repeated here. The program code that the processor 520 executes to implement the above methods may be stored in the memory 530. The memory 530 is connected to the processor 520, for example by a coupling.
Some features of the embodiments of the present invention may be implemented or supported by the processor 520 executing program instructions or software code in the memory 530. The software components loaded in the memory 530 may be summarized functionally or logically, for example as the data writing module 502, the eviction module 503 and the search module 504 shown in FIG. 5. The function of the communication module 501 may be implemented by the communication interface 510.
Any communication interface in the embodiments of the present invention, such as the communication interface 510 in the computing device 500, may be a circuit, a bus, a transceiver, or any other apparatus that can be used for information exchange. For example, the other apparatus may be a device connected to the computing device 500.
The processor in the embodiments of the present invention may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The coupling in the embodiments of the present invention is an indirect coupling or communication connection between apparatuses or modules. It may be electrical, mechanical, or in another form, and is used for information exchange between the apparatuses or modules.
The processor may operate in cooperation with the memory. The memory may be a non-volatile memory, such as a hard disk drive or a solid-state drive, or a volatile memory, such as a random access memory. The memory may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The embodiments of the present invention do not limit the specific connection medium among the communication interface, the processor, and the memory. For example, the memory, the processor, and the communication interface may be connected through a bus. The bus may be classified as an address bus, a data bus, a control bus, and so on.
Based on the foregoing embodiments, an embodiment of the present invention further provides a storage device. The storage device serves as the first device in a data storage system, and the data storage system further includes a second device. For example, the data storage system may be the system shown in FIG. 1 or FIG. 2. The storage device includes a first cache area (such as cache area 1 in the foregoing embodiments) and a processor. The first cache area is used to cache data, and the processor is configured to perform the following method by running a computer program:
receiving a data processing request, where the data processing request includes target data; and
writing the target data into the second cache area based on remote direct memory access (RDMA), so that the target data in the second cache area is persistently stored, and writing the target data into the first cache area.
Further, the processor performs, by running the computer program, the method performed by the device 101 provided in any one or more of the foregoing embodiments.
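As an illustration only, the write path performed by the processor can be sketched as follows. This is not the implementation of the embodiments: the class names `FirstDevice` and `SecondDevice`, their method names, and the in-process list that stands in for RDMA-accessible memory are all hypothetical. A real first device would issue RDMA writes through RDMA-capable hardware rather than append to a Python list. The sketch only shows the target data landing in both cache areas, with persistence on the second device happening later.

```python
from dataclasses import dataclass, field


@dataclass
class SecondDevice:
    """Hypothetical stand-in for the second device."""
    cache: list = field(default_factory=list)      # second cache area
    persisted: dict = field(default_factory=dict)  # persistent storage

    def flush(self):
        """Persistently store everything held in the second cache area."""
        for key, value in self.cache:
            self.persisted[key] = value
        self.cache.clear()


@dataclass
class FirstDevice:
    """Hypothetical stand-in for the first device (the storage device)."""
    peer: SecondDevice
    cache: dict = field(default_factory=dict)      # first cache area

    def rdma_write(self, key, value):
        # Assumption: simulates an RDMA write into the peer's cache area.
        self.peer.cache.append((key, value))

    def handle_request(self, key, value):
        """Write the target data to both cache areas, then acknowledge."""
        self.rdma_write(key, value)   # second cache area, via (simulated) RDMA
        self.cache[key] = value       # first cache area, local
        return "success"              # returned before the peer persists


peer = SecondDevice()
device = FirstDevice(peer)
status = device.handle_request("object-1", b"payload")
```

After `handle_request` returns, the data is present in both cache areas but not yet in `peer.persisted`; calling `peer.flush()` models the later persistence step.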
Based on the foregoing embodiments, an embodiment of the present invention further provides a computer storage medium. The storage medium stores a software program that, when read and executed by one or more processors, implements the method performed by the data storage system 100 provided in any one or more of the foregoing embodiments. The computer storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
Based on the foregoing embodiments, an embodiment of the present invention further provides a chip. The chip includes a processor configured to implement the functions of the data storage system 100 in the foregoing embodiments, for example, the methods performed in FIG. 3 and FIG. 4. Optionally, the chip further includes a memory for the program instructions and data necessary for the processor. The chip may consist of a chip alone, or may include a chip and other discrete components.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that computer program instructions can implement each procedure and/or block in the flowcharts and/or block diagrams, as well as combinations of procedures and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus. The instruction apparatus implements the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
The terms "first", "second", and the like in the specification, the claims, and the accompanying drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the manner of distinguishing objects with the same attribute when describing the embodiments of this application.
Clearly, those skilled in the art can make various changes and modifications to the embodiments of the present invention without departing from the scope of the embodiments of the present invention. Thus, if these modifications and variations fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to cover them.

Claims (21)

  1. A data storage system, characterized in that the data storage system comprises a first device and a second device, the first device comprises a first cache area, and the second device comprises a second cache area;
    the first device is configured to receive a data processing request, write target data in the data processing request into the second cache area based on remote direct memory access (RDMA), and write the target data into the first cache area; and
    the second device is configured to persistently store the target data in the second cache area.
  2. The data storage system according to claim 1, characterized in that the first device is further configured to:
    generate a response message indicating that the data processing request succeeded, after the target data is written into the first cache area and into the second cache area, and before the second device persistently stores the target data in the second cache area.
  3. The data storage system according to claim 1 or 2, characterized in that the first device is further configured to:
    evict the target data from the first cache area after the target data is persistently stored.
  4. The data storage system according to any one of claims 1 to 3, characterized in that the first device is specifically configured to:
    generate a write-ahead log (WAL) that includes the target data; and
    write the WAL into the second cache area based on RDMA.
  5. The data storage system according to claim 4, characterized in that the first device is specifically configured to:
    write the WAL into the first cache area; or
    generate a key-value pair, wherein the key in the key-value pair is an identifier of the target data and the value in the key-value pair is the target data, and write the key-value pair into the first cache area.
  6. The data storage system according to claim 5, characterized in that the value in the key-value pair further corresponds to a first log sequence number (LSN) of the WAL, and the first device is further configured to:
    obtain a second LSN corresponding to the WAL that has been replayed by the second device; and
    evict the key-value pair from the first cache area when the first LSN is equal to the second LSN.
  7. The data storage system according to any one of claims 1 to 6, characterized in that the first device is further configured to:
    receive a data read request, wherein the data read request includes an identifier of the target data;
    first search, according to the identifier of the target data, whether the first cache area includes the target data;
    return the target data when the first cache area includes the target data; and
    request the target data from the second device when the first cache area does not include the target data.
  8. The data storage system according to any one of claims 1 to 7, characterized in that the first device is specifically configured to write the target data into the second cache area based on one-sided RDMA.
  9. The data storage system according to any one of claims 1 to 8, characterized in that the data storage system comprises a distributed file system, the first device implements a client in the distributed file system, the second device implements a server in the distributed file system, and the second device persistently stores the target data in a file format.
  10. A data storage method, characterized in that the data storage method is applied to a data storage system, the data storage system comprises a first device and a second device, the first device comprises a first cache area, the second device comprises a second cache area, and the method comprises:
    receiving, by the first device, a data processing request, wherein the data processing request includes target data;
    writing, by the first device, the target data into the second cache area based on remote direct memory access (RDMA);
    writing, by the first device, the target data into the first cache area; and
    persistently storing, by the second device, the target data in the second cache area.
  11. The method according to claim 10, characterized in that the method further comprises:
    generating, by the first device, a response message indicating that the data processing request succeeded, after the target data is written into the first cache area and into the second cache area, and before the second device persistently stores the target data in the second cache area.
  12. The method according to claim 10 or 11, characterized in that the method further comprises:
    evicting, by the first device, the target data from the first cache area after the target data is persistently stored.
  13. The method according to any one of claims 10 to 12, characterized in that the writing, by the first device, the target data into the second cache area based on remote direct memory access (RDMA) comprises:
    generating, by the first device, a write-ahead log (WAL) that includes the target data; and
    writing, by the first device, the WAL into the second cache area based on RDMA.
  14. The method according to claim 13, characterized in that the writing, by the first device, the target data into the first cache area comprises:
    writing, by the first device, the WAL into the first cache area; or
    generating, by the first device, a key-value pair, wherein the key in the key-value pair is an identifier of the target data and the value in the key-value pair is the target data, and writing the key-value pair into the first cache area.
  15. A storage device, characterized in that the storage device serves as a first device in a data storage system, the data storage system further comprises a second device, and the storage device comprises:
    a first cache area, configured to cache data; and
    a processor, configured to perform the following method by running a computer program:
    receiving a data processing request, wherein the data processing request includes target data; and
    writing the target data into the second cache area based on remote direct memory access (RDMA), so that the target data in the second cache area is persistently stored, and writing the target data into the first cache area.
  16. The storage device according to claim 15, characterized in that the processor is further configured to:
    generate a response message indicating that the data processing request succeeded, after the target data is written into the first cache area and into the second cache area, and before the target data in the second cache area is persistently stored.
  17. The storage device according to claim 15 or 16, characterized in that the processor is configured to:
    evict the target data from the first cache area after determining that the target data has been persistently stored.
  18. The storage device according to any one of claims 15 to 17, characterized in that the processor is configured to:
    generate a write-ahead log (WAL) that includes the target data; and
    write the WAL into the second cache area based on RDMA.
  19. The storage device according to claim 18, characterized in that the processor is configured to:
    write the WAL into the first cache area; or
    generate a key-value pair, wherein the key in the key-value pair is an identifier of the target data and the value in the key-value pair is the target data, and write the key-value pair into the first cache area.
  20. A computing device, characterized in that the computing device comprises a processor and a memory;
    the processor is configured to execute instructions stored in the memory, so that the computing device performs the method performed by the first device according to any one of claims 10 to 14.
  21. A computer-readable storage medium, characterized by comprising instructions, wherein the instructions are used to implement the method performed by the first device according to any one of claims 10 to 14.
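
For illustration only, and not as part of the claims, the eviction condition of claim 6 and the read path of claim 7 might be sketched as below. The class `FirstCache`, its method names, and the `fetch_remote` callback are hypothetical. Claim 6 recites eviction when the first LSN equals the second LSN; the sketch uses `<=` on the assumption that LSNs increase monotonically, which includes the equal case but is a generalization rather than the claimed condition.

```python
class FirstCache:
    """Hypothetical first cache area mapping key -> (lsn, value)."""

    def __init__(self):
        self.entries = {}

    def put(self, key, value, lsn):
        """Cache a key-value pair together with the LSN of its WAL record."""
        self.entries[key] = (lsn, value)

    def evict_replayed(self, replayed_lsn):
        """Evict entries whose WAL record the second device has replayed.

        Assumption: LSNs grow monotonically, so any entry with
        lsn <= replayed_lsn has already been persisted remotely.
        """
        evicted = [k for k, (lsn, _) in self.entries.items()
                   if lsn <= replayed_lsn]
        for key in evicted:
            del self.entries[key]
        return evicted

    def read(self, key, fetch_remote):
        """Serve from the first cache area first; otherwise ask the peer."""
        if key in self.entries:
            return self.entries[key][1]
        return fetch_remote(key)  # stand-in for a request to the second device


cache = FirstCache()
cache.put("a", b"1", lsn=1)
cache.put("b", b"2", lsn=2)
evicted = cache.evict_replayed(replayed_lsn=1)
hit = cache.read("b", fetch_remote=lambda k: b"from-second-device")
miss = cache.read("a", fetch_remote=lambda k: b"from-second-device")
```

Here `"a"` is evicted because its record has been replayed, so a later read of `"a"` falls through to the second device, while `"b"` is still served locally.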
PCT/CN2023/074690 2022-02-08 2023-02-07 Data storage system, data storage method and apparatus, and related device WO2023151545A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210118694.3A CN116610598A (en) 2022-02-08 2022-02-08 Data storage system, data storage method, data storage device and related equipment
CN202210118694.3 2022-02-08

Publications (1)

Publication Number Publication Date
WO2023151545A1 true WO2023151545A1 (en) 2023-08-17

Family

ID=87563604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/074690 WO2023151545A1 (en) 2022-02-08 2023-02-07 Data storage system, data storage method and apparatus, and related device

Country Status (2)

Country Link
CN (1) CN116610598A (en)
WO (1) WO2023151545A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648959A (en) * 2016-09-07 2017-05-10 华为技术有限公司 Data storage method and storage system
US20180341429A1 (en) * 2017-05-25 2018-11-29 Western Digital Technologies, Inc. Non-Volatile Memory Over Fabric Controller with Memory Bypass
CN110647480A (en) * 2018-06-26 2020-01-03 华为技术有限公司 Data processing method, remote direct memory access network card and equipment
CN112988680A (en) * 2021-03-30 2021-06-18 联想凌拓科技有限公司 Data acceleration method, cache unit, electronic device and storage medium

Also Published As

Publication number Publication date
CN116610598A (en) 2023-08-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23752325

Country of ref document: EP

Kind code of ref document: A1