WO2017113960A1 - Data processing method and NVMe memory - Google Patents

Data processing method and NVMe memory

Info

Publication number
WO2017113960A1
WO2017113960A1 PCT/CN2016/103268 CN2016103268W WO2017113960A1 WO 2017113960 A1 WO2017113960 A1 WO 2017113960A1 CN 2016103268 W CN2016103268 W CN 2016103268W WO 2017113960 A1 WO2017113960 A1 WO 2017113960A1
Authority
WO
WIPO (PCT)
Prior art keywords
nvme
value
host
storage space
memory
Prior art date
Application number
PCT/CN2016/103268
Other languages
English (en)
French (fr)
Inventor
邱鑫
许慧锋
郭海涛
刘洪广
刘华伟
谭春毅
吉辛维克多
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP16880751.9A (patent EP3260971B1, en)
Priority to EP21155651.9A (patent EP3916536A1, en)
Priority to CN201680003110.0A (patent CN107209644B, zh)
Publication of WO2017113960A1 (zh)
Priority to US15/971,990 (patent US10705974B2, en)
Priority to US16/899,294 (patent US11467975B2, en)
Priority to US17/947,812 (patent US20230011387A1, en)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306Intercommunication techniques
    • G06F15/17331Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0626Reducing size or complexity of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0688Non-volatile semiconductor memory arrays
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/72Details relating to flash memory management
    • G06F2212/7201Logical to physical mapping or translation of blocks or pages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026PCI express

Definitions

  • Embodiments of the present invention relate to the field of storage, and more particularly to the field of NVMe.
  • The NVMe (Non-Volatile Memory Express) protocol is a high-speed interface protocol used in storage systems.
  • The NVMe protocol provides faster read/write speeds and lower latency than the SCSI protocol, and it is becoming increasingly popular.
  • KV: key-value.
  • The steps are: the host converts the KV command (generally composed of a key, a value, and metadata) into block data (for example, by splitting or merging the KV data into one or at least two pieces of block data); the host assigns an LBA address to the block data; the host sends the block data to the NVMe memory; the NVMe memory receives the block data and stores it piece by piece according to the assigned LBA addresses.
  • The present invention provides solutions for a data processing method, an NVMe memory, and a storage system.
  • The efficiency of writing KV data to the NVMe memory can be improved. Correspondingly, in some scenarios the efficiency of reading KV data from the NVMe memory is also improved.
  • A first aspect of the embodiments of the present invention provides a data processing method, including: a Non-Volatile Memory Express (NVMe) memory receives an NVMe write command sent by a host, where the NVMe write command carries a key and a value pointer, the value pointer points to a first storage space in the host, the first storage space is used to store a value, and the key and the value belong to the same KV pair;
  • the NVMe memory obtains the key from the NVMe write command, obtains a value length according to the value pointer, and allocates a second storage space for the value according to the value length, where the second storage space is in the NVMe memory;
  • the NVMe memory sends a first transmission request to the host, obtains the value from the host, and saves the value in the second storage space. Based on this scheme, KV data does not need to be converted into block form while it is transferred from the host to the NVMe memory, which improves the storage efficiency of the KV data.
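The write flow of the first aspect can be illustrated with a minimal Python sketch. It is only a simulation under stated assumptions: host memory and the storage medium are modelled as byte arrays, the value pointer is assumed to carry both a first address and the value length, and the names `Host`, `ValuePointer`, `NvmeWriteCommand`, `NvmeMemory`, `dma_read`, and `handle_write` are hypothetical, not taken from the patent or from any NVMe library.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host; its memory is modelled as a flat bytearray."""
    memory: bytearray

    def dma_read(self, address: int, length: int) -> bytes:
        # Serves the "first transmission request" issued by the NVMe memory.
        return bytes(self.memory[address:address + length])


@dataclass
class ValuePointer:
    address: int  # first address of the first storage space (in the host)
    length: int   # value length (assumed to travel with the pointer)


@dataclass
class NvmeWriteCommand:
    key: bytes
    value_pointer: ValuePointer


@dataclass
class NvmeMemory:
    """Hypothetical device; its storage medium is modelled as a bytearray."""
    media: bytearray = field(default_factory=lambda: bytearray(1024))
    next_free: int = 0

    def allocate(self, length: int) -> int:
        # Simple bump allocator standing in for the real space management.
        if self.next_free + length > len(self.media):
            raise MemoryError("not enough space on the storage medium")
        start = self.next_free
        self.next_free += length
        return start

    def handle_write(self, cmd: NvmeWriteCommand, host: Host):
        key = cmd.key                              # key comes from the command
        length = cmd.value_pointer.length          # value length via the pointer
        second_space = self.allocate(length)       # second storage space
        value = host.dma_read(cmd.value_pointer.address, length)
        self.media[second_space:second_space + length] = value
        return key, second_space
```

Note that the value itself never travels inside the command; only the key and the pointer do, and no block conversion or LBA assignment appears anywhere in the flow.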
  • The NVMe memory sending the first transmission request to the host and obtaining the value from the host specifically includes: the NVMe memory sends a DMA transmission request to the host and obtains the value from the host, where the DMA transmission request carries the first storage space as the access address and the second storage space as the write address, and the NVMe memory and the host are connected by a PCIe bus.
  • This scheme provides a way to transfer the value for storage using DMA.
  • The NVMe memory sending the first transmission request to the host and obtaining the value from the host specifically includes: the NVMe memory sends an RDMA transmission request to the host and obtains the value from the host, where the RDMA transmission request carries the first storage space as the access address and the second storage space as the write address, and the NVMe memory and the host are connected by a fabric bus.
  • This scheme provides a way to transfer the value for storage using RDMA.
  • The NVMe write command further carries a KV number field, which describes the number of KV pairs in the NVMe write command;
  • the NVMe memory obtains from the NVMe write command as many keys as the KV number indicates, and obtains the same number of values. This scheme allows multiple KV pairs to be carried in the same NVMe write command.
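To make the role of the KV number field concrete, here is a small Python sketch that packs and parses several key/value-pointer entries in one payload. The byte layout (a 2-byte count, then per entry a 2-byte key length, an 8-byte value address, and a 4-byte value length, all little-endian) is purely an assumption chosen for illustration; the patent text above does not define a wire format, and `pack_kv_entries` and `parse_kv_entries` are hypothetical names.

```python
import struct

COUNT = struct.Struct("<H")           # assumed KV number field
ENTRY_HEADER = struct.Struct("<HQI")  # assumed: key length, value address, value length


def pack_kv_entries(entries):
    """Host side: serialise (key, value_address, value_length) tuples."""
    payload = bytearray(COUNT.pack(len(entries)))
    for key, address, length in entries:
        payload += ENTRY_HEADER.pack(len(key), address, length)
        payload += key
    return bytes(payload)


def parse_kv_entries(payload):
    """Device side: read the KV number, then exactly that many entries."""
    (count,) = COUNT.unpack_from(payload, 0)
    offset = COUNT.size
    entries = []
    for _ in range(count):
        key_len, address, length = ENTRY_HEADER.unpack_from(payload, offset)
        offset += ENTRY_HEADER.size
        key = payload[offset:offset + key_len]
        offset += key_len
        entries.append((key, address, length))
    return entries
```

The parser relies on the count rather than on the payload size, which mirrors the statement that the device obtains the same number of keys and values as the KV number indicates.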
  • The NVMe write command further carries a KV format field, which describes the structure of the fields in the NVMe write command;
  • the NVMe memory obtains each field from the NVMe write command according to the field layout defined by the KV format field.
  • This scheme enables the same NVMe device to support KV NVMe write command messages in multiple formats, with one of those formats used for any particular write operation.
  • The same NVMe device can likewise support KV NVMe read command messages in multiple formats.
  • The method further includes: the NVMe memory obtains a metadata length according to the metadata pointer and allocates a fourth storage space for the metadata according to the metadata length, where the fourth storage space is in the NVMe memory; the NVMe memory obtains the metadata from the host by using the first transmission request and saves the metadata in the fourth storage space.
  • This scheme describes how the metadata of a KV pair is stored.
  • The NVMe memory releases the storage space allocated for the value and releases the fourth storage space allocated for the metadata.
  • This scheme releases the resources occupied by the value in a timely manner.
  • The NVMe memory sending the first transmission request to the host and obtaining the value from the host specifically includes one of the following: the NVMe memory sends at least two DMA transfer requests to the host to obtain the value, where each DMA transfer request requests a part of the value, and when any one DMA transfer request fails to execute, the NVMe memory releases the storage space allocated for the value, the NVMe memory and the host being connected by a PCIe bus; or the NVMe memory sends at least two RDMA transfer requests to the host to obtain the value, where each RDMA transfer request requests a part of the value, and when any one RDMA transfer request fails to execute, the NVMe memory releases the storage space allocated for the value, the NVMe memory and the host being connected by a fabric.
  • This scheme can divide the same value across multiple RDMA transmission requests, reducing the amount of data per RDMA transmission.
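The split-transfer behaviour described above, including releasing the allocated space when any part fails, might look like the following Python sketch. The callables `read_part`, `allocate`, and `release` are hypothetical stand-ins for the DMA or RDMA transfer and for the device's space management, and the use of `IOError` to signal a failed transfer is an assumption of this sketch.

```python
from typing import Callable, Optional, Tuple


def fetch_value_in_parts(
    read_part: Callable[[int, int], bytes],   # (offset, size) -> one part of the value
    value_length: int,
    part_size: int,
    allocate: Callable[[int], int],           # length -> address of allocated space
    release: Callable[[int, int], None],      # (address, length) -> None
) -> Optional[Tuple[int, bytes]]:
    """Fetch a value with several transfer requests; roll back on any failure."""
    address = allocate(value_length)
    received = bytearray()
    for offset in range(0, value_length, part_size):
        size = min(part_size, value_length - offset)
        try:
            received += read_part(offset, size)
        except IOError:
            # Any failed part invalidates the whole value: free the space.
            release(address, value_length)
            return None
    return address, bytes(received)
```

Releasing on the first failure means a partially received value never occupies space on the storage medium.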
  • The method further includes: generating a mapping relationship between the key and the second storage space.
  • This mapping relationship provides a basis for subsequent KV (especially value) reading schemes.
  • The method further includes: the NVMe memory receives an NVMe read command from the host, where the NVMe read command carries the key.
  • The NVMe memory may query the mapping relationship according to the key, obtain the location information of the second storage space that stores the value, and send the location information of the second storage space to the host; the NVMe memory then receives a second transmission request sent by the host, where the second transmission request is used to request the data stored in the second storage space.
  • An eighth possible implementation of the first aspect enables reading of the KV pair (or the value).
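The key-to-location mapping and the two-step read it enables can be sketched in Python as follows. This is an illustrative simulation only: the class `NvmeMemory` and its methods `store`, `handle_read_command`, and `handle_transfer_request` are hypothetical names, and the location information is assumed to be a (first address, value length) pair.

```python
from typing import Dict, Optional, Tuple

Location = Tuple[int, int]  # assumed form: (first address, value length)


class NvmeMemory:
    """Hypothetical device model with a byte-array storage medium."""

    def __init__(self, size: int = 256) -> None:
        self.media = bytearray(size)
        self.mapping: Dict[bytes, Location] = {}  # key -> second storage space
        self._next = 0

    def store(self, key: bytes, value: bytes) -> None:
        if self._next + len(value) > len(self.media):
            raise MemoryError("not enough space on the storage medium")
        address = self._next
        self.media[address:address + len(value)] = value
        self._next += len(value)
        self.mapping[key] = (address, len(value))  # generate the mapping

    def handle_read_command(self, key: bytes) -> Optional[Location]:
        # Step 1: look up the key and return location information to the host.
        return self.mapping.get(key)

    def handle_transfer_request(self, location: Location) -> bytes:
        # Step 2: serve the host's second transmission request.
        address, length = location
        return bytes(self.media[address:address + length])
```

A host would first send the read command with the key, then use the returned location in its transfer request, for example `dev.handle_transfer_request(dev.handle_read_command(b"user:1"))`.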
  • Before the NVMe memory receives the second transmission request sent by the host, the method further includes: the host reserves a third storage space in the host according to the size of the second storage space. After the NVMe memory sends the value to the host, the method further includes: the host writes the received data from the second storage space to the fourth storage space.
  • This scheme describes the operations performed by the host while reading the KV pair (or the value).
  • The NVMe read command that the NVMe memory receives from the host further carries free space information of the host. After the NVMe memory receives the NVMe read command, the method further includes: the NVMe memory determines whether the free storage space of the host is greater than or equal to the second storage space; if yes, the step of sending the location information of the second storage space to the host is performed; if no, the procedure ends.
  • This scheme carries the free space information of the host in the NVMe read command, so the host does not need to determine, after receiving the response from the NVMe memory, whether its free space is sufficient and then reserve storage space. This reduces the number of interactions between the NVMe memory and the host and improves the efficiency with which the host reads the value. It should be noted that the free space here is a space with consecutive addresses.
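The free-space check can be expressed in a few lines of Python. The function name `location_if_host_has_room` and the representation of the mapping as a dictionary from key to (first address, value length) are assumptions made for this sketch.

```python
from typing import Dict, Optional, Tuple

Location = Tuple[int, int]  # assumed form: (first address, value length)


def location_if_host_has_room(
    mapping: Dict[bytes, Location],
    key: bytes,
    host_free_space: int,  # contiguous free space reported in the read command
) -> Optional[Location]:
    """Return location info only when the host can hold the whole value."""
    location = mapping.get(key)
    if location is None:
        return None  # unknown key
    _, value_length = location
    if host_free_space >= value_length:
        return location  # proceed: send location information to the host
    return None          # not enough room: end the procedure
```

Because the decision is made on the device from information already in the read command, the host does not need a separate round trip to learn the value size before reserving space.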
  • The first storage space is described by a first address of the first storage space and the value length; the second storage space is described by a first address of the second storage space.
  • This scheme describes how the storage spaces are described. With this description, the storage location from which the value is read and the storage location to which it is written can be located.
  • The present invention provides an embodiment of an NVMe memory, including a controller and a storage medium, where the controller and the storage medium are connected, the storage medium is configured to provide storage space, and the controller is configured to: receive an NVMe write command sent by a host, where the NVMe write command carries a key and a value pointer, the value pointer points to a first storage space in the host, the first storage space is used to store a value, and the key and the value belong to the same KV pair; obtain the key from the NVMe write command, obtain a value length according to the value pointer, and allocate a second storage space for the value according to the value length,
  • where the second storage space is in the storage medium; and send a first transmission request to the host, obtain the value from the host, and save the value in the second storage space.
  • The NVMe memory is configured to perform the method of the first aspect and the methods of the various implementations of the first aspect.
  • A third aspect of the present invention provides a data processing method, including: a Non-Volatile Memory Express (NVMe) memory receives an NVMe write command, where the header of the NVMe write command carries a key, the NVMe write command also carries a value, the key corresponds to the value, and the key and the value belong to the same KV pair; the NVMe memory obtains the key and the value from the NVMe write command; the NVMe memory stores the value in a storage medium of the NVMe memory.
  • The NVMe write command further carries a KV number field that describes the number of KV pairs in the NVMe write command; the NVMe memory obtains from the NVMe write command as many keys as the KV number indicates, and obtains the same number of values.
  • The NVMe write command further carries a KV format field, which describes the structure of the fields in the NVMe write command; the NVMe memory obtains each field from the NVMe write command according to the field layout defined by the KV format field.
  • The NVMe write command further carries the length of the key;
  • the NVMe memory obtaining the key from the write command specifically includes: starting from the preset starting position of the key, the key is obtained from the write command according to the length of the key.
  • The NVMe memory obtaining the value from the NVMe write command specifically includes: the NVMe memory obtains the value, according to the length of the value, starting from the position indicated by the offset.
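For the third aspect, where the value travels inside the write command, key and value extraction could be sketched as below. The layout is an assumption for illustration only: a 10-byte little-endian header holding the key length, the value offset, and the value length, with the key beginning at a preset position right after the header. `build_command` and `parse_command` are hypothetical helpers.

```python
import struct

# Assumed header: key length (2 bytes), value offset (4 bytes), value length (4 bytes).
HEADER = struct.Struct("<HII")
KEY_START = HEADER.size  # preset starting position of the key (assumption)


def build_command(key: bytes, value: bytes) -> bytes:
    """Host side: place the key after the header and the value after the key."""
    value_offset = KEY_START + len(key)
    return HEADER.pack(len(key), value_offset, len(value)) + key + value


def parse_command(command: bytes):
    """Device side: key from the preset position, value from the offset."""
    key_len, value_offset, value_len = HEADER.unpack_from(command, 0)
    key = command[KEY_START:KEY_START + key_len]
    value = command[value_offset:value_offset + value_len]
    return key, value
```

Carrying an explicit offset keeps the parser independent of where the host chooses to place the value within the command.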
  • The method further includes: generating a mapping relationship between the key and the value storage space.
  • a third possible implementation manner is the fifth possible implementation manner of the third aspect of the present application.
  • The method further includes: the host sends an NVMe read command to the NVMe memory, where the NVMe read command carries the key; the NVMe memory receives the read command from the host and obtains the key from the NVMe read command; the NVMe memory uses the key to look up the value storage space in the mapping relationship; the NVMe memory obtains the value from the value storage space; the NVMe memory generates a response message for the NVMe read command and sends it to the host, where the response message carries the value.
  • The present invention provides an implementation of an NVMe memory, including a controller and a storage medium.
  • The controller is configured to perform the method of the third aspect and the methods of the various implementations of the third aspect.
  • A fifth aspect of the embodiments of the present invention provides a storage device, which may be a physical device, such as an NVMe memory, or a logical device, such as a program running in a processor of an NVMe memory or a program in a storage server.
  • The device includes: an interface module, configured to receive an NVMe write command sent by the host, where the NVMe write command carries a key and a value pointer, the value pointer points to a first storage space in the host, the first storage space is used to store a value, and the key and the value belong to the same KV pair; a processing module, configured to obtain the key from the NVMe write command and obtain a value length according to the value pointer;
  • and a storage module, configured to send a first transmission request to the host, obtain the value from the host, and save the value in the second storage space.
  • The second storage space may be provided by a storage medium in the NVMe memory, the storage medium being coupled to a processor of the NVMe memory. Based on this scheme, KV data does not need to be converted into block form while it is transferred from the host to the NVMe memory, which improves the storage efficiency of the KV data.
  • The storage module sending the first transmission request to the host and obtaining the value from the host specifically includes: the storage module sends a DMA transmission request to the host and obtains the value from the host, where the DMA transmission request carries the first storage space as the access address and the second storage space as the write address.
  • When the storage device is hardware, the storage device and the host are connected by a PCIe bus; when the storage device is software, the NVMe memory in which the storage device resides is connected to the host by a PCIe bus. This scheme provides a way to transfer the value for storage using DMA.
  • The storage module sending the first transmission request to the host and obtaining the value from the host specifically includes: the storage module sends an RDMA transmission request to the host and obtains the value from the host, where the RDMA transmission request carries the first storage space as the access address and the second storage space as the write address.
  • When the storage device is hardware, the storage device and the host are connected by a fabric bus; when the storage device is software, the NVMe memory in which the storage device resides is connected to the host by a fabric bus. This scheme provides a way to transfer the value for storage using RDMA.
  • The NVMe write command further carries a KV number field, which describes the number of KV pairs in the NVMe write command;
  • the NVMe memory obtains from the NVMe write command as many keys as the KV number indicates, and obtains the same number of values. This scheme allows multiple KV pairs to be carried in the same NVMe write command.
  • The NVMe write command further carries a KV format field, which describes the structure of the fields in the NVMe write command;
  • the NVMe memory obtains each field from the NVMe write command according to the field layout defined by the KV format field.
  • This scheme enables the same NVMe device to support KV NVMe write command messages in multiple formats, with one of those formats used for any particular write operation.
  • The same NVMe device can likewise support KV NVMe read command messages in multiple formats.
  • The method further includes: the NVMe memory obtains a metadata length according to the metadata pointer and allocates a fourth storage space for the metadata according to the metadata length, where the fourth storage space is in the NVMe memory; the NVMe memory obtains the metadata from the host by using the first transmission request and saves the metadata in the fourth storage space.
  • This scheme describes how the metadata of a KV pair is stored.
  • The processing module is further configured to release the storage space allocated for the value and to release the fourth storage space allocated for the metadata.
  • This scheme releases the resources occupied by the value in a timely manner.
  • The storage module sending the first transmission request to the host and obtaining the value from the host specifically includes one of the following: the processing module sends at least two DMA transfer requests to the host to obtain the value, where each DMA transfer request requests a part of the value, and when any one DMA transfer request fails to execute, the storage module releases the storage space allocated for the value, the NVMe memory and the host being connected by a PCIe bus; or the storage module sends at least two RDMA transfer requests to the host to obtain the value, where each RDMA transfer request requests a part of the value, and when any one RDMA transfer request fails to execute, the storage module releases the storage space allocated for the value, the NVMe memory and the host being connected by a fabric.
  • This scheme can divide the same value across multiple RDMA transmission requests, reducing the amount of data per RDMA transmission.
  • The processing module is further configured to generate a mapping relationship between the key and the second storage space. This mapping relationship provides a basis for subsequent KV (especially value) reading schemes.
  • The interface module is further configured to receive an NVMe read command from the host, where the NVMe read command carries the key.
  • The processing module is further configured to obtain, according to the key, the location information of the second storage space that stores the value, and to send the location information of the second storage space to the host through the interface module; the interface module is further configured to receive a second transmission request sent by the host, where the second transmission request is used to request the data stored in the second storage space.
  • An eighth possible implementation of the fifth aspect enables reading of the KV pair (or the value).
  • Before the interface module receives the second transmission request sent by the host, the host is further configured to reserve a third storage space in the host according to the size of the second storage space. After the storage device sends the value to the host, the host is further configured to write the received data from the second storage space into the fourth storage space.
  • This scheme describes the functions of the host in reading the KV pair (or the value).
  • The NVMe read command that the interface module receives from the host further carries the free space information of the host. After the NVMe memory receives the NVMe read command, the method further includes: the NVMe memory determines whether the free storage space of the host is greater than or equal to the second storage space; if yes, the step of sending the location information of the second storage space to the host is performed; if no, the procedure ends.
  • This scheme carries the free space information of the host in the NVMe read command, so the host does not need to determine, after receiving the response from the NVMe memory, whether its free space is sufficient and then reserve storage space. This reduces the number of interactions between the NVMe memory and the host and improves the efficiency with which the host reads the value. It should be noted that the free space here is a space with consecutive addresses.
  • The first storage space is described by a first address of the first storage space and the value length; the second storage space is described by a first address of the second storage space.
  • This scheme describes how the storage spaces are described. With this description, the storage location from which the value is read and the storage location to which it is written can be located.
  • In a sixth aspect, in combination with the storage device provided by the fifth aspect and its various possible implementations, the present invention further provides an implementation of a storage system, where the storage system includes a host and a storage device.
  • A seventh aspect of the present invention provides a storage device, which may be a physical device, such as an NVMe memory, or a logical device, such as a program running in a processor of an NVMe memory or a program in a storage server.
  • The device includes: an interface module, configured to receive an NVMe write command, where the header of the NVMe write command carries a key, the NVMe write command further carries a value, the key corresponds to the value, and the key and the value belong to the same KV pair; a processing module, configured to obtain the key and the value from the NVMe write command; and a storage module, configured to save the value in a storage medium of the NVMe memory.
  • The NVMe write command further carries a KV number field that describes the number of KV pairs in the NVMe write command; the processing module obtains from the NVMe write command as many keys as the KV number indicates, and obtains the same number of values.
  • The NVMe write command further carries a KV format field, which describes the structure of the fields in the NVMe write command; the processing module obtains each field from the NVMe write command according to the field layout defined by the KV format field.
  • The NVMe write command further carries the length of the key;
  • the processing module obtaining the key from the write command specifically includes: starting from the preset starting position of the key, the key is obtained from the write command according to the length of the key.
  • The processing module obtaining the value from the NVMe write command specifically includes: the processing module obtains the value, according to the length of the value, starting from the position indicated by the offset.
  • The storage module is further configured to generate a mapping relationship between the key and the value storage space.
  • The seventh possible implementation manner is based on the fifth possible implementation manner of the seventh aspect and further involves the host: the host is configured to send an NVMe read command to the NVMe memory, where the NVMe read command carries the key.
  • The interface module is further configured to receive the read command from the host and obtain the key from the NVMe read command; the processing module is further configured to use the key to search the mapping relationship for the value storage space; the processing module is further configured to obtain the value from the value storage space; the processing module is further configured to generate a response message of the NVMe read command and send the response message to the host through the interface module, where the response message carries the value.
  • In an eighth aspect, in combination with the storage device provided by the seventh aspect and the various possible implementation manners of the seventh aspect, the present invention further provides an implementation of a storage system, where the storage system includes a host and a storage device.
  • In a ninth aspect, in combination with the memory provided by the second aspect and the various possible implementations of the second aspect, the invention further provides an implementation of a storage system, the storage system comprising a host and an NVMe memory.
  • In combination with the memory provided by the fourth aspect and the various possible implementations of the fourth aspect, the invention further provides an implementation of a storage system, the storage system comprising a host and a storage device.
  • Based on the solutions of the data processing method, the NVMe memory, and the storage system provided by the present invention, the efficiency of writing KV data to the NVMe memory can be improved.
  • FIG. 1 is a flow chart of an embodiment of a data processing method
  • FIG. 2 is a schematic diagram of an NVMe command format
  • FIG. 3 is a schematic diagram of an NVMe command format
  • FIG. 4 is a schematic diagram of an NVMe command format
  • FIG. 5 is a schematic diagram of an NVMe command format
  • FIG. 6 is a schematic diagram of a mapping relationship between a key and a value storage location and a metadata storage location
  • FIG. 7 is a topological diagram of a hardware topology embodiment of a data processing system
  • FIG. 8 is a flow chart of an embodiment of a data processing method
  • FIG. 9 is a schematic diagram of a command format
  • FIG. 10 is a schematic diagram of a command format
  • FIG. 11 is a schematic diagram of a command format
  • FIG. 13 is a flow chart of an embodiment of a method of reading data
  • FIG. 15 is a logic functional diagram of an embodiment of a data processing system.
  • Non-Volatile Memory Express is a logical device interface that supports access to non-volatile storage media using the PCIe bus.
  • the NVMe interface can be used for flash media storage, such as solid state drive SSDs.
  • A device that uses the NVMe interface is called an NVMe device.
  • The NVMe memory is a type of NVMe device, namely an NVMe device having a storage function.
  • The embodiments of the present invention are mainly described by taking the NVMe memory (referred to simply as the memory in the embodiments) as an example.
  • Key-value (KV) storage, also known as K/V storage, is a type of storage technology whose basic data model is the key-value pair.
  • A key-value pair can include a key and a value. In an extended key-value pair, the pair can also include metadata. The key uniquely identifies a value.
  • In the prior art, to write KV data to an NVMe memory, the operations that the host needs to perform include: first, converting the KV data into block form; then, allocating a logical block address (LBA) on the memory to the block; next, generating a write command that carries the block and the LBA of the block. After receiving the block, the NVMe memory stores the block at the physical address corresponding to the LBA. To read the KV data, the block data must be read first and then converted back into KV data. In the process of writing KV data, allocating LBAs and converting KV data into block data consume a lot of host time and system resources, increasing system delay.
  • In the process of reading, the host needs to send the location information (such as the starting address + length) of the block to be read to the NVMe memory; re-converting the block data into KV data also consumes a large amount of the host's computing resources, increasing system latency.
  • The embodiments of the present invention are applicable to NVMe scenarios and NVMe over Fabric (NOF) scenarios.
  • In the NVMe scenario, the host and the memory are connected by a bus (for example, a PCIe bus).
  • In this scenario the memory can be a component of the host; for example, the host is a server, and the memory is an NVMe-interface solid state drive (SSD) in the server.
  • In the NOF scenario, the host and the memory are connected by a fabric (for example, Ethernet or FC).
  • The host includes both internal memory and a processor, and the host and the NVMe memory can be two separate devices.
  • the host is the initiator of the read data and the write data, and thus is called an initiator; the memory is a responder of the read request or the write request, which is also called a target.
  • the embodiment of the present invention can extend the existing NVMe protocol and propose a new NVMe command.
  • The new NVMe command extends and optimizes existing NVMe commands (such as the NVMe commands defined in NVMe Standard Protocol 1.2.1 or in NVMe over Fabric Standard Protocol 1.0) so that NVMe devices can directly support a KV interface.
  • the KV data can be directly transmitted between the memory and the host through the NVMe protocol.
  • the host can directly write KV data to the memory through the NVMe protocol, or read the KV directly from the memory through the NVMe protocol. Therefore, the repeated conversion between the KV format and the Block format as in the prior art is avoided between the memory and the host, which reduces system complexity and improves system performance.
  • the NVMe commands in the embodiments of the present invention all refer to extended NVMe commands, unless otherwise specified.
  • the key is directly carried in the command header of the NVMe command. Therefore, the command header of the NVMe can be directly read (the payload of the NVMe command does not need to be parsed) to obtain the key.
  • The value can be carried in the NVMe command, in which case the value is obtained by reading the NVMe command.
  • Alternatively, the NVMe command may not carry the value directly but instead carry a pointer to the value.
  • the value pointer directly or indirectly points to the storage space of the value, and the value can be obtained from the value storage space by the DMA/RDMA technique.
  • Number of KV: The number of KVs transmitted in an NVMe command.
  • A KV may include a key and the value corresponding to the key.
  • Alternatively, a KV may include a key and a value pointer, where the value pointer points to the value storage space and the value corresponds to the key.
  • Key: Uniquely identifies a value.
  • The combination of a key and its corresponding value can be referred to as a KV or KV pair.
  • KV: Key value.
  • KV pair: A combination of a key and a value, also referred to as a KV.
  • metadata is also included in the KV.
  • Metadata: An attribute used to describe a value. For example, if the value is a movie, the metadata may include, but is not limited to, the movie name, duration, and starring actors.
  • Common Header: The part of the command header that is the same as the existing NVMe command header.
  • KV Format ID: Indicates the format of the current NVMe command, that is, defines the contents of each field in the NVMe command.
  • The NVMe command for transmitting KV can have multiple command formats, which are distinguished by the KV Format ID.
  • NVMe commands of different command formats can have different fields or different field positions.
  • Key Length: Describes the length of the key.
  • Value Length: Describes the length of the value.
  • Metadata Length: Describes the length of the metadata.
  • Value Offset: Describes the offset position of the value in the NVMe command.
  • Metadata Offset: Describes the offset position of the metadata in the NVMe command.
  • DPTR: Data Pointer.
  • MPTR (Metadata Pointer): A metadata pointer that points to the metadata to be transmitted.
  • PRP (Physical Region Page) Entry: A physical region page entry, which can record pointers.
  • PRP is one of the two data transmission mechanisms commonly used in the NVMe protocol and can be used in the NVMe over PCIe architecture.
  • A PRP entry can be a pointer to a physical memory page.
  • SGL (Scatter Gather List) Entry: A scatter gather list entry, which can record pointers.
  • SGL is the other of the two data transmission mechanisms commonly used in the NVMe protocol and can be used in the NVMe over PCIe/Fabric architecture.
  • the present invention provides an embodiment of a data processing method that can be used between an NVMe memory and a host.
  • the host generates an NVMe write command (hereinafter referred to as a write command) and sends the command to the memory.
  • the write command carries the value and the key corresponding to the value.
  • the write command is sent to the NVMe memory through the host's NVMe interface.
  • The key and the value can each be carried in the command header or in the payload.
  • Unless otherwise stated, the other fields carried in a write command or read command may likewise be in the command header or in the payload.
  • the write command may also carry a number of KV, which is used to describe the number of KVs.
  • the memory can read the KV according to the number of KV indicated by number of KV, and stop reading after reading all KVs.
  • the write command further carries the KV Format ID
  • The NVMe command for transmitting the KV may have multiple command formats; in different formats, the fields may be different, and the positions of the fields may also be different.
  • the KV Format ID is used to indicate the command format used by this command.
  • The position of the key in the command header of the write command may be a preset fixed position or may be unfixed. If the position of the key in the write command is not fixed, key position information may be further carried in the write command.
  • The key position information may be a combination of a key length and a key offset, where the key offset describes the starting position of the key in the write command.
  • The key position information may also be a combination of the start position and the end position of the key in the write command. If the start position of the key in the write command is preset, the key position information may be just a key length, which describes the length of the key.
  • There are two options for presetting the starting position of the key. One is to preset the absolute position of the key in the command, for example, counting from the first bit of the command, the key starts at the 20th bit; the other is to preset the position of the key relative to other fields, for example, the key field follows the key length field and is adjacent to it. Similarly, the "preset" positions involved in this embodiment and other embodiments, such as the starting positions of the value and the metadata introduced later in this embodiment, may use either solution.
  • the key can be fixed length or variable length.
  • A fixed-length key means that the length of the key in each command is the same.
  • A variable-length key means that the length of the key can be different in different commands.
  • If the key is fixed length, the receiver of the command can ignore the position information (for example, key length), or the write command may not carry the position information.
  • In that case, after receiving the write command, the memory directly reads the key according to the preset starting position and the fixed length.
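  • The three forms of key position information described above (offset plus length, start plus end, and a preset start plus length) can be sketched as follows. This is a minimal illustration only: the function name `extract_key`, its parameters, and the byte-level command layout are assumptions made for the example and are not defined by the embodiment.

```python
from typing import Optional

def extract_key(cmd: bytes, *, length: Optional[int] = None,
                offset: Optional[int] = None, end: Optional[int] = None,
                preset_start: int = 0) -> bytes:
    """Return the key bytes from a raw command buffer (hypothetical layout)."""
    if offset is not None and end is not None:
        # Variant: start position + end position (end is exclusive here).
        return cmd[offset:end]
    if length is None:
        raise ValueError("need a key length or a start/end pair")
    # Variant: key offset + key length, or preset start + key length
    # when the command carries no offset.
    start = offset if offset is not None else preset_start
    return cmd[start:start + length]

# A toy command: 4 header bytes, then a 6-byte key, then other data.
cmd = b"HDR!" + b"user42" + b"payload"
k1 = extract_key(cmd, offset=4, length=6)        # offset + length
k2 = extract_key(cmd, offset=4, end=10)          # start + end
k3 = extract_key(cmd, length=6, preset_start=4)  # preset start + length
```

  • All three calls select the same bytes; which variant a real command uses would be determined by the command format.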
  • The position of the value in the write command can be fixed or not fixed. If the position of the value in the write command is not fixed, the write command may further carry value location information.
  • The value location information may be just a value length when the start position of the value in the write command is preset and therefore need not be carried in the write command. Starting from the start position of the value and using the value length as the read length, the value can be read from the write command.
  • the value position information may be a combination of value length and value offset, and the value offset describes the starting position of the value in the write command.
  • the value location information may also be a combination of the start position of the value in the write command and the end position of the value in the write command.
  • Value can be fixed length or variable length.
  • If the value is fixed length, the recipient of the command can ignore the location information (such as value length), or the write command may not carry the location information.
  • In that case, the receiver of the command can directly read the value according to the preset starting position and the fixed length.
  • The write command may further carry the metadata of the value.
  • Metadata can be carried in the command header of the write command or in the payload.
  • The location of the metadata in the write command can be fixed or not fixed. If the location of the metadata in the write command is not fixed, the write command can further carry metadata location information.
  • the metadata location information is similar to the value location information. Therefore, refer to the description of the value location information, which is not described here.
  • metadata can be fixed length or variable length.
  • the common header is part of the NVMe command header.
  • The command carries two KVs, so there are two keys, namely key1 and key2; two values, namely value1 and value2; two key lengths, namely key1 length and key2 length; and two value lengths, namely value1 length and value2 length.
  • the command also carries 2 metadata, namely metadata1 and metadata2; and corresponding metadata length1 and metadata length2.
  • FIG. 3 is a schematic diagram of another NVMe command format.
  • In FIG. 3, value position information (value1 offset, value2 offset) and metadata position information are added.
  • The number of KV field is also added; the command in the example of FIG. 3 carries 2 KVs, so the value of number of KV is 2.
  • The location information in the NVMe command describes the location of a field in the NVMe command. In FIG. 3 this is indicated by arrows: the position of the data to be written (for example, its starting position) can be found through the offset.
  • For example, the value of the value1 offset field describes the offset position of value1 in the command, so that value1 can be read by means of value1 offset.
  • In FIG. 3, the distribution of the fields is determined by the type of the field, and fields of the same type are adjacent. For example, both value and metadata belong to the data part and are therefore adjacent; value location information and metadata location information both belong to the location information and are therefore adjacent.
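  • To make the offset mechanism concrete, the sketch below builds and parses a toy command in which each value is located through a value offset and a value length. The byte layout (a one-byte number of KV, one-byte key lengths, little-endian 16-bit offsets and lengths, values placed at the end) is an assumption chosen for illustration; it is not the field layout of FIG. 3.

```python
import struct

# Hypothetical layout: [number of KV: u8], then per KV
# [key length: u8][key][value offset: u16][value length: u16],
# followed by all values packed at the end of the command.
def build(kvs):
    descriptors = bytearray([len(kvs)])
    desc_len = 1 + sum(1 + len(k) + 4 for k, _ in kvs)
    data = bytearray()
    for key, value in kvs:
        descriptors += bytes([len(key)]) + key
        # The offset is measured from the start of the command.
        descriptors += struct.pack("<HH", desc_len + len(data), len(value))
        data += value
    return bytes(descriptors + data)

def parse(cmd):
    number_of_kv, pos, out = cmd[0], 1, []
    for _ in range(number_of_kv):
        klen = cmd[pos]
        pos += 1
        key = cmd[pos:pos + klen]
        pos += klen
        voff, vlen = struct.unpack_from("<HH", cmd, pos)
        pos += 4
        # Jump to the value through its offset instead of reading sequentially.
        out.append((key, cmd[voff:voff + vlen]))
    return out

kvs = [(b"key1", b"value-one"), (b"key2", b"v2")]
cmd = build(kvs)
```

  • Because each value is found through its own offset, the data part can be placed anywhere in the command without changing the parser.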
  • FIG. 4 is a schematic diagram of another NVMe command format. Compared with FIG. 3, the fields are distributed according to the KV they belong to, and fields of the same KV are adjacent. In the command format of FIG. 4, the fields of KV1 are followed by the fields of KV2. In addition, in the command format of FIG. 4, value and value length are adjacent, and metadata and metadata length are adjacent.
  • FIG. 5 is a schematic diagram of another NVMe command format.
  • the fields of the same KV are adjacent.
  • the value length is adjacent to the metadata length, and the value is adjacent to the metadata.
  • the NVMe interface of the memory receives the write command, and the memory obtains the key and value from the write command. If the write command carries metadata, the metadata is also obtained from the write command.
  • the memory is composed of a controller and a storage medium, and the controller includes a processor, and optionally, may also include a memory.
  • the storage medium is, for example, a flash memory or a magnetic disk.
  • the memory can also be a hard disk with management capabilities, called a smart hard disk.
  • The memory can determine the format of the received command through the KV Format ID, and then read the contents of the key and value according to the positions of the key, value, and other fields in this format.
  • Different NVMe command formats can have different data location relationships. For example, in some formats, the location of the key is fixed; in some formats, the location of the key is not fixed and is determined by the key location information.
  • If a write command carries one KV, this KV is read. If at least two KVs are carried in the write command, the at least two KVs are read.
  • The memory can learn the number of KVs through the number of KV field and complete the read operation after reading the corresponding number of KVs. Instead of using number of KV to mark the completion of the read operation, a terminator can also be added to the command; once the terminator is read, all KVs in the command have been read.
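  • The two ways of knowing when all KVs have been read (a number of KV count versus a terminator) can be compared in a short sketch. The record layout and the choice of a zero byte as terminator are assumptions for the example; with this choice, keys must be non-empty so that a key length of 0 is unambiguous.

```python
TERMINATOR = 0x00  # assumed: a key length of 0 marks the end of the KV list

def _read_kv(buf, pos):
    """Read one [key length][key][value length][value] record starting at pos."""
    klen = buf[pos]
    key = buf[pos + 1:pos + 1 + klen]
    pos += 1 + klen
    vlen = buf[pos]
    value = buf[pos + 1:pos + 1 + vlen]
    return pos + 1 + vlen, (key, value)

def read_by_count(buf, number_of_kv):
    # Stop after the number of KVs announced by the command.
    pos, kvs = 0, []
    for _ in range(number_of_kv):
        pos, kv = _read_kv(buf, pos)
        kvs.append(kv)
    return kvs

def read_until_terminator(buf):
    # Stop when the terminator is found where a key length is expected.
    pos, kvs = 0, []
    while buf[pos] != TERMINATOR:
        pos, kv = _read_kv(buf, pos)
        kvs.append(kv)
    return kvs

body = (bytes([1]) + b"a" + bytes([2]) + b"xy"
        + bytes([2]) + b"bc" + bytes([1]) + b"z")
by_count = read_by_count(body, 2)
by_terminator = read_until_terminator(body + bytes([TERMINATOR]))
```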
  • If the location of the key is fixed, the memory reads the key at the fixed position. If the location of the key is not fixed, the key is read from the write command according to the key location information.
  • For example, if the location information of the key is a key length and the start position of the key in the write command is preset, data is read continuously starting from the start position of the key, with the key length as the read length, to obtain the key carried in the write command.
  • For example, if the location information of the key is a combination of a key length and a key offset, the key offset is used as the starting position of the key and the key length as the read length to read data continuously, which yields the key in the write command.
  • For example, if the location information of the key consists of a start position and an end position, the data between the start position and the end position is read to obtain the key.
  • the memory can obtain key length/value length/metadata length.
  • the lengths of the command header, KV format ID, key length, and value length are fixed, and the relative positions of the fields are also fixed, so that key offset, value offset, etc. are not required.
  • The order in which the memory reads the write command fields is: read the command header, read the KV format ID, read the key1 length, read key1 according to the value recorded by the key1 length, read the value1 length, and read value1 according to the value1 length.
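  • The reading order above can be sketched as a sequential parser. The field widths used here (an 8-byte command header, a 1-byte KV format ID, a 1-byte key length, and a 2-byte little-endian value length) are assumptions for illustration; the embodiment only states that these lengths and relative positions are fixed.

```python
import io
import struct

HEADER_LEN = 8  # assumed size of the command header in this sketch

def parse_write_command(cmd: bytes) -> dict:
    stream = io.BytesIO(cmd)
    header = stream.read(HEADER_LEN)                 # 1. command header
    (format_id,) = struct.unpack("<B", stream.read(1))  # 2. KV format ID
    (key_len,) = struct.unpack("<B", stream.read(1))    # 3. key1 length
    key = stream.read(key_len)                       # 4. key1
    (value_len,) = struct.unpack("<H", stream.read(2))  # 5. value1 length
    value = stream.read(value_len)                   # 6. value1
    return {"header": header, "format_id": format_id,
            "key": key, "value": value}

cmd = (bytes(HEADER_LEN) + struct.pack("<BB", 1, 3) + b"abc"
       + struct.pack("<H", 5) + b"hello")
parsed = parse_write_command(cmd)
```

  • Since every field is adjacent to the one before it, no key offset or value offset is needed in this layout.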
  • The memory stores the value. Specifically, the controller of the memory stores the key and value in a non-volatile storage medium. After the interface of the memory receives the value, the value is sent to the processor of the controller. In this step, the processor temporarily stores the value in the controller's internal memory and then sends it from there to the non-volatile storage medium.
  • the memory can also record the mapping relationship between key and value storage space.
  • the mapping relationship may be stored by a storage medium of the memory; or may be sent to the host and stored by the host. The following is an example of storage by a memory.
  • If the write command also carries metadata, the metadata is also stored, and the mapping relationship between the key and the metadata storage space is recorded.
  • the value storage space can be described by the starting address and value length of the value storage space.
  • The metadata storage space can be described by the starting address of the metadata storage space and the metadata length.
  • FIG. 6 is a schematic diagram of a mapping relationship, including a mapping of a key to the starting address of the value storage space, a mapping of a key to the value length, a mapping of a key to the starting address of the metadata storage space, and a mapping of a key to the metadata length.
  • The key can be used as an index to find, in the mapping relationship, the starting address of the corresponding value storage space, the value length, the starting address of the metadata storage space, and the metadata length.
  • the content depicted in the schematic of Figure 6 is recorded in the KV Management Unit. If the value and metadata are fixed length, the value length and the length of the metadata are optional.
  • The storage space can be a logical location or a physical location, as long as the memory controller can use the storage space to read the value and metadata from the storage medium.
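  • The mapping relationship recorded in the KV management unit can be sketched as a dictionary from key to a record holding the four items listed above. The class names `KVLocation` and `KVStore`, the append-only allocation, and the use of a `bytearray` as the storage medium are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class KVLocation:
    value_addr: int                  # starting address of the value storage space
    value_len: int                   # value length
    meta_addr: Optional[int] = None  # starting address of the metadata storage space
    meta_len: int = 0                # metadata length

class KVStore:
    def __init__(self, size: int):
        self.medium = bytearray(size)            # stands in for the storage medium
        self.next_free = 0
        self.index: Dict[bytes, KVLocation] = {}  # the KV management unit

    def _append(self, data: bytes) -> int:
        addr = self.next_free
        if addr + len(data) > len(self.medium):
            raise MemoryError("storage medium full")
        self.medium[addr:addr + len(data)] = data
        self.next_free += len(data)
        return addr

    def write(self, key: bytes, value: bytes, metadata: Optional[bytes] = None):
        loc = KVLocation(self._append(value), len(value))
        if metadata is not None:
            loc.meta_addr, loc.meta_len = self._append(metadata), len(metadata)
        self.index[key] = loc

    def read(self, key: bytes) -> Tuple[bytes, Optional[bytes]]:
        loc = self.index[key]  # the key is the index into the mapping
        value = bytes(self.medium[loc.value_addr:loc.value_addr + loc.value_len])
        meta = None
        if loc.meta_addr is not None:
            meta = bytes(self.medium[loc.meta_addr:loc.meta_addr + loc.meta_len])
        return value, meta

store = KVStore(64)
store.write(b"movie", b"frames", b"title=Up")
store.write(b"k2", b"v2")
```

  • If values and metadata were fixed length, the two length fields could be dropped from the record, which matches the remark that the lengths are then optional.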
  • Steps 11 - 13 above are the process of writing KV, and the next steps 14 - 16 are the process of reading KV.
  • the two processes are independent of each other, and the KV requested by the read command and the KV written by the write command may not be the same.
  • the host generates an NVMe read command (hereinafter referred to as a read command), and sends a read command to the memory, and the read command carries the key.
  • the key can be carried in the command header of the read command.
  • the host that generates the read command and the host that generates the write command can be the same host or different hosts.
  • the command format of the read command refers to the format of the write command in step 11 and FIG.
  • Read commands include the command header, KV format ID, and key.
  • the read command carries key location information.
  • a read command can carry a key or carry at least two keys. When carrying at least two keys, the number of KV field may be carried to describe the number of keys carried.
  • The memory receives the read command through the NVMe interface and obtains the key from the read command. It looks up the value storage space corresponding to the key in the KV management unit stored in the memory, obtains the value from the value storage space, constructs a response message for the read command, and sends the response message to the host; the response message carries the value.
  • the way to obtain the key is slightly different depending on the command format.
  • The key can be obtained from a fixed position of the read command, or can be obtained according to the location information carried in the read command. For details, refer to how the key is obtained from the write command.
  • the storage space of the metadata can be queried in a similar manner and the metadata is obtained.
  • If the KV management unit is stored in the host, the key is used to query the value storage space in the host, and the value storage space is sent to the memory; the memory obtains the value from the value storage space and sends the value to the host.
  • the host receives the value and stores it. For example, it may be stored in the host's memory (such as a cache) or in the host's non-volatile storage medium.
  • Because the value and the metadata are carried in the NVMe command, no conversion between KV and block is required, which makes the method simple and quick. Read commands do not need to carry the LBAs of the value and/or metadata, so the process of reading the value and/or metadata is faster.
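  • The variant in which the KV management unit is kept by the host rather than by the memory can be sketched as follows. The names `host_index`, `memory_read`, and `host_get`, and the in-process function call standing in for the NVMe exchange, are assumptions for illustration only.

```python
# Storage medium of the memory; the value for key b"k1" sits at address 4.
medium = bytearray(b"....hello-worldMETA")

# KV management unit held by the host: key -> (starting address, value length).
host_index = {b"k1": (4, 11)}

def memory_read(addr: int, length: int) -> bytes:
    """Memory side: return the bytes of the value storage space it is given."""
    return bytes(medium[addr:addr + length])

def host_get(key: bytes) -> bytes:
    """Host side: resolve the key locally, then ask the memory for the value."""
    addr, length = host_index[key]
    return memory_read(addr, length)

value = host_get(b"k1")
```

  • In this variant the memory does no lookup of its own; it only serves the storage space that the host names.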
  • In a hardware topology embodiment of the data processing system of the present invention, host 71 is in communication with memory 72.
  • the host 71 includes a processor 711, a memory 712, and an interface 713.
  • the memory 72 includes an interface 721, a controller 722, and a storage medium 723.
  • Interface 713 and interface 721 are connected by a communication link 73, such as a PCIe bus, Fibre Channel FC, or Ethernet.
  • the operations performed by the host 71 may be performed by the processor 711.
  • the processor 711 may execute steps 11, 14, and 16 by running a program in the memory 712.
  • the memory 712 and the processor 711 are relatively independent or may be integrated.
  • the memory 72 includes a control area 721 and a storage medium 722.
  • the operations performed by the memory 72 are performed by the controller 721 of the memory.
  • The operations of the memory may be performed by the processor 7211 running a program in the memory 7212; for example, the processor of the memory performs steps 12, 13, and 15.
  • If the processor is an FPGA, there may be no such memory, and the corresponding operations are performed directly by the processor.
  • steps 21, 24, and 27 are performed by host 71, and 23 and 25 are executed by memory 72.
  • In steps 22 and 26, part of the operation is performed by the host 71, and another part of the operation is performed by the memory 72.
  • the present invention further provides an implementation manner.
  • the write command does not directly carry the value or the metadata, but carries the pointer in the NVMe command, and the value and/or metadata to be written can be obtained through the storage space pointed by the pointer, or In the case of a multi-level pointer, one pointer points to another pointer, and the value to be written and/or metadata is obtained from the storage space pointed to by the other pointer.
  • This embodiment has no limitation on the value to be written and/or the size of the metadata to be written.
  • The host and the memory can be connected through a network such as IP or FC, that is, run on a fabric architecture, also known as the NOF (NVMe over Fabric) architecture.
  • In this case, the memory can obtain the value and/or metadata by means of Remote Direct Memory Access (RDMA); over a bus connection such as PCIe, Direct Memory Access (DMA) can be used.
  • The host can also obtain the value and/or metadata directly from the memory via RDMA/DMA.
  • Step 21: The host constructs a write command that carries a KV; the KV includes a value pointer and the key of the value.
  • the write command is sent to the NVMe memory through the host's NVMe interface.
  • the key and the value belong to the same KV pair.
  • the value pointer can be carried in the payload or in the command header. Similarly, other data carried in a write command or a read command can be carried either in the command header or in the payload.
  • the write command may also carry a key pointer without carrying a key.
  • The scheme of obtaining the key according to the key pointer follows the same principle as obtaining the value through the value pointer, so the details are not described again. The following takes a write command that carries the key itself as an example.
  • the value pointer directly or indirectly points to the value storage space (for convenience of description, hereinafter also referred to as the first storage space), so the value can be obtained by the value pointer.
  • If the value pointer points directly to the value storage space (which is also regarded as the value pointer pointing to the value), the value can be obtained from the storage space pointed to by the value pointer.
  • If the value pointer points to a first pointer, and the first pointer points to the storage space of the value, the first pointer is found from the value pointer, and the value is obtained from the storage space pointed to by the first pointer. In the latter case, a larger value can be carried.
  • The references between the pointers may have more levels, as long as the value storage space can eventually be found. For example, the value pointer points to the first pointer, the first pointer points to the A1 pointer, the A1 pointer points to the A2 pointer, ..., the AN-1 pointer points to the AN pointer, and the AN pointer points to the value storage space, where N is an integer greater than or equal to 2.
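  • Following a value pointer through any number of intermediate pointers can be sketched as a simple loop. The dictionary-based address space, the `("ptr", ...)` / `("data", ...)` tagging, and the depth limit are assumptions introduced for this example.

```python
def resolve(space: dict, ptr: int, max_depth: int = 16) -> bytes:
    """Follow ptr through intermediate pointers until a data cell is reached."""
    for _ in range(max_depth):
        kind, payload = space[ptr]
        if kind == "data":
            return payload   # reached the value storage space
        ptr = payload        # payload is the next pointer in the chain
    raise ValueError("pointer chain too deep")

# value pointer (0x10) -> first pointer (0x20) -> A1 pointer (0x30) -> value
space = {
    0x10: ("ptr", 0x20),
    0x20: ("ptr", 0x30),
    0x30: ("data", b"value"),
}
direct = resolve(space, 0x30)    # pointer that points straight at the value
indirect = resolve(space, 0x10)  # multi-level chain
```

  • The depth limit is only a safeguard against a cyclic chain in this sketch; the embodiment itself places no such limit on the number of levels.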
  • the value pointer can be carried in the command header or payload of the write command.
  • the first storage space may be described by a first address and a length of the first storage space, or may be described by a first address and a last address of the first storage space.
  • the value pointer records the first address and the value length of the first storage space; or, the value pointer points to another pointer, and the other pointer records the first address and the value length of the first storage space.
  • The first storage space can be a logical location or a physical location, as long as the memory can use the first storage space to read the stored data from the storage medium.
  • The write command can also carry the number of KV, which is used to describe the number of KVs.
  • the write command further carries the KV Format ID
  • the NVMe command for transmitting the KV may have multiple command formats.
  • the fields may be different, and the positions of the fields may also be different.
  • the KV Format ID indicates the command format used by this command.
  • The position of the key in the command header of the write command may be unfixed or may be a preset fixed position.
  • the write command further carries a metadata pointer. Similar to the value pointer, the metadata pointer points directly or indirectly to the metadata storage space, so the metadata can be obtained through the metadata pointer.
  • the metadata pointer can be carried in the command header or payload of the write command.
  • the metadata pointer directly or indirectly points to the storage space of the metadata, and the metadata can be obtained through the metadata pointer.
  • If the metadata pointer points directly to the metadata (specifically, to the storage space of the metadata), the metadata can be obtained from the storage space pointed to by the metadata pointer.
  • If the metadata pointer points to a second pointer, and the second pointer points to the storage space of the metadata, the second pointer is found from the metadata pointer, and the metadata is obtained from the storage space pointed to by the second pointer. In the latter case, longer metadata can be carried.
  • the references between the pointers may have more levels, as long as the metadata storage space can be found, for example, the metadata pointer points to the second pointer, the second pointer points to the B1 pointer, and the B1 pointer points to The B2 pointer, ..., the BN-1 pointer points to the BN pointer, the BN pointer points to the storage space of the metadata, and N is an integer greater than or equal to 2.
  • Alternatively, the metadata may be carried directly in the command instead of being transmitted by means of a pointer.
  • the value pointer can be a Data Pointer (DPTR).
  • The metadata pointer can be a Metadata Pointer (MPTR).
  • each command may carry one KV, or may carry two or more KVs.
  • the pointer in the command does not point directly to value or metadata, but points to the PRP entry or SGL entry.
  • the PRP entry or SGL entry is the node in the linked list.
  • Each PRP/SGL entry points to another address; the address pointed to by the PRP/SGL entry stores the value or metadata.
  • the write command points to value or metadata through a secondary pointer.
  • the key is carried in the command, specifically in the command header.
  • FIG. 10 differs in that the key of FIG. 10 is not directly carried in the command but is transmitted by means of a pointer; for how the key pointer is carried, refer to the description of the value pointer.
  • the key pointer can be carried in the command header of the command.
  • DPTR points to PRP/SGL entry1, and PRP/SGL entry2 and PRP/SGL entry1 belong to the same linked list, so PRP/SGL entry2 can be found by finding PRP/SGL entry1. Therefore, DPTR can also be considered to point to both PRP/SGL entry1 and PRP/SGL entry2.
  • PRP/SGL entry1 points to value1 and PRP/SGL entry2 points to value2.
  • MPTR points to PRP/SGL entry3 and PRP/SGL entry4, PRP/SGL entry3 points to metadata1, and PRP/SGL entry4 points to metadata2.
  • PRP entry and SGL entry are nodes in the linked list.
  • PRP is suitable for normal NVMe (using PCIe to connect host and memory)
  • SGL is suitable for NOF (connecting host and memory with Fabric).
  • RDMA can use PRP
  • DMA can use PRP or SGL.
  • the PRP transmits the location of the data to the NVMe device through a series of pointers to the memory page. After receiving the addresses, the NVMe device can read the data from the host to the NVMe device through DMA.
  • the SGL transport mechanism is more flexible than PRP. It can specify the length of the transmission and can skip some address space for transmission during the continuous transmission of the address.
  • the SGL transmits the address that needs to transmit data to the NVMe device. After receiving the addresses, the NVMe device reads the data from the host to the NVMe device through DMA.
  • the memory receives the write command through the NVMe interface.
  • the memory obtains the length of the value through the value pointer, allocates storage space according to the value length (for convenience of description, hereinafter referred to as the second storage space), and transmits the location information of the allocated storage space to the host.
  • the memory sends a transfer request (first transfer request) to the host, and obtains a value from the first storage space of the host.
  • the host and memory are PCIe connections, and the first transfer request can be a DMA transfer request, such as a single DMA transfer request.
  • the host and the memory are fabric connections, and the first transfer request may be an RDMA transfer request, such as a single RDMA transfer request.
  • the first transfer request uses the address of the first storage space as the access address, and the address of the second storage space as the write address.
  • the access address may be: a first address of the first storage space + a value length
  • the write address may be: a first address of the second storage space (or a first address of the second storage space + value length).
  • the memory sends a DMA/RDMA transfer request to the host. After receiving the transfer request, the host performs a DMA/RDMA transfer and sends the value to the memory.
  • Pre-allocating the second storage space is optional.
  • the memory may also skip pre-allocating the second storage space, obtain the value directly from the host, and then allocate the second storage space for the value. In this case, the first transfer request does not carry the write address.
  • the storage space (fourth storage space) is also allocated for the metadata. Since the processing of the metadata pointer is similar to the value pointer, only the value pointer is introduced later.
  • a storage space (second storage space) for storing the value is allocated in the memory according to the value length, and the second storage space is not less than the value length.
  • the second storage space may be described by the first address + the last address, or may be described by the first address + value length, or may be described only by the first address.
  • the length of the value can be determined by the value storage space. For example, in the write command, the first storage space is described by the start address + value length of the space in which the value is stored, and the value length can be obtained directly from the write command.
  • the memory is composed of a controller and a storage medium; the controller includes a processor, and the storage medium is, for example, flash memory or a magnetic disk.
  • the command header, KV format ID, key, and key length fields are described in step 12; how to obtain them and how to use them to obtain the corresponding information is not described again here.
  • the pointer is used to obtain value and/or metadata.
  • the value pointer carried in the write command directly points to the first storage space, as in the command format of Figure 9.
  • the pointer DPTR in the write command points to the value storage space (the first storage space).
  • the pointer DPTR points to a range of memory addresses of the host, and the range of memory addresses stores the value.
  • the value is obtained from the host memory address by DMA or RDMA.
  • metadata can be obtained through MPTR.
  • the value pointer in the write command indirectly points to value, refer to the write command format of Figure 10: the value pointer points directly to another pointer, and the other pointer points to value.
  • the pointer DPTR in the write command points to the PRP/SGL entry, the PRP/SGL entry points to the host memory address, and the host memory address is the value storage space.
  • the memory first finds the PRP/SGL entry through DPTR, then finds the value storage space in the host memory through the PRP/SGL entry, and obtains the value from the host memory in DMA or RDMA manner.
  • the way to get the metadata is the same as the way the value is obtained. If the metadata is directly carried in the write command, the metadata can be directly obtained from the write command by referring to step 12.
  • the memory sends a first transfer request to the host, and obtains a value from the first storage space of the host.
  • the first transfer request may be a DMA transfer request; if the host and memory are fabric connections, the first transfer request may be an RDMA transfer request.
  • the method may be divided into two steps: (1) the memory sends a transfer request to the host; the transfer request carries the first storage space as the access address and the second storage space as the write address.
  • the first storage space carried in the transmission request may be described by the first address of the first storage space and the length of the value; the second storage space is described by the first address of the second storage space.
  • (2) after receiving the transfer request, the host reads the data from the first storage space and sends it to the second storage space of the memory for storage.
  • key, value, and metadata can be transmitted through a single DMA/RDMA transmission. You can also transfer key, value, and metadata separately.
  • step 23 the memory stores the value.
  • the controller of the memory may store the key and value in a storage medium of the memory. If metadata was also obtained in step 22, the metadata is stored as well.
  • the value is successively stored in the storage medium of the memory.
  • mapping relationship between the key and value storage spaces can also be recorded.
  • the value storage space in FIG. 6 is described by the first address + value length.
  • the mapping relationship is recorded in the KV management unit.
  • the value length and the length of the metadata are optional, as both the value and the metadata can be fixed lengths.
  • the value is transmitted as a pointer.
  • the key may also be transmitted by using a pointer (direct pointer or indirect pointer).
  • steps 21-23 are the execution process of the write command
  • steps 24-26 are the execution process of the read command, which are independent of each other.
  • the KV requested by the read command and the KV written by the write command may not be the same.
  • the command format of the read command and the write command are similar, except that the read command does not carry the value pointer and the metadata pointer.
  • the host generates a read command and sends the command to the memory, where the read command carries the key.
  • the key can be carried in the command header of the read command.
  • the host that generates the read command and the host that generates the write command can be the same host or different hosts.
  • the format of the read command refers to the format of the write command in step 11 and FIG. 2, and the read command includes a command header, a KV format ID, and a key.
  • the read command carries a key or a key pointer.
  • a read command can carry a key or carry at least two keys. When carrying at least two keys, the number of KV field may be carried to describe the number of keys carried.
  • the read command requests readout of value and/or metadata.
  • the read command may have no value field and value position information, and the read command may have no metadata field and metadata location information.
  • the memory receives the read command through the NVMe interface and obtains the key from the read command. Using the obtained key, it finds the storage space of the value in the mapping relationship stored in the memory and sends it to the host.
  • the storage space of value in the memory is the second storage space.
  • the way to obtain the key from the read command is slightly different.
  • the key can be obtained from a fixed position of the read command; if the position of the key in the read command is not fixed, the key can be obtained from the read command according to the position information carried in the read command.
  • the storage space of the metadata may also be obtained and sent to the host.
  • the host allocates storage space (named as the third storage space) according to the value length.
  • the host constructs a transfer request (second transfer request), sends a transfer request to the memory, and obtains a value from the second storage space of the memory.
  • the second transfer request may be a DMA transfer request, such as a single DMA transfer request; the host and memory are fabric connections, and the second transfer request may be an RDMA transfer request, such as a single RDMA transfer request.
  • the access address may be: the first address of the second storage space + value length
  • the write address may be the first address of the third storage space (or the first address + value length of the third storage space). It can be seen from the foregoing steps that the second storage space is used to store the value, so the value length and the second storage space are the same size. That is to say, the size of the second storage space is equal to the size of the third storage space.
  • the length of the value can be determined by the value storage space.
  • the second storage space is described by the start address + value length of the space in which the value is stored, and the value length can be obtained directly from the write command.
  • the host sends a DMA/RDMA transfer request to the memory. After receiving the transfer request, the memory performs a DMA/RDMA transfer and sends the value to the host.
  • Pre-allocating the third storage space is optional.
  • the host may also skip pre-allocating the third storage space, obtain the value directly from the memory, and then allocate the third storage space for the value. In this case, the transfer request does not carry a write address.
  • the host can allocate storage space for metadata according to the length of the metadata. And get the metadata from the memory.
  • the host stores the received value into the third storage space. For example, starting from the first address of the third storage space, the value is successively stored in the host.
  • the third storage space is, for example, in the memory of the host.
  • the metadata is stored in the host using a similar method of storing the value in the host.
  • the steps 26 and 27 may be modified without using DMA or RDMA: the host allocates a storage space to the value, and the memory reads the value from the second storage space. The read value is written to the third storage space.
  • the NVMe read command that the NVMe memory receives from the host may further carry the free space information of the host, which describes the size of the contiguous free storage space in the host.
  • after the NVMe memory receives the NVMe read command from the host, the method further includes: the NVMe memory determines whether the free storage space of the host is greater than or equal to the second storage space; if yes, it performs the step of sending the location information of the second storage space to the host; if not, the procedure ends.
  • a continuous free storage space in the host may be referred to as the third storage space.
  • step 21 and step 23 are introduced by taking the value pointer and the metadata pointer in the same command as an example.
  • step 31 the host constructs a write command.
  • the write command carries the key.
  • the write command carries a value source address list and a metadata source address list.
  • the source address list has the function of a pointer: the source address lists of the value and the metadata point to the storage space of the value and the storage space of the metadata, respectively.
  • step 32 the host sends a write command to the NVMe device.
  • Step 33 After receiving the write command, the NVMe device parses out the value source address list and the metadata source address list from the write command.
  • the linked list is, for example, a PRP entry or an SGL entry.
  • Step 34 The NVMe device parses the value length from the value source address list, and parses the metadata length from the source address list of the metadata.
  • the source address list describes the storage space of the data to be read, so the value length and the length of the metadata can be parsed therefrom.
  • the NVMe device sends a DMA or RDMA request to the host.
  • the request carries a value source address list, a metadata source address list, a value destination address list, and a metadata destination address list.
  • the value destination address list describes the storage space reserved by the NVMe device for storing the value.
  • the metadata destination address list describes the storage space reserved by the NVMe device for storing metadata.
  • Step 36 After receiving the DMA/RDMA request, the host sends the value and the metadata to the NVMe device according to the source addresses and the destination addresses of step 35.
  • step 37 the NVMe device stores the value and the metadata in the medium.
  • the NVMe device records the storage space of the value and the storage space of the metadata.
  • the storage space and the key are mapped so that the value storage space and the metadata storage space can be looked up later using the key.
  • the mapping relationship can be stored in the KV management unit, which can also record the value length and the metadata length.
  • the key is directly carried in the command.
  • the command may also carry a key pointer without carrying a key.
  • after the memory receives the command, it obtains the key according to the key pointer.
  • for the specific steps of obtaining the key by using the key pointer, refer to step 22 for obtaining the value; because the principle is similar, it is not described again here.
  • steps 24-27 are described by taking the value pointer and the metadata pointer in the same command as an example.
  • step 41 the host constructs a read command, and the read command carries a key.
  • step 42 the host sends the read command carrying the key to the NVMe device.
  • Step 43 the NVMe device looks up the value length and the metadata length corresponding to the key in the KV management unit.
  • the KV management unit records the value storage space and the metadata storage space.
  • the value is stored in the NVMe device, and the value storage space is described by the value first address + the last address, or by the first address + value length. Therefore, after obtaining the value storage space, the value length can be obtained. Similarly, you can get the metadata length.
  • Step 44 The NVMe device sends a response message to the host, where the response message carries a value length and a metadata length.
  • step 45 the host allocates storage space for the value and the metadata according to the value length and the metadata length, for later storing the value and the metadata.
  • Step 46 The host sends a DMA/RDMA request to the NVMe device, where the request carries the source address and the destination address of the current transmission.
  • the source address is the value storage space and the metadata storage space in the NVMe.
  • the destination address is the storage space reserved by the host for the value and the storage space reserved for the metadata.
  • Step 47 After receiving the DMA/RDMA request from the host, the NVMe device sends the value and the metadata according to the DMA/RDMA protocol.
  • step 48 the host receives the value and the metadata.
  • step 49 the host stores the received value and metadata, for example, stored in the host memory.
  • DMA or RDMA is usually used to read data in a continuous address space.
  • if the storage space for storing the value and the storage space for storing the metadata are not contiguous, the transfer can be split into two DMA/RDMA transfers.
  • the present invention further provides an embodiment based on steps 21-23 that improves upon the embodiment based on Figure 12.
  • the improvement is that the key, value, and metadata of the same command are obtained by dividing into three DMA/RDMA transfers.
  • if any one of the transfers fails, the KV transfer is considered unsuccessful; the data that was already transferred successfully can therefore be deleted, an error code is reported, and the subsequent data transfers are cancelled. The storage space previously allocated for the KV (key, value, and metadata) can also be released.
  • step 51 an NVMe write command is constructed.
  • the write command carries the key.
  • the write command carries a value source address list and a metadata source address list.
  • the source address list has the function of a pointer: the source address lists of the value and the metadata point to the storage space of the value and the storage space of the metadata, respectively.
  • step 52 the host sends a write command to the NVMe device.
  • Step 53 After receiving the write command, the NVMe device parses the value source address list and the source address list of the metadata from the write command.
  • the linked list is, for example, a PRP entry or an SGL entry.
  • Step 54 The NVMe device parses the key length from the key source address list, parses the value length from the value source address list, and parses the metadata length from the metadata source address list. Reserve storage space for key, value, and metadata.
  • the source address list describes the storage space of the data to be read, so the value length and the length of the metadata can be parsed therefrom.
  • step 55 the NVMe device sends a DMA/RDMA request to the host.
  • the request carries a key source address list and a key destination address list.
  • the key destination address list describes the storage space reserved by the NVMe device for storing keys.
  • Step 56 After receiving the DMA/RDMA request, the host sends a key to the NVMe device according to the source address and the destination address of step 55.
  • step 57 if the key transmission fails, the NVMe device cancels the transmission of the subsequent value and metadata. Optionally, the storage space reserved for key, value, and metadata is released. Report the error code to the host. If the key transfer is successful, then proceed to the next step 58.
  • the NVMe device sends a DMA/RDMA request to the host.
  • the request carries a value source address list and a value destination address list.
  • the value destination address list describes the storage space reserved by the NVMe device for storing the value.
  • Step 59 After receiving the DMA/RDMA request, the host sends the value to the NVMe device according to the source address and the destination address of step 58.
  • Step 60 If the value transmission fails, the NVMe device cancels the transmission of the subsequent metadata. Optionally, free up storage space reserved for value and metadata. Optionally, delete the key that has been transmitted. Report the error code to the host. If the value transfer is successful, proceed to the next step 61.
  • the NVMe device sends a DMA/RDMA request to the host.
  • the request carries a metadata source address list and a metadata destination address list.
  • the metadata destination address list describes the storage space reserved by the NVMe device for storing the metadata.
  • Step 62 After receiving the DMA/RDMA request, the host sends the metadata to the NVMe device according to the source address and the destination address of step 61.
  • step 63 if the metadata transmission fails, optionally, the storage space reserved for the metadata is released. Optionally, delete the key and value that have been transmitted. Report the error code to the host. If the metadata transfer is successful, then proceed to the next step 64.
  • Step 64 storing the value and metadata to the memory.
  • the values and metadata obtained in steps 60 and 63 are in memory.
  • the value and metadata are stored in a non-volatile medium, such as a hard disk of an NVMe device.
  • step 65 the storage space of the value and the storage space of the metadata are recorded.
  • step 55 to step 63 of FIG. 14 the key, value and metadata are respectively obtained through different DMA transmissions. If any one of the data transmission fails, the transmission of the remaining data is cancelled, and the successfully transmitted data can be deleted.
  • the remaining steps of FIG. 14 are similar to those of FIG. 12, and have been previously described, and are not described herein again.
  • value and metadata can be transmitted separately.
  • Value (or metadata) can also be transmitted multiple times, for example by dividing a value into more than two DMA/RDMA transfers. If any transmission fails, it is determined that the entire KV transmission fails, and the storage space reserved for key, value, and metadata is released.
  • the memory sends at least two DMA transfer requests to the host to obtain the value, and each DMA transfer obtains a portion of the value.
  • the memory sends at least two RDMA transfer requests to the host to obtain the value, each RDMA transfer obtaining a portion of the value.
  • the value (or metadata) is divided into multiple transmission schemes, which can be used in combination with the scheme of separately transmitting the value and metadata described in steps 55-63. In this case, if any transmission fails (or the transmission request fails to execute), it is determined that the entire KV transmission fails, and the storage space reserved for key, value, and metadata is released.
  • the flow of Figure 14 describes the manner in which write commands are handled.
  • the initiator of a DMA/RDMA transfer request is an NVMe device.
  • a similar scheme can also be adopted, and the key, value, and metadata are respectively acquired through different DMA transmissions.
  • the difference is that the initiator of the transfer request is the host, and the NVMe device transmits the key, value and metadata by DMA/RDMA.
  • the present invention provides a data processing system including a host device 80 and a storage device 90.
  • the host device 80 and the storage device 90 are connected by PCIe or Fabric.
  • the host device 80 can read or write the KV data to the storage device 90.
  • the host device 80 can be a physical device or a logical device, including a transceiver module 801, a management module 802, and a cache module 803.
  • the storage device 90 may be a physical device or a logical device, and includes an interface module 901, a processing module 902, and a storage module 903.
  • the transceiver module 801 and the interface module 901 communicate.
  • the host device 80 has the functions of the aforementioned host, and the storage device 90 has the functions of the aforementioned memory.
  • the storage device 90 includes: an interface module 901, configured to receive an NVMe write command sent by the host, where the NVMe write command carries a key and a value pointer, the value pointer points to the first storage space in the host, the first storage space is used to store a value, and the key and the value belong to the same KV pair; a processing module 902, configured to obtain the key from the NVMe write command, obtain a value length according to the value pointer, and allocate a second storage space to the value according to the value length; and a storage module 903, configured to send a first transfer request to the host, obtain the value from the host, and store the value in the second storage space.
  • the second storage space may be provided by a storage medium in the NVMe storage, the storage medium being coupled to a processor of the NVMe storage. Based on the scheme, in the process of transferring KV data from the host to the NVMe memory, it is not necessary to convert the KV data into a block form, thereby improving the storage efficiency of the KV data.
  • the storage device 90 includes: an interface module 901, configured to receive an NVMe write command, where the header of the NVMe write command carries a key, the NVMe write command further carries a value, the key corresponds to the value, and the key and the value belong to the same KV pair; a processing module 902, configured to obtain the key and the value from the NVMe write command; and a storage module 903, configured to save the value in the NVMe memory.
  • the host device 80 includes: a transceiver module 801, configured to send an NVMe read command to the storage device 90, where the NVMe read command carries a key; the transceiver module 801 is further configured to receive, from the NVMe memory, a response message to the NVMe read command, where the response message carries the location information of the second storage space; a management module 802, configured to reserve a third storage space for the value according to the value length in the value location information; the transceiver module 801 is further configured to send a transfer request to the NVMe memory, where the access address of the transfer request is the second storage space, and to obtain the value from the NVMe memory; and a cache module 803, configured to store the value in the third storage space of the host device 80. Based on this module structure, the host device can read the value and/or metadata from the storage device.
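The host-side read sequence performed by modules 801 to 803 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class names `Device` and `HostReader`, the method names, and the in-memory `bytearray` buffers are hypothetical stand-ins for the NVMe memory, the host modules, and a DMA/RDMA transfer; none of them come from the patent text.

```python
class Device:
    """Hypothetical stand-in for the NVMe memory (storage device 90)."""

    def __init__(self, media: bytearray, mapping: dict):
        self.media = media        # storage medium
        self.mapping = mapping    # key -> (first address, value length)

    def read_command(self, key: bytes):
        # The response carries the location information of the second storage space.
        return self.mapping.get(key)

    def transfer(self, addr: int, length: int) -> bytes:
        # Stand-in for serving a DMA/RDMA transfer request.
        return bytes(self.media[addr:addr + length])


class HostReader:
    """Hypothetical stand-in for host device 80."""

    def __init__(self, device: Device, buffer_size: int):
        self.device = device
        self.buffer = bytearray(buffer_size)  # host memory
        self.next_free = 0

    def read(self, key: bytes):
        # Transceiver role: send the read command and receive the response.
        location = self.device.read_command(key)
        if location is None:
            raise KeyError(key)
        addr, length = location
        # Management role: reserve the third storage space by value length.
        if self.next_free + length > len(self.buffer):
            raise MemoryError("host buffer too small")
        dst = self.next_free
        self.next_free += length
        # Transceiver + cache roles: request the data and store it.
        self.buffer[dst:dst + length] = self.device.transfer(addr, length)
        return dst, length


device = Device(bytearray(b"xxvalue-1yy"), {b"k1": (2, 7)})
reader = HostReader(device, buffer_size=64)
dst, length = reader.read(b"k1")
```

After `reader.read(b"k1")`, the bytes held in `reader.buffer[dst:dst + length]` are the value that the simulated device stored for that key.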

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

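A data processing method and a storage apparatus. The data processing method includes: an NVMe memory receives an NVMe write command sent by a host, where the NVMe write command carries a key and a value pointer, the value pointer points to a first storage space, and the first storage space is used to store a value; the NVMe memory obtains the key from the NVMe write command and obtains a value length, and allocates a second storage space to the value according to the value length, where the second storage space is in the NVMe memory; and the NVMe memory obtains the value from the host and saves the value in the second storage space.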

Description

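Data processing method and NVMe memory
Technical Field
Embodiments of the present invention relate to the storage field, and in particular to the NVMe field.
Background
The NVMe (Non-Volatile Memory Express) protocol is a high-speed interface protocol used in storage systems. Compared with the SCSI protocol, the NVMe protocol provides faster read/write speeds and lower latency, and it is receiving growing attention and adoption in the industry.
With the development of information technology, object storage technology is frequently used. A common object storage technology is key-value (KV) storage. In the prior art, because an NVMe device supports only a block interface, a host that needs to store KV data in an NVMe memory proceeds as follows: the host converts the KV command (generally consisting of a key, a value, and metadata) into block data (for example, by splitting or merging one KV command into one block or at least two blocks of data); the host allocates LBA addresses to the block data; the host sends the block data to the NVMe memory; and after receiving the block data, the NVMe memory stores the blocks one by one according to the allocated LBA addresses.
However, in the foregoing steps, converting the KV data into block data and allocating LBA addresses to the block data consume a large amount of the host's computing resources, which degrades the performance of the storage system and also affects the operating efficiency of the host and the storage controller.
Summary
The present invention provides solutions for a data processing method, an NVMe memory, and a storage system, which can improve the efficiency of writing KV data into the NVMe memory. In some solutions, the efficiency of reading KV data out of the NVMe memory is correspondingly improved as well.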
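According to a first aspect of the embodiments of the present invention, a data processing method is provided. The method includes: a Non-Volatile Memory Express (NVMe) memory receives an NVMe write command sent by a host, where the NVMe write command carries a key and a value pointer, the value pointer points to a first storage space in the host, the first storage space is used to store a value, and the key and the value belong to the same KV pair; the NVMe memory obtains the key from the NVMe write command, obtains a value length according to the value pointer, and allocates a second storage space to the value according to the value length, where the second storage space is in the NVMe memory; and the NVMe memory sends a first transfer request to the host, obtains the value from the host, and saves the value in the second storage space. Based on this solution, the KV data does not need to be converted into block form while it is transferred from the host to the NVMe memory, which improves the storage efficiency of KV data.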
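The write flow of the first aspect can be sketched as a small in-memory simulation. It is a minimal sketch, not the patented implementation: `Host`, `WriteCommand`, `NvmeKvDevice`, the bump-pointer allocation, and the `dma_read` method are hypothetical names chosen for illustration, and a plain byte copy stands in for the DMA/RDMA transfer.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    memory: bytearray  # host RAM; the value lives here (first storage space)

    def dma_read(self, addr: int, length: int) -> bytes:
        # Serve a transfer request: return `length` bytes starting at `addr`.
        return bytes(self.memory[addr:addr + length])


@dataclass
class WriteCommand:
    key: bytes
    value_addr: int  # value pointer: first address of the first storage space
    value_len: int   # value length that describes the first storage space


@dataclass
class NvmeKvDevice:
    media: bytearray = field(default_factory=lambda: bytearray(1024))
    next_free: int = 0
    mapping: dict = field(default_factory=dict)  # key -> (first address, length)

    def handle_write(self, cmd: WriteCommand, host: Host) -> None:
        # Allocate the second storage space according to the value length.
        dst = self.next_free
        if dst + cmd.value_len > len(self.media):
            raise MemoryError("not enough space on the device")
        self.next_free += cmd.value_len
        # First transfer request: access address on the host,
        # write address on the device.
        data = host.dma_read(cmd.value_addr, cmd.value_len)
        self.media[dst:dst + cmd.value_len] = data
        # Record where the value was stored so it can be read back by key.
        self.mapping[cmd.key] = (dst, cmd.value_len)


host = Host(memory=bytearray(b"....hello-kv...."))
dev = NvmeKvDevice()
dev.handle_write(WriteCommand(key=b"k1", value_addr=4, value_len=8), host)
```

Note that the host never builds blocks or assigns LBA addresses in this sketch; it only hands over a key and a description of where the value sits in its own memory.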
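In a first possible implementation of the first aspect, that the NVMe memory sends the first transfer request to the host and obtains the value from the host specifically includes: the NVMe memory sends a DMA transfer request to the host and obtains the value from the host, where the DMA request carries the first storage space as the access address and the second storage space as the write address, and the NVMe memory and the host are connected by a PCIe bus. This solution provides a way of transferring the value for storage by using DMA.
In a second possible implementation of the first aspect, that the NVMe memory sends the first transfer request to the host and obtains the value from the host specifically includes: the memory sends an RDMA transfer request to the host and obtains the value from the host, where the RDMA request carries the first storage space as the access address and the second storage space as the write address, and the NVMe memory and the host are connected by a Fabric bus. This solution provides a way of transferring the value for storage by using RDMA.
In a third possible implementation of the first aspect, the NVMe write command further carries a KV-number field, which describes the number of KVs in the NVMe write command; the NVMe memory obtains from the NVMe write command as many keys as the KV number indicates, and as many values. This solution supports carrying multiple KVs in the same NVMe write command.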
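To illustrate the KV-number field, the following sketch packs and parses several KV descriptors in one command payload. The byte layout is an assumption made purely for the example (a 1-byte KV count, then per KV a 1-byte key length, the key, an 8-byte value address, and a 4-byte value length, little-endian); the patent text does not define these sizes.

```python
import struct

DESCRIPTOR = "<QI"  # assumed: 8-byte value address + 4-byte value length


def pack_kvs(entries):
    """entries: list of (key bytes, value address, value length)."""
    out = bytearray([len(entries)])  # KV-number field
    for key, addr, length in entries:
        out += struct.pack("<B", len(key)) + key
        out += struct.pack(DESCRIPTOR, addr, length)
    return bytes(out)


def parse_kvs(payload: bytes):
    """Read exactly as many keys and value descriptors as the count says."""
    count = payload[0]
    pos, result = 1, []
    for _ in range(count):
        key_len = payload[pos]
        pos += 1
        key = payload[pos:pos + key_len]
        pos += key_len
        addr, length = struct.unpack_from(DESCRIPTOR, payload, pos)
        pos += struct.calcsize(DESCRIPTOR)
        result.append((key, addr, length))
    return result


entries = [(b"k1", 0x1000, 16), (b"key2", 0x2000, 32)]
parsed = parse_kvs(pack_kvs(entries))
```

The count field is what lets the parser know when to stop, so keys of different lengths can share one command.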
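In a fourth possible implementation of the first aspect, the NVMe write command further carries a KV-format field, which describes the structure of the fields in the NVMe write command; the NVMe memory obtains each field from the NVMe write command according to the field content defined by the KV-format field. This solution enables the same NVMe device to support NVMe write command messages that carry KVs in multiple formats, with one of those formats used in any particular write operation. Similarly, in the read command flow, the same NVMe device can support NVMe read command messages that carry KVs in multiple formats.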
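One way to picture the KV-format field is as a format ID that selects a parser. The sketch below assumes two made-up formats (ID 0 carries a value pointer, ID 1 carries the value inline) and a made-up byte layout; both are illustrative assumptions and not formats defined by the patent.

```python
import struct


def parse_pointer_format(body: bytes) -> dict:
    # Hypothetical format 0: key length, key, value address, value length.
    key_len = body[0]
    key = body[1:1 + key_len]
    addr, length = struct.unpack_from("<QI", body, 1 + key_len)
    return {"key": key, "value_addr": addr, "value_len": length}


def parse_inline_format(body: bytes) -> dict:
    # Hypothetical format 1: key length, key, value length, value bytes.
    key_len = body[0]
    key = body[1:1 + key_len]
    (value_len,) = struct.unpack_from("<I", body, 1 + key_len)
    start = 1 + key_len + 4
    return {"key": key, "value": body[start:start + value_len]}


PARSERS = {0: parse_pointer_format, 1: parse_inline_format}


def parse_command(command: bytes) -> dict:
    # The first byte plays the role of the KV-format field.
    format_id, body = command[0], command[1:]
    try:
        parser = PARSERS[format_id]
    except KeyError:
        raise ValueError(f"unsupported KV format ID {format_id}") from None
    return parser(body)


cmd0 = bytes([0, 2]) + b"k1" + struct.pack("<QI", 0x1000, 16)
cmd1 = bytes([1, 2]) + b"k2" + struct.pack("<I", 3) + b"abc"
```

Supporting another layout then only means registering one more parser under a new format ID.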
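In a fifth possible implementation of the first aspect, the NVMe write command further carries a metadata pointer of the value, and the method further includes: the NVMe memory obtains a metadata length according to the metadata pointer and allocates a fourth storage space to the metadata according to the metadata length, where the fourth storage space is in the NVMe memory; and the NVMe memory obtains the metadata from the host through the first transfer request and saves the metadata in the fourth storage space. This solution describes specifically how the metadata in a KV is stored.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation, after the metadata transfer fails, the NVMe memory releases the storage space allocated to the value and releases the fourth storage space allocated to the metadata. With this solution, the resources occupied by the value can be released promptly after the metadata transfer fails.
In a seventh possible implementation of the first aspect, that the NVMe memory sends the first transfer request to the host and obtains the value from the host specifically includes one of the following: the memory sends at least two DMA transfer requests to the host to obtain the value, where each DMA transfer request requests a part of the value, and if any DMA transfer request fails to execute, the NVMe memory releases the storage space allocated to the value, where the NVMe memory and the host are connected by a PCIe bus; or the memory sends at least two RDMA transfer requests to the host to obtain the value, where each RDMA transfer request requests a part of the value, and if any RDMA transfer request fails to execute, the NVMe memory releases the storage space allocated to the value, where the NVMe memory and the host are connected by a Fabric. This solution allows the same value to be transferred in multiple RDMA transfer requests, which reduces the amount of data in each RDMA transfer.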
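The split transfer with release on failure might look like the following sketch. `Allocator`, `TransferError`, `fetch_value_in_chunks`, and the reader callbacks are hypothetical names; the reader callback stands in for one DMA or RDMA transfer request, and the simple allocator does not reuse freed addresses.

```python
class TransferError(Exception):
    """Raised when a simulated DMA/RDMA transfer request fails."""


class Allocator:
    def __init__(self):
        self.allocated = {}  # first address -> length
        self.next_free = 0

    def alloc(self, length: int) -> int:
        addr = self.next_free
        self.allocated[addr] = length
        self.next_free += length
        return addr

    def free(self, addr: int) -> None:
        self.allocated.pop(addr, None)


def fetch_value_in_chunks(read_chunk, src_addr, value_len, chunk_size,
                          allocator, media):
    """Fetch a value in several transfers; release the space if any fails."""
    dst = allocator.alloc(value_len)
    try:
        for offset in range(0, value_len, chunk_size):
            n = min(chunk_size, value_len - offset)
            # Each call requests only a part of the value.
            media[dst + offset:dst + offset + n] = read_chunk(src_addr + offset, n)
    except TransferError:
        allocator.free(dst)  # release the space allocated to the value
        raise
    return dst


host_mem = bytearray(b"0123456789")


def good_reader(addr, n):
    return bytes(host_mem[addr:addr + n])


def flaky_reader(addr, n):
    # Simulate a failure on every request that starts at address 4 or later.
    if addr >= 4:
        raise TransferError("simulated transfer failure")
    return bytes(host_mem[addr:addr + n])


allocator = Allocator()
media = bytearray(32)
dst_ok = fetch_value_in_chunks(good_reader, 0, 10, 4, allocator, media)
```

Calling the same function with `flaky_reader` raises `TransferError` after the first chunk and leaves no allocation behind for the failed value.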
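In an eighth possible implementation of the first aspect, the method further includes: generating a mapping relationship between the key and the second storage space. This mapping relationship provides the basis for subsequent reading of the KV (in particular the value).
In a ninth possible implementation of the first aspect, after the method, the following is further included: the NVMe memory receives an NVMe read command from the host, where the NVMe read command carries the key. In addition, the following may be further included: the NVMe memory queries the mapping relationship according to the key, obtains location information of the second storage space that stores the value, and sends the location information of the second storage space to the host; and the NVMe memory receives a second transfer request sent by the host, where the second transfer request requests the data stored in the second storage space. This implementation enables reading of the KV (or the value).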
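The device side of the read path can be sketched as a lookup followed by serving the transfer request. The names `KvIndex`, `handle_read`, `serve_transfer`, and the status strings are hypothetical and exist only for this illustration.

```python
class KvIndex:
    """Mapping relationship between a key and the second storage space."""

    def __init__(self):
        self._map = {}

    def record(self, key: bytes, first_addr: int, length: int) -> None:
        self._map[key] = (first_addr, length)

    def locate(self, key: bytes):
        return self._map.get(key)


def handle_read(index: KvIndex, key: bytes) -> dict:
    """Answer a read command with the location information for the key."""
    location = index.locate(key)
    if location is None:
        return {"status": "KEY_NOT_FOUND"}  # hypothetical status string
    first_addr, length = location
    return {"status": "OK", "addr": first_addr, "len": length}


def serve_transfer(media: bytearray, addr: int, length: int) -> bytes:
    """Serve the second transfer request: return the stored bytes."""
    return bytes(media[addr:addr + length])


media = bytearray(b"....value-A.....")
index = KvIndex()
index.record(b"k1", 4, 7)

response = handle_read(index, b"k1")
data = serve_transfer(media, response["addr"], response["len"])
```

The read command itself carries only the key; the location information travels back in the response and is then used as the access address of the second transfer request.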
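In a tenth possible implementation of the first aspect, before the NVMe memory receives the second transfer request sent by the host, the following is further included: the host reserves a third storage space in the host according to the size of the second storage space. After the NVMe memory sends the value to the host, the following is further included: the host writes the received data from the second storage space into the third storage space. This solution describes the operations performed by the host while reading the KV (or the value).
In an eleventh possible implementation of the first aspect, the NVMe read command that the NVMe memory receives from the host further carries free-space information of the host, and after the NVMe memory receives the NVMe read command from the host, the following is further included: the NVMe memory determines whether the free storage space of the host is greater than or equal to the second storage space; if yes, it performs the step of sending the location information of the second storage space to the host; if no, the procedure ends. Because the NVMe read command carries the free-space information of the host, the host does not need to check whether its free space is sufficient and reserve storage space after receiving the response from the NVMe memory, which reduces the number of interactions between the NVMe memory and the host and improves the efficiency with which the host reads the value. Note that the free space here is a space with contiguous addresses.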
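The free-space check can be expressed in a few lines. In this sketch the function name, the dictionary-based mapping, and the status strings are assumptions made for illustration; only the comparison itself (host free space against the size of the second storage space) comes from the text.

```python
def handle_read_with_free_space(mapping: dict, key: bytes,
                                host_free_bytes: int) -> dict:
    """Decide whether to return location info, given the host's free space.

    mapping: key -> (first address, length) of the second storage space.
    host_free_bytes: size of the contiguous free space reported by the host.
    """
    location = mapping.get(key)
    if location is None:
        return {"status": "KEY_NOT_FOUND"}      # hypothetical status
    first_addr, length = location
    if host_free_bytes < length:
        return {"status": "HOST_SPACE_TOO_SMALL"}  # procedure ends here
    return {"status": "OK", "addr": first_addr, "len": length}


mapping = {b"k1": (128, 4096)}
enough = handle_read_with_free_space(mapping, b"k1", host_free_bytes=8192)
too_small = handle_read_with_free_space(mapping, b"k1", host_free_bytes=1024)
```

Because the device makes this decision while handling the read command, the host does not need an extra round trip to discover that the value would not fit.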
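In a twelfth possible implementation of the first aspect, the first storage space is described by the first address of the first storage space and the value length, and the second storage space is described by the first address of the second storage space. This solution specifies how the storage spaces are described. Based on what is described, the storage location from which the value is read and the storage location to which the value is written can be located.
According to a second aspect, the present invention provides an implementation of an NVMe memory, including a controller and a storage medium, where the controller is connected to the storage medium, the storage medium is configured to provide storage space, and the processor of the controller is configured to: receive a Non-Volatile Memory Express (NVMe) write command sent by a host, where the NVMe write command carries a key and a value pointer, the value pointer points to a first storage space in the host, the first storage space is used to store a value, and the key and the value belong to the same KV pair; obtain the key from the NVMe write command, obtain a value length according to the value pointer, and allocate a second storage space to the value according to the value length, where the second storage space is in the storage medium; and send a first transfer request to the host, obtain the value from the host, and save the value in the second storage space. The NVMe memory is configured to perform the method of the first aspect and the methods of the various implementations of the first aspect.
A third aspect of the present invention provides an implementation of a data processing method. The method includes: a Non-Volatile Memory Express (NVMe) memory receives an NVMe write command, where the header of the NVMe write command carries a key, the NVMe write command further carries a value, the key corresponds to the value, and the key and the value belong to the same KV pair; the NVMe memory obtains the key and the value from the NVMe write command; and the NVMe memory saves the value in a storage medium of the NVMe memory.
In a first possible implementation of the third aspect, the NVMe write command further carries a KV-number field, which describes the number of KVs in the NVMe write command; the NVMe memory obtains from the NVMe write command as many keys as the KV number indicates, and as many values.
In a second possible implementation of the third aspect, the NVMe write command further carries a KV-format field, which describes the structure of the fields in the NVMe write command; the NVMe memory obtains each field from the NVMe write command according to the field content defined by the KV-format field.
In a third possible implementation of the third aspect, the NVMe write command further carries the length of the key, and that the NVMe memory obtains the key from the write command specifically includes: obtaining the key from the write command, starting from a preset start position of the key, according to the length of the key.
In a fourth possible implementation of the third aspect, the NVMe write command further carries a value offset and the length of the value, and that the NVMe memory obtains the value from the NVMe write command specifically includes: the NVMe memory obtains the value, starting from the position indicated by the offset, according to the length of the value.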
结合第三方面的第五种可能的实施方式,该方法进一步包括:生成所述key与所述value存储空间的映射关系。
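A sixth possible implementation, based on the fifth possible implementation of the third aspect of this application, further includes, after the method: the host sends an NVMe read command to the NVMe storage device, the NVMe read command carrying the key; the NVMe storage device receives the read command from the host and obtains the key from the NVMe read command; the NVMe storage device uses the key to look up the value storage space in the mapping; the NVMe storage device obtains the value from the value storage space; the NVMe storage device generates a response message to the NVMe read command and sends it to the host, the response message carrying the value.
In a fourth aspect, the present invention provides an implementation of an NVMe storage device, including a controller and a storage medium. The controller is configured to perform the method of the third aspect and the methods of the various implementations of the third aspect.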
本发明实施例第五方面,提供一种存储装置,该存储装置可以是物理设备,例如NVMe存储器;也可以是逻辑上设备,例如是运行在NVMe存储器的处理器中的程序,或者存储服务器中的程序。该装置包括:接口模块,用于接收主机发送的NVMe写命令,所述NVMe写命令中携带key,所述NMVe写命令携带value指针,所述value指针指向所述主机中的第一存储空间,所述第一存储空间用于存储value,所述key与所述value属于同一个KV对;处理模块,用于从所述NVMe写命令中获得所述key,根据所述value指针获得value长度,按照所述value长度为所述value分配第二存储空间;存储模块,用于发送第一传输请求给所述主机,从所述主机获得所述value,把所述value保存在所述第二存储空间中。所述第二存储空间可以由在所述NVMe存储器的存储介质提供,所述存储介质和所述NVMe存储器的处理器连接。基于该方案,KV数据从主机传递到NVMe存储器的过程中,不需要把KV数据转换成块的形式,提高了KV数据的存储效率。
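In a first possible implementation of the fifth aspect, the storage module sending the first transfer request to the host and obtaining the value from the host specifically includes: the storage module sends a DMA transfer request to the host and obtains the value from the host, the DMA request carrying the first storage space as the access address and the second storage space as the write address. The storage apparatus (when the storage apparatus is hardware) and the host are connected by a PCIe bus, or the NVMe storage device in which the storage apparatus (when the storage apparatus is software) resides and the host are connected by a PCIe bus. This solution provides a way to transfer the value using DMA.
In a second possible implementation of the fifth aspect, the storage module sending the first transfer request to the host and obtaining the value from the host specifically includes: the storage module sends an RDMA transfer request to the host and obtains the value from the host, the RDMA request carrying the first storage space as the access address and the second storage space as the write address. The storage apparatus (when the storage apparatus is hardware) and the host are connected by a Fabric, or the NVMe storage device in which the storage apparatus (when the storage apparatus is software) resides and the host are connected by a Fabric. This solution provides a way to transfer the value using RDMA.
In a third possible implementation of the fifth aspect, the NVMe write command further carries a KV count field that describes the number of KVs in the NVMe write command; the NVMe storage device obtains from the NVMe write command as many keys as the KV count, and as many values as the KV count. This solution supports carrying multiple KVs in the same NVMe write command.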
在第五方面的第四种可能的实施方式中,其中:所述NVMe写命令中进一步携带KV格式的字段,其中,所述KV格式的字段描述所述NVMe写命令中字段的结构,所述NVMe存储器按照所述KV格式字段所定义的字段内容,从所述NVMe写命令中获取各个字段。该方案可以使得同一个NVMe设备支持多种格式传输KV的NVMe写命令报文,在一次特定的写操作过程中,使用其中一种格式的写命令报文。类似的,在读命令流程中,也可以使得同一个NVMe设备支持多种格式传输KV的NVMe读命令报文。
在第五方面的第五种可能的实施方式中,其中,所述NVMe写命令还携带所述value元数据指针,所述方法还包括:所述NVMe存储器根据所述元数据指针获得元数据长度,按照所述元数据长度为所述元数据分配第四存储空间,所述第四存储空间在所述NVMe存储器中;所述NVMe存储器通过所述第一传输请求从所述主机中获得所述元数据,把所述元数据保存在第四存储空间中。该方案描述了如何对KV中的元数据进行存储的具体方案。
结合第五方面的第五种可能的实施方式,在第六种可能的实施方式中,其中:所述元数据传输失败后,所述处理模块还用于释放为所述value分配的存储空间,以及释放为所述元数据分配的第四存储空间。该方案在元数据传输失败后,可以及时释放被value占用的资源。
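In a seventh possible implementation of the fifth aspect, the storage module sending the first transfer request to the host and obtaining the value from the host specifically includes one of the following: the processing module sends at least two DMA transfer requests to the host to obtain the value, each DMA transfer request requesting a part of the value, and when any one of the DMA transfer requests fails, the storage module releases the storage space allocated for the value, where the NVMe storage device and the host are connected by a PCIe bus; or the storage module sends at least two RDMA transfer requests to the host to obtain the value, each RDMA transfer request requesting a part of the value, and when any one of the RDMA transfer requests fails, the storage module releases the storage space allocated for the value, where the NVMe storage device and the host are connected by a Fabric. This solution allows a single value to be split across multiple DMA or RDMA transfer requests, reducing the amount of data carried by each transfer.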
结合第五方面的第八种可能的实施方式,其中,所述处理模块还用于:生成所述key与所述第二存储空间的映射关系。该映射关系可以为后续的KV(尤其是value)的读取方案提供基础。
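In a ninth possible implementation of the fifth aspect, the interface module is further configured to receive an NVMe read command from the host, the NVMe read command carrying the key. The processing module is further configured to query the mapping according to the key to obtain location information of the second storage space in which the value is stored, and to send the location information of the second storage space to the host through the interface module; the interface module is further configured to receive a second transfer request sent by the host, the second transfer request requesting the data stored in the second storage space. This ninth possible implementation of the fifth aspect enables reading of the KV (or the value).
In a tenth possible implementation of the fifth aspect, before the interface module receives the second transfer request sent by the host, the host is further configured to reserve a third storage space in the host according to the size of the second storage space. After the storage apparatus sends the value to the host, the host is further configured to write the received data from the second storage space into the third storage space. This solution describes the functions of the host while reading the KV (or the value).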
结合第五方面的第十一种可能的实施方式,所述接口模块接收的所述主机发送的NVMe读命令中还携带所述主机的空闲空间信息,所述NVMe存储器从所述主机接收所述NVMe读命令之后,进一步包括:所述NVMe存储器判断所述主机的空闲存储空间是否大于等于所述第二存储空间,如果是,执行将所述第二存储空间的位置信息发送给所述主机的步骤,如果否,结束步骤。该方案在NVMe读命令中携带所述主机的空闲空间信息,主机不用在收到NVMe存储器响应后判断空闲空间是否足够以及预留存储空间,可以减少NVMe存储器和主机之间的交互次数,提高了主机读取value的效率。需要说明的是,这里的空闲空间是一段地址连续的空间。
结合第五方面的第十二种可能的实施方式,其中:所述第一存储空间用第一存储空间的首地址和所述value长度描述;所述第二存储空间用第二存储空间的首地址描述。该方案描述了存储空间的描述方式。依靠该描述方式所描述的内容,可以定位读出value的存储位置和写入value的存储位置。
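Sixth aspect: in combination with the storage apparatus provided by the fifth aspect and its various possible implementations, the present invention further provides an implementation of a storage system, the storage system including a host and the storage apparatus.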
本发明第七方面提供一种存储装置,该存储装置可以是物理设备,例如NVMe存储器;也可以是逻辑上设备,例如是运行在NVMe存储器的处理器中的程序,或者存储服务器中的程序。该装置包括:接口模块,用于接收NVMe写命令,所述NVMe写命令的头部携带键key,所述NMVe命令还携带值value,所述key和所述value对应,所述key和所述value属于同一个KV对;处理模块,用于从所述NVMe写命令中获得所述key以及所述value;存储模块,用于把所述value保存在所述NVMe存储器的存储介质中。
结合第七方面的第一种可能的实施方式,所述NVMe写命令中进一步携 带KV数量的字段,所述KV数量的字段用于描述所述NVMe写命令中KV的数量,所述处理模块从所述NVMe写命令中获得与KV数量相同数量的key,以及获得与KV数量相同数量的value。
结合第七方面的第二种可能的实施方式,其中,所述NVMe写命令中进一步携带KV格式的字段,其中,所述KV格式的字段描述所述NVMe写命令中字段的结构,所述处理模块按照所述KV格式字段所定义的字段内容,从所述NVMe写命令中获取各个字段。
结合第七方面的第三种可能的实施方式,其中,所述NVMe写命令中进一步携带所述key的长度,所述处理模块从所述写命令中获得所述key具体包括:从所述key的预设起始位置,按照所述key的长度从所述写命令中获得所述key。
结合第七方面的第四种可能的实施方式,其中,所述NVMe写命令中进一步携带value偏移量以及所述value的长度,所述处理模块从所述NVMe写命令中获得所述value具体包括:所述处理模块从所述偏移量指示的位置,按照所述value的长度获得所述value。
结合第七方面的第五种可能的实施方式,所述存储模块进一步用于:生成所述key与所述value存储空间的映射关系。
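A seventh possible implementation, based on the fifth possible implementation of the seventh aspect, provides an implementation of a host: the host is configured to send an NVMe read command to the NVMe storage device, the NVMe read command carrying the key; the interface module is further configured to receive the read command from the host and obtain the key from the NVMe read command; the processing module is further configured to use the key to look up the value storage space in the mapping; the processing module is further configured to obtain the value from the value storage space; the processing module is further configured to generate a response message to the NVMe read command and send the response message to the host through the interface module, the response message carrying the value.
Eighth aspect: in combination with the storage apparatus provided by the seventh aspect and its various possible implementations, the present invention further provides an implementation of a storage system, the storage system including a host and the storage apparatus.
Ninth aspect: in combination with the storage device provided by the second aspect and its various possible implementations, the present invention further provides an implementation of a storage system, the storage system including a host and the NVMe storage device.
Tenth aspect: in combination with the storage device provided by the fourth aspect and its various possible implementations, the present invention further provides an implementation of a storage system, the storage system including a host and the storage apparatus.
The data processing method, NVMe storage device, and storage system provided by the present invention can improve the efficiency of writing KV data to an NVMe storage device.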
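Brief Description of the Drawings
Figure 1 is a flowchart of an embodiment of a data processing method;
Figure 2 is a schematic diagram of an NVMe command format;
Figure 3 is a schematic diagram of an NVMe command format;
Figure 4 is a schematic diagram of an NVMe command format;
Figure 5 is a schematic diagram of an NVMe command format;
Figure 6 is a schematic diagram of the mapping between a key and the value storage location and metadata storage location;
Figure 7 is a topology diagram of a hardware topology embodiment of a data processing system;
Figure 8 is a flowchart of an embodiment of a data processing method;
Figure 9 is a schematic diagram of a command format;
Figure 10 is a schematic diagram of a command format;
Figure 11 is a schematic diagram of a command format;
Figure 12 is a flowchart of an embodiment of a data write method;
Figure 13 is a flowchart of an embodiment of a data read method;
Figure 14 is a flowchart of an embodiment of a data write method;
Figure 15 is a logical function diagram of an embodiment of a data processing system.
Detailed Description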
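Non-Volatile Memory Express (NVMe) is a logical device interface that supports accessing non-volatile storage media over a PCIe bus. The NVMe interface can be used for storage devices with flash media, such as solid state drives (SSDs). A device with an NVMe interface is called an NVMe device. An NVMe storage device is one kind of NVMe device, namely an NVMe device with a storage function. The following description mainly uses the NVMe storage device as an example (in the embodiments of the present invention, the NVMe storage device is referred to simply as the storage device).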
KV存储,也称为K/V存储,是存储技术的一种。在KV存储中,键-值(key-value/key value)对是基本数据模型。Key-value对可以包括键(key)和值(value)。对key-value对进行扩展之后,Key-value对还可以包括元数据(Metadata)。key唯一标记一个value。
如果主机以块的方式把KV数据发送给NVMe存储器进行存储,主机需要执行的操作包括:首先,把KV数据转换为块的形式;然后,给块分配存储器上的逻辑块地址(Logic Block Address,LBA);接着,生成写命令,写命令中携带块以及块的LBA;NVMe存储器收到块后,按照LBA对应的物理地址对块进行存储。如果要读取这些KV数据,需要先读取块数据,然后把块数据转换成KV数据。写入KV数据过程中,分配LBA地址以及把KV数据转换成块数据需要耗费主机大量的时间和系统资源,增加系统延迟。读取KV数据的过程中,主机需要发送待读块的位置信息(例如起始地址+长度)给NVMe存储器,获得到块数据后,把块数据重新转换成KV数据也会耗费主机大量的运算资源,增加系统延迟。
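This application is applicable to both the NVMe scenario and the NVMe over Fabric (NOF) scenario. In the NVMe scenario, the host and the storage device are connected by a bus (for example, a PCIe bus); in this scenario the storage device is a component of the host, for example the host is a server and the storage device is an NVMe solid state drive (SSD) in the server. In the NOF scenario, the host and the storage device are connected by a fabric (for example, Ethernet or FC). The host includes a memory, a processor, and an interface, and the host and the storage device may be two independent devices. In the embodiments of the present invention, the host initiates reads and writes of data and is therefore called the initiator; the storage device responds to read or write requests and is also called the target.
The embodiments of the present invention extend the existing NVMe protocol and propose new NVMe commands. The new NVMe commands extend and optimize existing NVMe commands (for example, the NVMe commands defined in NVMe standard 1.2.1, or those defined in NVMe over Fabric standard 1.0), so that NVMe devices can directly support a KV interface.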
由于NVMe设备可以直接支持KV的接口,因此存储器和主机之间可以直接通过NVMe协议传输KV数据。主机可以通过NVMe协议直接把KV数据写入存储器,或者通过NVMe协议直接从存储器中读出KV。因此,存储器和主机之间避免了像现有技术那样在KV格式和Block格式之间的反复转换,降低了系统复杂度,提高了系统性能。在没有特别说明情况下,本发明各实施例中的NVMe命令均是指扩展的NVMe命令。
在一种可选的实现方式中,key直接携带在NVMe命令的命令头中,因此,直接读取NVMe的命令头(不需要解析NVMe命令的净荷)就可以获得key。
对于value,可以携带在NVMe命令中,读取NVMe命令可以获得所述value。此外,NVMe命令也可以不直接携带value,而是携带value的指针,value指针直接或者间接的指向了value的存储空间,通过DMA/RDMA技术可以从value的存储空间获得所述value。
下面对本发明各实施例可能出现的名词进行示例性介绍。
在现有的NVMe写命令中,增加如下参数,或者定义全新的NVMe KV写命令,其中可以包含如下参数。
Number of KV:一个NVMe命令中传输的KV的数量。KV包括key和与key对应的value。或者KV包括key和value指针,value指针指向value存储空间,value和key对应。
Value(值):数据,可以被存储到存储器中,或者从存储器中读取。例如一个电影。
Key(键):唯一标识一个Value,key也可以称为关键码。Key与对应的value的组合可以称为KV或者KV对(pair)。
KV(key value):key和value的组合,也称为KV对(KV pair)。可选的,KV中还包括metadata。
metadata(元数据):用于描述value的属性。例如:如果Value是一部电影,元数据可以包括不限于:电影名称、时长、主演等信息。
Common Header(公共头部):是命令头(Header)的一部分,指命令头中和现有NVMe命令头相同的部分。
KV Format ID(KV格式ID):指示当前NVMe命令的格式。或者说定义NVMe命令中各个字段的内容。传输KV的NVMe命令可以有多种命令格式,通过KV Format ID来区分。不同的命令格式的NVMe报文可以有不同的字段,或者不同的字段排列顺序。
Key Length:描述key的长度。
Value Length:描述value的长度。
Metadata Length:描述Metadata的长度。
Key Offset:描述key在NVMe命令中的偏移位置。
Value Offset:描述value在NVMe命令中的偏移位置。
Metadata Offset:描述元数据在NVMe命令中的偏移位置。
DPTR(Data Pointer):数据指针,指向待传输的数据。
MPTR(Metadata Pointer):元数据指针,指向待传输的元数据。
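PRP (Physical Region Page) Entry: a physical region page entry, which can record a pointer. PRP is one of the two data transfer mechanisms commonly used in the NVMe protocol and can be used in the NVMe over PCIe architecture. A PRP entry can be a pointer to a physical memory page.
SGL (Scatter Gather List) Entry: a scatter gather list entry, which can record a pointer. SGL is the other of the two data transfer mechanisms commonly used in the NVMe protocol and can be used in the NVMe over PCIe/Fabric architecture.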
参见图1,本发明提供一种数据处理方法实施例,可以用于NVMe存储器和主机之间。
11,主机生成NVMe写命令(后文简称写命令)发送给存储器,写命令中携带value以及与value对应的key。写命令通过主机的NVMe接口发送给NVMe存储器。key和value可以携带在净荷中,也可以携带在命令头中。
本步骤中,Key可以携带在命令的头(header)中,也可以携带在净荷(payload)中。本发明各实施例中,在没特别声明的情况下,写命令/读命令中携带的字段既可以在命令头中,也可以在净荷中。例如,value可以携带在命令头中,也可以携带在净荷中。
可选的,如果写命令中携带不止一个KV,那么写命令中还可以携带number of KV,用于描述KV的数量。存储器收到写命令后,可以按照number of KV指示的KV数量对KV进行读取,读取完所有KV后停止读取。此外,还可以使用结束符来指示KV的结束,读到结束符后停止读取。
可选的,写命令进一步携带KV Format ID,传输KV的NVMe命令可以有多种命令格式,不同的命令格式中,字段可以不同,字段的位置也可以不同。KV Format ID用于指示本命令所使用的命令格式。
key在写命令的命令头中的位置可以是不固定的,也可以是预先设定的固定位置。如果key在写命令中的位置是不固定的,写命令中可以进一步携带key位置信息。key位置信息可以是key length与key offset的组合,key offset描述key在写命令中的起始位置。key位置信息也可以是key在写命令中的起始位置和key在写命令中结束位置的组合。如果key在写命令中的起始位置预先设定,那么key位置信息可以是key length,key length描述key的长度。
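There are two ways to preset the start position of the key. One is to preset the absolute position of the key in the command, for example, counting from the first bit of the command, the key starts at bit 20. The other is to preset the position of the key relative to other fields, for example, the key field follows the key length field and is adjacent to it. Similarly, any "preset" position mentioned in this and other embodiments can use either of these two ways, for example the start positions of the value and the metadata described later in this embodiment.
The key may be of fixed length or variable length. Fixed length means that the key has the same length in every command; variable length means that the key length may differ between commands. When the key is of fixed length, the receiver of the command can ignore the position information (for example, key length), or the write command can omit the position information. After receiving the write command, the storage device simply reads a key of the fixed length from the preset start position.
The position of the value in the write command may be fixed or not fixed. If it is not fixed, the write command may further carry value position information. The value position information may be the value length alone: in that case the start position of the value in the write command is preset and therefore need not be carried in the write command, and the value can be read from the write command by starting at that start position and using the value length as the read length. The value position information may also be a combination of value length and value offset, where value offset describes the start position of the value in the write command. It may also be a combination of the start position and the end position of the value in the write command.
The value may be of fixed length or variable length. When the value is of fixed length, the receiver of the command can ignore the position information (for example, value length), or the write command can omit the position information. The receiver of the command simply reads a value of the fixed length from the preset start position.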
可选的,写命令中进一步携带value的metadata。metadata可以携带在写命令的命令头中或者净荷中。
metadata在写命令中的位置可以是固定的。如果metadata在写命令中的位置不是固定的,写命令中可以进一步携带metadata位置信息。Metadata位置信息和value位置信息类似,因此可以参见value位置信息的描述,此处不做赘述。同样的,metadata可以是定长或者变长。
参见图2是一种可选的写命令的命令格式,common header是NVMe命令头的一部分。该命令携带2个KV,因此有2个key,分别是key1和key2;有2个value,分别是value1和value2;2个key length,分别是key1 length和key2length;2个value length,分别是value1 length以及value2 length。此外,该命令还携带2个metadata,分别是metadata1和metadata2;以及对应的metadata length1和metadata length2。
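Refer to Figure 3, which is a schematic diagram of another NVMe command format. Compared with Figure 2, value position information (value1 offset, value2 offset) and metadata position information (metadata1 offset, metadata1 length, metadata2 offset, metadata2 length) are added, as is number of KV. The command in the Figure 3 example carries 2 KVs, so the value of number of KV is 2. All position information in an NVMe command describes the position of a field within that NVMe command. The arrows in Figure 3 indicate that the position (for example, the start position) of the data to be written can be found through an offset. For example, the value of the value1 offset field describes the offset of value1 in the command, so once the value1 offset field has been read, value1 can be read. In the command format of Figure 3, fields are laid out by field type, and fields of the same type are adjacent. For example, value and metadata both belong to the data part and are therefore adjacent; value position information and metadata position information both belong to position information and are therefore adjacent.
Refer to Figure 4, which is a schematic diagram of another NVMe command format. Compared with Figure 3, fields are laid out by the KV they belong to, and the fields of the same KV are adjacent. In the command format of Figure 4, the fields of KV1 come first, followed by the fields of KV2. In addition, in the command format of Figure 4, value is adjacent to value length, and metadata is adjacent to metadata length.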
参见图5,是另一NVMe命令格式示意图。图5中,同一个KV的字段相邻。此外,value length和metadata length相邻,value和metadata相邻。
12,存储器的NVMe接口接收写命令,存储器从写命令中获得key和value。如果写命令中携带有metadata,则还从写命令中获得metadata。
存储器由控制器和存储介质组成,控制器中包括处理器,可选的,还可以包括内存。存储介质例如是闪存或者磁盘。存储器也可以是具有管理能力的硬盘,称为智能硬盘。
如果多种类型的NVMe命令格式(例如图2和图3分别是不同类型的NVMe命令格式)被同一个存储器支持,则存储器通过KV Format ID可以确定收到的命令是哪一种格式。然后按照这个格式中key、value等字段的位置读出key和value等内容。不同的NVMe命令格式,数据的位置关系可以不同。例如,有的格式中,key的位置是固定的;有的格式中,key的位置不是固定的,由key位置信息确定。
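If the write command carries one KV, that KV is read. If the write command carries at least two KVs, those KVs are read. The storage device can learn the number of KVs from the number of KV field and finishes reading after the corresponding number of KVs has been read. Besides using number of KV to mark when reading is complete, a terminator can be added to the command; reading the terminator indicates that all KVs in the command have been read.
If the position of the key is fixed, the storage device reads the key from the fixed position. If the position of the key is not fixed, the key is read from the write command according to the key position information. For example, if the key position information is key length and the start position of the key in the write command is preset, the key carried in the write command is obtained by reading data contiguously from the start position of the key, using key length as the read length. If the key position information is a combination of key length and key offset, the key is obtained by reading data contiguously with key offset as the start position of the key and key length as the read length. If the key position information consists of a start position and an end position, the key is obtained by reading the data between the start position and the end position.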
关于读取value和metadata的读取方式,均可以参考key的读取方式,此处不再赘述。在读出key/value/metadata时,存储器可以获得key length/value length/metadata length。
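Taking the command format of Figure 2 as an example, the fields command header, KV format ID, key length, and value length have fixed lengths, and the relative positions of the fields are also fixed, so fields such as key offset and value offset are not needed to determine the start positions of the key and the value. The storage device reads the fields of the write command in the following order: read command header, read KV format ID, read key1 length, read key1 according to the number recorded in key1 length, read value1 length, read value1 according to value1 length, read key2 length, read key2 according to key2 length, read value2 length, and read value2 according to value2 length.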
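The sequential read order described for the Figure 2 format can be sketched in Python as follows. This is only an illustration under assumptions: the field widths (`COMMON_HEADER_LEN`, `FORMAT_ID_LEN`, `LENGTH_FIELD_LEN`), the big-endian encoding, and the helper names are hypothetical, since the source states only that these fields have fixed lengths. The number of KVs is passed in by the caller because the Figure 2 format has no number of KV field.

```python
import struct

# Hypothetical field widths; the source only says these fields are fixed-length.
COMMON_HEADER_LEN = 8
FORMAT_ID_LEN = 1
LENGTH_FIELD_LEN = 2  # assumed big-endian unsigned 16-bit length fields


def build_write_command(format_id, pairs):
    """Serialize header, format ID, then (key length, key, value length, value) per KV."""
    out = bytearray(COMMON_HEADER_LEN)  # zero-filled placeholder header
    out.append(format_id)
    for key, value in pairs:
        out += struct.pack(">H", len(key)) + key
        out += struct.pack(">H", len(value)) + value
    return bytes(out)


def parse_write_command(buf, kv_count):
    """Walk the command in order; each length field tells how many bytes follow."""
    pos = COMMON_HEADER_LEN
    format_id = buf[pos]
    pos += FORMAT_ID_LEN
    pairs = []
    for _ in range(kv_count):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        pos += LENGTH_FIELD_LEN
        key = buf[pos:pos + key_len]
        pos += key_len
        (value_len,) = struct.unpack_from(">H", buf, pos)
        pos += LENGTH_FIELD_LEN
        value = buf[pos:pos + value_len]
        pos += value_len
        pairs.append((key, value))
    return format_id, pairs
```

Because every variable-length field is preceded by its own length, the parser needs no offset fields, which matches the reasoning given for Figure 2.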
13,存储器对value进行存储。具体而言,是存储器的控制器把key和value存储到非易失性存储介质中。存储器的接口收到value后,先发送到处理器。本步骤中,处理器把value暂存在内存中,然后从内存下发到非易失性存储介质。
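In addition, the storage device can record the mapping between the key and the value storage space. The mapping can be stored in the storage medium of the storage device, or it can be sent to the host and stored by the host. The following description uses storage by the storage device as an example.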
如果写命令中还携带metadata,则也对metadata进行存储,并记录key和metadata存储空间的映射关系。
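The value storage space can be described by the start address of the value storage space and the value length. The metadata storage space can be described by the start address of the metadata storage space and the metadata length. Figure 6 is a schematic diagram of the mapping, which includes the mapping from the key to the start address of the value storage space, from the key to the value length, from the key to the start address of the metadata storage space, and from the key to the metadata length. The key can be used as an index to look up, in the mapping, the corresponding start address of the value storage space, value length, start address of the metadata storage space, and metadata length. The content depicted in Figure 6 is recorded in the KV management unit. If the value and the metadata are of fixed length, the value length and metadata length are optional.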
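A minimal Python sketch of such a mapping, assuming an in-memory dictionary keyed by the key bytes. The class and field names (`KVManager`, `KVLocation`) are hypothetical and are not taken from the source, which only names a "KV management unit".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class KVLocation:
    """Where one KV lives on the device (illustrative field names)."""
    value_addr: int
    value_len: int
    metadata_addr: Optional[int] = None  # None when the KV has no metadata
    metadata_len: int = 0


class KVManager:
    """Toy stand-in for the KV management unit: key -> storage location."""

    def __init__(self) -> None:
        self._table: Dict[bytes, KVLocation] = {}

    def record(self, key: bytes, loc: KVLocation) -> None:
        self._table[key] = loc

    def lookup(self, key: bytes) -> Optional[KVLocation]:
        return self._table.get(key)
```

With such a table, a read command only needs to carry the key; the start address and length that a block interface would pass as an LBA range are recovered on the device side.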
存储空间可以是逻辑位置或者物理位置。只要存储器的控制器使用存储空间可以从存储介质中读取value和metadata即可。
上面步骤11-步骤13是写KV的过程,接下来的步骤14-步骤16是读KV的过程。这两个过程是相互独立的,读命令所请求的KV和写命令所写入的KV可以不是同一个。
14,主机生成NVMe读命令(后文简称读命令),发送读命令给存储器,读命令中携带所述key。key可以携带在读命令的命令头中。
生成读命令的主机和生成写命令的主机可以是同一个主机,也可以是不同主机。读命令的命令格式参考步骤11中写命令的格式以及图2。读命令包括command header,KV format ID和key。可选的,读命令中携带key位置信息。一个读命令中可以携带一个key也可以携带至少两个key。当携带至少两个key时,可以携带number of KV字段,以描述携带的key的数量。
读命令中可以没有value字段以及其他和value相关的字段。此外,读命令中可以没有metadata字段以及其他和metadata相关的字段。
15,存储器通过NVMe接口,接收读命令,从读命令中获得key。从存储器存储的KV管理单元中,查找key对应的value存储空间。从value存储空间中获得value并构造读命令的响应消息发送给主机,响应消息中携带value。
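Referring to step 12, the way the key is obtained varies slightly with the command format. For example, the key can be obtained from a fixed position in the read command, or from the read command according to the position information carried in the read command.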
类似的,如果存储器中存储有与key对应的metadata,可以按照类似的方法查询metadata的存储空间并获得metadata。
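Note that if the KV management unit is stored in the host, the key is used to look up the value storage space in the host, the value storage space is sent to the storage device, and the storage device uses the value storage space to obtain the value and send it to the host.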
16,主机接收value并存储。例如可以存储在主机的内存(例如缓存)中,或者存储在主机的非易失性存储介质中。
上述步骤11-16中,value和metadata都携带在NVMe命令中,不用进行KV和block之间的转换,具有简单快捷的优点。读命令不用携带value和/或metadata的LBA,因此读value和/或metadata的过程也更为快捷。
参见图7,是本发明的数据处理系统硬件拓扑实施例,主机71和存储器72通信。主机71包括处理器711、内存712以及接口713。存储器72包括接口721、控制器722以及存储介质723。接口713和接口721通过通信链路73连接,通信链路73例如是PCIe总线、光纤通道FC或者以太网。
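The operations performed by the host 71 may be performed by the processor 711. For example, by running a program in the memory 712, the processor 711 can perform steps 11, 14, and 16. The memory 712 and the processor 711 may be relatively independent or may be integrated together. The storage device 72 includes a controller 721 and a storage medium 722. The operations performed by the storage device 72 are performed by the controller 721 of the storage device. Specifically, the processor 7211 can run a program in the memory 7212 to perform the operations of the storage device; for example, the processor of the storage device is used to perform steps 12, 13, and 15. In some cases, for example when the processor is an FPGA, there may be no memory, and the processor performs the corresponding operations directly.
Similarly, the subsequent methods use the same hardware; the difference lies in the operations that the host and the storage device perform. For example, steps 21, 24, and 27 are performed by the host 71, and steps 23 and 25 are performed by the storage device 72. In steps 22 and 26, some operations are performed by the host 71 and the others by the storage device 72.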
参见图8,本发明还提供一种实施方式,写命令中不直接携带value或者metadata,而是在NVMe命令中携带指针,通过指针指向的存储空间可以获得待写value和/或metadata,或者采用多级指针的方式,一个指针指向另外一个指针,从另外一个指针所指向的存储空间获得待写value和/或metadata。这种实施方式对于待写的value和/或待写的metadata的大小没有限制。
主机和存储器可以通过IP、FC等网络连接,可以运行在Fabric架构,也称为NOF(NVMe Over Fabric)架构。在NOF架构下,存储器可以通过远程直接数据存取(Remote Direct Memory Access,RDMA)的方式获得value和/或metadata。如果存储器在主机内部,存储器还可以通过直接数据存取(Direct Memory Access,DMA)的方式从主机获得value和/或metadata。类似的,主机也可以通过RDMA/DMA的方式从存储器直接获得value和/或metadata。
步骤21,主机构造写命令,写命令中携带KV,KV包括value指针以及value的key。写命令通过主机的NVMe接口发送给NVMe存储器。所述key和所述value属于同一个KV对。
value指针可以携带在净荷中,也可以携带在命令头中。类似的,写命令或者读命令中携带的其他数据既可以携带在命令头中,也可以携带在净荷中。
在其他实施例中,写命令也可以携带key指针而不携带key。根据key指针获得key的方案,和通过value指针获得value的原理相同,因此下面不做赘述。仅以写命令携带key为例进行介绍。
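The value pointer points, directly or indirectly, to the storage space of the value (referred to below as the first storage space for ease of description), so the value can be obtained through the value pointer. When the value pointer points to the value storage space (this case is also regarded as the value pointer pointing to the value), the value can be obtained from the storage space that the value pointer points to. In the other case, the value pointer points to a first pointer, and the first pointer points to the storage space of the value; the first pointer is found from the value pointer, and the value can be obtained from the storage space that the first pointer points to. The latter case allows a larger value to be carried. In other embodiments, the chain of pointer references can have more levels, as long as the storage space of the value can eventually be found. For example, the value pointer points to the first pointer, the first pointer points to pointer A1, pointer A1 points to pointer A2, ..., pointer AN-1 points to pointer AN, and pointer AN points to the storage space of the value, where N is an integer greater than or equal to 2. The value pointer may be carried in the command header or the payload of the write command.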
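Resolving a direct or multi-level value pointer can be sketched as a loop that follows pointer entries until it reaches data. The memory model here (a dictionary from address to a tagged entry) and the `max_depth` guard are assumptions made for illustration; the source does not specify how an entry is marked as a pointer or as data.

```python
def resolve_value(memory, pointer, max_depth=8):
    """Follow pointer entries until a data entry is reached.

    `memory` maps an address to ("ptr", next_address) or ("data", payload).
    `max_depth` is a hypothetical guard against cycles or overly long chains.
    """
    for _ in range(max_depth):
        kind, payload = memory[pointer]
        if kind == "data":
            return payload
        pointer = payload  # kind == "ptr": payload is the next address
    raise ValueError("pointer chain too deep")
```

A direct pointer is simply the case where the first entry visited is already a data entry.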
所述第一存储空间可以用第一存储空间的首地址和所述value长度描述,也可以用第一存储空间的首地址和末地址描述。例如:value指针记录第一存储空间的首地址和value长度;或者,value指针指向了另一个指针,所述另一个指针记录第一存储空间的首地址和value长度。第一存储空间可以是逻辑位置或者物理位置。只要存储器使用第一存储空间可以从存储介质中读取存储的数据即可。
可选的,如果写命令中携带不止一个KV,那么写命令中还可以携带Number of KV。用于描述KV的数量。
可选的,写命令进一步携带KV Format ID,传输KV的NVMe命令可以有多种命令格式,不同的命令格式中,字段可以不同,字段的位置也可以不同。KV Format ID指示本命令所使用的命令格式。
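The position of the key in the command header of the write command may be not fixed, or may be a preset fixed position. For the characteristics of the key, refer to the description of the key in step 11; details are not repeated here.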
可选的,写命令中进一步携带metadata指针。和value指针类似,metadata指针直接或者间接指向metadata存储空间,因此通过metadata指针可以获得metadata。metadata指针可以携带在写命令的命令头或者净荷中。
metadata指针直接或者间接的指向了metadata的存储空间,通过metadata指针可以获得metadata。metadata指针直接指向metadata(具体而言是metadata的存储空间)的情况下,可以从metadata指针指向的存储空间获得metadata; 另外一种情况是,metadata指针指向第二指针,而第二指针指向metadata的存储空间,从metadata指针找到第二指针,从第二指针指向的存储空间可以获得metadata。后一种情况中,可以携带更长的metadata。在其他实施例中,指针之间的引用还可以有更多的层级,只要最终能找到metadata的存储空间即可,例如metadata指针指向第二指针,第二指针指向第B1指针,第B1指针指向第B2指针,……,第BN-1指针指向第BN指针,第BN指针指向metadata的存储空间,N是大于等于2的整数。
此外,metadata可以不通过指针的方式传输,而是直接携带在命令中,具体可以参考步骤11以及图2、图3中metadata的携带方式。
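The value pointer may be a data pointer (Data Pointer, DPTR). The metadata pointer may be a metadata pointer (Metadata Pointer, MPTR).
Refer to Figure 9, which is a command format diagram. The command in Figure 9 carries 2 KVs; in each KV the key is carried directly in the write command, and the value is carried in the write command by means of a pointer. They are key1 and DPTR1 (pointing to value1), and key2 and DPTR2 (pointing to value2). In addition, the metadata of the 2 KVs is also carried by means of pointers, where MPTR1 points to metadata1 and MPTR2 points to metadata2. In the embodiments of the present invention, each command may carry 1 KV, or 2 or more KVs.
Refer to Figure 10, which is another command format diagram. The pointers in the command (DPTR, MPTR) do not point directly to the value or the metadata but to a PRP entry or an SGL entry. A PRP entry or SGL entry is a node in a linked list; each PRP/SGL entry points to another address, and the address that the PRP/SGL entry points to stores the value or the metadata. In other words, the write command points to the value or the metadata through a two-level pointer. In the solution of Figure 10, the key is carried in the command, specifically in the command header.
Refer to the command format of Figure 11. The difference from Figure 10 is that in Figure 11 the key is not carried directly in the command but is transferred by means of a pointer; for how it is carried, refer to the description of the value pointer. The key pointer may be carried in the command header of the command.
Specifically, DPTR points to PRP/SGL entry1, and PRP/SGL entry2 belongs to the same linked list as PRP/SGL entry1, so finding PRP/SGL entry1 also finds PRP/SGL entry2. DPTR can therefore be regarded as pointing to both PRP/SGL entry1 and PRP/SGL entry2. PRP/SGL entry1 points to value1, and PRP/SGL entry2 points to value2. Similarly, MPTR points to PRP/SGL entry3 and PRP/SGL entry4; PRP/SGL entry3 points to metadata1, and PRP/SGL entry4 points to metadata2. The storage spaces of value1 and value2 can be found through DPTR, and the storage spaces of metadata1 and metadata2 can be found through MPTR. PRP entries and SGL entries are both nodes in a linked list. PRP is suited to ordinary NVMe (host and storage device connected by PCIe), and SGL is suited to NOF (host and storage device connected by a Fabric). RDMA can use SGL, and DMA can use PRP or SGL.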
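PRP passes the location of the data to the NVMe device through a series of pointers to memory pages. After receiving these addresses, the NVMe device can read the data from the host into the NVMe device by DMA. The SGL transfer mechanism is somewhat more flexible than PRP: it can specify the transfer length and can skip some address ranges during a transfer over contiguous addresses. SGL passes the addresses of the data to be transferred to the NVMe device; after receiving these addresses, the NVMe device reads the data from the host into the NVMe device by DMA.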
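The difference between the two descriptor styles can be illustrated with a simplified Python sketch. It is deliberately not a faithful model of the NVMe PRP or SGL formats: the page size, the absence of a first-page offset, and the representation of an SGL as a list of (address, length) pairs are all assumptions chosen to show only the contrast between page-granular pointers and explicit lengths.

```python
PAGE_SIZE = 4096  # assumed memory page size


def gather_prp(host_mem, page_addrs, total_len):
    """PRP-like: each entry is a page address; lengths are implied by the page size."""
    out = bytearray()
    for addr in page_addrs:
        remaining = total_len - len(out)
        if remaining <= 0:
            break
        out += host_mem[addr:addr + min(PAGE_SIZE, remaining)]
    return bytes(out)


def gather_sgl(host_mem, segments):
    """SGL-like: each entry carries its own (address, length), so gaps can be skipped."""
    return b"".join(host_mem[addr:addr + length] for addr, length in segments)
```

In the SGL-style call, skipping a range of addresses only requires leaving it out of the segment list.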
此外,相较于图9,图10中还增加了number of KV字段以及key length字段,在步骤11、步骤12中已有介绍,不再赘述。
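Step 22: The storage device receives the write command through the NVMe interface. The storage device obtains the length of the value through the value pointer, allocates storage space for the value according to the value length (referred to below as the second storage space for ease of description), and sends the location information of the allocated storage space to the host. The storage device sends a transfer request (the first transfer request) to the host and obtains the value from the first storage space of the host. If the host and the storage device are connected by PCIe, the first transfer request may be a DMA transfer request, for example a single DMA transfer request. If the host and the storage device are connected by a Fabric, the first transfer request may be an RDMA transfer request, for example a single RDMA transfer request. The first transfer request uses the address of the first storage space as the access address and the address of the second storage space as the write address. In the first transfer request, the access address may be the first address of the first storage space + the value length, and the write address may be the first address of the second storage space (or the first address of the second storage space + the value length).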
存储器发送DMA/RDMA传输请求给主机,主机收到传输请求后执行DMA/RDMA传输,把value发送给存储器。
预先分配第二存储空间是可选的,存储器也可以不提前分配第二存储空间,直接从主机获得value,然后再为value分配第二存储空间。这种情况下,第一传输请求不携带写入地址。
如果写命令中携带有metadata指针,则还为metadata分配存储空间(第四存储空间)。由于metadata指针的处理方式和value指针相似,因此后续仅对value指针进行介绍。
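Before the value is obtained according to the value pointer, a storage space (the second storage space) for storing the value is allocated in the storage device according to the value length; the second storage space is not smaller than the value length. The second storage space may be described by first address + last address, by first address + value length, or by the first address alone. The value length can be determined from the value storage space. For example, if in the write command the first storage space is described by the start address of the space storing the value + the value length, the value length can be obtained directly from the write command.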
存储器由控制器和存储介质组成,控制器中包括处理器,存储介质例如是闪存或者磁盘。
command header、KV format ID、key以及key length这几个字段在步骤12中已有介绍,如何获得它们,以及如何利用他们获得相应信息此处不再赘述。
和步骤11-16所介绍的实施例不同的是,本实施例中不是直接从命令中获得value和/或metadata,而是依靠指针来获得value和/或metadata。
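If the value pointer carried in the write command points directly to the first storage space, as in the command format of Figure 9, the pointer DPTR in the write command points to the value storage space (the first storage space). Specifically, the pointer DPTR points to a range of memory addresses in the host, and this range stores the value. The value is then obtained from the host memory addresses by DMA or RDMA according to the storage space described by DPTR. In the same way, the metadata can be obtained through MPTR.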
如果写命令中的value指针间接指向value,可以参考图10的写命令格式:value指针直接指向了另外一个指针,另外一个指针指向value。写命令中的指针DPTR指向了PRP/SGL entry,PRP/SGL entry指向了主机内存地址,主机内存地址是value的存储空间。存储器先通过DPTR找到PRP/SGL entry,然后通过PRP/SGL entry找到位于主机内存的value存储空间,以DMA或者RDMA的方式从主机内存中获得所述value。
如果metadata指针携带在写命令中,则获得metadata的方式和获得value的方式相同。如果metadata直接携带在写命令中,则可以参照步骤12从写命令中直接获得metadata。
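The storage device sends the first transfer request to the host and obtains the value from the first storage space of the host. If the host and the storage device are connected by PCIe, the first transfer request may be a DMA transfer request; if they are connected by a Fabric, the first transfer request may be an RDMA transfer request. Specifically, this can be divided into two steps: (1) The storage device sends the transfer request to the host; the transfer request carries the first storage space as the access address and the second storage space as the write address. Specifically, the first storage space carried in the transfer request can be described by the first address of the first storage space and the value length; the second storage space is described by the first address of the second storage space. (2) After receiving the transfer request, the host reads the data from the first storage space and sends it to the second storage space of the storage device for storage.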
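The two-step exchange above can be sketched as a small in-memory simulation in Python. No real DMA or RDMA is performed: `Host`, `Device`, and `TransferRequest` are hypothetical names, the bump allocator stands in for whatever allocation policy a device would really use, and the host simply returns the requested bytes.

```python
from dataclasses import dataclass


@dataclass
class TransferRequest:
    access_addr: int  # first address of the first storage space (in the host)
    length: int       # value length
    write_addr: int   # first address of the second storage space (in the device)


class Host:
    def __init__(self, memory: bytes) -> None:
        self.memory = memory

    def serve(self, req: TransferRequest) -> bytes:
        # Step (2): read the first storage space and hand the data to the device.
        return self.memory[req.access_addr:req.access_addr + req.length]


class Device:
    def __init__(self, capacity: int) -> None:
        self.media = bytearray(capacity)
        self.next_free = 0  # trivial bump allocator, for illustration only
        self.mapping = {}   # key -> (address, length)

    def handle_write(self, host: Host, key: bytes, value_addr: int, value_len: int) -> int:
        write_addr = self.next_free  # allocate the second storage space
        if write_addr + value_len > len(self.media):
            raise MemoryError("no space for value")
        self.next_free += value_len
        # Step (1): build the transfer request and send it to the host.
        req = TransferRequest(value_addr, value_len, write_addr)
        data = host.serve(req)
        self.media[write_addr:write_addr + value_len] = data
        self.mapping[key] = (write_addr, value_len)
        return write_addr
```

The write command itself carries only the key and the pointer (`value_addr`, `value_len`); the value bytes move in the separate transfer that the device initiates.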
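In this step, the key, value, and metadata can be transferred in a single DMA/RDMA transfer, or they can be transferred separately.
Step 23: The storage device stores the value. Specifically, the controller of the storage device may store the key and the value in the storage medium of the storage device. If metadata was also obtained in step 22, the metadata is stored as well.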
例如从第二存储空间的首地址开始,把value连续的存入存储器的存储介质中。
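In addition, the mapping between the key and the value storage space can be recorded; see Figure 6, in which the value storage space is described by first address + value length. The mapping is recorded in the KV management unit. The value length and metadata length are optional, because both the value and the metadata may be of fixed length.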
上述步骤21-23中,value以指针的方式进行传输。在其他实施例中,key也可以用指针(直接指针或者间接指针)的方式传输,具体可以参照value的传输方式,此处不赘述。
上述步骤21-23是写命令的过程,步骤24-26是读命令的执行过程,这两个过程是相互独立的。读命令所请求的KV和写命令所写入的KV可以不是同一个。读命令和写命令的命令格式相似,区别在于读命令不携带value指针和metadata指针。
24,主机生成读命令发送给存储器,读命令中携带所述key。key可以携带在读命令的命令头中。
生成读命令的主机和生成写命令的主机可以是同一个主机,也可以是不同主机。读命令的格式参考步骤11中写命令的格式以及图2,读命令包括command header,KV format ID和key。可选的,读命令携带key或者key指针。一个读命令中可以携带一个key也可以携带至少两个key。当携带至少两个key时,可以携带number of KV字段,以描述携带的key的数量。
读命令请求读出value和/或metadata。
和写命令相比,读命令中可以没有value字段以及value位置信息,读命令中可以没有metadata字段以及metadata位置信息。
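25: The storage device receives the read command through the NVMe interface and obtains the key from the read command. Using the obtained key, it looks up the storage space of the value in the mapping stored in the storage device and sends it to the host. The storage space of the value in the storage device is the second storage space.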
如果key直接携带在读命令中,参考步骤12,按照命令格式的不同,从读命令中获得key的方式也略有不同。例如,可以从读命令的固定位置获得key;如果key在读命令中的位置不固定,可以按照读命令中携带的位置信息从读命令中获得key。
可选的,如果存储器中存储有与key对应的metadata,还可以获得metadata的存储空间并发送给主机。
26,主机按照value长度给value分配存储空间(命名为第三存储空间)。主机构造传输请求(第二传输请求),发送传输请求给存储器,从存储器第二存储空间中获得value。
主机和存储器是PCIe连接,则第二传输请求可以是DMA传输请求,例如单个DMA传输请求;主机和存储器是Fabric连接,则第二传输请求可以是RDMA传输请求,例如单个RDMA传输请求。传输请求中,访问地址可以是:第二存储空间的首地址+value长度,写入地址可以是第三存储空间的首地址(或者是第三存储空间的首地址+value长度)。由前述步骤可知第二存储空间用于存储所述value,因此所述value长度和第二存储空间大小相同。也就是说第二存储空间的大小等于第三存储空间的大小。
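The value length can be determined from the value storage space. For example, if the location information returned in response to the read command describes the second storage space by the start address of the space storing the value + the value length, the value length can be obtained directly from that location information.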
主机发送DMA/RDMA传输请求给存储器,存储器收到传输请求后执行DMA/RDMA传输,把value发送给主机。
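Pre-allocating the third storage space is optional. The host may instead obtain the value directly from the storage device without allocating the third storage space in advance, and then allocate the third storage space for the value. In this case, the transfer request does not carry a write address.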
类似的,主机可以按照metadata长度给metadata分配存储空间。并从存储器获取metadata。
27,主机把收到的value存入第三存储空间。例如从第三存储空间的首地址开始,把value连续的存入主机。第三存储空间例如在主机的内存中。
The metadata is stored in the host in a way similar to that used for storing the value in the host.
需要说明的是,在其他实施例中,可以不使用DMA或者RDMA,把步骤26、27修改为:主机给所述value分配存储空间,存储器从所述第二存储空间中读出所述value,把读出的value写入所述第三存储空间。
此外,步骤26的另外一种可选方案是:所述NVMe存储器接收的所述主机发送的NVMe读命令中还可以进一步携带所述主机的空闲空间信息,空闲空间信息用于描述主机中连续的空闲存储空间的大小。所述NVMe存储器从所述主机接收所述NVMe读命令之后,进一步包括:所述NVMe存储器判断所述主机的空闲存储空间是否大于等于所述第二存储空间,如果是,执行将所述第二存储空间的位置信息发送给所述主机的步骤,如果否,结束步骤。这种方案中,可以把主机中连续的空闲存储空间称为所述第三存储空间。
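The variant in which the read command carries the host's free-space information can be sketched as a single check on the device side. The function name, the dictionary-shaped reply, and the use of `None` for "end the procedure" are assumptions for illustration; the source only specifies the comparison between the host's contiguous free space and the second storage space.

```python
def handle_read(mapping, key, host_free_bytes):
    """Return location info only if the host's contiguous free space can hold the value.

    `mapping` maps key -> (address, length) of the second storage space.
    Returns None when the key is unknown or the host space is too small.
    """
    loc = mapping.get(key)
    if loc is None:
        return None
    addr, length = loc
    if host_free_bytes >= length:
        return {"addr": addr, "length": length}
    return None  # not enough host space: the procedure ends
```

Carrying the free-space figure in the read command lets the device make this decision immediately, instead of the host checking and reserving space after it receives the response.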
参见图12,以同一个命令中携带value指针和metadata指针为例,对步骤21-步骤23进行介绍。
步骤31,主机构造写命令。写命令中携带key。此外,写命令中携带value源地址链表和metadata源地址链表。源地址链表拥有指针的功能,value和metadata的源地址链表分别指向value的存储空间和metadata的存储空间。
步骤32,主机发送写命令给NVMe设备。
步骤33,NVMe设备收到写命令后,从写命令中解析出value源地址链表和metadata的源地址链表。链表例如是PRP entry或者SGL entry。
步骤34,NVMe设备从value源地址链表中解析出value长度,从metadata的源地址链表中解析出metadata长度。源地址链表描述了待读取数据的存储空间,因此可以从中解析出value长度和metadata长度。
步骤35,NVMe设备向主机发送DMA或者RDMA请求。请求中携带value源地址链表、metadata源地址链表、value目的地址链表以及metadata目的地址链表。value目的地址链表描述NVMe设备预留的、用于存储value的存储空间,metadata目的地址链表描述NVMe设备预留的、用于存储metadata的存储空间。
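Step 36: After receiving the DMA/RDMA request, the host sends the value and the metadata to the NVMe device according to the source addresses and destination addresses of step 35.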
步骤37,NVMe设备把value和metadata存储在介质中。
步骤38,NVMe设备记录value的存储空间和metadata的存储空间。存储空间和key建立映射,以便后续使用key查找value存储空间和metadata存储空间。映射关系可以存储在KV管理单元中,KV管理单元还可以记录value length和metadata length。
上述步骤中,key直接携带在命令中。在其他实施方式中,命令也可以不携带key而是携带key指针。存储器收到命令后,按照key指针获得key。获得key的具体步骤可以参考步骤22获得value的过程,由于原理类似,此处不再赘述。
参见图13,以同一个命令中携带value指针和metadata指针为例,对步骤24-27的具体描述。
步骤41,主机构造读命令,读命令中携带有key。
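Step 42: The host sends the read command carrying the key to the NVMe device.
Step 43: The NVMe device looks up the value length and metadata length corresponding to the key in the KV management unit, which records the value storage space and the metadata storage space.
Taking the value storage space as an example: the value is stored in the NVMe device, and the value storage space is described by the first address + last address of the value, or by first address + value length. Therefore, once the value storage space is obtained, the value length can be obtained. The metadata length can be obtained similarly.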
步骤44,NVMe设备发送响应消息给主机,响应消息中携带value length和metadata length。
Step 45: According to the value length and metadata length, the host allocates storage space for the value and the metadata, for later storing the value and the metadata.
步骤46,主机向NVMe设备发送DMA/RDMA请求,请求中携带本次传输的源地址和目的地址。源地址是NVMe中的value存储空间和metadata存储空间,目的地址是主机为value预留的存储空间,以及为metadata预留的存储空间。
步骤47,收到主机的DMA/RDMA请求后,NVMe设备按照DMA/RDMA协议发送value以及metadata。
步骤48,主机接收value和metadata。
步骤49,主机存储收到的value和metadata,例如存储在主机内存中。
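DMA or RDMA is usually used to read data from a contiguous address space. When the storage space holding the value and the storage space holding the metadata are not contiguous, they can be handled as two separate DMA/RDMA transfers.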
参见图14,本发明另外提供一个基于步骤21-23的实施例,对基于图12的实施例进行改进。改进之处在于:同一个命令的key、value和metadata分成三次DMA/RDMA传输获得。
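Optionally, if any one of the transfers fails, the KV transfer as a whole is unsuccessful. The data already transferred successfully can therefore be deleted, an error code reported, and the subsequent data transfers cancelled. The storage space previously allocated for the KV (key, value, and metadata) can also be released.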
步骤51,构造NVMe写命令。
写命令中携带key。此外,写命令中携带value源地址链表和metadata源地址链表。源地址链表拥有指针的功能,value和metadata的源地址链表分别指向value的存储空间和metadata的存储空间。
步骤52,主机发送写命令给NVMe设备。
步骤53,NVMe设备收到写命令后,从写命令中解析出value源地址链表和metadata的源地址链表。链表例如是PRP entry或者SGL entry。
步骤54,NVMe设备从key源地址链表中解析出key长度,从value源地址链表中解析出value长度,从metadata的源地址链表中解析出metadata长度。为key、value、metadata预留存储空间。源地址链表描述了待读取数据的存储空间,因此可以从中解析出value长度和metadata长度。
步骤55,NVMe设备向主机发送DMA/RDMA请求。请求中携带key源地址链表和key目的地址链表。key目的地址链表描述NVMe设备预留的、用于存储key的存储空间。
步骤56,主机接收到DMA/RDMA请求后,按照步骤55的源地址和目的地址,向NVMe设备发送key。
步骤57,如果key传输失败,则NVMe设备取消后续value、metadata的传输。可选的,释放为key、value和metadata预留的存储空间。上报错误码给主机。如果key传输成功,则继续后续步骤58。
步骤58,NVMe设备向主机发送DMA/RDMA请求。请求中携带value源地址链表和value目的地址链表。value目的地址链表描述NVMe设备预留的、用于存储value的存储空间。
步骤59,主机接收到DMA/RDMA请求后,按照步骤58的源地址和目的地址,向NVMe设备发送value。
步骤60,如果value传输失败,则NVMe设备取消后续metadata的传输。可选的,释放为value和metadata预留的存储空间。可选的,删除已经传输的key。上报错误码给主机。如果value传输成功,则继续后续步骤61。
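Step 61: The NVMe device sends a DMA/RDMA request to the host. The request carries the metadata source address linked list and the metadata destination address linked list. The metadata destination address linked list describes the storage space reserved by the NVMe device for storing the metadata.
Step 62: After receiving the DMA/RDMA request, the host sends the metadata to the NVMe device according to the source address and destination address of step 61.
Step 63: If the metadata transfer fails, optionally the storage space reserved for the metadata is released and, optionally, the already transferred key and value are deleted. An error code is reported to the host. If the metadata transfer succeeds, the procedure continues with step 64.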
步骤64,存储value和metadata到存储器。步骤60和63所获得的value和metadata在内存中,本步骤把value和metadata存储到非易失性介质中,例如NVMe设备的硬盘。
步骤65,记录value的存储空间和metadata的存储空间。
从图14的步骤55-步骤63可以看出,key、value和metadata分别通过不同的DMA传输获得,如果其中任意一项数据传输失败,就取消余下数据的传输,已传输成功的数据可以删除。图14的其余步骤和图12相似,前文已有介绍,此处不再赘述。
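Similarly, besides transferring the value and the metadata separately, the value (or the metadata) itself can be transferred in several parts, for example by splitting one value into two or more DMA/RDMA transfers. If any one transfer fails, the whole KV transfer is judged to have failed, and the storage space reserved for the key, value, and metadata is released. When the NVMe storage device and the host are connected by a PCIe bus, the storage device sends at least two DMA transfer requests to the host to obtain the value, and each DMA transfer obtains a part of the value. When the NVMe storage device and the host are connected by a Fabric, the storage device sends at least two RDMA transfer requests to the host to obtain the value, and each RDMA transfer obtains a part of the value.
The solution of transferring the value (or metadata) in several parts can be combined with the solution, described in steps 55 to 63, of transferring the value and the metadata separately. In this case, if any one transfer fails (in other words, a transfer request fails to execute), the whole KV transfer is judged to have failed, and the storage space reserved for the key, value, and metadata is released.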
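The "split into several transfers, release everything on any failure" rule can be sketched in Python as follows. The callbacks `read_chunk`, `allocate`, and `release` and the `TransferError` exception are hypothetical stand-ins for the DMA/RDMA transfer and the device's space management, which the source does not specify at this level.

```python
class TransferError(Exception):
    """Raised by a chunk transfer that fails (stand-in for a failed DMA/RDMA request)."""


def fetch_in_chunks(read_chunk, total_len, chunk_size, allocate, release):
    """Fetch a value in several transfers; free the reserved space if any one fails.

    read_chunk(offset, n) performs one transfer and returns n bytes.
    allocate(n) reserves device space; release(space) frees it again.
    """
    space = allocate(total_len)
    buf = bytearray()
    try:
        for off in range(0, total_len, chunk_size):
            n = min(chunk_size, total_len - off)
            buf += read_chunk(off, n)
    except TransferError:
        release(space)  # the whole KV transfer is treated as failed
        raise
    return space, bytes(buf)
```

Re-raising after the release leaves it to the caller to report the error code to the host.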
图14的流程描述的是对写命令的处理方式,DMA/RDMA传输请求的发起者是NVMe设备。类似的,在读命令的处理流程中,也可以采用类似的方案,key、value和metadata分别通过不同的DMA传输获取。不同之处在于,传输请求的发起者是主机,而通过DMA/RDMA传输key、value和metadata的是NVMe设备。
参见图15,本发明提供一种数据处理系统,包括主机装置80和存储装置90,主机装置80和存储装置90通过PCIe或者Fabric连接。主机装置80可以把KV数据读出或者写入存储装置90。
该主机装置80可以是物理上设备或者逻辑装置,包括收发模块801,管理模块802和缓存模块803。存储装置90可以是物理上设备或者逻辑装置,包括接口模块901、处理模块902和存储模块903。收发模块801和接口模块901通信。主机装置80拥有前述主机的功能,存储装置90拥有前述存储器的功能。
下面对主机装置80和存储装置90的功能进行简单介绍。需要说明的是,由于这两个装置(以及相应模块)的功能在方法流程中已有详细说明,因此这里仅做简单介绍。
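The storage apparatus 90 includes: an interface module 901, configured to receive an NVMe write command sent by the host, where the NVMe write command carries a key and a value pointer, the value pointer points to a first storage space in the host, the first storage space is used to store a value, and the key and the value belong to the same KV pair; a processing module 902, configured to obtain the key from the NVMe write command, obtain the value length according to the value pointer, and allocate a second storage space for the value according to the value length; and a storage module 903, configured to send a first transfer request to the host, obtain the value from the host, and save the value in the second storage space. The second storage space may be provided by the storage medium of the NVMe storage device, and the storage medium is connected to the processor of the NVMe storage device. With this solution, KV data does not need to be converted into block form while being passed from the host to the NVMe storage device, which improves the efficiency of storing KV data.
Alternatively, the storage apparatus 90 includes: an interface module 901, configured to receive an NVMe write command, where the header of the NVMe write command carries a key, the NVMe write command further carries a value, the key corresponds to the value, and the key and the value belong to the same KV pair; a processing module 902, configured to obtain the key and the value from the NVMe write command; and a storage module 903, configured to save the value in the storage medium of the NVMe storage device.
In one implementation, the host apparatus 80 includes: a transceiver module 801, configured to send an NVMe read command to the storage apparatus 90, where the NVMe read command carries a key; the transceiver module 801 is further configured to receive a response message to the NVMe read command from the NVMe storage device, where the response message carries location information of the second storage space; a management module 802, configured to reserve a third storage space for the value according to the value length in the location information; the transceiver module 801 is further configured to send a transfer request to the NVMe storage device, where the access address of the transfer request is the second storage space, and to obtain the value from the NVMe storage device; and a cache module 803, configured to save the value in the third storage space of the host apparatus 80. With this module structure, the host apparatus can read the value and/or the metadata from the storage apparatus.
The units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such an implementation should not be considered beyond the scope of the present invention.
The foregoing is merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement based on the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.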

Claims (36)

  1. 一种数据处理方法,其特征在于,该方法包括:
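    a Non-Volatile Memory Express (NVMe) storage device receives an NVMe write command sent by a host, where the NVMe write command carries a key and a value pointer, the value pointer points to a first storage space in the host, the first storage space is used to store a value, and the key and the value belong to the same KV pair;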
    所述NVMe存储器从所述NVMe写命令中获得所述key,根据所述value指针获得value长度,按照所述value长度为所述value分配第二存储空间,所述第二存储空间在所述NVMe存储器中;
    所述NVMe存储器发送第一传输请求给所述主机,从所述主机获得所述value,把所述value保存在所述第二存储空间中。
  2. 如权利要求1所述的数据处理方法,其中,所述NVMe存储器发送第一传输请求给主机以及从所述主机获得所述value,具体包括:
    所述NVMe存储器发送DMA传输请求给所述主机,从所述主机获得所述value,所述DMA指令携带所述第一存储空间作为访问地址,携带所述第二存储空间作为写入地址,其中,所述NVMe存储器和所述主机用PCIe总线连接。
  3. 如权利要求1所述的数据处理方法,其中,所述NVMe存储器发送第一传输请求给主机以及从所述主机获得所述value,具体包括:
    所述存储器发送RDMA传输请求给所述主机,从所述主机获得所述value,所述RDMA指令携带所述第一存储空间作为访问地址,携带所述第二存储空间作为写入地址,其中,所述NVMe存储器和所述主机用Fabric总线连接。
  4. 如权利要求1所述的数据处理方法,其中:
    所述NVMe写命令中进一步携带KV数量的字段,所述KV数量的字段用于描述所述NVMe写命令中KV的数量,所述NVMe存储器从所述NVMe 写命令中获得与KV数量相同数量的key,以及获得与KV数量相同数量的value。
  5. 如权利要求1所述的数据处理方法,其中:
    所述NVMe写命令中进一步携带KV格式的字段,其中,所述KV格式的字段描述所述NVMe写命令中字段的结构,所述NVMe存储器按照所述KV格式字段所定义的字段内容,从所述NVMe写命令中获取各个字段。
  6. 如权利要求1所述的数据处理方法,其中,所述NVMe写命令还携带所述value元数据指针,所述方法还包括:
    所述NVMe存储器根据所述元数据指针获得元数据长度,按照所述元数据长度为所述元数据分配第四存储空间,所述第四存储空间在所述NVMe存储器中;
    所述NVMe存储器通过所述第一传输请求从所述主机中获得所述元数据,把所述元数据保存在第四存储空间中。
  7. 如权利要求6所述的数据处理方法,其中:
    所述元数据传输失败后,所述NVMe存储器释放为所述value分配的存储空间,以及释放为所述元数据分配的第四存储空间。
  8. 如权利要求1所述的数据处理方法,其中,所述NVMe存储器发送第一传输请求给所述主机以及从所述主机获得所述value,具体包括下述其中一种:
    所述存储器发送至少两个DMA传输请求给所述主机以获得所述value,每个DMA传输请求用于请求获得所述value的一部分,当任意一个DMA传输请求执行失败,则NVMe存储器释放为所述value分配的存储空间,其中,所述NVMe存储器和所述主机用PCIe总线连接;或者
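    the storage device sends at least two RDMA transfer requests to the host to obtain the value, each RDMA transfer request requesting a part of the value, and when any one of the RDMA transfer requests fails, the NVMe storage device releases the storage space allocated for the value, where the NVMe storage device and the host are connected by a Fabric.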
  9. 如权利要求1所述的数据处理方法,其中,所述方法还包括:
    生成所述key与所述第二存储空间的映射关系。
  10. 如权利要求9所述的数据处理方法,该方法之后,进一步包括:
    所述NVMe存储器从所述主机接收NVMe读命令,所述NVMe读命令中携带所述key;
    所述NVMe存储器根据所述key在所述映射关系中查询,获得存储所述value的所述第二存储空间的位置信息,把所述第二存储空间的位置信息发送给所述主机;
    所述NVMe存储器接收所述主机发送的第二传输请求,所述第二传输请求用于请求获得存储在所述第二存储空间的数据。
  11. 如权利要求10所述的数据处理方法,其中:
    所述NVMe存储器接收所述主机发送的第二传输请求之前,进一步包括:所述主机根据所述第二存储空间的大小预留所述主机中的第三存储空间;
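    after the NVMe storage device sends the value to the host, the method further includes: the host writes the received data from the second storage space into the third storage space.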
  12. 如权利要求10所述的数据处理方法,所述NVMe存储器接收的所述主机发送的NVMe读命令中还携带所述主机的空闲空间信息,所述NVMe存储器从所述主机接收所述NVMe读命令之后,进一步包括:
    所述NVMe存储器判断所述主机的空闲存储空间是否大于等于所述第二存储空间,如果是,执行将所述第二存储空间的位置信息发送给所述主机的步骤,如果否,结束步骤。
  13. 如权利要求1所述的数据处理方法,其中:
    所述第一存储空间用第一存储空间的首地址和所述value长度描述;
    所述第二存储空间用第二存储空间的首地址描述。
  14. 一种NVMe存储器,包括控制器以及存储介质,所述控制器和所述 存储介质连接,所述存储介质用于提供存储空间,其特征在于,所述处理器被配置为执行:
    接收主机发送的快速非易失性存储NVMe写命令,所述NVMe写命令中携带key,所述NMVe写命令携带value指针,所述value指针指向所述主机中的第一存储空间,所述第一存储空间用于存储value,所述key与所述value属于同一个KV对;
    从所述NVMe写命令中获得所述key,根据所述value指针获得value长度,按照所述value长度为所述value分配第二存储空间,所述第二存储空间在所述存储介质中;
    发送第一传输请求给所述主机,从所述主机获得所述value,把所述value保存在所述第二存储空间中。
  15. 如权利要求14所述的NVMe存储器,其中,发送第一传输请求给主机以及从所述主机获得所述value,具体包括:
    所述NVMe存储器和所述主机用PCIe总线连接,发送DMA传输请求给所述主机,从所述主机获得所述value,所述DMA指令携带所述第一存储空间作为访问地址,携带所述第二存储空间作为写入地址。
  16. 如权利要求14所述的NVMe存储器,其中,发送第一传输请求给主机以及从所述主机获得所述value,具体包括:
    所述NVMe存储器和所述主机用Fabric总线连接,发送RDMA传输请求给所述主机,从所述主机获得所述value,所述RDMA指令携带所述第一存储空间作为访问地址,携带所述第二存储空间作为写入地址。
  17. 如权利要求14所述的NVMe存储器,其中:
    所述NVMe写命令中进一步携带KV数量的字段,所述KV数量的字段用于描述所述NVMe写命令中KV的数量,所述NVMe存储器从所述NVMe写命令中获得与KV数量相同数量的key,以及获得与KV数量相同数量的value。
  18. 如权利要求14所述的NVMe存储器,其中:
    所述NVMe写命令中进一步携带KV格式的字段,其中,所述KV格式的字段描述所述NVMe写命令中字段的结构,所述NVMe存储器按照所述KV格式字段所定义的字段内容,从所述NVMe写命令中获取各个字段。
  19. 如权利要求14所述的NVMe存储器,其中,所述NVMe写命令还携带所述value元数据指针,所述处理器还被配置为:
    根据所述元数据指针获得元数据长度,按照所述元数据长度为所述元数据分配第四存储空间,所述第四存储空间在所述NVMe存储器中;
    通过所述第一传输请求从所述主机中获得所述元数据,把所述元数据保存在第四存储空间中。
  20. 如权利要求19所述的NVMe存储器,其中,所述处理器还被配置为:
    所述元数据传输失败后,则释放为所述value分配的存储空间,以及释放为所述元数据分配的第四存储空间。
  21. 如权利要求14所述的NVMe存储器,其中,发送第一传输请求给所述主机以及从所述主机获得所述value,具体包括下述其中一种:
    发送至少两个DMA传输请求给所述主机以获得所述value,每个DMA传输请求用于请求获得所述value的一部分,当任意一个DMA传输请求执行失败,则NVMe存储器释放为所述value分配的存储空间,其中所述NVMe存储器和所述主机用PCIe总线连接;或者
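    sending at least two RDMA transfer requests to the host to obtain the value, each RDMA transfer request requesting a part of the value, and when any one of the RDMA transfer requests fails, the NVMe storage device releases the storage space allocated for the value, where the NVMe storage device and the host are connected by a Fabric.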
  22. 如权利要求14所述的NVMe存储器,其中,所述处理器还被配置为:
    生成所述key与所述第二存储空间的映射关系;
    从所述主机接收NVMe读命令,所述NVMe读命令中携带所述key;
    根据所述key在所述映射关系中查询,获得存储所述value的所述第二存储空间的位置信息,把所述第二存储空间的位置信息发送给所述主机;
    接收所述主机发送的第二传输请求,所述第二传输请求用于请求获得存储在所述第二存储空间的数据。
  23. 一种数据处理方法,其特征在于,该方法包括:
    非易失性存储快速NVMe存储器接收NVMe写命令,所述NVMe写命令的头部携带键key,所述NMVe命令还携带值value,所述key和所述value对应,所述key和所述value属于同一个KV对;
    所述NVMe存储器从所述NVMe写命令中获得所述key以及所述value;
    所述NVMe存储器把所述value保存在所述NVMe存储器的存储介质中。
  24. 如权利要求23所述的数据处理方法,其中:
    所述NVMe写命令中进一步携带KV数量的字段,所述KV数量的字段用于描述所述NVMe写命令中KV的数量,所述NVMe存储器从所述NVMe写命令中获得与KV数量相同数量的key,以及获得与KV数量相同数量的value。
  25. 如权利要求23所述的数据处理方法,其中:
    所述NVMe写命令中进一步携带KV格式的字段,其中,所述KV格式的字段描述所述NVMe写命令中字段的结构,所述NVMe存储器按照所述KV格式字段所定义的字段内容,从所述NVMe写命令中获取各个字段。
  26. 如权利要求23所述的数据处理方法,其中,所述NVMe写命令中进一步携带所述key的长度,所述NVMe存储器从所述写命令中获得所述key具体包括:
    从所述key的预设起始位置,按照所述key的长度从所述写命令中获得所述key。
  27. 如权利要求23所述的方法,其中,所述NVMe写命令中进一步携带value偏移量以及所述value的长度,所述NVMe存储器从所述NVMe写命令中获得所述value具体包括:
    所述NVMe存储器从所述偏移量指示的位置,按照所述value的长度获得所述value。
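  28. The data processing method according to claim 23, wherein:
    the method further comprises: generating a mapping relationship between the key and the storage space of the value;
    and, after the method, further comprises:
    sending, by a host, an NVMe read command to the NVMe storage device, the NVMe read command carrying the key;
    receiving, by the NVMe storage device, the read command from the host, and obtaining the key from the NVMe read command;
    looking up, by the NVMe storage device, the storage space of the value in the mapping relationship by using the key;
    obtaining, by the NVMe storage device, the value from the storage space of the value; and
    generating, by the NVMe storage device, a response message to the NVMe read command and sending it to the host, the response message carrying the value.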
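
The write and read flow of claims 23 and 26 to 28 can be sketched as below. The byte layout is an assumption made purely for illustration (a 10-byte little-endian header holding the key length, the value offset, and the value length, with the key starting at the preset position `KEY_START`); the claims state only that these fields are carried in the command, not how they are encoded. `build_write_command`, `parse_write_command`, and `KVDevice` are likewise hypothetical names.

```python
import struct

# Hypothetical layout, not taken from the claims or the NVMe specification:
# bytes 0-1 key length, bytes 2-5 value offset, bytes 6-9 value length.
HEADER = struct.Struct("<HII")
KEY_START = HEADER.size  # preset start position of the key


def build_write_command(key, value):
    """Host side: place the key after the header and the value after the key."""
    value_offset = KEY_START + len(key)
    return HEADER.pack(len(key), value_offset, len(value)) + key + value


def parse_write_command(cmd):
    """Device side: read the key by its length and the value by offset and length."""
    key_len, value_offset, value_len = HEADER.unpack_from(cmd, 0)
    key = cmd[KEY_START:KEY_START + key_len]
    value = cmd[value_offset:value_offset + value_len]
    return key, value


class KVDevice:
    """Toy device: a byte array as the storage medium plus a key-to-location map."""

    def __init__(self):
        self.medium = bytearray()
        self.mapping = {}   # key -> (position, length) of the stored value

    def handle_write(self, cmd):
        key, value = parse_write_command(cmd)
        position = len(self.medium)
        self.medium += value                       # store the value
        self.mapping[key] = (position, len(value))  # record where it went

    def handle_read(self, key):
        position, length = self.mapping[key]       # look up the value's space
        return bytes(self.medium[position:position + length])
```

A host-side call such as `dev.handle_write(build_write_command(b"user:1", b"alice"))` followed by `dev.handle_read(b"user:1")` walks through the same steps as the claims: parse, store, map, look up, and return the value in the response.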
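  29. An NVMe storage device, comprising a controller and a storage medium, the controller being connected to the storage medium and the storage medium being configured to provide storage space, characterized in that the processor is configured to:
    receive a Non-Volatile Memory Express (NVMe) write command sent by a host, wherein a header of the NVMe write command carries a key, the NVMe command further carries a value, the key corresponds to the value, and the key and the value belong to the same KV pair;
    obtain the key and the value from the NVMe write command; and
    store the value in the storage medium.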
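  30. The NVMe storage device data processing method according to claim 29, wherein:
    the NVMe write command further carries a KV-quantity field, the KV-quantity field describes the number of KVs in the NVMe write command, and the processor obtains, from the NVMe write command, a number of keys equal to the KV quantity and a number of values equal to the KV quantity.
  31. The NVMe storage device data processing method according to claim 29, wherein:
    the NVMe write command further carries a KV-format field, wherein the KV-format field describes the structure of the fields in the NVMe write command, and the processor obtains each field from the NVMe write command according to the field content defined by the KV-format field.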
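  32. The NVMe storage device data processing method according to claim 29, wherein the NVMe write command further carries the length of the key, and obtaining the key from the write command specifically comprises:
    obtaining the key from the write command, starting at a preset start position of the key and according to the length of the key.
  33. The NVMe storage device according to claim 29, wherein the NVMe write command further carries a value offset and the length of the value, and obtaining the value from the NVMe write command specifically comprises:
    obtaining the value starting at the position indicated by the offset and according to the length of the value.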
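  34. The NVMe storage device data processing method according to claim 29, wherein the processor is further configured to:
    generate a mapping relationship between the key and the storage space of the value;
    receive, by the NVMe storage device, an NVMe read command from the host, and obtain the key from the NVMe read command;
    look up the storage space of the value in the mapping relationship by using the key;
    obtain the value from the storage space of the value; and
    generate a response message to the NVMe read command and send it to the host, the response message carrying the value.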
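
The KV-quantity field of claims 24 and 30 lets one write command carry several KV pairs. A possible encoding is sketched below; the layout (a 2-byte count followed, for each pair, by a key length, a value length, the key, and the value) and the names `pack_pairs` and `unpack_pairs` are assumptions chosen for illustration and are not defined by the claims.

```python
import struct

# Hypothetical encoding, for illustration only.
COUNT = struct.Struct("<H")   # KV-quantity field
PAIR = struct.Struct("<HI")   # per-pair key length and value length


def pack_pairs(pairs):
    """Host side: write the KV quantity, then each pair with its lengths."""
    out = bytearray(COUNT.pack(len(pairs)))
    for key, value in pairs:
        out += PAIR.pack(len(key), len(value)) + key + value
    return bytes(out)


def unpack_pairs(cmd):
    """Device side: read the KV quantity, then that many keys and values."""
    (count,) = COUNT.unpack_from(cmd, 0)
    pos = COUNT.size
    pairs = []
    for _ in range(count):
        key_len, value_len = PAIR.unpack_from(cmd, pos)
        pos += PAIR.size
        key = cmd[pos:pos + key_len]
        pos += key_len
        value = cmd[pos:pos + value_len]
        pos += value_len
        pairs.append((key, value))
    return pairs
```

The device reads exactly as many keys and values as the quantity field announces, which is the behaviour the claims describe; a KV-format field as in claims 25 and 31 could select between several such layouts.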
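  35. A storage system, comprising the NVMe storage device according to any one of claims 14 to 22 and further comprising the host, wherein:
    the host is configured to construct the NVMe write command and send the NVMe write command to the NVMe storage device.
  36. A storage system, comprising a host and the NVMe storage device according to any one of claims 29 to 34, wherein:
    the host is configured to construct the NVMe write command and send the NVMe write command to the NVMe storage device.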
PCT/CN2016/103268 2015-12-28 2016-10-25 Data processing method and NVMe storage device WO2017113960A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
EP16880751.9A EP3260971B1 (en) 2015-12-28 2016-10-25 Data processing method and nvme storage
EP21155651.9A EP3916536A1 (en) 2015-12-28 2016-10-25 Data processing method and nvme storage device
CN201680003110.0A CN107209644B (zh) 2015-12-28 2016-10-25 Data processing method and NVMe storage device
US15/971,990 US10705974B2 (en) 2015-12-28 2018-05-04 Data processing method and NVME storage device
US16/899,294 US11467975B2 (en) 2015-12-28 2020-06-11 Data processing method and NVMe storage device
US17/947,812 US20230011387A1 (en) 2015-12-28 2022-09-19 Data processing method and nvme storage device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510998928 2015-12-28
CN201510998928.8 2015-12-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/971,990 Continuation US10705974B2 (en) 2015-12-28 2018-05-04 Data processing method and NVME storage device

Publications (1)

Publication Number Publication Date
WO2017113960A1 true WO2017113960A1 (zh) 2017-07-06

Family

ID=59224463

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103268 WO2017113960A1 (zh) 2015-12-28 2016-10-25 Data processing method and NVMe storage device

Country Status (4)

Country Link
US (3) US10705974B2 (zh)
EP (2) EP3916536A1 (zh)
CN (2) CN107209644B (zh)
WO (1) WO2017113960A1 (zh)

Families Citing this family (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145459B (zh) * 2016-03-01 2021-05-18 华为技术有限公司 Cascade board, and system and method for remote shared access to an SSD
WO2017214862A1 (zh) * 2016-06-14 2017-12-21 华为技术有限公司 Data access method and related apparatus and system
US10305820B1 (en) * 2016-10-12 2019-05-28 Barefoot Networks, Inc. Network forwarding element with key-value processing in the data plane
JP6772826B2 (ja) * 2016-12-26 2020-10-21 ブラザー工業株式会社 Image reading apparatus and image transmission method
US10719495B2 (en) 2017-02-09 2020-07-21 Micron Technology, Inc. Stream selection for multi-stream storage devices
US10706105B2 (en) 2017-02-09 2020-07-07 Micron Technology, Inc. Merge tree garbage metrics
US10725988B2 (en) 2017-02-09 2020-07-28 Micron Technology, Inc. KVS tree
US10706106B2 (en) 2017-02-09 2020-07-07 Micron Technology, Inc. Merge tree modifications for maintenance operations
US10572161B2 (en) 2017-11-15 2020-02-25 Samsung Electronics Co., Ltd. Methods to configure and access scalable object stores using KV-SSDs and hybrid backend storage tiers of KV-SSDs, NVMe-SSDs and other flash devices
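CN108052290A (zh) * 2017-12-13 2018-05-18 北京百度网讯科技有限公司 Method and apparatus for storing data
CN110235098B (zh) * 2017-12-26 2021-06-22 华为技术有限公司 Storage system access method and apparatus
CN110324381B (zh) * 2018-03-30 2021-08-03 北京忆芯科技有限公司 KV storage device in cloud computing and fog computing systems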
EP3792776B1 (en) 2018-06-30 2022-10-26 Huawei Technologies Co., Ltd. Nvme-based data reading method, apparatus and system
CN111542803B (zh) * 2018-06-30 2021-10-01 华为技术有限公司 NVMe-based data writing method, apparatus, and system
US11115490B2 (en) * 2018-07-31 2021-09-07 EMC IP Holding Company LLC Host based read cache for san supporting NVMEF with E2E validation
CN108920725B (zh) 2018-08-02 2020-08-04 网宿科技股份有限公司 Object storage method and object storage gateway
US10915546B2 (en) 2018-10-10 2021-02-09 Micron Technology, Inc. Counter-based compaction of key-value store tree data block
US11100071B2 (en) 2018-10-10 2021-08-24 Micron Technology, Inc. Key-value store tree data block spill with compaction
US10852978B2 (en) 2018-12-14 2020-12-01 Micron Technology, Inc. Key-value store using journaling with selective data storage format
US11048755B2 (en) 2018-12-14 2021-06-29 Micron Technology, Inc. Key-value store tree with selective use of key portion
CN109710187B (zh) * 2018-12-24 2022-12-02 深圳忆联信息系统有限公司 Read command acceleration method and apparatus for an NVMe SSD main control chip, computer device, and storage medium
US10936661B2 (en) 2018-12-26 2021-03-02 Micron Technology, Inc. Data tree with order-based node traversal
KR20210004701A (ko) * 2019-07-05 2021-01-13 삼성전자주식회사 Storage device that stores data on a key-value basis and operating method thereof
US20210064745A1 (en) * 2019-08-29 2021-03-04 Flexxon Pte Ltd Methods and systems using an ai co-processor to detect anomolies caused by malware in storage devices
CN110968530B (zh) * 2019-11-19 2021-12-03 华中科技大学 Key-value storage system based on non-volatile memory and memory access method
US11287994B2 (en) * 2019-12-13 2022-03-29 Samsung Electronics Co., Ltd. Native key-value storage enabled distributed storage system
KR20210092361A (ko) 2020-01-15 2021-07-26 삼성전자주식회사 Storage device and operating method thereof
EP3851950A1 (en) * 2020-01-15 2021-07-21 Samsung Electronics Co., Ltd. Storage device and operation method thereof
US11200180B2 (en) 2020-01-31 2021-12-14 Western Digital Technologies, Inc. NVMe SGL bit bucket transfers
EP4127940A1 (en) * 2020-05-08 2023-02-08 Huawei Technologies Co., Ltd. Remote direct memory access with offset values
CN113472623A (zh) * 2021-05-31 2021-10-01 山东英信计算机技术有限公司 Storage system management method and apparatus, storage medium, and device
US11966343B2 (en) 2021-07-19 2024-04-23 Samsung Electronics Co., Ltd. Universal mechanism to access and control a computational device
CN115904488A (zh) * 2021-08-11 2023-04-04 华为技术有限公司 Data transmission method, system, apparatus, and device
US11922034B2 (en) 2021-09-02 2024-03-05 Samsung Electronics Co., Ltd. Dual mode storage device
US11853607B2 (en) 2021-12-22 2023-12-26 Western Digital Technologies, Inc. Optimizing flash memory utilization for NVMe KV pair storage
US11817883B2 (en) 2021-12-27 2023-11-14 Western Digital Technologies, Inc. Variable length ECC code according to value length in NVMe key value pair devices
US11733876B2 (en) 2022-01-05 2023-08-22 Western Digital Technologies, Inc. Content aware decoding in KV devices
JP2023107418A (ja) 2022-01-24 2023-08-03 キオクシア株式会社 Storage device and storage system
US11853564B1 (en) * 2022-06-17 2023-12-26 Western Digital Technologies, Inc. Key value data storage device with improved utilization for short key value pairs

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130086311A1 (en) * 2007-12-10 2013-04-04 Ming Huang METHOD OF DIRECT CONNECTING AHCI OR NVMe BASED SSD SYSTEM TO COMPUTER SYSTEM MEMORY BUS
CN103973810A (zh) * 2014-05-22 2014-08-06 华为技术有限公司 Data processing method and apparatus based on an Internet Protocol (IP) disk
CN104461380A (zh) * 2014-11-17 2015-03-25 华为技术有限公司 Data storage method and apparatus

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6415338B1 (en) * 1998-02-11 2002-07-02 Globespan, Inc. System for writing a data value at a starting address to a number of consecutive locations equal to a segment length identifier
JP2003280979A (ja) * 2002-03-20 2003-10-03 Toshiba Corp Information storage device
US7493433B2 (en) * 2004-10-29 2009-02-17 International Business Machines Corporation System, method and storage medium for providing an inter-integrated circuit (I2C) slave with read/write access to random access memory
US9355109B2 (en) 2010-06-11 2016-05-31 The Research Foundation For The State University Of New York Multi-tier caching
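CN102594849B (zh) * 2011-01-06 2015-05-20 阿里巴巴集团控股有限公司 Data backup and recovery methods, and virtual machine snapshot deletion and rollback methods and apparatuses
JP5762878B2 (ja) * 2011-08-08 2015-08-12 株式会社東芝 Memory system having a key-value store
JP5524144B2 (ja) * 2011-08-08 2014-06-18 株式会社東芝 Memory system having a key-value store scheme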
US8966172B2 (en) * 2011-11-15 2015-02-24 Pavilion Data Systems, Inc. Processor agnostic data storage in a PCIE based shared storage enviroment
WO2014089828A1 (zh) * 2012-12-14 2014-06-19 华为技术有限公司 Method for accessing a storage device, and storage device
US9430412B2 (en) * 2013-06-26 2016-08-30 Cnex Labs, Inc. NVM express controller for remote access of memory and I/O over Ethernet-type networks
JP5646775B2 (ja) * 2014-01-15 2014-12-24 株式会社東芝 Memory system having a key-value store scheme
US9727503B2 (en) * 2014-03-17 2017-08-08 Mellanox Technologies, Ltd. Storage system and server
US9959203B2 (en) * 2014-06-23 2018-05-01 Google Llc Managing storage devices
CN104238963B (zh) * 2014-09-30 2017-08-11 华为技术有限公司 Data storage method, storage apparatus, and storage system
US9438426B2 (en) 2014-10-03 2016-09-06 Seagate Technology Llc Key-value data storage device with hybrid architecture

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3260971A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4273688A3 (en) * 2017-08-10 2024-01-03 Huawei Technologies Co., Ltd. Data access method, device and system
CN114363428A (zh) * 2022-01-06 2022-04-15 齐鲁空天信息研究院 Socket-based data transfer method
CN114363428B (zh) * 2022-01-06 2023-10-17 齐鲁空天信息研究院 Socket-based data transfer method

Also Published As

Publication number Publication date
EP3260971A1 (en) 2017-12-27
US20200301850A1 (en) 2020-09-24
EP3260971A4 (en) 2018-05-02
CN111427517A (zh) 2020-07-17
US11467975B2 (en) 2022-10-11
US10705974B2 (en) 2020-07-07
US20230011387A1 (en) 2023-01-12
CN107209644B (zh) 2020-04-28
EP3916536A1 (en) 2021-12-01
CN107209644A (zh) 2017-09-26
US20180253386A1 (en) 2018-09-06
EP3260971B1 (en) 2021-03-10

Similar Documents

Publication Publication Date Title
WO2017113960A1 (zh) Data processing method and NVMe storage device
US10768857B2 (en) Storage system having a controller that selects a die of a solid state disk to store data
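TWI732110B (zh) System and method for low-latency direct data access to non-volatile flash memory
WO2020000483A1 (zh) Data processing method and storage system
WO2018137217A1 (zh) Data processing system and method, and corresponding apparatus
CN107229415B (zh) Data writing method, data reading method, and related device and system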
WO2016093895A1 (en) Generating and/or employing a descriptor associated with a memory translation table
CN107851122B (zh) Large-scale storage and retrieval of data with well-bounded life
US9558232B1 (en) Data movement bulk copy operation
WO2016054818A1 (zh) Data processing method and apparatus
EP4318251A1 (en) Data access system and method, and device and network card
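WO2022007470A1 (zh) Data transmission method, chip, and device
WO2020034729A1 (zh) Data processing method, related device, and computer storage medium
WO2020199760A1 (zh) Data storage method, memory, and server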
US10284672B2 (en) Network interface
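CN101237415A (zh) Method for implementing an ARP protocol IP core
CN113032293A (zh) Cache manager and control component
KR20150129808A (ko) Method and apparatus for a distributed memory system including memory nodes
WO2015055117A1 (zh) Memory access method, device, and system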
US9237057B1 (en) Reassignment of a virtual connection from a busiest virtual connection or locality domain to a least busy virtual connection or locality domain
CN110658980B (zh) Data processing method and apparatus, and storage system
US11947419B2 (en) Storage device with data deduplication, operation method of storage device, and operation method of storage server
US9473591B1 (en) Reliable server transport over fibre channel using a block device access model
US9514151B1 (en) System and method for simultaneous shared access to data buffers by two threads, in a connection-oriented data proxy service
JP2014235531A (ja) Data transfer device, data transfer system, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16880751

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2016880751

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE