WO2020000482A1 - NVMe-based data reading method, apparatus, and system - Google Patents

NVMe-based data reading method, apparatus, and system

Info

Publication number
WO2020000482A1
WO2020000482A1 · PCT/CN2018/093918 · CN2018093918W
Authority
WO
WIPO (PCT)
Prior art keywords
address
data
read
host
storage unit
Prior art date
Application number
PCT/CN2018/093918
Other languages
English (en)
French (fr)
Inventor
维克多·吉辛
李君瑛
周冠锋
林嘉树
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to KR1020207022273A (granted as KR102471219B1)
Priority to JP2020545126A (granted as JP7191967B2)
Priority to CN201880005007.9A (granted as CN111095231B)
Priority to PCT/CN2018/093918 (published as WO2020000482A1)
Priority to EP18924295.1A (granted as EP3792776B1)
Publication of WO2020000482A1
Priority to US17/072,038 (granted as US11467764B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0238Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/40Bus structure
    • G06F13/4004Coupling between buses
    • G06F13/4027Coupling between buses using bus bridges
    • G06F13/404Coupling between buses using bus bridges with address mapping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4234Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a memory bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • G06F3/0607Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662Virtualisation aspects
    • G06F3/0664Virtualisation aspects at device level, e.g. emulation of a storage device or system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026PCI express

Definitions

  • the present application relates to the field of storage, and in particular, to a method, an apparatus, and a system for reading data based on non-volatile memory express (NVMe).
  • NVMe: non-volatile memory express
  • NVMe is an interface, attached to the Peripheral Component Interconnect Express (PCIe) bus as a register-level interface, for communicating with an NVM subsystem (controller and storage medium). It is optimized for enterprise-grade and consumer-grade solid-state storage, offering high performance and low access latency.
  • PCIe: Peripheral Component Interconnect Express
  • NVMe is based on a paired submission queue (SQ) and completion queue (CQ) mechanism.
  • the host places commands into the submission queue.
  • the controller places completion information into the corresponding completion queue.
  • each submission queue entry (SQE) is one command.
  • in a read instruction, the memory address used for data transfer is carried by the Metadata Pointer (MPTR) and the Data Pointer (DPTR).
  • after the NVMe controller obtains the read instruction, it writes the data to be read, via PCIe write operations, into the storage space indicated by the memory address used for data transfer.
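As an illustrative aside (not part of the application), the 64-byte read-command SQE described above might be modeled in C roughly as follows; the field names and layout are approximations of the NVMe specification's command format, with DPTR expressed as two PRP entries:

```c
#include <stdint.h>

/* Rough sketch of a 64-byte NVMe submission queue entry (SQE) for a
 * Read command; field names are illustrative approximations of the
 * NVMe specification layout, not taken from the patent text. */
struct nvme_read_sqe {
    uint8_t  opcode;      /* 0x02 = Read */
    uint8_t  flags;
    uint16_t command_id;  /* echoed back in the matching CQE */
    uint32_t nsid;        /* namespace identifier */
    uint64_t reserved;
    uint64_t mptr;        /* metadata pointer (MPTR) */
    uint64_t prp1;        /* data pointer (DPTR): PRP entry 1 */
    uint64_t prp2;        /* DPTR: PRP entry 2 or PRP list pointer */
    uint64_t slba;        /* starting logical block address */
    uint16_t nlb;         /* number of logical blocks (zero-based) */
    uint16_t control;
    uint32_t dsmgmt;
    uint32_t reftag;
    uint16_t apptag;
    uint16_t appmask;
};
```

In the scheme the application describes, the instruction information indicating the first address would travel in the DPTR or MPTR fields of such an entry.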
  • the present application discloses a method, device, and system for reading data based on NVMe.
  • the host opens an entry address to the NVMe controller and allocates a corresponding storage unit for that entry address in its memory space. When the host receives a data message sent by the NVMe controller, it determines the address of the corresponding storage unit from the entry address carried in the data message, and writes the payload data in the data message into that storage unit. The storage unit is thereby decoupled from the communication protocol, allowing flexible operation on the data.
  • the present application discloses an NVMe-based data reading system.
  • the system includes a host, an NVMe controller, and a storage medium.
  • the storage medium is used to store data of the host, and the host is used to trigger a read instruction to the NVMe controller.
  • the read instruction carries instruction information for indicating the first address.
  • the first address is an address that the NVMe controller can address. After obtaining the read instruction, the NVMe controller reads the data to be read corresponding to the read instruction from the storage medium.
  • the read instruction may be an SQE; specifically, the host triggers the read instruction by writing the SQE into the SQ and notifying the NVMe controller through a doorbell.
  • the first address is an address opened by the host to the NVMe controller, but it is only an entry address through which the NVMe controller writes payload data to the host.
  • the storage space indicated by the first address does not actually store the payload data.
  • after receiving the data message sent by the NVMe controller, the host does not write the payload data into the storage space indicated by the first address; instead, it allocates, within its addressable storage space, a second address corresponding to the first address, and writes the payload data into the storage unit indicated by the second address. The host's operations on that storage unit are therefore no longer limited by the communication protocol between the host and the NVMe controller. This application can reduce the latency of the read operation and reduce the amount of the host's storage space occupied by the data to be read.
  • after completing the write operation to the storage unit indicated by the second address, the host is further configured to operate on the data in that storage unit.
  • completing the write operation to the storage unit means that all the data associated with the storage unit has been written into it: for example, the storage unit is full, or the last payload data associated with the storage unit has been written into it.
  • operating on the data may be sending the data in the storage unit to another subject.
  • the storage space indicated by the second address may be private memory of the host, which the NVMe controller can no longer access by means of a PCIe address and which is not used as a controller memory buffer (CMB).
  • CMB: controller memory buffer
  • using the mapping relationship between the first address and the second address, the host stores the payload data in the storage unit indicated by the addressable second address, thereby decoupling the second address from the communication protocol between the host and the NVMe controller. Once the host completes the write operation to the storage unit indicated by the second address, it can operate on the data in that storage unit without waiting for the entire read operation indicated by the read instruction to complete.
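A minimal host-side sketch of the first-address-to-second-address mapping described above; the fixed-size table and all names here are illustrative assumptions rather than details from the application:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_ENTRIES 64

/* Illustrative table mapping entry (first) addresses exposed over PCIe
 * to private host buffers (second addresses). */
struct entry_map {
    uint64_t first_addr[MAX_ENTRIES];   /* PCIe entry address */
    void    *second_addr[MAX_ENTRIES];  /* private host buffer */
    size_t   count;
};

/* Record that payload arriving at `first` should land in `second`. */
static void map_entry(struct entry_map *m, uint64_t first, void *second) {
    m->first_addr[m->count]  = first;
    m->second_addr[m->count] = second;
    m->count++;
}

/* Resolve the private buffer for an incoming data message. */
static void *lookup_entry(const struct entry_map *m, uint64_t first) {
    for (size_t i = 0; i < m->count; i++)
        if (m->first_addr[i] == first)
            return m->second_addr[i];
    return NULL;  /* unknown entry address */
}
```

A real host would more likely index units by bits of the entry address than scan linearly; the table form simply keeps the mapping idea visible.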
  • the NVMe controller is further configured to trigger a completion queue entry (CQE), which indicates that the NVMe controller has completed the read operation indicated by the read instruction; the host obtains the CQE after operating on the data in the storage unit indicated by the second address.
  • CQE: completion queue entry
  • specifically, after completing the read operation, the NVMe controller triggers the CQE by writing it into the CQ and notifying the host through an interrupt.
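For illustration only, the 16-byte CQE and the phase-tag check by which a host notices a new completion might be sketched as follows (the layout approximates the NVMe specification; the names are assumptions):

```c
#include <stdint.h>

/* Sketch of a 16-byte NVMe completion queue entry (CQE); the layout
 * approximates the NVMe specification and is not from the patent. */
struct nvme_cqe {
    uint32_t result;      /* command-specific result */
    uint32_t reserved;
    uint16_t sq_head;     /* current SQ head pointer */
    uint16_t sq_id;       /* submission queue that issued the command */
    uint16_t command_id;  /* matches the SQE's command_id */
    uint16_t status;      /* bit 0: phase tag; remaining bits: status */
};

/* The host detects a newly written CQE by the phase-tag bit flipping. */
static int cqe_phase(const struct nvme_cqe *c) {
    return c->status & 1;
}
```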
  • in the standard NVMe flow, the host must allocate a PCIe address space for the read instruction, and the storage space indicated by that PCIe address is used to store the data to be read.
  • during the read, the host loses ownership of that PCIe address space; it cannot access the storage space indicated by the PCIe address until it obtains the CQE, which delays the read operation and wastes storage space.
  • because the second address is not the first address carried in the data message but an internal address chosen by the host, the host can operate on the data in the storage unit indicated by the second address before acquiring the CQE.
  • after operating on the data in the storage unit indicated by the second address, the host is further configured to release that storage unit.
  • the host can organize its internal storage space as a memory pool containing multiple storage units. After finishing its operations on the data in a storage unit, the host can release that unit back to the pool for other read operations, without waiting for the entire read operation to finish. This shortens the time each storage unit is occupied and improves the utilization of the storage space.
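The memory-pool behavior described above can be sketched as a small free-list of unit indices; the pool size and all names are illustrative assumptions, not figures from the application:

```c
#include <stddef.h>

#define POOL_UNITS 8

/* Minimal free-list over a fixed pool of storage-unit indices; an
 * illustrative stand-in for the host's memory pool described above. */
struct unit_pool {
    int free_stack[POOL_UNITS];
    int top;  /* number of currently free units */
};

static void pool_init(struct unit_pool *p) {
    for (int i = 0; i < POOL_UNITS; i++)
        p->free_stack[i] = i;
    p->top = POOL_UNITS;
}

/* Allocate a unit index before triggering a read; returns -1 when the
 * pool is exhausted, so the host can defer the read instruction. */
static int pool_alloc(struct unit_pool *p) {
    return p->top > 0 ? p->free_stack[--p->top] : -1;
}

/* Release a unit as soon as its data has been consumed, without
 * waiting for the whole read operation to finish. */
static void pool_free(struct unit_pool *p, int unit) {
    p->free_stack[p->top++] = unit;
}
```

The -1 result is what lets the host throttle read instructions by the number of free units, as the surrounding text describes.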
  • before triggering the read instruction, the host is further configured to allocate the storage unit indicated by the second address for the read instruction, and to record the correspondence between the first address and the second address.
  • allocating the corresponding storage unit before the read instruction is triggered effectively avoids storage-space overflow.
  • the host can trigger read instructions according to the number of free storage units in the maintained memory pool, thereby effectively controlling read operations.
  • when the data to be read by the read instruction corresponds to at least two data messages, the host allocates at least two storage units for the read instruction.
  • the data to be read can be split into multiple parts and transmitted using multiple data messages.
  • the host can allocate a corresponding number of multiple storage units according to the scale of the read operation.
  • the host is configured to determine the second address according to the first address and the order of the first payload data within the data to be read.
  • when the host allocates multiple storage units for the read operation, the first address corresponds to multiple storage units; after receiving the first data message, the host must determine the specific storage unit for the first payload data.
  • the host can logically address multiple storage units allocated for the read operation, and write the data to be read into the multiple storage units in sequence.
  • specifically, the host may determine the storage unit into which the first payload data is to be written according to the order of the first payload data within the data to be read.
  • the NVMe controller is further configured to send a second data message to the host; the second data message carries the first address and the second payload data.
  • the data to be read includes the second payload data.
  • the host is also configured to receive the second data message, and to determine the order of the first payload data and the second payload data within the data to be read according to the order in which the first and second data messages are received.
  • when the data to be read is large, the NVMe controller divides it among multiple data messages for transmission, and the host must reorder the payload data after receiving them. If the NVMe controller sends the data messages strictly in the order of the payload data within the data to be read, the host can sort the payload data according to the order in which the data messages arrive.
  • the first data message may further carry the offset of the first payload data within the data to be read.
  • the offset indicates the position of the first payload data within the data to be read.
  • with the offset, the NVMe controller can transmit data messages out of order and make fuller use of bandwidth resources.
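With an offset in each data message, the host can place payload data independently of arrival order. A minimal sketch, assuming fixed 4 KiB storage units (an assumption, not a figure from the application):

```c
#include <stdint.h>
#include <stddef.h>

#define UNIT_SIZE 4096u  /* assumed fixed storage-unit size */

/* Map a payload's byte offset within the data to be read to the index
 * of the storage unit it belongs to and the position inside that unit,
 * so out-of-order data messages can be placed directly. */
static void locate_payload(uint64_t offset, size_t *unit_idx, size_t *unit_off) {
    *unit_idx = (size_t)(offset / UNIT_SIZE);
    *unit_off = (size_t)(offset % UNIT_SIZE);
}
```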
  • the first address is a PCIe address addressable by the NVMe controller, the first data message is a PCIe message, and the storage unit indicated by the second address is in the memory space of the host.
  • the present application discloses an NVMe-based data reading method.
  • the method includes: the host triggering a read instruction, the read instruction carrying instruction information that indicates a first address, the first address being an address addressable by the NVMe controller; the host receiving a first data message sent by the NVMe controller, the first data message carrying the first address and first payload data; the host determining a second address according to the first address, the second address being an address addressable by the host; and the host writing the first payload data into the storage unit indicated by the second address.
  • the method further includes: the host operating on the data in the storage unit indicated by the second address.
  • the method further includes: the host obtaining the completion queue entry (CQE) triggered by the NVMe controller, where the CQE indicates that the NVMe controller has completed the read operation indicated by the read instruction.
  • the method further includes: the host releasing the storage unit indicated by the second address.
  • before the host triggers the read instruction, the method further includes: the host allocating the storage unit indicated by the second address for the read instruction, and recording the correspondence between the first address and the second address.
  • when the data to be read by the read instruction corresponds to at least two data messages, the host allocates at least two storage units for the read instruction.
  • the host determines the second address according to the first address and the order of the first payload data within the data to be read.
  • the method further includes: the host receiving a second data message sent by the NVMe controller, the second data message carrying the first address and second payload data; and the host determining the order of the first payload data and the second payload data within the data to be read according to the order in which the first and second data messages are received.
  • the first data message may further carry the offset of the first payload data within the data to be read.
  • the offset indicates the position of the first payload data within the data to be read.
  • the first address is a PCIe address addressable by the NVMe controller, the first data message is a PCIe message, and the storage unit indicated by the second address is in the memory space of the host.
  • the second aspect is a method implementation manner corresponding to the first aspect.
  • for details, refer to the description of the first aspect or any of its possible implementation manners; the corresponding description is not repeated here.
  • the present application provides a readable medium including execution instructions. When a processor of a computing device executes the instructions, the computing device performs the method of the second aspect or of any possible implementation manner of the second aspect.
  • the present application provides a computing device, including a processor, a memory, and a bus; the memory is used to store execution instructions, and the processor is connected to the memory through the bus; when the computing device runs, the processor executes the execution instructions stored in the memory, so as to cause the computing device to perform the method of the second aspect or of any possible implementation manner of the second aspect.
  • the present application discloses an NVMe-based data reading device.
  • the device includes a processing unit configured to trigger a read instruction, the read instruction carrying instruction information that indicates a first address, the first address being an address addressable by the NVMe controller;
  • a receiving unit configured to receive a first data message sent by the NVMe controller, the first data message carrying the first address and first payload data;
  • the processing unit is further configured to determine a second address according to the first address and to write the first payload data into the storage unit indicated by the second address, the second address being an address addressable by the processing unit.
  • after completing the write operation to the storage unit indicated by the second address, the processing unit is further configured to operate on the data in that storage unit.
  • after operating on the data in the storage unit indicated by the second address, the processing unit is further configured to obtain the completion queue entry (CQE) triggered by the NVMe controller, where the CQE indicates that the NVMe controller has completed the read operation indicated by the read instruction.
  • in a third possible implementation manner of the fifth aspect, after operating on the data in the storage unit indicated by the second address, the processing unit is further configured to release that storage unit.
  • before triggering the read instruction, the processing unit is further configured to allocate the storage unit indicated by the second address for the read instruction, and to record the correspondence between the first address and the second address.
  • when the data to be read by the read instruction corresponds to at least two data messages, the processing unit allocates at least two storage units for the read instruction.
  • the processing unit is configured to determine the second address according to the first address and the order of the first payload data within the data to be read.
  • the receiving unit is further configured to receive a second data message sent by the NVMe controller, the second data message carrying the first address and second payload data;
  • the processing unit is further configured to determine the order of the first payload data and the second payload data within the data to be read according to the order in which the first and second data messages are received.
  • the first data message may further carry the offset of the first payload data within the data to be read.
  • the offset indicates the position of the first payload data within the data to be read.
  • the first address is a PCIe address addressable by the NVMe controller, the first data message is a PCIe message, and the storage unit indicated by the second address is in the memory space of the device.
  • the fifth aspect is a device implementation manner corresponding to the first aspect.
  • for details, refer to the description of the first aspect or any of its possible implementation manners; the corresponding description is not repeated here.
  • the host opens the first address to the NVMe controller as a data entry address, through which the NVMe controller writes the data to be read to the host.
  • the destination address carried in the data message sent by the NVMe controller is the first address, but after receiving the data message, the host does not actually write the payload data into the storage space indicated by the first address; instead, it maps the first address to the second address and writes the payload data into the storage space indicated by the second address.
  • the storage space indicated by the second address may be the private memory space of the host, thereby separating the relationship between the storage space storing the payload data and the communication protocol, and the host's access to the second address is not restricted by the communication protocol.
  • before the read instruction ends, the host can use the data stored in the storage space indicated by the second address and release that storage space early for other read operations.
  • the technical solution disclosed in this application can reduce the delay of the read operation and save the storage space for storing the data to be read.
  • FIG. 1 is a schematic diagram of a logical structure of an NVMe system according to an embodiment of the present application
  • FIG. 2 is a signaling diagram of a data reading method based on the NVMe standard
  • FIG. 3 is a schematic diagram of a hardware structure of a host according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an NVMe-based data reading method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an entrance organization structure according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an entrance organization structure according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a PCIe address structure according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an address mapping relationship according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of data packet transmission according to an embodiment of the present invention.
  • FIG. 10 is a signaling diagram of an NVMe-based data reading method according to an embodiment of the present invention.
  • FIG. 11 (a) is a schematic diagram of a logical structure of an NVMe system according to an embodiment of the present application.
  • FIG. 11 (b) is a schematic diagram of a logical structure of an NVMe system according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a logical structure of a computing device according to an embodiment of the present application.
  • first and second are used to distinguish each object, such as the first address and the second address, but there is no logical or temporal dependency relationship between each “first” and “second”.
  • the “data packet” refers to a data packet that carries payload data and is sent by the NVMe controller to the host.
  • the payload data herein may be user data or metadata of the user data, and the embodiment of the present invention does not limit the type of the payload data.
  • the embodiments of the present invention use the term "data" or "payload data" to represent various types of data carried in a data message.
  • the data message may be a PCIe message.
  • the entry is an address space opened by the host to the NVMe controller.
  • the entry address may be a PCIe address
  • the data message may be a PCIe write message.
  • the NVMe controller sends a data message to the host through the entry, and the data message carries the entry address.
  • after receiving the data message, the host identifies the entry address, allocates corresponding storage space for the entry in its local memory, and writes the payload data carried by the data message into the allocated memory space for buffering, instead of writing the payload data into the storage space indicated by the entry address.
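A hypothetical sketch of this buffering step, with an assumed fixed 4 KiB unit size and illustrative names (neither is specified in the application):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define UNIT_SIZE 4096u

/* One storage unit allocated from the host's private memory. */
struct storage_unit {
    uint8_t data[UNIT_SIZE];
    size_t  filled;  /* bytes buffered so far */
};

/* Handle one incoming data message: buffer the payload in the private
 * storage unit resolved from the entry address, rather than writing to
 * the storage space the entry (PCIe) address nominally indicates. */
static int handle_data_message(struct storage_unit *unit,
                               const uint8_t *payload, size_t len) {
    if (unit == NULL || unit->filled + len > UNIT_SIZE)
        return -1;  /* unknown entry address, or unit would overflow */
    memcpy(unit->data + unit->filled, payload, len);
    unit->filled += len;
    return 0;
}
```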
  • the internal memory can be the private memory space of the host.
  • the read operation may be any operation in the NVMe command set by which the host reads data from the NVMe controller.
  • the instruction indicating a read operation is a read instruction.
  • the specific implementation of the read instruction may be a submission queue entry.
  • the command initiator and the data initiator may be the same or separated subjects.
  • the command initiator is a system entity that directly triggers instructions to the NVMe controller, and is also referred to as a command source in the embodiment of the present invention.
  • a data initiator is a system entity that needs to read and consume data, that is, a system entity that initiates a data access request, and is also referred to as a data source in the embodiment of the present invention. In the separated scenario, the data source needs to read the data through the command source.
  • the term "host" may refer to the command source in a scenario where the data source and the command source are separated, or to a computing device that communicates with the NVMe controller in a scenario where the data source and the command source are not separated.
  • during an NVMe read operation, the host carries the address information of the storage space for storing the data to be read in the DPTR or MPTR field of the triggered SQE.
  • the NVMe controller writes, according to the SQE, the data to be read into the storage space indicated by the address information.
  • the host loses ownership of the storage space used to store the data to be read; that is, the host must wait for the read operation to end completely before it can access the data stored in this storage space.
  • FIG. 1 is an architecture diagram of an NVMe system 100 according to an embodiment of the present invention.
  • the data source 101 and the command source 103 in the system 100 are not the same subject, and they are separated from each other and interconnected through the network 102.
  • the command source 103 may be interconnected with the NVMe controller 105 through the PCIe bus, and the NVMe controller 105 is connected with the storage medium 106.
  • the storage medium 106 is also generally called external storage, and is generally a non-volatile storage medium, which can be used to permanently store data.
  • the storage medium 106 may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, an optical disk), or a semiconductor medium (for example, a flash memory).
  • the embodiment of the present invention does not limit the specific implementation form of the storage medium 106.
  • the storage medium 106 may further include a remote storage separate from the NVMe controller 105, such as a network storage medium interconnected with the NVMe controller 105 through a network.
  • the network 102 may refer to any interconnection method or protocol between the data source 101 and the command source 103.
  • the network 102 may be a PCIe bus, an internal interconnection bus of a computer device, the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or any combination of the above networks.
  • the data source 101 and the NVMe controller 105 need to communicate through the command source 103.
  • the read instruction triggered by the command source 103 needs to carry address information of a storage space for storing data to be read.
  • the data to be read must first be completely transferred from the storage medium 106 controlled by the NVMe controller 105 to the command source 103.
  • only then can the command source 103 send the data to the data source 101.
  • when a data source needs to read data from a storage medium, the data source first sends a read request to the command source.
  • the command source writes an SQE to the submission queue (SQ) according to the read request received from the data source, and carries the address information for receiving the data to be read in the DPTR or MPTR field of the SQE.
  • the command source then notifies the NVMe controller of the new SQE through the doorbell mechanism; after the NVMe controller receives the doorbell, it reads the SQE from the SQ and, using PCIe write instructions, writes the data to be read completely into the storage space indicated by the address information carried in the SQE.
  • the NVMe controller then writes a CQE to the completion queue (CQ) and notifies the command source through the interrupt mechanism; the command source processes the interrupt, obtains the CQE, and sends the data to be read to the data source.
  • the command source must prepare the storage space for receiving the data to be read, and it loses ownership of this storage space until the CQE is obtained; that is, it must wait until the data to be read is completely written into this storage space before it can send the data to the data source.
  • the delay of this process is proportional to the size of the data to be read.
  • the command source requires a large amount of memory addressable by the NVMe controller to store the data to be read, and the memory the command source allocates for the data to be read remains occupied for the entire period from allocation until the NVMe controller returns the CQE and the memory is released.
  • FIG. 3 is a schematic structural diagram of a host 300 according to an embodiment of the present application.
  • the host 300 includes a processor 301, and the processor 301 is connected to the system memory 302.
  • the processor 301 may be a central processing unit (CPU), a graphics processing unit (GPU), a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), or any combination of the above computing logic.
  • the processor 301 may be a single-core processor or a multi-core processor.
  • the processor 301 may further include a register, and the address information of the register may be opened to the NVMe controller as a PCIe address.
  • the processor 301 may further include read operation logic 310, and the read operation logic 310 may be a specific hardware circuit or a firmware module integrated in the processor 301. If the read operation logic 310 is a specific hardware circuit, the read operation logic 310 executes the method of the embodiment of the present application; if the read operation logic 310 is a firmware module, the processor 301 executes the firmware code in the read operation logic 310 to implement the technical solution of the embodiment of the present application.
  • the read operation logic 310 includes: (1) logic (circuit/firmware code) for triggering a read instruction, where the read instruction carries indication information used to indicate a first address addressable by the NVMe controller; (2) logic (circuit/firmware code) for receiving a data message sent by the NVMe controller, where the data message carries the first address and payload data; (3) logic (circuit/firmware code) for determining, based on the first address, a second address addressable by the host; and (4) logic (circuit/firmware code) for writing the payload data into the storage unit indicated by the second address.
  • the bus 309 is used to transfer information between various components of the host 300.
  • the bus 309 may use a wired connection method or a wireless connection method, which is not limited in this application.
  • the bus 309 is also connected with an input / output interface 305 and a communication interface 303.
  • the input / output interface 305 is connected with an input / output device for receiving input information and outputting operation results.
  • the input / output device can be a mouse, keyboard, monitor, or optical drive.
  • the communication interface 303 is used to implement communication with other devices or networks.
  • the communication interface 303 may be interconnected with other devices or networks in a wired or wireless form.
  • the host 300 may be connected to the NVMe controller through the communication interface 303, and the host 300 may also be connected to the network through the communication interface 303 and connected to the NVMe controller through the network.
  • the system memory 302 may include some software, for example an operating system 308 (such as Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, macOS, or an embedded operating system such as VxWorks), an application program 307, and a read operation module 306.
  • the processor 301 executes the read operation module 306 to implement the technical solution of the embodiment of the present application.
  • the read operation module 306 includes: (1) code for triggering a read instruction, where the read instruction carries indication information used to indicate a first address addressable by the NVMe controller; (2) code for receiving a data message sent by the NVMe controller, where the data message carries the first address and payload data; (3) code for determining, according to the first address, a second address addressable by the host; and (4) code for writing the payload data into the storage unit indicated by the second address.
  • FIG. 3 is only an example of the host 300.
  • the host 300 may include more or fewer components than those shown in FIG. 3, or may have different component configuration methods.
  • various components shown in FIG. 3 may be implemented by hardware, software, or a combination of hardware and software.
  • an embodiment of the present invention provides a method for reading data based on NVMe. As shown in FIG. 4, the method 400 includes:
  • Step 401: The host triggers a read instruction, where the read instruction carries indication information, and the indication information is used to indicate a first address addressable by the NVMe controller.
  • the read instruction may be specifically SQE.
  • in the following description, the read instruction is described using an SQE as an example, but it should be understood that the embodiment of the present invention does not limit the specific implementation form of the read instruction.
  • the process by which the host triggers a read instruction to the NVMe controller can follow the NVMe standard: the host writes the SQE to the SQ and notifies the NVMe controller of the new SQE through the doorbell, and the NVMe controller fetches the SQE from the SQ according to the doorbell. In the embodiment of the present invention, the host may also directly push the SQE to the NVMe controller; the embodiment of the present invention does not limit the specific process by which the host triggers the read instruction to the NVMe controller.
  • the host may open a part of its storage space to the NVMe controller. More specifically, the host may open a part of its storage space to the NVMe controller as PCIe storage space, and the NVMe controller may access this part of the storage space through PCIe addresses. Taking a base address register (BAR) as an example, the host opens the BAR to the NVMe controller as PCIe storage space and organizes part of the PCIe address space of the BAR into multiple entries, each of which occupies a specific PCIe address range addressable by the NVMe controller; the entry address can be the entry's starting PCIe address.
  • the indication information carried in the read instruction triggered by the host may be used to indicate a specific entry, and the first address may be the entry address or a partial field of the entry address.
  • the entry is the data entry for the NVMe controller to perform PCIe write operations to the host. In the following description, the function of the entry will be described in more detail.
  • the present invention does not limit the organization manner of the entries in the PCIe address space. It only needs to ensure that each entry uniquely corresponds to a specific read operation.
  • the host may organize a part of the PCIe address space of its base address register into apertures, and each aperture includes multiple entries; that is, the entries may be organized in the form of an array.
  • an entry is addressed by adding the entry's offset to the array base address; this array is called an aperture.
  • FIG. 5 is a schematic structural diagram of a base address register. Each aperture is composed of a set of entries P0 to PN, and each entry is uniquely associated with a specific read operation. "Unique" means that, at any one time, the host can have only one in-flight NVMe read operation associated with a specific entry.
  • the apertures can be divided into metadata apertures and data apertures.
  • through PCIe write operations, the NVMe controller writes data to the host through the entries DP0 to DPN included in the data aperture, and writes metadata to the host through the entries MP0 to MPN included in the metadata aperture.
  • the embodiments of the present invention collectively refer to metadata and data as data.
  • FIG. 7 is a PCIe address structure in a PCIe data packet according to an embodiment of the present invention.
  • the PCIe address structure includes the base address of the BAR, the aperture offset, and the entry offset.
  • the BAR base address and the aperture offset are used to uniquely determine the aperture.
  • the entry offset is used to indicate the specific entry within the aperture.
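  The address composition described above can be sketched as follows. This is an illustrative model only: the field widths (4 KiB entries, 256 entries per aperture) and the function names are assumptions made for the example, not values fixed by the embodiment.

```python
# Sketch of the FIG. 7 layout: an entry's PCIe address is the BAR base plus an
# aperture offset plus an entry offset. Widths below are assumed for the example.
ENTRY_SHIFT = 12            # assumed: each entry occupies a 4 KiB PCIe range
ENTRIES_PER_APERTURE = 256  # assumed aperture size

def make_entry_address(bar_base, aperture_idx, entry_idx):
    """Compose an entry's starting PCIe address from its aperture and entry index."""
    assert 0 <= entry_idx < ENTRIES_PER_APERTURE
    return bar_base + ((aperture_idx * ENTRIES_PER_APERTURE + entry_idx) << ENTRY_SHIFT)

def split_entry_address(addr, bar_base):
    """Recover (aperture index, entry index) from a PCIe address carried in a data message."""
    idx = (addr - bar_base) >> ENTRY_SHIFT
    return idx // ENTRIES_PER_APERTURE, idx % ENTRIES_PER_APERTURE
```

  Because each entry is uniquely associated with one in-flight read operation, recovering the entry index from the address received in a data message identifies the read operation the payload belongs to.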
  • the entries can also be randomly distributed in the PCIe address space.
  • the entries randomly distributed in the PCIe address space are referred to as "data entries" and "metadata entries".
  • the indication information is used to indicate a specific entry, and the NVMe controller may uniquely determine an entry according to the indication information.
  • the invention does not limit the specific form of the indication information.
  • the indication information may be an explicit address, namely a specific PCIe address or a partial field of the entry address; that is, the indication information may be the first address or a partial field of the first address.
  • the indication information may also be the entry offset of the entry, in which case the base address of the BAR and the aperture offset may be provided to the NVMe controller as host configuration information.
  • the NVMe controller can then determine the complete PCIe address of the entry according to the indication information; in this case, the format of the SQE can remain consistent with the NVMe standard.
  • the indication information may also be an implicit address. For example, if each SQE in an SQ has its own unique command identifier (CID), the indication information may consist of "queue ID + CID"; if the CID of each SQE processed by the NVMe controller is unique, the indication information may be the CID carried by the corresponding SQE. In other implementations, the indication information may be part of the CID. In the embodiment of the present invention, the indication information may also be specified through a specially defined MPTR, PRP, or other field of the SQE. The embodiment of the present invention does not limit the specific implementation manner of the indication information.
  • the NVMe controller can maintain a mapping relationship between the indication information and the entry address, and uniquely determine the entry address based on the mapping relationship and the indication information.
  • for example, the indication information is the CID of the SQE, and the system encodes the CID in the same manner in which the entry offsets are addressed.
  • the CID then corresponds one-to-one to the entry offset.
  • the base address of the BAR and the aperture offset can be used as host configuration information obtained by the NVMe controller, and the NVMe controller may determine the first address of the data packet according to the mapping relationship between the indication information and the entry address.
  • the embodiment of the present invention does not limit the specific implementation of the indication information, as long as the NVMe controller can determine the first address according to the indication information.
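  As a concrete illustration of the CID-based variant above, the controller-side resolution might look like the following sketch. It assumes the CID is encoded exactly like the entry offset and that the BAR base and aperture offset arrive as host configuration; all names and the field width are hypothetical.

```python
ENTRY_SHIFT = 12  # assumed: each entry spans a 4 KiB PCIe address range

def resolve_first_address(bar_base, aperture_offset, cid):
    """NVMe-controller side: map the indication information (the SQE's CID) to
    the entry address, using the CID one-to-one as the entry offset."""
    return bar_base + aperture_offset + (cid << ENTRY_SHIFT)
```

  The controller would use the resolved address as the destination of the PCIe write carrying the read data.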
  • the first address is used to indicate an entry corresponding to a read operation, and is specifically an entry address or a partial field of the entry address.
  • Step 402: The host receives a first data message sent by the NVMe controller, where the first data message carries the first address and first payload data.
  • the data message may be a PCIe write operation message. More specifically, the data message may be a transaction layer packet (TLP), and the payload data may be the payload carried in the TLP.
  • the first address may be a PCIe address in the TLP or a part of the PCIe address in the TLP.
  • the NVMe controller maintains a mapping relationship between the indication information and the first address.
  • the first address may be the entry address corresponding to the read operation. After the NVMe controller obtains the read instruction, it determines the first address according to the specific implementation of the indication information, reads the data from the storage medium according to the read instruction, encapsulates the first address and the read data into a TLP, and sends the TLP to the host.
  • Step 403: The host determines, according to the first address, a second address addressable by the host.
  • the first address is used to indicate the entry of a PCIe write operation.
  • the NVMe controller writes the data of the read operation to the host through the entry.
  • the "entry" represents a range in the host's PCIe address space.
  • when the host receives the data message sent by the NVMe controller through the entry, it parses the data message and obtains the first address, but it does not use the storage space indicated by the first address to store the payload data; instead, according to the first address and a preset correspondence, it determines the second address of a storage unit in its internal memory that actually stores the payload data.
  • the storage unit indicated by the second address may be memory space inside the host that is not exposed through a PCIe address. That is, the internal memory used by the host to store the payload data need not be accessible through PCIe addressing, nor is it used as a controller memory buffer (CMB).
  • the memory location indicated by the second address can serve as a read buffer for data.
  • the first address is used to indicate a specific "entry".
  • the host maps the first address to the second address; specifically, the host may map the first address to the second address through a memory mapping table (MTT).
  • the host can maintain an MTT entry for each entry, and each MTT entry associates a specific entry with its corresponding storage units. A storage unit can be a fixed-size storage space; in the following description, the storage unit is also referred to as a read page. Before the host triggers the read instruction, it can allocate the storage unit indicated by the second address to the read instruction and record the correspondence between the first address and the second address in the MTT entry.
  • entries correspond one-to-one to read operations.
  • depending on its size, the data to be read by the read instruction may correspond to at least two data packets, and the host may accordingly allocate at least two storage units to the read instruction.
  • the present invention does not limit the read page size, but it is recommended that a read memory block of the NVMe controller comprise an integer number of read pages. After the NVMe controller reads data from the storage medium, the data is placed into an error correction buffer for verification, and the error-corrected data is then written to the "entry".
  • the error correction buffer is also called a read memory block.
  • the host may organize its own memory into a read page pool. Before initiating a read operation, the host allocates the number of read pages required by the read operation from the read page pool and initializes the MTT entry corresponding to the read operation; the MTT entry records the correspondence between the entry address and the read page addresses.
  • FIG. 8 is a schematic diagram of an MTT entry according to an embodiment of the present invention; the correspondence between an entry and its read pages is recorded in the MTT entry.
  • the read pages corresponding to entry X are read page 1, read page 7, and read page 4.
  • the read pages corresponding to entry Y are read page 2, read page 13, read page 8 and read page 0.
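  The MTT lookup of FIG. 8 can be modeled as a small table. The following is an illustrative sketch only: the dictionary-based MTT, the assumed page size, and the function name are not part of the embodiment, though the page lists follow the figure.

```python
# Sketch of the MTT in FIG. 8: each entry maps to an ordered list of read pages
# allocated from the read page pool before the read instruction is triggered.
P_SZ = 4096  # assumed read page size
mtt = {
    "entry_X": [1, 7, 4],      # read pages for entry X, in data order
    "entry_Y": [2, 13, 8, 0],  # read pages for entry Y, in data order
}

def second_address(entry, data_offset):
    """Map the first address (an entry) plus the payload's offset within the
    data to be read onto (read page, offset inside that page)."""
    pages = mtt[entry]
    return pages[data_offset // P_SZ], data_offset % P_SZ
```

  The same lookup works whether the offset is inferred from arrival order (order-preserving mode) or carried explicitly in the data message (non-order-preserving mode).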
  • the read page is a fixed-size storage space.
  • the size of the read page may be smaller than the size of the data to be read, so the read operation may require more than one read page.
  • when making the allocation, the host can allocate at least two read pages to the read instruction. If the host allocates multiple read pages to the read operation, the second address points to one of these read pages, and the host may determine the second address according to the first address and the position of the payload data within the data to be read.
  • the embodiment of the present invention does not limit the manner in which the host determines the position of the payload data within the data to be read. If the NVMe controller preserves order when performing PCIe write operations, the host may determine the position of the payload data from the order in which the data messages arrive. For example, the NVMe controller also sends a second data packet to the host, which carries the first address and second payload data that likewise belongs to the data to be read; after receiving the second data message, the host may determine the order of the first payload data and the second payload data within the data to be read according to the order in which the first and second data messages were received. If the NVMe controller does not preserve order when performing PCIe write operations, the data packet may also carry the offset of the payload data within the data to be read, and this offset indicates the position of the payload data within the data to be read.
  • the NVMe controller may send data packets in an order-preserving or non-order-preserving manner.
  • the NVMe controller can support either or both of the following ordering modes:
  • the NVMe controller sends data packets in the order of monotonically increasing data offsets.
  • the host receives the payload data according to the order of the data messages.
  • in this mode no data offset is required; that is, the entry width shown in FIG. 7 can be only two bits (standard specification).
  • the NVMe controller can send PCIe write transactions in any order, but the data message needs to carry a data offset.
  • the NVMe controller can process read logical blocks in parallel; that is, it can read data corresponding to different logical blocks from the storage medium and place the data in different read memory blocks for verification. Because different read memory blocks take different amounts of time to complete verification, the read memory blocks may not be written to the host strictly in logical-block order: the read memory block corresponding to an earlier logical block may be written to the target memory later than the read memory block corresponding to a later logical block.
  • data is reassembled according to the data offset carried in the transaction message. In this mode, the data packet needs to carry a data offset; that is, the entry width shown in FIG. 7 needs to be greater than or equal to the maximum data transmission size.
  • Step 404: The host writes the first payload data in the first data message into the storage unit indicated by the second address.
  • after receiving the data message sent by the NVMe controller, the host determines the second address according to the first address and then writes the payload data into the storage unit indicated by the second address.
  • after the host completes the write operation to the storage unit indicated by the second address, it can operate on the data in that storage unit, that is, consume the data, for example by sending it to other entities.
  • after all the data related to a certain read page has been written into that read page, the host has completed the write operation to the read page; that is, once the payload data of the last TLP related to the read page is written into it, the host has finished writing to that read page.
  • the size of the data to be read is 4 * P_sz, where P_sz is the size of the read page, that is, the size of the storage space.
  • the size of the read memory block is 2 * P_sz, that is, the data to be read requires two read memory blocks for data error correction check.
  • in this example, the data message is a TLP, and the size of the payload data of each TLP is 0.5 * P_sz; that is, the data in each read memory block requires four TLPs to send.
  • the NVMe controller sequentially reads the data to be read from the storage medium to the read memory block 0 and the read memory block 1 for verification.
  • the NVMe controller can verify the two read memory blocks in parallel; because the verification speed of each read memory block differs, read memory block 1 completes verification before read memory block 0.
  • the NVMe controller first loads the data of read memory block 1 into TLPs in order and sends them to the host through the PCIe network.
  • the data encapsulated in TLP0 and TLP1 is the data of read memory block 1. Read memory block 0 then completes its verification, and the NVMe controller encapsulates the data of read memory block 0 into TLPs in data order and sends them to the host through the PCIe network; TLP2, TLP4, TLP6, and TLP7 carry the data of read memory block 0.
  • the data packets received by the host may be out of order.
  • the host may determine the position of the payload data within the data to be read according to the data offset in each received TLP, search the MTT according to that position and the indication information to determine the address of the read page storing the payload data, and write the payload data into the corresponding read page.
  • after the host writes the payload data of TLP0 and TLP1 into read page 8, the write operation on read page 8 is complete, and the host can process the data in read page 8. Similarly, after the payload data of TLP2 and TLP4 is written into read page 2, the write operation on read page 2 is complete, and the data in read page 2 can be processed.
  • the processing the host performs on the data stored in a read page is, specifically, consuming the data, for example sending it to other entities, without having to wait for all the data to be read to be completely written before operating on it.
  • the embodiment of the present invention implements a pipelined processing mode and reduces the delay of the read operation.
  • the host can then release the storage unit for other read operations. For example, the host can release read page 8 back to the read page pool for other read operations without having to wait until the entire read operation is completed and all the data to be read has been processed, thereby reducing storage space occupation.
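  The pipelined, out-of-order behavior of this example can be simulated roughly as follows. Sizes and page numbers follow the example above; the exact arrival offsets and the mapping of TLP6/TLP7 onto read page 13 are assumptions made for illustration.

```python
# Rough simulation: TLPs of 0.5 * P_sz arrive out of order; a read page (P_sz)
# becomes consumable, and releasable to the pool, as soon as its last TLP lands,
# before the whole read operation completes.
P_SZ = 4096
TLP_SZ = P_SZ // 2
pages = [2, 13, 8, 0]           # assumed MTT pages for this read, in data order
filled = {p: 0 for p in pages}  # bytes written so far per read page
ready = []                      # read pages completed, in completion order

def on_tlp(data_offset):
    page = pages[data_offset // P_SZ]
    filled[page] += TLP_SZ
    if filled[page] == P_SZ:    # last TLP of this read page has been written
        ready.append(page)      # page can now be consumed and released early

# Arrival order from the example: read memory block 1 first (TLP0, TLP1 -> page 8),
# then read memory block 0 (TLP2, TLP4 -> page 2; TLP6, TLP7 -> page 13, assumed).
for offset in (2 * P_SZ, 2 * P_SZ + TLP_SZ, 0, TLP_SZ, P_SZ, P_SZ + TLP_SZ):
    on_tlp(offset)
```

  After this run, `ready` is `[8, 2, 13]`: read page 8 became consumable after only two TLPs, while read page 0 is still pending in the simulation.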
  • after the NVMe controller completes the read operation, it also triggers a completion queue entry (CQE).
  • the CQE is used to indicate that the NVMe controller has completed the read operation indicated by the read instruction.
  • the host is also used to obtain the completion queue entry CQE. In the embodiment of the present invention, the host may operate on the data in the storage unit indicated by the second address before obtaining the CQE.
  • in summary, the host opens the first address to the NVMe controller as a data entry address, so that the NVMe controller can write the data to be read to the host through the first address. The destination address carried in the data message sent by the NVMe controller is the first address, but after receiving the data message, the host does not actually write the payload data of the data message into the storage space indicated by the first address; instead, it maps the first address to the second address and writes the payload data of the data message into the storage space indicated by the second address.
  • the storage space indicated by the second address may be private memory space of the host, thereby decoupling the storage space that stores the payload data from the communication protocol, so that the host's access to the second address is not restricted by the communication protocol.
  • before the read instruction ends, the host can use the data stored in the storage space indicated by the second address and release that storage space in advance for other read operations.
  • the technical solution disclosed in the embodiment of the present invention can reduce the delay of the read operation and save the storage space for storing the data to be read.
  • FIG. 10 is an interaction flowchart of an NVMe-based data reading method 1000 according to an embodiment of the present invention.
  • an application scenario of the method 1000 is a scenario in which a data source is separated from a command source.
  • the data source needs to read the data to be read into its own storage space through the command source.
  • the embodiment of the present invention is not limited to a specific scenario in which a data source is separated from a command source.
  • the scenario in which the data source and the command source are separated may be a just-a-bunch-of-flash (JBOF) system based on NVMe over Fabrics (NOF).
  • the data source is a host that needs to access the storage medium
  • the command source is a NOF bridge interconnected with the host through the fabric. More specifically, the command source may be a NOF engine in the NOF bridge.
  • the NOF bridge is interconnected with the NVMe controller through the PCIe bus, and the NVMe controller is connected with a storage medium.
  • the scenario where the data source and the command source are separated may also be the host and the encryption accelerator.
  • the data source is the host, and the command source is an encryption accelerator connected to the host; more specifically, the command source is the acceleration engine of the encryption accelerator.
  • the encryption accelerator is connected to the NVMe controller through the PCIe bus, and the NVMe controller is connected to a storage medium.
  • When the command source performs a read operation, the SQE carries indication information of the entry address for the data to be read.
  • The entry address may essentially be a PCIe address addressable by the NVMe controller.
  • After obtaining the SQE, the NVMe controller sends a TLP to the command source through a PCIe write operation and carries the PCIe address in the TLP.
  • After receiving the TLP, the command source parses the TLP, obtains the PCIe address, determines the local storage unit corresponding to the PCIe address according to the mapping relationship between the PCIe address and local memory, and then writes the payload data in the TLP into the determined storage unit.
  • An entry can correspond to multiple storage units.
  • As soon as the write operation to a storage unit ends, the command source can operate on the data stored in that storage unit.
  • The end of the write operation to a storage unit means that the payload data of the last TLP corresponding to that storage unit has been written into it.
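A minimal sketch of how the command source might handle one incoming TLP, resolving the PCIe (entry) address to a local storage unit and detecting when the unit's write operation ends; the unit size and PCIe address values are invented for the example:

```python
# Illustrative model: resolve a TLP's PCIe address to a local storage
# unit, append the payload, and report whether the unit is now full
# (i.e., the last TLP for that unit has been written).
UNIT_SIZE = 8  # bytes per local storage unit (assumed toy value)

# Mapping from PCIe (entry) address to the local storage unit.
addr_to_unit = {0xA000: bytearray(), 0xB000: bytearray()}

def handle_tlp(pcie_addr, payload):
    unit = addr_to_unit[pcie_addr]   # look up the mapped local unit
    unit.extend(payload)             # write the payload into the unit
    return len(unit) >= UNIT_SIZE    # True once the unit's write ends

assert handle_tlp(0xA000, b"1234") is False  # unit only half full
assert handle_tlp(0xA000, b"5678") is True   # last TLP: write finished
print(bytes(addr_to_unit[0xA000]))           # b'12345678'
```

Once `handle_tlp` returns `True` for a unit, the data in that unit can be operated on without waiting for the rest of the read.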
  • Once the command source has obtained part of the data to be read, it can send the obtained data to the data source; it does not need to wait for the entire read operation to complete before forwarding the data to be read.
  • For example, the data to be read may consist of data 1, data 2, data 3, and data 4, each of which may correspond to one storage unit.
  • After the command source has received the data of one storage unit, it can send that unit's data to the data source; after sending the data, it can release the corresponding storage unit for other read operations.
  • By establishing a mapping relationship between local memory and the PCIe address, the command source writes the payload data received in the TLP into its own memory space, enabling pipelined processing of the data: after the command source receives part of the data, it can send the received data to the data source, and receiving data from the NVMe controller and sending data to the data source can proceed in parallel, which saves storage space for buffering data and speeds up the processing of read operations.
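The pipelined flow described above can be sketched as follows; this is an illustrative model in which each storage unit's worth of data is forwarded to the data source and the unit released as soon as it is complete:

```python
from collections import deque

def pipeline(packets, unit_size):
    """Sketch of the pipelined flow: as soon as one storage unit's
    worth of data has arrived from the NVMe controller, forward it to
    the data source and release the unit, without waiting for the
    whole read operation to finish."""
    unit = bytearray()
    forwarded = deque()
    for payload in packets:               # payloads from incoming TLPs
        unit.extend(payload)
        while len(unit) >= unit_size:
            forwarded.append(bytes(unit[:unit_size]))  # send to data source
            del unit[:unit_size]                       # release the unit
    return list(forwarded)

chunks = pipeline([b"da", b"ta1data2", b"data3data4"], unit_size=5)
print(chunks)  # [b'data1', b'data2', b'data3', b'data4']
```

This mirrors the data 1 through data 4 example: each unit is forwarded as soon as its data is complete rather than after the full read.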
  • FIG. 12 is a schematic diagram of a logical structure of a computing device 1200 according to an embodiment of the present application. As shown in FIG. 12, the computing device 1200 includes:
  • The processing unit 1202 is configured to trigger a read instruction to the NVMe controller.
  • The read instruction carries indication information, where the indication information is used to indicate a first address, and the first address is an address addressable by the NVMe controller.
  • The receiving unit 1204 is configured to receive a first data packet sent by the NVMe controller, where the first data packet carries the first address and first payload data.
  • The processing unit 1202 determines a second address according to the first address and writes the first payload data into the storage unit indicated by the second address.
  • The second address is an address addressable by the processing unit 1202.
  • After completing the write operation to the storage unit indicated by the second address, the processing unit 1202 is further configured to operate on the data in that storage unit.
  • After operating on the data in the storage unit indicated by the second address, the processing unit 1202 is further configured to obtain a completion queue entry (CQE) triggered by the NVMe controller, where the CQE indicates that the NVMe controller has completed the read operation indicated by the read instruction.
  • After operating on the data in the storage unit indicated by the second address, the processing unit 1202 is further configured to release that storage unit.
  • Before triggering the read instruction, the processing unit 1202 is further configured to allocate the storage unit indicated by the second address for the read instruction and to record the correspondence between the first address and the second address.
  • Optionally, the data to be read of the read instruction corresponds to at least two data packets, and the processing unit 1202 allocates at least two storage units for the read instruction.
  • The processing unit 1202 may determine the second address according to the first address and the order of the first payload data in the data to be read.
  • Optionally, the receiving unit 1204 is further configured to receive a second data packet sent by the NVMe controller, where the second data packet carries the first address and second payload data; the processing unit 1202 is further configured to determine, according to the order in which the first data packet and the second data packet are received, the order of the first payload data and the second payload data in the data to be read.
  • Optionally, the first data packet further carries an offset of the first payload data in the data to be read, where the offset is used to indicate the order of the first payload data in the data to be read.
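Offset-based reassembly of the payloads can be sketched as follows; representing each packet as an `(offset, payload)` tuple is an assumption of the example, not a format defined by the embodiment:

```python
def reassemble(packets, total_len):
    """Sketch: rebuild the data to be read from packets that each
    carry an offset, so the NVMe controller may send them in any
    order and the receiver still recovers the original order."""
    out = bytearray(total_len)
    for offset, payload in packets:
        out[offset:offset + len(payload)] = payload
    return bytes(out)

# Packets arrive out of order; the offset field restores the order.
pkts = [(4, b"5678"), (0, b"1234")]
print(reassemble(pkts, 8))  # b'12345678'
```

With strictly ordered delivery the offset field is unnecessary, since arrival order already gives the position of each payload in the data to be read.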
  • the first address may be a PCIe address addressable by the NVMe controller
  • the first data packet is a PCIe packet
  • the storage unit indicated by the second address may be a memory space of the computing device.
  • In this embodiment of the present application, the processing unit 1202 may be implemented by the read operation logic 310 in the processor 301 in FIG. 3, or by the processor 301 and the read operation module 306 in the system memory 302 in FIG. 3.
  • The receiving unit 1204 may be implemented by the processor 301 and the communication interface 303 in the embodiment of FIG. 3.
  • This embodiment of the present application is an apparatus embodiment of the host corresponding to the foregoing embodiments.
  • The feature descriptions of the foregoing embodiments are applicable to this embodiment and are not repeated here.


Abstract

An NVMe-based data read method, apparatus, and system. The method includes: a host triggers a read instruction, where the read instruction carries indication information of a first address, and the first address is an address that the host opens to the NVMe controller for addressed access; after obtaining the read instruction, the NVMe controller sends a data packet to the host, where the data packet carries the first address and payload data; after receiving the data packet, the host determines a second address according to the first address and writes the payload data into the storage space indicated by the second address. The second address may be a private memory address of the host. By mapping the first address to the second address, the host decouples the second address from the communication protocol, so the host's access to the second address need not be constrained by the communication protocol.

Description

一种基于NVMe的数据读取方法、装置及系统 技术领域
本申请涉及存储领域,尤其涉及一种基于非易失性高速传输总线(non-volatile memory express,NVMe)的数据读取方法、装置和存储设备。
背景技术
随着存储技术的发展,尤其是在使用闪存(Flash)作为存储介质的固态硬盘(solid state drive,SSD)中,传统的机械硬盘设计的串行高级技术附件(serial advanced technology attachment,SATA)接口与串行ATA高级主控接口/高级主机控制器接口(Serial ATA Advanced Host Controller Interface,AHCI)标准已经无法满足存储设备的要求,成为限制存储设备处理能力的一大瓶颈。非易失性高速传输总线(non-volatile memory express,NVMe)应运而生,NVMe是一种允许主机(Host)和非易失性存储(non-volatile memory,NVM)子系统通信的接口,NVM子系统(包括控制器和存储介质)通信的该接口以寄存器接口的方式附加到高速外围部件互连总线(Peripheral Component Interconnect express,PCIe)接口之上,为企业级和消费级固态存储做了优化具有性能高、访问时延低的优势。
NVMe基于成对的提交队列(英文全称:submission queue,缩写SQ)和完成队列(英文全称:completion queue,缩写:CQ)机制。命令由主机放到提交队列。完成信息由控制器放到对应的完成队列。每个提交队列条目(submission queue entry,SQE)是一个命令,在读指令中,用于数据传输的内存地址通过元数据指针(英文全称:Meta-data Pointer,缩写:MPTR)和数据指针(英文全称:Data Pointer,缩写:DPTR)进行指定。NVMe控制器获取到读指令后,通过PCIe写操作将待读取数据写入用于数据传输的内存地址指示的存储空间。
发明内容
本申请公开了一种基于NVMe的数据读取方法、装置和系统。主机通过开放给NMVe控制器的入口地址接收NMVe控制器发送的数据报文,并在其内存空间为该入口地址分配对应的存储单元,接收到数据报文后,根据数据报文携带的入口地址确定与其对应的存储单元的地址,并将数据报文中的载荷数据写入确定的存储单元。从而割裂了存储单元与通信协议的关系,实现数据的灵活操作。
第一方面,本申请公开了一种基于NVMe的数据读取系统,该系统包括主机、NVMe控制器和存储介质,存储介质用于存储主机的数据,主机用于向NVMe控制器触发读指令,读指令中携带用于指示第一地址的指示信息,第一地址为NVMe控制器可寻址的地址,NVMe控制器获取到读指令后,用于从存储介质中读取读指令对应的待读取数据,并向主机发送第一数据报文,第一数据报文中携带第一地址和第一载荷数据,其中第一载荷数据属于第一数据报文,该主机接收到第一数据报文后,还用于根据第一地址确定第 二地址,并将第一载荷数据写入第二地址指示的存储单元,其中,第二地址为主机可寻址的地址。
其中,读指令可以具体为SQE,主机触发读指令的具体流程可以为主机将SQE写入SQ,并通过门铃通知NMVe控制器。第一地址为主机开放给NVMe控制器访问的地址,但第一地址只是作为NVMe控制器向主机写入载荷数据的入口地址,第一地址指示的存储空间并没有实际的存储载荷数据。主机在接收到NVMe控制器发送的数据报文后,并不是将载荷数据写入第一地址指示的存储空间,而是在其可寻址的存储空间内分配了与第一地址对应的第二地址,并将载荷数据写入第二地址指示的的存储单元。从而主机对第二地址指示的存储单元的操作就不再受主机与NMVe控制器之间通信协议的限制,本申请可以减少读操作的时延,并减少待读取数据对主机存储空间的占用。
根据第一方面,在第一方面第一种可能的实现方式中,主机完成对第二地址指示的存储单元的写操作后,还用于对第二地址指示的存储单元中的数据进行操作。完成对存储单元的写操作为将与该存储单元关联的数据全部写入该存储单元,例如可以具体为写满该存储单元或将与该存储单元关联的最后一份载荷数据写入该存储单元。对数据进行操作可以为将存储单元中的数据发送给其他主体。
第二地址指示的存储空间可以为主机的私有内存,不再通过PCIe地址的方式供NVMe控制器访问,不是也不作为命令内存缓冲区(Command memory Buffer,CMB)。主机通过第一地址和第二地址之间的映射关系,将载荷数据存入其可寻址的第二地址指示的存储单元后,割裂了主机与NVMe控制器之间的通信协议与第二地址之间的关系,主机完成对第二地址指示的存储单元的写操作后,就可以对第二地址指示的存储单元中的数据进行操作,不需要等待读指令指示的读操作完全结束后才对读取的数据进行操作。
根据第一方面第一种可能的实现方式,在第一方面第二种可能的实现方式中,NVMe控制器还用于触发完成队列条目(completion queue entry,CQE),CQE用于指示NVMe控制器完成读指令指示的读操作,主机对第二地址指示的存储单元中的数据进行操作后,还用于获取完成队列条目CQE。
NVMe控制器触发CQE可以具体为NVMe完成读操作后,将CQE写入CQ,并通过中断通知主机。基于现有协议,在触发读指令前,主机需要为读指令分配PCIe地址空间,该PCIe地址指示的存储空间用于存储待读取数据。在完成读指令之前,主机丧失该PCIe地址空间的所有权,即主机获取到CQE之前,不能访问该PCIe地址空间指示的存储空间,从而造成读操作延时和存储空间的浪费。由于第二地址不是数据报文中携带的第一地址,是主机可选址的内部地址,所以主机可以在获取CQE之前就对第二地址指示的存储单元中的数据进行操作。
根据第一方面以上任一种可能的实现方式,在第一方面第三种可能的实现方式中,主机对第二地址指示的存储单元中的数据进行操作后,还用于释放第二地址指示的存储单元。
主机可以将其内部存储空间组织成内存池的形式,该内存池包含多个存储单元,主机对存储单元中的数据完成操作后,就可以将内存单元释放到内存池中,供其他读操作使用,不必等到整个读操作解释才释放该存储单元,从而减少了对存储单元的占用时间,增大了存储空间的使用效率。
根据第一方面或第一方面以上任一种可能的实现方式,在第一方面第四种可能的实 现方式中,主机触发读指令前,还用于为读指令分配第二地址指示的存储单元,并记录第一地址与第二地址的对应关系。
主机在触发写指令前就为读指令分配好对应的存储单元可以有效避免存储空间溢出,主机可以根据维护的内存池中空闲的存储单元的数量触发读指令,实现对对读操作的有效调控。
根据第一方面或第一方面以上任一种可能的实现方式,在第一方面第五种可能的实现方式中,读指令的待读取数据对应至少两个数据报文,主机为读指令分配至少两个存储单元。
因为数据报文可携带的载荷数据的大小的限制,NVMe控制器在发送待读取数据时,可以将待读取数据拆分成多份,使用多个数据报文进行传输。主机可以根据读操作的规模分配对应数量的多个存储单元。
根据第一方面或第一方面以上任一种可能的实现方式,在第一方面第六种可能的实现方式中,主机用于根据第一地址和第一载荷数据在待读取数据中的顺序确定第二地址。
主机为读操作分配了多个存储单元,即第一地址对应多个存储单元,主机接收到第一数据报文后,需要为第一载荷数据确定具体的存储单元。主机可以为该读操作分配的多个存储单元进行逻辑编址,并将待读取数据依次写入该多个存储单元。主机具体可以根据第一载荷数据在待读取数据中的顺序确定该第一载荷数据需要写入的存储单元。
根据第一方面或第一方面以上任一种可能的实现方式,在第一方面第七种可能的实现方式中,NVMe控制器还用于向主机发送第二数据报文,第二数据报文中携带第一地址和第二载荷数据,待读取数据包含第二载荷数据,主机还用于接收第二数据报文,并根据接收第一数据报文和第二数据报文的顺序确定第一载荷数据和第二载荷数据在待读取数据中的顺序。
因为每个数据报文可以携带的载荷数据的大小受限,NVMe控制器需要将待读取数据分为多个数据报文传输,主机在接收到数据报文后,需要对数据报文中携带的载荷数据进行重新排序,如果NVMe控制器发送数据报文的时候是按照载荷数据在待读取数据中的顺序严格保序发送的,则主机可以根据接收到数据报文的顺序对载荷数据进行排序。
根据第一方面或第一方面以上任一种可能的实现方式,在第一方面第八种可能的实现方式中,第一数据报文中还携带第一载荷数据在待读取数据中的偏移量,偏移量用于指示第一载荷数据在待读取数据中的顺序。
通过在数据报文中携带载荷数据在待读取数据中的偏移量,NVMe控制器可以实现对数据报文的乱序传输,能够更大限度的利用带宽资源。
根据第一方面或第一方面以上任一种可能的实现方式,在第一方面第八种可能的实现方式中,第一地址为NVMe控制器可寻址的PCIe地址,第一数据报文为PCIe报文;第二地址指示的存储单元为主机的内存空间。
第二方面,本申请公开了一种基于NVMe的数据读取方法,该方法包括:主机触发读指令,读指令中携带指示信息,指示信息用于指示第一地址,第一地址为NVMe控制器可寻址的地址;主机接收NVMe控制器发送的第一数据报文,第一数据报文中携带第一地址和第一载荷数据;主机根据第一地址确定第二地址,第二地址为主机可寻址的地址;主机将第一载荷数据写入第二地址指示的存储单元。
根据第二方面,在第二方面第一种可能的实现方式中,主机完成对第二地址指示的存储单元的写操作后,该方法还包括:主机对第二地址指示的存储单元中的数据进行操作。
根据第二方面第一种可能的实现方式,在第二方面第二种可能的实现方式中,主机对第二地址指示的存储单元中的数据进行操作后,该方法还包括:主机获取NVMe控制器触发的完成队列条目CQE,CQE用于指示NVMe控制器完成读指令指示的读操作。
根据第二方面以上任一种可能的实现方式,在第二方面第三种可能的实现方式中,主机对第二地址指示的存储单元中的数据进行操作后,该方法还包括:主机释放第二地址指示的存储单元。
根据第二方面或第二方面以上任一种可能的实现方式,在第二方面第四种可能的实现方式中,主机触发读指令前,该方法还包括:主机为读指令分配第二地址指示的存储单元,并记录第一地址与第二地址的对应关系。
根据第二方面或第二方面以上任一种可能的实现方式,在第二方面第五种可能的实现方式中,读指令的待读取数据对应至少两个数据报文,主机为读指令分配至少两个存储单元。
根据第二方面或第二方面以上任一种可能的实现方式,在第二方面第六种可能的实现方式中,主机根据第一地址和第一载荷数据在待读取数据中的顺序确定第二地址。
根据第二方面或第二方面以上任一种可能的实现方式,在第二方面第七种可能的实现方式中,该方法还包括:主机接收NVMe控制器发送的第二数据报文,第二数据报文中携带第一地址和第二载荷数据;主机根据接收第一数据报文和第二数据报文的顺序确定第一载荷数据和第二载荷数据在待读取数据中的顺序。
根据第二方面或第二方面以上任一种可能的实现方式,在第二方面第八种可能的实现方式中,第一数据报文中还携带第一载荷数据在待读取数据中的偏移量,偏移量用于指示第一载荷数据在待读取数据中的顺序。
根据第二方面或第二方面以上任一种可能的实现方式,在第二方面第八种可能的实现方式中,第一地址为NVMe控制器可寻址的PCIe地址,第一数据报文为PCIe报文;第二地址指示的存储单元为主机的内存空间。
第二方面为第一方面对应的方法实现方式,第一方面或第一方面任一种可能的实现方式中的描述对应适用于第二方面或第二方面任一种可能的实现方式,在此不再赘述。
第三方面,本申请提供了一种可读介质,包括执行指令,当计算设备的处理器执行该执行指令时,该计算设备执行以上第二方面或以上第二方面的任一种可能的实现方式中的方法。
第四方面,本申请提供了一种计算设备,包括:处理器、存储器和总线;存储器用于存储执行指令,处理器与存储器通过总线连接,当计算设备运行时,处理器执行存储器存储的执行指令,以使计算设备执行以上第二方面或以上第二方面的任一种可能的实现方式中的方法。
第五方面,本申请公开了一种基于NVMe的数据读取装置,该装置包括:处理单元,用于触发读指令,读指令中携带指示信息,指示信息用于指示第一地址,第一地址为NVMe控制器可寻址的地址;接收单元,用于接收NVMe控制器发送的第一数据报文,第一数据报文中携带第一地址和第一载荷数据;该处理单元还用于根据第一地址确定第二地 址,并将第一载荷数据写入第二地址指示的存储单元,第二地址为处理单元可寻址的地址。
根据第五方面,在第五方面第一种可能的实现方式中,处理单元完成对第二地址指示的存储单元的写操作后,还用于对第二地址指示的存储单元中的数据进行操作。
根据第五方面第一种可能的实现方式,在第五方面第二种可能的实现方式中,处理单元对第二地址指示的存储单元中的数据进行操作后,还用于获取NVMe控制器触发的完成队列条目CQE,CQE用于指示NVMe控制器完成读指令指示的读操作。
根据第五方面以上任一种可能的实现方式,在第五方面第三种可能的实现方式中,处理单元对第二地址指示的存储单元中的数据进行操作后,还用于释放第二地址指示的存储单元。
根据第五方面或第五方面以上任一种可能的实现方式,在第五方面第四种可能的实现方式中,处理单元触发读指令前,还用于为读指令分配第二地址指示的存储单元,并记录第一地址与第二地址的对应关系。
根据第五方面或第五方面以上任一种可能的实现方式,在第五方面第五种可能的实现方式中,读指令的待读取数据对应至少两个数据报文,处理单元为读指令分配至少两个存储单元。
根据第五方面或第五方面以上任一种可能的实现方式,在第五方面第六种可能的实现方式中,处理单元用于根据第一地址和第一载荷数据在待读取数据中的顺序确定第二地址。
根据第五方面或第五方面以上任一种可能的实现方式,在第五方面第七种可能的实现方式中,接收单元还用于接收NVMe控制器发送的第二数据报文,第二数据报文中携带第一地址和第二载荷数据;处理单元还用于根据接收第一数据报文和第二数据报文的顺序确定第一载荷数据和第二载荷数据在待读取数据中的顺序。
根据第五方面或第五方面以上任一种可能的实现方式,在第五方面第八种可能的实现方式中,第一数据报文中还携带第一载荷数据在待读取数据中的偏移量,偏移量用于指示第一载荷数据在待读取数据中的顺序。
根据第五方面或第五方面以上任一种可能的实现方式,第一地址为NVMe控制器可寻址的PCIe地址,第一数据报文为PCIe报文;第二地址指示的存储单元为装置的内存空间。
第五方面为第一方面对应的装置实现方式,第一方面或第一方面任一种可能的实现方式中的描述对应适用于第五方面或第五方面任一种可能的实现方式,在此不再赘述。
根据本申请公开的技术方案,主机将第一地址作为数据入口地址开放给NVMe控制器,供NVMe控制器通过第一地址向主机写入待读取数据,NVMe控制器发送的数据报文中携带的目的地址为第一地址,但主机接收到数据报文后,并没有将数据报文中的载荷数据真正的写入第一地址指示的存储空间,而是将第一地址映射为第二地址,并将数据报文的载荷数据写入第二地址指示的存储空间。其中,第二地址指示的存储空间可以为主机的私有内存空间,从而割裂了存储载荷数据的存储空间与通信协议之间的关系,主机对第二地址的访问不受通 信协议的限制。主机在读指令结束前,可以使用第二地址指示的存储空间中存储的数据,并提前释放第二地址指示的存储空间供其他读操作使用。本申请公开的技术方案可以减少读操作的时延,并节省用于存储待读取数据的存储空间。
附图说明
图1为依据本申请一实施例的NVMe系统的逻辑结构示意图;
图2为一种基于NVMe标准的数据读取方法的信令图;
图3为依据本申请一实施例的主机的硬件结构示意图;
图4为依据本申请一实施例的基于NMVe的数据读取方法的流程示意图。
图5为依据本发明一实施例的入口组织结构示意图;
图6为依据本发明一实施例的入口组织结构示意图;
图7为依据本发明一实施例的PCIe地址结构示意图;
图8为依据本发明一实施例的地址映射关系示意图;
图9为依据本发明一实施例的数据报文传输示意图;
图10为依据本发明一实施例的基于NVMe的数据读取方法的信令图;
图11(a)为依据本申请一实施例的NVMe系统的逻辑结构示意图;
图11(b)为依据本申请一实施例的NVMe系统的逻辑结构示意图;
图12为依据本申请一实施例的计算设备的逻辑结构示意图。
具体实施方式
下面将结合附图,对本发明实施例进行描述。
本发明实施例采用术语第一和第二等来区分各个对象,例如第一地址和第二地址等,但各个“第一”和“第二”之间不具有逻辑或时序上的依赖关系。
在本发明实施例中,“数据报文”是指NVMe控制器向主机发送的携带载荷数据的数据包。此处的载荷数据可以是用户数据或者用户数据的元数据,本发明实施例不限定载荷数据的类型。在以下描述中,除非另有说明,本发明实施例使用“数据”或者“载荷数据”一词来表示数据报文中携带的各类数据。在本发明实施例中,数据报文可以为PCIe报文。
在本发明实施例中,入口为主机向NVMe控制器开放的地址空间,入口地址可以具体为PCIe地址,数据报文可以为PCIe写报文。NVMe控制器通过入口向主机发送数据报文,数据报文中携带入口地址。主机接收到数据报文后,识别入口地址,在本地的内部存储器中为该入口分配对应的存储空间,并将数据报文携带的载荷数据写入分配的存储空间进行缓存,而不是将载荷数据写入入口地址指示的存储空间。内部存储器可以具体 为主机的私有内存空间。
在本发明实施例中,读操作可以为NVMe命令集中主机从NVMe控制器读取数据的任何操作。指示读操作的指令为读指令。读指令的具体实现方式可以为提交队列条目。
在本发明实施例中,命令发起者和数据发起者可以是相同或者相互分离的主体。命令发起者为直接向NVMe控制器触发指令的系统主体,在本发明实施例中也称为命令源。数据发起者为需要读取数据并消费数据的系统主体,即用于发起数据访问请求的系统主体,在本发明实施例中也称为数据源。在分离场景下,数据源需要通过命令源读取数据。在本发明实施例中,“主机”一词可以指代数据源与命令源分离场景下的命令源或者二者未分离场景下与NMVe控制器通信的计算设备。
在传统方式中,主机在进行NVMe读操作时,在触发的SQE中通过DPTR或MPTR携带用于存放待读取数据的存储空间的地址信息,NVMe控制器根据根据SQE将待读取数据写入该地址信息指示的存储空间。在主机提交SQE和获取到NVMe控制器用于指示读操作完成的完成队列条目之间的时间段内,主机丧失了用于存储待读取数据的存储空间的所有权,即主机需要等待读操作完全结束,才可以访问该存储空间存储的数据。
图1为依据本发明一实施例的NVMe系统100的架构图,如图1所示,系统100中数据源101和命令源103不是同一主体,二者相互分离,通过网络102互联。命令源103可以通过PCIe总线与NVMe控制器105互联,NVMe控制器105连接有存储介质106。
在本发明实施例中,存储介质106一般也称为外存,一般为非易失存储介质,可以用于永久性存储数据。存储介质106可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如光盘)、或者半导体介质(例如闪存(Flash)等,本发明实施例不限定存储介质106的具体实现形式。在一些实施例中,存储介质106还可能进一步包括与NVMe控制器105分离的远程存储器,例如通过网络与NVMe控制器105互联的网络存储介质。
在本发明实施例中,网络102可以用于指代数据源101与命令源103互联的任意方式或互联协议等,例如可以为PCIe总线,计算机设备内部互联总线,因特网,内联网(英文:intranet),局域网(英文全称:local area network,缩写:LAN),广域网络(英文全称:wide area network,缩写:WAN),存储区域网络(英文全称:storage area network,缩写:SAN)等,或者以上网络的任意组合。
在系统100中,数据源101与NVMe控制器105需要经过命令源103进行通信。在传统方式中,命令源103触发的读指令需要携带用于存储待读取数据的存储空间的地址信息。当进行读操作时,需要首先将待读取数据从NVMe控制器105控制的存储介质106中完全转移至命令源103,当获取到指示读操作结束的CQE后,命令源103才可以把数据发送给数据源101。
具体如图2所示,基于传统方式,当数据源需要从存储介质读取数据时,数据源首先向命令源发送读请求。命令源根据从数据源接收到的读请求向提交队列(英文全称:submission queue,缩写SQ)写入SQE,并通过SQE的DPTR或MPTR字段携带用于接收待读取数据的地址信息。命令源随后通过门铃机制通知NVMe控制器有新的SQE,NVMe控制器接收到门铃后,去SQ中读取该SQE,并根据SQE中携带的地址信息使用PCIe写指令将待读取数据完全写入该地址信息指示的存储空间。在读操作完成后,NVMe控制器向完成队列(英文全称:completion queue,缩写:CQ)写入CQE,并通过中断机制通知命令源,命令源处理中断,获取CQE,并向数据源发送待读取数据。
由图2可知,在命令源发起读指令前,需要准备好用于接收待读取数据的存储空间,并在获取到CQE前丧失了这部分存储空间的所有权,即需要等待待读取数据完全写入该存储空间后,才可以将数据发送给数据源。这一过程的延迟与待读取数据的大小成正比。此外,命令源需要大量的NVMe控制器可寻址的内存空间存储待读取数据,且命令源为待读取数据分配内存至获取到NVMe控制器CQE释放内存之间的时间段这一部分内存空间一直被占用。
图3为依据本申请一实施例的主机300的结构示意图。
如图3所示,主机300包括处理器301,处理器301与系统内存302连接。处理器301可以为中央处理器(CPU),图像处理器(英文:graphics processing unit,GPU),现场可编程门阵列(英文全称:Field Programmable Gate Array,缩写:FPGA),专用集成电路(英文全称:Application Specific Integrated Circuit,缩写:ASIC)或数字信号处理器(英文:digital signal processor,DSP)等计算逻辑或以上任意计算逻辑的组合。处理器301可以为单核处理器或多核处理器。
在本发明实施例中,处理器301内部还可以包含寄存器,该寄存器的地址信息可以作为PCIe地址开放给NMVe控制器。
在本申请的一个实施例中,处理器301还可以包括读操作逻辑310,读操作逻辑310可以为具体的硬件电路或集成在处理器301中的固件模块。如果读操作逻辑310为具体的硬件电路,则读操作逻辑310执行本申请实施例的方法,如果读操作逻辑310为固件模块,则处理器310执行读操作逻辑310中的固件代码来实现本申请实施例的技术方案。读操作逻辑310包括:(1)用于触发读指令的逻辑(电路/固件代码),其中,读指令中携带指示信息,该指示信息用于指示NVMe控制器可寻址的第一地址;(2)用于接收NVMe控制器发送的数据报文的逻辑(电路/固件代码),数据报文中携带第一地址和载荷数据;(3)用于根据第一地址确定该主机可寻址的第二地址的逻辑(电路/固件代码);(4)用于将载荷数据写入第二地址指示的存储单元的逻辑(电路/固件代码)。
总线309用于在主机300的各部件之间传递信息,总线309可以使用有线的连接方式或采用无线的连接方式,本申请并不对此进行限定。总线309还连接有输入/输出接口305和通信接口303。
输入/输出接口305连接有输入/输出设备,用于接收输入的信息,输出操作结果。输入/输出设备可以为鼠标、键盘、显示器、或者光驱等。
通信接口303用来实现与其他设备或网络之间的通信,通信接口303可以通过有线或者无线的形式与其他设备或网络互联。例如,主机300可以通过通信接口303与NVMe控制器互联,主机300还可以通过通信接口303与网络互联,并通过网络连接NVMe控制器。
本申请实施例的一些特征可以由处理器301执行系统内存302中的软件代码来完成/支持。系统内存302可以包括一些软件,例如,操作系统308(例如Darwin、RTXC、LINUX、UNIX、OS X、WINDOWS、macOS或嵌入式操作系统(例如Vxworks)),应用程序307,和读操作模块306等。
在本申请的一个实施例中,处理器301执行读操作模块306来实现本申请实施例的技术方案。读操作模块306包括:(1)用于触发读指令的代码,其中,读指令中携带指示信息,该指示信息用于指示NVMe控制器可寻址的第一地址;(2)用于接收NVMe 控制器发送的数据报文的代码,数据报文中携带第一地址和载荷数据;(3)用于根据第一地址确定该主机可寻址的第二地址的代码;(4)用于将载荷数据写入第二地址指示的存储单元的代码。
此外,图3仅仅是一个主机300的例子,主机300可能包含相比于图3展示的更多或者更少的组件,或者有不同的组件配置方式。同时,图3中展示的各种组件可以用硬件、软件或者硬件与软件的结合方式实施。
为了降低读指令的延迟和节省读操作占用的内存空间,本发明实施例提供了一种基于NVMe的数据读取方法,如图4所示,方法400包括:
步骤401:主机触发读指令,该读指令中携带指示信息,该指示信息用于指示NVMe控制器可寻址的第一地址。
在本发明实施例中,读指令可以具体为SQE,在以下描述中,以读指令为SQE进行举例说明,但应理解,本发明实施例并不限定第一读指令的具体实现形式。
主机向NVMe控制器触发读指令的流程可以参照NMVe标准,具体为,主机将SQE写入SQ,并通过门铃通知NVMe控制器有新的SQE,NVMe控制器根据门铃去SQ获取SQE。本发明实施例中,主机还可以直接向NVMe控制器推送SQE,本发明实施例不限定主机向NVMe控制器触发读指令的具体流程。
在本发明实施例中,主机可以将其一部分存储空间开放给NVMe控制器,更具体的,主机可以将其一部分存储空间作为PCIe存储空间开放给NVMe控制器,NVMe控制器可以根据PCIe地址访问该部分存储空间。以基地址寄存器(Base Address Register,BAR)进行举例说明,主机将BAR作为PCIe存储空间开放给NVMe控制器,并将BAR的一部分PCIe地址组织成多个入口(portal)的形式,每一个入口占据特定的NVMe控制器可寻址的PCIe地址空间,入口地址可以为入口的起始PCIe地址。主机触发的读指令中携带的指示信息可以用于指示具体入口,第一地址可以为入口地址或者入口地址的部分字段。入口即NVMe控制器向主机进行PCIe写操作的数据入口,在下面的描述中,会对入口的功能进行更详细的描述。本发明不限定PCIe地址空间里的入口的组织方式,只需要保证每个入口和具体的读操作唯一对应。
在本发明实施例中,主机可以将其基地址寄存器的一部分PCIe地址组织成通孔(aperture)的形式,每一个通孔中包含多个入口(portal),即入口的组织可以用数组的形式,通过数组基地址加入口偏移量寻址到入口,这个数组称为通孔。如图5所示,图5为基地址寄存器的结构示意图,每个通孔由一组入口P 0~P N组成,每个入口唯一地关联到具体的读操作。“唯一”是指主机在任一时刻只能发起一个NVMe读操作关联到一个特定的入口。
更具体的,如图6所示,在本发明实施例中,通孔可以分为元数据通孔和数据通孔。NVMe控制器通过PCIe写操作将数据通过数据通孔包含的入口DP 0~DP N写入到主机,将元数据通过元数据通孔包含的入口MP 0~MP N写入到主机。为了描述方便,在以下描述中,除非另有说明,本发明实施例将元数据和数据统称为数据。
图7为依据本发明一实施例的PCIe数据报文中的PCIe地址结构。如图7所示,PCIe地址结构中包含BAR的基地址、通孔偏移量以及入口偏移量。其中,BAR和通孔偏移量用于唯一的确定通孔,入口偏移量用于指示该通孔中具体的入口。
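The address composition of Figure 7 (BAR base address plus aperture offset locating the aperture, with the portal offset selecting the portal within it) can be illustrated as follows; the bit width chosen for the portal-offset field is an assumption of the sketch, not a value defined by the embodiment:

```python
# Illustrative decomposition of the Figure 7 PCIe address:
#   address = BAR base + aperture offset + (portal index << width)
PORTAL_BITS = 12  # assumed width of the per-portal address range

def portal_address(bar_base, aperture_off, portal_index):
    # Compose the PCIe address for a given portal.
    return bar_base + aperture_off + (portal_index << PORTAL_BITS)

def portal_index(addr, bar_base, aperture_off):
    # Recover which portal (and hence which read operation) an
    # incoming write targets.
    return (addr - bar_base - aperture_off) >> PORTAL_BITS

addr = portal_address(0x8000_0000, 0x1000, 3)
assert portal_index(addr, 0x8000_0000, 0x1000) == 3
print(hex(addr))  # 0x80004000
```

Because each portal is uniquely bound to one outstanding read operation, recovering the portal index from the address is enough to identify the read the payload belongs to.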
在本发明实施例中,入口还可以任意分布在PCIe地址空间,在PCIe空间中任意分 布的入口称为任意的“数据入口”和“元数据入口”。
在本发明实施例中,指示信息用于指示具体的入口,NVMe控制器可以根据指示信息唯一的确定一个入口。本发明不限定指示信息的具体形式。
在本发明实施例中,该指示信息可以为显示地址,指示信息可以为入口具体的PCIe地址或者入口地址的部分字段,即该指示信息可以为第一地址或者第一地址的部分字段。例如,如果入口被组织成数组的形式,则指示信息可以为入口的入口偏移量,BAR的基地址和通孔偏移量可以作为主机的配置信息供NVMe控制器获取。NVMe控制器可以根据指示信息确定入口的完整PCIe地址。则在这种情况下,SQE的格式可以与NVMe标准规定的一致。
在本发明实施例中,该指示信息还可以为隐式地址,例如,如果一个SQ中的每个SQE有各自独特的命令标识CID,则指示信息可以由“队列ID+CID”组成。如果在NVMe控制器所处理的每个SQE的CID都是唯一的,则指示信息可以为对应SQE携带的CID。在其他实现方式中,指示信息还可以为CID的一部分。在本发明实施例中,指示信息还可以使用特别定义的MPTR或者PRT或者SQE中其他字段指定。本发明实施例不限定指示信息的具体实现方式。NVMe控制器可以维护有指示信息与入口地址的映射关系,并根据映射关系和指示信息唯一的确定入口地址。例如,指示标识为SQE的CID,而系统CID的编码方式和入口偏移量的编址方式相同,CID与入口偏移量一一对应,BAR的基地址和通孔偏移量可以作为主机的配置信息供NVMe控制器获取,NVMe控制器可以根据指示信息与入口地址的映射关系确定数据报文的第一地址。
本发明实施例不限定指示信息的具体实现,只要NVMe控制器能够根据指示信息确定第一地址即可。第一地址用于指示与读操作对应的入口,具体为入口地址或者入口地址的部分字段。
步骤402:主机接收NVMe控制器发送的第一数据报文,其中,该第一数据报文中携带第一地址和第一载荷数据。
在本发明实施例中,数据报文可以为PCIe写操作报文,更具体的,数据报文可以是事务层包(transaction layer packet,TLP),载荷数据可以为TLP中携带的负荷(payload),该第一地址可以为TLP中的PCIe地址或者TLP中的PCIe地址的一部分。
NVMe控制器维护有指示信息与第一地址之间的映射关系。该第一地址可以具体为对读操作对应的入口地址,NVMe控制器获取到读指令后,根据指示信息的具体实现,确定该第一地址,并根据读指令从存储介质中读取数据,根据第一地址和读取的数据封装TLP,并将TLP发送至主机。
步骤403:主机根据第一地址确定主机可寻址的第二地址。
第一地址用于指示PCIe写操作的入口。NVMe控制器通过入口将读操作的数据写入主机,“入口”代表主机的PCIe地址空间中一个范围。当主机从入口收到NVMe控制器发送的数据报文后,解析该数据报文并获取第一地址,但是并不使用第一地址指示的存储空间存储该载荷数据,而是根据第一地址和预设的对应关系在其内部存储器中确定用于实际存储载荷数据的存储单元的第二地址。
第二地址指示的存储单元可以是主机内部的内存空间,不通过PCIe地址呈现出去。即主机用于存储载荷数据的内部存储器可以不再通过PCIe寻址的方式供主机访问,不是也不作为命令内存缓冲区(Command memory Buffer,CMB)。第二地址指示的存储单 元可以充当数据的读缓冲区。
第一地址用于指示一个具体的“入口”,主机从入口接收到数据报文后,将第一地址映射到第二地址。具体的,主机可以通过内存映射表MTT将第一地址映射到第二地址。主机可以为每个入口维护一个MTT表项,每个表项可以将某一个入口关联到对应的存储单元,存储单元可以为固定大小的存储空间,在以下描述中,存储单元也称作为读页面。主机触发读指令前,就可以为读指令分配第二地址指示的存储单元,并通过MTT表项记录第一地址与第二地址的对应关系。
在本发明实施例中,入口与读操作一一对应,在一次读操作中,根据待读取数据的大小,读指令的待读取数据可以对应至少两个数据报文,主机也可以为读指令分配至少两个存储单元。
本发明不限定读页面大小,但是建议NVMe控制器的读内存块包含整数个读页面。NVMe控制器从存储介质读取到数据之后会放到纠错缓冲区进行校验纠错,然后将纠错后的数据写到“入口”,纠错缓冲区也称为读内存块。
在本发明实施例中,主机可以将自己的内存组织成读页面池的形式,在发起读操作之前,主机从读页面池中分配读操作所需数量的读页面,并初始化与该读操作对应的入口的MTT表项,该MTT表项记录有入口地址与读页面地址之间的对应关系。如图8所示,图8为依据本发明一实施例的MMT表项示意图,MTT表项中记录有入口与读页面的对应关系。如图所示,入口X对应的读页面为读页面1,读页面7和读页面4。入口Y对应的读页面为读页面2,读页面13,读页面8和读页面0。
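The MTT lookup of Figure 8 can be sketched as a simple table keyed by portal; the read-page size and the flat page numbering are assumptions of the example, while the portal-to-page associations follow the figure:

```python
# Sketch of the MTT (memory translation table) in Figure 8: each
# portal is associated with the read pages allocated for its read
# operation (portal X -> pages 1, 7, 4; portal Y -> pages 2, 13, 8, 0).
mtt = {
    "portal_X": [1, 7, 4],
    "portal_Y": [2, 13, 8, 0],
}

P_SZ = 4096  # assumed read-page size in bytes

def second_address(portal, byte_offset):
    # Select the read page by the payload's position within the data
    # to be read, then the offset inside that page.
    page = mtt[portal][byte_offset // P_SZ]
    return page * P_SZ + byte_offset % P_SZ

assert second_address("portal_X", 0) == 1 * P_SZ
assert second_address("portal_Y", P_SZ + 5) == 13 * P_SZ + 5
print(second_address("portal_X", 2 * P_SZ))  # page 4 -> 16384
```

Initializing such an entry before the read instruction is triggered is what lets the host translate every incoming first address to a private second address without further allocation on the data path.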
在本发明实施例中,读页面是固定大小的存储空间。读页面的大小可能小于待读取数据的大小,所以读操作可能需要不止一个读页面。主机在进行分配时,可以给读指令分配至少两个读页面。如果主机可以给读操作分配多个读页面,而第二地址指向其中的一个读页面。主机可以根据第一地址和载荷数据在待读取数据中的顺序确定该第二地址。
本发明实施例不限定主机确定载荷数据在待读取数据中的顺序的方式,如果NVMe控制器在进行PCIe写操作时是保序的,主机可以根据接收数据报文的顺序确定载荷数据在待读取数据中的顺序。例如,NVMe控制器还向主机发送第二数据报文,第二数据报文中携带第一地址和第二载荷数据,第二载荷数据也属于待读取数据,主机还接收第二数据报文到第二数据报文后,可以根据接收第一数据报文和第二数据报文的顺序确定第一载荷数据和第二载荷数据在待读取数据中的顺序。如果NMVe控制器在进行PCIe写操作时是不保序的,则数据报文中还可以携带载荷数据在待读取数据中的偏移量,该偏移量用于指示载荷数据在待读取数据中的顺序。
在本发明实施例中,NVMe控制器可以采用保序或者不保序的方式对数据报文进行发送。例如,NVMe控制器可以支持下列任意一种或者同时支持两种顺序模式:
“严格”模式:
这种模式下,NVMe控制器根据数据偏移单调递增的顺序发送数据报文。主机根据数据报文的顺序接收载荷数据。在这一种模式下,不需要偏移量,即图7所示的入口宽度可以只为两个bit(标准规定)。
“宽松”模式:
这宽松模式下,NVMe控制器可以以任意顺序发送PCIe写事务,但数据报文中需要 携带数据偏移量。本发明实施例中,NVMe控制器可以并行的处理读操作的逻辑块,即NVMe可以将不同逻辑块对应的数据从存储介质中读出,分别放在不同的读内存块中进行校验。因为不同的读内存块完成校验的时间不同,所以读内存块写入主机的顺序可能不是严格按照逻辑块的顺序进行写入的,第一个逻辑块对应的读内存块可能晚于最后一个逻辑块对应的读内存块写入目标内存。NVMe控制器根据事务报文中携带的数据偏移量重组数据。在这种模式下,数据报文中需要携带数据偏移量,即图7所示的入口宽度需要大于或者等于最大数据传输大小。
S404:主机将第一数据报文中的第一载荷数据写入第二地址指示的存储单元。
主机接收到NVMe控制器发送的数据报文,根据第一地址确定第二地址后,便将载荷数据写入第二地址指示的存储单元。
在本发明实施例中,主机完成对第二地址指示的存储单元的写操作后,就可以对该第二地址指示的存储单元中的数据进行操作,即可以消费该数据,例如,可以将数据发给其他实体等。在本发明实施例中,当与某一个读页面相关的数据完全被写入该读页面后,主机完成对该读页面的写操作,即与读页面相关的最后一个TLP中的数据写入该读页面后,主机完成对该读页面的写操作。
在本发明实施例中,一次读操作可以有多个读页面,主机在完成整个读操作之前,可以完成对某些读页面的写操作,主机完成对一个读页面的写操作后,就可以使用该读页面中的数据,不需要等待整个读操作完成。
如图9所示,待读取数据的大小为4*P_sz,其中,P_sz为读页面的大小,即存储空间的大小。读内存块的大小的大小为2*P_sz,即待读取数据需要两个读内存块进行数据的纠错校验。本发明实施例以数据报文为TLP进行举例说明,每个TLP的载荷数据的大小为0.5*P_sz,即每个读内存块的数据需要四个TLP进行发送。如图所示,NVMe控制器将待读取数据按顺序依次从存储介质读取至读内存块0和读内存块1进行校验。NVMe控制器可以并行的进行两个读内存块的数据校验,因为每个读内存块的校验速度不同,在本发明实施例中,读内存块1先于读内存块0校验完成,NVMe控制器首先按照顺序将读内存块1中的数据依次分装到TLP中,并通过PCIe网络发送至主机。如图所示,TLP 0和TLP 1中封装的数据为读内存块1的数据,随后读内存块0完成校验,NVMe控制器按照数据的顺序将读内存块0中的数据封装至TLP,并通过PCIe网络发送至主机。如图所述,TLP 2,TLP 4,TLP 6和TLP 7内封装的是读内存块0的数据。如图所示,本发明实施例中,主机接收到的数据包可以是乱序的,主机可以根据接收到的TLP中的数据偏移量来确定载荷数据在待读取数据中的顺序,并根据载荷数据在待读取数据中的顺序和指示信息查找MTT,从而确定存储载荷数据的读页面的地址,并将载荷数据写入对应的读页面。如图所示,主机将TLP0和TLP 1的载荷数据写入读页面8后,就完成了对读页面8的写操作,主机就可以对读页面8中的数据进行处理,同理,主机将TLP2和TLP 4的载荷数据写入读页面2后,就完成了对读页面2的写操作,就可以读读页面2中的数据进行处理。在本发明实施例中,主机对读页面中存储的数据进行处理具体为消费数据,例如发送给其他主体,而不必等到待读取数据完全写入后再可以对待读取数据进行操作。本发明实施例实现一种流水线的处理方式,减小了读操作的时延。
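The sizes in the Figure 9 example imply the following packet counts; this short calculation only restates the arithmetic of the example (the concrete page size is an assumed value):

```python
# Arithmetic of the Figure 9 example: the data to be read spans four
# read pages, each read memory block holds two pages, and each TLP
# carries half a page of payload.
P_sz = 4096                  # read-page size (illustrative value)
data_len = 4 * P_sz          # total data to be read
block_len = 2 * P_sz         # size of one read memory block
tlp_len = P_sz // 2          # payload per TLP

blocks = data_len // block_len          # read memory blocks needed
tlps_per_block = block_len // tlp_len   # TLPs to drain one block
tlps_total = blocks * tlps_per_block    # TLPs for the whole read
tlps_per_page = P_sz // tlp_len         # a page completes after this many

print(blocks, tlps_per_block, tlps_total, tlps_per_page)  # 2 4 8 2
```

So after any two TLPs that target the same read page have arrived (e.g. TLP 0 and TLP 1 for page 8), that page's write operation is complete and its data can be processed.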
在本发明实施例中,主机完成对第二地址指示的存储单元的处理后,就可以释放该存储单元,以供其他的读操作使用。例如在图9实施例中,主机完成对读页面8中存储 的数据的处理后,就可以将读页面8释放到读页面池,以供其他读操作使用,而不必等到整个读操作完成并对所有待读取数据的处理后,才可以释放释放读操作占用的存储空间,从而减少了存储空间的占用。
NVMe控制器完成读操作后,还用于触发完成队列条目CQE,CQE用于指示NVMe控制器完成了读指令指示的读操作,主机还用于获取完成队列条目CQE。本发明实施例中,主机可以在对第二地址指示的存储单元中的数据进行操作后,才获取CQE。
根据本发明实施例公开的技术方案,主机将第一地址作为数据入口地址开放给NVMe控制器,供NVMe控制器通过第一地址向主机写入待读取数据,NVMe控制器发送的数据报文中携带的目的地址为第一地址,但主机接收到数据报文后,并没有将数据报文中的载荷数据真正的写入第一地址指示的存储空间,而是将第一地址映射为第二地址,并将数据报文的载荷数据写入第二地址指示的存储空间。其中,第二地址指示的存储空间可以为主机的私有内存空间,从而割裂了存储载荷数据的存储空间与通信协议之间的关系,主机对第二地址的访问不受通信协议的限制。主机在读指令结束前,可以使用第二地址指示的存储空间中存储的数据,并提前释放第二地址指示的存储空间供其他读操作使用。本发明实施例公开的技术方案可以减少读操作的时延,并节省用于存储待读取数据的存储空间。
图1000为依据本发明一实施例的一种基于NVMe的读取方法的交互流程图。如图1000所示,方法1000的应用场景为数据源与命令源分离的场景。数据源需要将待读取数据通过命令源读取至其数据源的存储空间。本发明实施例不限定数据源与命令源分离的具体场景。
举例而言,在本发明实施例中,数据源和命令源分离的场景可以为基于NOF(英文全称:NVMe over fabric,缩写:NOF)的闪存簇(英文全称:Just a Bunch Of Flash,缩写:JBOF)。如图11(a)所示,数据源为需要访问存储介质的主机,命令源为与主机通过fabric互联的NOF桥,更具体的,命令源可以为NOF桥中的NOF引擎。NOF桥通过PCIe总线与NVMe控制器互联,NVMe连接有存储介质。
在本发明实施例中,数据源和命令源分离的场景还可以为主机与加密加速器,如图11(b)所示,数据源为主机,命令源为与主机互联的加密加速器,更具体的,命令源为加密加速器的加速引擎。加密加速器通过PCIe总线与NVMe控制器互联,NVMe控制器连接有存储介质。
在本发明实施例中,命令源在进行读操作的时候,会在SQE中携带待读取数据的入口地址的指示信息,入口地址本质上可以是一段NMVe控制器可选址的PCIe地址。NVMe控制器在获取到SQE后,会通过PCIe写操作向命令源发送TLP,并在TLP中携带该PCIe地址。命令源在接收到TLP包后,解析TLP包,获取该PCIe地址,并根据该PCIe地址与本地内存之间的映射关系,确定于该PCIe地址对应的本地存储单元,然后把TLP中的载荷数据写入到确定的存储单元中。一个入口可以对应多个存储单元,只要对存储单元的写操作结束,命令源就可以对该存储单元中存储的数据进行操作,对存储单元的写操作结束是指将与该存储单元对应的最后一个TLP的载荷数据写入该存储单元。命令源在获取到待读取数据的部分数据后,就可以向数据源发送获取到的数据,不需要等待整 个读操作完全结束从可以向数据源发送待读取数据。如图10所示,待读取数据包含数据1,数据2,数据3和数据4,数据1,数据2,数据3和数据4可以分别对应一个存储单元,当命令源接收到一个存储单元的数据后,就可以将该存储单元的数据发送至数据源。命令源在将存储单元中的数据发送至数据源后,就可以释放对应的存储单元,以供其他读操作使用。
根据本发明实施例公开的技术方案,命令源通过建立本地内存与PCIe地址的映射关系,将接收到的TLP中的载荷数据写入自己的内存空间,从而可以实现对数据的流水线操作,即命令源接收到部分数据后,就可以向数据源发送接收到的数据,且接收NVMe控制器发送的数据和向数据源发送数据可以并行处理,从而节省了用于缓存数据的存储空间,且加快了读操作的处理速度。
图12为依据本申请一实施例的一种计算设备1200的逻辑结构示意图,如图12所示,计算设备1200包括:
处理单元1202,用于向NVMe控制器触发读指令,读指令中携带指示信息,指示信息用于指示第一地址,第一地址为NVMe控制器可寻址的地址。
接收单元1204,用于接收NVMe控制器发送的第一数据报文,第一数据报文中携带第一地址和第一载荷数据。
处理单元1202根据第一地址确定第二地址,并将第一载荷数据写入第二地址指示的存储单元,第二地址为处理单元1202可寻址的地址。
可选的,处理单元1202完成对第二地址指示的存储单元的写操作后,还用于对第二地址指示的存储单元中的数据进行操作。
处理单元1202对第二地址指示的存储单元中的数据进行操作后,还用于获取NVMe控制器触发的完成队列条目CQE,CQE用于指示NVMe控制器完成读指令指示的读操作。
处理单元1202对第二地址指示的存储单元中的数据进行操作后,还用于释放第二地址指示的存储单元。
处理单元1202触发读指令前,还用于为读指令分配第二地址指示的存储单元,并记录第一地址与第二地址的对应关系。
可选的,读指令的待读取数据对应至少两个数据报文,处理单元1202为读指令分配至少两个存储单元。
处理单元1202可以根据第一地址和第一载荷数据在待读取数据中的顺序确定第二地址。
可选的,接收单元1204还用于接收NVMe控制器发送的第二数据报文,第二数据报文中携带第一地址和第二载荷数据;处理单元1202还用于根据接收第一数据报文和第二数据报文的顺序确定第一载荷数据和第二载荷数据在待读取数据中的顺序。
可选的,第一数据报文中还携带第一载荷数据在待读取数据中的偏移量,偏移量用于指示第一载荷数据在待读取数据中的顺序。
在本发明实施例中,第一地址可以为NVMe控制器可寻址的PCIe地址,第一数据报文为PCIe报文;第二地址指示的存储单元可以为计算设备的内存空间。
在本申请实施例中,处理单元1202可以具体由图3中的处理器301中的读操作逻辑310来实现,或者由图3中的处理器301和系统内存302中的读操作模块306来实现。接收单元1204可以由图3实施例中的处理器301和通信接口303来实现。
本申请实施例为以上实施例对应的主机的的装置实施例,以上实施例部分的特征描述适用于本申请实施例,在此不再赘述。
以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者替换其中部分技术特征;而这些修改或者替换,并不使相应技术方案脱离权利要求的保护范围。

Claims (32)

  1. 一种基于NVMe的数据读取系统,其特征在于,所述系统包括主机、NVMe控制器和存储介质;
    所述存储介质用于存储数据;
    所述主机用于触发读指令,所述读指令中携带指示信息,所述指示信息用于指示第一地址,所述第一地址为NVMe控制器可寻址的地址;
    所述NVMe控制器用于获取所述读指令,从所述存储介质中读取所述读指令对应的待读取数据,并向所述主机发送第一数据报文,所述第一数据报文中携带所述第一地址和第一载荷数据,所述待读取数据包含所述第一载荷数据;
    所述主机还用于接收所述第一数据报文,根据所述第一地址确定第二地址,并将所述第一载荷数据写入所述第二地址指示的存储单元,所述第二地址为所述主机可寻址的地址。
  2. 根据权利要求1所述的系统,其特征在于,所述主机完成对所述第二地址指示的存储单元的写操作后,还用于对所述第二地址指示的存储单元中的数据进行操作。
  3. 根据权利要求2所述的系统,其特征在于,所述NVMe控制器还用于触发完成队列条目CQE,所述CQE用于指示所述NVMe控制器完成所述读指令指示的读操作;
    所述主机对所述第二地址指示的存储单元中的数据进行操作后,还用于获取所述完成队列条目CQE。
  4. 根据权利要求2或3所述的系统,其特征在于,所述主机对所述第二地址指示的存储单元中的数据进行操作后,还用于释放所述第二地址指示的存储单元。
  5. 根据权利要求1-4任一项所述的系统,其特征在于,所述主机触发所述读指令前,还用于为所述读指令分配所述第二地址指示的存储单元,并记录所述第一地址与所述第二地址的对应关系。
  6. 根据权利要求5所述的系统,其特征在于,所述读指令的待读取数据对应至少两个数据报文,所述主机为所述读指令分配至少两个存储单元。
  7. [根据细则91更正 18.07.2019] 
    根据权利1所述的系统,其特征在于,所述主机用于根据所述第一地址和所述第一载荷数据在所述待读取数据中的顺序确定所述第二地址。
  8. 根据权利要求7所述的系统,其特征在于,所述NVMe控制器还用于向所述主机发送第二数据报文,所述第二数据报文中携带所述第一地址和第二载荷数据,所述待读取数据包含所述第二载荷数据;
    所述主机还用于接收所述第二数据报文,并根据接收所述第一数据报文和所述第二数据报文的顺序确定所述第一载荷数据和所述第二载荷数据在所述待读取数据中的顺序。
  9. 根据权利要求7所述的系统,其特征在于,所述第一数据报文中还携带所述第一载荷数据在所述待读取数据中的偏移量,所述偏移量用于指示所述第一载荷数据在所述待读取数据中的顺序。
  10. 根据权利要求1-9任一项所述的系统,其特征在于,所述第一地址为所述NVMe控制器可寻址的PCIe地址,所述第一数据报文为PCIe报文;所述第二地址指示的存储单元为所述主机的内存空间。
  11. 一种基于NVMe的数据读取方法,其特征在于,所述方法包括:
    主机触发读指令,所述读指令中携带指示信息,所述指示信息用于指示第一地址,所述第一地址为NVMe控制器可寻址的地址;
    所述主机接收所述NVMe控制器发送的第一数据报文,所述第一数据报文中携带所述第一地址和第一载荷数据;
    所述主机根据所述第一地址确定第二地址,所述第二地址为所述主机可寻址的地址;
    所述主机将所述第一载荷数据写入所述第二地址指示的存储单元。
  12. 根据权利要求11所述的方法,其特征在于,所述主机完成对所述第二地址指示的存储单元的写操作后,所述方法还包括:
    所述主机对所述第二地址指示的存储单元中的数据进行操作。
  13. 根据权利要求12所述的方法,其特征在于,所述主机对所述第二地址指示的存储单元中的数据进行操作后,所述方法还包括:
    所述主机获取所述NVMe控制器触发的完成队列条目CQE,所述CQE用于指示所述NVMe控制器完成所述读指令指示的读操作。
  14. 根据权利要求12或13所述的方法,其特征在于,所述主机对所述第二地址指示的存储单元中的数据进行操作后,所述方法还包括:
    所述主机释放所述第二地址指示的存储单元。
  15. 根据权利要求11-14任一项所述的方法,其特征在于,所述主机触发所述读指令前,所述方法还包括:
    所述主机为所述读指令分配所述第二地址指示的存储单元,并记录所述第一地址与所述第二地址的对应关系。
  16. 根据权利要求15所述的方法,其特征在于,所述读指令的待读取数据对应至 少两个数据报文,所述主机为所述读指令分配至少两个存储单元。
  17. [根据细则91更正 18.07.2019] 
    根据权利11d所述的方法,其特征在于,所述主机根据所述第一地址和所述第一载荷数据在所述待读取数据中的顺序确定所述第二地址。
  18. 根据权利要求17所述的方法,其特征在于,所述方法还包括:
    所述主机接收所述NVMe控制器发送的第二数据报文,所述第二数据报文中携带所述第一地址和第二载荷数据;
    所述主机根据接收所述第一数据报文和所述第二数据报文的顺序确定所述第一载荷数据和所述第二载荷数据在所述待读取数据中的顺序。
  19. 根据权利要求17所述的方法,其特征在于,所述第一数据报文中还携带所述第一载荷数据在所述待读取数据中的偏移量,所述偏移量用于指示所述第一载荷数据在所述待读取数据中的顺序。
  20. 根据权利要求11-19任一项所述的方法,其特征在于,所述第一地址为所述NVMe控制器可寻址的PCIe地址,所述第一数据报文为PCIe报文;所述第二地址指示的存储单元为所述主机的内存空间。
  21. 一种基于NVMe的数据读取装置,其特征在于,所述装置包括:
    处理单元,用于触发读指令,所述读指令中携带指示信息,所述指示信息用于指示第一地址,所述第一地址为NVMe控制器可寻址的地址;
    接收单元,用于接收所述NVMe控制器发送的第一数据报文,所述第一数据报文中携带所述第一地址和第一载荷数据;
    所述处理单元还用于根据所述第一地址确定第二地址,并将所述第一载荷数据写入所述第二地址指示的存储单元,所述第二地址为所述处理单元可寻址的地址。
  22. 根据权利要求21所述的装置,其特征在于,所述处理单元完成对所述第二地址指示的存储单元的写操作后,还用于对所述第二地址指示的存储单元中的数据进行操作。
  23. 根据权利要求22所述的装置,其特征在于,所述处理单元对所述第二地址指示的存储单元中的数据进行操作后,还用于获取所述NVMe控制器触发的完成队列条目CQE,所述CQE用于指示所述NVMe控制器完成所述读指令指示的读操作。
  24. 根据权利要求22或23所述的装置,其特征在于,所述处理单元对所述第二地址指示的存储单元中的数据进行操作后,还用于释放所述第二地址指示的存储单元。
  25. 根据权利要求21-24任一项所述的装置,其特征在于,所述处理单元触发所述读指令前,还用于为所述读指令分配所述第二地址指示的存储单元,并记录所述第一地 址与所述第二地址的对应关系。
  26. 根据权利要求25所述的装置,其特征在于,所述读指令的待读取数据对应至少两个数据报文,所述处理单元为所述读指令分配至少两个存储单元。
  27. [根据细则91更正 18.07.2019] 
    根据权利21所述的装置,其特征在于,所述处理单元用于根据所述第一地址和所述第一载荷数据在所述待读取数据中的顺序确定所述第二地址。
  28. 根据权利要求27所述的装置,其特征在于,所述接收单元还用于接收所述NVMe控制器发送的第二数据报文,所述第二数据报文中携带所述第一地址和第二载荷数据;
    所述处理单元还用于根据接收所述第一数据报文和所述第二数据报文的顺序确定所述第一载荷数据和所述第二载荷数据在所述待读取数据中的顺序。
  29. 根据权利要求27所述的装置,其特征在于,所述第一数据报文中还携带所述第一载荷数据在所述待读取数据中的偏移量,所述偏移量用于指示所述第一载荷数据在所述待读取数据中的顺序。
  30. 根据权利要求21-29任一项所述的装置,其特征在于,所述第一地址为所述NVMe控制器可寻址的PCIe地址,所述第一数据报文为PCIe报文;所述第二地址指示的存储单元为所述装置的内存空间。
  31. 一种可读介质,其特征在于,包括执行指令,当计算设备的处理器执行所述执行指令时,所述计算设备执行权利要求11-20任一项所述的方法。
  32. 一种计算设备,其特征在于,包括:处理器、存储器和总线;
    所述存储器用于存储执行指令,所述处理器与所述存储器通过所述总线连接,当所述计算设备运行时,所述处理器执行所述存储器存储的所述执行指令,以使所述计算设备执行权利要求11-20任一项所述的方法。
PCT/CN2018/093918 2018-06-30 2018-06-30 一种基于NVMe的数据读取方法、装置及系统 WO2020000482A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020207022273A KR102471219B1 (ko) 2018-06-30 2018-06-30 NVMe 기반의 데이터 판독 방법, 장치, 및 시스템
JP2020545126A JP7191967B2 (ja) 2018-06-30 2018-06-30 NVMeベースのデータ読み取り方法、装置及びシステム
CN201880005007.9A CN111095231B (zh) 2018-06-30 2018-06-30 一种基于NVMe的数据读取方法、装置及系统
PCT/CN2018/093918 WO2020000482A1 (zh) 2018-06-30 2018-06-30 一种基于NVMe的数据读取方法、装置及系统
EP18924295.1A EP3792776B1 (en) 2018-06-30 2018-06-30 Nvme-based data reading method, apparatus and system
US17/072,038 US11467764B2 (en) 2018-06-30 2020-10-16 NVMe-based data read method, apparatus, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/093918 WO2020000482A1 (zh) 2018-06-30 2018-06-30 一种基于NVMe的数据读取方法、装置及系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/072,038 Continuation US11467764B2 (en) 2018-06-30 2020-10-16 NVMe-based data read method, apparatus, and system

Publications (1)

Publication Number Publication Date
WO2020000482A1 true WO2020000482A1 (zh) 2020-01-02


Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/093918 WO2020000482A1 (zh) 2018-06-30 2018-06-30 一种基于NVMe的数据读取方法、装置及系统

Country Status (6)

Country Link
US (1) US11467764B2 (zh)
EP (1) EP3792776B1 (zh)
JP (1) JP7191967B2 (zh)
KR (1) KR102471219B1 (zh)
CN (1) CN111095231B (zh)
WO (1) WO2020000482A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111752484B (zh) * 2020-06-08 2024-04-12 深圳大普微电子科技有限公司 一种ssd控制器、固态硬盘及数据写入方法
CN111831226B (zh) * 2020-07-07 2023-09-29 山东华芯半导体有限公司 一种自主输出nvme协议命令加速处理方法
CN113296691B (zh) * 2020-07-27 2024-05-03 阿里巴巴集团控股有限公司 数据处理系统、方法、装置以及电子设备
CN112527705B (zh) * 2020-11-05 2023-02-28 山东云海国创云计算装备产业创新中心有限公司 一种PCIe DMA数据通路的验证方法、装置及设备
CN113031862B (zh) * 2021-03-18 2024-03-22 中国电子科技集团公司第五十二研究所 一种基于nvme协议控制sata盘的存储系统
CN114996172B (zh) * 2022-08-01 2022-11-01 北京得瑞领新科技有限公司 基于ssd访问主机内存的方法及系统

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106210041A (zh) * 2016-07-05 2016-12-07 杭州华为数字技术有限公司 一种数据写入方法及服务器端网卡
US20180074757A1 (en) * 2016-09-09 2018-03-15 Toshiba Memory Corporation Switch and memory device
CN107992436A (zh) * 2016-10-26 2018-05-04 杭州华为数字技术有限公司 一种NVMe数据读写方法及NVMe设备

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130086311A1 (en) 2007-12-10 2013-04-04 Ming Huang METHOD OF DIRECT CONNECTING AHCI OR NVMe BASED SSD SYSTEM TO COMPUTER SYSTEM MEMORY BUS
CN103080917B (zh) * 2010-06-18 2014-08-20 LSI Corporation Scalable storage apparatus
US8966172B2 (en) * 2011-11-15 2015-02-24 Pavilion Data Systems, Inc. Processor agnostic data storage in a PCIE based shared storage environment
WO2013109640A1 (en) * 2012-01-17 2013-07-25 Intel Corporation Techniques for command validation for access to a storage device by a remote client
US20140195634A1 (en) * 2013-01-10 2014-07-10 Broadcom Corporation System and Method for Multiservice Input/Output
US9256384B2 (en) * 2013-02-04 2016-02-09 Avago Technologies General Ip (Singapore) Pte. Ltd. Method and system for reducing write latency in a data storage system by using a command-push model
US9424219B2 (en) * 2013-03-12 2016-08-23 Avago Technologies General Ip (Singapore) Pte. Ltd. Direct routing between address spaces through a nontransparent peripheral component interconnect express bridge
US9727503B2 (en) 2014-03-17 2017-08-08 Mellanox Technologies, Ltd. Storage system and server
US20160124876A1 (en) * 2014-08-22 2016-05-05 HGST Netherlands B.V. Methods and systems for noticing completion of read requests in solid state drives
US9712619B2 (en) * 2014-11-04 2017-07-18 Pavilion Data Systems, Inc. Virtual non-volatile memory express drive
US9565269B2 (en) * 2014-11-04 2017-02-07 Pavilion Data Systems, Inc. Non-volatile memory express over ethernet
US9575853B2 (en) 2014-12-12 2017-02-21 Intel Corporation Accelerated data recovery in a storage system
CN106484549B (zh) 2015-08-31 2019-05-10 Huawei Technologies Co., Ltd. Interaction method, NVMe device, host, and physical machine system
CN111427517A (zh) 2015-12-28 2020-07-17 Huawei Technologies Co., Ltd. Data processing method and NVMe storage device
US9921756B2 (en) * 2015-12-29 2018-03-20 EMC IP Holding Company LLC Method and system for synchronizing an index of data blocks stored in a storage system using a shared storage module
US10769098B2 (en) 2016-04-04 2020-09-08 Marvell Asia Pte, Ltd. Methods and systems for accessing host memory through non-volatile memory over fabric bridging with direct target access
CN107832086B (zh) * 2016-09-14 2020-03-20 Huawei Technologies Co., Ltd. Computer device, program write method, and program read method
WO2018102969A1 (zh) 2016-12-05 2018-06-14 Huawei Technologies Co., Ltd. Control method, device, and system for data read/write commands in an NVMe over Fabric architecture
US11451647B2 (en) * 2016-12-27 2022-09-20 Chicago Mercantile Exchange Inc. Message processing protocol which mitigates optimistic messaging behavior
US10387081B2 (en) * 2017-03-24 2019-08-20 Western Digital Technologies, Inc. System and method for processing and arbitrating submission and completion queues
US10503434B2 (en) 2017-04-12 2019-12-10 Micron Technology, Inc. Scalable low-latency storage interface
CN107608909A (zh) 2017-09-19 2018-01-19 Ramaxel Technology (Shenzhen) Co., Ltd. Method for write acceleration of an NVMe solid-state drive
JP6901427B2 (ja) * 2018-03-27 2021-07-14 Kioxia Corporation Storage device, computer system, and method of operating a storage device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106210041A (zh) * 2016-07-05 2016-12-07 Hangzhou Huawei Digital Technologies Co., Ltd. Data write method and server-side network interface card
US20180074757A1 (en) * 2016-09-09 2018-03-15 Toshiba Memory Corporation Switch and memory device
CN107992436A (zh) * 2016-10-26 2018-05-04 Hangzhou Huawei Digital Technologies Co., Ltd. NVMe data read/write method and NVMe device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3792776A4 *

Also Published As

Publication number Publication date
US11467764B2 (en) 2022-10-11
CN111095231B (zh) 2021-08-03
CN111095231A (zh) 2020-05-01
KR102471219B1 (ko) 2022-11-25
KR20200101982A (ko) 2020-08-28
JP2021515318A (ja) 2021-06-17
EP3792776B1 (en) 2022-10-26
EP3792776A4 (en) 2021-06-09
JP7191967B2 (ja) 2022-12-19
US20210034284A1 (en) 2021-02-04
EP3792776A1 (en) 2021-03-17

Similar Documents

Publication Publication Date Title
WO2020000482A1 (zh) NVMe-based data read method, apparatus, and system
US9734085B2 (en) DMA transmission method and system thereof
US8352689B2 (en) Command tag checking in a multi-initiator media controller architecture
WO2013170731A1 (zh) Method for writing data to a storage device, and storage device
US11579803B2 (en) NVMe-based data writing method, apparatus, and system
US8516170B2 (en) Control flow in a ring buffer
US20040186931A1 (en) Transferring data using direct memory access
US10963295B2 (en) Hardware accelerated data processing operations for storage data
US20230359396A1 (en) Systems and methods for processing commands for storage devices
WO2020087931A1 (zh) Data backup method, apparatus, and system
US11200180B2 (en) NVMe SGL bit bucket transfers
US10802828B1 (en) Instruction memory
WO2022073399A1 (zh) Storage node, storage device, and network chip
US11789634B2 (en) Systems and methods for processing copy commands
US10261700B1 (en) Method and apparatus for streaming buffering to accelerate reads
KR20140113370A (ko) 패스 스루 스토리지 디바이스들
JP2005267139A (ja) Bridge device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18924295

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20207022273

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020545126

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2018924295

Country of ref document: EP

Effective date: 20201208

NENP Non-entry into the national phase

Ref country code: DE