WO2020000482A1 - NVMe-based data reading method, apparatus and system - Google Patents
- Publication number: WO2020000482A1 (application PCT/CN2018/093918)
- Authority: WO, WIPO (PCT)
- Prior art keywords: address, data, read, host, storage unit
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0238—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
- G06F12/0246—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
- G06F13/4004—Coupling between buses
- G06F13/4027—Coupling between buses using bus bridges
- G06F13/404—Coupling between buses using bus bridges with address mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4204—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
- G06F13/4234—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a memory bus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
- G06F3/0607—Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0662—Virtualisation aspects
- G06F3/0664—Virtualisation aspects at device level, e.g. emulation of a storage device or system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/0026—PCI express
Definitions
- The present application relates to the field of storage, and in particular to a method, an apparatus, and a storage device for reading data based on Non-Volatile Memory Express (NVMe).
- NVMe: Non-Volatile Memory Express
- NVMe is a register-level interface through which host software communicates with an NVM subsystem (comprising a controller and a storage medium) over the high-speed Peripheral Component Interconnect Express (PCIe) bus. It is optimized for enterprise-grade and consumer-grade solid-state storage and offers high performance and low access latency.
- PCIe: Peripheral Component Interconnect Express
- NVMe is based on a mechanism of paired submission queues (SQ) and completion queues (CQ).
- The host places commands in a submission queue.
- The controller places completion information in the corresponding completion queue.
- Each submission queue entry (SQE) is a command.
- The memory addresses used for data transfer are conveyed by the Metadata Pointer (MPTR) and Data Pointer (DPTR) fields of the SQE.
- After the NVMe controller obtains a read instruction, it uses PCIe write operations to write the data to be read into the storage space indicated by the memory address used for data transfer.
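The conventional read flow described above can be modelled in a few lines of Python. In that flow, the host puts an SQE carrying DPTR into the SQ and rings the doorbell; the controller writes the data and posts a CQE to the CQ.

This is a toy sketch under stated assumptions, not the NVMe wire format:
- Queues are plain deques, and the doorbell is a method call.
- DPTR is a single flat offset into a `bytearray` standing in for host memory. Real controllers use PRP lists or SGLs.
- `Sqe`, `Cqe`, `ToyController` and `ring_doorbell` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sqe:
    """Simplified read command (hypothetical fields, not the 64-byte NVMe layout)."""
    cid: int      # command identifier
    slba: int     # starting logical block address
    nblocks: int  # number of logical blocks to read
    dptr: int     # flat host-memory offset standing in for the PRP/SGL data pointer


@dataclass
class Cqe:
    """Simplified completion queue entry."""
    cid: int
    status: int  # 0 means success


class ToyController:
    def __init__(self, media: bytes, host_mem: bytearray, block_size: int = 4):
        self.media = media          # contents of the storage medium
        self.host_mem = host_mem    # host memory the controller may write to
        self.block_size = block_size
        self.sq = deque()           # submission queue, filled by the host
        self.cq = deque()           # completion queue, filled by the controller

    def ring_doorbell(self) -> None:
        """Fetch each pending SQE, write its data to DPTR, then post a CQE."""
        while self.sq:
            sqe = self.sq.popleft()
            start = sqe.slba * self.block_size
            data = self.media[start:start + sqe.nblocks * self.block_size]
            # Stands in for the PCIe memory writes into the buffer named by DPTR.
            self.host_mem[sqe.dptr:sqe.dptr + len(data)] = data
            # A real controller writes the CQE to the CQ and interrupts the host.
            self.cq.append(Cqe(cid=sqe.cid, status=0))


host_mem = bytearray(64)
ctrl = ToyController(media=bytes(range(32)), host_mem=host_mem)
ctrl.sq.append(Sqe(cid=1, slba=2, nblocks=2, dptr=16))  # host writes the SQE to the SQ
ctrl.ring_doorbell()                                     # host rings the SQ doorbell
cqe = ctrl.cq.popleft()                                  # host reaps the CQE
# Only now may the host treat host_mem[16:24] as holding the data to be read.
print(cqe, bytes(host_mem[16:24]))
```

The step that matters for the rest of this document is the last one: the buffer named by DPTR stays tied to the read until the CQE is reaped.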
- The present application discloses a method, a device, and a system for reading data based on NVMe.
- The host receives, through an entry address opened to the NVMe controller, the data messages sent by the NVMe controller, and allocates a corresponding storage unit for the entry address in its own memory space. After receiving a data message, the host determines the address of the corresponding storage unit from the entry address carried in the message and writes the payload data of the message into that storage unit. This decouples the storage unit from the communication protocol and allows the data to be handled flexibly.
- The present application discloses an NVMe-based data reading system.
- The system includes a host, an NVMe controller, and a storage medium.
- The storage medium is used to store data of the host, and the host is used to trigger a read instruction to the NVMe controller.
- The read instruction carries indication information that indicates a first address.
- The first address is an address addressable by the NVMe controller. After obtaining the read instruction, the NVMe controller reads the data to be read corresponding to the read instruction from the storage medium.
- The read instruction may be an SQE. Specifically, the host may trigger the read instruction by writing the SQE to the SQ and notifying the NVMe controller through a doorbell.
- The first address is an address opened by the host to the NVMe controller, but it serves only as an entry address through which the NVMe controller writes payload data to the host.
- The storage space indicated by the first address does not actually store the payload data.
- After receiving a data message sent by the NVMe controller, the host does not write the payload data into the storage space indicated by the first address. Instead, it allocates, in its own addressable storage space, a second address corresponding to the first address, and writes the payload data into the storage unit indicated by the second address. The host's operations on that storage unit are therefore no longer constrained by the communication protocol between the host and the NVMe controller. This application can thus reduce read-operation latency and reduce the host storage space occupied by the data to be read.
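The redirection of payload data from the entry (first) address to host-private memory (the second address) might be sketched as follows.

The following are all illustrative assumptions, not the application's implementation:
- the integer entry address;
- the `bytearray` standing in for private memory;
- the one-unit-per-entry mapping;
- the names `DataMessage`, `EntryMapper`, `open_entry` and `on_data_message`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DataMessage:
    """Simplified data message from the controller (e.g. a PCIe write)."""
    first_address: int  # destination address: the entry opened to the controller
    payload: bytes


class EntryMapper:
    def __init__(self, private_mem_size: int):
        # Host-private memory; the controller never addresses it directly.
        self.private_mem = bytearray(private_mem_size)
        # first address (entry) -> second address (storage unit in private memory)
        self.entry_to_unit: Dict[int, int] = {}

    def open_entry(self, first_address: int, second_address: int) -> None:
        """Record the first-to-second address correspondence before the read."""
        self.entry_to_unit[first_address] = second_address

    def on_data_message(self, msg: DataMessage) -> int:
        """Write the payload to the mapped storage unit, not to the entry itself."""
        second_address = self.entry_to_unit[msg.first_address]
        end = second_address + len(msg.payload)
        self.private_mem[second_address:end] = msg.payload
        return second_address


host = EntryMapper(private_mem_size=4096)
host.open_entry(first_address=0xF000_0000, second_address=512)
where = host.on_data_message(DataMessage(0xF000_0000, b"hello"))
print(where, bytes(host.private_mem[512:517]))
```

Nothing is ever stored at the entry address itself; it acts only as a lookup key. This is what lets the host reuse the storage unit independently of the NVMe protocol state.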
- After completing the write operation to the storage unit indicated by the second address, the host is further configured to operate on the data in that storage unit.
- Completing the write operation to a storage unit means that all data associated with the storage unit has been written into it. For example, the storage unit has been filled, or the last piece of payload data associated with the storage unit has been written into it.
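One hypothetical way to detect when the write to a storage unit is complete, in the sense defined above, is to track how many bytes each unit should receive. A unit is complete when it is full or, for the last unit of a read, once the final payload has landed.

The sketch assumes that:
- the host knows the total read length from the read instruction;
- each payload's byte offset is known;
- no payload straddles two units.

`UnitFillTracker` and `record` are made-up names.

```python
from typing import List, Optional


class UnitFillTracker:
    """Per-unit fill tracking for one read instruction (illustrative sketch)."""

    def __init__(self, total_len: int, unit_size: int):
        # Assumes total_len > 0 and that no payload straddles two units.
        self.unit_size = unit_size
        n_units = -(-total_len // unit_size)  # ceiling division
        # Bytes each unit must receive; only the last unit may be partially used.
        self.expected: List[int] = [unit_size] * n_units
        self.expected[-1] = total_len - unit_size * (n_units - 1)
        self.received: List[int] = [0] * n_units

    def record(self, offset: int, length: int) -> Optional[int]:
        """Account for a payload at `offset`; return the unit index if it just completed."""
        idx = offset // self.unit_size
        self.received[idx] += length
        if self.received[idx] == self.expected[idx]:
            return idx  # unit is full, or now holds the last payload of the read
        return None


# A 10-byte read spread over 4-byte units: the units expect 4, 4 and 2 bytes.
tracker = UnitFillTracker(total_len=10, unit_size=4)
events = [tracker.record(off, 2) for off in (0, 2, 4, 8, 6)]
print(events)
```

Here the third unit completes before the second, because its payload happened to arrive earlier. Under the scheme described in this application, the host could already operate on that unit without waiting for the CQE.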
- Operating on the data may be, for example, sending the data in the storage unit to another entity.
- The storage space indicated by the second address may be private memory of the host. The NVMe controller can no longer access it through a PCIe address, and it is not used as a Controller Memory Buffer (CMB).
- CMB: Controller Memory Buffer
- Using the mapping between the first address and the second address, the host stores the payload data in the storage unit indicated by the host-addressable second address. This decouples the second address from the communication protocol between the host and the NVMe controller. Once the host has completed the write operation to the storage unit indicated by the second address, it can operate on the data in that storage unit without waiting for the read operation indicated by the read instruction to finish completely.
- The NVMe controller is further configured to trigger a completion queue entry (CQE), which indicates that the NVMe controller has completed the read operation indicated by the read instruction. After operating on the data in the storage unit indicated by the second address, the host is further configured to obtain the CQE.
- CQE: completion queue entry
- Specifically, the NVMe controller may trigger the CQE by writing it to the CQ after completing the read operation and notifying the host through an interrupt.
- In the conventional approach, the host needs to allocate a PCIe address space for the read instruction, and the storage space indicated by the PCIe address is used to store the data to be read.
- The host then loses ownership of that PCIe address space: it cannot access the storage space indicated by the PCIe address until it obtains the CQE, which delays the read operation and wastes storage space.
- Because the second address is not the first address carried in the data message but an internal address chosen by the host, the host can operate on the data in the storage unit indicated by the second address before it obtains the CQE.
- After operating on the data in the storage unit indicated by the second address, the host is further configured to release that storage unit.
- The host can organize its internal storage space as a memory pool containing multiple storage units. Once it has finished operating on the data in a storage unit, it can return that unit to the memory pool for other read operations, rather than waiting until the entire read operation has finished. This shortens the time each storage unit is occupied and improves storage-space utilization.
- Before triggering the read instruction, the host is further configured to allocate the storage unit indicated by the second address for the read instruction and to record the correspondence between the first address and the second address.
- Because the host allocates the storage unit for the read instruction before triggering it, storage-space overflow can be effectively avoided.
- The host can decide whether to trigger read instructions according to the number of free storage units in the memory pool it maintains, thereby effectively controlling read operations.
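A memory pool supporting the three behaviours just described could look like the following sketch:
- allocating storage units before the read instruction is triggered;
- holding back reads when too few units are free;
- returning each unit as soon as its data has been consumed, rather than when the CQE arrives.

The fixed unit size, the FIFO free list and the names `MemoryPool`, `try_allocate` and `release` are assumptions for illustration.

```python
from collections import deque
from typing import List, Optional


class MemoryPool:
    """Fixed-size storage units carved out of host-private memory (illustrative)."""

    def __init__(self, n_units: int, unit_size: int, base: int = 0):
        self.unit_size = unit_size
        # Second addresses of the currently free units.
        self.free = deque(base + i * unit_size for i in range(n_units))

    def try_allocate(self, n: int) -> Optional[List[int]]:
        """Reserve n units for a read; None means 'do not trigger the read yet'."""
        if len(self.free) < n:
            return None
        return [self.free.popleft() for _ in range(n)]

    def release(self, second_address: int) -> None:
        """Return one unit as soon as its data has been consumed."""
        self.free.append(second_address)


pool = MemoryPool(n_units=4, unit_size=4096)
read_a = pool.try_allocate(3)  # enough units: read A may be triggered
read_b = pool.try_allocate(2)  # only 1 unit free: hold read B back
pool.release(read_a[0])        # first unit of A consumed before A's CQE
read_b = pool.try_allocate(2)  # now two units are free
print(read_a, read_b, len(pool.free))
```

Because the first unit of read A is released before read A completes, read B can be triggered earlier than it could if units were held until the CQE.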
- When the data to be read of the read instruction corresponds to at least two data messages, the host allocates at least two storage units for the read instruction.
- The data to be read can be split into multiple pieces and transmitted in multiple data messages.
- The host can allocate a corresponding number of storage units according to the size of the read operation.
- The host is configured to determine the second address according to the first address and the order of the first payload data within the data to be read.
- Because the host allocates multiple storage units for the read operation, the first address corresponds to multiple storage units. After receiving the first data message, the host therefore needs to determine the specific storage unit for the first payload data.
- The host can address the multiple storage units allocated for the read operation logically and write the data to be read into them in sequence.
- Specifically, the host may determine the storage unit into which the first payload data is to be written according to the order of the first payload data within the data to be read.
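When several storage units back one entry, the host can treat them as one logically contiguous buffer. It then chooses the unit from the position of the payload within the data to be read.

The sketch below expresses that position as a byte offset. For fixed-size payloads, the k-th payload starts at k times the payload size. The table layout, the unit size and the function name `resolve_second_address` are hypothetical.

```python
from typing import Dict, List


def resolve_second_address(
    entry_map: Dict[int, List[int]], first_address: int, offset: int, unit_size: int
) -> int:
    """Map (first address, byte offset within the read) to a second address.

    entry_map[first_address] lists the storage units allocated to the read in
    logical order; unit k holds bytes [k * unit_size, (k + 1) * unit_size).
    """
    units = entry_map[first_address]
    index, within = divmod(offset, unit_size)
    if index >= len(units):
        raise ValueError("offset beyond the storage allocated for this read")
    return units[index] + within


# One read (entry 0x1000) backed by three non-contiguous 4 KiB units.
entry_map = {0x1000: [0x8000, 0x2000, 0x5000]}
print(hex(resolve_second_address(entry_map, 0x1000, 0, 4096)))
print(hex(resolve_second_address(entry_map, 0x1000, 4096 + 16, 4096)))
```

The units need not be physically contiguous or ordered in memory; only their order in the per-entry list matters.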
- The NVMe controller is further configured to send a second data message to the host.
- The second data message carries the first address and second payload data.
- The data to be read includes the second payload data.
- The host is further configured to receive the second data message. It determines the order of the first payload data and the second payload data within the data to be read according to the order in which the first and second data messages are received.
- When the NVMe controller divides the data to be read into multiple data messages for transmission, the host must reorder the payload data carried in those messages after receiving them. If the NVMe controller sends the data messages strictly in the order of their payload data within the data to be read, the host can order the payload data according to the order in which the data messages are received.
- In another implementation, the first data message further carries an offset of the first payload data within the data to be read.
- The offset indicates the position of the first payload data within the data to be read.
- This allows the NVMe controller to transmit data messages out of order and make better use of bandwidth resources.
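The sketch below contrasts the two ways of establishing payload order described above:
- inferring it from arrival order, when the controller sends strictly in sequence;
- reading an offset carried in each data message, so that messages may arrive out of order.

The `(offset, payload)` tuple layout and both function names are illustrative assumptions. Neither is defined by the application or the NVMe specification.

```python
from typing import Iterable, List, Tuple


def assemble_in_order(payloads: Iterable[bytes], total_len: int) -> bytes:
    """Controller sends strictly in order: position = bytes received so far."""
    buf = bytearray(total_len)
    pos = 0
    for payload in payloads:
        buf[pos:pos + len(payload)] = payload
        pos += len(payload)
    return bytes(buf)


def assemble_by_offset(messages: Iterable[Tuple[int, bytes]], total_len: int) -> bytes:
    """Each message carries its offset, so arrival order does not matter."""
    buf = bytearray(total_len)
    for offset, payload in messages:
        buf[offset:offset + len(payload)] = payload
    return bytes(buf)


in_order = [b"NVMe", b"-rea", b"d"]
shuffled: List[Tuple[int, bytes]] = [(8, b"d"), (0, b"NVMe"), (4, b"-rea")]
print(assemble_in_order(in_order, 9))
print(assemble_by_offset(shuffled, 9))
```

The first function would silently produce wrong data if messages were reordered in transit, which is why it depends on strictly ordered sending. The second tolerates any arrival order at the cost of carrying an offset in every message.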
- The first address is a PCIe address addressable by the NVMe controller, the first data message is a PCIe message, and the storage unit indicated by the second address is memory space of the host.
- The present application discloses an NVMe-based data reading method.
- The method includes the following steps:
  - the host triggers a read instruction carrying indication information that indicates a first address, the first address being an address addressable by the NVMe controller;
  - the host receives a first data message sent by the NVMe controller, the first data message carrying the first address and first payload data;
  - the host determines a second address according to the first address, the second address being an address addressable by the host;
  - the host writes the first payload data into the storage unit indicated by the second address.
- The method further includes: the host operates on the data in the storage unit indicated by the second address.
- The method further includes: the host obtains the completion queue entry (CQE) triggered by the NVMe controller. The CQE indicates that the NVMe controller has completed the read operation indicated by the read instruction.
- The method further includes: the host releases the storage unit indicated by the second address.
- Before the host triggers the read instruction, the method further includes: the host allocates the storage unit indicated by the second address for the read instruction and records the correspondence between the first address and the second address.
- When the data to be read of the read instruction corresponds to at least two data messages, the host allocates at least two storage units for the read instruction.
- The host determines the second address according to the first address and the order of the first payload data within the data to be read.
- The method further includes: the host receives a second data message sent by the NVMe controller, the second data message carrying the first address and second payload data. The host then determines the order of the first payload data and the second payload data within the data to be read according to the order in which the first and second data messages are received.
- The first data message further carries an offset of the first payload data within the data to be read.
- The offset indicates the position of the first payload data within the data to be read.
- The first address is a PCIe address addressable by the NVMe controller, the first data message is a PCIe message, and the storage unit indicated by the second address is memory space of the host.
- The second aspect is the method implementation corresponding to the first aspect.
- The description of the first aspect or any possible implementation of the first aspect applies correspondingly to the second aspect or any possible implementation of the second aspect, and is not repeated here.
- The present application provides a readable medium that includes execution instructions.
- When a computing device executes the execution instructions, the computing device performs the method in the second aspect or any possible implementation of the second aspect.
- The present application provides a computing device including a processor, a memory, and a bus. The memory is used to store execution instructions, and the processor is connected to the memory through the bus. When the computing device runs, the processor executes the execution instructions stored in the memory, causing the computing device to perform the method in the second aspect or any possible implementation of the second aspect.
- The present application discloses an NVMe-based data reading device.
- The device includes:
  - a processing unit configured to trigger a read instruction, the read instruction carrying indication information that indicates a first address, the first address being an address addressable by the NVMe controller;
  - a receiving unit configured to receive a first data message sent by the NVMe controller, the first data message carrying the first address and first payload data.
- The processing unit is further configured to determine a second address according to the first address and to write the first payload data into the storage unit indicated by the second address, the second address being an address addressable by the processing unit.
- After completing the write operation to the storage unit indicated by the second address, the processing unit is further configured to operate on the data in that storage unit.
- After operating on the data in the storage unit indicated by the second address, the processing unit is further configured to obtain the completion queue entry (CQE) triggered by the NVMe controller. The CQE indicates that the NVMe controller has completed the read operation indicated by the read instruction.
- In a third possible implementation of the fifth aspect, after operating on the data in the storage unit indicated by the second address, the processing unit is further configured to release that storage unit.
- Before triggering the read instruction, the processing unit is further configured to allocate the storage unit indicated by the second address for the read instruction and to record the correspondence between the first address and the second address.
- When the data to be read of the read instruction corresponds to at least two data messages, the processing unit allocates at least two storage units for the read instruction.
- The processing unit is configured to determine the second address according to the first address and the order of the first payload data within the data to be read.
- The receiving unit is further configured to receive a second data message sent by the NVMe controller, the second data message carrying the first address and second payload data.
- The processing unit is further configured to determine the order of the first payload data and the second payload data within the data to be read according to the order in which the first and second data messages are received.
- The first data message further carries an offset of the first payload data within the data to be read.
- The offset indicates the position of the first payload data within the data to be read.
- The first address is a PCIe address addressable by the NVMe controller, the first data message is a PCIe message, and the storage unit indicated by the second address is memory space of the device.
- The fifth aspect is the device implementation corresponding to the first aspect.
- The description of the first aspect or any possible implementation of the first aspect applies correspondingly to the fifth aspect or any possible implementation of the fifth aspect, and is not repeated here.
- The host opens the first address to the NVMe controller as a data entry address, through which the NVMe controller writes the data to be read to the host.
- The destination address carried in a data message sent by the NVMe controller is the first address. After receiving the data message, however, the host does not actually write its payload data into the storage space indicated by the first address. Instead, it maps the first address to the second address and writes the payload data into the storage space indicated by the second address.
- The storage space indicated by the second address may be private memory space of the host. This decouples the storage space holding the payload data from the communication protocol, so the host's access to the second address is not restricted by that protocol.
- Before the read instruction completes, the host can use the data stored in the storage space indicated by the second address and release that storage space early for other read operations.
- The technical solution disclosed in this application can therefore reduce read-operation latency and save the storage space used for the data to be read.
- FIG. 1 is a schematic diagram of a logical structure of an NVMe system according to an embodiment of the present application
- FIG. 2 is a signaling diagram of a data reading method based on the NVMe standard
- FIG. 3 is a schematic diagram of a hardware structure of a host according to an embodiment of the present application.
- FIG. 4 is a schematic flowchart of an NVMe-based data reading method according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of an entry organization structure according to an embodiment of the present invention.
- FIG. 6 is a schematic diagram of an entry organization structure according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of a PCIe address structure according to an embodiment of the present invention.
- FIG. 8 is a schematic diagram of an address mapping relationship according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram of data packet transmission according to an embodiment of the present invention.
- FIG. 10 is a signaling diagram of an NVMe-based data reading method according to an embodiment of the present invention.
- FIG. 11 (a) is a schematic diagram of a logical structure of an NVMe system according to an embodiment of the present application.
- FIG. 11 (b) is a schematic diagram of a logical structure of an NVMe system according to an embodiment of the present application.
- FIG. 12 is a schematic diagram of a logical structure of a computing device according to an embodiment of the present application.
- The terms “first” and “second” are used only to distinguish objects, such as the first address and the second address. There is no logical or temporal dependency between a “first” and a “second”.
- A “data packet” (also called a data message) is a packet that carries payload data and is sent by the NVMe controller to the host.
- The payload data may be user data or metadata of user data. The embodiments of the present invention do not limit the type of payload data.
- The embodiments of the present invention use the term “data” or “payload data” to refer to any type of data carried in a data message.
- The data message may be a PCIe message.
- The entry is an address space opened by the host to the NVMe controller.
- The entry address may be a PCIe address, and the data message may be a PCIe write message.
- The NVMe controller sends a data message to the host through the entry, and the data message carries the entry address.
- After receiving the data message, the host identifies the entry address and allocates corresponding storage space for the entry in its local internal memory. It then writes the payload data carried in the data message into the allocated memory space for buffering, instead of writing it into the storage space indicated by the entry address.
- The internal memory may be private memory space of the host.
- A read operation may be any operation in the NVMe command set by which the host reads data from the NVMe controller.
- An instruction indicating a read operation is a read instruction.
- A read instruction may be implemented as a submission queue entry.
- The command initiator and the data initiator may be the same entity or separate entities.
- The command initiator is the system entity that directly issues instructions to the NVMe controller. It is also referred to as the command source in the embodiments of the present invention.
- The data initiator is the system entity that needs to read and consume data, that is, the entity that initiates data access requests. It is also referred to as the data source in the embodiments of the present invention. When the two are separate, the data source needs to read data through the command source.
- The term “host” may refer to the command source when the data source and the command source are separate. When they are not separate, it may refer to the computing device that communicates with the NVMe controller.
- In an NVMe read operation, the host uses the DPTR or MPTR in the triggered SQE to carry the address information of the storage space for storing the data to be read.
- According to the SQE, the NVMe controller writes the data to be read into the storage space indicated by the address information.
- The host thus loses ownership of the storage space used to store the data to be read: it must wait for the read operation to finish completely before it can access the data stored in that storage space.
- FIG. 1 is an architecture diagram of an NVMe system 100 according to an embodiment of the present invention.
- the data source 101 and the command source 103 in the system 100 are not the same entity; they are separate from each other and interconnected through the network 102.
- the command source 103 may be interconnected with the NVMe controller 105 through the PCIe bus, and the NVMe controller 105 is connected with the storage medium 106.
- the storage medium 106 is also generally called external storage, and is generally a non-volatile storage medium, which can be used to permanently store data.
- the storage medium 106 may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, an optical disk), or a semiconductor medium (for example, a flash memory).
- the embodiment of the present invention does not limit the specific implementation form of the storage medium 106.
- the storage medium 106 may further include a remote storage separate from the NVMe controller 105, such as a network storage medium interconnected with the NVMe controller 105 through a network.
- the network 102 may be used to refer to any method or interconnection protocol between the data source 101 and the command source 103.
- the network 102 may be a PCIe bus, an internal interconnection bus of a computer device, the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or any combination of the above networks.
- the data source 101 and the NVMe controller 105 need to communicate through the command source 103.
- the read instruction triggered by the command source 103 needs to carry address information of a storage space for storing data to be read.
- the data to be read must first be completely transferred from the storage medium 106 controlled by the NVMe controller 105 to the command source 103, and only then can the command source 103 send the data to the data source 101.
- when the data source needs to read data from the storage medium, the data source first sends a read request to the command source.
- according to the read request received from the data source, the command source writes an SQE into the submission queue (SQ), and carries, in the DPTR or MPTR field of the SQE, the address information of the storage space for receiving the data to be read.
- the command source then notifies the NVMe controller of the new SQE through the doorbell mechanism. After receiving the doorbell, the NVMe controller reads the SQE from the SQ and, using PCIe write operations, writes the complete data to be read into the storage space indicated by the address information carried in the SQE.
- the NVMe controller then writes a CQE into the completion queue (CQ) and notifies the command source through the interrupt mechanism; the command source handles the interrupt, obtains the CQE, and sends the data to be read to the data source.
- the storage space for receiving the data to be read must therefore be prepared in advance, and its ownership is lost until the CQE is obtained; that is, the command source must wait until the data to be read has been completely written into this storage space before it can send the data to the data source.
- the delay of this process is proportional to the size of the data to be read.
- the command source also needs a large amount of memory addressable by the NVMe controller to store the data to be read, and the memory it allocates for the data to be read stays occupied for the whole period from allocation until it obtains the CQE from the NVMe controller and releases the memory.
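To make this drawback concrete, the conventional flow can be modeled as a toy simulation. This is a minimal sketch, not the NVMe wire format: the `SQE`, `CQE` and `ToyController` names are hypothetical, the SQ and CQ are plain deques, the DPTR target is modeled as a Python `bytearray`, and the read position is a byte offset rather than an LBA. What it shows is that the buffer must be sized for the whole read up front and may only be consumed after the CQE is taken.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SQE:
    cid: int            # command identifier
    offset: int         # byte offset on the medium (stands in for an LBA)
    length: int         # number of bytes to read
    dptr: bytearray     # host buffer that DPTR points to


@dataclass
class CQE:
    cid: int
    status: int = 0     # 0 = success


class ToyController:
    """Hypothetical controller: drains the SQ when its doorbell is rung."""

    def __init__(self, medium: bytes):
        self.medium = medium
        self.sq = deque()
        self.cq = deque()

    def ring_doorbell(self) -> None:
        while self.sq:
            sqe = self.sq.popleft()
            data = self.medium[sqe.offset:sqe.offset + sqe.length]
            sqe.dptr[:len(data)] = data      # the complete payload is written first...
            self.cq.append(CQE(sqe.cid))     # ...and only then is the CQE posted


def conventional_read(ctrl: ToyController, cid: int, offset: int, length: int) -> bytes:
    buf = bytearray(length)                  # reserved for the whole read, up front
    ctrl.sq.append(SQE(cid, offset, length, buf))
    ctrl.ring_doorbell()
    cqe = ctrl.cq.popleft()                  # buf may be consumed only after this
    if cqe.cid != cid or cqe.status != 0:
        raise IOError(f"read {cid} failed")
    return bytes(buf)
```

For example, `conventional_read(ToyController(b"abcdefgh"), 1, 2, 4)` returns `b"cdef"`, but only once the whole transfer and its completion have finished.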
- FIG. 3 is a schematic structural diagram of a host 300 according to an embodiment of the present application.
- the host 300 includes a processor 301, and the processor 301 is connected to the system memory 302.
- the processor 301 may be a central processing unit (CPU), a graphics processing unit (GPU), a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), or any combination of the above computing logic.
- the processor 301 may be a single-core processor or a multi-core processor.
- the processor 301 may further include a register, and the address information of the register may be exposed to the NVMe controller as a PCIe address.
- the processor 301 may further include read operation logic 310, which may be a dedicated hardware circuit or a firmware module integrated in the processor 301. If the read operation logic 310 is a dedicated hardware circuit, the read operation logic 310 executes the method of the embodiments of the present application. If the read operation logic 310 is a firmware module, the processor 301 executes the firmware code in the read operation logic 310 to implement the technical solution of the embodiments of the present application.
- the read operation logic 310 includes: (1) logic (circuit / firmware code) for triggering a read instruction, where the read instruction carries indication information used to indicate a first address that the NVMe controller can address; (2) logic (circuit / firmware code) for receiving a data message sent by the NVMe controller, where the data message carries the first address and payload data; (3) logic (circuit / firmware code) for determining, according to the first address, a second address that the host can address; and (4) logic (circuit / firmware code) for writing the payload data into the storage unit indicated by the second address.
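The four pieces of read operation logic can be sketched in Python as methods on one object. This is a hypothetical illustration, not the patent's implementation: `ReadOperationLogic`, the dictionary-based mapping, the string key used as a second address, and the explicit `offset` argument (which the text only requires in the out-of-order mode described later) are all assumptions.

```python
class ReadOperationLogic:
    """Hypothetical host-side logic mirroring steps (1) to (4)."""

    def __init__(self):
        self.first_to_second = {}   # first address (entry) -> second address
        self.memory = {}            # second address -> host-private buffer

    def trigger_read(self, sq: list, first_addr: int, second_addr: str, size: int) -> None:
        # (1) bind the entry to private memory, then give the controller only the first address
        self.first_to_second[first_addr] = second_addr
        self.memory[second_addr] = bytearray(size)
        sq.append({"indication": first_addr})

    def on_data_message(self, first_addr: int, offset: int, payload: bytes) -> None:
        # (2) a data message arrives carrying the first address and payload data
        second_addr = self.resolve(first_addr)        # (3)
        self.write(second_addr, offset, payload)      # (4)

    def resolve(self, first_addr: int) -> str:
        return self.first_to_second[first_addr]

    def write(self, second_addr: str, offset: int, payload: bytes) -> None:
        buf = self.memory[second_addr]
        buf[offset:offset + len(payload)] = payload
```

The controller never sees `second_addr`; it only ever addresses the first address, which is the separation the embodiment relies on.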
- the bus 309 is used to transfer information between various components of the host 300.
- the bus 309 may use a wired connection method or a wireless connection method, which is not limited in this application.
- the bus 309 is also connected with an input / output interface 305 and a communication interface 303.
- the input / output interface 305 is connected with an input / output device for receiving input information and outputting operation results.
- the input / output device can be a mouse, keyboard, monitor, or optical drive.
- the communication interface 303 is used to implement communication with other devices or networks.
- the communication interface 303 may be interconnected with other devices or networks in a wired or wireless form.
- the host 300 may be connected to the NVMe controller through the communication interface 303, and the host 300 may also be connected to the network through the communication interface 303 and connected to the NVMe controller through the network.
- the system memory 302 may include software such as the operating system 308 (for example, Darwin, RTXC, LINUX, UNIX, OSX, WINDOWS, macOS, or an embedded operating system such as VxWorks), the application program 307, and the read operation module 306.
- the processor 301 executes the read operation module 306 to implement the technical solution of the embodiment of the present application.
- the read operation module 306 includes: (1) code for triggering a read instruction, where the read instruction carries indication information used to indicate a first address that the NVMe controller can address; (2) code for receiving a data message sent by the NVMe controller, where the data message carries the first address and payload data; (3) code for determining, according to the first address, a second address that the host can address; and (4) code for writing the payload data into the storage unit indicated by the second address.
- FIG. 3 is only an example of the host 300.
- the host 300 may include more or fewer components than those shown in FIG. 3, or may have different component configuration methods.
- various components shown in FIG. 3 may be implemented by hardware, software, or a combination of hardware and software.
- an embodiment of the present invention provides a method for reading data based on NVMe. As shown in FIG. 4, the method 400 includes:
- Step 401 The host triggers a read instruction, and the read instruction carries instruction information, and the instruction information is used to indicate a first address that the NVMe controller can address.
- the read instruction may specifically be an SQE.
- in the following, an SQE is used as an example of the read instruction, but it should be understood that the embodiments of the present invention do not limit the specific implementation form of the read instruction.
- for the process by which the host triggers a read instruction to the NVMe controller, refer to the NVMe standard. Specifically, the host writes the SQE into the SQ and notifies the NVMe controller of the new SQE through the doorbell, and the NVMe controller fetches the SQE from the SQ according to the doorbell. In the embodiments of the present invention, the host may also push the SQE directly to the NVMe controller; the embodiments do not limit the specific process by which the host triggers the read instruction to the NVMe controller.
- the host may expose a part of its storage space to the NVMe controller. More specifically, the host may expose a part of its storage space to the NVMe controller as PCIe storage space, and the NVMe controller may access this storage space by PCIe address. Taking the base address register (BAR) as an example, the host exposes the BAR to the NVMe controller as PCIe storage space and organizes part of the PCIe addresses of the BAR into multiple entries (portals), each of which occupies a specific range of PCIe address space addressable by the NVMe controller; the entry address may be the starting PCIe address of the entry.
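Because each entry is just a range of the host's PCIe address space identified by its starting address, the entry address can be recovered from any address that falls inside the entry. The sketch below is a hypothetical illustration with made-up base, offset and size values; it assumes equally sized entries kept in a sorted list and uses the standard-library `bisect` module for the lookup.

```python
from bisect import bisect_right
from typing import List


def build_entries(bar_base: int, region_offset: int, entry_size: int, count: int) -> List[int]:
    """Starting PCIe address of each entry carved out of one BAR region."""
    return [bar_base + region_offset + i * entry_size for i in range(count)]


def entry_address_of(pcie_addr: int, entries: List[int], entry_size: int) -> int:
    """Return the entry address whose range [start, start + entry_size) holds pcie_addr."""
    i = bisect_right(entries, pcie_addr) - 1
    if i < 0 or pcie_addr >= entries[i] + entry_size:
        raise ValueError(f"{pcie_addr:#x} is not inside any entry")
    return entries[i]
```

Since the lookup only needs a sorted list of start addresses, the same function would also work for entries that are not laid out back to back, as long as each has the same size.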
- the indication information carried in the read instruction triggered by the host may be used to indicate a specific entry, and the first address may be an entry address or a part of a field of the entry address.
- the entry is the data entry for the NVMe controller to perform PCIe write operations to the host. In the following description, the function of the entry will be described in more detail.
- the present invention does not limit the organization manner of the entries in the PCIe address space. It only needs to ensure that each entry uniquely corresponds to a specific read operation.
- the host may organize a part of the PCIe addresses of its BAR into apertures, each aperture containing multiple entries; that is, the entries may be organized as an array.
- an entry is addressed by adding an offset to the base address of the array, and this array is called an aperture.
- FIG. 5 is a schematic structural diagram of a base address register. Each aperture consists of a set of entries P0 to PN, and each entry is uniquely associated with a specific read operation. "Unique" means that, at any one time, the host can associate only one NVMe read operation with a specific entry.
- the apertures can be divided into metadata apertures and data apertures.
- through PCIe write operations, the NVMe controller writes data to the host through the entries DP0 to DPN of the data aperture, and writes metadata to the host through the entries MP0 to MPN of the metadata aperture.
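The uniqueness rule (at most one outstanding read per entry) can be illustrated with a small allocator per aperture. This is a hypothetical sketch: the `Aperture` class, the addresses, and the entry counts are invented for illustration, and a read that also needs metadata simply takes one entry from each aperture.

```python
class Aperture:
    """Hypothetical aperture: entries P0..PN at base + i * entry_size."""

    def __init__(self, name: str, base: int, entry_size: int, num_entries: int):
        self.name = name
        self.base = base
        self.entry_size = entry_size
        self.busy = [False] * num_entries   # busy[i]: entry i is bound to a read

    def acquire(self) -> int:
        """Bind a free entry to a new read and return its entry address."""
        for i, in_use in enumerate(self.busy):
            if not in_use:
                self.busy[i] = True
                return self.base + i * self.entry_size
        raise RuntimeError(f"no free entry in the {self.name} aperture")

    def release(self, entry_addr: int) -> None:
        """Unbind the entry once its read operation has ended."""
        i = (entry_addr - self.base) // self.entry_size
        if not self.busy[i]:
            raise ValueError(f"{entry_addr:#x} is not in use")
        self.busy[i] = False


data_aperture = Aperture("data", 0x1000_0000, 0x10_0000, 4)     # DP0..DP3
meta_aperture = Aperture("metadata", 0x1040_0000, 0x1000, 4)    # MP0..MP3
```

An entry only becomes available again after `release`, so two concurrent reads can never share one entry.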
- the embodiments of the present invention collectively refer to metadata and data as data.
- FIG. 7 shows the PCIe address structure in a PCIe data packet according to an embodiment of the present invention.
- the PCIe address structure includes the base address of the BAR, the aperture offset, and the entry offset.
- the BAR base address and the aperture offset together uniquely determine an aperture.
- the entry offset indicates a specific entry within that aperture.
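As a hedged illustration of FIG. 7, the address can be treated as bit fields above the BAR base. The field widths below are assumptions, since the text here does not give them, and the low `byte_offset` field is an extra assumption that only matters when messages carry a data offset (the out-of-order mode described later).

```python
# Assumed field widths (hypothetical): FIG. 7's real widths are not given here.
BYTE_OFFSET_BITS = 20   # low field: byte offset inside one entry (1 MiB per entry)
ENTRY_BITS = 8          # entry offset field: up to 256 entries per aperture
APERTURE_BITS = 4       # aperture offset field: up to 16 apertures per BAR


def make_address(bar_base: int, aperture: int, entry: int, byte_offset: int = 0) -> int:
    """Compose BAR base + aperture offset + entry offset (+ optional byte offset)."""
    if not (0 <= aperture < 1 << APERTURE_BITS
            and 0 <= entry < 1 << ENTRY_BITS
            and 0 <= byte_offset < 1 << BYTE_OFFSET_BITS):
        raise ValueError("field out of range")
    aperture_offset = aperture << (ENTRY_BITS + BYTE_OFFSET_BITS)
    entry_offset = entry << BYTE_OFFSET_BITS
    return bar_base + aperture_offset + entry_offset + byte_offset


def split_address(addr: int, bar_base: int) -> tuple:
    """Inverse of make_address: return (aperture, entry, byte_offset)."""
    rel = addr - bar_base
    if not 0 <= rel < 1 << (APERTURE_BITS + ENTRY_BITS + BYTE_OFFSET_BITS):
        raise ValueError(f"{addr:#x} is outside the BAR region")
    byte_offset = rel & ((1 << BYTE_OFFSET_BITS) - 1)
    entry = (rel >> BYTE_OFFSET_BITS) & ((1 << ENTRY_BITS) - 1)
    aperture = rel >> (ENTRY_BITS + BYTE_OFFSET_BITS)
    return aperture, entry, byte_offset
```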
- the entries can also be randomly distributed in the PCIe address space.
- entries randomly distributed in the PCIe address space are referred to as arbitrary "data entries" and arbitrary "metadata entries".
- the instruction information is used to indicate a specific entry, and the NVMe controller may uniquely determine an entry according to the instruction information.
- the invention does not limit the specific form of the indication information.
- the indication information may be an explicit address, for example a specific PCIe address or a partial field of the entry address; that is, the indication information may be the first address or a partial field of the first address.
- for example, the indication information may be the entry offset of the entry, while the BAR base address and the aperture offset are provided by the host as configuration information for the NVMe controller to obtain.
- the NVMe controller can determine the complete PCIe address of the entry according to the instruction information. In this case, the format of the SQE can be consistent with the NVMe standard.
- the indication information may also be an implicit address. For example, if each SQE in an SQ has its own unique command identifier CID, the indication information may consist of "queue ID + CID". If the CID of each SQE processed by the NVMe controller is unique, the indication information may be the CID carried by the corresponding SQE. In other implementations, the indication information may also be part of the CID. In the embodiment of the present invention, the indication information may also be specified by using a specially defined MPTR or PRT or other fields in the SQE. The embodiment of the present invention does not limit a specific implementation manner of the indication information.
- the NVMe controller can maintain a mapping relationship between the indication information and the entry address, and uniquely determine the entry address based on the mapping relationship and the indication information.
- in one implementation, the indication information is the CID of the SQE, and CIDs in the system are encoded in the same way as entry offsets are addressed.
- the CIDs therefore correspond one-to-one to the entry offsets.
- the BAR base address and the aperture offset can be provided by the host as configuration information that the NVMe controller obtains, and the NVMe controller may then determine the first address for the data packet according to the mapping relationship between the indication information and the entry address.
- the embodiment of the present invention does not limit the specific implementation of the indication information, as long as the NVMe controller can determine the first address according to the indication information.
- the first address is used to indicate an entry corresponding to a read operation, and is specifically an entry address or a partial field of the entry address.
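When the indication information is the CID (or "queue ID + CID"), the controller can rebuild the first address from it plus the configuration information the host publishes. This sketch assumes equally sized entries laid out back to back in one data aperture; the `HostConfig` fields and both helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostConfig:
    bar_base: int          # BAR base address, published by the host as configuration information
    aperture_offset: int   # offset of the data aperture inside the BAR
    entry_size: int        # bytes of PCIe address space per entry


def first_address_from_cid(cid: int, cfg: HostConfig) -> int:
    """CID n maps one-to-one to entry offset n."""
    return cfg.bar_base + cfg.aperture_offset + cid * cfg.entry_size


def first_address_from_qid_cid(qid: int, cid: int, cfg: HostConfig, cids_per_queue: int) -> int:
    """Implicit "queue ID + CID" form, for when CIDs are unique only within one SQ."""
    if not 0 <= cid < cids_per_queue:
        raise ValueError("CID out of range for this queue")
    return first_address_from_cid(qid * cids_per_queue + cid, cfg)
```

Because the SQE only carries the CID, its format can stay consistent with the NVMe standard, which is the benefit the text notes for implicit indication.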
- Step 402 The host receives a first data message sent by the NVMe controller, where the first data message carries a first address and first payload data.
- the data message may be a PCIe write operation message. More specifically, the data message may be a transaction layer packet (TLP), and the payload data may be the payload carried in the TLP.
- the first address may be a PCIe address in the TLP or a part of the PCIe address in the TLP.
- the NVMe controller maintains a mapping relationship between the indication information and the first address.
- the first address may be the entry address corresponding to a read operation. After obtaining the read instruction, the NVMe controller determines the first address according to the specific form of the indication information, reads the data from the storage medium according to the read instruction, encapsulates the first address and the read data into a TLP, and sends the TLP to the host.
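On the controller side, the read data is cut into payload-sized pieces and each piece is sent in a write TLP addressed to the first address. The sketch below models a TLP as an `(address, payload)` tuple and ignores headers; `max_payload`, and the option of adding the data offset to the address (one possible way to carry an offset, relevant to the out-of-order mode described later), are assumptions.

```python
from typing import List, Tuple


def build_tlps(first_address: int, data: bytes, max_payload: int,
               carry_offset: bool) -> List[Tuple[int, bytes]]:
    """Split read data into (address, payload) pairs, one per write TLP."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    tlps = []
    for off in range(0, len(data), max_payload):
        # without an offset, every TLP targets the entry address itself;
        # with an offset, the data offset rides in the low address bits
        addr = first_address + off if carry_offset else first_address
        tlps.append((addr, data[off:off + max_payload]))
    return tlps
```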
- Step 403 The host determines a second address that can be addressed by the host according to the first address.
- the first address is used to indicate the entry of a PCIe write operation.
- the NVMe controller writes the data of the read operation to the host through the entry.
- the "entry" represents a range in the host's PCIe address space.
- when the host receives a data message sent by the NVMe controller through the entry, it parses the data message to obtain the first address. Instead of using the storage space indicated by the first address to store the payload data, the host determines, according to the first address and a preset correspondence, a second address of the storage unit in its internal memory that actually stores the payload data.
- the storage unit indicated by the second address may be memory space inside the host that is not presented through a PCIe address. That is, the internal memory used by the host to store payload data is not accessed through PCIe addressing and is not used as a command memory buffer (CMB).
- the memory location indicated by the second address can serve as a read buffer for data.
- the first address is used to indicate a specific "entry".
- the host maps the first address to the second address. Specifically, the host may map the first address to the second address through the memory mapping table MTT.
- the host can maintain an MTT entry for each entry; each MTT entry associates an entry with its corresponding storage units.
- a storage unit can be a fixed-size storage space, and in the following description it is also referred to as a read page. Before triggering the read instruction, the host can allocate the storage unit indicated by the second address to the read instruction and record the correspondence between the first address and the second address in the MTT entry.
- the entry corresponds to the read operation one by one.
- the data to be read of the read instruction may correspond to at least two data packets according to the size of the data to be read.
- accordingly, the host may allocate at least two storage units to the read instruction.
- the present invention does not limit the read page size, but it is recommended that a read memory block of the NVMe controller contain an integer number of read pages. After reading data from the storage medium, the NVMe controller places it in an error correction buffer for error checking and then writes the error-corrected data to the entry.
- the error correction buffer is also called read memory block.
- the host may organize its own memory as a read page pool. Before initiating a read operation, the host allocates from the read page pool the number of read pages required by the read operation and initializes the MTT entry of the entry corresponding to the read operation; the MTT entry records the correspondence between the entry address and the read page addresses.
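The read page pool and MTT initialization can be sketched as follows. `ReadPagePool`, its methods, and the choice of which free page numbers to hand out are hypothetical; the sketch only shows the bookkeeping: pages are taken from the pool before the read is triggered, recorded in the MTT entry of the entry, and can be returned one at a time.

```python
from typing import Dict, List, Optional


class ReadPagePool:
    """Hypothetical pool of fixed-size read pages in host-private memory."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.free: List[int] = list(range(num_pages))        # free read-page numbers
        self.mtt: Dict[int, List[Optional[int]]] = {}        # entry address -> read pages

    def prepare_read(self, entry_addr: int, read_size: int) -> List[int]:
        """Allocate enough read pages for one read and initialize its MTT entry."""
        if entry_addr in self.mtt:
            raise RuntimeError("entry is still bound to an outstanding read")
        needed = (read_size + self.page_size - 1) // self.page_size   # ceiling division
        if needed > len(self.free):
            raise MemoryError("read page pool exhausted")
        pages = [self.free.pop() for _ in range(needed)]
        self.mtt[entry_addr] = list(pages)
        return pages

    def release_page(self, entry_addr: int, page: int) -> None:
        """Return one page early; keep its slot so later offsets still map correctly."""
        pages = self.mtt[entry_addr]
        pages[pages.index(page)] = None
        self.free.append(page)
        if all(p is None for p in pages):
            del self.mtt[entry_addr]      # the entry can now serve a new read
```

Leaving a `None` placeholder, rather than deleting the page from the list, keeps the position-to-page mapping of the remaining pages intact.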
- FIG. 8 is a schematic diagram of MTT entries according to an embodiment of the present invention; the correspondence between entries and read pages is recorded in the MTT entries.
- the read pages corresponding to entry X are read page 1, read page 7, and read page 4.
- the read pages corresponding to entry Y are read page 2, read page 13, read page 8 and read page 0.
- the read page is a fixed-size storage space.
- the size of the read page may be smaller than the size of the data to be read, so the read operation may require more than one read page.
- when allocating, the host can assign at least two read pages to the read instruction. If the host allocates multiple read pages to the read operation, the second address points to one of them, and the host may determine the second address according to the first address and the position of the payload data within the data to be read.
- the embodiments of the present invention do not limit how the host determines the position of the payload data within the data to be read. If the NVMe controller preserves order when performing PCIe write operations, the host may determine the position of the payload data from the order in which data messages arrive. For example, the NVMe controller also sends a second data message to the host, which carries the first address and second payload data that also belongs to the data to be read. After receiving the second data message, the host may determine the order of the first payload data and the second payload data within the data to be read according to the order in which the first and second data messages were received. If the NVMe controller does not preserve order when performing PCIe write operations, the data message may also carry the offset of the payload data within the data to be read, and this offset indicates the position of the payload data within the data to be read.
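Using the FIG. 8 mapping for entry Y (read pages 2, 13, 8 and 0), the translation from the first address plus the payload's position to a second address reduces to a division by the read page size. In this hedged sketch, the entry address value, the 4 KiB `P_SZ`, and representing a second address as a `(read page, offset in page)` pair are assumptions.

```python
from typing import Dict, List, Tuple

P_SZ = 4096                      # assumed read page size
ENTRY_Y = 0x4_0010_0000          # hypothetical entry address of entry Y
MTT: Dict[int, List[int]] = {ENTRY_Y: [2, 13, 8, 0]}   # FIG. 8, entry Y


def second_address(first_addr: int, data_offset: int,
                   mtt: Dict[int, List[int]] = MTT,
                   page_size: int = P_SZ) -> Tuple[int, int]:
    """Map (entry, position in the data to be read) to (read page, offset in that page)."""
    pages = mtt[first_addr]
    index, in_page = divmod(data_offset, page_size)
    if index >= len(pages):
        raise ValueError("offset beyond the data allocated for this read")
    return pages[index], in_page
```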
- the NVMe controller may send data packets in a sequence-preserving or non-sequence-preserving manner.
- the NVMe controller can support either or both of the following ordering modes:
- the NVMe controller sends data packets in the order of monotonically increasing data offsets.
- the host receives the payload data according to the order of the data messages.
- no offset is required, that is, the entry width shown in FIG. 7 can be only two bits (standard specification).
- the NVMe controller can send PCIe write transactions in any order, but the data message needs to carry a data offset.
- the NVMe controller can process the logical blocks to be read in parallel, that is, it can read data corresponding to different logical blocks from the storage medium and place them in different read memory blocks for verification. Because different read memory blocks take different amounts of time to complete verification, the read memory blocks may not be written to the host strictly in logical-block order: the read memory block corresponding to the first logical block may be written to the target memory later than the one corresponding to the last logical block.
- the data is then reassembled according to the data offset carried in each transaction message. In this mode, every data message must carry a data offset, that is, the entry width shown in FIG. 7 needs to be greater than or equal to the maximum data transfer size.
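The two ordering modes differ only in where the host gets a payload's data offset from. In this hypothetical sketch, in-order mode derives it from a per-entry byte counter, while out-of-order mode requires the offset carried in the message; the class name and parameters are illustrative, not from the source.

```python
from typing import Dict, Optional


class OffsetTracker:
    """Hypothetical helper returning each payload's offset in the data to be read."""

    def __init__(self, in_order: bool):
        self.in_order = in_order
        self.received: Dict[int, int] = {}   # entry address -> bytes seen so far

    def data_offset(self, entry_addr: int, payload_len: int,
                    carried_offset: Optional[int] = None) -> int:
        if self.in_order:
            # messages arrive in increasing offset order, so a running total suffices
            offset = self.received.get(entry_addr, 0)
            self.received[entry_addr] = offset + payload_len
            return offset
        if carried_offset is None:
            raise ValueError("out-of-order mode requires a data offset in every message")
        return carried_offset
```

The offset returned here is what a lookup such as the MTT translation needs to pick the right read page.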
- Step 404 The host writes the first payload data in the first data message into the storage unit indicated by the second address.
- after receiving the data message sent by the NVMe controller, the host determines the second address according to the first address and then writes the payload data into the storage unit indicated by the second address.
- after the host completes the write operation to the storage unit indicated by the second address, it can operate on the data in that storage unit, that is, consume the data, for example by sending it to another entity.
- the host completes the write operation to a read page once all the data related to that read page has been written into it, that is, once the payload data of the last TLP related to the read page has been written into the read page.
- for example, the size of the data to be read is 4 * P_sz, where P_sz is the size of a read page, that is, of one storage unit.
- the size of the read memory block is 2 * P_sz, that is, the data to be read requires two read memory blocks for data error correction check.
- a TLP is used as an example of the data message.
- the payload data of each TLP is 0.5 * P_sz, so the data of each read memory block requires four TLPs to send.
- the NVMe controller sequentially reads the data to be read from the storage medium to the read memory block 0 and the read memory block 1 for verification.
- the NVMe controller can verify the two read memory blocks in parallel, and because each read memory block is verified at a different speed, verification of read memory block 1 completes before that of read memory block 0.
- the NVMe controller first loads the data in the read memory block 1 into the TLP in order and sends it to the host through the PCIe network.
- the data encapsulated in TLP0 and TLP1 comes from read memory block 1; read memory block 0 then completes verification.
- the NVMe controller encapsulates the data in read memory block 0 into TLPs in data order and sends them to the host through the PCIe network.
- TLP2, TLP4, TLP6, and TLP7 carry data from read memory block 0.
- the data packets received by the host may be out of order.
- the host may determine the position of each payload within the data to be read according to the data offset in the received TLP, look up the MTT according to that position and the indication information to determine the address of the read page that stores the payload, and write the payload data into the corresponding read page.
- after the host writes the payload data of TLP0 and TLP1 into read page 8, the write operation on read page 8 is complete, and the host can process the data in read page 8.
- after the host writes the payload data of TLP2 and TLP4 into read page 2, the write operation on read page 2 is complete, and the data in read page 2 can be processed.
- the processing the host performs on the data stored in a read page is specifically consuming the data, for example sending it to another entity, without having to wait for the data to be read to be completely written before operating on it.
- the embodiment of the present invention implements a pipelined processing mode and reduces the delay of the read operation.
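The worked example can be replayed in a few lines to check which read page becomes consumable after which TLP. The sketch assumes P_sz = 4096 bytes and that entry Y of FIG. 8 (read pages 2, 13, 8, 0) is the entry in use, which matches the statements that TLP0 and TLP1 fill read page 8 and TLP2 and TLP4 fill read page 2; the placement of TLP3 and TLP5 (the remaining data of read memory block 1) is inferred, not stated.

```python
from typing import List, Tuple

P_SZ = 4096                  # assumed read page size
HALF = P_SZ // 2             # each TLP carries 0.5 * P_sz
PAGES_Y = [2, 13, 8, 0]      # FIG. 8: read pages of entry Y, in data order

# (TLP, data offset) in arrival order; read memory block 1 holds offsets 2P..4P,
# block 0 holds 0..2P. Placing TLP3 and TLP5 in block 1 is an inference.
ARRIVALS = [
    ("TLP0", 2 * P_SZ), ("TLP1", 2 * P_SZ + HALF),
    ("TLP2", 0),        ("TLP3", 3 * P_SZ),
    ("TLP4", HALF),     ("TLP5", 3 * P_SZ + HALF),
    ("TLP6", P_SZ),     ("TLP7", P_SZ + HALF),
]


def simulate(arrivals, pages=PAGES_Y, page_size=P_SZ, payload=HALF) -> List[Tuple[str, int]]:
    """Return (TLP that completed a page, read page) in completion order."""
    filled = {p: 0 for p in pages}          # bytes written into each read page
    ready = []
    for name, offset in arrivals:
        page = pages[offset // page_size]   # MTT lookup by position in the data
        filled[page] += payload
        if filled[page] == page_size:       # last TLP for this page has landed
            ready.append((name, page))      # page can be consumed and released now
    return ready
```

Under these assumptions, `simulate(ARRIVALS)` reports read page 8 as ready after only two of the eight TLPs, followed by pages 2, 0 and 13, which is the pipelining effect the text describes.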
- the host can release the storage unit for other read operations.
- for example, the host can release read page 8 to the read page pool for other read operations, instead of waiting until the entire read operation has completed and all the data to be read has been processed before releasing the storage space occupied by the read operation, thereby reducing storage space occupation.
- after completing the read operation, the NVMe controller is further configured to trigger a completion queue entry (CQE).
- CQE is used to instruct the NVMe controller to complete the read operation indicated by the read instruction.
- the host is also used to obtain the completion queue entry CQE. In the embodiment of the present invention, the host may obtain the CQE only after operating the data in the storage unit indicated by the second address.
- the host exposes the first address to the NVMe controller as a data entry address through which the NVMe controller writes the data to be read to the host, so the destination address carried in a data message sent by the NVMe controller is the first address. However, after receiving the data message, the host does not actually write the payload data into the storage space indicated by the first address; instead, it maps the first address to the second address and writes the payload data of the data message into the storage space indicated by the second address.
- the storage space indicated by the second address may be a private memory space of the host, thereby separating the relationship between the storage space storing the payload data and the communication protocol, and the host's access to the second address is not restricted by the communication protocol.
- before the read instruction ends, the host can use the data stored in the storage space indicated by the second address and release that storage space early for other read operations.
- the technical solution disclosed in the embodiment of the present invention can reduce the delay of the read operation and save the storage space for storing the data to be read.
- FIG. 1000 is an interaction flowchart of an NVMe-based reading method according to an embodiment of the present invention.
- an application scenario of the method 1000 is a scenario in which a data source is separated from a command source.
- the data source needs to read the data to be read to the storage space of its data source through the command source.
- the embodiment of the present invention is not limited to a specific scenario in which a data source is separated from a command source.
- the scenario in which the data source and the command source are separate may be an NVMe over Fabrics (NOF) based Just a Bunch Of Flash (JBOF) enclosure.
- the data source is a host that needs to access the storage medium
- the command source is a NOF bridge interconnected with the host through the fabric. More specifically, the command source may be a NOF engine in the NOF bridge.
- the NOF bridge is interconnected with the NVMe controller through the PCIe bus, and the NVMe controller is connected to a storage medium.
- the scenario where the data source and the command source are separated may also be the host and the encryption accelerator.
- in this case, the data source is the host and the command source is an encryption accelerator connected to the host; more specifically, the command source is the acceleration engine of the encryption accelerator.
- the encryption accelerator is connected to the NVMe controller through the PCIe bus, and the NVMe controller is connected to a storage medium.
- when the command source performs a read operation, the SQE carries indication information of the entry address for the data to be read.
- the entry address may essentially be a PCIe address that the NVMe controller can address.
- after obtaining the SQE, the NVMe controller sends TLPs to the command source through PCIe write operations, carrying the PCIe address in each TLP.
- the command source parses each TLP to obtain the PCIe address, determines the local storage unit corresponding to that PCIe address according to the mapping relationship between PCIe addresses and local memory, and then writes the payload data in the TLP into the determined storage unit.
- An entry can correspond to multiple storage units.
- the command source can operate the data stored in the storage unit.
- the write operation to a storage unit ends when the payload data of the last TLP corresponding to that storage unit has been written into it.
- once the command source has obtained part of the data to be read, it can send that part to the data source; it does not need to wait for the entire read operation to complete before sending the data to be read to the data source.
- for example, the data to be read includes data 1, data 2, data 3, and data 4, each of which may correspond to one storage unit.
- after the command source has received the data of a storage unit, it can send that data to the data source, and after sending it, it can release the storage unit for other read operations.
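The command source's pipelining can be sketched as a loop that forwards each storage unit to the data source as soon as its last TLP lands, then frees it. The tuple layout `(unit, offset in unit, payload)`, the `send` and `release` callbacks, and the assumption that the PCIe-address-to-unit mapping has already been applied are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple


def pipeline_forward(
    tlps: Iterable[Tuple[int, int, bytes]],   # (storage unit id, offset in unit, payload)
    unit_size: int,
    send: Callable[[int, bytes], None],       # forwards one unit's data to the data source
    release: Callable[[int], None],           # returns a unit to the free pool
) -> None:
    buffers: Dict[int, bytearray] = {}
    filled: Dict[int, int] = {}
    for unit, off, payload in tlps:
        buf = buffers.setdefault(unit, bytearray(unit_size))
        buf[off:off + len(payload)] = payload
        filled[unit] = filled.get(unit, 0) + len(payload)
        if filled[unit] == unit_size:          # last TLP of this unit has landed
            send(unit, bytes(buffers.pop(unit)))
            del filled[unit]
            release(unit)                      # freed without waiting for the CQE
```

Each unit leaves the command source as soon as it is complete, so receiving from the NVMe controller and sending to the data source overlap.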
- By establishing a mapping relationship between local memory and PCIe addresses, the command source writes the payload data received in TLPs into its own memory space, which enables pipelined data processing: after receiving part of the data, the command source can send that data to the data source, so that receiving data from the NVMe controller and sending data to the data source proceed in parallel. This saves the storage space needed to cache data and speeds up read operations.
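The address mapping and pipelining described above can be sketched as a small single-threaded model. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the entry is modelled as one contiguous PCIe address window starting at a hypothetical `entry_base`, every storage unit has a fixed hypothetical `unit_size`, no TLP payload straddles two units, and every byte is written exactly once. `CommandSource`, `on_tlp` and `send_to_data_source` are illustrative names only.

```python
from typing import Callable, List


class CommandSource:
    """Hypothetical model of a command source receiving PCIe memory-write TLPs.

    The entry (a PCIe address window) maps onto consecutive local storage
    units. A unit is forwarded to the data source and released as soon as
    the last TLP payload for it has been written.
    """

    def __init__(self, entry_base: int, unit_size: int, total_len: int,
                 send_to_data_source: Callable[[int, bytes], None]):
        self.entry_base = entry_base          # PCIe address of the entry (assumed)
        self.unit_size = unit_size            # bytes per storage unit (assumed fixed)
        self.total_len = total_len            # length of the data to be read
        self.send = send_to_data_source
        n_units = -(-total_len // unit_size)  # ceiling division
        # The last unit may be shorter than unit_size.
        self.units: List[bytearray] = [
            bytearray(min(unit_size, total_len - i * unit_size))
            for i in range(n_units)
        ]
        self.written = [0] * n_units          # payload bytes received per unit
        self.released: List[int] = []         # units handed back for reuse

    def on_tlp(self, pcie_addr: int, payload: bytes) -> None:
        # Translate the PCIe address into (storage unit, offset within unit).
        offset = pcie_addr - self.entry_base
        if not 0 <= offset < self.total_len:
            raise ValueError(f"PCIe address {pcie_addr:#x} is outside the entry")
        idx, pos = divmod(offset, self.unit_size)
        unit = self.units[idx]
        if pos + len(payload) > len(unit):
            raise ValueError("TLP payload crosses a storage-unit boundary")
        unit[pos:pos + len(payload)] = payload
        self.written[idx] += len(payload)
        # Assumes each byte arrives exactly once, so a full count means the
        # last TLP for this unit has landed and its write operation has ended.
        if self.written[idx] == len(unit):
            self.send(idx, bytes(unit))       # forward without waiting for the rest
            self.released.append(idx)         # stand-in for returning it to a pool
```

With `unit_size=8` and 20 bytes delivered as 4-byte TLPs, unit 0 is forwarded right after the second TLP, before the remaining 12 bytes have arrived; that early hand-off is the pipelining the text describes.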
- FIG. 12 is a schematic diagram of a logical structure of a computing device 1200 according to an embodiment of the present application. As shown in FIG. 12, the computing device 1200 includes:
- The processing unit 1202 is configured to trigger a read instruction to the NVMe controller.
- The read instruction carries indication information, and the indication information is used to indicate a first address, which is an address that the NVMe controller can address.
- The receiving unit 1204 is configured to receive a first data packet sent by the NVMe controller, where the first data packet carries the first address and first payload data.
- The processing unit 1202 is further configured to determine a second address according to the first address and write the first payload data into the storage unit indicated by the second address.
- The second address is an address that the processing unit 1202 can address.
- After completing the write operation to the storage unit indicated by the second address, the processing unit 1202 is further configured to operate on the data in that storage unit.
- After operating on the data in the storage unit indicated by the second address, the processing unit 1202 is further configured to obtain a completion queue entry (CQE) triggered by the NVMe controller, where the CQE indicates that the NVMe controller has completed the read operation indicated by the read instruction.
- After operating on the data in the storage unit indicated by the second address, the processing unit 1202 is further configured to release that storage unit.
- Before triggering the read instruction, the processing unit 1202 is further configured to allocate the storage unit indicated by the second address for the read instruction and to record the correspondence between the first address and the second address.
- The data to be read of the read instruction corresponds to at least two data packets, and the processing unit 1202 allocates at least two storage units for the read instruction.
- The processing unit 1202 may determine the second address according to the first address and the order of the first payload data in the data to be read.
- The receiving unit 1204 is further configured to receive a second data packet sent by the NVMe controller, where the second data packet carries the first address and second payload data; the processing unit 1202 is further configured to determine, according to the order in which the first data packet and the second data packet are received, the order of the first payload data and the second payload data in the data to be read.
- Alternatively, the first data packet further carries an offset of the first payload data in the data to be read, and the offset indicates the order of the first payload data in the data to be read.
- The first address may be a PCIe address addressable by the NVMe controller.
- The first data packet is a PCIe packet.
- The storage unit indicated by the second address may be memory space of the computing device.
- The processing unit 1202 may be implemented by the read operation logic 310 in the processor 301 in FIG. 3, or by the processor 301 together with the read operation module 306 in the system memory 302 in FIG. 3.
- The receiving unit 1204 may be implemented by the processor 301 and the communication interface 303 in the embodiment of FIG. 3.
- This embodiment of the present application is the host-side apparatus embodiment corresponding to the foregoing embodiments.
- The feature descriptions of the foregoing embodiments apply to this embodiment and are not repeated here.
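The host-side bookkeeping of this apparatus embodiment (allocating storage units before the read instruction is triggered, recording the first-to-second address correspondence, placing each payload by its offset or by arrival order, and releasing the units afterwards) might look like the sketch below. `HostReadContexts` and its methods are hypothetical; second addresses are modelled as indices into a fixed pool of equally sized storage units, and when a packet carries no offset the sketch assumes packets arrive in data order.

```python
from typing import Dict, List, Optional, Tuple


class HostReadContexts:
    """Hypothetical host-side table: first address -> allocated storage units."""

    def __init__(self, unit_size: int, pool_size: int):
        self.unit_size = unit_size
        self.free: List[int] = list(range(pool_size))  # free storage-unit indices
        self.units: Dict[int, List[int]] = {}           # first address -> units
        self.cursor: Dict[int, int] = {}                # next in-order byte offset

    def allocate(self, first_addr: int, length: int) -> List[int]:
        """Call before triggering the read instruction."""
        n = -(-length // self.unit_size)                # ceiling division
        if n > len(self.free):
            raise MemoryError("not enough free storage units")
        allocated, self.free = self.free[:n], self.free[n:]
        self.units[first_addr] = allocated              # record first -> second
        self.cursor[first_addr] = 0
        return allocated

    def second_address(self, first_addr: int, length: int,
                       offset: Optional[int] = None) -> Tuple[int, int]:
        """Return (storage unit, offset in unit) for one payload.

        Uses the packet's offset when present, otherwise the arrival order.
        """
        pos = self.cursor[first_addr] if offset is None else offset
        self.cursor[first_addr] = pos + length
        idx, in_unit = divmod(pos, self.unit_size)
        return self.units[first_addr][idx], in_unit

    def release(self, first_addr: int) -> None:
        """Call after the data in the units has been operated on."""
        self.free.extend(self.units.pop(first_addr))
        del self.cursor[first_addr]
```

Because every data packet for one read carries the same first address, the first address alone selects the read context; the offset (or the arrival order) then selects the storage unit within it.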
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computer Hardware Design (AREA)
- Information Transfer Systems (AREA)
- Memory System (AREA)
- Communication Control (AREA)
Abstract
Description
Claims (32)
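- An NVMe-based data read system, wherein the system comprises a host, an NVMe controller and a storage medium; the storage medium is configured to store data; the host is configured to trigger a read instruction, the read instruction carries indication information, the indication information is used to indicate a first address, and the first address is an address addressable by the NVMe controller; the NVMe controller is configured to obtain the read instruction, read, from the storage medium, data to be read corresponding to the read instruction, and send a first data packet to the host, wherein the first data packet carries the first address and first payload data, and the data to be read comprises the first payload data; and the host is further configured to receive the first data packet, determine a second address according to the first address, and write the first payload data into a storage unit indicated by the second address, the second address being an address addressable by the host.
- The system according to claim 1, wherein after completing the write operation to the storage unit indicated by the second address, the host is further configured to operate on the data in the storage unit indicated by the second address.
- The system according to claim 2, wherein the NVMe controller is further configured to trigger a completion queue entry (CQE), the CQE being used to indicate that the NVMe controller has completed the read operation indicated by the read instruction; and after operating on the data in the storage unit indicated by the second address, the host is further configured to obtain the CQE.
- The system according to claim 2 or 3, wherein after operating on the data in the storage unit indicated by the second address, the host is further configured to release the storage unit indicated by the second address.
- The system according to any one of claims 1 to 4, wherein before triggering the read instruction, the host is further configured to allocate the storage unit indicated by the second address for the read instruction, and record a correspondence between the first address and the second address.
- The system according to claim 5, wherein the data to be read of the read instruction corresponds to at least two data packets, and the host allocates at least two storage units for the read instruction.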
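- [Corrected under Rule 91, 18.07.2019]
The system according to claim 1, wherein the host is configured to determine the second address according to the first address and the order of the first payload data in the data to be read.
- The system according to claim 7, wherein the NVMe controller is further configured to send a second data packet to the host, the second data packet carrying the first address and second payload data, and the data to be read comprising the second payload data; and the host is further configured to receive the second data packet, and determine, according to the order in which the first data packet and the second data packet are received, the order of the first payload data and the second payload data in the data to be read.
- The system according to claim 7, wherein the first data packet further carries an offset of the first payload data in the data to be read, the offset being used to indicate the order of the first payload data in the data to be read.
- The system according to any one of claims 1 to 9, wherein the first address is a PCIe address addressable by the NVMe controller, the first data packet is a PCIe packet, and the storage unit indicated by the second address is memory space of the host.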
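- An NVMe-based data read method, wherein the method comprises: triggering, by a host, a read instruction, wherein the read instruction carries indication information, the indication information is used to indicate a first address, and the first address is an address addressable by an NVMe controller; receiving, by the host, a first data packet sent by the NVMe controller, wherein the first data packet carries the first address and first payload data; determining, by the host, a second address according to the first address, wherein the second address is an address addressable by the host; and writing, by the host, the first payload data into a storage unit indicated by the second address.
- The method according to claim 11, wherein after the host completes the write operation to the storage unit indicated by the second address, the method further comprises: operating, by the host, on the data in the storage unit indicated by the second address.
- The method according to claim 12, wherein after the host operates on the data in the storage unit indicated by the second address, the method further comprises: obtaining, by the host, a completion queue entry (CQE) triggered by the NVMe controller, wherein the CQE is used to indicate that the NVMe controller has completed the read operation indicated by the read instruction.
- The method according to claim 12 or 13, wherein after the host operates on the data in the storage unit indicated by the second address, the method further comprises: releasing, by the host, the storage unit indicated by the second address.
- The method according to any one of claims 11 to 14, wherein before the host triggers the read instruction, the method further comprises: allocating, by the host, the storage unit indicated by the second address for the read instruction, and recording a correspondence between the first address and the second address.
- The method according to claim 15, wherein the data to be read of the read instruction corresponds to at least two data packets, and the host allocates at least two storage units for the read instruction.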
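- [Corrected under Rule 91, 18.07.2019]
The method according to claim 11, wherein the host determines the second address according to the first address and the order of the first payload data in the data to be read.
- The method according to claim 17, wherein the method further comprises: receiving, by the host, a second data packet sent by the NVMe controller, wherein the second data packet carries the first address and second payload data; and determining, by the host, according to the order in which the first data packet and the second data packet are received, the order of the first payload data and the second payload data in the data to be read.
- The method according to claim 17, wherein the first data packet further carries an offset of the first payload data in the data to be read, the offset being used to indicate the order of the first payload data in the data to be read.
- The method according to any one of claims 11 to 19, wherein the first address is a PCIe address addressable by the NVMe controller, the first data packet is a PCIe packet, and the storage unit indicated by the second address is memory space of the host.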
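- An NVMe-based data read apparatus, wherein the apparatus comprises: a processing unit, configured to trigger a read instruction, wherein the read instruction carries indication information, the indication information is used to indicate a first address, and the first address is an address addressable by an NVMe controller; and a receiving unit, configured to receive a first data packet sent by the NVMe controller, wherein the first data packet carries the first address and first payload data; wherein the processing unit is further configured to determine a second address according to the first address, and write the first payload data into a storage unit indicated by the second address, the second address being an address addressable by the processing unit.
- The apparatus according to claim 21, wherein after completing the write operation to the storage unit indicated by the second address, the processing unit is further configured to operate on the data in the storage unit indicated by the second address.
- The apparatus according to claim 22, wherein after operating on the data in the storage unit indicated by the second address, the processing unit is further configured to obtain a completion queue entry (CQE) triggered by the NVMe controller, the CQE being used to indicate that the NVMe controller has completed the read operation indicated by the read instruction.
- The apparatus according to claim 22 or 23, wherein after operating on the data in the storage unit indicated by the second address, the processing unit is further configured to release the storage unit indicated by the second address.
- The apparatus according to any one of claims 21 to 24, wherein before triggering the read instruction, the processing unit is further configured to allocate the storage unit indicated by the second address for the read instruction, and record a correspondence between the first address and the second address.
- The apparatus according to claim 25, wherein the data to be read of the read instruction corresponds to at least two data packets, and the processing unit allocates at least two storage units for the read instruction.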
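- [Corrected under Rule 91, 18.07.2019]
The apparatus according to claim 21, wherein the processing unit is configured to determine the second address according to the first address and the order of the first payload data in the data to be read.
- The apparatus according to claim 27, wherein the receiving unit is further configured to receive a second data packet sent by the NVMe controller, the second data packet carrying the first address and second payload data; and the processing unit is further configured to determine, according to the order in which the first data packet and the second data packet are received, the order of the first payload data and the second payload data in the data to be read.
- The apparatus according to claim 27, wherein the first data packet further carries an offset of the first payload data in the data to be read, the offset being used to indicate the order of the first payload data in the data to be read.
- The apparatus according to any one of claims 21 to 29, wherein the first address is a PCIe address addressable by the NVMe controller, the first data packet is a PCIe packet, and the storage unit indicated by the second address is memory space of the apparatus.
- A readable medium, comprising execution instructions, wherein when a processor of a computing device executes the execution instructions, the computing device performs the method according to any one of claims 11 to 20.
- A computing device, comprising a processor, a memory and a bus, wherein the memory is configured to store execution instructions, the processor is connected to the memory through the bus, and when the computing device runs, the processor executes the execution instructions stored in the memory, so that the computing device performs the method according to any one of claims 11 to 20.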
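Claims 3, 13 and 23 have the host obtain the CQE only after it has operated on the data. As a hedged illustration of that step, the sketch below parses 16-byte completion queue entries using the little-endian layout defined in the NVMe base specification (Dword 2: SQ identifier in bits 31:16 and SQ head pointer in bits 15:0; Dword 3: status field in bits 31:17, phase tag in bit 16, command identifier in bits 15:0) and polls a completion queue by phase tag. Modelling the queue as a Python list of `bytes`, and the names `parse_cqe` and `poll_cq`, are assumptions for illustration.

```python
import struct
from typing import List, NamedTuple, Tuple


class Cqe(NamedTuple):
    dw0: int      # command-specific result
    sq_head: int  # SQ head pointer (Dword 2, bits 15:0)
    sq_id: int    # SQ identifier (Dword 2, bits 31:16)
    cid: int      # command identifier (Dword 3, bits 15:0)
    phase: int    # phase tag (Dword 3, bit 16)
    status: int   # status field (Dword 3, bits 31:17)


def parse_cqe(raw: bytes) -> Cqe:
    # Four little-endian 32-bit dwords; Dword 1 is not used here.
    dw0, _dw1, dw2, dw3 = struct.unpack("<4I", raw)
    return Cqe(dw0, dw2 & 0xFFFF, dw2 >> 16,
               dw3 & 0xFFFF, (dw3 >> 16) & 1, dw3 >> 17)


def poll_cq(ring: List[bytes], head: int, phase: int) -> Tuple[List[Cqe], int, int]:
    """Consume entries whose phase tag matches; return them with the new head/phase."""
    done: List[Cqe] = []
    while True:
        cqe = parse_cqe(ring[head])
        if cqe.phase != phase:      # stale entry: nothing new from the controller
            break
        done.append(cqe)
        head += 1
        if head == len(ring):       # wrap around and flip the expected phase
            head, phase = 0, phase ^ 1
    return done, head, phase
```

A real host driver would then write the new head index to the completion queue head doorbell so that the controller can reuse the consumed slots.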
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18924295.1A EP3792776B1 (en) | 2018-06-30 | 2018-06-30 | Nvme-based data reading method, apparatus and system |
CN201880005007.9A CN111095231B (zh) | 2018-06-30 | 2018-06-30 | 一种基于NVMe的数据读取方法、装置及系统 |
KR1020207022273A KR102471219B1 (ko) | 2018-06-30 | 2018-06-30 | NVMe 기반의 데이터 판독 방법, 장치, 및 시스템 |
JP2020545126A JP7191967B2 (ja) | 2018-06-30 | 2018-06-30 | NVMeベースのデータ読み取り方法、装置及びシステム |
PCT/CN2018/093918 WO2020000482A1 (zh) | 2018-06-30 | 2018-06-30 | 一种基于NVMe的数据读取方法、装置及系统 |
US17/072,038 US11467764B2 (en) | 2018-06-30 | 2020-10-16 | NVMe-based data read method, apparatus, and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/093918 WO2020000482A1 (zh) | 2018-06-30 | 2018-06-30 | 一种基于NVMe的数据读取方法、装置及系统 |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/072,038 Continuation US11467764B2 (en) | 2018-06-30 | 2020-10-16 | NVMe-based data read method, apparatus, and system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020000482A1 (zh) | 2020-01-02 |
Family ID: 68984388
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/093918 WO2020000482A1 (zh) | 2018-06-30 | 2018-06-30 | 一种基于NVMe的数据读取方法、装置及系统 |
Country Status (6)
Country | Link |
---|---|
US (1) | US11467764B2 (zh) |
EP (1) | EP3792776B1 (zh) |
JP (1) | JP7191967B2 (zh) |
KR (1) | KR102471219B1 (zh) |
CN (1) | CN111095231B (zh) |
WO (1) | WO2020000482A1 (zh) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111752484B (zh) * | 2020-06-08 | 2024-04-12 | 深圳大普微电子科技有限公司 | 一种ssd控制器、固态硬盘及数据写入方法 |
CN111831226B (zh) * | 2020-07-07 | 2023-09-29 | 山东华芯半导体有限公司 | 一种自主输出nvme协议命令加速处理方法 |
CN113296691B (zh) * | 2020-07-27 | 2024-05-03 | 阿里巴巴集团控股有限公司 | 数据处理系统、方法、装置以及电子设备 |
JP7496280B2 (ja) | 2020-10-07 | 2024-06-06 | 株式会社竹中工務店 | 関数同定方法 |
CN112527705B (zh) * | 2020-11-05 | 2023-02-28 | 山东云海国创云计算装备产业创新中心有限公司 | 一种PCIe DMA数据通路的验证方法、装置及设备 |
CN113050992A (zh) * | 2021-03-09 | 2021-06-29 | 瀚云科技有限公司 | 一种寄存器地址获取方法、装置、终端及可读存储介质 |
CN113031862B (zh) * | 2021-03-18 | 2024-03-22 | 中国电子科技集团公司第五十二研究所 | 一种基于nvme协议控制sata盘的存储系统 |
KR102695726B1 (ko) * | 2021-07-02 | 2024-08-19 | 한국과학기술원 | 컴퓨팅 장치 및 스토리지 카드 |
CN114996172B (zh) * | 2022-08-01 | 2022-11-01 | 北京得瑞领新科技有限公司 | 基于ssd访问主机内存的方法及系统 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106210041A (zh) * | 2016-07-05 | 2016-12-07 | 杭州华为数字技术有限公司 | 一种数据写入方法及服务器端网卡 |
US20180074757A1 (en) * | 2016-09-09 | 2018-03-15 | Toshiba Memory Corporation | Switch and memory device |
CN107992436A (zh) * | 2016-10-26 | 2018-05-04 | 杭州华为数字技术有限公司 | 一种NVMe数据读写方法及NVMe设备 |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130086311A1 (en) | 2007-12-10 | 2013-04-04 | Ming Huang | METHOD OF DIRECT CONNECTING AHCI OR NVMe BASED SSD SYSTEM TO COMPUTER SYSTEM MEMORY BUS |
JP5957647B2 (ja) * | 2010-06-18 | 2016-07-27 | シーゲイト テクノロジー エルエルシーSeagate Technology LLC | スケーラブルな記憶装置 |
US8966172B2 (en) * | 2011-11-15 | 2015-02-24 | Pavilion Data Systems, Inc. | Processor agnostic data storage in a PCIE based shared storage enviroment |
US9467512B2 (en) * | 2012-01-17 | 2016-10-11 | Intel Corporation | Techniques for remote client access to a storage medium coupled with a server |
US20140195634A1 (en) * | 2013-01-10 | 2014-07-10 | Broadcom Corporation | System and Method for Multiservice Input/Output |
US9256384B2 (en) * | 2013-02-04 | 2016-02-09 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Method and system for reducing write latency in a data storage system by using a command-push model |
US9424219B2 (en) * | 2013-03-12 | 2016-08-23 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Direct routing between address spaces through a nontransparent peripheral component interconnect express bridge |
US9727503B2 (en) * | 2014-03-17 | 2017-08-08 | Mellanox Technologies, Ltd. | Storage system and server |
US20160124876A1 (en) * | 2014-08-22 | 2016-05-05 | HGST Netherlands B.V. | Methods and systems for noticing completion of read requests in solid state drives |
US9565269B2 (en) * | 2014-11-04 | 2017-02-07 | Pavilion Data Systems, Inc. | Non-volatile memory express over ethernet |
US9712619B2 (en) * | 2014-11-04 | 2017-07-18 | Pavilion Data Systems, Inc. | Virtual non-volatile memory express drive |
US9575853B2 (en) | 2014-12-12 | 2017-02-21 | Intel Corporation | Accelerated data recovery in a storage system |
CN106484549B (zh) | 2015-08-31 | 2019-05-10 | 华为技术有限公司 | 一种交互方法、NVMe设备、HOST及物理机系统 |
EP3916536A1 (en) | 2015-12-28 | 2021-12-01 | Huawei Technologies Co., Ltd. | Data processing method and nvme storage device |
US9921756B2 (en) * | 2015-12-29 | 2018-03-20 | EMC IP Holding Company LLC | Method and system for synchronizing an index of data blocks stored in a storage system using a shared storage module |
WO2017176775A1 (en) * | 2016-04-04 | 2017-10-12 | Marvell World Trade Ltd. | Methods and systems for accessing host memory through non-volatile memory over fabric bridging with direct target access |
CN107832086B (zh) * | 2016-09-14 | 2020-03-20 | 华为技术有限公司 | 计算机设备、程序写入方法及程序读取方法 |
CN110413542B (zh) | 2016-12-05 | 2023-08-22 | 华为技术有限公司 | NVMe over Fabric架构中数据读写命令的控制方法、设备和系统 |
US11451647B2 (en) * | 2016-12-27 | 2022-09-20 | Chicago Mercantile Exchange Inc. | Message processing protocol which mitigates optimistic messaging behavior |
US10387081B2 (en) * | 2017-03-24 | 2019-08-20 | Western Digital Technologies, Inc. | System and method for processing and arbitrating submission and completion queues |
US10503434B2 (en) | 2017-04-12 | 2019-12-10 | Micron Technology, Inc. | Scalable low-latency storage interface |
CN107608909A (zh) | 2017-09-19 | 2018-01-19 | 记忆科技(深圳)有限公司 | 一种NVMe固态硬盘写加速的方法 |
JP6901427B2 (ja) * | 2018-03-27 | 2021-07-14 | キオクシア株式会社 | ストレージ装置、コンピュータシステムおよびストレージ装置の動作方法 |
- 2018
- 2018-06-30 CN CN201880005007.9A patent/CN111095231B/zh active Active
- 2018-06-30 JP JP2020545126A patent/JP7191967B2/ja active Active
- 2018-06-30 EP EP18924295.1A patent/EP3792776B1/en active Active
- 2018-06-30 KR KR1020207022273A patent/KR102471219B1/ko active IP Right Grant
- 2018-06-30 WO PCT/CN2018/093918 patent/WO2020000482A1/zh unknown
- 2020
- 2020-10-16 US US17/072,038 patent/US11467764B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106210041A (zh) * | 2016-07-05 | 2016-12-07 | 杭州华为数字技术有限公司 | 一种数据写入方法及服务器端网卡 |
US20180074757A1 (en) * | 2016-09-09 | 2018-03-15 | Toshiba Memory Corporation | Switch and memory device |
CN107992436A (zh) * | 2016-10-26 | 2018-05-04 | 杭州华为数字技术有限公司 | 一种NVMe数据读写方法及NVMe设备 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3792776A4 * |
Also Published As
Publication number | Publication date |
---|---|
CN111095231B (zh) | 2021-08-03 |
KR20200101982A (ko) | 2020-08-28 |
JP7191967B2 (ja) | 2022-12-19 |
EP3792776A1 (en) | 2021-03-17 |
US20210034284A1 (en) | 2021-02-04 |
CN111095231A (zh) | 2020-05-01 |
EP3792776B1 (en) | 2022-10-26 |
KR102471219B1 (ko) | 2022-11-25 |
EP3792776A4 (en) | 2021-06-09 |
JP2021515318A (ja) | 2021-06-17 |
US11467764B2 (en) | 2022-10-11 |
Similar Documents
Publication | Title |
---|---|
WO2020000482A1 (zh) | 一种基于NVMe的数据读取方法、装置及系统 |
US9734085B2 (en) | DMA transmission method and system thereof |
US8352689B2 (en) | Command tag checking in a multi-initiator media controller architecture |
US11579803B2 (en) | NVMe-based data writing method, apparatus, and system |
WO2013170731A1 (zh) | 将数据写入存储设备的方法与存储设备 |
US8516170B2 (en) | Control flow in a ring buffer |
US20040186931A1 (en) | Transferring data using direct memory access |
CN112416250B (zh) | 基于NVMe的固态硬盘的命令处理方法及相关设备 |
US10963295B2 (en) | Hardware accelerated data processing operations for storage data |
US20230359396A1 (en) | Systems and methods for processing commands for storage devices |
US11200180B2 (en) | NVMe SGL bit bucket transfers |
WO2020087931A1 (zh) | 一种数据备份方法、装置及系统 |
US11789634B2 (en) | Systems and methods for processing copy commands |
US10802828B1 (en) | Instruction memory |
WO2022073399A1 (zh) | 存储节点、存储设备及网络芯片 |
US10261700B1 (en) | Method and apparatus for streaming buffering to accelerate reads |
KR20140113370A (ko) | 패스 스루 스토리지 디바이스들 |
JP2005267139A (ja) | ブリッジ装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18924295 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 20207022273 Country of ref document: KR Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2020545126 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2018924295 Country of ref document: EP Effective date: 20201208 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |