WO2020000483A1 - Data processing method and storage system - Google Patents
Data processing method and storage system
- Publication number: WO2020000483A1 (application PCT/CN2018/093919)
- Authority: WO (WIPO, PCT)
- Prior art keywords: host, target, queue, pcie, address
Classifications
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/061—Improving I/O performance
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
- G06F3/0688—Non-volatile semiconductor memory arrays
- G06F9/5011—Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1621—Handling requests for access to memory bus based on arbitration, with latency improvement by maintaining request order
- G06F13/1631—Handling requests for access to memory bus based on arbitration, with latency improvement by reordering requests through address comparison
- G06F13/1642—Handling requests for access to memory bus based on arbitration, with request queuing
- G06F13/3625—Handling requests for access to common bus with centralised access control, using a time dependent access
- G06F13/42—Bus transfer protocol, e.g. handshake; synchronisation
- G06F13/4221—Bus transfer protocol on a parallel bus, the bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
- G06F13/4282—Bus transfer protocol on a serial bus, e.g. I2C bus, SPI bus
- G06F2213/0026—PCI express
Definitions
- the present application relates to the storage field, and in particular, to a data processing method and a storage system.
- NVMe (non-volatile memory express) is an interface for communication between a host and an NVM subsystem (including the controller and at least one SSD); it is attached over the high-speed serial computer expansion bus standard, Peripheral Component Interconnect Express (PCIe), in the form of a register interface. It is optimized for enterprise-grade and consumer-grade solid-state storage, offering high performance and low access latency.
- In the conventional scheme, the host creates input/output (I/O) submission queues and I/O completion queues in its own memory.
- NVMe is based on a mechanism of paired I/O submission queues and I/O completion queues.
- An I/O submission queue is a ring buffer used to store one or more data operation requests to be executed by the controller.
- An I/O completion queue is a ring buffer used to store the operation results of the data operation requests completed by the controller.
- Each I/O submission queue corresponds to one I/O completion queue, and the same I/O completion queue may correspond to multiple I/O submission queues.
- The pairing between I/O completion queues and I/O submission queues is specified by the host, and the operation results of the pending data operation requests in each I/O submission queue are stored in the designated I/O completion queue.
- The specific NVMe data processing procedure is as follows: when the host receives one or more pending data operation requests, it first stores them in an I/O submission queue; the host then updates the I/O submission queue tail doorbell register (located in a storage area of the NVMe controller), the doorbell notifying the NVMe controller that there are pending data operation requests; the NVMe controller fetches the pending requests from the I/O submission queue through a direct memory access (DMA) read; after completing a data operation request, the NVMe controller stores the operation result, through a DMA write, into the I/O completion queue that matches the I/O submission queue from which the request was obtained.
- Each time the NVMe controller stores an operation result into an I/O completion queue, it first sends an interrupt request to the host to notify the host that a data operation request has been completed.
- The above process requires the host and the NVMe controller to notify each other through the doorbell mechanism and interrupts, making the data processing procedure complicated.
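As an illustration, the doorbell-driven submission path described above can be sketched as follows. This is a minimal sketch, not driver code: the register layout, queue depth, and type names are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define SQ_DEPTH 64

struct sqe { uint8_t bytes[64]; };        /* one 64-byte submission queue entry */

struct io_sq {
    struct sqe slots[SQ_DEPTH];           /* ring buffer in host memory */
    uint16_t   tail;                      /* next free slot */
    volatile uint32_t *tail_doorbell;     /* MMIO register in the NVMe controller */
};

/* Conventional flow: the host enqueues one SQE in its own memory, then
 * rings the tail doorbell so the controller DMA-reads the new entry. */
static void sq_submit(struct io_sq *sq, const struct sqe *req)
{
    memcpy(&sq->slots[sq->tail], req, sizeof(*req));
    sq->tail = (sq->tail + 1) % SQ_DEPTH;  /* ring wrap-around */
    *sq->tail_doorbell = sq->tail;         /* doorbell write notifies the controller */
}
```

It is precisely this extra doorbell write (and the interrupt on the completion side) that the method of the present application sets out to remove.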
- This application provides a data processing method, device, and storage system, which can solve the problem of complex data processing in the conventional technology.
- According to a first aspect, the present application provides a data processing method.
- The method includes: an NVMe controller communicates with a host through the high-speed serial computer expansion bus standard PCIe; the NVMe controller receives a first PCIe message sent by the host and, according to the entry information of the target I/O submission queue carried in the message, stores at least one submission queue entry (SQE) into the target I/O submission queue.
- the memory of the NVMe controller is provided with at least one input / output I / O submission queue.
- the first PCIe message includes entry information of the target I / O submission queue and at least one submission queue entry SQE.
- One SQE corresponds to one data operation request, and each data operation request is used to perform a read operation or a write operation on the storage medium managed by the NVMe controller.
- In this method, the host sends SQEs directly to the NVMe controller through PCIe messages, so the host does not need to be aware of the data structure or storage location of the I/O submission queue; the host and the NVMe controller communicate only through the entry information of the target I/O submission queue, and the NVMe controller stores each SQE according to that entry information.
- The doorbell mechanism of the traditional technology can therefore be eliminated, simplifying the data processing procedure.
- In a possible implementation, the entry information of the target I/O submission queue is a unique first PCIe address in the host-addressable PCIe address space. The process in which the NVMe controller stores at least one SQE into the target I/O submission queue according to this entry information includes: determining a second address according to the first PCIe address, the second address being the address at which the target I/O submission queue is stored in the memory of the NVMe controller; and storing the at least one SQE into the target I/O submission queue according to the second address.
- Specifically, the NVMe controller first determines the identifier of the target I/O submission queue according to the first PCIe address, and then determines the second address according to that identifier.
- In this implementation, a PCIe address in the host-addressable PCIe address space is used to mark each I/O submission queue: each I/O submission queue is assigned a PCIe address, and that PCIe address serves as the entry information of the I/O submission queue.
- The NVMe controller can then store at least one SQE into the target I/O submission queue based on this PCIe address.
- Optionally, the NVMe controller calculates the identifier of the target I/O submission queue according to the following formula (one SQE occupying at most 64 bytes, as described below):
- identifier of the target I/O submission queue = (ADD11 - ADD21) / (MCS × 64)
- where ADD11 is the first PCIe address of the target I/O submission queue, ADD21 is the start address of the contiguous address space that the host has set aside in its addressable PCIe address space to identify the I/O submission queues, and MCS is the maximum number of aggregated SQEs in each I/O submission queue.
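A minimal sketch of this address-to-identifier computation, assuming each I/O submission queue is assigned a contiguous PCIe window of MCS × 64 bytes (64 bytes being the maximum size of one SQE); the function name and types are illustrative:

```c
#include <stdint.h>

#define SQE_SIZE 64u   /* maximum size of one SQE in bytes */

/* Recover the target I/O submission queue identifier from the first
 * PCIe address of an inbound write: add11 is the address carried in
 * the PCIe message, add21 the start of the contiguous address space
 * reserved for submission queues, mcs the negotiated maximum number
 * of aggregated SQEs per queue. */
static inline uint32_t sq_id_from_pcie_addr(uint64_t add11, uint64_t add21,
                                            uint32_t mcs)
{
    return (uint32_t)((add11 - add21) / ((uint64_t)mcs * SQE_SIZE));
}
```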
- Before the NVMe controller receives the first PCIe message from the host, the NVMe controller receives a creation instruction from the host, sets up at least one I/O submission queue in its memory according to the creation instruction, and records the association between the identifier of each I/O submission queue and the address information of that queue in the memory of the NVMe controller.
- In this way, the NVMe controller can create at least one I/O submission queue according to business requirements and implement data storage in the I/O submission queue.
- Before the NVMe controller receives the creation instruction from the host, the NVMe controller negotiates with the host the maximum number of aggregated SQEs (MCS) in each I/O submission queue; the negotiated MCS is the smaller of the maximum number of aggregated SQEs per I/O submission queue supported by the NVMe controller and that supported by the host.
- By negotiating the MCS, the NVMe controller and the host determine the maximum number of SQEs the host may push at a time, that is, the maximum number of SQEs that can be carried in one PCIe message. This push mode reduces the number of messages between the host and the NVMe controller and improves data processing efficiency.
- Likewise, before the NVMe controller receives the creation instruction from the host, the NVMe controller negotiates with the host the maximum number of aggregated CQEs (MCC) in each I/O completion queue; the negotiated MCC is the smaller of the maximum number of aggregated CQEs per I/O completion queue supported by the NVMe controller and that supported by the host.
- By negotiating the MCC, the NVMe controller and the host determine the maximum number of CQEs the NVMe controller may push at a time, that is, the maximum number of CQEs that can be carried in one PCIe message. This push mode likewise reduces the number of messages between the host and the NVMe controller and improves data processing efficiency.
- The MCC may be negotiated together with the MCS, that is, the negotiation request message sent by the host carries both the MCS and the MCC to be negotiated, and the response returned by the NVMe controller carries the MCS and the MCC it confirms.
- Alternatively, the MCC may be negotiated separately from the MCS, using a different negotiation request message to confirm the MCC.
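A minimal sketch of the negotiation outcome, assuming each side simply advertises the values it supports and both sides keep the smaller one; the message exchange itself is omitted and the structure is illustrative:

```c
#include <stdint.h>

struct queue_caps {
    uint16_t mcs;   /* max aggregated SQEs per I/O submission queue */
    uint16_t mcc;   /* max aggregated CQEs per I/O completion queue */
};

/* Both sides converge on the smaller of the two advertised values,
 * which bounds how many entries a single PCIe message may carry. */
static struct queue_caps negotiate(struct queue_caps host, struct queue_caps ctrl)
{
    struct queue_caps agreed;
    agreed.mcs = host.mcs < ctrl.mcs ? host.mcs : ctrl.mcs;
    agreed.mcc = host.mcc < ctrl.mcc ? host.mcc : ctrl.mcc;
    return agreed;
}
```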
- In a possible implementation, the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message, 1 ≤ M ≤ MCS.
- The NVMe controller then stores the SQEs by determining the preset order of the M SQEs and saving the M SQEs into the target I/O submission queue in that preset order.
- The preset order of the M SQEs is the order in which the host received the corresponding data operation requests.
- In this way, the NVMe controller stores SQEs in the order in which the host received the data operation requests, ensuring that the order of SQEs in the I/O submission queue stored in the NVMe controller's memory is the same as the order in which the host received the requests.
- At least one I/O completion queue is set up in the memory of the host. The NVMe controller obtains at least one SQE from the target I/O submission queue, performs a read or write operation on the storage medium it manages according to the data operation request carried in each SQE, and then sends a second PCIe message to the host.
- The second PCIe message includes the entry information of the target I/O completion queue and at least one completion queue entry (CQE).
- Each CQE is the operation result of the data operation request carried in a corresponding SQE executed by the NVMe controller.
- In this way, the NVMe controller stores at least one CQE into the target I/O completion queue based on the entry information of the target I/O completion queue, thereby eliminating the interrupt mechanism of the traditional technical solution, simplifying the data processing procedure, and improving processing efficiency.
- When a CQE aggregation condition is met, the NVMe controller uses the operation results of at least two data operation requests as the payload data of one second PCIe message, one CQE corresponding to the operation result of one data operation request.
- The CQE aggregation conditions include: the maximum aggregatable CQE size (MCCB) meets a first threshold, or the time recorded by the I/O completion queue aggregation timer (CQT) meets a second threshold.
- In this way, the NVMe controller can push multiple CQEs to the host in the same PCIe message, that is, the NVMe controller simultaneously uses multiple CQEs as the payload data of one PCIe message and sends that message to the host. The number of packets between the host and the NVMe controller is thereby reduced and processing efficiency improved.
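A minimal sketch of the two CQE aggregation triggers, assuming a byte counter for the MCCB condition and a monotonic timestamp for the CQT condition; the threshold values and the time source are assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative aggregation state for one I/O completion queue. */
struct cq_aggregator {
    uint32_t pending_bytes;   /* size of the CQEs accumulated so far (MCCB) */
    uint64_t first_cqe_ns;    /* when the oldest pending CQE was generated (CQT) */
    uint32_t size_threshold;  /* first threshold  */
    uint64_t time_threshold;  /* second threshold */
};

/* The accumulated CQEs are pushed in one second PCIe message as soon
 * as either aggregation condition is met. */
static bool should_flush(const struct cq_aggregator *a, uint64_t now_ns)
{
    return a->pending_bytes >= a->size_threshold ||
           (now_ns - a->first_cqe_ns) >= a->time_threshold;
}
```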
- The second PCIe message further includes depth information N of the target I/O completion queue, where N indicates the number of CQEs carried in the second PCIe message. The NVMe controller places N CQEs in the second PCIe message as payload data in a preset order, where 1 ≤ N ≤ MCC and MCC is a positive integer.
- The preset order of the N CQEs is the order in which the NVMe controller generated the corresponding operation results when completing the SQEs.
- Using this preset order in the second PCIe message ensures that the order in which the host stores the operation results is consistent with the order in which the NVMe controller actually generated them.
- Optionally, the host is an embedded processor, and the embedded processor supports sending PCIe messages with a payload of at least 64 bytes.
- In another possible implementation, the host may also use several non-contiguous address ranges in the base address register to indicate the I/O submission queues.
- In that case, the mapping relationship between the address allocated to each I/O submission queue and the identifier of that I/O submission queue is recorded in both the host and the NVMe controller, and the host sends the first PCIe message to the NVMe controller according to this mapping relationship.
- The first PCIe message includes the PCIe address allocated by the host for the target I/O submission queue, and this PCIe address serves as the entry information of the target I/O submission queue.
- The NVMe controller parses the first PCIe message to obtain the PCIe address field allocated by the host for the target I/O submission queue, determines the identifier of the I/O submission queue corresponding to that PCIe address field according to the mapping relationship, and then stores the at least one SQE carried in the first PCIe message into the target I/O submission queue according to that identifier. This completes the process of the NVMe controller storing at least one SQE based on the entry information of the target I/O submission queue.
- the above process can also avoid the use of the doorbell mechanism in the traditional technology and simplify the data processing process.
- Moreover, based on the SQE aggregation conditions, the host can push multiple SQEs to the NVMe controller in the same PCIe message, reducing the number of message packets between the host and the NVMe controller and improving data processing efficiency.
- The host-addressable PCIe address space includes the address space of the host's memory and the address space of a PCIe base address register in the host.
- In addition to the base address register, the embodiment of the present invention may also use addresses in the host's memory address space to map the identifiers of the I/O submission queues, the identifier of each I/O submission queue corresponding to a unique PCIe address.
- In that case, the host stores the mapping relationship between the address in the memory address space and the identifier of the I/O submission queue; the host sends a PCIe message carrying at least one SQE to the NVMe controller according to this mapping relationship, and the NVMe controller, also according to the mapping relationship, determines the identifier of the corresponding I/O submission queue from the mapping address carried in the PCIe message and then stores the at least one SQE into the target I/O submission queue.
- In this way, the doorbell mechanism of the traditional technology can also be cancelled, simplifying the data processing procedure.
- Using the same PCIe message to send multiple SQEs to the NVMe controller likewise reduces the number of messages between the host and the NVMe controller and improves data processing efficiency.
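A minimal sketch of the mapping relationship described above, assuming it is kept as a small table of (address, identifier) pairs on both sides; the structure and lookup are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the mapping relationship recorded by both the host and
 * the NVMe controller when non-contiguous addresses identify queues. */
struct sq_map_entry {
    uint64_t pcie_addr;   /* address allocated to the I/O submission queue */
    uint16_t sq_id;       /* identifier of the I/O submission queue */
};

/* Controller side: resolve the queue identifier for the address carried
 * in a first PCIe message; returns -1 if the address is unknown. */
static int lookup_sq_id(const struct sq_map_entry *map, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].pcie_addr == addr)
            return map[i].sq_id;
    return -1;
}
```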
- In another possible implementation, the host and the NVMe controller may also, by prior agreement, use a specified field in the PCIe message or a part of the payload data to transmit the identifier of the target I/O submission queue to the NVMe controller.
- The NVMe controller parses the PCIe message to obtain the specified field (for example, a reserved field of the PCIe message or the start bits of the payload data) and determines, according to the prior agreement, the identifier of the I/O submission queue represented by that field.
- This also eliminates the doorbell mechanism of the traditional technology and simplifies the data processing procedure.
- According to a second aspect, the present application provides a data processing method.
- The method includes: an NVMe controller communicates with a host through the high-speed serial computer expansion bus standard PCIe; the host determines the entry information of the target input/output (I/O) submission queue according to the identifier of the target I/O submission queue of the data operation request to be sent, and sends a first PCIe message to the NVMe controller.
- the memory of the NVMe controller is provided with at least one I / O submission queue;
- The first PCIe message includes the entry information of the target I/O submission queue and at least one submission queue entry (SQE); one SQE corresponds to one data operation request, and each data operation request is used to perform a read operation or a write operation on a storage medium managed by the NVMe controller.
- The host allocates, for each I/O submission queue, a unique PCIe address in the host-addressable PCIe address space, and the entry information of the target I/O submission queue is the first PCIe address of the target I/O submission queue.
- The first PCIe address belongs to the address space of the host's memory or to the address space of a PCIe base address register in the host.
- The host determines the first PCIe address of the target I/O submission queue according to the identifier of the target I/O submission queue.
- In a possible implementation, the host sends a creation instruction to the NVMe controller, the creation instruction being used to instruct the NVMe controller to set up at least one I/O submission queue in its memory and to record the association between the identifier of each I/O submission queue and the first PCIe address of each I/O submission queue.
- Before the host sends the creation instruction to the NVMe controller, the host negotiates with the NVMe controller the maximum number of aggregated SQEs (MCS) in each I/O submission queue; the negotiated MCS is the smaller of the maximum number of aggregated SQEs per I/O submission queue supported by the NVMe controller and that supported by the host.
- The first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message, and 1 ≤ M ≤ MCS.
- When an SQE aggregation condition is met, the host uses at least two SQEs as the payload data of the first PCIe message, one SQE corresponding to one data operation request, and each data operation request being used to perform a read operation or a write operation on the storage medium managed by the NVMe controller. The SQE aggregation conditions include: the maximum aggregatable SQE size (MCSB) meets a third threshold, or the time recorded by the I/O submission queue aggregation timer (SQT) meets a fourth threshold.
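A minimal sketch of how the host might pack M aggregated SQEs into one first PCIe message; the on-the-wire layout (a depth field followed by the entries) is an assumption for illustration, not the patent's message format:

```c
#include <stdint.h>
#include <string.h>

#define SQE_SIZE 64u

/* Hypothetical payload layout: the depth M followed by M SQEs, written
 * to the first PCIe address (entry) of the target submission queue. */
struct first_pcie_msg {
    uint32_t depth;        /* M, the number of SQEs carried (1 <= M <= MCS) */
    uint8_t  payload[];    /* M SQEs in the order the host received them */
};

/* Pack m aggregated SQEs (m <= negotiated MCS) so that a single PCIe
 * write replaces m doorbell-driven fetches. */
static void pack_sqes(struct first_pcie_msg *msg,
                      const uint8_t sqes[][SQE_SIZE], uint32_t m)
{
    msg->depth = m;
    for (uint32_t i = 0; i < m; i++)
        memcpy(msg->payload + (size_t)i * SQE_SIZE, sqes[i], SQE_SIZE);
}
```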
- The host receives a second PCIe message sent by the NVMe controller; the second PCIe message includes the entry information of the target I/O completion queue and at least one completion queue entry (CQE), each CQE being the operation result of the data operation request carried in a corresponding SQE executed by the NVMe controller. The host stores the at least one completion queue entry into the target I/O completion queue according to the entry information of the target I/O completion queue.
- The entry information of the target I/O completion queue is a unique second PCIe address in the host-addressable PCIe address space. The process in which the host stores at least one CQE into the target I/O completion queue according to this entry information includes: first determining a third address according to the second PCIe address, the third address being the address at which the target I/O completion queue is stored in the memory of the host; and then storing the at least one CQE into the target I/O completion queue according to the third address.
- Specifically, the host first determines the identifier of the target I/O completion queue according to the second PCIe address, and then determines the third address according to that identifier.
- Optionally, the host calculates the identifier of the target I/O completion queue according to a formula analogous to the submission-queue formula above:
- identifier of the target I/O completion queue = (ADD12 - ADD22) / (MCC × 16)
- where ADD12 is the second PCIe address of the target I/O completion queue, ADD22 is the start address of the contiguous address space divided in the host-addressable PCIe address space and used to identify the I/O completion queues, MCC is the maximum number of aggregated CQEs in each I/O completion queue, and 16 bytes is the size of one CQE.
- the host stores the at least one CQE to a memory of the host according to an identifier of the target I / O completion queue.
- Optionally, the host is an embedded processor, and the embedded processor supports sending PCIe messages with a payload of at least 64 bytes.
- In another possible implementation, the host may also use several non-contiguous address ranges in the base address register to indicate the I/O completion queues.
- In that case, the mapping relationship between the address allocated to each I/O completion queue and the identifier of that I/O completion queue is recorded in both the host and the NVMe controller, and the NVMe controller sends the second PCIe message to the host according to this mapping relationship.
- The second PCIe message includes the PCIe address allocated by the host for the target I/O completion queue, and this PCIe address serves as the entry information of the target I/O completion queue.
- The host parses the second PCIe message to obtain the PCIe address field allocated for the target I/O completion queue, determines the identifier of the I/O completion queue corresponding to that PCIe address field according to the mapping relationship, and then stores the at least one CQE carried in the second PCIe message into the target I/O completion queue according to that identifier. This completes the process of the host storing at least one CQE based on the entry information of the target I/O completion queue.
- The above process likewise cancels the interrupt mechanism of the traditional technology and simplifies the data processing procedure.
- Moreover, through the CQE aggregation method of the method shown in FIG. 3, the NVMe controller can push multiple CQEs to the host in the same PCIe message, reducing the number of message packets between the host and the NVMe controller and improving data processing efficiency.
- The host-addressable PCIe address space includes the address space of the host's memory and the address space of a PCIe base address register in the host.
- In the implementations above, the identifier of each I/O completion queue is mapped using the address space of the base address register.
- Alternatively, the identifier of each I/O completion queue may be mapped using addresses in the host's memory address space, the identifier of each I/O completion queue corresponding to a unique address. In this case, the mapping relationship between the address in the memory address space and the identifier of the I/O completion queue is stored in both the host and the NVMe controller.
- The NVMe controller sends a PCIe message carrying at least one CQE to the host according to this mapping relationship, and the host, also according to the mapping relationship, determines the identifier of the corresponding I/O completion queue from the mapping address carried in the PCIe message and then stores the at least one CQE into the target I/O completion queue.
- In this way, the interrupt mechanism of the conventional technology can also be eliminated and the data processing procedure simplified.
- Using the same PCIe message to send multiple CQEs to the host likewise reduces the number of messages between the host and the NVMe controller and improves data processing efficiency.
- In another possible implementation, the NVMe controller may also, by prior agreement, use a specified field in the PCIe message or a part of the payload data to transmit the identifier of the target I/O completion queue to the host; the host parses the PCIe message according to the prior agreement, obtains the identifier of the target I/O completion queue from the specified field (for example, a reserved field of the PCIe message or the start bits of the payload data), and stores the operation results into that I/O completion queue.
- Alternatively, since the host records the correspondence between I/O submission queues and I/O completion queues, the second message may directly carry the identifier of the target I/O submission queue.
- In that case, the host obtains the identifier of the target I/O submission queue, determines the target I/O completion queue according to the correspondence between I/O submission queues and I/O completion queues, and then stores the at least one CQE carried in the second message into the target I/O completion queue.
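A minimal sketch of this host-side lookup, assuming the correspondence is kept as a simple array indexed by submission queue identifier; names and sizes are illustrative:

```c
#include <stdint.h>

#define MAX_SQ 128

/* Host-side record of which I/O completion queue each I/O submission
 * queue is paired with (several SQs may share one CQ). */
static uint16_t cq_of_sq[MAX_SQ];

/* When the second message carries the target SQ identifier directly,
 * the host resolves the CQ into which the CQEs are to be stored. */
static uint16_t resolve_cq(uint16_t sq_id)
{
    return cq_of_sq[sq_id];
}
```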
- According to a third aspect, the present application provides a data processing method.
- The method includes: an NVMe controller communicates with a host through the high-speed serial computer expansion bus standard PCIe; the NVMe controller receives a first PCIe message sent by the host and, according to the entry information of the target I/O submission queue, stores at least one SQE into the target I/O submission queue.
- the memory of the NVMe controller is provided with at least one input / output I / O submission queue.
- the first PCIe message includes the entry information of the target I / O submission queue and at least one submission queue entry SQE.
- One SQE corresponds to one data operation request, and each data operation request is used to perform a read operation or a write operation on the storage medium managed by the NVMe controller.
- In this method, the host sends SQEs directly to the NVMe controller through PCIe messages, so the host does not need to be aware of the data structure or storage location of the I/O submission queue; the host and the NVMe controller communicate only through the entry information of the target I/O submission queue, and the NVMe controller stores each SQE according to that entry information.
- The doorbell mechanism of the traditional technology can therefore be eliminated, simplifying the data processing procedure.
- In a possible implementation, the entry information of the target I/O submission queue is a unique first PCIe address in the host-addressable PCIe address space. The process in which the NVMe controller stores at least one SQE into the target I/O submission queue according to this entry information includes: determining a second address according to the first PCIe address, the second address being the address at which the target I/O submission queue is stored in the memory of the NVMe controller; and storing the at least one SQE into the target I/O submission queue according to the second address.
- Specifically, the NVMe controller first determines the identifier of the target I/O submission queue according to the first PCIe address, and then determines the second address according to that identifier.
- In this implementation, a section of addresses in the host-addressable PCIe address space is used to mark the I/O submission queues: each I/O submission queue is assigned a PCIe address, and that PCIe address serves as the entry information of the I/O submission queue.
- The NVMe controller can then store at least one SQE into the target I/O submission queue based on this PCIe address.
- Optionally, the NVMe controller calculates the identifier of the target I/O submission queue according to the same formula as in the first aspect:
- identifier of the target I/O submission queue = (ADD11 - ADD21) / (MCS × 64)
- where ADD11 is the first PCIe address of the target I/O submission queue, ADD21 is the start address of the contiguous address space that the host has set aside in its addressable PCIe address space to identify the I/O submission queues, and MCS is the maximum number of aggregated SQEs in each I/O submission queue.
- Before the NVMe controller receives the first PCIe message from the host, the NVMe controller receives a creation instruction from the host, sets up at least one I/O submission queue in its memory according to the creation instruction, and records the association between the identifier of each I/O submission queue and the address information of that queue in the memory of the NVMe controller.
- In this way, the NVMe controller can create at least one I/O submission queue according to business requirements and implement data storage in the I/O submission queue.
- Before the NVMe controller receives the creation instruction from the host, the NVMe controller negotiates with the host the maximum number of aggregated SQEs (MCS) in each I/O submission queue; the negotiated MCS is the smaller of the maximum number of aggregated SQEs per I/O submission queue supported by the NVMe controller and that supported by the host.
- By negotiating the MCS, the NVMe controller and the host determine the maximum number of SQEs the host may push at a time, that is, the maximum number of SQEs that can be carried in one PCIe message. This push mode reduces the number of messages between the host and the NVMe controller and improves data processing efficiency.
- In a possible implementation, the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message, 1 ≤ M ≤ MCS.
- The NVMe controller then stores the SQEs by determining the preset order of the M SQEs and saving the M SQEs into the target I/O submission queue in that preset order.
- The preset order of the M SQEs is the order in which the host received the corresponding data operation requests.
- In this way, the NVMe controller stores SQEs in the order in which the host received the data operation requests, ensuring that the order of SQEs in the I/O submission queue stored in the NVMe controller's memory is the same as the order in which the host received the requests.
- According to a fourth aspect, the present application provides a data processing method.
- The method includes: at least one I/O completion queue is set up in the memory of the host; the NVMe controller obtains at least one SQE from the target I/O submission queue, performs a read or write operation on the storage medium it manages according to the data operation request carried in each SQE, and sends a second PCIe message to the host; the second PCIe message includes the entry information of the target I/O completion queue and at least one completion queue entry (CQE), each CQE being the operation result of the data operation request carried in a corresponding SQE executed by the NVMe controller.
- In this way, the NVMe controller stores at least one CQE into the target I/O completion queue based on the entry information of the target I/O completion queue, thereby eliminating the interrupt mechanism of the traditional technical solution, simplifying the data processing procedure, and improving processing efficiency.
- When a CQE aggregation condition is met, the NVMe controller uses the operation results of at least two data operation requests as the payload data of the second PCIe message, one CQE corresponding to the operation result of one data operation request.
- The CQE aggregation conditions include: the maximum aggregatable CQE size (MCCB) meets a first threshold, or the time recorded by the I/O completion queue aggregation timer (CQT) meets a second threshold.
- In this way, the NVMe controller can push multiple CQEs to the host in the same PCIe message, that is, simultaneously use multiple CQEs as the payload data of one PCIe message and send that message to the host, reducing the number of packets between the host and the NVMe controller and improving processing efficiency.
- The second PCIe message further includes depth information N of the target I/O completion queue, where N indicates the number of CQEs carried in the second PCIe message. The NVMe controller places N CQEs in the second PCIe message as payload data in a preset order, where 1 ≤ N ≤ MCC and MCC is a positive integer.
- The preset order of the N CQEs is the order in which the NVMe controller generated the corresponding operation results when completing the SQEs.
- Using this preset order in the second PCIe message ensures that the order in which the host stores the operation results is consistent with the order in which the NVMe controller actually generated them.
- Optionally, the host is an embedded processor, and the embedded processor supports sending PCIe messages with a payload of at least 64 bytes.
- According to a fifth aspect, the present application provides a data processing apparatus, and the apparatus includes modules for performing the data processing method in any of the foregoing aspects or in any possible implementation of any of the foregoing aspects.
- According to a sixth aspect, the present application provides a storage system.
- The storage system includes a host, an NVMe controller, a first memory, and a second memory; the host and the NVMe controller communicate through the high-speed serial computer expansion bus standard PCIe.
- The first memory stores the computer instructions executed by the host and implements the data storage of the I/O completion queues; the second memory stores the computer instructions executed by the NVMe controller and implements the data storage of the I/O submission queues.
- When the storage system runs, the NVMe controller executes the computer instructions in the second memory to use the hardware resources in the storage system to perform the method of the first aspect or any possible implementation of the first aspect, and of the third aspect or any possible implementation of the third aspect; the host executes the computer instructions in the first memory to use the hardware resources in the storage system to perform the method of the second aspect or any possible implementation of the second aspect, and of the fourth aspect or any possible implementation of the fourth aspect.
- According to a seventh aspect, the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.
- According to an eighth aspect, the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.
- FIG. 1 is a schematic structural diagram of a storage device according to an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of an NVMe-based data processing method according to an embodiment of the present invention.
- FIG. 3 is a schematic flowchart of another NVMe-based data processing method according to an embodiment of the present invention.
- FIG. 4A is a schematic diagram of a host allocating a PCIe address to an I/O submission queue in the address space of a base address register according to an embodiment of the present invention;
- FIG. 4B is a schematic diagram of a host allocating a PCIe address to an I/O completion queue in the address space of a base address register according to an embodiment of the present invention;
- FIG. 5 is a schematic structural diagram of an NVMe controller according to an embodiment of the present invention.
- FIG. 6 is a schematic structural diagram of a host according to an embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of a storage system according to an embodiment of the present invention.
- FIG. 1 is a schematic architecture diagram of a storage system according to an embodiment of the present invention.
- the storage system 100 includes a host 101, an NVMe controller 102, at least one solid state drive (SSD) 103, a first memory 104 and a second memory 105, and both the host and the NVMe controller are configured with memory.
- the memory of the host is referred to as the first memory 104
- the memory of the NVMe controller is referred to as the second memory 105.
- The host 101 communicates with the NVMe controller 102 through the high-speed serial computer expansion bus standard, Peripheral Component Interconnect Express (PCIe).
- The host 101 is a processor, which may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- A general-purpose processor may be a microprocessor or any conventional processor.
- The processor may also be a system on chip (SoC) or an embedded processor.
- The processor supports sending PCIe messages with a payload of at least 64 bytes.
- the first memory 104 and the second memory 105 may be implemented by a random access memory (RAM) or other storage media.
- The NVMe controller 102 and the at least one solid-state drive 103 may be collectively referred to as the NVM subsystem.
- The NVM subsystem is used to receive and execute the data operation requests sent by the host; each data operation request is used to perform a read operation or a write operation on the solid-state drive 103 managed by the NVMe controller 102.
- There may be one or more NVMe controllers 102; only one is shown in the figure.
- When the storage system contains multiple NVMe controllers, one active NVMe controller communicates with the host, and the other NVMe controllers act as standby NVMe controllers.
- When the active NVMe controller fails, a standby NVMe controller is promoted to active NVMe controller.
- FIG. 1B is a schematic diagram of a logical structure of an I / O submission queue and an I / O completion queue.
- the I / O submission queue is used to store data operation requests.
- The data operation requests are invoked by upper-layer applications and delivered to the host, and include requests to read data stored in the SSD and requests to write data to the SSD.
- the first memory 104 of the host 101 is used to implement data storage of the I / O completion queue
- the second memory 105 of the NVMe controller 102 is used to implement data storage of the I / O submission queue.
- The I/O submission queue is a logical concept consisting of one or more units, each unit storing one data operation request, and each data operation request occupying a storage space of at most 64 bytes.
- The I/O submission queue corresponds to a ring buffer used to store one or more data operation requests; specifically, it can be expressed using a physical region page (PRP) or a scatter-gather list (SGL).
- Each data operation request is also called a submission queue entry (SQE) or a submission queue element (SQE), and each unit of the queue may be called a slot; each slot corresponds to two PRPs or one SGL in the buffer.
- The I/O submission queue is provided with a head pointer and a tail pointer: the head pointer indicates the slot from which an SQE can be taken out at the current moment, and the tail pointer indicates the slot into which a newly added SQE can be stored at the current moment.
- The tail pointer is incremented by 1 each time a new SQE is added to the I/O submission queue, and the head pointer is incremented by 1 each time an SQE is taken out.
- The pending data operation requests are stored into the slots of the submission queue one by one in the order in which they were received, and are then read out one by one in first-in, first-out (FIFO) order.
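A minimal sketch of the head/tail discipline just described; the depth and helper names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEPTH 64

/* The tail marks the slot for the next new SQE, the head the next SQE
 * to be taken out; both advance by one and wrap around the ring. */
struct ring {
    uint16_t head, tail;
};

static bool ring_empty(const struct ring *r) { return r->head == r->tail; }
static bool ring_full(const struct ring *r)  { return (uint16_t)((r->tail + 1) % DEPTH) == r->head; }

static void ring_push(struct ring *r) { r->tail = (r->tail + 1) % DEPTH; } /* new SQE stored     */
static void ring_pop(struct ring *r)  { r->head = (r->head + 1) % DEPTH; } /* SQE taken out FIFO */
```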
- The I/O completion queue is a ring buffer used to store the operation results of the data operation requests completed by the NVMe controller. Similar in structure to the I/O submission queue, the I/O completion queue is also a logical concept consisting of one or more units, each of which may be called a slot. The I/O completion queue likewise corresponds to a ring buffer used to store the operation results of one or more data operation requests and can be expressed using PRPs or SGLs. The operation result of each data operation request is also called a completion queue entry (CQE) or a completion queue element (CQE).
- Each I / O submission queue corresponds to one I / O completion queue, and the same I / O completion queue can correspond to multiple I / O submission queues.
- the matching relationship between the I / O completion queue and the I / O submission queue is specified by the host, and the operation result of the data operation request in each I / O submission queue is stored in a designated I / O completion queue.
- The NVMe data processing process also involves a management submission queue and a management completion queue.
- The management submission queue is used to store the management requests of the host and the NVMe controller; for example, a host request to create an I/O submission queue can be stored in the management submission queue.
- The management completion queue is used to store the operation results of the management requests that the NVMe controller has completed. Specifically, the management submission queue and the management completion queue may be stored in the first memory of the host.
- the logical structure of the management submission queue and the management completion queue is similar to the form of the I / O submission queue and the I / O completion queue, and is not repeated here.
- In the conventional scheme, the host's memory implements both the I/O submission queue and the I/O completion queue, the communication between the host and the NVMe controller relies on the doorbell mechanism and interrupts to notify the other party, the I/O submission queue holds the pending data operation requests, and the operation results of the data operation requests are stored in the I/O completion queue.
- The entire NVMe data processing process therefore suffers from complicated processing.
- In the embodiments of the present invention, mainly the implementation of the I/O submission queue and the I/O completion queue is improved; the management submission queue and the management completion queue are still implemented using the conventional technical solution, that is, they are still stored in the host's memory.
- Accordingly, the host first stores a management request into the management submission queue and then, through the doorbell mechanism, updates the state of the submission queue tail doorbell register in the NVMe controller to notify the NVMe controller of the pending management request; the NVMe controller obtains and executes the pending management request and generates its operation result; it notifies the host through an interrupt that the pending management request has been completed; finally, the operation result of the management request is stored into the management completion queue through a DMA write.
- An embodiment of the present invention provides a data processing method in which the data storage of the I/O submission queue is implemented in the second memory 105 (for example, RAM) of the NVMe controller 102.
- The host sends a data operation request to the NVMe controller 102 in a PCIe message, based on the entry information of the I/O submission queue, and the NVMe controller 102 stores the data operation request into the target I/O submission queue in the second memory 105 based on that entry information.
- The NVMe controller 102 can then obtain the data operation request directly from the second memory 105.
- Compared with the conventional scheme, in which the NVMe controller needs to obtain the data operation request from the memory of the host, the technical solution provided by this embodiment has a shorter addressing path on the PCIe bus, and the data processing takes less time.
- Moreover, the entry-based access to the I/O submission queue eliminates the need, present in the traditional NVMe data processing process, for the host to notify the NVMe controller through the doorbell mechanism to fetch a data operation request, simplifying the data processing procedure.
- In addition, the host 101 can push multiple data operation requests to the NVMe controller 102 in one PCIe message, which reduces the number of communication messages between the host 101 and the NVMe controller 102 and improves communication efficiency.
- The NVMe controller 102 can likewise store the operation results of data operation requests into the I/O completion queue of the host 101 through PCIe messages, based on the entry information of the I/O completion queue.
- This eliminates the process, required in the traditional technical solution, in which the NVMe controller notifies the host through an interrupt that the operation result of a completed data operation request is to be stored in the I/O completion queue; the NVMe data processing procedure is further simplified and its duration reduced.
- The NVMe controller can also push the operation results of multiple data operation requests to the host 101 in the same PCIe message, thereby reducing the number of communication messages between the host 101 and the NVMe controller 102 and improving data processing efficiency.
- There may be multiple I/O submission queues and I/O completion queues in the NVMe data processing process; for ease of description, the following takes the data processing of one I/O submission queue and the I/O completion queue associated with it as an example to introduce the technical solution of the present invention.
- FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present invention. As shown in the figure, the data processing method includes:
- The host sends a first PCIe message to the NVMe controller.
- the first PCIe message includes entry information of the target I / O submission queue and at least one submission queue entry SQE.
- Different I/O submission queues are used to store the pending data operation requests sent by different types of applications or by different applications.
- When the host receives a data operation request, it can determine, according to preset rules, the I/O submission queue in which the request needs to be stored.
- For example, the data operation request carries the identifier of the application program, and the host determines the associated I/O submission queue according to that identifier.
- The I/O submission queues are created by the host according to business requirements, that is, the matching relationship between application programs and I/O submission queues is preset in the host.
- The host can thus determine, based on the matching relationship between the application sending the data operation request and the I/O submission queue, the target I/O submission queue in which the request needs to be stored.
- The target I/O submission queue is the I/O submission queue that matches the data operation request.
- the data storage of the I / O submission queue is implemented by the second memory of the NVMe controller, and the host is not aware of the data structure and storage location of the I / O submission queue.
- Different SQEs of the same I/O submission queue are stored in different slots of that I/O submission queue, and are all stored in the second memory of the NVMe controller through the entry of the I/O submission queue.
- the host does not need to know how the NVMe controller stores the SQE to the slot of the target SQ.
- The main purpose of the entry information is for the controller to identify the I/O submission queue or I/O completion queue corresponding to the currently received PCIe message. It can be an identifier or address that uniquely identifies an I/O submission queue, or any other description that uniquely identifies an I/O submission queue.
- the entry information can also be called entry identification, entry address, or other form of naming.
- the I / O submission queue can also be divided into different priorities.
- When the host receives a data operation request sent by an upper-layer application, it can send the request to the designated I/O submission queue according to the type and priority of the data operation request.
- the host can be understood as an enhanced processor.
- the enhanced processor can be an embedded processor.
- the embedded processor supports sending at least 64 bytes of PCIe writes each time.
- the NVMe controller stores at least one SQE in the first PCIe packet to the target I / O submission queue according to the entry information of the target I / O submission queue.
- The host sends at least one SQE to the NVMe controller based on the entry information of the target I/O submission queue, and the NVMe controller stores the at least one SQE to the target I/O submission queue in its memory; the NVMe controller can then directly obtain and execute the SQEs from its own memory. This cancels the doorbell mechanism of the traditional technical solution: the host and the NVMe controller only need to communicate through the entry-based method, the data processing process is simplified, and the time consumption of data processing is correspondingly reduced.
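- As a concrete illustration of this push model, the following minimal C sketch packs several 64-byte SQEs into the payload of one PCIe write whose target address is the entry information of the SQ; all names and the framing here are illustrative assumptions, not the patent's literal wire format.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SQE_SIZE 64            /* one submission queue entry occupies 64 bytes */

struct sqe {
    uint8_t bytes[SQE_SIZE];
};

/* Pack n SQEs into the payload buffer of one PCIe memory-write message.
 * Returns the payload length, or 0 if the message capacity would be exceeded. */
static size_t build_sq_push_payload(uint8_t *payload, size_t cap,
                                    const struct sqe *sqes, size_t n)
{
    size_t len = n * SQE_SIZE;
    if (len > cap)
        return 0;
    memcpy(payload, sqes, len);
    return len;
}

int main(void)
{
    struct sqe sqes[2] = {{{0}}, {{0}}};
    uint8_t payload[256];
    /* the write address of the message would be the target SQ's PCIe address */
    size_t len = build_sq_push_payload(payload, sizeof payload, sqes, 2);
    printf("payload carries %zu bytes (%zu SQEs)\n", len, len / SQE_SIZE);
    return 0;
}
```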
- FIG. 3 is a schematic flowchart of another data processing method according to an embodiment of the present application. As shown, the method includes:
- the host sends a message to the NVMe controller to negotiate the maximum number of aggregated SQEs.
- the NVMe controller sends a response message for the MCS negotiation request to the host.
- the host determines the maximum number of aggregated SQEs that the host and the NVMe controller can support.
- The MCS refers to the maximum number of aggregated SQEs that can be carried in one PCIe message, as negotiated by the host and the NVMe controller.
- The host and the NVMe controller can preset the MCS they support according to their hardware configurations, or maintenance personnel can specify the MCS they support according to the network communication conditions of the host or the NVMe controller.
- the host and the NVMe controller support different MCS.
- the MCS that the host and the NVMe controller can support can be determined through negotiation.
- The MCS that the host and the NVMe controller can support is the smaller of the MCS supported by the host and the MCS supported by the NVMe controller.
- After the MCS is determined through negotiation, when the host sends data operation requests to the NVMe controller, it can carry multiple data operation requests in one PCIe message, thereby reducing the number of communication messages between the host and the NVMe controller and improving data processing efficiency.
- the host and the NVMe controller may also negotiate the maximum number of aggregated CQEs (MCC) using steps similar to S301 to S303.
- The timing for negotiating the MCC may be in the system initialization phase, before the I/O submission queue is created, or before the NVMe controller sends the operation result of a data operation request to the host; this is not limited in the embodiment of the present invention.
- MCC refers to the maximum number of CQEs that can be carried in a PCIe packet negotiated by the host and the NVMe controller.
- The host and the NVMe controller can preset the MCC they support according to their hardware configurations, or maintenance personnel can specify the MCC they support according to the network communication conditions of the host or the NVMe controller.
- the host and the NVMe controller support different MCCs.
- the MCCs that the host and the NVMe controller can support can be determined through negotiation.
- The MCC that the host and the NVMe controller can support after negotiation is the smaller of the MCC supported by the host and the MCC supported by the NVMe controller.
- After the MCC is determined through negotiation, when the NVMe controller sends the operation results of data operation requests to the host, it can carry multiple operation results in one PCIe message, thereby reducing the number of communication messages between the host and the NVMe controller and improving data processing efficiency.
- The negotiation of the MCC may be processed together with the negotiation of the MCS, or separately; that is, the host can negotiate the MCC and the MCS with the NVMe controller through one request message, or through two different request messages.
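- A hedged sketch of the negotiation outcome: whatever messages carry the request and the response, the agreed value is simply the smaller of the two sides' capabilities, for the MCS and the MCC alike (the capability numbers below are hypothetical).

```c
#include <stdint.h>
#include <stdio.h>

/* Negotiated value = min(host capability, controller capability). */
static uint32_t negotiate(uint32_t host_supported, uint32_t ctrl_supported)
{
    return host_supported < ctrl_supported ? host_supported : ctrl_supported;
}

int main(void)
{
    uint32_t mcs = negotiate(32, 16);   /* hypothetical capabilities */
    uint32_t mcc = negotiate(8, 24);
    printf("negotiated MCS=%u MCC=%u\n", mcs, mcc);  /* MCS=16 MCC=8 */
    return 0;
}
```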
- the host can create at least one I / O submission queue through steps S304 to S308 according to business requirements.
- the specific process is as follows:
- the host sends a request to create at least one I / O submission queue to the management submission queue according to business requirements.
- the host notifies the NVMe controller that there is a pending request in the management submission queue.
- the NVMe controller obtains a request for creating at least one I / O submission queue in the management submission queue.
- the NVMe controller creates at least one I / O submission queue.
- the NVMe controller sends the operation result of creating at least one I / O submission queue request to the host to the management completion queue.
- Step S304 to step S308 are processes for creating at least one I / O submission queue.
- the request for creating at least one I / O submission queue includes the number of I / O submission queues to be created.
- the process of creating an I / O submission queue still uses the traditional doorbell and interruption methods to achieve communication between the host and the NVMe controller.
- Specifically, the host stores the pending management request in the management submission queue and updates the tail doorbell register of the management submission queue (located in the second memory of the NVMe controller) to notify the NVMe controller of the pending management request; the NVMe controller obtains the pending management request in the management submission queue through a direct memory access (DMA) read; after the NVMe controller finishes processing the above request for creating at least one I/O submission queue, it sends an interrupt to the host to notify it that there is a completed operation result to be stored in the completion queue, and then stores the operation result in the corresponding management completion queue through a DMA write.
- the process of the NVMe controller creating at least one I / O submission queue in the second memory is not limited in this application.
- The creation process of the I/O submission queue includes: the NVMe controller allocates storage space in the second memory for each of the submission queues to be created in the request for creating at least one I/O submission queue; this storage space is used to store the SQEs of each I/O submission queue, and the NVMe controller records the position information of each I/O submission queue in the second memory.
- The ring buffer that constitutes each queue can be represented by a PRP list or an SGL.
- the storage space for implementing the I / O submission queue may be a continuous address range in the second memory, or a discontinuous address range, which is not limited in the present invention.
- the request for creating an I / O submission queue in the embodiment of the present invention is similar to the request for creating an I / O submission queue in the conventional technology, but has the following differences:
- The QID is used to indicate the entry identifier of the SQ, and is no longer related to the traditional SQ tail doorbell.
- The request to create an I/O submission queue may further include the queue depth of each I/O submission queue, and the queue depth may be set to the MCS, ensuring that the SQEs in each received PCIe message can be successfully stored in the target I/O submission queue.
- the host allocates a unique PCIe address for each I / O submission queue in its addressable PCIe address space.
- The addressable PCIe address space of the host includes the address space of the host memory and the address space that the host is allowed to access in the storage area of the NVMe controller, such as the address space of a base address register (BAR) in the NVMe controller.
- the host can assign a unique PCIe address to each I / O submission queue.
- the host and the NVMe controller determine the target I / O submission queue based on the PCIe address.
- the PCIe address allocated by the target I / O submission queue is recorded as the first PCIe address.
- Through memory-mapped input/output (MMIO), the storage area that the NVMe controller allows the host to access can be called the host-addressable storage space, or the host-accessible storage space.
- the host runs a root complex (RC) in the PCIe bus, and the host can access the storage area of the NVMe controller through the root complex.
- RC root complex
- the system maps the storage area allowed by the host in the NVMe controller to the memory area of the host.
- The root complex checks the address information of the data to be accessed in the data operation request; if the address to be accessed is a mapped address of a storage area that the NVMe controller allows the host to access, the root complex generates a transaction layer packet (TLP) and uses the TLP to access the NVMe controller, performing the read or write operation on the target data.
- TLP transaction layer packet
- The NVMe controller may have several internal areas that the host can access (their attributes may differ; for example, some are prefetchable and some are not), and these need to be mapped to the memory area.
- The system software can read these BARs, allocate corresponding memory areas to them, and write the corresponding memory base addresses back to the BARs.
- FIG. 4A is a schematic diagram of a host assigning a PCIe address to an I / O submission queue in an address space of a base address register according to an embodiment of the present invention.
- the base address register X is the addressable address space of the host.
- the base address of the base address register X is the base address 100.
- The host first divides out a continuous address space, called the first address space, and then assigns a unique PCIe address to each I/O submission queue within the first address space.
- the first address space may also be referred to as an aperture of an I / O submission queue.
- the process of the host assigning a PCIe address to each I / O submission queue can also be understood as the host mapping a continuous address in the base address register to the I / O submission queue.
- This continuous PCIe address range can be used to identify an I/O submission queue. For example, (base address 100 + offset address 100) to (base address 100 + offset address 100 + MCS * 64) in the base address register X are allocated to I/O submission queue 0.
- The PCIe address of I/O submission queue 0 is therefore (base address 100 + offset address 100), that is, (base address 100 + offset address 100 + MCS * 64 * 0).
- Likewise, the PCIe address of I/O submission queue 1 is (base address 100 + offset address 100 + MCS * 64 * 1).
- (base address 100 + offset address 100 + MCS * 64 * N) to (base address 100 + offset address 100 + MCS * 64 * (N + 1)) are allocated to I/O submission queue N, so the PCIe address of I/O submission queue N is (base address 100 + offset address 100 + MCS * 64 * N).
- the first address space in FIG. 4A is a segment of the address space in the base address register X.
- The start address of the first address space can be the base address of the base address register X, or an address offset from that base address (base address + offset address); the following embodiments of the present invention are described by taking the start address of the first address space shown in FIG. 4A as (base address + offset address) as an example.
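- The FIG. 4A layout reduces to one address computation; the following C sketch (with hypothetical base, offset, and MCS values) shows how the first PCIe address of submission queue qid would be derived.

```c
#include <stdint.h>
#include <stdio.h>

/* First PCIe address of I/O submission queue 'qid' under the FIG. 4A layout:
 * each queue owns an MCS*64-byte window inside the first address space. */
static uint64_t sq_pcie_addr(uint64_t bar_base, uint64_t offset,
                             uint32_t mcs, uint32_t qid)
{
    return bar_base + offset + (uint64_t)mcs * 64 * qid;
}

int main(void)
{
    /* hypothetical values: base 0x1000, offset 0x100, MCS = 16 */
    for (uint32_t qid = 0; qid < 3; qid++)
        printf("SQ%u entry address: 0x%llx\n", qid,
               (unsigned long long)sq_pcie_addr(0x1000, 0x100, 16, qid));
    return 0;
}
```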
- the host records a set of data indexed by the entry ID of at least one I / O submission queue.
- Each I/O submission queue is assigned a unique PCIe address in the PCIe base address register, and the PCIe address of each I/O submission queue indicates the entry information of that I/O submission queue.
- Optionally, the request for creating at least one I/O submission queue may also carry the association relationship between the I/O submission queue and the I/O completion queue, so that after the NVMe controller completes the SQE operations in the I/O submission queue, the operation results are stored in the I/O completion queue associated with that I/O submission queue.
- Optionally, the host may also send the association relationship between the I/O submission queue and the I/O completion queue to the NVMe controller when it learns that the NVMe controller has completed the request for creating at least one I/O submission queue, so that after the NVMe controller completes the SQE operations in the I/O submission queue, it stores the operation results in the I/O completion queue associated with that I/O submission queue.
- steps S301 to S309 describe how to create an I / O submission queue in the embodiment of the present invention.
- Steps S310 to S313 describe how the I/O submission queue is used in the embodiment of the present application.
- S310 The host receives at least one data operation request.
- the host will receive the data operation request sent by the upper-level application.
- Each data operation request is stored as an SQE in a slot of the target I / O submission queue, and each SQE corresponds to a data operation request.
- When the SQE aggregation conditions are satisfied, the host uses at least two data operation requests as the payload data of the first PCIe message.
- The SQE aggregation conditions include at least one of the following conditions:
- Condition 1: the maximum SQE aggregate block size (maximum SQ coalesced block size, MSCB) meets the first threshold.
- That is, when the total size of the M SQEs to be sent reaches the first threshold, the host may use the M SQEs as the payload data of the same PCIe message, where M is greater than or equal to 2.
- For example, at the current moment, three SQEs (SQE1, SQE2, and SQE3) belonging to I/O submission queue 1 are in a pending state; the total size of the three SQEs is 190 bytes and the first threshold is 180 bytes; the three SQEs can therefore be sent to the NVMe controller as the payload data of the same PCIe message.
- Condition 2: the duration recorded by the I/O submission queue aggregation timer SQT meets the second threshold.
- That is, the host can use SQEs whose waiting time is greater than or equal to the second threshold as the payload data of the same PCIe message.
- For example, if at the current moment two SQEs (SQE1 and SQE2) belonging to I/O submission queue 1 have been waiting to be sent for 60s and the second threshold is 50s, the host can send SQE1 and SQE2 to the NVMe controller as the payload data of the same PCIe message.
- The host arranges the multiple SQEs to be aggregated in the order in which the data operation requests were received in step S310, and uses them collectively as the payload data of the PCIe message.
- When the NVMe controller receives the PCIe message, it can store the SQEs to the target I/O submission queue in the order in which the data operation requests were received, and then obtain and execute each SQE one by one.
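- The two aggregation triggers can be sketched as a simple predicate; the structure and time source below are assumptions for illustration, reusing the 190-byte/180-byte example above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Total bytes of SQEs pending for one SQ and the enqueue time of the oldest. */
struct pending_sq {
    uint64_t pending_bytes;
    uint64_t oldest_enqueue_ms;
};

/* Flush the pending SQEs in one PCIe message when either condition holds. */
static bool should_flush(const struct pending_sq *q, uint64_t now_ms,
                         uint64_t first_threshold_bytes,
                         uint64_t second_threshold_ms)
{
    if (q->pending_bytes >= first_threshold_bytes)               /* condition 1 */
        return true;
    return now_ms - q->oldest_enqueue_ms >= second_threshold_ms; /* condition 2 */
}

int main(void)
{
    struct pending_sq q = { .pending_bytes = 190, .oldest_enqueue_ms = 0 };
    /* 190 bytes pending vs a 180-byte first threshold: flush immediately */
    printf("flush: %d\n", should_flush(&q, 10, 180, 50000));
    return 0;
}
```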
- the host sends a first PCIe message to the NVMe controller.
- the first PCIe message includes entry information of the target I / O submission queue and at least one SQE.
- the host and the NVMe controller communicate based on the PCIe bus, and the first PCIe message is specifically a TLP message.
- the host can send one or more SQEs to the NVMe controller through the same PCIe message in a push manner.
- Optionally, the PCIe message communicated between the host and the NVMe controller includes an NVMe message header and payload data (not shown in FIG. 4A); the NVMe message header is used to record the fields added when the message is processed by the NVMe protocol layer.
- the payload data is used to carry one or more SQEs.
- Optionally, the host may know the maximum number of slots of each I/O submission queue; for example, the host periodically obtains from the NVMe controller, through a query request, the maximum number of slots and the number of available slots of each I/O submission queue.
- The host maintains a counter of the number of sent SQEs, which is used to record the number of SQEs sent by the host to each I/O submission queue.
- One or more counters can be set; when there is only one counter, it records the number of SQEs sent by the host to all I/O submission queues in the storage system.
- When there are multiple counters, each counter records the number of SQEs sent by the host to one or more I/O submission queues. For the same I/O submission queue, when the number of sent SQEs recorded in the counter reaches the maximum number of slots in that I/O submission queue, the host can send a query request to the NVMe controller to determine whether the I/O submission queue currently has free slots, that is, whether the NVMe controller has read the SQEs in the I/O submission queue.
- Only when receiving a response from the NVMe controller indicating that the target I/O submission queue has free slots does the host send a PCIe message carrying new SQEs to the NVMe controller; this prevents the host from continuously sending more SQEs than the I/O submission queue has free slots, which would cause a storage failure.
- In this way, the counter of sent SQEs on the host side records and controls the number of SQEs sent by the host to the NVMe controller, implements flow control over the NVMe controller's data processing, prevents storage failures in the NVMe controller caused by the host sending too many SQEs, and improves the SQE storage success rate.
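- The flow-control counter described above might look like the following sketch, where query_free_slots() is a hypothetical stand-in for the query request to the NVMe controller; the structure and names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct sq_flow {
    uint32_t sent;   /* SQEs sent and not yet known to be consumed */
    uint32_t depth;  /* maximum number of slots in this SQ */
};

/* Stand-in for the query request; a real host would send the query over
 * PCIe and parse the controller's response. */
static uint32_t query_free_slots(uint32_t qid)
{
    (void)qid;
    return 4;        /* pretend the controller has freed four slots */
}

/* Gate each push on the counter; re-sync with the controller when full. */
static bool try_send(struct sq_flow *f, uint32_t qid, uint32_t n_sqes)
{
    if (f->sent + n_sqes > f->depth) {
        f->sent = f->depth - query_free_slots(qid);
        if (f->sent + n_sqes > f->depth)
            return false;            /* still no room: retry later */
    }
    f->sent += n_sqes;               /* the PCIe push itself happens elsewhere */
    return true;
}

int main(void)
{
    struct sq_flow f = { .sent = 16, .depth = 16 };
    printf("send 2 SQEs: %s\n", try_send(&f, 1, 2) ? "ok" : "blocked");
    return 0;
}
```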
- the NVMe controller stores the SQE in the first PCIe message to the target I / O submission queue according to the entry information of the target I / O submission queue.
- When the NVMe controller receives the first PCIe message, it can determine the address information of the target I/O submission queue in the memory of the NVMe controller according to the entry information of the target I/O submission queue, and then store the at least one SQE in the first PCIe message to the target I/O submission queue according to that address information. Specifically, the SQEs can be stored according to the operations of steps S3131 to S3133:
- the NVMe controller determines the target I / O submission queue identifier according to the entry information of the target I / O submission queue.
- The NVMe controller determines the identifier of the target I/O submission queue according to the entry information of the target I/O submission queue carried in the PCIe address structure of the first PCIe message. Specifically, the following Formula 1 is used to calculate the identifier of the target I/O submission queue:
- Identifier of the target I/O submission queue = (ADD11 - ADD12) / (MCS * 64)   (Formula 1)
- where ADD11 is the first PCIe address of the target I/O submission queue, and ADD12 is the start address of the continuous address space divided by the host in its addressable PCIe address space for identifying each I/O submission queue; for example, as shown in FIG. 4A, ADD12 is the start address of the first address space.
- For example, if the first PCIe address of the target I/O submission queue is (base1 + MCS * 64 * 2) and the start address of the first address space is base1, the identifier of the target I/O submission queue calculated using Formula 1 is 2.
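- Formula 1 reduces to one integer division; a minimal sketch, assuming 64-byte SQEs and the variable names used above:

```c
#include <stdint.h>
#include <stdio.h>

/* Formula 1: identifier = (ADD11 - ADD12) / (MCS * 64), where ADD11 is the
 * first PCIe address carried by the message and ADD12 is the start address
 * of the first address space. */
static uint32_t sq_id_from_addr(uint64_t add11, uint64_t add12, uint32_t mcs)
{
    return (uint32_t)((add11 - add12) / ((uint64_t)mcs * 64));
}

int main(void)
{
    uint64_t base1 = 0x100000;   /* hypothetical start of the first space */
    uint32_t mcs = 16;
    /* the example above: base1 + MCS * 64 * 2 must resolve to queue 2 */
    printf("qid = %u\n",
           sq_id_from_addr(base1 + (uint64_t)mcs * 64 * 2, base1, mcs));
    return 0;
}
```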
- the NVMe controller determines the number of SQEs in the first PCIe message according to the payload data size in the first PCIe message, and determines the storage location of the SQE in the target I / O submission queue.
- After receiving the first PCIe message, the NVMe controller first parses the message content and obtains the payload data, then calculates the number of SQEs carried in the payload data according to the following Formula 2:
- Number of SQEs = payload data size in the first PCIe message / 64   (Formula 2)
- After determining the number of SQEs carried in the first PCIe message, the NVMe controller further determines the storage locations of the SQEs, that is, the slots in the target I/O submission queue, according to the identifier of the target I/O submission queue. Specifically, the NVMe controller records the position information of each I/O submission queue in the second memory, determines the position of the next available slot according to the position information indicated by the tail pointer of the target I/O submission queue at the current moment, and then stores the at least one SQE carried in the first PCIe message to the target I/O submission queue.
- the NVMe controller stores at least one SQE to the target I / O submission queue according to the determined storage location of the target I / O submission queue.
- When the NVMe controller has determined the identifier of the target I/O submission queue and the number of SQEs in the first PCIe message, it can store the SQEs one by one to the free slots, starting from the position indicated by the tail pointer of the target I/O submission queue.
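- Steps S3132 and S3133 can be sketched together as follows: count the SQEs with Formula 2, then copy them into the ring at the tail pointer. The ring layout and names are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SQE_SIZE 64

struct io_sq {
    uint8_t  *slots;   /* depth * SQE_SIZE bytes in the second memory */
    uint32_t  depth;
    uint32_t  tail;    /* next free slot, advanced as SQEs are stored */
};

/* Count the SQEs with Formula 2, then copy each one into the slot
 * indicated by the tail pointer. */
static void store_sqes(struct io_sq *sq, const uint8_t *payload,
                       size_t payload_size)
{
    size_t n = payload_size / SQE_SIZE;   /* Formula 2 */
    for (size_t i = 0; i < n; i++) {
        memcpy(sq->slots + (size_t)sq->tail * SQE_SIZE,
               payload + i * SQE_SIZE, SQE_SIZE);
        sq->tail = (sq->tail + 1) % sq->depth;
    }
}

int main(void)
{
    static uint8_t ring[8 * SQE_SIZE];
    static uint8_t payload[3 * SQE_SIZE];  /* a message carrying three SQEs */
    struct io_sq sq = { .slots = ring, .depth = 8, .tail = 0 };
    store_sqes(&sq, payload, sizeof payload);
    printf("tail is now %u\n", sq.tail);   /* 3 */
    return 0;
}
```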
- The NVMe controller may have multiple threads or processes storing SQEs to the same I/O submission queue at the same time.
- In the traditional solution, the scope of the lock operation includes reserving I/O submission queue slots, copying the SQEs to the I/O submission queue slots, updating the tail doorbell register of the submission queue, and finally releasing the write permission of the I/O submission queue.
- In the embodiment of the present invention, since the SQEs are sent to the NVMe controller in a push manner and the doorbell mechanism is cancelled, these operations do not all need to be performed within the locked range.
- The scope of the lock operation only includes the process of the NVMe controller reserving I/O submission queue slots, which reduces the scope and time consumption of the locking operation.
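- The narrowed lock scope might look like the following sketch: only slot reservation is serialized, and the SQE copies proceed outside the lock (structure names are illustrative assumptions).

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SQE_SIZE 64

struct locked_sq {
    pthread_mutex_t lock;
    uint8_t  *slots;
    uint32_t  depth;
    uint32_t  tail;
};

/* Only slot reservation is serialized; this is the entire lock scope. */
static uint32_t reserve_slots(struct locked_sq *sq, uint32_t n)
{
    pthread_mutex_lock(&sq->lock);
    uint32_t first = sq->tail;
    sq->tail = (sq->tail + n) % sq->depth;
    pthread_mutex_unlock(&sq->lock);
    return first;
}

/* Copying the SQE bytes into the reserved slots happens outside the lock,
 * so concurrent writers to the same SQ do not block each other here. */
static void store_sqes_concurrent(struct locked_sq *sq,
                                  const uint8_t *sqes, uint32_t n)
{
    uint32_t first = reserve_slots(sq, n);
    for (uint32_t i = 0; i < n; i++)
        memcpy(sq->slots + (size_t)((first + i) % sq->depth) * SQE_SIZE,
               sqes + (size_t)i * SQE_SIZE, SQE_SIZE);
}

int main(void)
{
    static uint8_t ring[8 * SQE_SIZE];
    static uint8_t sqes[2 * SQE_SIZE];
    struct locked_sq sq = { PTHREAD_MUTEX_INITIALIZER, ring, 8, 0 };
    store_sqes_concurrent(&sq, sqes, 2);
    printf("tail = %u\n", sq.tail);
    return 0;
}
```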
- the NVMe controller can store at least one SQE to the target I / O submission queue based on the entry information of the I / O submission queue in the PCIe message.
- the host and the NVMe controller do not need to communicate through the doorbell mechanism, which reduces the complexity of the NVMe data processing process.
- the host can use aggregation to push multiple SQEs at the same time using the same PCIe message, which reduces the number of communication messages between the host and the NVMe controller and improves data processing efficiency.
- Moreover, the NVMe controller can read data operation requests directly from its own memory, further improving data processing efficiency.
- the embodiment of the present invention also simplifies the scope of the locking operation in the data processing process.
- The NVMe controller only needs to lock the process of reserving the free slots into which the SQEs are stored, which resolves the complexity and time consumption of the traditional locking process and reduces both the lock time and the data processing time.
- Optionally, the host may also use discontinuous storage intervals in the base address register to represent individual I/O submission queues.
- the mapping relationship between the address allocated by each I / O submission queue and the identifier of the I / O submission queue is recorded in the host and the NVMe controller.
- the host can send a first PCIe packet to the NVMe controller according to the mapping relationship.
- the PCIe message includes the PCIe address allocated by the host for the target I / O submission queue, and the PCIe address is used as the entry information of the target I / O submission queue.
- After receiving the first PCIe message, the NVMe controller can parse it, obtain the PCIe address field allocated by the host for the target I/O submission queue, determine the identifier of the I/O submission queue corresponding to that PCIe address field according to the mapping relationship, then determine the storage location of the I/O submission queue in the memory of the NVMe controller according to the identifier, and finally store the at least one SQE in the first PCIe message to the target I/O submission queue. This completes the process in which the NVMe controller stores at least one SQE based on the entry information of the target I/O submission queue.
- the above process can also eliminate the doorbell mechanism in the traditional technology and simplify the data processing process.
- The host can send multiple SQEs to the NVMe controller in the same PCIe message in a push manner, using the SQE aggregation method of the method shown in FIG. 3, reducing the number of communication messages between the host and the NVMe controller and improving data processing efficiency.
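- For this discontinuous-mapping variant, both sides would consult an explicit address-to-identifier table instead of Formula 1; a minimal lookup sketch (linear scan, illustrative names):

```c
#include <stdint.h>
#include <stdio.h>

/* One recorded mapping between an allocated PCIe address and a queue ID. */
struct sq_mapping {
    uint64_t pcie_addr;
    uint32_t qid;
};

/* Resolve the entry address of a received message to a queue identifier. */
static int lookup_qid(const struct sq_mapping *map, int n,
                      uint64_t addr, uint32_t *qid_out)
{
    for (int i = 0; i < n; i++) {
        if (map[i].pcie_addr == addr) {
            *qid_out = map[i].qid;
            return 0;
        }
    }
    return -1;       /* unknown entry address */
}

int main(void)
{
    struct sq_mapping map[] = { { 0x9000, 0 }, { 0x4000, 7 } };
    uint32_t qid;
    if (lookup_qid(map, 2, 0x4000, &qid) == 0)
        printf("qid = %u\n", qid);   /* 7 */
    return 0;
}
```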
- the host-addressable PCIe address space includes an address space of a memory of the host and an address space of a host-addressable PCIe base address register.
- In addition to using the address space of the base address register to map the identifiers of the I/O submission queues, addresses in the memory address space of the host can also be used to allocate PCIe addresses for the I/O submission queues.
- The identifier of each I/O submission queue then corresponds to a unique PCIe address.
- The host stores the mapping relationship between the addresses in the memory address space and the identifiers of the I/O submission queues. The host can send a PCIe message carrying at least one SQE to the NVMe controller according to this mapping relationship, and the NVMe controller can likewise determine the identifier of the corresponding I/O submission queue from the mapping address carried in the PCIe message according to the mapping relationship, determine the storage location of the I/O submission queue in the NVMe controller according to the identifier, and then store the at least one SQE to the target I/O submission queue.
- the doorbell mechanism in the traditional technology can also be cancelled, and the data processing process can be simplified.
- In addition, using the same PCIe message to send multiple SQEs to the NVMe controller can also reduce the number of communication messages between the host and the NVMe controller and improve data processing efficiency.
- Optionally, the host and the NVMe controller may also agree in advance to use a designated field in the PCIe message, or a part of the payload data, to transmit the identifier of the target I/O submission queue to the NVMe controller.
- The NVMe controller then parses the PCIe message to obtain the specified field (for example, a reserved field in the PCIe message, or the payload data start bit), and determines the identifier of the I/O submission queue represented by the specified field according to the predetermined agreement.
- the doorbell mechanism in the traditional technology can also be eliminated to simplify the data processing process.
- the NVMe controller reads and executes the data operation request in SQE, and generates an operation result.
- the NVMe controller can read the data operation request in the SQE one by one, and execute the data operation request to generate the operation result.
- Each operation result includes the identifier of the corresponding I/O submission queue and the operation result of the data operation request.
- the data storage of the I / O completion queue is still implemented by the memory of the host.
- the process of creating an I / O completion queue does not constitute a limitation on the embodiment of the present invention.
- One CQE corresponds to one operation result, and each CQE is used to indicate the operation result of one data operation request.
- After completing the operations of the data operation requests of multiple SQEs, the NVMe controller sends at least one operation result to the host based on the entry information of the I/O completion queue, and the host then stores the operation results to the target I/O completion queue based on the entry information of the I/O completion queue.
- The CQE aggregation conditions include at least one of the following conditions:
- Condition 1: the maximum CQE aggregate block size (maximum CQ coalesced block size, MCCB) meets the third threshold.
- That is, when the total size of the N CQEs to be sent reaches the third threshold, the NVMe controller can send the N CQEs to the host as the payload data of the same PCIe message, where N is greater than or equal to 2.
- For example, at the current moment, CQE1, CQE2, and CQE3 belonging to I/O completion queue 1 are waiting to be sent; the total size of the three CQEs is 190 bytes and the third threshold is 180 bytes; the three CQEs can therefore be sent to the host as the payload data of the same PCIe message.
- Condition 2: the duration recorded by the I/O completion queue aggregation timer CQT meets the fourth threshold.
- That is, the NVMe controller may use at least two CQEs whose waiting time is greater than or equal to the fourth threshold as the payload data of the same PCIe message and send them to the host.
- For example, at the current moment, two CQEs (CQE1 and CQE2) belonging to I/O completion queue 1 have been waiting to be sent for 60s and the fourth threshold is 45s; the NVMe controller can then send the two CQEs to the host as the payload data of the same PCIe message.
- the NVMe controller sends a second PCIe message to the host.
- the second PCIe message carries the entry information of the target I / O completion queue and at least one CQE.
- Each I / O submission queue will correspond to an I / O completion queue.
- the correspondence between the I / O submission queue and the I / O completion queue is stored in the NVMe controller.
- The NVMe controller can determine the target I/O completion queue corresponding to the target I/O submission queue according to the correspondence between the I/O submission queue and the I/O completion queue.
- The NVMe controller then determines the PCIe address of the target I/O completion queue, and sends to the host a second PCIe message carrying the entry information of the target I/O completion queue and at least one CQE.
- FIG. 4B is a schematic diagram of a host assigning a PCIe address to an I / O completion queue in an address space of a base address register according to an embodiment of the present invention.
- the base address register Y is the addressable address space of the host.
- the base address of the base address register Y is the base address 200.
- The host divides out a continuous address space, called the second address space, and assigns PCIe addresses to the multiple I/O completion queues within the second address space, with each I/O completion queue receiving a unique PCIe address.
- the starting address of the second address space is (base address 200 + offset address 200).
- the process of the host assigning a PCIe address to each I / O completion queue can also be understood as the host mapping a continuous address in the base address register to the I / O completion queue.
- For example, (base address 200 + offset address 200) to (base address 200 + offset address 200 + MCC * 64) in the base address register Y are allocated to I/O completion queue 0.
- The PCIe address of I/O completion queue 0 is therefore (base address 200 + offset address 200).
- The PCIe address of I/O completion queue 1 is (base address 200 + offset address 200 + MCC * 64 * 1).
- (base address 200 + offset address 200 + MCC * 64 * M) to (base address 200 + offset address 200 + MCC * 64 * (M + 1)) in the base address register Y are allocated to I/O completion queue M, so the PCIe address of I/O completion queue M is (base address 200 + offset address 200 + MCC * 64 * M).
- the second address space in FIG. 4B is a segment of the address space in the base address register Y.
- The start address of the second address space can be the base address of the base address register Y, or an address offset from that base address (base address + offset address); the following embodiments of the present invention are described by taking the start address of the second address space shown in FIG. 4B as (base address + offset address) as an example.
- the PCIe address of the target I / O completion queue is recorded as the second PCIe address.
- the second PCIe address of the target I / O completion queue can be expressed as:
- PCIe address of the target I/O completion queue = (base address of the BAR + offset address) + MCC * 64 * identifier of the target I/O completion queue
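- Mirroring the submission-queue side, the second PCIe address is one computation; a small sketch with hypothetical base, offset, and MCC values:

```c
#include <stdint.h>
#include <stdio.h>

/* Second PCIe address of I/O completion queue 'qid' under the FIG. 4B
 * layout: each CQ owns an MCC*64-byte window inside the second space. */
static uint64_t cq_pcie_addr(uint64_t bar_base, uint64_t offset,
                             uint32_t mcc, uint32_t qid)
{
    return bar_base + offset + (uint64_t)mcc * 64 * qid;
}

int main(void)
{
    /* hypothetical values: base 0x2000, offset 0x200, MCC = 8 */
    for (uint32_t qid = 0; qid < 3; qid++)
        printf("CQ%u entry address: 0x%llx\n", qid,
               (unsigned long long)cq_pcie_addr(0x2000, 0x200, 8, qid));
    return 0;
}
```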
- the host may assign a PCIe address to the identifier of each I / O completion queue, and may notify the NVMe controller after the I / O completion queue is created.
- The NVMe controller stores the mapping relationship between the identifier of each I/O completion queue and the PCIe address allocated to each I/O completion queue.
- Optionally, the PCIe message communicated between the host and the NVMe controller includes an NVMe message header and payload data (not shown in FIG. 4B); the NVMe message header is used to record the fields added when the message is processed by the NVMe protocol layer.
- The payload data is used to carry one or more CQEs.
- The host stores the at least one CQE to the target I/O completion queue according to the entry information of the target I/O completion queue.
- the host obtains and parses the second PCIe message, determines the address information of the target I / O completion queue in the memory of the host according to the entry information of the target I / O completion queue, and then stores at least one CQE to the target I / O completion queue. For details, see steps S3171 to S3173 below.
- The host determines the identifier of the target I/O completion queue according to the entry information of the target I/O completion queue. Specifically, the identifier can be calculated using the following Formula 3:
- Identifier of the target I/O completion queue = (ADD21 - ADD22) / (MCC * 64)   (Formula 3)
- where ADD21 is the second PCIe address of the target I/O completion queue, and ADD22 is the start address of the continuous address space divided by the host in its addressable PCIe address space for identifying each I/O completion queue, for example, the start address of the second address space in FIG. 4B.
- Using Formula 3, the host can determine the identifier of the target I/O completion queue.
- the host determines the number of CQEs according to the size of the payload data in the second PCIe message, and determines the storage position of the CQEs in the target I / O completion queue.
- the host stores at least one CQE to the target I / O completion queue according to the determined storage location of the target I / O completion queue.
- After receiving the second PCIe message, the host parses the message content and obtains the payload data, then calculates the number of CQEs carried in the payload data according to the following Formula 4:
- Number of CQEs = payload data size in the second PCIe message / 64   (Formula 4)
- After the host determines the number of CQEs carried in the second PCIe message, it further determines the storage locations of the CQEs, that is, the slots for storing the CQEs in the target I/O completion queue. Specifically, the host records the position information of each I/O completion queue in the first memory, determines the position of the next available slot according to the position information indicated by the tail pointer of the target I/O completion queue at the current moment, and then stores the at least one CQE carried in the second PCIe message to the target I/O completion queue.
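- Steps S3171 to S3173 on the host side can be sketched together: recover the queue identifier with Formula 3, count the CQEs with Formula 4, and append them at the tail (the ring layout and names are illustrative assumptions).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CQE_SIZE 64

struct io_cq {
    uint8_t  *slots;   /* depth * CQE_SIZE bytes in the host's first memory */
    uint32_t  depth;
    uint32_t  tail;
};

/* Formula 3: identifier = (ADD21 - ADD22) / (MCC * 64). */
static uint32_t cq_id_from_addr(uint64_t add21, uint64_t add22, uint32_t mcc)
{
    return (uint32_t)((add21 - add22) / ((uint64_t)mcc * 64));
}

/* Formula 4 plus the tail-pointer store of step S3173. */
static void store_cqes(struct io_cq *cq, const uint8_t *payload,
                       size_t payload_size)
{
    size_t n = payload_size / CQE_SIZE;   /* Formula 4 */
    for (size_t i = 0; i < n; i++) {
        memcpy(cq->slots + (size_t)cq->tail * CQE_SIZE,
               payload + i * CQE_SIZE, CQE_SIZE);
        cq->tail = (cq->tail + 1) % cq->depth;
    }
}

int main(void)
{
    static uint8_t ring[8 * CQE_SIZE];
    static uint8_t payload[2 * CQE_SIZE];  /* a message carrying two CQEs */
    struct io_cq cq = { .slots = ring, .depth = 8, .tail = 0 };
    uint64_t base2 = 0x200000;             /* hypothetical second-space start */
    uint32_t mcc = 8;
    printf("cqid = %u\n",
           cq_id_from_addr(base2 + (uint64_t)mcc * 64 * 1, base2, mcc));
    store_cqes(&cq, payload, sizeof payload);
    printf("tail = %u\n", cq.tail);        /* 2 */
    return 0;
}
```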
- In the data processing process provided by this application, the host can store data operation requests to the target I/O submission queue by using the entry information of the I/O submission queue, and the NVMe controller can directly store one or more SQEs carried in the same PCIe message to the target I/O submission queue by using the entry information of the target I/O submission queue carried in the PCIe address structure.
- the NVMe controller can also store the operation result of the data operation request to the target I / O completion queue by using the entry information of the I / O completion queue.
- the technical solution provided in the present application cancels the doorbell and interruption mechanism, and stores the SQE and CQE based on the entry information to simplify the data processing process.
- Since the host or the NVMe controller can aggregate multiple entries as the payload data of one PCIe message, the host or the NVMe controller can push multiple data operation requests or operation results at once, reducing the number of communication messages between the host and the NVMe controller and improving the communication efficiency between the two.
- In addition, the locking range in which the NVMe controller stores SQEs to the target I/O submission queue is simplified, which further simplifies the traditional NVMe data processing process and reduces the lock time and the data processing time.
- It should be understood that the sequence numbers of the above processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
- Optionally, the host may also use discontinuous storage intervals in the base address register to represent individual I/O completion queues.
- the mapping relationship between the address allocated by each I / O completion queue and the identifier of the I / O completion queue is recorded in the host and the NVMe controller, and the NVMe controller may send a second PCIe packet to the host according to the mapping relationship.
- the second PCIe message includes a PCIe address allocated by the host for the target I / O completion queue, and the PCIe address is used as entry information of the target I / O completion queue.
- The host can parse the second PCIe message to obtain the PCIe address field allocated for the target I/O completion queue, determine the identifier of the I/O completion queue corresponding to that PCIe address field according to the mapping relationship, and then store the at least one CQE in the second PCIe message to the target I/O completion queue according to the identifier of the I/O completion queue. This completes the process in which the host stores at least one CQE based on the entry information of the target I/O completion queue.
- the above process can also cancel the interrupt mechanism in the traditional technology and simplify the data processing process.
- The NVMe controller can send multiple CQEs to the host in the same PCIe message in a push manner, using the CQE aggregation method of the method shown in FIG. 3, reducing the number of communication messages between the host and the NVMe controller and improving data processing efficiency.
- It should be noted that the host-addressable PCIe address space includes the address space of the host's memory and the address space of the PCIe base address registers addressable by the host.
- the address space of the base address register is used to map the identification of each I / O completion queue.
- In addition to using the address space of the base address register to map the identifiers of the I/O completion queues, an address in the memory address space of the host can be allocated to each I/O completion queue, and the identifier of each I/O completion queue corresponds to a unique address.
- the mapping relationship between the address in the memory address space and the identifier of the I / O completion queue is stored in the host and the NVMe controller.
- The NVMe controller can send a PCIe message carrying at least one CQE to the host according to the mapping relationship, and the host can likewise determine the identifier of the corresponding I/O completion queue from the mapping address carried in the PCIe message according to the mapping relationship, and then store the at least one CQE to the target I/O completion queue.
- the interruption mechanism in the conventional technology can also be eliminated, and the data processing process can be simplified.
- In addition, using the same PCIe message to send multiple CQEs to the host can also reduce the number of communication messages between the host and the NVMe controller and improve data processing efficiency.
- Optionally, the NVMe controller can also transmit the identifier of the target I/O completion queue to the host through a designated field in the PCIe message, or through a part of the payload data, according to a predetermined agreement. The host then parses the PCIe message, obtains the specified field (for example, a reserved field in the PCIe message, or the payload data start bit), determines the identifier of the target I/O completion queue carried in it according to the predetermined agreement, and stores the operation result to that I/O completion queue.
- the host records the correspondence between the I / O submission queue and the I / O completion queue.
- the second message may also directly carry the identity of the target I / O submission queue.
- The host may obtain the identifier of the target I/O submission queue from the second message, determine the target I/O completion queue according to the correspondence between the I/O submission queue and the I/O completion queue, and then store the at least one CQE carried in the second message to the target I/O completion queue.
- the solution can be mixed with the doorbell mechanism and the interrupt mechanism in the prior art.
- For example, when the host sends SQEs to the NVMe controller, the technical solution of storing the SQEs based on the entry information of the I/O submission queue in steps S310 to S313 may be adopted, while when the NVMe controller sends CQEs to the host, the interrupt mechanism of the traditional technical solution is still used: the NVMe controller sends an interrupt signal to the host, then sends the CQE to the host through a direct memory access (DMA) write, and the host stores the CQE to the target I/O completion queue.
- Alternatively, the host uses the traditional doorbell mechanism when sending SQEs to the NVMe controller, while when the NVMe controller sends CQEs to the host, the technical solution of storing the CQEs based on the entry information of the I/O completion queue in steps S314 to S317 is adopted.
- the above method can also simplify the data processing process and improve the data processing efficiency to a certain extent.
- the data processing method provided by the embodiment of the present invention is described in detail above with reference to FIG. 1 to FIG. 4B.
- The NVMe controller, host, and storage system for data processing provided by the embodiments of the present invention are described below with reference to FIG. 5 to FIG. 7.
- FIG. 5 is a schematic structural diagram of an NVMe controller 500 according to an embodiment of the present invention. As shown in the figure, the NVMe controller 500 includes a receiving unit 501 and a processing unit 502;
- The receiving unit 501 is configured to receive a first PCIe message sent by the host, where at least one I/O submission queue is set in the memory of the NVMe controller, the first PCIe message includes the entry information of a target I/O submission queue and at least one submission queue entry SQE, one SQE corresponds to one data operation request, and each data operation request is used to perform a read operation or a write operation on the storage medium managed by the NVMe controller;
- the processing unit 502 is configured to save the at least one SQE to the target I / O submission queue according to the entry information of the target I / O submission queue.
- Optionally, the entry information of the target I/O submission queue is a first PCIe address that is unique in the host-addressable PCIe address space; the processing unit 502 is further configured to determine a second address according to the first PCIe address, where the second address is the address of the target I/O submission queue in the memory of the NVMe controller 500, and to store the at least one SQE to the target I/O submission queue according to the second address.
- Optionally, the processing unit 502 is further configured to determine the identifier of the target I/O submission queue according to the first PCIe address of the target I/O submission queue, and to determine the second address according to the identifier of the target I/O submission queue.
- Optionally, the processing unit 502 is further configured to calculate the identifier of the target I/O submission queue according to the following formula: identifier of the target I/O submission queue = (ADD11 - ADD21) / (MCS * 64), where ADD11 is the first PCIe address of the target I/O submission queue, ADD21 is the start address of the continuous address space divided by the host in the addressable PCIe address space for identifying each I/O submission queue, and MCS is the maximum number of aggregated SQEs in each I/O submission queue.
- Optionally, the processing unit 502 is further configured to negotiate with the host the maximum number of aggregated SQEs (MCS) in each I/O submission queue before the receiving unit 501 receives the creation instruction of the host, where the negotiated MCS is the smaller of the maximum number of aggregated SQEs in each I/O submission queue supported by the NVMe controller and the maximum number of aggregated SQEs in each I/O submission queue supported by the host.
- Optionally, the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message, and 1 ≤ M ≤ MCS;
- the processing unit 502 is further configured to determine a preset order of M SQEs, and save the M SQEs to the target I / O submission queue according to the preset order of the M SQEs.
- At least one I / O completion queue is set in the memory of the host, and the NVMe controller further includes a sending unit 503;
- The processing unit 502 is further configured to obtain the at least one SQE from the target I/O submission queue, and to read or write the storage medium managed by the NVMe controller according to the data operation request carried in the at least one SQE;
- The sending unit 503 is configured to send a second PCIe message to the host, where the second PCIe message includes the entry information of a target I/O completion queue and at least one completion queue entry CQE, and each CQE is the operation result of the NVMe controller executing the data operation request carried in each SQE.
- Optionally, the processing unit 502 is further configured to: when the CQE aggregation conditions are satisfied, use the operation results of at least two data operation requests as the payload data of the second PCIe message, where one CQE corresponds to the operation result of one data operation request; the CQE aggregation conditions include the maximum aggregated CQE size MCCB meeting a first threshold, or the duration recorded by the I/O completion queue aggregation timer CQT meeting a second threshold.
- Optionally, the second message further includes depth information N of the target I/O completion queue, where N indicates the number of CQEs carried in the second PCIe message; the processing unit 502 is further configured to use the N CQEs as the payload data of the second PCIe message in a preset order, where 1 ≤ N ≤ MCC, and the MCC is a positive integer.
- Optionally, the host is an embedded processor, and the embedded processor supports sending a PCIe message with a capacity of at least 64 bytes.
- It should be understood that the NVMe controller 500 in the embodiment of the present invention may be implemented through an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
- the NVMe controller 500 and its various modules may also be software modules.
- The NVMe controller 500 may correspondingly execute the methods described in the embodiments of the present invention, and the above and other operations and/or functions of the units in the NVMe controller 500 are respectively intended to implement the corresponding processes performed by the NVMe controller in the methods of FIG. 2 and FIG. 3; for brevity, details are not repeated here.
- FIG. 6 is a schematic structural diagram of a host 600 according to an embodiment of the present invention. As shown in the figure, the host 600 includes a processing unit 601, a sending unit 602, and a receiving unit 603.
- the processing unit 601 is configured to determine entry information of the target I / O submission queue according to an identifier of a target input / output I / O submission queue of a data operation request to be sent.
- the sending unit 602 is configured to send a first PCIe message to the NVMe controller, where the first PCIe message includes entry information of the target I / O submission queue and at least one submission queue entry SQE, One SQE corresponds to one data operation request, and each data operation request is used to perform a read operation or a write operation on a storage medium managed by the NVMe controller.
- Optionally, the processing unit 601 is further configured to allocate, for each I/O submission queue, a first PCIe address that is unique in the host-addressable PCIe address space, to determine the first PCIe address of the target I/O submission queue according to the identifier of the target I/O submission queue, and to use the first PCIe address of the target I/O submission queue as the entry information of the target I/O submission queue.
- the sending unit 602 is further configured to send a creation instruction to the NVMe controller, where the creation instruction is used to instruct the NVMe controller to set the at least one I in a memory of the NVMe controller. / O submission queue, and record the association between the identifier of each I / O submission queue and the first PCIe address of each I / O submission queue.
- Optionally, before the sending unit 602 sends the creation instruction to the NVMe controller, the processing unit 601 negotiates with the NVMe controller the maximum number of aggregated SQEs (MCS) in each I/O submission queue, where the negotiated MCS is the smaller of the maximum number of aggregated SQEs in each I/O submission queue supported by the NVMe controller and the maximum number of aggregated SQEs in each I/O submission queue supported by the host.
- Optionally, the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message, and 1 ≤ M ≤ MCS.
- Optionally, the processing unit 601 is further configured to: when the SQE aggregation conditions are satisfied, use at least two data operation requests as the payload data of the first PCIe message, where one data operation request corresponds to one SQE and each data operation request is used to perform a read or write operation on the storage medium managed by the NVMe controller; the SQE aggregation conditions include the maximum aggregatable SQE size MCSB meeting a third threshold, or the duration recorded by the I/O submission queue aggregation timer SQT meeting a fourth threshold.
- the host 600 further includes a receiving unit 603, configured to receive a second PCIe message sent by the NVMe controller, where the second PCIe message includes entry information of a target I / O completion queue and at least one Completing the queue entry CQE, each CQE is an operation result of the NVMe controller performing the data operation request carried in each SQE;
- The processing unit 601 is further configured to store the at least one completion queue entry CQE to the target I/O completion queue according to the entry information of the target I/O completion queue.
- the entry information of the target I / O completion queue is a unique second PCIe address in the host-addressable PCIe address space
- The processing unit 601 is further configured to determine a third address according to the second PCIe address, where the third address is the address of the target I/O completion queue in the memory of the host, and to store the at least one CQE to the target I/O completion queue according to the third address.
- Optionally, the determining, by the processing unit 601, of the third address according to the second PCIe address includes: determining the identifier of the target I/O completion queue according to the second PCIe address, and determining the third address according to the identifier of the target I/O completion queue.
- Optionally, the processing unit 601 is further configured to calculate the identifier of the target I/O completion queue according to the following formula: identifier of the target I/O completion queue = (ADD12 - ADD22) / (MCC * 64), where ADD12 is the second PCIe address of the target I/O completion queue, ADD22 is the start address of the continuous address space divided by the host in its addressable PCIe address space for identifying each I/O completion queue, and MCC is the maximum number of aggregated CQEs in each I/O completion queue;
- the processing unit 601 is further configured to store the at least one CQE to a memory of the host according to an identifier of the target I / O completion queue.
- Optionally, the host is an embedded processor, and the embedded processor supports sending a PCIe message with a capacity of at least 64 bytes.
- It should be understood that the host 600 in the embodiment of the present invention may be implemented through an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
- The host 600 may correspondingly execute the methods described in the embodiments of the present invention, and the above and other operations and/or functions of the units in the host 600 are respectively intended to implement the corresponding processes performed by the host in the methods of FIG. 2 and FIG. 3; for brevity, details are not repeated here.
- FIG. 7 is a schematic diagram of a storage system 700 according to an embodiment of the present invention.
- The storage system 700 includes a host 701, an NVMe controller 702, a first memory 703, a second memory 704, and a bus 705.
- The host 701, the NVMe controller 702, the first memory 703, and the second memory 704 communicate through the bus 705; communication can also be achieved through other means such as wireless transmission.
- the first memory 703 is used by the host 701 to implement data storage of the I / O completion queue.
- the second memory 704 is used by the NVMe controller 702 to implement data storage of the I / O submission queue.
- the host is configured to send a first PCIe message to the NVMe controller;
- the first PCIe message includes entry information of a target I / O submission queue and at least one submission queue entry SQE;
- the NVMe controller is configured to receive the first PCIe message sent by the host; and save the at least one SQE to the target I / O submission according to the entry information of the target I / O submission queue A queue; wherein at least one input / output I / O submission queue is set in a memory of the NVMe controller.
- the host 701 is further configured to allocate, to each I/O submission queue, a first PCIe address that is unique in the host-addressable PCIe address space, where the entry information of the target I/O submission queue is the first PCIe address of the target I/O submission queue;
- the NVMe controller 702 is further configured to determine a second address according to the first PCIe address, where the second address is the address at which the target I/O submission queue is stored in a memory of the NVMe controller, and to store the at least one SQE to the target I/O submission queue according to the second address.
- the NVMe controller 702 is further configured to determine the identifier of the target I/O submission queue according to the first PCIe address of the target I/O submission queue, and determine the second address according to the identifier of the target I/O submission queue.
- the NVMe controller 702 is further configured to calculate the identifier of the target I/O submission queue according to the following formula:
  identifier of the target I/O submission queue = (ADD11 - ADD21) / (MCS * 64)
  where ADD11 is the first PCIe address of the target I/O submission queue, ADD21 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O submission queues, and MCS is the maximum number of SQEs that can be aggregated in each I/O submission queue.
- the NVMe controller 702 is further configured to receive a creation instruction from the host 701 before receiving the first PCIe message of the host 701, set the at least one I/O submission queue in the second memory 704 according to the creation instruction, and record the association between the identifier of each I/O submission queue and the address information of each I/O submission queue in the memory of the NVMe controller.
- the host 701 is configured to negotiate with the NVMe controller 702 the maximum number of aggregatable SQEs MCS in each I/O submission queue; the negotiated MCS is the smaller of the maximum number of aggregatable SQEs per I/O submission queue supported by the NVMe controller and that supported by the host.
- the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message, 1 ≤ M ≤ MCS;
- the NVMe controller 702 is further configured to determine a preset order of the M SQEs, and save the M SQEs to the target I/O submission queue according to the preset order of the M SQEs.
- At least one I/O completion queue is set in the first memory of the host 701;
- the NVMe controller 702 is further configured to obtain the at least one SQE from the target I/O submission queue, perform read or write operations on the storage medium managed by the NVMe controller 702 according to the data operation requests carried in the at least one SQE, and send a second PCIe message to the host 701, where the second PCIe message includes entry information of the target I/O completion queue and at least one completion queue entry CQE, one CQE corresponding to the operation result of one data operation request;
- the host 701 is further configured to receive the second PCIe message sent by the NVMe controller 702, and store the at least one completion queue entry CQE to the target I/O completion queue according to the entry information of the target I/O completion queue.
- the entry information of the target I/O completion queue is a second PCIe address that is unique in the host-addressable PCIe address space;
- the host 701 is further configured to determine a third address according to the second PCIe address, where the third address is the address at which the target I/O completion queue is stored in a memory of the host, and to store the at least one CQE to the target I/O completion queue according to the third address.
- the host 701 is further configured to determine the identifier of the target I/O completion queue according to the second PCIe address, and determine the third address according to the identifier of the target I/O completion queue.
- the host 701 is further configured to calculate the identifier of the target I/O completion queue according to the following formula:
  identifier of the target I/O completion queue = (ADD12 - ADD22) / (MCC * 64)
  where ADD12 is the second PCIe address of the target I/O completion queue, ADD22 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O completion queues, and MCC is the maximum number of CQEs that can be aggregated in each I/O completion queue;
- the host 701 is further configured to store the at least one CQE to a memory of the host according to the identifier of the target I/O completion queue.
- the NVMe controller 702 is further configured to use the operation results of at least two data operation requests as the payload data of the second PCIe message when a CQE aggregation condition is satisfied, where one CQE corresponds to the operation result of one data operation request.
- the CQE aggregation condition includes the maximum aggregatable CQE size MCCB meeting a third threshold, or the duration recorded by the I/O completion queue aggregation timer CQT meeting a fourth threshold.
- the second message further includes depth information N of the target I/O completion queue, where N indicates the number of CQEs carried in the second PCIe message;
- the NVMe controller 702 is further configured to use N CQEs as payload data of the second PCIe message according to a preset order, where 1 ≤ N ≤ MCC, and the MCC is a positive integer.
- the host 701 is an embedded processor, and the embedded processor supports sending PCIe messages with a payload of at least 64 bytes.
- the host 701 may be a CPU, and the host 701 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- a general-purpose processor may be a microprocessor or any conventional processor.
- the first memory 703 may include a read-only memory and a random access memory, and may also include a non-volatile random access memory.
- the second memory 704 may include a read-only memory and a random access memory, and may also include a non-volatile random access memory.
- in addition to a data bus, the bus 705 may also include a power bus, a control bus, and a status signal bus; for the sake of clarity, however, the various buses are all marked as the bus 705 in the figure.
- the storage system 700 corresponds to the storage system 100 shown in FIG. 1 according to the embodiment of the present invention.
- the storage system 700 is configured to implement the corresponding procedures of the methods shown in FIG. 2 and FIG. 3, which are not repeated here for brevity.
- the data processing storage system 700 may correspond to the NVMe controller 500 and the host 600 for data processing in the embodiments of the present invention, and may correspond to the subjects performing the methods shown in FIG. 2 and FIG. 3 in the embodiments of the present invention; the foregoing and other operations and/or functions of the modules in the storage system 700 respectively implement the corresponding procedures of the methods in FIG. 2 to FIG. 3, which are not repeated here for brevity.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) means.
- the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium.
- the semiconductor medium may be a solid state drive (SSD).
- the disclosed systems, devices, and methods may be implemented in other ways.
- the device embodiments described above are only schematic.
- the division of the unit is only a logical function division.
- multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
Abstract
This application provides a data processing method and system. The method includes: an NVMe controller receives a first PCIe message sent by a host, where at least one input/output (I/O) submission queue is set in a memory of the NVMe controller, and the first PCIe message includes entry information of a target I/O submission queue and at least one submission queue entry (SQE); and the NVMe controller saves the at least one SQE to the target I/O submission queue according to the entry information of the target I/O submission queue. This simplifies the NVMe data processing procedure, reduces the time it takes, and improves data processing efficiency.
Description
This application relates to the storage field, and in particular, to a data processing method and storage system.

With the development of storage technologies, especially in storage devices that use a solid state drive (SSD) as the storage medium, the serial advanced technology attachment (SATA) interface and the serial ATA advanced host controller interface (AHCI) standard designed for conventional mechanical hard disks can no longer keep up with the ever-improving performance of SSDs, and have become a major bottleneck limiting SSD processing capability. Non-volatile memory express (NVMe) emerged as a result. NVMe is an interface that allows a host to communicate with a non-volatile memory (NVM) subsystem (including a controller and at least one SSD); it is attached, as a register interface, on top of the Peripheral Component Interconnect express (PCIe) interface, and is optimized for enterprise-grade and consumer-grade solid-state storage, offering high performance and low access latency.

In NVMe data processing, the host creates input/output (I/O) submission queues and I/O completion queues in its memory, and NVMe completes data processing based on a mechanism of paired I/O submission queues and I/O completion queues. An I/O submission queue is a ring buffer that stores one or more data operation requests to be executed by the controller; an I/O completion queue is a ring buffer that stores the operation results of data operation requests the controller has completed. Each I/O submission queue corresponds to one I/O completion queue, and one I/O completion queue may correspond to multiple I/O submission queues; the host specifies the matching relationship, and the operation results of the pending requests in each I/O submission queue are stored to a specified I/O completion queue. The concrete NVMe data processing procedure is as follows: when the host receives one or more pending data operation requests, it first stores them to an I/O submission queue; the host then updates the submission-queue tail doorbell register (located in a storage area of the NVMe controller) to notify the NVMe controller of the pending requests; the NVMe controller fetches the pending requests from the I/O submission queue by direct memory access (DMA) reads; after processing the requests, the NVMe controller stores the operation results, by DMA writes, to the I/O completion queue that matches the I/O submission queue from which the requests were fetched. Each time the NVMe controller stores an operation result to an I/O completion queue, it first sends an interrupt request to the host to announce a completed data operation request. This procedure requires the host and the NVMe controller to notify each other through the doorbell mechanism and interrupts, making the data processing procedure complex.
SUMMARY OF THE INVENTION

This application provides a data processing method, apparatus, and storage system, which can solve the problem of complex data processing procedures in the conventional technology.
According to a first aspect, this application provides a data processing method. An NVMe controller communicates with a host over the PCIe bus. The NVMe controller receives a first PCIe message sent by the host, and saves at least one submission queue entry (SQE) to a target I/O submission queue according to entry information of the target I/O submission queue. At least one input/output I/O submission queue is set in a memory of the NVMe controller; the first PCIe message includes the entry information of the target I/O submission queue and the at least one SQE; one SQE corresponds to one data operation request, and each data operation request is used to perform a read or write operation on the storage medium managed by the NVMe controller.

In this application, since the data storage of the I/O submission queues is implemented on the NVMe controller side, the host sends SQEs directly to the NVMe controller in PCIe messages and does not perceive the data structure or storage location of the I/O submission queues. The host and the NVMe controller communicate based on the entry information of the target I/O submission queue, and the NVMe controller stores the SQEs according to that entry information. The doorbell mechanism of the conventional technology can therefore be eliminated, simplifying the data processing procedure.
In a possible implementation, the entry information of the target I/O submission queue is a first PCIe address that is unique in the host-addressable PCIe address space. The NVMe controller saves the at least one SQE to the target I/O submission queue by: determining a second address according to the first PCIe address, where the second address is the address at which the target I/O submission queue is stored in the memory of the NVMe controller; and storing the at least one SQE to the target I/O submission queue according to the second address.

In a possible implementation, the NVMe controller determines the second address by first determining the identifier of the target I/O submission queue according to the first PCIe address of the target I/O submission queue, and then determining the second address according to that identifier.

In this application, PCIe addresses in the host-addressable PCIe address space are used to mark the I/O submission queues; each I/O submission queue is allocated one PCIe address, which serves as the queue's entry information, and the NVMe controller can store at least one SQE to the target I/O submission queue based on that PCIe address.
In another possible implementation, the NVMe controller calculates the identifier of the target I/O submission queue according to the following formula:

identifier of the target I/O submission queue = (ADD11 - ADD21) / (MCS * 64)

where ADD11 is the first PCIe address of the target I/O submission queue, ADD21 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O submission queues, and MCS is the maximum number of SQEs that can be aggregated in each I/O submission queue. Through this calculation, the NVMe controller can determine the identifier of the target I/O submission queue and the location where that queue is stored in the memory of the NVMe controller, and then store the at least one SQE to the target I/O submission queue.
In another possible implementation, before receiving the first PCIe message of the host, the NVMe controller receives a creation instruction from the host, sets at least one I/O submission queue in the memory of the NVMe controller according to the creation instruction, and records the association between the identifier of each I/O submission queue and the address information of each I/O submission queue in the memory of the NVMe controller. In this way, the NVMe controller can create at least one I/O submission queue according to service requirements and implement data storage in the I/O submission queues.

In another possible implementation, before receiving the creation instruction, the NVMe controller negotiates with the host the maximum number of aggregatable SQEs (MCS) per I/O submission queue; the negotiated MCS is the smaller of the MCS supported by the NVMe controller and the MCS supported by the host. By negotiating the MCS, the two sides determine the maximum number of SQEs the host may push at a time, that is, the maximum number of SQEs one PCIe message may carry. This push mode reduces the number of messages between the host and the NVMe controller and improves data processing efficiency.

In another possible implementation, before receiving the creation instruction, the NVMe controller negotiates with the host the maximum number of aggregatable CQEs (MCC) per I/O completion queue; the negotiated MCC is the smaller of the MCC supported by the NVMe controller and the MCC supported by the host. The MCC determines the maximum number of CQEs the NVMe controller may push at a time, that is, the maximum number of CQEs one PCIe message may carry, which likewise reduces the number of messages and improves efficiency.

Optionally, the MCC negotiation may be performed together with the MCS negotiation, that is, the negotiation request message sent by the host covers both MCS and MCC, and the response returned by the NVMe controller also carries both the confirmed MCS and MCC. Alternatively, the two negotiations may be performed separately, with different negotiation request messages used to confirm the MCC.

In another possible implementation, the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message and 1 ≤ M ≤ MCS. The NVMe controller saves the at least one SQE by determining a preset order of the M SQEs and saving the M SQEs to the target I/O submission queue in that order, namely the order in which the host received the data operation requests. This guarantees that the order of the SQEs stored in the I/O submission queue in the NVMe controller's memory is consistent with the order in which the host received the requests.
In another possible implementation, at least one I/O completion queue is set in the memory of the host. The NVMe controller obtains the at least one SQE from the target I/O submission queue, performs read or write operations on the storage medium managed by the NVMe controller according to the data operation requests carried in the at least one SQE, and sends a second PCIe message to the host, where the second PCIe message includes entry information of a target I/O completion queue and at least one completion queue entry (CQE), each CQE being the operation result of the data operation request carried in one SQE. In this way, the NVMe controller likewise stores at least one CQE to the target I/O completion queue based on that queue's entry information, eliminating the interrupt mechanism of the conventional solution, simplifying data processing, and improving processing efficiency.

In another possible implementation, when a CQE aggregation condition is met, the NVMe controller uses the operation results of at least two data operation requests as the payload data of the second PCIe message, one CQE corresponding to the operation result of one data operation request. The CQE aggregation condition includes: the maximum aggregatable CQE size (MCCB) meets a first threshold, or the duration recorded by the I/O completion-queue aggregation timer (CQT) meets a second threshold. The NVMe controller can thus push multiple CQEs to the host in the same PCIe message, reducing the number of messages between the host and the NVMe controller and improving processing efficiency.

In another possible implementation, the second message further includes depth information N of the target I/O completion queue, where N indicates the number of CQEs carried in the second PCIe message; the NVMe controller places the N CQEs in the payload data of the second PCIe message in a preset order, where 1 ≤ N ≤ MCC and the MCC is a positive integer. The preset order of the N CQEs is the order in which the NVMe controller generated the operation results of the corresponding SQEs; this order in the second PCIe message guarantees that the host stores the operation results in the order in which they were actually generated.

In another possible implementation, the host is an embedded processor that supports sending PCIe messages with a payload of at least 64 bytes.
As a possible embodiment, instead of allocating PCIe addresses to multiple I/O submission queues from one contiguous first address space in the base address register, the host may use several non-contiguous storage intervals in the base address register to denote the I/O submission queues. In this case, the host and the NVMe controller record the mapping between the address allocated to each I/O submission queue and the identifier of that queue. The host sends the first PCIe message according to this mapping; the first PCIe message includes the PCIe address that the host allocated to the target I/O submission queue, which serves as the queue's entry information. The NVMe controller parses the first PCIe message to obtain that PCIe address field, determines the corresponding submission-queue identifier from the mapping, and then stores the at least one SQE in the first PCIe message to the target I/O submission queue according to the identifier. This completes the process in which the NVMe controller stores at least one SQE based on the entry information of the target I/O submission queue; it likewise avoids the doorbell mechanism of the conventional technology and simplifies data processing. Moreover, the host can push multiple SQEs to the NVMe controller in the same PCIe message under the SQE aggregation condition, reducing the number of messages between the host and the NVMe controller and improving data processing efficiency.

As another possible embodiment, the host-addressable PCIe address space includes the address space of the host's memory and the address space of the PCIe base address registers in the host. Besides mapping the identifiers of the I/O submission queues to the address space of a base address register as in the foregoing steps, the embodiments of the present invention may also map the identifiers of the I/O submission queues to addresses in the host's memory address space, each queue identifier corresponding to a unique PCIe address. In this case, the host stores the mapping between addresses in the memory address space and the identifiers of the I/O submission queues; the host sends PCIe messages carrying at least one SQE according to this mapping, and the NVMe controller uses the same mapping to determine, from the mapped address carried in a PCIe message, the identifier of the corresponding I/O submission queue, and then stores the at least one SQE to the target I/O submission queue. This method also eliminates the doorbell mechanism of the conventional technology and simplifies data processing; combined with the SQE aggregation condition, sending multiple SQEs in the same PCIe message also reduces the number of messages between the host and the NVMe controller and improves data processing efficiency.

As another possible embodiment, besides mapping queue identifiers to addresses based on the PCIe protocol, the host and the NVMe controller may agree in advance to carry the identifier of the target I/O submission queue in a designated field of the PCIe message or in part of the payload data (for example, a reserved field of the PCIe message, or the start of the payload data); the NVMe controller parses the PCIe message, obtains the designated field, and determines the submission-queue identifier it denotes according to the prior agreement. This also eliminates the doorbell mechanism of the conventional technology and simplifies data processing.
According to a second aspect, this application provides a data processing method. An NVMe controller communicates with a host over the PCIe bus. The host determines the entry information of a target input/output I/O submission queue according to the identifier of the target I/O submission queue of a data operation request to be sent, and sends a first PCIe message to the NVMe controller. At least one I/O submission queue is set in the memory of the NVMe controller; the first PCIe message includes the entry information of the target I/O submission queue and at least one submission queue entry SQE; one SQE corresponds to one data operation request, and each data operation request is used to perform a read or write operation on the storage medium managed by the NVMe controller.

In a possible implementation, the host allocates to each I/O submission queue a PCIe address that is unique in the host-addressable PCIe address space; the entry information of the target I/O submission queue is the first PCIe address of the target I/O submission queue; the host-addressable PCIe address space is the storage space of the host's memory or the address space in a PCIe base address register of the host; and the host determines the first PCIe address of the target I/O submission queue according to the identifier of the target I/O submission queue.

In another possible implementation, the host sends a creation instruction to the NVMe controller, the creation instruction instructing the NVMe controller to set at least one I/O submission queue in the memory of the NVMe controller and to record the association between the identifier of each I/O submission queue and the first PCIe address of each I/O submission queue.

In another possible implementation, before sending the creation instruction, the host negotiates with the NVMe controller the maximum number of aggregatable SQEs (MCS) per I/O submission queue; the negotiated MCS is the smaller of the MCS supported by the NVMe controller and the MCS supported by the host.

In another possible implementation, the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message and 1 ≤ M ≤ MCS.

In another possible implementation, before sending the first PCIe message, when an SQE aggregation condition is met the host uses at least two SQEs as the payload data of the first PCIe message, one data operation request corresponding to one SQE, each data operation request being used to perform a read or write operation on the storage medium managed by the NVMe controller. The SQE aggregation condition includes: the maximum aggregatable SQE size (MCSB) meets a third threshold, or the duration recorded by the I/O submission-queue aggregation timer (SQT) meets a fourth threshold.

In another possible implementation, the host receives a second PCIe message sent by the NVMe controller, where the second PCIe message includes entry information of a target I/O completion queue and at least one completion queue entry CQE, each CQE being the operation result of the data operation request carried in one SQE; the host stores the at least one completion queue entry to the target I/O completion queue according to the entry information of the target I/O completion queue.
In another possible implementation, the entry information of the target I/O completion queue is a second PCIe address that is unique in the host-addressable PCIe address space. The host stores the at least one CQE to the target I/O completion queue by: first determining a third address according to the second PCIe address, the third address being the address at which the target I/O completion queue is stored in the host's memory; and then storing the at least one CQE to the target I/O completion queue according to the third address.

In another possible implementation, the host determines the third address by first determining the identifier of the target I/O completion queue according to the second PCIe address, and then determining the third address according to the identifier of the target I/O completion queue.
In another possible implementation, the host calculates the identifier of the target I/O completion queue according to the following formula:

identifier of the target I/O completion queue = (ADD12 - ADD22) / (MCC * 64)

where ADD12 is the second PCIe address of the target I/O completion queue, ADD22 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O completion queues, and MCC is the maximum number of CQEs that can be aggregated in each I/O completion queue.

The host stores the at least one CQE to the host's memory according to the identifier of the target I/O completion queue.

In another possible implementation, the host is an embedded processor that supports sending PCIe messages with a payload of at least 64 bytes.
As a possible embodiment, instead of allocating PCIe addresses to multiple I/O completion queues from one contiguous first address space in the base address register, the host may use several non-contiguous storage intervals in the base address register to denote the I/O completion queues. In this case, the host and the NVMe controller record the mapping between the address allocated to each I/O completion queue and the identifier of that queue. The NVMe controller sends the second PCIe message according to this mapping; the second PCIe message includes the PCIe address that the host allocated to the target I/O completion queue, which serves as the queue's entry information. The host parses the second PCIe message to obtain that PCIe address field, determines the corresponding completion-queue identifier from the mapping, and then stores the at least one CQE in the second PCIe message to the target I/O completion queue according to the identifier. This completes the process in which the host stores at least one CQE based on the entry information of the target I/O completion queue; it also eliminates the interrupt mechanism of the conventional technology and simplifies data processing. Moreover, the NVMe controller can push multiple CQEs to the host in the same PCIe message using the CQE aggregation of the method shown in FIG. 3, reducing the number of messages between the host and the NVMe controller and improving data processing efficiency.

As another possible embodiment, the host-addressable PCIe address space includes the address space of the host's memory and the address space of the PCIe base address registers in the host. Besides mapping the identifiers of the I/O completion queues to the address space of a base address register as in the foregoing steps, the embodiments of the present invention may also map the identifiers of the I/O completion queues to addresses in the host's memory address space, each completion-queue identifier corresponding to a unique address. In this case, the host and the NVMe controller store the mapping between addresses in the memory address space and the identifiers of the I/O completion queues; the NVMe controller sends PCIe messages carrying at least one CQE according to this mapping, and the host uses the same mapping to determine, from the mapped address carried in a PCIe message, the identifier of the corresponding I/O completion queue, and then stores the at least one CQE to the target I/O completion queue. This also eliminates the interrupt mechanism of the conventional technology and simplifies data processing; combined with the CQE aggregation shown in FIG. 3, sending multiple CQEs in the same PCIe message also reduces the number of messages and improves data processing efficiency.

As another possible implementation, besides identifying the target I/O completion queue with an address based on the PCIe standard, the NVMe controller may, by prior agreement, carry the identifier of the target I/O completion queue in a designated field of the PCIe message or in part of the payload data; the host parses the PCIe message using the designated field (for example, a reserved field of the PCIe message, or the start of the payload data) according to the prior agreement, obtains the identifier of the target I/O completion queue carried in the message, and stores the operation results to the I/O completion queue.

As another possible implementation, the host records the correspondence between I/O submission queues and I/O completion queues. Accordingly, the second message may also directly carry the identifier of the target I/O submission queue; after receiving the second message, the host obtains that identifier, determines the target I/O completion queue from the correspondence between I/O submission queues and I/O completion queues, and then stores the at least one CQE carried in the second message to the target I/O completion queue.
According to a third aspect, this application provides a data processing method in which an NVMe controller communicates with a host over the PCIe bus: the NVMe controller receives a first PCIe message sent by the host and saves at least one SQE to a target I/O submission queue according to the entry information of the target I/O submission queue. At least one input/output I/O submission queue is set in the memory of the NVMe controller; the first PCIe message includes the entry information of the target I/O submission queue and at least one submission queue entry SQE; one SQE corresponds to one data operation request, and each data operation request is used to perform a read or write operation on the storage medium managed by the NVMe controller.

In this application, because the data storage of the I/O submission queues is implemented on the NVMe controller side, the host sends SQEs directly to the NVMe controller in PCIe messages without perceiving the data structure or storage location of the I/O submission queues; the host and the NVMe controller communicate based on the entry information of the target I/O submission queue, according to which the NVMe controller stores the SQEs. The doorbell mechanism of the conventional technology can thus be eliminated and the data processing procedure simplified. In a possible implementation, the entry information of the target I/O submission queue is a first PCIe address unique in the host-addressable PCIe address space; the NVMe controller determines a second address according to the first PCIe address, the second address being the address at which the target I/O submission queue is stored in the memory of the NVMe controller, and stores the at least one SQE to the target I/O submission queue according to the second address. In a possible implementation, the NVMe controller determines the second address by first determining the queue identifier from the first PCIe address of the target I/O submission queue. In this application, a segment of addresses in the host-addressable PCIe address space marks the I/O submission queues; each queue is allocated one PCIe address, which serves as its entry information.

In another possible implementation, the NVMe controller calculates the identifier of the target I/O submission queue according to the following formula:

identifier of the target I/O submission queue = (ADD11 - ADD21) / (MCS * 64)

where ADD11 is the first PCIe address of the target I/O submission queue, ADD21 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O submission queues, and MCS is the maximum number of SQEs that can be aggregated in each I/O submission queue. Through this calculation, the NVMe controller can determine the identifier of the target I/O submission queue, determine from the identifier where that queue is stored in the NVMe controller, and then store the at least one SQE to the target I/O submission queue.

In further possible implementations, as in the first aspect: before receiving the first PCIe message, the NVMe controller receives a creation instruction from the host, sets at least one I/O submission queue in its memory according to the instruction, and records the association between the identifier of each I/O submission queue and its address information in the memory of the NVMe controller, so that queues can be created according to service requirements; before receiving the creation instruction, the NVMe controller negotiates with the host the maximum number of aggregatable SQEs MCS per I/O submission queue, the negotiated MCS being the smaller of the values supported by the two sides, which determines the maximum number of SQEs the host may push in one PCIe message and so reduces the number of messages and improves efficiency; and when the first PCIe message further includes depth information M of the target I/O submission queue (M indicating the number of SQEs carried, 1 ≤ M ≤ MCS), the NVMe controller determines a preset order of the M SQEs and saves them in that order, namely the order in which the host received the data operation requests, guaranteeing consistency between the stored order and the receiving order.
According to a fourth aspect, this application provides a data processing method in which at least one I/O completion queue is set in the memory of the host. The NVMe controller obtains at least one SQE from the target I/O submission queue, performs read or write operations on the storage medium managed by the NVMe controller according to the data operation requests carried in the at least one SQE, and sends a second PCIe message to the host, where the second PCIe message includes entry information of a target I/O completion queue and at least one completion queue entry CQE, each CQE being the operation result of the data operation request carried in one SQE. The NVMe controller can thus store at least one CQE to the target I/O completion queue based on that queue's entry information, eliminating the interrupt mechanism of the conventional solution, simplifying data processing, and improving processing efficiency.

In a possible implementation, when a CQE aggregation condition is met, the NVMe controller uses the operation results of at least two data operation requests as the payload data of the second PCIe message, one CQE corresponding to the operation result of one data operation request; the CQE aggregation condition includes the maximum aggregatable CQE size MCCB meeting a first threshold or the duration recorded by the I/O completion-queue aggregation timer CQT meeting a second threshold. The NVMe controller can push multiple CQEs to the host in the same PCIe message, reducing the number of messages between the host and the NVMe controller and improving processing efficiency.

In another possible implementation, the second message further includes depth information N of the target I/O completion queue, N indicating the number of CQEs carried in the second PCIe message; the NVMe controller places the N CQEs in the payload data of the second PCIe message in a preset order, where 1 ≤ N ≤ MCC and the MCC is a positive integer. The preset order of the N CQEs is the order in which the NVMe controller generated the operation results of the corresponding SQEs, which guarantees that the host stores the operation results in the order in which they were actually generated.

In another possible implementation, the host is an embedded processor that supports sending PCIe messages with a payload of at least 64 bytes.
According to a fifth aspect, this application provides a data processing apparatus that includes the modules configured to perform the data processing method in any of the foregoing aspects or any possible implementation thereof.

According to a sixth aspect, this application provides a data processing storage system including a host, an NVMe controller, a first memory, and a second memory, where the host, the NVMe controller, the first memory, and the second memory communicate over the PCIe bus. The first memory stores the computer instructions executed by the host and implements the data storage of the I/O completion queues; the second memory stores the computer instructions executed by the NVMe controller and implements the data storage of the I/O submission queues. When the storage system runs, the NVMe controller executes the computer instructions in the second memory to use the hardware resources of the storage system to perform the operation steps of the method in the first aspect or any possible implementation thereof and in the third aspect or any possible implementation thereof, and the host executes the computer instructions in the first memory to use the hardware resources of the storage system to perform the operation steps of the method in the second aspect or any possible implementation thereof and in the fourth aspect or any possible implementation thereof.

According to a seventh aspect, this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.

According to an eighth aspect, this application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.

Based on the implementations provided in the foregoing aspects, this application may be further combined to provide more implementations.
FIG. 1 is a schematic architectural diagram of a storage device according to an embodiment of the present invention.

FIG. 2 is a schematic flowchart of an NVMe-based data processing method according to an embodiment of the present invention.

FIG. 3 is a schematic flowchart of another NVMe-based data processing method according to an embodiment of the present invention.

FIG. 4A is a schematic diagram of a host allocating PCIe addresses for I/O submission queues in the address space of a base address register according to an embodiment of the present invention.

FIG. 4B is a schematic diagram of a host allocating PCIe addresses for I/O completion queues in the address space of a base address register according to an embodiment of the present invention.

FIG. 5 is a schematic structural diagram of an NVMe controller according to an embodiment of the present invention.

FIG. 6 is a schematic structural diagram of a host according to an embodiment of the present invention.

FIG. 7 is a schematic structural diagram of a storage system according to an embodiment of the present invention.

The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 is a schematic architectural diagram of a storage system according to an embodiment of the present invention. As shown in the figure, the storage system 100 includes a host 101, an NVMe controller 102, at least one solid state drive (SSD) 103, a first memory 104, and a second memory 105. Both the host and the NVMe controller are provided with a memory; for ease of the following description, the memory of the host is called the first memory 104 and the memory of the NVMe controller is called the second memory 105. The host 101, the NVMe controller 102, the at least one SSD 103, the first memory 104, and the second memory 105 communicate over the Peripheral Component Interconnect express (PCIe) bus.

In this embodiment of the present invention, the host 101 is a processor. The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like; a general-purpose processor may be a microprocessor or any conventional processor. The processor may also be a system on chip (SoC) or an embedded processor, and supports sending PCIe messages with a payload of at least 64 bytes. The first memory 104 and the second memory 105 may be implemented by random access memory (RAM) or other storage media. In this storage system, the NVMe controller 102 and the at least one SSD 103 may be collectively referred to as the NVM subsystem; the NVM subsystem is configured to receive and execute the data operation requests sent by the host, and each data operation request is used to perform a read or write operation on an SSD 103 managed by the NVMe controller 102.

It should be understood that in the architecture of the storage system shown in FIG. 1 there may be one or more NVMe controllers 102; only one is shown in the figure. When the storage system contains multiple NVMe controllers, one NVMe controller in the active state communicates with the host while the other NVMe controllers serve as standby controllers; when the active NVMe controller fails, a standby NVMe controller is promoted to the active state. The embodiments of this application are described using a storage system containing one NVMe controller as an example.
NVMe data processing is implemented based on input/output (I/O) submission queues (SQ) and I/O completion queues (CQ). FIG. 1B is a schematic diagram of the logical structure of an I/O submission queue and an I/O completion queue. An I/O submission queue stores data operation requests, which upper-layer applications deliver to the host through the host interface; they include requests to read data stored in the SSDs and requests to write data to the SSDs. Specifically, the first memory 104 of the host 101 implements the data storage of the I/O completion queues, and the second memory 105 of the NVMe controller 102 implements the data storage of the I/O submission queues.

An I/O submission queue is a logical concept consisting of one or more units; each unit stores one data operation request, and each data operation request can be stored in at most 64 bytes of storage space. An I/O submission queue corresponds to a ring buffer that stores one or more data operation requests and may be represented by physical region pages (PRP) or a scatter gather list (SGL). Each data operation request (also called a submission queue entry (SQE) or submission queue element (SQE)) can be stored to one unit of the I/O submission queue, called a slot of the queue, and each slot corresponds to two PRPs or one SGL in the buffer. An I/O submission queue has a head pointer and a tail pointer: the head pointer indicates the slot of the SQE that may currently be taken, and the tail pointer indicates the slot where a newly added SQE may currently be stored. In the initialization phase, head pointer = tail pointer = 0; each time a new SQE is added to the I/O submission queue, the tail pointer is incremented by 1, and each time an SQE is taken from the queue, the head pointer is incremented by 1. Pending data operation requests must be stored to the slots of the submission queue one by one in the order they are received, and are then read one by one in first-in-first-out (FIFO) order.

An I/O completion queue is a ring buffer that stores the operation results of data operation requests the NVMe controller has completed. Similar in structure to an I/O submission queue, an I/O completion queue is also a logical concept consisting of one or more units, each called a slot; it also corresponds to a ring buffer that stores the operation results of one or more data operation requests and may be represented by PRPs or an SGL. The operation result of each data operation request may also be called a completion queue entry (CQE) or completion queue element (CQE). Each I/O submission queue corresponds to one I/O completion queue, and one I/O completion queue may correspond to multiple I/O submission queues. The matching relationship between I/O completion queues and I/O submission queues is specified by the host, and the operation results of the data operation requests in each I/O submission queue are stored to a specified I/O completion queue.

Further, the NVMe data processing procedure also involves an admin submission queue and an admin completion queue. The admin submission queue stores management requests of the host and the NVMe controller; for example, a host request to create I/O submission queues is stored to the admin submission queue. The admin completion queue stores the operation results of requests the NVMe controller has completed. Specifically, the admin submission queue and the admin completion queue may be stored in the first memory of the host. Their logical structure is similar to that of the I/O submission queues and I/O completion queues and is not repeated here.
In conventional NVMe data processing, the I/O submission queues and I/O completion queues are implemented in the host's memory, and the host and the NVMe controller must notify each other through the doorbell mechanism and interrupts that data operation requests exist in an I/O submission queue or that operation results are to be stored to an I/O completion queue. The whole NVMe data processing procedure is therefore complex.

In the following embodiments of the present invention, mainly the implementation of the I/O submission queues and I/O completion queues is improved, while the admin submission queue and admin completion queue still follow the conventional solution, that is, they remain stored in the host's memory. During data processing, the host first stores a management request to the admin submission queue, then updates the submission-queue tail doorbell register in the NVMe controller through the doorbell mechanism to notify the NVMe controller of the pending management request; the NVMe controller then fetches and executes the pending management request and generates its operation result, notifies the host through an interrupt that the pending management request has been completed, and finally stores the operation result to the admin completion queue by a DMA write.

The embodiments of the present invention provide a data processing method in which the second memory 105 (for example, RAM) of the NVMe controller 102 stores the I/O submission queues. The host sends data operation requests to the NVMe controller 102 in PCIe messages based on the entry information of an I/O submission queue, and the NVMe controller 102 stores the data operation requests to the target I/O submission queue in its second memory 105 based on that entry information. The NVMe controller 102 can fetch the data operation requests directly from its second memory 105. Compared with the conventional solution, in which the NVMe controller must fetch data operation requests from the host's memory, the technical solution of the embodiments has a shorter addressing path on the PCIe bus and the data processing takes less time. Moreover, the entry-based access to I/O submission queues eliminates the step in conventional NVMe data processing in which the host must notify the NVMe controller through the doorbell mechanism to fetch data operation requests, simplifying the procedure. Further, the host 101 can push multiple data operation requests to the NVMe controller 102 in one PCIe message, reducing the number of communication messages between the host 101 and the NVMe controller 102 and improving communication efficiency. Conversely, after completing the processing of the data operations, the NVMe controller 102 can store the operation results of the data operation requests to the I/O completion queue of the host 101 in PCIe messages based on the entry information of the I/O completion queue, eliminating the step in the conventional solution in which the NVMe controller must notify the host through an interrupt that completed operation results need to be stored to an I/O completion queue. This further simplifies NVMe data processing and reduces its duration. Moreover, the NVMe controller can likewise push the operation results of multiple data operation requests to the host 101 in the same PCIe message, reducing the number of communication messages between the host 101 and the NVMe controller 102 and improving data processing efficiency.

It should be noted that multiple I/O submission queues and I/O completion queues exist during NVMe data processing. For ease of description, the following description of the embodiments of the present invention uses the data processing of one I/O submission queue and the I/O completion queue associated with it as an example to introduce the technical solution of the present invention.

The data processing method provided by the present invention is further described below with reference to the accompanying drawings of the embodiments.
FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present invention. As shown in the figure, the data processing method includes:

S201. The host sends a first PCIe message to the NVMe controller, where the first PCIe message includes entry information of a target I/O submission queue and at least one submission queue entry SQE.

Multiple I/O submission queues may exist in the storage system at the same time, each storing pending data operation requests of different types or sent by different applications. When the host receives a data operation request, it can determine the I/O submission queue in which the request should be stored according to a preset rule; for example, the operation request carries an application identifier, and the host determines the associated I/O submission queue from that identifier. Specifically, the host creates I/O submission queues according to service requirements; that is, the host is preconfigured with the matching relationship between applications and I/O submission queues. When the host receives a data operation request, it can determine, from the matching relationship between the application sending the request and the I/O submission queues, the target I/O submission queue in which the request is to be stored. The target I/O submission queue is the I/O submission queue that matches the data operation request.

Further, in the embodiments of the invention, the data storage of the I/O submission queues is implemented by the second memory of the NVMe controller, and the host does not perceive the data structure or storage location of the I/O submission queues. For the host, the different SQEs of one I/O submission queue are all stored to different slots of the same queue, as if stored into the NVMe controller's second memory through one entry of the I/O submission queue; the host need not know how the NVMe controller stores an SQE into a slot of the target SQ. The main purpose of the entry information is to let the controller identify the I/O submission queue or I/O completion queue to which a received PCIe message corresponds; it may be an identifier or address that uniquely identifies an I/O submission queue, or any other description that can uniquely identify one. The entry information may also be called an entry identifier, an entry address, or another name.

Optionally, the I/O submission queues may also be divided into different priorities; when the host receives a data operation request sent by an upper-layer application, it may direct the request to a designated I/O submission queue according to the request's type and priority.

The host may be understood as an enhanced processor, which may be an embedded processor; the embedded processor supports sending PCIe write transactions of at least 64 bytes at a time.

S202. The NVMe controller stores the at least one SQE in the first PCIe message to the target I/O submission queue according to the entry information of the target I/O submission queue.

In this embodiment of the present invention, the host sends at least one SQE to the NVMe controller based on the entry information of the target I/O submission queue, and the NVMe controller stores the at least one SQE to the target I/O submission queue in its memory; the NVMe controller can then fetch and execute the SQEs directly from its memory. The doorbell mechanism of the conventional solution is eliminated, and the host and the NVMe controller only need to communicate in the entry-based manner, making the data processing procedure simpler and correspondingly less time-consuming.
The data processing method provided by the embodiments is described further with reference to FIG. 3, which is a schematic flowchart of another data processing method according to an embodiment of this application. As shown in the figure, the method includes:

S301. The host sends a message to the NVMe controller to negotiate the maximum number of aggregatable SQEs.

S302. The NVMe controller sends a response message to the negotiation request.

S303. The host determines the maximum number of aggregatable SQEs supported by both the host and the NVMe controller.
In the storage system initialization phase, the host and the NVMe controller first negotiate the maximum number of coalesced SQEs (MCS). The MCS is the maximum number of SQEs that one PCIe message may carry, as agreed by the host and the NVMe controller. The host and the NVMe controller may preset the MCS they support according to their hardware configuration, or maintenance personnel may specify the supported MCS according to the network conditions of the host or the NVMe controller. Since the host and the NVMe controller may support different MCS values, the MCS supported by both can be determined through negotiation: it is the smaller of the MCS supported by the host and the MCS supported by the NVMe controller. After the MCS is agreed, the host can carry multiple data operation requests in one PCIe message when sending pending requests to the NVMe controller, reducing the number of communication messages between the host and the NVMe controller and improving data processing efficiency.

Optionally, the host and the NVMe controller may also negotiate the maximum number of coalesced CQEs (MCC) using steps similar to S301 to S303. The MCC may be negotiated in the system initialization phase, before the I/O submission queues are created, or before the NVMe controller sends the operation results of data operation requests to the host; this is not limited in the embodiments of the present invention.

Like the MCS, the MCC is the maximum number of CQEs that one PCIe message may carry, as agreed by the host and the NVMe controller. The host and the NVMe controller may preset the MCC they support according to their hardware configuration, or maintenance personnel may specify it according to network conditions. Since the two sides may support different MCC values, the MCC supported by both is determined through negotiation as the smaller of the MCC supported by the host and that supported by the NVMe controller. After the MCC is agreed, the NVMe controller can carry multiple operation results in one PCIe message when sending the operation results of data operation requests to the host, reducing the number of communication messages and improving data processing efficiency.

Optionally, the MCC negotiation may be processed together with the MCS negotiation or separately. That is, the host may negotiate both MCC and MCS with the NVMe controller in one negotiation request message, or in two different negotiation request messages.
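To make the outcome of this negotiation concrete, the following is a minimal sketch in C; the structure and function names are illustrative assumptions, not part of this application:

```c
#include <stdint.h>

/* Negotiated aggregation limits as described above: each side advertises
 * the MCS/MCC it supports, and the agreed value is the smaller of the two. */
struct queue_caps {
    uint32_t mcs; /* max aggregatable SQEs per I/O submission queue */
    uint32_t mcc; /* max aggregatable CQEs per I/O completion queue */
};

static struct queue_caps negotiate_caps(struct queue_caps host,
                                        struct queue_caps ctrl)
{
    struct queue_caps agreed;
    agreed.mcs = host.mcs < ctrl.mcs ? host.mcs : ctrl.mcs;
    agreed.mcc = host.mcc < ctrl.mcc ? host.mcc : ctrl.mcc;
    return agreed;
}
```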
After the host and the NVMe controller agree on the MCS through negotiation, the host may create at least one I/O submission queue according to service requirements through steps S304 to S308, as follows:

S304. The host delivers a request to create at least one I/O submission queue to the admin submission queue according to service requirements.

S305. The host notifies the NVMe controller that a pending request exists in the admin submission queue.

S306. The NVMe controller obtains the request to create at least one I/O submission queue from the admin submission queue.

S307. The NVMe controller creates the at least one I/O submission queue.

S308. The NVMe controller sends the operation result of the request to create at least one I/O submission queue to the admin completion queue of the host.

Steps S304 to S308 constitute the process of creating at least one I/O submission queue. In this embodiment of the present invention, the creation request includes the number of I/O submission queues to be created. The creation process still uses the doorbell and interrupt mechanisms of the conventional solution for communication between the host and the NVMe controller. Specifically, the host stores the pending management request to the admin submission queue and updates the admin submission-queue tail doorbell register (located in the second memory of the NVMe controller) to notify the NVMe controller of the pending management request; the NVMe controller fetches the pending management request from the admin submission queue by a direct memory access (DMA) read; after completing the processing of the request to create at least one I/O submission queue, the NVMe controller sends an interrupt to the host to notify it that a completed operation result is to be stored to the completion queue, and then stores the operation result to the corresponding admin completion queue by a DMA write.
The process by which the NVMe controller creates the at least one I/O submission queue in the second memory is not limited in this application. As a possible implementation, the creation process includes: according to the number of submission queues to be created carried in the creation request, the NVMe controller partitions storage space in the second memory for storing the SQEs of each I/O submission queue, and records the location information of each I/O submission queue in the second memory. The ring buffer constituting each queue may be represented by PRPs or an SGL. In addition, the storage space implementing an I/O submission queue may be a contiguous or a non-contiguous address range in the second memory; this is not limited by the present invention.

The request to create an I/O submission queue in this embodiment of the present invention is similar to that in the conventional technology, with the following differences:

1) The PRP Entry 1 field indication is removed;

2) The PC field indication is removed;

3) The QID is used to indicate the entry identifier of the SQ, and is unrelated to the conventional SQ tail doorbell.

Optionally, the request to create an I/O submission queue may also include the queue depth of each I/O queue, which may be set to the MCS; this guarantees that the SQEs in each received PCIe message can be successfully stored to the target I/O submission queue.
S309. The host allocates, in its addressable PCIe address space, a unique PCIe address to each I/O submission queue.

The host-addressable PCIe address space includes the address space of the host's memory and the address space, within the NVMe controller's storage area, that the host is allowed to access, for example the address space of the base address registers (BAR) in the NVMe controller. The host may allocate a unique PCIe address to each I/O submission queue; during data processing, the host and the NVMe controller determine the target I/O submission queue based on this PCIe address. For ease of the following description, in the embodiments of the present invention the PCIe address allocated to the target I/O submission queue is called the first PCIe address.

It should be noted that on the PCIe bus the host can only directly access data stored in its own memory and cannot directly access the NVMe controller's storage area. The conventional solution uses memory-mapped input/output (MMIO) to implement the host's access to the storage area opened up by the NVMe controller; the open storage area in the NVMe controller that the host is allowed to access may be called host-addressable storage space, or host-accessible storage space. A root complex (RC) of the PCIe bus runs in the host, through which the host can access the NVMe controller's storage area. Specifically, the system maps the storage area in the NVMe controller that the host is allowed to access into the host's memory area; when the host wants to access that area, the root complex checks the address information to be accessed in the data operation request and, if it finds that the address to be accessed is a mapped address of the NVMe controller's host-accessible storage area, triggers the generation of a transaction layer packet (TLP) and uses the TLP to access the NVMe controller and perform the read or write of the target data. In addition, the NVMe controller may have several internal areas open to host access (with possibly different attributes, for example, some prefetchable and some not) that need to be mapped into the memory area; the sizes and attributes of these open storage areas are written to the base address registers in the NVMe controller. When the storage system is powered on, the system software can read these BARs, allocate corresponding memory areas to them, and write the resulting memory base addresses back to the BARs.

Specifically, FIG. 4A is a schematic diagram of a host allocating PCIe addresses for I/O submission queues in the address space of a base address register according to an embodiment of the present invention. As shown in the figure, base address register X is host-addressable address space, and its base address is base address 100. The host first partitions a contiguous address space, the first address space, and then allocates a unique PCIe address to each I/O submission queue within the first address space. The first address space may also be called the aperture of the I/O submission queues. The process of the host allocating a PCIe address to each I/O submission queue can also be understood as the host mapping a contiguous segment of addresses in the base address register to that I/O submission queue; this contiguous segment of PCIe addresses can be used to identify the queue. For example, (base address 100 + offset address 100) through (base address 100 + offset address 100 + MCS*64) of base address register X are allocated to I/O submission queue 0, whose PCIe address is then (base address 100 + offset address 100 + MCS*64*0); (base address 100 + offset address 100 + MCS*64) through (base address 100 + offset address 100 + MCS*64*2) are allocated to I/O submission queue 1, whose PCIe address is (base address 100 + offset address 100 + MCS*64*1); and so on, (base address 100 + offset address 100 + MCS*64*N) through (base address 100 + offset address 100 + MCS*64*(N+1)) are allocated to I/O submission queue N, whose PCIe address is (base address 100 + offset address 100 + MCS*64*N).

It should be noted that the first address space in FIG. 4A is a segment of the address space of base address register X; the start address of the first address space may be the base address of base address register X, or (base address + offset address) within the base address register. The following embodiments of the present invention are described using the start address (base address + offset address) of the first address space shown in FIG. 4A as an example.
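The address layout of FIG. 4A can be captured in a small helper. The sketch below is only illustrative (the parameter names are assumptions); it computes the first PCIe address, that is, the entry, of submission queue n:

```c
#include <stdint.h>

/* FIG. 4A layout: a contiguous window of MCS * 64 bytes per I/O submission
 * queue, carved out of BAR X starting at (base address + offset address). */
static inline uint64_t sq_entry_pcie_addr(uint64_t bar_base, uint64_t offset,
                                          uint32_t mcs, uint32_t sq_id)
{
    return bar_base + offset + (uint64_t)mcs * 64 * sq_id;
}
```

For example, sq_entry_pcie_addr(base100, offset100, MCS, 0) yields the entry address of I/O submission queue 0, matching the allocation described above.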
The host records a set of data indexed by the entry identifiers of the at least one I/O submission queue. Each I/O submission queue is allocated, in the PCIe base address register, a PCIe address unique to it, and the PCIe address of each I/O submission queue indicates the entry information of that queue.

As a possible implementation, the request to create at least one submission queue may also carry the association between I/O submission queues and I/O completion queues, so that after completing the operation of an SQE in an I/O submission queue, the NVMe controller stores the operation result to the I/O completion queue associated with that I/O submission queue.

As a possible implementation, the host may also send the association between I/O submission queues and I/O completion queues to the NVMe controller upon learning that the NVMe controller has completed the request to create at least one I/O submission queue, so that after completing the operation of an SQE in an I/O submission queue, the NVMe controller stores the operation result to the I/O completion queue associated with that I/O submission queue.

Steps S301 to S309 above describe how the embodiments of the present invention create an I/O submission queue; steps S310 to S313 below further describe how the embodiments of this application use that I/O submission queue.
S310. The host receives at least one data operation request.

The host receives data operation requests sent by upper-layer applications; each data operation request is stored as one SQE in one slot of the target I/O submission queue, and each SQE corresponds to one data operation request.

S311. (Optional) When an SQE aggregation condition is met, the host uses at least two data operation requests as the payload data of the first PCIe message.

Specifically, the SQE aggregation condition includes at least one of the following:

Condition 1: the maximum SQ coalesced block size (MSCB) meets a first threshold.

When the total size of the M SQEs waiting to be sent in one I/O submission queue reaches the first threshold at the current moment, the host may use the M SQEs together as the payload data of the same PCIe message, where M is greater than or equal to 2. For example, if at the current moment 3 SQEs (SQE1, SQE2, and SQE3) of I/O submission queue 1 are waiting to be sent, their total size is 190 bytes, and the first threshold is 180 bytes, the host may send the 3 SQEs to the NVMe controller together as the payload data of one PCIe message.

Condition 2: the duration recorded by the I/O submission-queue aggregation timer SQT meets a second threshold.

When the waiting time of any pending SQE in an I/O submission queue reaches the second threshold, the host may send the pending SQEs whose waiting time is greater than or equal to the second threshold together to the NVMe controller as the payload data of the same PCIe message. For example, if at the current moment 2 SQEs (SQE1 and SQE2) of I/O submission queue 1 have been waiting for 60 s and the second threshold is 50 s, the host may send SQE1 and SQE2 to the NVMe controller as the payload data of one PCIe message.

Further, when the PCIe message the host sends to the NVMe controller includes multiple SQEs, the host arranges the SQEs to be aggregated in the order in which the data operation requests were received in step S310, and uses them together as the payload data of the PCIe message. This guarantees that, upon receiving the PCIe message, the NVMe controller can store the SQEs to the target I/O submission queue in the order the data operation requests were received, and then fetch and execute each SQE one by one. A sketch of this aggregation decision follows.
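The sketch below shows the two triggers in minimal form; the field names, thresholds, and time units are illustrative assumptions rather than values fixed by this application:

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-submission-queue aggregation state on the host side. */
struct sq_aggregator {
    uint32_t pending_bytes;  /* total size of SQEs waiting in this queue */
    uint64_t oldest_wait_ms; /* waiting time of the oldest pending SQE   */
};

/* Condition 1 (MSCB reaches the first threshold) or condition 2 (SQT
 * reaches the second threshold) triggers sending one aggregated message. */
static bool sq_should_flush(const struct sq_aggregator *agg,
                            uint32_t first_threshold_bytes,
                            uint64_t second_threshold_ms)
{
    if (agg->pending_bytes >= first_threshold_bytes)
        return true;
    return agg->oldest_wait_ms >= second_threshold_ms;
}
```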
S312. The host sends the first PCIe message to the NVMe controller, where the first PCIe message includes the entry information of the target I/O submission queue and at least one SQE.

The host and the NVMe controller communicate over the PCIe bus; the first PCIe message is specifically a TLP message. The host may push one or more SQEs to the NVMe controller in the same PCIe message.

It should be noted that, besides the PCIe address structure shown in FIG. 4A, the PCIe messages exchanged between the host and the NVMe controller also include an NVMe message header and payload data (not shown in FIG. 4A); the NVMe message header records the fields added when the message is processed at the NVMe protocol layer, and the payload data carries one or more SQEs.

As a possible embodiment, the host may learn the maximum number of slots of each I/O submission queue, for example, by periodically querying the NVMe controller for the maximum slot count and the available slot count of each I/O submission queue. The host maintains a counter of sent SQEs, which records the number of SQEs the host has sent to the NVMe controller for each I/O submission queue. There may be one or more counters: when only one counter exists, it records the number of pending SQEs the host has sent to each of all I/O submission queues in the storage system; when multiple counters exist, each counter records the number of SQEs the host has sent to each of one or more I/O submission queues. For the same I/O submission queue, when the number of sent SQEs recorded by the counter reaches the queue's maximum slot count, the host sends a query request to the NVMe controller to determine whether the I/O submission queue currently has free slots, that is, whether the NVMe controller has already read SQEs from the queue. Only after receiving a response from the NVMe controller indicating that the I/O submission queue has free slots does the host send a PCIe message carrying new SQEs, avoiding storage failures caused by the number of consecutively sent SQEs exceeding the number of free slots in the I/O submission queue. A sketch of this counter appears below.

Through this flow-control method, the counter of sent SQEs on the host side can record and control the number of SQEs the host sends to the NVMe controller, implementing flow control over the NVMe controller's data processing, avoiding storage failures caused by the host frequently sending too many SQEs, and improving the storage success rate of SQEs.
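The counter-based flow control described above might look like the following sketch; it is illustrative only, and a real implementation would also issue the free-slot query and reset the counter based on the controller's response:

```c
#include <stdbool.h>
#include <stdint.h>

struct sq_flow_ctl {
    uint32_t sent;      /* SQEs pushed since the last free-slot query      */
    uint32_t max_slots; /* maximum slot count of this I/O submission queue */
};

/* Returns true if n more SQEs may be pushed; otherwise the host must
 * first query the NVMe controller for free slots. */
static bool sq_may_send(struct sq_flow_ctl *fc, uint32_t n)
{
    if (fc->sent + n > fc->max_slots)
        return false;
    fc->sent += n;
    return true;
}
```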
S313. The NVMe controller stores the SQEs in the first PCIe message to the target I/O submission queue according to the entry information of the target I/O submission queue.

Upon receiving the first PCIe message, the NVMe controller may determine, from the entry information of the target I/O submission queue, the address information of the target I/O submission queue in the NVMe controller's memory, and then store the at least one SQE in the first PCIe message to the target I/O submission queue according to that address information. Specifically, the SQEs may be stored through steps S3131 to S3133:

S3131. The NVMe controller determines the identifier of the target I/O submission queue according to the entry information of the target I/O submission queue.

The NVMe controller determines the identifier of the target I/O submission queue from the entry information of the target I/O submission queue in the PCIe address structure of the first PCIe message. Specifically, the identifier of the target I/O submission queue is calculated using the following formula (formula 1):

identifier of the target I/O submission queue = (ADD11 - ADD21) / (MCS * 64)

where ADD11 is the first PCIe address of the target I/O submission queue, and ADD21 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O submission queues; as shown in FIG. 4A, ADD21 is the start address of the first address space. Using formula 1, the identifier of the target I/O submission queue can be confirmed. For example, if the first PCIe address of the target I/O submission queue is (base1 + MCS*64*2) and the start address of the first address space is base1, formula 1 yields 2 as the identifier of the target I/O submission queue. A sketch of this arithmetic follows.
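Formula 1 (and formula 3 below, its counterpart for I/O completion queues) reduces to simple address arithmetic. The following is a minimal sketch; the function and variable names are illustrative assumptions, with ADD11/ADD21 and ADD12/ADD22 following the notation above:

```c
#include <stdint.h>

/* Each queue is mapped to a contiguous window of MCS*64 (or MCC*64) bytes,
 * so the queue identifier is simply the window index. */
static inline uint32_t sq_id_from_addr(uint64_t add11, uint64_t add21,
                                       uint32_t mcs)
{
    return (uint32_t)((add11 - add21) / ((uint64_t)mcs * 64));
}

static inline uint32_t cq_id_from_addr(uint64_t add12, uint64_t add22,
                                       uint32_t mcc)
{
    return (uint32_t)((add12 - add22) / ((uint64_t)mcc * 64));
}
```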
S3132. The NVMe controller determines the number of SQEs in the first PCIe message according to the payload data size of the first PCIe message, and determines the storage locations of the SQEs in the target I/O submission queue.

After receiving the first PCIe message, the NVMe controller first parses the content of the first PCIe message and obtains its payload data, and then calculates the number of SQEs carried in the payload. The NVMe controller may calculate the number of SQEs according to the following formula (formula 2):

number of SQEs = payload data size of the first PCIe message / 64

After determining the number of SQEs carried in the first PCIe message, the NVMe controller must further determine the storage locations of the SQEs, that is, the slots of the target I/O submission queue that will store the SQEs, determined from the identifier of the target I/O submission queue. Specifically, the NVMe controller records the location information of each I/O submission queue in the second memory, and can determine the position of the next available slot from the position indicated by the current tail pointer of the target I/O submission queue, and then store the at least one SQE carried in the first PCIe message to the target I/O submission queue.
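Formula 2 and the tail-pointer slot selection can be sketched as follows; the names are illustrative, and each SQE is assumed to occupy exactly its maximum size of 64 bytes in the payload:

```c
#include <stdint.h>

#define SQE_SIZE 64u /* one SQE occupies at most 64 bytes */

/* Formula 2: number of SQEs = payload size of the first PCIe message / 64 */
static inline uint32_t sqe_count(uint32_t payload_bytes)
{
    return payload_bytes / SQE_SIZE;
}

/* The next available slot follows the queue's current tail pointer,
 * wrapping around the ring buffer. */
static inline uint32_t next_free_slot(uint32_t tail, uint32_t depth)
{
    return tail % depth;
}
```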
S3133. The NVMe controller stores the at least one SQE to the determined storage locations of the target I/O submission queue.

Having determined the identifier of the target I/O submission queue and the number of SQEs in the first PCIe message, the NVMe controller can store the SQEs one by one, in the preset order, to the free slots starting at the position indicated by the queue's tail pointer.

Further, at the same moment the NVMe controller may have multiple threads or processes storing SQEs to the same I/O submission queue; when storing at least one SQE to the slots of the target I/O submission queue, a locking operation is needed to avoid the data inconsistency caused by multiple SQE writes to the same slot. In the conventional solution, the scope of the locking operation includes reserving I/O submission queue slots, copying the SQEs to the slots, and updating the submission-queue tail doorbell register, after which write permission on the I/O submission queue is finally released. In this embodiment of this application, because the SQEs are pushed to the NVMe controller and the doorbell mechanism is eliminated, those operations do not all need to be performed inside the lock: the scope of the locking operation covers only the NVMe controller's reservation of I/O submission queue slots. This reduces both the scope and the duration of locking. A sketch of this narrowed reservation step follows.
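The narrowed lock scope can be illustrated with an atomic slot reservation. This is only a sketch, under the assumption that free slots have already been confirmed (for example, by the flow control above); it is not the implementation of this application:

```c
#include <stdatomic.h>
#include <stdint.h>

struct sq_state {
    _Atomic uint32_t tail; /* monotonically increasing reservation counter */
    uint32_t depth;        /* number of slots in this I/O submission queue */
};

/* The "locked" region is reduced to this reservation: each thread claims
 * m consecutive slots, then copies its SQEs into them without any lock. */
static uint32_t sq_reserve_slots(struct sq_state *sq, uint32_t m)
{
    uint32_t first = atomic_fetch_add(&sq->tail, m); /* previous tail */
    return first % sq->depth; /* index of the first reserved slot */
}
```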
Through steps S3131 to S3133 above, the NVMe controller can store at least one SQE to the target I/O submission queue based on the entry information of the I/O submission queue in the PCIe message. The host and the NVMe controller do not need to communicate through the doorbell mechanism, reducing the complexity of NVMe data processing. In addition, the host can push multiple SQEs at a time in the same PCIe message by aggregation, reducing the number of communication messages between the host and the NVMe controller and improving data processing efficiency; and the NVMe controller can read the data operation requests directly from its own memory, improving data processing efficiency further. Moreover, this embodiment of the present invention also narrows the scope of the locking operation in data processing: the NVMe controller only needs to lock the process of storing SQEs to the determined free slots, solving the complexity and long duration of the conventional locking process and reducing both the locking time and the data processing time.

As a possible embodiment, instead of allocating PCIe addresses to multiple I/O submission queues from one contiguous first address space as shown in FIG. 4A, the host may use several non-contiguous storage intervals in the base address register to denote the I/O submission queues. The host and the NVMe controller record the mapping between the address allocated to each I/O submission queue and the queue identifier. The host sends the first PCIe message according to this mapping; the first PCIe message includes the PCIe address the host allocated to the target I/O submission queue, which serves as the queue's entry information. The NVMe controller parses the first PCIe message, obtains that PCIe address field, determines the corresponding submission-queue identifier from the mapping, then determines from the identifier where the queue is stored in the NVMe controller's memory, and stores the at least one SQE in the first PCIe message to the target I/O submission queue. This completes the process in which the NVMe controller stores at least one SQE based on the entry information of the target I/O submission queue; it also eliminates the doorbell mechanism of the conventional technology and simplifies data processing. Moreover, the host can push multiple SQEs to the NVMe controller in the same PCIe message using the SQE aggregation of the method shown in FIG. 3, reducing the number of messages between the host and the NVMe controller and improving data processing efficiency.

As another possible embodiment, the host-addressable PCIe address space includes the address space of the host's memory and the address space of the host-addressable PCIe base address registers. Besides mapping the identifiers of the I/O submission queues to the address space of a base address register as in the foregoing steps, the embodiments of the present invention may also allocate PCIe addresses to the I/O submission queues from addresses in the host's memory address space, each queue identifier corresponding to a unique PCIe address. In this case, the host stores the mapping between addresses in the memory address space and the identifiers of the I/O submission queues; the host sends PCIe messages carrying at least one SQE according to this mapping, and the NVMe controller uses the same mapping to determine, from the mapped address carried in a PCIe message, the identifier of the corresponding I/O submission queue and, from that identifier, the queue's storage location in the NVMe controller, and then stores the at least one SQE to the target I/O submission queue. This method also eliminates the doorbell mechanism of the conventional technology and simplifies data processing; combined with the SQE aggregation shown in FIG. 3, sending multiple SQEs in the same PCIe message also reduces the number of messages between the host and the NVMe controller and improves data processing efficiency.

As another possible embodiment, besides identifying I/O submission queues with PCIe addresses based on the PCIe protocol, the host and the NVMe controller may agree in advance to carry the identifier of the target I/O submission queue in a designated field of the PCIe message or in part of the payload data (for example, a reserved field of the PCIe message, or the start of the payload data); the NVMe controller parses the PCIe message, obtains the designated field, and determines the submission-queue identifier it denotes according to the prior agreement. This also eliminates the doorbell mechanism of the conventional technology and simplifies data processing.
The following further describes, in the embodiments of the present invention, how the NVMe controller obtains and executes the data operation requests in the SQEs after storing the SQEs to the target I/O submission queue, and how it sends the operation results of the data operation requests to the host based on the entry information of the I/O completion queue.

S314. The NVMe controller reads and executes the data operation requests in the SQEs and generates the operation results.

After at least one SQE is stored to the target I/O submission queue, the NVMe controller may read the data operation requests in the SQEs one by one, execute the data operation requests, and generate the operation results. Optionally, an operation result includes the identifier of the I/O submission queue and the operation result of the data operation request.
S315. (Optional) When a CQE aggregation condition is met, the NVMe controller uses at least two operation results as the payload data of the second PCIe message.

In this embodiment of the present invention, the data storage of the I/O completion queues is still implemented by the host's memory. The process of creating the I/O completion queues does not constitute a limitation on the embodiments of the present invention.

One CQE corresponds to one operation result, and each operation result indicates the operation result of the data operation request of one CQE's corresponding SQE. Similar to the host aggregating multiple SQEs in step S311, after completing the operations of the data operation requests of multiple SQEs, the NVMe controller sends at least one operation result to the host based on the entry information of the I/O completion queue, and the host then stores the operation results to the target I/O completion queue based on that entry information.

The CQE aggregation condition includes at least one of the following:

Condition 1: the maximum CQ coalesced block size (MCCB) meets a third threshold.

When the total size of the N CQEs waiting to be sent in the same I/O completion queue is greater than or equal to the third threshold at the current moment, the NVMe controller may send the N CQEs together to the host as the payload data of the same PCIe message, where N is greater than or equal to 2. For example, if at the current moment 3 CQEs (CQE1, CQE2, and CQE3) of I/O completion queue 1 are waiting to be sent, their total size is 190 bytes, and the third threshold is 180 bytes, the NVMe controller may send the 3 CQEs to the host together as the payload data of one PCIe message.

Condition 2: the duration recorded by the I/O completion-queue aggregation timer CQT meets a fourth threshold.

When the waiting time of any pending CQE in an I/O completion queue is greater than or equal to the fourth threshold, the NVMe controller may send at least two CQEs whose waiting time is greater than or equal to the fourth threshold together to the host as the payload data of the same PCIe message. For example, if at the current moment 2 CQEs (CQE1 and CQE2) of I/O completion queue 1 have already been waiting for 60 s and the fourth threshold is 45 s, the NVMe controller may send the two CQEs to the host as the payload data of one PCIe message.
S316. The NVMe controller sends the second PCIe message to the host, where the second PCIe message carries the entry information of the target I/O completion queue and at least one CQE.

Each I/O submission queue corresponds to one I/O completion queue. The NVMe controller stores the correspondence between I/O submission queues and I/O completion queues, and can determine from this correspondence the target I/O completion queue corresponding to the target I/O submission queue; specifically, the NVMe controller determines the PCIe address of the target I/O completion queue, and then sends the host the second PCIe message carrying the entry information of the target I/O completion queue and at least one CQE.

Specifically, FIG. 4B is a schematic diagram of a host allocating PCIe addresses for I/O completion queues in the address space of a base address register according to an embodiment of the present invention. As shown in the figure, base address register Y is host-addressable address space, and its base address is base address 200. The host partitions a contiguous address space, the second address space, and then allocates PCIe addresses to the multiple I/O completion queues within the second address space, a unique PCIe address for each I/O completion queue. The start address of the second address space is (base address 200 + offset address 200). The process of the host allocating a PCIe address to each I/O completion queue can also be understood as the host mapping a contiguous segment of addresses in the base address register to that I/O completion queue. For example, (base address 200 + offset address 200) through (base address 200 + offset address 200 + MCC*64) of base address register Y are allocated to I/O completion queue 0, whose PCIe address is then (base address 200 + offset address 200); (base address 200 + offset address 200 + MCC*64) through (base address 200 + offset address 200 + MCC*64*2) are allocated to I/O completion queue 1, whose PCIe address is (base address 200 + offset address 200 + MCC*64*1); and so on, (base address 200 + offset address 200 + MCC*64*M) through (base address 200 + offset address 200 + MCC*64*(M+1)) are allocated to I/O completion queue M, whose PCIe address is (base address 200 + offset address 200 + MCC*64*M).

It should be noted that the second address space in FIG. 4B is a segment of the address space of base address register Y; the start address of the second address space may be the base address of base address register Y, or (base address + offset address) within the base address register. The following embodiments of the present invention are described using the start address (base address + offset address) of the second address space shown in FIG. 4B as an example.
For ease of the following description, in the embodiments of the present invention the PCIe address of the target I/O completion queue is called the second PCIe address. The second PCIe address of the target I/O completion queue may be expressed as:

PCIe address of the target I/O completion queue = (BAR base address + offset address) + MCC*64 * identifier of the target I/O completion queue

Optionally, the host may notify the NVMe controller of the PCIe addresses allocated to the identifiers of the I/O completion queues after creating the I/O completion queues; the NVMe controller stores the mapping between the identifier of each I/O completion queue and the PCIe address allocated to it.

It should be noted that, besides the PCIe address structure shown in FIG. 4B, the PCIe messages exchanged between the host and the NVMe controller also include an NVMe message header and payload data (not shown in FIG. 4B); the NVMe message header records the fields added when the message is processed at the NVMe protocol layer, and the payload data carries one or more CQEs.
S317. The host stores the at least one CQE to the target I/O completion queue according to the entry information of the target I/O completion queue.

The host obtains and parses the second PCIe message, determines from the entry information of the target I/O completion queue the address information of the target I/O completion queue in the host's memory, and then stores the at least one CQE to the target I/O completion queue. For the specific process, see steps S3171 to S3173 below.

S3171. The host determines the identifier of the target I/O completion queue according to the entry information of the target I/O completion queue.
Specifically, the entry identifier of the target I/O completion queue is calculated using the following formula (formula 3):

identifier of the target I/O completion queue = (ADD12 - ADD22) / (MCC * 64)

where ADD12 is the second PCIe address of the target I/O completion queue, and ADD22 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O completion queues, for example, the start address of the second address space in FIG. 4B.

For example, if the PCIe address of the target I/O completion queue is (base21 + offset21 + MCC*64*2) and the start address is (base21 + offset21), formula 3 yields 2 as the identifier of the target I/O completion queue. Through the calculation of formula 3, the host can determine the identifier of the target I/O completion queue.
S3172. The host determines the number of CQEs according to the payload data size of the second PCIe message, and determines the storage locations of the CQEs in the target I/O completion queue.

S3173. The host stores the at least one CQE to the determined storage locations of the target I/O completion queue.

After receiving the second PCIe message, the host parses the message content and obtains its payload data, and calculates the number of CQEs carried in the payload. Specifically, the number of CQEs may be calculated according to the following formula (formula 4):

number of CQEs = payload data size of the second message / 64

After determining the number of CQEs carried in the second PCIe message, the host must further determine the storage locations of the CQEs, that is, the slots of the target I/O completion queue that will store the CQEs. Specifically, the host records the location information of each I/O completion queue in the first memory, and can determine the position of the next available slot from the position indicated by the current tail pointer of the target I/O completion queue, and then store the at least one CQE carried in the second PCIe message to the target I/O completion queue.
As described above, the host can store data operation requests to the target I/O submission queue based on the entry information of the I/O submission queue, and the NVMe controller can store the one or more SQEs carried in the same PCIe message directly to the target I/O submission queue through the entry identifier of the target I/O submission queue in the PCIe address structure. Similarly, the NVMe controller can store the operation results of data operation requests to the target I/O completion queue based on the entry information of the I/O completion queue. Compared with the conventional data processing procedure, the technical solution provided by this application eliminates the doorbell and interrupt mechanisms and stores SQEs and CQEs based on entry information, simplifying the data processing procedure. Furthermore, since the host or the NVMe controller can aggregate multiple entries into the payload data of one PCIe message, either side can push multiple data operation requests or operation results at a time, reducing the number of communication messages between the host and the NVMe controller and improving communication efficiency. Moreover, this embodiment of this application narrows the locked scope when the NVMe controller stores SQEs to the target I/O submission queue, further simplifying the conventional NVMe data processing procedure and reducing both the locking time and the data processing time. It should be understood that, in the various embodiments of the present invention, the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
As a possible embodiment, instead of allocating PCIe addresses to multiple I/O completion queues from one contiguous address space as shown in FIG. 4B, the host may use several non-contiguous storage intervals in the base address register to denote the I/O completion queues. In this case, the host and the NVMe controller record the mapping between the address allocated to each I/O completion queue and the queue identifier. The NVMe controller sends the second PCIe message according to this mapping; the second PCIe message includes the PCIe address the host allocated to the target I/O completion queue, which serves as the queue's entry information. The host parses the second PCIe message to obtain that PCIe address field, determines the corresponding completion-queue identifier from the mapping, and then stores the at least one CQE in the second PCIe message to the target I/O completion queue according to the identifier. This completes the process in which the host stores at least one CQE based on the entry information of the target I/O completion queue; it also eliminates the interrupt mechanism of the conventional technology and simplifies data processing. Moreover, the NVMe controller can push multiple CQEs to the host in the same PCIe message using the CQE aggregation of the method shown in FIG. 3, reducing the number of messages between the host and the NVMe controller and improving data processing efficiency.

As another possible embodiment, the host-addressable PCIe address space includes the address space of the host's memory and the address space of the PCIe base address registers in the host. Besides mapping the identifiers of the I/O completion queues to the address space of a base address register as in the foregoing steps, the embodiments of the present invention may also use addresses in the host's memory address space to identify the I/O completion queues, each completion-queue identifier corresponding to a unique address. In this case, the host and the NVMe controller store the mapping between addresses in the memory address space and the identifiers of the I/O completion queues; the NVMe controller sends PCIe messages carrying at least one CQE according to this mapping, and the host uses the same mapping to determine, from the mapped address carried in a PCIe message, the identifier of the corresponding I/O completion queue, and then stores the at least one CQE to the target I/O completion queue. This method also eliminates the interrupt mechanism of the conventional technology and simplifies data processing; combined with the CQE aggregation shown in FIG. 3, sending multiple CQEs to the host in the same PCIe message also reduces the number of messages between the host and the NVMe controller and improves data processing efficiency.

As another possible implementation, besides identifying the target I/O completion queue with an address based on the PCIe standard, the NVMe controller may, by prior agreement, carry the identifier of the target I/O completion queue in a designated field of the PCIe message or in part of the payload data; the host parses the PCIe message using the designated field (for example, a reserved field of the PCIe message, or the start of the payload data) according to the prior agreement, obtains the identifier of the target I/O completion queue carried in the PCIe message, and stores the operation results to the I/O completion queue.

As another possible implementation, the host records the correspondence between I/O submission queues and I/O completion queues. Accordingly, in step S316, the second message may also directly carry the identifier of the target I/O submission queue; after receiving the second message, the host obtains that identifier, determines the target I/O completion queue from the correspondence between I/O submission queues and I/O completion queues, and then stores the at least one CQE carried in the second message to the target I/O completion queue.

As another possible implementation, the technical solution of storing SQEs based on the entry information of the I/O submission queue in steps S310 to S313, and the technical solution of storing CQEs based on the entry information of the I/O completion queue in steps S314 to S317, may be mixed with the doorbell and interrupt mechanisms of the prior art. For example, when sending SQEs to the NVMe controller, the host may use the entry-based SQE storage of steps S310 to S313, while the NVMe controller still uses the interrupt mechanism of the conventional solution when sending CQEs to the host: the NVMe controller sends the host an interrupt signal, then sends the CQE to the host by a direct memory access (DMA) write, and the host stores the CQE to the target I/O completion queue. Alternatively, the host may use the conventional doorbell mechanism when sending SQEs to the NVMe controller, while the NVMe controller uses the entry-based CQE storage of steps S314 to S317 when sending CQEs to the host. These methods can also simplify the data processing procedure to some extent and improve data processing efficiency.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations; however, persons skilled in the art should appreciate that the present invention is not limited by the described order of actions. Persons skilled in the art should also appreciate that the embodiments described in this specification are all preferred embodiments, and the actions involved are not necessarily required by the present invention. Other reasonable step combinations that persons skilled in the art can conceive based on the foregoing description also fall within the protection scope of the present invention.
The data processing method provided by the embodiments of the present invention has been described in detail above with reference to FIG. 1 to FIG. 4B. The NVMe controller, host, and storage system for data processing provided by the embodiments of the present invention are described below with reference to FIG. 5 to FIG. 7.
FIG. 5 is a schematic structural diagram of an NVMe controller 500 according to an embodiment of the present invention. As shown in the figure, the NVMe controller 500 includes a receiving unit 501 and a processing unit 502.

The receiving unit 501 is configured to receive the first PCIe message sent by the host, where at least one I/O submission queue is set in the memory of the NVMe controller, the first PCIe message includes entry information of a target I/O submission queue and at least one submission queue entry SQE, one SQE corresponds to one data operation request, and each data operation request is used to perform a read or write operation on the storage medium managed by the NVMe controller.

The processing unit 502 is configured to save the at least one SQE to the target I/O submission queue according to the entry information of the target I/O submission queue.

Optionally, the entry information of the target I/O submission queue is a first PCIe address unique in the host-addressable PCIe address space; the processing unit 502 is further configured to determine a second address according to the first PCIe address, the second address being the address at which the target I/O submission queue is stored in the memory of the NVMe controller 500, and to store the at least one SQE to the target I/O submission queue according to the second address.

Optionally, the processing unit 502 is further configured to determine the identifier of the target I/O submission queue according to the first PCIe address of the target I/O submission queue, and determine the second address according to the identifier of the target I/O submission queue.
Optionally, the processing unit 502 is further configured to calculate the identifier of the target I/O submission queue according to the following formula:

identifier of the target I/O submission queue = (ADD11 - ADD21) / (MCS * 64)

where ADD11 is the first PCIe address of the target I/O submission queue, ADD21 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O submission queues, and MCS is the maximum number of SQEs that can be aggregated in each I/O submission queue.

Optionally, the processing unit 502 is further configured to negotiate with the host, before the receiving unit 501 receives the creation instruction of the host, the maximum number of aggregatable SQEs MCS per I/O submission queue; the negotiated MCS is the smaller of the MCS supported by the NVMe controller and the MCS supported by the host.

Optionally, the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message and 1 ≤ M ≤ MCS; the processing unit 502 is further configured to determine a preset order of the M SQEs and save the M SQEs to the target I/O submission queue in that preset order.
Optionally, at least one I/O completion queue is set in the memory of the host, and the NVMe controller further includes a sending unit 503.

The processing unit 502 is further configured to obtain the at least one SQE from the target I/O submission queue and perform read or write operations on the storage medium managed by the NVMe controller according to the data operation requests carried in the at least one SQE.

The sending unit 503 is configured to send the second PCIe message to the host, where the second PCIe message includes entry information of the target I/O completion queue and at least one completion queue entry CQE, each CQE being the operation result of the data operation request carried in one SQE.

Optionally, the processing unit 502 is further configured to use, when a CQE aggregation condition is met, the operation results of at least two data operation requests as the payload data of the second PCIe message, one CQE corresponding to the operation result of one data operation request; the CQE aggregation condition includes the maximum aggregatable CQE size MCCB meeting a first threshold or the duration recorded by the I/O completion-queue aggregation timer CQT meeting a second threshold.

Optionally, the second message further includes depth information N of the target I/O completion queue, N indicating the number of CQEs carried in the second PCIe message; the processing unit 502 is further configured to use N CQEs, in a preset order, as the payload data of the second PCIe message, where 1 ≤ N ≤ MCC and the MCC is a positive integer.

Optionally, the host is an embedded processor that supports sending PCIe messages with a payload of at least 64 bytes.

It should be understood that the NVMe controller 500 of this embodiment of the present invention may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the data processing methods shown in FIG. 2 and FIG. 3 are implemented in software, the NVMe controller 500 and its modules may also be software modules.

The NVMe controller 500 according to this embodiment of the present invention may correspond to performing the methods described in the embodiments of the present invention, and the foregoing and other operations and/or functions of the units in the NVMe controller 500 respectively implement the corresponding procedures performed by the NVMe controller in the methods of FIG. 2 to FIG. 3; for brevity, details are not repeated here.
FIG. 6 is a schematic structural diagram of a host 600 according to an embodiment of the present invention. As shown in the figure, the host 600 includes a processing unit 601, a sending unit 602, and a receiving unit 603.

The processing unit 601 is configured to determine the entry information of the target input/output I/O submission queue according to the identifier of the target I/O submission queue of the data operation request to be sent, where at least one I/O submission queue is set in the memory of the NVMe controller.

The sending unit 602 is configured to send the first PCIe message to the NVMe controller, where the first PCIe message includes the entry information of the target I/O submission queue and at least one submission queue entry SQE, one SQE corresponds to one data operation request, and each data operation request is used to perform a read or write operation on the storage medium managed by the NVMe controller.

Optionally, the processing unit 601 is further configured to allocate to each I/O submission queue a first PCIe address unique in the host-addressable PCIe address space, determine the first PCIe address of the target I/O submission queue according to the identifier of the target I/O submission queue, and use the first PCIe address of the target I/O submission queue as the queue's entry information.

Optionally, the sending unit 602 is further configured to send a creation instruction to the NVMe controller, the creation instruction instructing the NVMe controller to set the at least one I/O submission queue in the memory of the NVMe controller and record the association between the identifier of each I/O submission queue and the first PCIe address of each I/O submission queue.

Optionally, before the sending unit 602 sends the creation instruction to the NVMe controller, the processing unit 601 negotiates with the NVMe controller the maximum number of aggregatable SQEs MCS per I/O submission queue; the negotiated MCS is the smaller of the MCS supported by the NVMe controller and the MCS supported by the host.

Optionally, the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message and 1 ≤ M ≤ MCS.

Optionally, the processing unit 601 is further configured to use at least two data operation requests as the payload data of the first PCIe message when an SQE aggregation condition is met, one data operation request corresponding to one SQE, each data operation request being used to perform a read or write operation on the storage medium managed by the NVMe controller; the SQE aggregation condition includes the maximum aggregatable SQE size MCSB meeting a third threshold or the duration recorded by the I/O submission-queue aggregation timer SQT meeting a fourth threshold.
Optionally, the host 600 further includes the receiving unit 603, configured to receive the second PCIe message sent by the NVMe controller, where the second PCIe message includes entry information of the target I/O completion queue and at least one completion queue entry CQE, each CQE being the operation result of the data operation request carried in one SQE.

The processing unit 601 is further configured to store the at least one completion queue entry to the target I/O completion queue according to the entry information of the target I/O completion queue.

Optionally, the entry information of the target I/O completion queue is a second PCIe address unique in the host-addressable PCIe address space; the processing unit 601 is further configured to determine a third address according to the second PCIe address, the third address being the address at which the target I/O completion queue is stored in the host's memory, and to store the at least one CQE to the target I/O completion queue according to the third address.

Optionally, the processing unit 601 determines the third address by determining the identifier of the target I/O completion queue according to the second PCIe address, and determining the third address according to the identifier of the target I/O completion queue.
Optionally, the processing unit 601 is further configured to calculate the identifier of the target I/O completion queue according to the following formula:

identifier of the target I/O completion queue = (ADD12 - ADD22) / (MCC * 64)

where ADD12 is the second PCIe address of the target I/O completion queue, ADD22 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O completion queues, and MCC is the maximum number of CQEs that can be aggregated in each I/O completion queue.

The processing unit 601 is further configured to store the at least one CQE to the host's memory according to the identifier of the target I/O completion queue.

Optionally, the host is an embedded processor that supports sending PCIe messages with a payload of at least 64 bytes.

It should be understood that the host 600 of this embodiment of the present invention may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the data processing methods shown in FIG. 2 and FIG. 3 are implemented in software, the host 600 and its modules may also be software modules.

The host 600 according to this embodiment of the present invention may correspond to performing the methods described in the embodiments of the present invention, and the foregoing and other operations and/or functions of the units in the host 600 respectively implement the corresponding procedures performed by the host in the methods of FIG. 2 to FIG. 3; for brevity, details are not repeated here.
FIG. 7 is a schematic diagram of a storage system 700 according to an embodiment of the present invention. As shown in the figure, the storage system 700 includes a host 701, an NVMe controller 702, a first memory 703, a second memory 704, and a bus 705. The host 701, the NVMe controller 702, the first memory 703, and the second memory 704 communicate through the bus 705; communication may also be implemented by other means such as wireless transmission. The first memory 703 is used by the host 701 to implement the data storage of the I/O completion queues, and the second memory 704 is used by the NVMe controller 702 to implement the data storage of the I/O submission queues.

The host is configured to send the first PCIe message to the NVMe controller, where the first PCIe message includes entry information of the target I/O submission queue and at least one submission queue entry SQE.

The NVMe controller is configured to receive the first PCIe message sent by the host, and save the at least one SQE to the target I/O submission queue according to the entry information of the target I/O submission queue, where at least one input/output I/O submission queue is set in the memory of the NVMe controller.

Optionally, the host 701 is further configured to allocate to each I/O submission queue a first PCIe address unique in the host-addressable PCIe address space, the entry information of the target I/O submission queue being the first PCIe address of the target I/O submission queue; the NVMe controller 702 is further configured to determine a second address according to the first PCIe address, the second address being the address at which the target I/O submission queue is stored in the memory of the NVMe controller, and to store the at least one SQE to the target I/O submission queue according to the second address.

Optionally, the NVMe controller 702 is further configured to determine the identifier of the target I/O submission queue according to the first PCIe address of the target I/O submission queue, and determine the second address according to the identifier of the target I/O submission queue.
Optionally, the NVMe controller 702 is further configured to calculate the identifier of the target I/O submission queue according to the following formula:

identifier of the target I/O submission queue = (ADD11 - ADD21) / (MCS * 64)

where ADD11 is the first PCIe address of the target I/O submission queue, ADD21 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O submission queues, and MCS is the maximum number of SQEs that can be aggregated in each I/O submission queue.

Optionally, the NVMe controller 702 is further configured to, before receiving the first PCIe message of the host 701, receive a creation instruction from the host 701, set the at least one I/O submission queue in the second memory 704 of the NVMe controller 702 according to the creation instruction, and record the association between the identifier of each I/O submission queue and the address information of each I/O submission queue in the memory of the NVMe controller.

Optionally, the host 701 is configured to negotiate with the NVMe controller 702 the maximum number of aggregatable SQEs MCS per I/O submission queue; the negotiated MCS is the smaller of the maximum number of aggregatable SQEs per I/O submission queue supported by the NVMe controller and that supported by the host.

Optionally, the first PCIe message further includes depth information M of the target I/O submission queue, where M indicates the number of SQEs carried in the first PCIe message and 1 ≤ M ≤ MCS; the NVMe controller 702 is further configured to determine a preset order of the M SQEs and save the M SQEs to the target I/O submission queue in that preset order.
Optionally, at least one I/O completion queue is set in the first memory of the host 701.

The NVMe controller 702 is further configured to obtain the at least one SQE from the target I/O submission queue, perform read or write operations on the storage medium managed by the NVMe controller 702 according to the data operation requests carried in the at least one SQE, and send the second PCIe message to the host 701, where the second PCIe message includes entry information of the target I/O completion queue and at least one completion queue entry CQE, one CQE corresponding to the operation result of one data operation request.

The host 701 is further configured to receive the second PCIe message sent by the NVMe controller 702, and store the at least one completion queue entry to the target I/O completion queue according to the entry information of the target I/O completion queue.

Optionally, the entry information of the target I/O completion queue is a second PCIe address unique in the host-addressable PCIe address space; the host 701 is further configured to determine a third address according to the second PCIe address, the third address being the address at which the target I/O completion queue is stored in the host's memory, and to store the at least one CQE to the target I/O completion queue according to the third address.
Optionally, the host 701 is further configured to determine the identifier of the target I/O completion queue according to the second PCIe address, and determine the third address according to the identifier of the target I/O completion queue.

Optionally, the host 701 is further configured to calculate the identifier of the target I/O completion queue according to the following formula:

identifier of the target I/O completion queue = (ADD12 - ADD22) / (MCC * 64)

where ADD12 is the second PCIe address of the target I/O completion queue, ADD22 is the start address of the contiguous address space that the host partitions in its addressable PCIe address space to identify the I/O completion queues, and MCC is the maximum number of CQEs that can be aggregated in each I/O completion queue.

The host 701 is further configured to store the at least one CQE to the host's memory according to the identifier of the target I/O completion queue.
Optionally, the NVMe controller 702 is further configured to use, when a CQE aggregation condition is met, the operation results of at least two data operation requests as the payload data of the second PCIe message, one CQE corresponding to the operation result of one data operation request; the CQE aggregation condition includes the maximum aggregatable CQE size MCCB meeting a third threshold or the duration recorded by the I/O completion-queue aggregation timer CQT meeting a fourth threshold.

Optionally, the second message further includes depth information N of the target I/O completion queue, N indicating the number of CQEs carried in the second PCIe message; the NVMe controller 702 is further configured to use N CQEs, in a preset order, as the payload data of the second PCIe message, where 1 ≤ N ≤ MCC and the MCC is a positive integer.

Optionally, the host 701 is an embedded processor that supports sending PCIe messages with a payload of at least 64 bytes.
It should be understood that, in this embodiment of the present invention, the host 701 may be a CPU, or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like; a general-purpose processor may be a microprocessor or any conventional processor.

The first memory 703 may include a read-only memory and a random access memory, and may also include a non-volatile random access memory; the same applies to the second memory 704. In addition to a data bus, the bus 705 may include a power bus, a control bus, a status signal bus, and the like; for the sake of clarity, however, the various buses are all marked as the bus 705 in the figure.

It should be understood that the storage system 700 according to this embodiment of the present invention corresponds to the storage system 100 shown in FIG. 1 of the embodiments of the present invention, and the storage device 700 is configured to implement the corresponding procedures of the methods shown in FIG. 2 and FIG. 3; for brevity, details are not repeated here.

It should be understood that the data processing storage system 700 according to this embodiment of the present invention may correspond to the NVMe controller 500 and the host 600 for data processing in the embodiments of the present invention, and may correspond to the subjects performing the methods shown in FIG. 2 and FIG. 3 in the embodiments of the present invention; the foregoing and other operations and/or functions of the modules in the storage system 700 respectively implement the corresponding procedures of the methods in FIG. 2 to FIG. 3, and for brevity are not repeated here.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer program instructions are loaded or executed on a computer, the procedures or functions according to the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium; the semiconductor medium may be a solid state drive (SSD).
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application.

It may be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical function division, and there may be other division manners in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto; any variation or replacement that a person skilled in the art can readily figure out according to the embodiments provided in this application shall fall within the protection scope of the present invention.
Claims (36)
- A data processing method, wherein an NVMe controller communicates with a host over the PCIe bus, the method comprising: receiving, by the NVMe controller, a first PCIe message sent by the host, wherein at least one input/output I/O submission queue is set in a memory of the NVMe controller, and the first PCIe message includes entry information of a target I/O submission queue and at least one submission queue entry SQE; and saving, by the NVMe controller, the at least one SQE to the target I/O submission queue according to the entry information of the target I/O submission queue.
- The method according to claim 1, wherein the entry information of the target I/O submission queue is a first PCIe address unique in the host-addressable PCIe address space, and saving the at least one SQE to the target I/O submission queue comprises: determining, by the NVMe controller, a second address according to the first PCIe address, the second address being the address at which the target I/O submission queue is stored in the memory of the NVMe controller; and storing, by the NVMe controller, the at least one SQE to the target I/O submission queue according to the second address.
- The method according to claim 2, wherein determining the second address comprises: determining, by the NVMe controller, the identifier of the target I/O submission queue according to the first PCIe address of the target I/O submission queue, and determining the second address according to the identifier of the target I/O submission queue.
- The method according to any one of claims 1 to 4, wherein before receiving the first PCIe message of the host, the method further comprises: receiving, by the NVMe controller, a creation instruction from the host, setting the at least one I/O submission queue in the memory of the NVMe controller according to the creation instruction, and recording the association between the identifier of each I/O submission queue and the address information of each I/O submission queue in the memory of the NVMe controller.
- The method according to claim 5, wherein before receiving the creation instruction of the host, the method further comprises: negotiating, by the NVMe controller with the host, the maximum number of aggregatable SQEs MCS per I/O submission queue, the negotiated MCS being the smaller of the maximum number of aggregatable SQEs per I/O submission queue supported by the NVMe controller and that supported by the host.
- The method according to claim 6, wherein the first PCIe message further includes depth information M of the target I/O submission queue, M indicating the number of SQEs carried in the first PCIe message, 1 ≤ M ≤ MCS; and saving the at least one SQE to the target I/O submission queue comprises: determining, by the NVMe controller, a preset order of the M SQEs, and saving the M SQEs to the target I/O submission queue in the preset order of the M SQEs.
- The method according to any one of claims 1 to 7, wherein at least one I/O completion queue is set in a memory of the host, and the method further comprises: obtaining, by the NVMe controller, the at least one SQE from the target I/O submission queue, and performing read or write operations on the storage medium managed by the NVMe controller according to the data operation requests carried in the at least one SQE; and sending, by the NVMe controller, a second PCIe message to the host, the second PCIe message including entry information of a target I/O completion queue and at least one completion queue entry CQE, one CQE corresponding to the operation result of one data operation request.
- The method according to claim 8, comprising: when a CQE aggregation condition is met, using, by the NVMe controller, the operation results of at least two data operation requests as the payload data of the second PCIe message, one CQE corresponding to the operation result of one data operation request, wherein the CQE aggregation condition includes the maximum aggregatable CQE size MCCB meeting a first threshold or the duration recorded by the I/O completion-queue aggregation timer CQT meeting a second threshold.
- The method according to claim 9, wherein the second message further includes depth information N of the target I/O completion queue, N indicating the number of CQEs carried in the second PCIe message; and the NVMe controller uses N CQEs, in a preset order, as the payload data of the second PCIe message, where 1 ≤ N ≤ MCC and the MCC is a positive integer.
- The method according to any one of claims 1 to 10, wherein the host is an embedded processor that supports sending PCIe messages with a payload of at least 64 bytes.
- A data processing method, wherein an NVMe controller communicates with a host over the PCIe bus, the method comprising: determining, by the host, the entry information of a target input/output I/O submission queue according to the identifier of the target I/O submission queue of a data operation request to be sent, wherein at least one I/O submission queue is set in a memory of the NVMe controller; and sending, by the host, a first PCIe message to the NVMe controller, wherein the first PCIe message includes the entry information of the target I/O submission queue and at least one submission queue entry SQE.
- The method according to claim 12, further comprising: allocating, by the host, to each I/O submission queue a first PCIe address unique in the host-addressable PCIe address space; and determining the entry information of the target I/O submission queue comprises: determining, by the host, the first PCIe address of the target I/O submission queue according to the identifier of the target I/O submission queue, and using the first PCIe address of the target I/O submission queue as the entry information of the target I/O submission queue.
- The method according to claim 12 or 13, further comprising: sending, by the host, a creation instruction to the NVMe controller, the creation instruction instructing the NVMe controller to set the at least one I/O submission queue in the memory of the NVMe controller and record the association between the identifier of each I/O submission queue and the first PCIe address of each I/O submission queue.
- The method according to claim 14, wherein before sending the creation instruction to the NVMe controller, the method further comprises: negotiating, by the host with the NVMe controller, the maximum number of aggregatable SQEs MCS per I/O submission queue, the negotiated MCS being the smaller of the maximum number of aggregatable SQEs per I/O submission queue supported by the NVMe controller and that supported by the host.
- The method according to claim 15, wherein the first PCIe message further includes depth information M of the target I/O submission queue, M indicating the number of SQEs carried in the first PCIe message, 1 ≤ M ≤ MCS.
- The method according to any one of claims 12 to 16, wherein before sending the first PCIe message, the method further comprises: when an SQE aggregation condition is met, using, by the host, at least two data operation requests as the payload data of the first PCIe message, wherein the SQE aggregation condition includes the maximum aggregatable SQE size MCSB meeting a third threshold or the duration recorded by the I/O submission-queue aggregation timer SQT meeting a fourth threshold.
- The method according to any one of claims 12 to 17, further comprising: receiving, by the host, a second PCIe message sent by the NVMe controller, the second PCIe message including entry information of a target I/O completion queue and at least one completion queue entry CQE; and storing, by the host, the at least one CQE to the target I/O completion queue according to the entry information of the target I/O completion queue.
- The method according to any one of claims 12 to 18, wherein the entry information of the target I/O completion queue is a second PCIe address unique in the host-addressable PCIe address space, and storing the at least one CQE to the target I/O completion queue comprises: determining, by the host, a third address according to the second PCIe address, the third address being the address at which the target I/O completion queue is stored in the host's memory; and storing, by the host, the at least one CQE to the target I/O completion queue according to the third address.
- The method according to claim 19, wherein determining the third address comprises: determining, by the host, the identifier of the target I/O completion queue according to the second PCIe address, and determining the third address according to the identifier of the target I/O completion queue.
- The method according to any one of claims 12 to 21, wherein the host is an embedded processor that supports sending PCIe messages with a payload of at least 64 bytes.
- A data processing system, comprising an NVMe controller and a host that communicate over the PCIe bus, wherein: the host is configured to send a first PCIe message to the NVMe controller, the first PCIe message including entry information of a target I/O submission queue and at least one submission queue entry SQE; and the NVMe controller is configured to receive the first PCIe message sent by the host and save the at least one SQE to the target I/O submission queue according to the entry information of the target I/O submission queue, wherein at least one input/output I/O submission queue is set in a memory of the NVMe controller.
- The system according to claim 23, wherein the host is further configured to allocate to each I/O submission queue a first PCIe address unique in the host-addressable PCIe address space, the entry information of the target I/O submission queue being the first PCIe address of the target I/O submission queue; and the NVMe controller is further configured to determine a second address according to the first PCIe address, the second address being the address at which the target I/O submission queue is stored in the memory of the NVMe controller, and to store the at least one SQE to the target I/O submission queue according to the second address.
- The system according to claim 24, wherein the NVMe controller is further configured to determine the identifier of the target I/O submission queue according to the first PCIe address of the target I/O submission queue, and determine the second address according to the identifier of the target I/O submission queue.
- The system according to any one of claims 23 to 26, wherein the NVMe controller is further configured to, before receiving the first PCIe message of the host, receive a creation instruction from the host, set the at least one I/O submission queue in the memory of the NVMe controller according to the creation instruction, and record the association between the identifier of each I/O submission queue and the address information of each I/O submission queue in the memory of the NVMe controller.
- The system according to claim 27, wherein the host is configured to negotiate with the NVMe controller the maximum number of aggregatable SQEs MCS per I/O submission queue, the negotiated MCS being the smaller of the maximum number of aggregatable SQEs per I/O submission queue supported by the NVMe controller and that supported by the host.
- The system according to claim 28, wherein the first PCIe message further includes depth information M of the target I/O submission queue, M indicating the number of SQEs carried in the first PCIe message, 1 ≤ M ≤ MCS; and the NVMe controller is further configured to determine a preset order of the M SQEs and save the M SQEs to the target I/O submission queue in the preset order of the M SQEs.
- The system according to any one of claims 23 to 29, wherein at least one I/O completion queue is set in a memory of the host; the NVMe controller is further configured to obtain the at least one SQE from the target I/O submission queue, perform read or write operations on the storage medium managed by the NVMe controller according to the data operation requests carried in the at least one SQE, and send a second PCIe message to the host, the second PCIe message including entry information of a target I/O completion queue and at least one completion queue entry CQE, one CQE corresponding to the operation result of one data operation request; and the host is further configured to receive the second PCIe message sent by the NVMe controller and store the at least one completion queue entry to the target I/O completion queue according to the entry information of the target I/O completion queue.
- The system according to claim 30, wherein the entry information of the target I/O completion queue is a second PCIe address unique in the host-addressable PCIe address space; and the host is further configured to determine a third address according to the second PCIe address, the third address being the address at which the target I/O completion queue is stored in the host's memory, and to store the at least one CQE to the target I/O completion queue according to the third address.
- The system according to claim 31, wherein the host is further configured to determine the identifier of the target I/O completion queue according to the second PCIe address, and determine the third address according to the identifier of the target I/O completion queue.
- The system according to any one of claims 23 to 33, wherein the NVMe controller is further configured to use, when a CQE aggregation condition is met, the operation results of at least two data operation requests as the payload data of the second PCIe message, one CQE corresponding to the operation result of one data operation request, wherein the CQE aggregation condition includes the maximum aggregatable CQE size MCCB meeting a third threshold or the duration recorded by the I/O completion-queue aggregation timer CQT meeting a fourth threshold.
- The system according to any one of claims 23 to 34, wherein the second message further includes depth information N of the target I/O completion queue, N indicating the number of CQEs carried in the second PCIe message; and the NVMe controller is further configured to use N CQEs, in a preset order, as the payload data of the second PCIe message, where 1 ≤ N ≤ MCC and the MCC is a positive integer.
- The system according to any one of claims 23 to 35, wherein the host is an embedded processor that supports sending PCIe messages with a payload of at least 64 bytes.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18917582.1A EP3614253B1 (en) | 2018-06-30 | 2018-06-30 | Data processing method and storage system |
PCT/CN2018/093919 WO2020000483A1 (zh) | 2018-06-30 | 2018-06-30 | 数据处理的方法和存储系统 |
CN201880004252.8A CN109983449B (zh) | 2018-06-30 | 2018-06-30 | 数据处理的方法和存储系统 |
CN202210259758.1A CN114780458B (zh) | 2018-06-30 | 2018-06-30 | 数据处理的方法和存储系统 |
US16/673,320 US11169938B2 (en) | 2018-06-30 | 2019-11-04 | Non-volatile memory (NVM) express (NVMe) data processing method and system |
US17/498,348 US11636052B2 (en) | 2018-06-30 | 2021-10-11 | Non-volatile memory express (NVMe) data processing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/093919 WO2020000483A1 (zh) | 2018-06-30 | 2018-06-30 | 数据处理的方法和存储系统 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/673,320 Continuation US11169938B2 (en) | 2018-06-30 | 2019-11-04 | Non-volatile memory (NVM) express (NVMe) data processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020000483A1 true WO2020000483A1 (zh) | 2020-01-02 |
Family
ID=67077737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/093919 WO2020000483A1 (zh) | 2018-06-30 | 2018-06-30 | 数据处理的方法和存储系统 |
Country Status (4)
Country | Link |
---|---|
US (2) | US11169938B2 (zh) |
EP (1) | EP3614253B1 (zh) |
CN (2) | CN109983449B (zh) |
WO (1) | WO2020000483A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022043792A1 (en) * | 2020-08-31 | 2022-03-03 | International Business Machines Corporation | Input/output queue hinting for resource utilization |
CN114925003A (zh) * | 2021-02-12 | 2022-08-19 | 慧与发展有限责任合伙企业 | 控制nvmetm设备中的i/o q连接 |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110460653B (zh) * | 2019-07-30 | 2022-08-30 | 北京百度网讯科技有限公司 | 自动驾驶车辆数据传输的方法及装置 |
CN111221476B (zh) * | 2020-01-08 | 2022-03-29 | 深圳忆联信息系统有限公司 | 提升ssd性能的前端命令处理方法、装置、计算机设备及存储介质 |
JP7028902B2 (ja) * | 2020-02-07 | 2022-03-02 | 株式会社日立製作所 | ストレージシステム及び入出力制御方法 |
US11068422B1 (en) * | 2020-02-28 | 2021-07-20 | Vmware, Inc. | Software-controlled interrupts for I/O devices |
US11636059B2 (en) * | 2020-03-31 | 2023-04-25 | Samsung Electronics Co., Ltd. | Scaling performance in a storage server with storage devices |
CN111857579B (zh) * | 2020-06-30 | 2024-02-09 | 广东浪潮大数据研究有限公司 | SSD disk controller reset method, system, apparatus, and readable storage medium |
CN113296691B (zh) * | 2020-07-27 | 2024-05-03 | 阿里巴巴集团控股有限公司 | Data processing system, method, apparatus, and electronic device |
CN112256601B (zh) * | 2020-10-19 | 2023-04-21 | 苏州凌云光工业智能技术有限公司 | Data access control method, embedded storage system, and embedded device |
JP2022076620A (ja) * | 2020-11-10 | 2022-05-20 | キオクシア株式会社 | Memory system and control method |
CN112346665B (zh) * | 2020-11-30 | 2023-05-19 | 杭州华澜微电子股份有限公司 | PCIe-based communication method, apparatus, device, system, and storage medium |
US20210279186A1 (en) * | 2021-05-26 | 2021-09-09 | Intel Corporation | Method and apparatus to perform dynamically controlled interrupt coalescing for a solid state drive |
CN115904488A (zh) * | 2021-08-11 | 2023-04-04 | 华为技术有限公司 | Data transmission method, system, apparatus, and device |
CN114048156B (zh) * | 2021-10-28 | 2024-05-03 | 山东云海国创云计算装备产业创新中心有限公司 | Multi-channel multi-mapping interrupt controller |
US12117944B2 (en) | 2022-01-27 | 2024-10-15 | Samsung Electronics Co., Ltd. | Systems, methods, and devices for queue management with a coherent interface |
CN114979022B (zh) * | 2022-05-20 | 2023-07-28 | 北京百度网讯科技有限公司 | Implementation method and apparatus for remote direct data access, adapter, and storage medium |
CN114936171B (zh) * | 2022-06-14 | 2023-11-14 | 深存科技(无锡)有限公司 | Storage access controller architecture |
CN115622954B (zh) * | 2022-09-29 | 2024-03-01 | 中科驭数(北京)科技有限公司 | Data transmission method and apparatus, electronic device, and storage medium |
CN115657961B (zh) * | 2022-11-11 | 2023-03-17 | 苏州浪潮智能科技有限公司 | RAID disk array management method and system, electronic device, and storage medium |
US20240220119A1 (en) * | 2022-12-29 | 2024-07-04 | SK Hynix NAND Product Solutions Corp. (dba Solidigm) | Methods and systems for dynamic submission data structures |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8966172B2 * | 2011-11-15 | 2015-02-24 | Pavilion Data Systems, Inc. | Processor agnostic data storage in a PCIE based shared storage environment |
US9256384B2 (en) * | 2013-02-04 | 2016-02-09 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Method and system for reducing write latency in a data storage system by using a command-push model |
US9304690B2 (en) * | 2014-05-07 | 2016-04-05 | HGST Netherlands B.V. | System and method for peer-to-peer PCIe storage transfers |
KR102403489B1 (ko) * | 2015-07-10 | 2022-05-27 | 삼성전자주식회사 | Method for managing input/output queues by a non-volatile memory express controller |
US10467155B2 (en) * | 2015-10-26 | 2019-11-05 | Micron Technology, Inc. | Command packets for the direct control of non-volatile memory channels within a solid state drive |
CN106775434B (zh) | 2015-11-19 | 2019-11-29 | 华为技术有限公司 | NVMe networked storage implementation method, terminal, server, and system |
CN106919531B (zh) * | 2015-12-25 | 2020-02-21 | 华为技术有限公司 | Interaction method and device based on a non-volatile storage bus protocol |
US10860511B1 (en) * | 2015-12-28 | 2020-12-08 | Western Digital Technologies, Inc. | Integrated network-attachable controller that interconnects a solid-state drive with a remote server computer |
CN114579488B (zh) * | 2016-04-04 | 2024-09-27 | 马维尔亚洲私人有限公司 | Method and system for accessing host memory through non-volatile memory over a fabric bridged with direct target access |
CN107992436B (zh) | 2016-10-26 | 2021-04-09 | 华为技术有限公司 | NVMe data read/write method and NVMe device |
- 2018
  - 2018-06-30 CN CN201880004252.8A patent/CN109983449B/zh active Active
  - 2018-06-30 EP EP18917582.1A patent/EP3614253B1/en active Active
  - 2018-06-30 WO PCT/CN2018/093919 patent/WO2020000483A1/zh unknown
  - 2018-06-30 CN CN202210259758.1A patent/CN114780458B/zh active Active
- 2019
  - 2019-11-04 US US16/673,320 patent/US11169938B2/en active Active
- 2021
  - 2021-10-11 US US17/498,348 patent/US11636052B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150095554A1 (en) * | 2013-09-27 | 2015-04-02 | Avalanche Technology, Inc. | Storage processor managing solid state disk array |
US20150169244A1 * | 2013-09-27 | 2015-06-18 | Avalanche Technology, Inc. | Storage processor managing NVMe logically addressed solid state disk array |
CN107820693A (zh) * | 2016-12-28 | 2018-03-20 | 华为技术有限公司 | Method, device, and system for forwarding packets in NVMe over Fabric |
Non-Patent Citations (1)
Title |
---|
See also references of EP3614253A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022043792A1 (en) * | 2020-08-31 | 2022-03-03 | International Business Machines Corporation | Input/output queue hinting for resource utilization |
US11604743B2 (en) | 2020-08-31 | 2023-03-14 | International Business Machines Corporation | Input/output queue hinting for resource utilization |
US11960417B2 (en) | 2020-08-31 | 2024-04-16 | International Business Machines Corporation | Input/output queue hinting for resource utilization |
CN114925003A (zh) * | 2021-02-12 | 2022-08-19 | 慧与发展有限责任合伙企业 | Controlling I/O Q connections in NVMe™ devices |
CN114925003B (zh) * | 2021-02-12 | 2023-08-08 | 慧与发展有限责任合伙企业 | Controlling I/O Q connections in NVMe devices |
Also Published As
Publication number | Publication date |
---|---|
US11169938B2 (en) | 2021-11-09 |
US20220027292A1 (en) | 2022-01-27 |
EP3614253A4 (en) | 2020-07-08 |
US20200065264A1 (en) | 2020-02-27 |
US11636052B2 (en) | 2023-04-25 |
EP3614253B1 (en) | 2022-02-09 |
CN114780458A (zh) | 2022-07-22 |
CN109983449B (zh) | 2022-03-29 |
EP3614253A1 (en) | 2020-02-26 |
CN114780458B (zh) | 2024-05-14 |
CN109983449A (zh) | 2019-07-05 |
Similar Documents
Publication | Title |
---|---|
WO2020000483A1 (zh) | Data processing method and storage system |
US10705974B2 (en) | Data processing method and NVMe storage device |
US10713074B2 (en) | Method, apparatus, and system for accessing storage device |
US9467512B2 (en) | Techniques for remote client access to a storage medium coupled with a server |
US8095701B2 (en) | Computer system and I/O bridge |
US7308523B1 (en) | Flow-splitting and buffering PCI express switch to reduce head-of-line blocking |
US20200019433A1 (en) | Resource Management Method and Apparatus |
WO2019233322A1 (zh) | Resource pool management method and apparatus, resource pool control unit, and communication device |
WO2021051919A1 (zh) | Data forwarding chip and server |
ES2757375T3 (es) | Method, server, and endpoint for resource management |
US9092366B2 (en) | Splitting direct memory access windows |
WO2022007470A1 (zh) | Data transmission method, chip, and device |
WO2013048409A1 (en) | Writing message to controller memory space |
CN104731635B (zh) | Virtual machine access control method and virtual machine access control system |
US20240348686A1 (en) | Remote Data Access Method and Apparatus |
US11809799B2 (en) | Systems and methods for multi PF emulation using VFs in SSD controller |
CN110119304A (zh) | Interrupt processing method, apparatus, and server |
TW202238399A (zh) | Peripheral component interconnect express device and operating method thereof |
WO2021063160A1 (zh) | Method for accessing solid-state drive and storage device |
TW202240414A (zh) | PCIe function and operating method thereof |
CN115617718A (zh) | AXI bus-based read/write order-preserving method and SoC system |
US20230106771A1 (en) | Data Processing Method for Network Adapter and Network Adapter |
US20240168876A1 (en) | Solving submission queue entry overflow using metadata or data pointers |
CN113031849A (zh) | Direct memory access unit and control component |
US20230350824A1 (en) | Peripheral component interconnect express device and operating method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | ENP | Entry into the national phase | Ref document number: 2018917582; Country of ref document: EP; Effective date: 20191113 |
 | NENP | Non-entry into the national phase | Ref country code: DE |