WO2018102968A1 - Control method, device, and system for data read/write commands in an NVMe over Fabric architecture - Google Patents


Info

Publication number
WO2018102968A1
WO2018102968A1 (application PCT/CN2016/108600)
Authority
WO
WIPO (PCT)
Prior art keywords
storage space
command
data
cache unit
command queue
Prior art date
Application number
PCT/CN2016/108600
Other languages
English (en)
French (fr)
Inventor
吉辛维克多
邱鑫
吴沛
曲会春
张锦彬
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2016/108600 (WO2018102968A1)
Priority to EP20191641.8A (EP3825857B1)
Priority to EP16897476.4A (EP3352087B1)
Priority to CN201680031202.XA (CN108369530B)
Publication of WO2018102968A1
Priority to US16/415,995 (US11762581B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/40Bus structure
    • G06F13/4004Coupling between buses
    • G06F13/4027Coupling between buses using bus bridges
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4221Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656Data buffering arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022Mechanisms to release resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue

Definitions

  • the present invention relates to the field of information technology, and in particular to a method, device, and system for controlling data read/write commands in a fabric-based non-volatile memory express (NVMe over Fabric) architecture.
  • Non-volatile memory express (English: NVMe) is a controller interface standard that unifies the way NVMe devices are connected over a Peripheral Component Interconnect Express (English: PCIe) bus.
  • it standardizes the queue (English: Queue) transmission mechanism between an NVMe device and the host (English: Host), and optimizes the queue interface.
  • NVMe over Fabric extends the NVMe protocol so that a Host accesses storage devices across a network fabric.
  • Host represents the host, which is responsible for initiating data reads and writes;
  • Target represents the target storage device, which is responsible for receiving and executing the commands sent by the host.
  • when the Host writes data through a Write Command, the NIC in the Target parses the Write Command to obtain the length of the data to be transmitted, and allocates corresponding storage space in the NIC memory to buffer the data to be transmitted by the Host.
  • after the caching completes, the cached data is moved to the destination hard disk in the Target.
  • the implementation is similar when the Host reads data from the Target's hard disk through a Read Command: the data in the target hard disk is first cached in the NIC memory, and then the data cached in the NIC memory is sent to the Host.
  • the Host sends multiple commands to the Target during the same time period.
  • the NVMe over Fabric architecture implements parallel processing of commands through multiple queues.
  • it is possible for the data to be read or written by the commands in one queue to occupy most of the storage space of the Target NIC's memory. Commands in other queues then cannot obtain enough cache space for the data they need to read or write and cannot be executed in time; such commands must wait for memory space to be released and re-apply for available space.
  • Such an implementation makes the behavior of the Target's NIC complicated when NIC memory is insufficient, and its maintainability is poor.
  • Embodiments of the present invention provide a method, a device, and a system for controlling data read/write commands in an NVMe over Fabric architecture, to solve the problem that executing the data read/write commands in one queue leaves insufficient cache space for reading and writing data in other queues, and that the resulting processing mechanism is complicated.
  • an embodiment of the present invention provides a method for controlling data read/write commands between a control device and a storage device in an NVMe over Fabric architecture. The storage device includes a data processing unit, a cache unit, and a storage unit; the data that the control device needs to read and write is stored in the storage unit; the data processing unit is configured to receive the data read/write commands sent by the control device; and the cache unit is configured to cache the data to be transmitted by those commands.
  • the method includes the following steps:
  • the data processing unit receives a control command sent by the control device, where the control command includes information for dividing the storage space of the cache unit into two or more storage spaces;
  • the data processing unit divides the storage space of the cache unit into two or more storage spaces according to the control command, and establishes a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write commands sent by the control device;
  • after receiving a first data read/write command sent by the control device, the data processing unit caches the data to be transmitted by the first data read/write command in the storage space of the cache unit corresponding to the first command queue, according to the correspondence between the two or more storage spaces and the command queues, where the first data read/write command is a data read/write command in the first command queue.
  • in this way, the data to be transmitted by the first data read/write command in the first command queue is cached in the storage space corresponding to the first command queue, and each command queue uses its own corresponding storage space of the cache unit to cache the data to be transmitted by its data read/write commands. This avoids the problem that the data to be transmitted by commands in one command queue occupies a large amount of the cache unit's storage space, leaving data read/write commands in other command queues unable to execute for lack of cache space.
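The per-queue partitioning described above can be sketched as follows. This is a minimal illustrative model under stated assumptions, not the patented implementation; all names (CacheUnit, divide, cache_data) are invented for the example.

```python
class CacheUnit:
    """Toy model of a cache unit divided into per-command-queue storage spaces."""

    def __init__(self, total_size):
        self.total_size = total_size
        self.partitions = {}   # queue id -> capacity of its storage space
        self.used = {}         # queue id -> bytes currently cached

    def divide(self, sizes_by_queue):
        # Divide the cache unit into two or more storage spaces, one per queue,
        # as directed by the control command.
        if sum(sizes_by_queue.values()) > self.total_size:
            raise ValueError("partitions exceed cache unit capacity")
        self.partitions = dict(sizes_by_queue)
        self.used = {q: 0 for q in sizes_by_queue}

    def cache_data(self, queue_id, length):
        # A command's data may only occupy its own queue's storage space,
        # so a busy queue cannot starve the others.
        free = self.partitions[queue_id] - self.used[queue_id]
        if length > free:
            return False
        self.used[queue_id] += length
        return True

cache = CacheUnit(total_size=1024)
cache.divide({"q1": 512, "q2": 512})
```

Note that once `q1` exhausts its 512-byte partition, `q2`'s partition remains fully available, which is the isolation property the paragraph above describes.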
  • the control device and the storage device can be connected and communicate through a network such as iWarp, ROCE, Infiniband, FC, or Omni-Path.
  • the data processing unit in the storage device may be implemented by a network card, a separate FPGA chip, or a central processing unit (CPU) in the storage device.
  • the cache unit in the storage device may also be implemented by a network card memory, a storage unit in the FPGA chip, a cache unit on the storage device, or a memory of a CPU in the storage device.
  • alternatively, the cache unit in the storage device may be implemented as at least two cache resource pools composed of the network card memory, the storage unit in the FPGA chip, the cache unit on the storage device, or the memory of the CPU in the storage device.
  • the data processing unit establishes a correspondence between the two or more storage spaces and the command queue, including:
  • the data processing unit establishes the correspondence between the two or more storage spaces and the command queues according to correspondence information carried in the control command, where the correspondence information describes the correspondence between the two or more storage spaces of the cache unit and the command queues; or,
  • the data processing unit establishes a correspondence between the two or more storage spaces and the command queue according to the two or more storage spaces that are divided.
  • the method further includes:
  • when the occupied proportion of the storage space of the cache unit corresponding to the first command queue is greater than a preset first threshold, and the occupied proportion of the storage space of the cache unit corresponding to a second command queue is less than a preset second threshold, the storage space of the cache unit corresponding to the second command queue is reduced, and the reduced portion is allocated to the storage space of the cache unit corresponding to the first command queue; the first threshold is greater than the second threshold.
  • in this way, the storage space of the cache unit can be flexibly allocated according to actual conditions. This not only maximizes use of the cache unit's resources, but also accommodates command queues whose data read/write commands carry large amounts of data, improving service processing capability while avoiding waste of resources.
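The two-threshold reallocation described above can be illustrated with a small sketch. The threshold values, step size, and function name are assumptions chosen for the example, not values given by the patent.

```python
def rebalance(partitions, used, busy_q, idle_q,
              first_threshold=0.9, second_threshold=0.2, step=64):
    """Shift cache space from an idle queue's partition to a busy one.

    Triggers only when the busy queue is fuller than first_threshold and the
    idle queue emptier than second_threshold (first_threshold > second_threshold).
    """
    busy_ratio = used[busy_q] / partitions[busy_q]
    idle_ratio = used[idle_q] / partitions[idle_q]
    if busy_ratio > first_threshold and idle_ratio < second_threshold:
        # Never reclaim space that currently holds cached data.
        grant = min(step, partitions[idle_q] - used[idle_q])
        partitions[idle_q] -= grant
        partitions[busy_q] += grant
    return partitions

# q1 is almost full (500/512), q2 is nearly idle (10/512): shift 64 bytes.
partitions = rebalance({"q1": 512, "q2": 512}, {"q1": 500, "q2": 10}, "q1", "q2")
```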
  • when establishing the correspondence between storage spaces of the cache unit and command queues, one storage space may correspond to a single command queue, or to a queue group composed of two or more command queues.
  • in this way, the storage resources of the cache unit can be allocated among different command queues more flexibly.
  • the storage space of the cache unit may be divided in ways including but not limited to: according to the size of the storage space of the cache unit, according to the quality of service (QoS) of different command queues, or according to the priority of different command queues.
  • for example, the command queue corresponding to the storage space allocated from the network card memory has a high priority or a high QoS requirement, and the command queue corresponding to the storage space allocated from the memory of the CPU in the storage device has a low priority or a low QoS requirement. Because caching is fast and efficient when the NIC memory serves as the cache unit, allocating NIC memory storage space to high-priority command queues meets the service requirements of high-priority commands.
  • the command queue corresponding to the storage unit allocated from the FPGA chip may also have a high priority or a high QoS requirement.
  • the priority of the command queue corresponding to the storage space allocated from the memory of the CPU in the storage device is low.
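The priority-based placement above can be sketched as a simple mapping. The three media tiers follow the examples in the text; the function name and the priority labels are illustrative assumptions.

```python
def place_queue(priority):
    """Map a command queue's priority to a cache medium, per the scheme above:
    NIC memory (fastest) for high priority, FPGA chip storage for medium,
    CPU memory in the storage device for low priority."""
    if priority == "high":
        return "nic_memory"
    if priority == "medium":
        return "fpga_storage"
    return "cpu_memory"
```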
  • optionally, the data processing unit may bind multiple command queues into a queue group according to a control command of the control device, so that the storage space in the cache unit corresponding to the queue group is shared by the command queues in the group.
  • the method further includes:
  • the control device acquires an available storage space of the cache unit corresponding to the first command queue
  • the control device determines whether the storage space occupied by the first data to be transmitted by the first data read/write command is less than or equal to the available storage space of the cache unit corresponding to the first command queue;
  • if not, sending of the first data read/write command is suspended.
  • the control device sends the first data read/write command in the first command queue only when the storage space of the cache unit corresponding to the first command queue can buffer the data to be transmitted.
  • this avoids the complicated processing mechanism that results when the available storage space of the cache unit corresponding to the first command queue is insufficient for the commands in the first command queue.
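The admission check the control device performs before sending can be sketched in a few lines; the dictionary-based command representation and the function name are assumptions for illustration.

```python
def admit(command, available_by_queue):
    """Decide whether to send or suspend a data read/write command.

    command: dict with 'queue' (its command queue) and 'size' (storage space
    its data would occupy); available_by_queue maps each command queue to the
    available space of its cache-unit partition.
    """
    if command["size"] <= available_by_queue[command["queue"]]:
        return "send"      # the partition can buffer the data
    return "suspend"       # wait and re-acquire available space later
```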
  • the control device acquiring the available storage space of the cache unit corresponding to the first command queue includes: the control device sending an acquisition request to the data processing unit, and receiving a response from the data processing unit carrying the available storage space of the cache unit corresponding to the first command queue.
  • optionally, before the control device sends the request to the data processing unit to obtain the available storage space of the cache unit corresponding to the first command queue, the method further includes:
  • the control device sends a second data read/write command to the storage device, where the storage space required by the data to be transmitted by the second data read/write command is greater than the available storage space of the cache unit corresponding to the first command queue;
  • the control device receives a backpressure message sent by the data processing unit, where the backpressure message indicates that the available storage space of the cache unit corresponding to the first command queue is insufficient.
  • in this way, the control device need not send a request for the available storage space of the cache unit corresponding to the relevant command queue every time it sends a data read/write command; it sends the acquisition request only after the data processing unit returns a backpressure message indicating that the data cannot be cached. This saves resources on the control device, and correspondingly saves the resources consumed by the data processing unit in returning backpressure messages.
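This lazy, backpressure-driven variant can be sketched as follows: the control device sends optimistically and queries available space only after a backpressure message. The `DataProcessingUnit` stub and all method names are assumptions for the example.

```python
class DataProcessingUnit:
    """Stub of the Target-side unit: caches data per queue, or pushes back."""

    def __init__(self, available):
        self.available = dict(available)   # queue id -> free bytes in partition

    def receive(self, queue_id, size):
        if size > self.available[queue_id]:
            return "backpressure"          # cannot cache the command's data
        self.available[queue_id] -= size
        return "ok"

    def query_available(self, queue_id):
        return self.available[queue_id]

def send_with_backpressure(dpu, queue_id, size):
    result = dpu.receive(queue_id, size)
    if result == "backpressure":
        # Only now does the control device issue an acquisition request.
        free = dpu.query_available(queue_id)
        return ("suspended", free)
    return ("sent", None)

dpu = DataProcessingUnit({"q1": 100})
```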
  • the method further includes:
  • the control device suspends sending the first data read/write command for a preset time, then re-acquires the available storage space of the cache unit corresponding to the first command queue, and when the storage space occupied by the first data is less than or equal to the available storage space of the cache unit corresponding to the first command queue, sends the first data read/write command to the storage device.
  • the preset time for the control device to suspend sending the first data read/write command may be a system default time or a pre-configured time.
  • the preset time for which the control device suspends sending the first data read/write command can be set flexibly according to the specific service situation.
  • after the preset time, the control device again acquires the available storage space of the cache unit corresponding to the first command queue and determines whether the storage space occupied by the first data to be transmitted by the first data read/write command is less than or equal to that available storage space.
  • the preset time can be set differently for different business scenarios: during the preset time, the available storage space of the cache unit may be unable to satisfy the storage space requirements of the data to be transmitted by the data read/write commands sent by the control device, while after the preset time has elapsed, it can.
  • optionally, the method further includes the control device retransmitting the second data read/write command. That is, for a second data read/write command that could not be executed in time because the storage space of the cache unit corresponding to the first command queue was insufficient, when the control device determines that the available storage space of the cache unit corresponding to the first command queue is greater than the storage space required by the data to be transmitted by the second data read/write command, the control device resends the second data read/write command.
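The suspend-and-retry behavior above can be sketched as a bounded retry loop. `time.sleep` stands in for the preset suspension time, and the callback-based interface is an assumption chosen so the sketch stays self-contained.

```python
import time

def send_with_retry(get_available, send, queue_id, size,
                    preset_time=0.0, max_attempts=5):
    """Resend a data read/write command once its queue's partition has room.

    get_available(queue_id) returns the queue partition's free space;
    send(queue_id, size) transmits the command to the storage device.
    """
    for _ in range(max_attempts):
        if size <= get_available(queue_id):
            send(queue_id, size)
            return True
        time.sleep(preset_time)   # suspend sending for the preset time
    return False                  # still no room after max_attempts checks
```

A bound on the attempts is an addition of this sketch; the text itself only describes waiting the preset time and re-checking.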
  • optionally, the available storage space of the cache unit is the locally recorded real-time available storage space of the cache unit corresponding to the first command queue.
  • the real-time available storage space is the real-time available storage space of the cache unit recorded by the control device.
  • control device may acquire and record an available storage space of the cache unit when the storage device is powered on.
  • the control device may also acquire and record the available storage space of the cache unit at any time after the storage device is powered on.
  • the real-time available storage space of the network card memory recorded by the control device may be expressed as the size of the cache-unit space that can store data, or as the number of data blocks that can be written.
  • the control device may store the real-time available storage space of the cache unit corresponding to the first command queue in a dedicated storage space, for example a dedicated chip. It may also be stored in an existing storage component of the control device, for example in the cache of the control device's CPU, in the cache of the control device's network card, or in storage space in an independent FPGA chip.
  • after sending the first data read/write command, the control device subtracts the storage space occupied by the first data from the locally recorded real-time available storage space of the cache unit corresponding to the first command queue.
  • after receiving the response message sent by the data processing unit indicating completion of the first data read/write command, the control device adds the storage space occupied by the first data back to the locally recorded real-time available storage space of the cache unit corresponding to the first command queue.
  • this is because, after the control device sends the first data read/write command, the data to be transmitted occupies storage space of the cache unit corresponding to the first command queue; the storage space occupied by the first data must therefore be subtracted from the recorded real-time available storage space of that cache unit.
  • after the control device receives the response message for the first data read/write command sent by the data processing unit, the first data has been migrated out of the cache unit corresponding to the first command queue, so the storage space occupied by the first data must be added back to the recorded real-time available storage space. In this way, the latest available storage space of the cache unit corresponding to the first command queue is recorded correctly.
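The local bookkeeping described above (subtract on send, add back on completion) can be sketched as a small record class; the class and method names are invented for illustration.

```python
class LocalSpaceRecord:
    """Control-device-side record of each queue's real-time available cache space."""

    def __init__(self, initial_available):
        self.available = dict(initial_available)   # queue id -> free bytes

    def on_send(self, queue_id, size):
        # Sending a command: its data now occupies the queue's cache partition.
        if size > self.available[queue_id]:
            raise RuntimeError("command should have been suspended")
        self.available[queue_id] -= size

    def on_completion(self, queue_id, size):
        # Completion response: the data has been migrated out of the cache
        # unit, so the space it occupied becomes available again.
        self.available[queue_id] += size

rec = LocalSpaceRecord({"q1": 100})
rec.on_send("q1", 60)            # first data read/write command sent
after_send = rec.available["q1"]
rec.on_completion("q1", 60)      # completion response received
```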
  • the method further includes:
  • after suspending sending of the first data read/write command for a preset time, the control device again determines whether the storage space occupied by the first data is less than or equal to the locally recorded real-time available storage space of the cache unit corresponding to the first command queue, and when it is, sends the first data read/write command to the storage device.
  • the control device may store the real-time available storage space of the cache unit corresponding to the first command queue in a dedicated storage space, for example a dedicated chip; in an existing storage component of the control device, for example the cache of the control device's CPU or the cache of its network card; or in storage space in the FPGA chip.
  • optionally, when the first data read/write command sent by the control device is a Write Command, the data to be transmitted by the first data read/write command is the data that needs to be stored.
  • the Write Command carries an SGL, and the SGL includes a field, for example an entry, containing information such as the source address of the data to be stored in the control device, the length of the data to be stored, and the destination address of the data in the storage device.
  • the data processing unit caches the data to be stored in the storage space of the cache unit corresponding to the first command queue according to the source address of the data that needs to be stored in the SGL in the Write Command.
  • the data processing unit may receive the data to be stored from the network card in the control device by means of remote direct memory access (RDMA).
  • the data processing unit then modifies the Write Command: the source address of the data to be stored, which the Write Command carries as an address in the control device, is modified to the address at which the data to be stored is cached in the cache unit corresponding to the first command queue, and the modified Write Command is sent to the controller of the destination hard disk. That is, the SGL carried by the Write Command that the data processing unit sends to the controller of the destination hard disk includes information such as the address of the data to be stored in the cache unit corresponding to the first command queue, the length of the data to be stored, and the destination address of the data to be stored in the storage device.
  • after determining the destination hard disk, the data processing unit sends the modified Write Command to the controller of the destination hard disk.
  • the controller of the destination hard disk reads the data to be stored from the cache unit according to the address of the data to be stored carried in the received Write Command in the cache unit.
  • the data to be stored is read, for example, by means of RDMA or direct memory access (English: DMA), and the read data is written into the storage space corresponding to the destination hard disk.
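The Write Command address rewriting above can be sketched as follows. The SGL is modeled as a plain dictionary with source/length/dest keys, which is a deliberate simplification of the real scatter-gather list format; the address strings are hypothetical.

```python
def handle_write_command(sgl, cache_addr):
    """Return the modified Write Command SGL forwarded to the destination
    hard disk's controller: the source address in the control device is
    replaced by the address of the cached data in the queue's cache partition.
    The length and the destination address in the storage device are kept."""
    modified = dict(sgl)           # leave the original command untouched
    modified["source"] = cache_addr
    return modified

sgl = {"source": "host:0x1000", "length": 4096, "dest": "disk:0x8000"}
fwd = handle_write_command(sgl, "cache:0x200")
```

The destination hard disk's controller then reads the data from `fwd["source"]` (the cache unit) rather than from the control device, matching the flow described above.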
  • optionally, when the first data read/write command sent by the control device is a Read Command, the data to be transmitted by the first data read/write command is the data to be read.
  • the Read Command carries an SGL, where the SGL includes information such as the source address of the data to be read in the storage device, the length of the data to be read, and the destination address in the control device to which the data to be read is to be written.
  • the data processing unit modifies the Read Command: the destination address of the data to be read, which the Read Command carries as an address in the control device, is modified to the address at which the data to be read is cached in the storage space of the cache unit corresponding to the first command queue, and the modified Read Command is sent to the controller of the destination hard disk. That is, the SGL carried by the Read Command that the data processing unit sends to the controller of the destination hard disk includes information such as the source address of the data to be read in the storage device, the length of the data to be read, and the address at which the data to be read is cached in the storage space of the cache unit corresponding to the first command queue.
  • the controller of the destination hard disk migrates the data to be read to the storage space of the cache unit corresponding to the first command queue according to the modified Read Command received.
  • the controller of the destination hard disk migrates the data to be read to the storage space of the cache unit corresponding to the first command queue by means of RDMA.
  • after the data to be read has been cached in the storage space of the cache unit corresponding to the first command queue, the data processing unit writes the data to the destination address in the control device according to the Read Command, that is, sends the cached data to be read to the control device.
  • the data processing unit sends the cached data that needs to be read to the control device by means of RDMA.
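The Read Command path is symmetric to the Write Command sketch: the destination address in the SGL (where the Host wants the data) is rewritten to a cache-unit address, and the original Host address is retained for the final RDMA transfer to the control device. As before, the dictionary SGL model and address strings are illustrative assumptions.

```python
def handle_read_command(sgl, cache_addr):
    """Rewrite a Read Command SGL for the destination hard disk's controller.

    Returns (modified_sgl, host_dest): the hard disk migrates the data into
    cache_addr, and host_dest records where the cached data must later be
    sent in the control device.
    """
    modified = dict(sgl)
    host_dest = sgl["dest"]       # original destination in the control device
    modified["dest"] = cache_addr # hard disk writes into the cache partition
    return modified, host_dest

sgl = {"source": "disk:0x8000", "length": 4096, "dest": "host:0x1000"}
modified, host_dest = handle_read_command(sgl, "cache:0x200")
```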
  • optionally, the data processing unit and the storage unit are connected by NVMe over PCIe, an NVMe-based architecture running over the Peripheral Component Interconnect Express (PCIe) standard.
  • optionally, the data processing unit includes a controller configured to control transmission of the cached data between the cache unit and the storage unit, where the controller is a controller in the NVMe over Fabric architecture.
  • an embodiment of the present invention further provides a method for controlling data read/write commands between a control device and a storage device in an NVMe over Fabric architecture, where the storage device includes a data processing unit, a cache unit, and a storage unit; the data that the control device needs to read and write is stored in the storage unit; the data processing unit is configured to receive the data read/write commands sent by the control device; and the cache unit is configured to cache the data to be transmitted by those commands. The method includes:
  • the control device sends a control command to the data processing unit, where the control command includes information for dividing the storage space of the cache unit into two or more storage spaces, so that the data processing unit divides the storage space of the cache unit into two or more storage spaces according to the control command and establishes a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write control commands sent by the control device;
  • the control device sends a first data read/write command to the storage device, where the data to be transmitted by the first data read/write command is cached in the storage space of the cache unit corresponding to the first command queue, and the first data read/write command is a data read/write command in the first command queue.
  • because the control device sends the control command, the cache unit is divided into different storage spaces, each storage space corresponds to a different command queue, and the data to be transmitted by the first data read/write command in the first command queue is cached in the storage space corresponding to the first command queue.
  • in this way, the storage space of the cache unit corresponding to each command queue is used to cache only the data to be transmitted by the data read/write commands in that command queue. This avoids the problem that the data to be transmitted by the data read/write commands in one command queue occupies a large amount of the cache unit's storage space, leaving the data read/write commands in other command queues unable to be executed due to insufficient storage space of the cache unit.
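The per-queue partitioning described above can be modeled minimally as follows; the queue identifiers and region sizes are illustrative assumptions. Each queue draws only on its own region, so a heavy queue cannot starve the others.

```python
# Minimal sketch: the cache unit's storage space is divided into
# per-command-queue regions so one queue cannot starve the others.
class CacheUnit:
    def __init__(self, partitions):
        # partitions: {queue_id: capacity_in_bytes} per the control command
        self.capacity = dict(partitions)
        self.used = {q: 0 for q in partitions}

    def cache(self, queue_id, nbytes):
        """Cache data for a command in `queue_id`; this fails only if that
        queue's own region is full, never because another queue is busy."""
        if self.used[queue_id] + nbytes > self.capacity[queue_id]:
            return False
        self.used[queue_id] += nbytes
        return True

cu = CacheUnit({"q1": 8192, "q2": 4096})
assert cu.cache("q1", 8192)   # q1 completely fills its own region
ok_q2 = cu.cache("q2", 1024)  # q2 is unaffected by q1's usage
```

The key property is the isolation: even with `q1` full, `q2`'s commands still find cache space in their own region.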
  • the control device and the storage device can be connected and communicate through a network such as iWarp, RoCE, Infiniband, FC, or Omni-Path.
  • the data processing unit in the storage device may be implemented by a network card, an FPGA chip, or a CPU in the storage device.
  • the cache unit in the storage device may also be implemented by a network card memory, a storage unit in the FPGA chip, a cache unit on the storage device, or a memory of a CPU in the storage device.
  • the cache unit in the storage device may also be implemented by a cache resource pool composed of at least two of: the network card memory, a storage unit in the FPGA chip, a cache unit on the storage device, or the memory of the CPU in the storage device.
  • the correspondence between storage spaces and command queues in the cache unit may be one command queue corresponding to one storage space, or a queue group composed of two or more command queues corresponding to one storage space.
  • in this way, the storage resources of the cache unit can be allocated to different command queues more flexibly.
  • the manner of dividing the storage space of the cache unit includes, but is not limited to, dividing according to the size of the storage space of the cache unit, according to the quality of service (QoS) of different command queues, or according to the priority of different command queues.
  • for example, the command queue corresponding to the storage space allocated from the network card memory has a high priority and a high QoS requirement, while the command queue corresponding to the storage space allocated from the memory of the CPU in the storage device has a low priority. Because caching data is fast and efficient when the NIC memory serves as the cache unit, allocating the storage space of the NIC memory to high-priority command queues meets the service requirements of high-priority commands.
  • alternatively, the command queue corresponding to the storage space allocated from the storage unit in the FPGA chip has a high priority, while the command queue corresponding to the storage space allocated from the memory of the CPU in the storage device has a low priority.
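The priority-based placement described above might be sketched as follows; the medium names, priority scale, and threshold are assumptions for illustration, not values from the embodiment.

```python
# Sketch of priority-based placement: high-priority command queues are
# assigned storage space in the (fast) NIC memory, low-priority queues
# in the (slower) CPU memory of the storage device.
def assign_cache_medium(queues, priority_threshold=5):
    """queues: {queue_id: priority}; a higher number means higher priority."""
    placement = {}
    for qid, prio in queues.items():
        placement[qid] = "nic_memory" if prio >= priority_threshold else "cpu_memory"
    return placement

placement = assign_cache_medium({"admin_q": 9, "bulk_q": 2, "io_q": 7})
```

A QoS-based division would look the same with a QoS metric in place of the priority number.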
  • the method further includes:
  • the control device acquires the available storage space of the cache unit corresponding to the first command queue;
  • the control device determines whether the storage space occupied by the first data to be transmitted by the first data read/write command is less than or equal to the available storage space of the cache unit corresponding to the first command queue;
  • when the storage space occupied by the first data is less than or equal to the available storage space of the cache unit corresponding to the first command queue, the control device sends the first data read/write command to the storage device; otherwise, the sending of the first data read/write command is suspended.
  • because the control device sends the first data read/write command in the first command queue only when the storage space of the cache unit corresponding to the first command queue can cache the data to be transmitted, the problem of insufficient available storage space in the cache unit corresponding to the first command queue, and the complicated processing mechanism that commands in the first command queue would otherwise require, are avoided.
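The comparison step above reduces to a simple check on the control device; a minimal sketch:

```python
# Sketch of the control-device check: the first data read/write command
# is sent only if its payload fits in the available storage space of the
# cache unit corresponding to its command queue; otherwise it is suspended.
def try_send(command_size, available_space):
    """Return 'sent' or 'suspended' per the comparison in the method above."""
    if command_size <= available_space:
        return "sent"
    return "suspended"

assert try_send(2048, available_space=4096) == "sent"
result_when_full = try_send(8192, available_space=4096)
```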
  • the acquiring, by the control device, the available storage space of the cache unit corresponding to the first command queue includes:
  • before the control device sends the request to the data processing unit to obtain the available storage space of the cache unit corresponding to the first command queue, the method further includes:
  • the control device sends a second data read/write command to the storage device, where the data to be transmitted by the second data read/write command is greater than the available storage space of the cache unit corresponding to the first command queue;
  • the control device receives a backpressure message sent by the data processing unit, where the backpressure message is used to indicate that the available storage space of the cache unit corresponding to the first command queue is insufficient.
  • in this way, the control device does not need to send a request for obtaining the available storage space of the cache unit corresponding to the command queue of the command to be sent each time a data read/write command is sent; the request is sent only after the data processing unit returns a backpressure message indicating that the data cannot be cached. This saves the resource consumption of the control device and, correspondingly, the resource consumption incurred by the data processing unit when returning backpressure messages.
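The backpressure optimization above — sending optimistically and querying available space only after a backpressure message — can be sketched with stub objects; the message shapes and class names are assumptions.

```python
# Sketch of the backpressure optimization: the control device sends data
# read/write commands without first querying available space, and issues
# the space-acquisition request only after a backpressure message arrives.
class DataProcessingUnitStub:
    def __init__(self, available):
        self.available = available  # per-queue cache-unit space

    def receive(self, size):
        if size > self.available:
            return {"type": "backpressure"}   # space is insufficient
        self.available -= size
        return {"type": "ok"}

    def query_available(self):
        return self.available

def send_with_backpressure(dpu, size):
    queries = 0
    resp = dpu.receive(size)          # optimistic send, no prior query
    if resp["type"] == "backpressure":
        queries += 1                  # query space only after backpressure
        _ = dpu.query_available()
    return resp["type"], queries

first = send_with_backpressure(DataProcessingUnitStub(available=4096), 1024)
second = send_with_backpressure(DataProcessingUnitStub(available=512), 1024)
```

In the common case (`first`) no acquisition request is ever issued, which is where the resource saving comes from.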
  • the available storage space of the cache unit is a real-time available storage space of the cache unit corresponding to the first command queue recorded locally.
  • here, "locally" refers to the control device; that is, the locally recorded real-time available storage space of the cache unit is the real-time available storage space of the cache unit as recorded by the control device.
  • the control device may acquire and record the available storage space of the cache unit when the storage device is powered on.
  • the control device may also acquire and record the available storage space of the cache unit at any time after the storage device is powered on.
  • the real-time available storage space of the network card memory recorded by the control device may be the size of the space of the cache unit that can store data, or the number of data blocks that can be written.
  • the control device may store the real-time available storage space of the cache unit corresponding to the first command queue in a dedicated storage space, for example, a dedicated chip. It may also be stored in an existing storage component of the control device, for example, in a cache of the CPU of the control device or in a cache of the network card of the control device; a storage space in an independent FPGA chip may also store the real-time available storage space of the cache unit corresponding to the first command queue.
  • the method further includes:
  • after sending the first data read/write command, the control device subtracts the storage space occupied by the first data from the locally recorded real-time available storage space of the cache unit corresponding to the first command queue;
  • after receiving the response message, sent by the data processing unit, indicating completion of the first data read/write command, the control device adds the storage space occupied by the first data to the locally recorded real-time available storage space of the cache unit corresponding to the first command queue.
  • after the control device sends the first data read/write command, the data to be transmitted by the first data read/write command occupies the storage space of the cache unit corresponding to the first command queue. Therefore, the storage space occupied by the first data needs to be subtracted from the recorded real-time available storage space of the cache unit corresponding to the first command queue.
  • after the control device receives the response message for the first data read/write command sent by the data processing unit, the first data has been migrated out of the cache unit corresponding to the first command queue. Therefore, the storage space occupied by the first data needs to be added back to the recorded real-time available storage space of the cache unit corresponding to the first command queue. In this way, the latest available storage space of the cache unit corresponding to the first command queue is always recorded correctly.
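The local bookkeeping described above (subtract on send, add back on completion) can be sketched as:

```python
# Sketch of the control device's local real-time accounting: the payload
# size is subtracted when a command is sent and added back when the
# completion response arrives from the data processing unit.
class LocalSpaceRecord:
    def __init__(self, initial_available):
        # recorded when (or any time after) the storage device powers on
        self.available = initial_available

    def on_command_sent(self, data_size):
        # the first data now occupies per-queue cache-unit space
        self.available -= data_size

    def on_response_received(self, data_size):
        # the first data has been migrated out of the cache unit
        self.available += data_size

rec = LocalSpaceRecord(initial_available=16384)
rec.on_command_sent(4096)
after_send = rec.available
rec.on_response_received(4096)
after_resp = rec.available
```

Because every send and every completion updates the record, the control device can make the send-or-suspend decision without querying the storage device each time.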
  • the embodiment of the present invention further provides a storage device, where the storage device is a storage device in an NVMe over Fabric architecture and performs data transmission with a control device in the NVMe over Fabric architecture.
  • the storage device includes a data processing unit and a cache unit, the data processing unit is configured to receive a data read/write command sent by the control device, and the cache unit is configured to cache data that needs to be transmitted by the data read/write command.
  • the data processing unit includes a processor, and the processor is configured to perform the following steps:
  • receiving a control command sent by the control device, where the control command includes information for dividing the storage space of the cache unit into two or more storage spaces;
  • dividing the storage space of the cache unit into two or more storage spaces according to the control command, and establishing a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write control commands sent by the control device;
  • receiving a first data read/write command sent by the control device, and caching the data to be transmitted by the first data read/write command in the storage space of the cache unit corresponding to the first command queue, where the first data read/write command is a data read/write command in the first command queue.
  • the control device and the storage device can be connected and communicate through a network such as iWarp, RoCE, Infiniband, FC, or Omni-Path.
  • the data processing unit in the storage device may be implemented by a network card, an FPGA chip, or a CPU in the storage device.
  • the cache unit in the storage device may also be implemented by a network card memory, a storage unit in the FPGA chip, a cache unit on the storage device, or a memory of a CPU in the storage device.
  • the cache unit in the storage device may also be implemented by a cache resource pool composed of at least two of a network card memory, a storage unit in the FPGA chip, a cache unit on the storage device, or a memory of the CPU in the storage device.
  • the processor establishes a correspondence between the two or more storage spaces and a command queue, including:
  • establishing the correspondence between the two or more storage spaces and the command queues according to correspondence information carried in the control command, where the correspondence information describes the correspondence between the two or more storage spaces in the cache unit and the command queues; or,
  • the correspondence between the two or more storage spaces and the command queue is established according to the two or more storage spaces that are divided.
  • the processor is also used to:
  • when, within a preset time, the occupied proportion of the storage space of the cache unit corresponding to the first command queue is greater than a preset first threshold and the occupied proportion of the storage space of the cache unit corresponding to the second command queue is less than a preset second threshold, reducing the storage space of the cache unit corresponding to the second command queue and allocating the reduced storage space to the storage space of the cache unit corresponding to the first command queue; wherein the first threshold is greater than the second threshold.
  • in this way, the storage space of the cache unit can be flexibly allocated according to actual conditions. Not only can the utilization of the cache unit's resources be maximized, but the problem of a large amount of data to be transmitted by the data read/write commands in some command queues can also be solved, improving service processing capability while avoiding waste of resources.
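The threshold-driven rebalancing above can be sketched as follows; the threshold values and the transfer granularity (`chunk`) are illustrative assumptions.

```python
# Sketch of threshold-driven rebalancing: if, within the sampling window,
# the hot queue's region is occupied above the first threshold while the
# cold queue's is below the second threshold, part of the cold queue's
# region is reassigned to the hot queue.
def rebalance(capacity, used, q_hot, q_cold,
              first_threshold=0.8, second_threshold=0.2, chunk=1024):
    assert first_threshold > second_threshold  # required by the method
    hot_ratio = used[q_hot] / capacity[q_hot]
    cold_ratio = used[q_cold] / capacity[q_cold]
    if hot_ratio > first_threshold and cold_ratio < second_threshold:
        capacity[q_cold] -= chunk
        capacity[q_hot] += chunk
    return capacity

cap = rebalance({"q1": 4096, "q2": 4096}, {"q1": 4000, "q2": 100},
                q_hot="q1", q_cold="q2")
```

Here `q1` is about 98% full and `q2` about 2% full, so one chunk of `q2`'s region is moved to `q1`.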
  • the correspondence between storage spaces and command queues in the cache unit may be one command queue corresponding to one storage space, or a queue group composed of two or more command queues corresponding to one storage space.
  • in this way, the storage resources of the cache unit can be allocated to different command queues more flexibly.
  • the manner of dividing the storage space of the cache unit includes, but is not limited to, dividing according to the size of the storage space of the cache unit, according to the quality of service (QoS) of different command queues, or according to the priority of different command queues.
  • for example, the command queue corresponding to the storage space allocated from the network card memory has a high priority and a high QoS requirement, while the command queue corresponding to the storage space allocated from the memory of the CPU in the storage device has a low priority. Because caching data is fast and efficient when the NIC memory serves as the cache unit, allocating the storage space of the NIC memory to high-priority command queues meets the service requirements of high-priority commands.
  • alternatively, the command queue corresponding to the storage space allocated from the storage unit in the FPGA chip has a high priority, while the command queue corresponding to the storage space allocated from the memory of the CPU in the storage device has a low priority.
  • an embodiment of the present invention further provides a control device, where the control device is a control device in an NVMe over Fabric architecture and includes a processor, a network card, and a bus; the processor and the network card are connected through the bus; the control device exchanges data with a storage device in the NVMe over Fabric architecture; the storage device includes a data processing unit, a cache unit, and a storage unit; and the data that the control device needs to read and write is cached in the cache unit of the storage device and stored in the storage unit of the storage device. The processor is configured to perform the following steps:
  • sending a control command to the data processing unit, where the control command includes information for dividing the storage space of the cache unit into two or more storage spaces, so that the data processing unit divides the storage space of the cache unit into two or more storage spaces according to the control command and establishes a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write control commands sent by the control device;
  • the control device and the storage device can be connected and communicate through a network such as iWarp, RoCE, Infiniband, FC, or Omni-Path.
  • the data processing unit in the storage device may be implemented by a network card, an FPGA chip, or a CPU in the storage device.
  • the cache unit in the storage device may also be implemented by a network card memory, a storage unit in the FPGA chip, a cache unit on the storage device, or a memory of a CPU in the storage device.
  • the cache unit in the storage device may also be implemented by a cache resource pool composed of at least two of a network card memory, a storage unit in the FPGA chip, a cache unit on the storage device, or a memory of the CPU in the storage device.
  • the correspondence between storage spaces and command queues in the cache unit may be one command queue corresponding to one storage space, or a queue group composed of two or more command queues corresponding to one storage space.
  • in this way, the storage resources of the cache unit can be allocated to different command queues more flexibly.
  • the manner of dividing the storage space of the cache unit includes, but is not limited to, dividing according to the size of the storage space of the cache unit, according to the quality of service (QoS) of different command queues, or according to the priority of different command queues.
  • for example, the command queue corresponding to the storage space allocated from the network card memory has a high priority and a high QoS requirement, while the command queue corresponding to the storage space allocated from the memory of the CPU in the storage device has a low priority. Because caching data is fast and efficient when the NIC memory serves as the cache unit, assigning high-priority command queues to the storage space of the NIC memory meets the service requirements of high-priority commands.
  • alternatively, the command queue corresponding to the storage space allocated from the storage unit in the FPGA chip has a high priority, while the command queue corresponding to the storage space allocated from the memory of the CPU in the storage device has a low priority.
  • the processor is also used to:
  • when the occupied proportion of the storage space of the cache unit corresponding to the first command queue is greater than a preset first threshold and the occupied proportion of the storage space of the cache unit corresponding to the second command queue is less than a preset second threshold, sending an adjustment command to the data processing unit, where the adjustment command is used to reduce the storage space of the cache unit corresponding to the second command queue and allocate the reduced storage space to the storage space of the cache unit corresponding to the first command queue; wherein the first threshold is greater than the second threshold.
  • in this way, the storage space of the cache unit can be flexibly allocated according to actual conditions. Not only can the utilization of the cache unit's resources be maximized, but the problem of a large amount of data to be transmitted by the data read/write commands in some command queues can also be solved, improving service processing capability while avoiding waste of resources.
  • the processor is further configured to perform the following steps:
  • acquiring the available storage space of the cache unit corresponding to the first command queue, and suspending the sending of the first data read/write command when the storage space occupied by the first data to be transmitted by the first data read/write command is greater than that available storage space.
  • because the control device sends the first data read/write command in the first command queue only when the storage space of the cache unit corresponding to the first command queue can cache the data to be transmitted, the problem of insufficient available storage space in the cache unit corresponding to the first command queue, and the complicated processing mechanism that commands in the first command queue would otherwise require, are avoided.
  • the processor acquiring the available storage space of the cache unit includes:
  • before the processor sends a request to the data processing unit to obtain the available storage space of the cache unit corresponding to the first command queue, the processor is further configured to perform the following steps:
  • in this way, the control device does not need to send a request for obtaining the available storage space of the cache unit corresponding to the command queue of the command to be sent each time a data read/write command is sent; the request is sent only after the data processing unit returns a backpressure message indicating that the data cannot be cached. This saves the resource consumption of the control device and, correspondingly, the resource consumption incurred by the data processing unit when returning backpressure messages.
  • the available storage space of the cache unit is a real-time available storage space of the cache unit corresponding to the first command queue recorded locally.
  • here, "locally" refers to the control device; that is, the locally recorded real-time available storage space of the cache unit is the real-time available storage space of the cache unit as recorded by the control device.
  • the control device may acquire and record the available storage space of the cache unit when the storage device is powered on.
  • the control device may also acquire and record the available storage space of the cache unit at any time after the storage device is powered on.
  • the real-time available storage space of the network card memory recorded by the control device may be the size of the space of the cache unit that can store data, or the number of data blocks that can be written.
  • the control device may store the real-time available storage space of the cache unit corresponding to the first command queue in a dedicated storage space, for example, a dedicated chip. It may also be stored in an existing storage component of the control device, for example, in a cache of the CPU of the control device or in a cache of the network card of the control device; a storage space in an independent FPGA chip may also store the real-time available storage space of the cache unit corresponding to the first command queue.
  • the processor is further configured to perform the following steps:
  • the storage space occupied by the first data is subtracted from the locally recorded real-time available storage space of the cache unit corresponding to the first command queue;
  • an embodiment of the present invention provides a system for implementing data read/write command control, where the system includes a control device and a storage device in an NVMe over Fabric architecture; the storage device includes a data processing unit, a cache unit, and a storage unit; the data that the control device needs to read and write is stored in the storage unit; the data processing unit is configured to receive data read/write commands sent by the control device; and the cache unit is configured to cache the data to be transmitted by those data read/write commands; wherein:
  • the control device is configured to send a control command to the data processing unit, where the control command includes information that divides a storage space of the cache unit into two or more storage spaces;
  • the data processing unit is configured to divide the storage space of the cache unit into two or more storage spaces according to the control command, and establish a correspondence between the two or more storage spaces and the command queue.
  • the command queue is a queue formed by a data read/write control command sent by the control device;
  • the data processing unit is further configured to receive the first data read/write command sent by the control device and, according to the correspondence between the two or more storage spaces and the command queues, cache the data to be transmitted by the first data read/write command in the storage space of the cache unit corresponding to the first command queue, where the first data read/write command is a data read/write command in the first command queue.
  • in this way, each storage space partitioned in the cache unit corresponds to a different command queue, and the data to be transmitted by the first data read/write command in the first command queue is cached in the storage space corresponding to the first command queue. The storage space of the cache unit corresponding to each command queue is used to cache only the data to be transmitted by the data read/write commands in that command queue, which avoids the problem that the data to be transmitted by the data read/write commands in one command queue occupies a large amount of the cache unit's storage space, leaving the data read/write commands in other command queues unable to be executed due to insufficient storage space.
  • the control device and the storage device can be connected and communicate through a network such as iWarp, RoCE, Infiniband, FC, or Omni-Path.
  • the data processing unit in the storage device may be implemented by a network card, a separate FPGA chip, or a CPU in the storage device.
  • the cache unit in the storage device may also be implemented by a network card memory, a storage unit in the FPGA chip, a cache unit on the storage device, or a memory of the CPU in the storage device; it may also be implemented by a cache resource pool composed of at least two of the network card memory, the storage unit in the FPGA chip, the cache unit on the storage device, or the memory of the CPU in the storage device.
  • the control device may be a physical server or a virtual server on a physical server.
  • the storage unit in the storage device may be one or more solid state disks (English: SSD, Solid State Disk) or hard disk drives (English: HDD, Hard Disk Drive).
  • the cache unit may be located in the data processing unit, or may be a storage medium independent of the data processing unit, for example, a double data rate (English: DDR, Double Data Rate) storage medium.
  • the cache unit may also be a memory resource pool formed by the memory resources of multiple data processing units in the storage device.
  • the data processing unit establishes a correspondence between the two or more storage spaces and the command queue, including:
  • the data processing unit establishes the correspondence between the two or more storage spaces and the command queues according to correspondence information carried in the control command, where the correspondence information describes the correspondence between the two or more storage spaces in the cache unit and the command queues; or,
  • the data processing unit establishes a correspondence between the two or more storage spaces and the command queue according to the two or more storage spaces that are divided.
  • the control device is configured to acquire, within a preset time, the occupied proportion of the storage space of the cache unit corresponding to the first command queue and the occupied proportion of the storage space of the cache unit corresponding to the second command queue; when the former is greater than a preset first threshold and the latter is less than a preset second threshold, the control device sends an adjustment command to the data processing unit, where the adjustment command is used to reduce the storage space of the cache unit corresponding to the second command queue and allocate the reduced storage space to the storage space of the cache unit corresponding to the first command queue; wherein the first threshold is greater than the second threshold.
  • alternatively, the data processing unit is further configured to acquire, within a preset time, the occupied proportion of the storage space of the cache unit corresponding to the first command queue and the occupied proportion of the storage space of the cache unit corresponding to the second command queue; when the former is greater than a preset first threshold and the latter is less than a preset second threshold, the data processing unit reduces the storage space of the cache unit corresponding to the second command queue and allocates the reduced storage space to the storage space of the cache unit corresponding to the first command queue; wherein the first threshold is greater than the second threshold.
  • in this way, the storage space of the cache unit can be flexibly allocated according to actual conditions. Not only can the utilization of the cache unit's resources be maximized, but the problem of a large amount of data to be transmitted by the data read/write commands in some command queues can also be solved, improving service processing capability while avoiding waste of resources.
  • the data processing unit may establish the correspondence between storage spaces and command queues in the cache unit according to the control command such that one command queue corresponds to one storage space, or such that a queue group composed of two or more command queues corresponds to one storage space.
  • the manner in which the data processing unit divides the storage space of the cache unit according to the control command of the control device includes, but is not limited to, dividing according to the size of the storage space of the cache unit, according to the quality of service of different command queues, or according to the priority of different command queues.
  • the data processing unit may further bind multiple command queues into one queue group according to the control command of the control device, where the storage space in the cache unit corresponding to the command queue group is the sum of the storage spaces corresponding to each command queue in the queue group.
  • in this way, the configuration of the storage space in the cache unit can be implemented even more flexibly to meet the different requirements of different command queues for the storage space of the cache unit.
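The queue-group binding described above, in which the group's cache space is the sum of its members' spaces, reduces to a simple aggregation; the queue identifiers and sizes are illustrative assumptions.

```python
# Sketch of queue-group binding: several command queues are bound into
# one group whose cache-unit storage space is the sum of the spaces of
# its member queues, so the members share one pooled region.
def bind_queue_group(per_queue_space, members):
    """per_queue_space: {queue_id: bytes}; members: queue ids to bind."""
    return sum(per_queue_space[q] for q in members)

group_space = bind_queue_group({"q1": 2048, "q2": 4096, "q3": 1024},
                               members=["q1", "q3"])
```

A command in either `q1` or `q3` may then use any part of the pooled 3072-byte region.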
  • the control device is further configured to obtain the available storage space of the cache unit corresponding to the first command queue, and determine whether the storage space occupied by the first data to be transmitted by the first data read/write command is less than or equal to the available storage space of the cache unit corresponding to the first command queue;
  • the control device is further configured to send the first data read/write command to the storage device when the storage space occupied by the first data is less than or equal to the available storage space of the cache unit corresponding to the first command queue, and to suspend the sending of the first data read/write command when the storage space occupied by the first data is greater than the available storage space of the cache unit corresponding to the first command queue;
  • the data processing unit is further configured to receive the first data read/write command sent by the control device and cache the data to be transmitted by the first data read/write command in the storage space of the cache unit corresponding to the first command queue.
  • because the control device sends the first data read/write command in the first command queue only when the storage space of the cache unit corresponding to the first command queue can cache the data to be transmitted, the problem of insufficient available storage space in the cache unit corresponding to the first command queue, and the complicated processing mechanism that commands in the first command queue would otherwise require, are avoided.
  • the acquiring, by the control device, the available storage space of the cache unit corresponding to the first command queue includes:
• before the control device sends, to the data processing unit, a request for acquiring the available storage space of the cache unit corresponding to the first command queue, the control device is further configured to send a second data read/write command to the storage device, where the data to be transmitted by the second data read/write command is greater than the available storage space of the cache unit corresponding to the first command queue;
  • the control device receives a backpressure message sent by the data processing unit, where the backpressure message is used to indicate that the available storage space of the cache unit corresponding to the first command queue is insufficient.
• in this way, the control device does not need to send a request for acquiring the available storage space of the cache unit corresponding to the command queue of a command to be sent every time a data read/write command is sent; the foregoing acquisition request is sent only after the data processing unit returns a backpressure message indicating that the data cannot be cached. This saves resource consumption of the control device, and correspondingly saves the resource consumption caused when the data processing unit returns the backpressure message.
• the control device is further configured to: after the sending of the first data read/write command has been suspended for a preset time, re-acquire the available storage space of the cache unit corresponding to the first command queue, and send the first data read/write command to the storage device when the storage space occupied by the first data is less than or equal to the available storage space of the cache unit corresponding to the first command queue.
  • the preset time for the control device to suspend sending the first data read/write command may be a system default time or a pre-configured time.
• the preset time for which the control device pauses sending the first data read/write command can be flexibly set according to the specific service situation.
• the control device further retransmits the second data read/write command. That is, for the second data read/write command that cannot be executed in time because the storage space of the cache unit corresponding to the first command queue is insufficient, the control device resends the second data read/write command when it determines that the available storage space of the cache unit corresponding to the first command queue is greater than the data to be transmitted by the second data read/write command.
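A minimal sketch of the backpressure-and-retry behavior described above, assuming a `transmit` callback that returns "BACKPRESSURE" when the per-queue cache cannot hold the data; all names are illustrative, not from the patent:

```python
import time

def send_with_backpressure(command, data_len, transmit, get_available,
                           preset_time):
    """Optimistically send a command; if the data processing unit returns a
    backpressure message, pause for the preset time and resend once the
    re-acquired available space can hold the data (illustrative sketch)."""
    result = transmit(command)              # first attempt, no space query
    while result == "BACKPRESSURE":
        time.sleep(preset_time)             # suspend for the preset time
        if data_len <= get_available():     # re-acquire available space
            result = transmit(command)      # resend the command
    return result
```

Note the optimistic first send: the available-space query happens only after a backpressure message, matching the resource-saving behavior described above.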
• optionally, within a preset time, the control device does not acquire the available storage space of the cache unit corresponding to the first command queue and does not determine whether the storage space occupied by the first data to be transmitted by the first data read/write command is less than or equal to that available storage space. That is, within the preset time, when the available storage space of the cache unit is known to be sufficiently large, the control device may skip the step of acquiring the available storage space of the cache unit before sending the first data read/write command. This can further improve the resource utilization of the control device and of the data processing unit.
  • the preset time can be set differently according to different business scenarios.
  • the available storage space of the cache unit cannot satisfy the storage space requirement of data to be transmitted by all data read/write commands sent by the control device. After the preset time is reached, the available storage space of the cache unit can meet the storage space requirement of the data to be transmitted by the data read/write command sent by the control device.
  • the available storage space of the cache unit is a real-time available storage space of the cache unit corresponding to the first command queue recorded locally.
• here, "local" refers to the control device.
  • the real-time available storage space of the locally recorded cache unit is a real-time available storage space of the cache unit recorded by the control device.
  • control device may acquire and record an available storage space of the cache unit when the storage device is powered on.
  • the control device may also acquire and record the available storage space of the cache unit at any time after the storage device is powered on.
• the real-time available storage space of the network card memory recorded by the control device may be the size of the space of the cache unit that can store data, or the number of data blocks that can be written.
• the control device may store the real-time available storage space of the storage space of the cache unit corresponding to the first command queue in a dedicated storage space, such as a dedicated chip. It may also be stored in an existing storage component of the control device, for example, in the cache of the CPU of the control device or in the cache of the network card of the control device; the real-time available storage space of the storage space of the cache unit corresponding to the first command queue may also be stored in the storage space of an independent FPGA chip.
• the control device is further configured to: after sending the first data read/write command, subtract the storage space occupied by the first data from the locally recorded real-time available storage space of the cache unit corresponding to the first command queue; and after receiving the response message, sent by the data processing unit, indicating completion of the first data read/write command, add the storage space occupied by the first data to the locally recorded real-time available storage space of the cache unit corresponding to the first command queue.
  • the data to be transmitted by the first data read/write command occupies the storage space of the cache unit corresponding to the first command queue. Therefore, it is necessary to subtract the storage space occupied by the first data from the real-time available storage space of the cache unit corresponding to the recorded first command queue.
• after the control device receives the response message of the first data read/write command sent by the data processing unit, the first data has been migrated out of the cache unit corresponding to the first command queue. Therefore, the storage space occupied by the first data needs to be added to the recorded real-time available storage space of the cache unit corresponding to the first command queue. In this way, the latest available storage space of the cache unit corresponding to the first command queue can be correctly recorded.
• the control device is further configured to: after the suspension of sending the first data read/write command reaches the preset time, determine whether the storage space occupied by the first data is less than or equal to the locally recorded real-time available storage space of the cache unit corresponding to the first command queue, and send the first data read/write command to the storage device when the storage space occupied by the first data is less than or equal to the locally recorded real-time available storage space of the cache unit corresponding to the first command queue.
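The local bookkeeping described in the preceding bullets (subtract on send, add back on completion) resembles credit-based flow control; a minimal sketch with hypothetical names:

```python
class QueueCredit:
    """Locally recorded real-time available cache space for one command
    queue; a credit-style sketch with hypothetical names."""
    def __init__(self, total_space):
        self.available = total_space

    def can_send(self, data_len):
        return data_len <= self.available

    def on_send(self, data_len):
        # After sending the command, subtract the space the data occupies.
        self.available -= data_len

    def on_completion(self, data_len):
        # The completion response means the data left the cache unit:
        # add the occupied space back to the recorded available space.
        self.available += data_len
```

Because the record is local, the control device can make the send/suspend decision without querying the data processing unit each time.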
  • the first data read/write command sent by the control device is a Write Command
• the data to be transmitted by the first data read/write command is the data that needs to be stored.
• the Write Command carries an SGL, and the SGL includes a field, for example, an entry, where the field includes information such as the source address of the data to be stored in the control device, the length of the data to be stored, and the destination address of the data to be stored in the storage device.
  • the data processing unit caches the data to be stored in the storage space of the cache unit corresponding to the first command queue according to the source address of the data that needs to be stored in the SGL in the Write Command.
  • the data processing unit may receive the data to be stored by using a network card in the control device in an RDMA manner.
• the data processing unit modifies the Write Command: the source address, in the control device, of the data to be stored carried by the Write Command is modified to the address at which the data to be stored is cached in the cache unit, and the modified Write Command is sent to the controller of the destination hard disk. That is, the SGL carried by the Write Command sent by the data processing unit to the controller of the destination hard disk includes information such as the address in the cache unit at which the data to be stored is cached, the length of the data to be stored, and the destination address of the data to be stored in the storage device.
• after determining the destination hard disk, the data processing unit sends the modified Write Command to the controller of the destination hard disk.
• the controller of the destination hard disk reads the data to be stored from the cache unit according to the address, carried in the received Write Command, of the data to be stored in the cache unit, for example, by reading the data in an RDMA or DMA manner, and writes the read data into the storage space corresponding to the destination hard disk.
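The SGL rewrite for a Write Command can be sketched with a simplified entry; the dataclass below is a stand-in for an SGL entry, not the real NVMe on-wire format, and only the source address is replaced with the cache-unit address while length and destination are left intact:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SglEntry:        # simplified stand-in for an SGL entry, not the wire format
    src_addr: int      # source address of the data in the control device
    length: int        # length of the data to be stored
    dst_addr: int      # destination address of the data in the storage device

def rewrite_write_sgl(entry: SglEntry, cache_addr: int) -> SglEntry:
    """Replace the host-side source address with the address at which the
    data is now cached in the cache unit; length and destination address
    are left unchanged."""
    return replace(entry, src_addr=cache_addr)
```

After the rewrite, the disk controller can fetch the data directly from the cache unit rather than from the control device.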
  • the first data read/write command sent by the control device is a Read Command
  • the data to be transmitted by the first data read/write command is data to be read.
• the Read Command carries an SGL, where the SGL includes information such as the source address of the data to be read in the storage device, the length of the data to be read, and the destination address in the control device to which the data to be read is to be written.
• the data processing unit modifies the Read Command: the destination address, in the control device, of the data to be read carried in the Read Command is modified to the address at which the data to be read is to be cached in the storage space of the cache unit corresponding to the first command queue, and the modified Read Command is sent to the controller of the destination hard disk. That is, the SGL carried by the Read Command sent by the data processing unit to the controller of the destination hard disk includes information such as the source address of the data to be read in the storage device, the length of the data to be read, and the address at which the data to be read is to be cached in the storage space of the cache unit corresponding to the first command queue.
  • the controller of the destination hard disk migrates the data to be read to the storage space of the cache unit corresponding to the first command queue according to the modified Read Command received.
  • the controller of the destination hard disk migrates the data to be read to the storage space of the cache unit corresponding to the first command queue by means of RDMA.
• after the data to be read is cached in the storage space of the cache unit corresponding to the first command queue, the data processing unit writes the cached data that needs to be read to the destination address in the control device according to the Read Command, that is, sends the cached data to be read to the control device.
  • the data processing unit sends the cached data that needs to be read to the control device by means of RDMA.
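The two-stage Read Command handling above can be sketched as follows; `disk_read_to` and `rdma_send` are illustrative callbacks standing in for the destination hard disk's controller and the RDMA transfer, not a real driver API:

```python
def handle_read_command(cmd, cache_addr, disk_read_to, rdma_send):
    """Two-stage Read Command handling (sketch): redirect the command's
    host destination to the cache unit, let the destination hard disk's
    controller fill the cache, then forward the cached data to the
    original host destination."""
    host_dst = cmd["dst_addr"]        # original destination in the control device
    cmd["dst_addr"] = cache_addr      # stage 1: point the disk at the cache unit
    disk_read_to(cmd)                 # disk controller migrates data to the cache
    rdma_send(cache_addr, host_dst, cmd["length"])  # stage 2: cache -> host
```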
• the data processing unit and the storage unit are connected based on the NVMe over PCIe architecture, that is, NVMe running over the Peripheral Component Interconnect Express (PCIe) standard.
• the data processing unit includes a controller configured to control the transmission of the cached data between the cache unit and the storage unit, where the controller is a controller in the NVMe over Fabric architecture.
  • FIG. 1 is a schematic diagram of an implementation manner of an NVMe over Fabric architecture in the prior art
  • FIG. 2 is a schematic structural diagram of a system for implementing data read/write command control in an NVMe over Fabric architecture according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a method for controlling data read/write commands in an NVMe over Fabric architecture according to an embodiment of the present disclosure
  • 4A is a schematic flowchart of a method for controlling data read/write commands between a control device and a storage device in an NVMe over Fabric architecture according to an embodiment of the present invention
  • 4B is a schematic flowchart of a method for controlling data read/write commands between a control device and a storage device in another NVMe over Fabric architecture according to an embodiment of the present invention
  • FIG. 5 is a schematic structural diagram of a storage device 500 according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a control device 600 according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a system for implementing data read/write command control according to an embodiment of the present invention.
• the terms "first" and "second" in the embodiments of the present invention are used for descriptive purposes only, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
• features defined with "first" and "second" may explicitly or implicitly include one or more of the features.
  • FIG. 1 is a schematic diagram of an implementation manner of an NVMe over Fabric architecture in the prior art.
  • Figure 1 includes Host 100, Target 200, and Target 210.
• Host 100 is the host, which is mainly responsible for initiating the reading and writing of data, for example, sending data read/write commands.
  • Target 200 and Target 210 are target storage devices.
• also called NVM Subsystems in the NVMe protocol, they are mainly responsible for receiving and executing the read and write commands sent by the host Host 100.
  • the specific configuration of the host Host 100 includes, but is not limited to, a physical server or a virtual machine on a physical server, and the physical server may be a computer device including components such as a CPU, a memory, and a network card.
  • the Target 200 can be a separate physical hard disk system. As shown in FIG. 1 , the Target 200 includes a network card 201 and one or more hard disks, and the network card 201 and one or more hard disks are respectively connected. It should be noted that, in FIG. 1 , three hard disks are taken as an example for description. In specific implementation, the Target 200 may include more than one hard disk.
• the hard disk in the Target 200 can be a storage medium with a storage function such as a solid state disk (English: SSD, Solid State Disk) or a hard disk drive (English: HDD, Hard Disk Drive).
• the network card 201 has the function of a network interface card, and may be a remote network interface card (English: RNIC, Remote Network Interface Card) in NVMe over Fabric. The network card 201 communicates with the Host 100 through the fabric network for data read/write commands and data transmission.
• the Target 210 is similar in structure to the Target 200, including a network card 211 and one or more hard disks.
  • the functions and implementations of the components (network card 211, hard disk, etc.) in Target 210 are similar to the functions and implementations of the components (network card 201 and hard disk) in Target 200. In the specific implementation, there may be multiple Targets.
  • FIG. 1 only shows two Targets (Target 200 and Target 210) as an example.
  • commands sent by Host 100 to Target 200 may be sent multiple times in a time period.
  • Host 100 sends these commands in the form of a queue.
  • the commands that are first written to the queue are executed first.
  • Host 100 may have multiple CPUs, or multiple threads in one CPU. Multiple CPUs or multiple threads can process multiple commands in parallel. Therefore, in the NVMe over Fabric architecture, there can be multiple queues for parallel processing of commands.
  • the network card 201 includes multiple queues, and each queue is composed of commands sent by the Host 100.
• in this case, the data to be read or written by the commands in one queue may occupy most of the storage space of the network card memory of the network card 201, so that the commands in other queues cannot apply for enough storage space of the network card memory to cache the data they need to read or write, and those commands cannot be executed in time. A command that cannot be executed in time must wait for other storage space of the network card memory of the network card 201 to be released before it can apply again for available storage space.
  • Such an operation would bring a lot of complexity to the design and implementation of the network card 201.
  • these complexities include at least one of the following:
  • the embodiment of the present invention provides a method, a device, and a system for controlling data read and write commands in an NVMe over Fabric architecture.
  • a method for data transmission in the NVMe over Fabric is provided for the embodiment of the present invention.
• the embodiment of the present invention is described by taking as an example a host Host connected to one Target and implementing data transmission. For the case where the Host connects to, and transfers data with, multiple Targets, implementation may refer to the case where the Host is connected to one Target, and details are not described again.
• in the Target, a network card, an independent field programmable gate array (English: FPGA, Field Programmable Gate Array) chip, or a central processing unit (English: CPU, Central Processing Unit) may receive the data read/write commands sent by the host acting as the control device.
• in the embodiment of the present invention, the network card, FPGA chip, or CPU that receives, in the storage device, the data read/write commands sent by the control device is collectively referred to as a data processing unit.
• the data processing unit in the embodiment of the present invention may also be another unit or entity having the same function as the network card, the FPGA chip, or the CPU; any unit that can receive and process the data read/write commands sent by the host acting as the control device can serve as the data processing unit in the storage device of the embodiment of the present invention.
• when the network card serves as the data processing unit in the storage device, the network card memory is used to cache the data to be transmitted by the data read/write commands received by the network card.
• when the FPGA serves as the data processing unit in the storage device, the storage unit in the FPGA is used to cache the data to be transmitted by the data read/write commands received by the FPGA.
• when the CPU in the storage device serves as the data processing unit in the storage device, the memory of the CPU is used to cache the data to be transmitted by the data read/write commands received by the CPU, that is, the data is cached by sharing the memory of the CPU.
• a cache unit on the Target, such as a cache device using DDR as the cache medium, can also serve as the cache of the network card, the FPGA, or the CPU.
  • the network card memory, the storage unit in the FPGA chip, the memory of the CPU, or the cache unit on the Target are collectively referred to as a cache unit.
• the cache unit in the embodiment of the present invention may also be another storage medium having the same function as the network card memory, the storage unit in the FPGA chip, or the memory of the CPU; any medium that can cache the data to be transmitted by the data read/write commands sent by the Host acting as the control device can serve as the cache unit in the storage device of the embodiment of the present invention.
• the NIC memory, the storage unit in the FPGA chip, the memory of the CPU, or the cache unit on the Target may also form a cache resource pool. In specific implementation, one or more of the network card, the FPGA chip, or the CPU may receive the data read/write commands sent by the Host, and the data to be transmitted is cached in this cache resource pool.
• in the following embodiments of the present invention, description is provided by taking a network card as the data processing unit in the storage device, the network card memory as the cache unit in the storage device, a Target as the storage device, and a Host as the control device.
• for implementations in which the FPGA chip or the CPU serves as the data processing unit, reference may be made to the implementation of the network card as the data processing unit; for implementations in which the storage unit in the FPGA chip, the cache unit on the Target, or the memory of the CPU serves as the cache unit, or in which a cache resource pool serves as the cache unit, reference may be made to the implementation of the network card memory as the cache unit, and details are not described herein.
• the Host 300 and the Target 400 are connected through a fabric network.
• the Host 300 and the Target 400 can be connected and communicate through a network such as iWARP, RoCE, Infiniband, FC, or Omni-Path.
  • the Host 300 includes hardware components such as a CPU 301, a memory 302, and a network card 303.
  • the Target 400 includes a network card 401 and one or more hard disks.
• the Host 300 is a host and is mainly responsible for initiating reading and writing of data, for example, sending data read/write commands to the Target 400.
• the specific configuration of the host Host 300 includes, but is not limited to, a physical server or a virtual machine on a physical server, and the physical server may be a computer device including components such as a CPU, a memory, and a network card.
• when the host Host 300 is a virtual machine on a physical server, the foregoing CPU 301, memory 302, and network card 303 included in the Host 300 refer to the CPU, memory, and network card resources that the physical server allocates to the virtual machine.
• similarly, the network card 401 in the Target 400 may also be a virtual network card, that is, a network card resource that the physical network card in the Target 400 allocates to the virtual network card.
  • Target 400 is the target storage device. It is also called NVM Subsystem in NVMe. It is mainly responsible for receiving and executing read and write commands sent by host Host 300.
• the hard disk in Target 400 can be an SSD, an HDD, or another medium having a storage function. In FIG. 2, three hard disks are taken as an example for description.
  • the network card 401 includes a network card processor 4011 and a network card memory 4012.
  • the network card 401 has the function of a network interface card, and may be an RNIC in the NVMe over Fabric.
• the network card 401 communicates with the Host 300 in the NVMe over Fabric architecture for data read/write commands and data transmission.
  • the network card memory 4012 is located in the network card 401, that is, the network card 401 includes the network card memory 4012 as an example for description.
  • the network card memory 4012 may also be located outside the network card 401. That is, the network card memory in the Target 400 may be a storage medium independent of the network card 401. In the embodiment of the present invention, the storage medium independent of the network card 401 may be a storage medium such as a DDR.
  • the network card memory 4012 of the network card 401 may also be a memory resource pool formed by the memory resources of multiple network cards in the Target 400.
  • the embodiment of the invention does not limit the specific presentation form of the network card memory. For other implementations of the network card memory, reference may be made to the implementation manner in which the network card memory 4012 is located in the network card 401, and details are not described herein.
• the method for controlling data read/write commands in NVMe over Fabric provided by the embodiment of the present invention divides the network card memory 4012 of the network card 401 in the Target 400 into different storage spaces, and establishes a correspondence between each storage space and a command queue in the network card 401. The data to be transmitted by the commands in each command queue can only be cached in the storage space of the corresponding network card memory. In this way, the problem that the execution of a command in a certain queue occupies a large amount of storage space of the network card memory and commands in other queues therefore cannot be executed in time can be avoided.
  • a method for controlling data read/write commands in an NVMe over Fabric architecture includes:
• Step 300: The Host 300 sends a control command to the network card 401, where the control command includes information that divides the storage space of the network card memory into two or more storage spaces.
  • the Host 300 can send control commands to the Target 400 through the network to implement control of the Target 400.
• the control command sent by the Host 300 to the Target 400 includes information indicating that the storage space of the NIC memory 4012 is to be divided into two or more storage spaces.
• the correspondence between storage spaces and command queues in the NIC memory 4012 may be one storage space corresponding to one command queue, or one storage space corresponding to two or more command queues. It can be understood that the divided two or more storage spaces are independent of each other, and any two storage spaces do not affect each other.
  • the command queue may be a command queue of a command sent by the host, and the commands sent by different hosts correspond to different queues; or may be command queues of commands sent by different CPUs or different threads, and commands sent by different CPUs. Corresponding to different queues, or commands sent by different threads correspond to different queues.
• Step 302: The network card 401 divides the storage space of the NIC memory 4012 into two or more storage spaces according to the control command, and establishes a correspondence between the two or more storage spaces and command queues. When receiving a first data read/write command sent by the Host 300, the network card 401 caches, according to the correspondence between the two or more storage spaces and the command queues, the data to be transmitted by the first data read/write command into the storage space of the NIC memory 4012 corresponding to the command queue in which the first data read/write command is located.
  • different command queues correspond to storage spaces of different network card memories.
  • the data to be transmitted by the data read/write command sent by the host 300 is cached in the storage space corresponding to the command queue where the data read/write command is located, and the storage space of the network card memory corresponding to the different command queues does not interfere with each other. It avoids the problem that the data to be transmitted by the data read/write command in a certain command queue occupies a large amount of storage space of the network card memory, and the data read/write commands in other command queues cannot be executed due to insufficient storage space of the network card memory. .
• the correspondence between the two or more storage spaces and the command queues established by the network card 401 may be established by the network card 401 according to information, carried in the command sent by the Host 300, about the correspondence between the command queues and the two or more storage spaces.
• alternatively, the correspondence between the two or more storage spaces and the command queues may be established by the network card 401 itself according to the two or more storage spaces that are divided. Specifically, the network card 401 can establish the correspondence between the two or more storage spaces and the command queues according to a preset correspondence template.
• for example, in the template, a queue with a higher priority is set with a larger storage space, a queue with a higher traffic requirement is set with a larger storage space, and so on.
  • the network card 401 can establish a correspondence between the two or more storage spaces and the command queue on the basis of a preset template according to a specific service implementation scenario.
  • the Host 300 divides the storage space of the NIC memory 4012 into multiple storage spaces, and may be implemented in multiple manners, including but not limited to:
• Mode 1: The Host 300 divides the storage space of the NIC memory 4012 according to the size of the storage space.
  • the size of each storage space may be the same or different.
• for example, the total storage space of the network card memory 4012 is 100 GB. When there are ten command queues, the size of each storage space divided in the network card memory 4012 may be 10 GB; or the storage space corresponding to eight of the command queues may be 10 GB each, and the storage spaces corresponding to the other two command queues may be 15 GB and 5 GB respectively.
• Mode 2: The Host 300 allocates different storage spaces according to the quality of service (English: QoS, Quality of Service) of different command queues in the controller 402. That is, for a command queue with a high QoS requirement, the Host 300 allocates a large storage space; for a command queue with a low QoS requirement, the allocated storage space is small. For example, the total storage space of the NIC memory 4012 is 100 GB and there are ten command queues; the QoS requirement of command queue 1 is higher than that of command queue 2, so the size of the storage space allocated by the Host 300 to command queue 1 is 15 GB, and the storage space allocated to command queue 2 is 5 GB.
• Mode 3: The Host 300 allocates different storage spaces according to the priorities of different command queues. That is, for a command queue with a high priority, the Host 300 allocates a large storage space; for a command queue with a low priority, the allocated storage space is small.
• for example, the total storage space of the NIC memory 4012 is 100 GB and there are ten command queues in the controller 402; the priority of command queue 1 is higher than that of command queue 2, so the storage space allocated by the Host 300 to command queue 1 is 15 GB, and the storage space allocated to command queue 2 is 5 GB.
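The three allocation modes can all be viewed as dividing the total NIC memory in proportion to a per-queue weight (equal weights for Mode 1, a QoS level for Mode 2, a priority for Mode 3). The following sketch is illustrative only; the text describes the resulting splits, not a formula:

```python
def divide_by_weight(total_space, weights):
    """Divide the NIC memory among command queues in proportion to a
    per-queue weight: equal weights give Mode 1, a QoS level gives Mode 2,
    and a priority gives Mode 3 (illustrative sketch)."""
    total_weight = sum(weights.values())
    return {q: total_space * w // total_weight for q, w in weights.items()}
```

With a 100 GB pool and weights 15 and 5 for command queues 1 and 2 (and 10 for each of the other eight queues), this reproduces the 15 GB / 5 GB split of the examples above.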
• optionally, the Host 300 can also bind multiple command queues into one queue group; the storage space in the NIC memory 4012 corresponding to the command queue group is the sum of the storage spaces corresponding to the command queues in the queue group. This makes the configuration of the storage space in the NIC memory 4012 more flexible.
• in specific implementation, it may occur that the data to be transmitted by the data read/write commands executed in some command queues occupies more storage space, while the data to be transmitted by the data read/write commands executed in other command queues occupies less storage space.
• for example, the storage space corresponding to command queue 1 is 10 GB and the storage space corresponding to command queue 2 is 12 GB. The data to be cached when executing the data read/write commands in command queue 1 occupies 8 GB of storage space, that is, 80% of the storage space of the NIC memory 4012 corresponding to command queue 1 is occupied; the data to be cached by the data read/write commands in command queue 2 occupies 4.8 GB of storage space, that is, only 40% of the storage space of the NIC memory 4012 corresponding to command queue 2 is occupied.
• In this case, the Host 300 can make a temporary adjustment: reduce the storage space of the NIC memory 4012 corresponding to command queue 2 and expand the storage space of the NIC memory 4012 corresponding to command queue 1, that is, allocate part of the storage space of the NIC memory 4012 corresponding to command queue 2 to command queue 1. In this way, the Host 300 can adjust the storage space of the NIC memory allocated to different command queues in a timely manner and more flexibly meet the needs of actual services.
• Specifically, the CPU in the Host 300 obtains the occupation ratio of the storage space of the NIC memory 4012 corresponding to the first command queue and the occupation ratio of the storage space of the NIC memory 4012 corresponding to the second command queue. When the occupation ratio of the storage space corresponding to the first command queue exceeds a preset first threshold (for example, 80%) and the occupation ratio of the storage space corresponding to the second command queue is below a preset second threshold (for example, 30%), the CPU in the Host 300 sends a command to the network card 401 instructing it to increase the storage space corresponding to the first command queue, that is, to allocate part of the storage space of the NIC memory corresponding to the second command queue to the first command queue.
• Optionally, the network card 401 can itself obtain the occupation ratios of the storage spaces corresponding to the different command queues and adjust the storage spaces accordingly; the details are not repeated here.
• The Host 300 may increase the storage space of the NIC memory 4012 corresponding to the first command queue by a fixed proportion in stages, for example increasing the storage space of the corresponding NIC memory 4012 by a fixed percentage each time, in three increments. It may also increase the storage space by a preset proportion all at once, for example increasing the storage space of the corresponding NIC memory 4012 by 30% in a single step.
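The threshold-driven rebalancing just described can be sketched as below. The function name, the dict-based bookkeeping, and the transfer step size are illustrative assumptions; the text only requires that space move from a lightly occupied queue to a heavily occupied one once the two occupation ratios cross the preset thresholds.

```python
# Sketch (assumed names and parameters): move part of a lightly occupied
# queue's NIC-memory allocation to a heavily occupied queue once the
# occupation ratios cross the preset thresholds (cf. the 80% / 30% example).

def rebalance(alloc_gb, used_gb, busy_q, idle_q,
              first_threshold=0.80, second_threshold=0.30, step_ratio=0.30):
    """Shift step_ratio of the idle queue's space to the busy queue."""
    busy_ratio = used_gb[busy_q] / alloc_gb[busy_q]
    idle_ratio = used_gb[idle_q] / alloc_gb[idle_q]
    if busy_ratio >= first_threshold and idle_ratio < second_threshold:
        moved = alloc_gb[idle_q] * step_ratio
        alloc_gb[idle_q] -= moved
        alloc_gb[busy_q] += moved
    return alloc_gb

# The example above: queue 1 is 80% occupied (8 GB of 10 GB), queue 2 only
# 40% occupied (4.8 GB of 12 GB); a 50% second threshold is used here so the
# demonstration triggers a transfer.
alloc = rebalance({"queue1": 10.0, "queue2": 12.0},
                  {"queue1": 8.0, "queue2": 4.8},
                  "queue1", "queue2", second_threshold=0.50)
```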
• During the adjustment, the network card 401 suspends buffering data into the NIC memory 4012.
• In the above embodiments, the NIC memory is used as the cache unit and the network card as the data processing unit. Optionally, the NIC memory and the memory of the CPU may together serve as the cache unit: the NIC memory corresponds to one part of the command queues, and the memory of the CPU corresponds to another part. In this case, the storage space of the NIC memory may be allocated to command queues with a high priority or a high QoS requirement, and the storage space of the memory of the CPU in the storage device may be allocated to command queues with a lower priority or a lower QoS requirement. Because caching data in the NIC memory is fast and efficient, allocating the storage space of the NIC memory to high-priority command queues better meets the service requirements of high-priority commands.
• Similarly, when the storage unit in an FPGA chip is used together with the memory of the CPU as the cache unit, the storage unit in the FPGA chip may correspond to command queues with a high priority or a high QoS requirement, and the storage space of the memory of the CPU in the storage device may be allocated to command queues with a low priority or a low QoS requirement.
• For any one of the multiple command queues, when the network card 401 caches the data to be transmitted by the data read/write commands in that command queue, the storage space of the NIC memory 4012 corresponding to the command queue may be insufficient. When this happens, a data read/write command in the command queue fails to obtain sufficient storage space in the NIC memory 4012 and cannot be executed in time, which leads to a complicated processing mechanism. To solve this problem, the embodiment of the present invention further provides three possible implementation manners for the case where the storage space of the NIC memory corresponding to any one of the multiple command queues is insufficient.
• The following three implementations are equivalent implementation manners. The descriptions "first", "second", and "third" in the embodiments of the present invention are intended only to describe the three implementation manners clearly and do not represent an order of preference among them. The first command queue described in the embodiments of the present invention is any one of the command queues, and the first data read/write command is likewise any data read/write command in the first command queue.
• In the first possible implementation manner, the method provided by the embodiment of the present invention further includes:
• Step 304A: Before sending the first data read/write command in the first command queue, the Host 300 sends to the network card 401 a request for acquiring the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue. The available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue is the storage space that is not occupied, in the storage space of the NIC memory 4012 corresponding to the first command queue, at the time the network card 401 receives the request sent by the Host 300.
• The request sent by the Host 300 to the network card 401 to obtain the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue may be implemented by sending a request message. The request message carries an indication for the network card 401 to return the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue. For example, the request may be a request packet that includes a field for obtaining the available storage space of the NIC memory 4012. The embodiment of the present invention limits neither the form of the request message nor the form of the information, carried in the request message, that instructs the network card 401 to return the available storage space of the NIC memory 4012.
  • the Host 300 can also obtain information about the available storage space of the NIC memory 4012 by reading a register that records the available storage space information of the NIC memory 4012.
• Step 306A: The Host 300 receives the information, returned by the network card 401, about the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue. Specifically, the network card 401 carries this information in a response message and returns it to the Host 300, and the Host 300 obtains the information from the response message.
• The available storage space of the NIC memory 4012 returned by the network card 401 is the available storage space at the time the network card 401 received the request sent by the Host 300. Therefore, the available storage space that the network card 401 returns to the Host 300 is the real-time available storage space of the NIC memory 4012.
• Step 308A: The Host 300 determines, according to the information about the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue, whether the storage space occupied by the data to be transmitted by the first data read/write command is less than or equal to that available storage space.
• For example, if the size of the available storage space of the NIC memory 4012 is 100 MB and the data to be stored occupies 50 MB, the determination can be made by judging that 50 MB is less than 100 MB. As another example, if the length of the available storage space of the NIC memory 4012 is 50 data blocks and the data to be stored occupies 60 data blocks, the determination can be made by judging that 60 data blocks is larger than 50 data blocks.
• Step 310A: When the storage space occupied by the data to be transmitted by the first data read/write command is less than or equal to the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue, the Host 300 sends the first data read/write command.
• Step 312A: When the storage space occupied by the data to be transmitted by the first data read/write command is larger than the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue, the Host 300 pauses sending the first data read/write command.
• In the figure, step 312A is located below step 310A only for clarity of the drawing; this does not mean that step 312A and step 310A are executed sequentially. In the embodiment of the present invention, step 312A and step 310A are parallel implementation steps.
• In this way, the Host 300 sends the first data read/write command in the first command queue only when the storage space of the NIC memory 4012 corresponding to the first command queue can buffer the data to be transmitted. This avoids the complicated processing mechanism caused by commands in the first command queue failing to execute because the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue is insufficient.
• Optionally, after pausing for a preset time, the Host 300 may resend the request for obtaining the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue, and send the first data read/write command when the storage space to be occupied by its data is smaller than or equal to the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue.
• The preset time for which the Host 300 pauses sending the first data read/write command may be a system default time or a pre-configured time; within that time range, the Host 300 does not perform step 304A. Specifically, the preset time may be implemented by setting a timer in the Host 300, and the Host 300 performs steps 304A-312A again after the timer expires. It can be understood that the preset time for which the Host 300 pauses sending the first data read/write command can be set flexibly according to the specific service situation.
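Steps 304A-312A, including the pause-and-retry behaviour, can be sketched as follows. The function names, the polling callbacks, and the retry parameters are hypothetical stand-ins for the real NVMe over Fabric transport; this is a sketch of the control flow, not the wire protocol.

```python
# Sketch (assumed interface): before sending a data read/write command, ask
# how much cache space the first command queue still has on the NIC, and send
# only when the command's data fits; otherwise pause and retry later.
import time

def try_send(data_size, get_available_space, send_command,
             retry_interval_s=0.0, max_attempts=3):
    for _ in range(max_attempts):
        available = get_available_space()   # steps 304A/306A: query the NIC
        if data_size <= available:          # step 308A: does the data fit?
            send_command()                  # step 310A: send the command
            return True
        time.sleep(retry_interval_s)        # step 312A: pause, then re-query
    return False

# 50 MB of data fits into 100 MB of available cache, so the send succeeds.
sent = try_send(50, get_available_space=lambda: 100, send_command=lambda: None)
```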
• The second possible implementation manner provided by the embodiment of the present invention further optimizes the first possible implementation manner. In the second manner, the Host 300 does not need to send a request for the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue before sending each data read/write command in the first command queue. Instead, the flow of steps 304A-312A is started only when the Host 300 receives a backpressure packet from the network card 401 indicating that the data to be transmitted by a data read/write command in the first command queue cannot be buffered.
• In this way, the Host 300 does not need to send a request to obtain the available storage space before sending each command. This not only effectively solves the technical problems in the prior art, but also further improves the efficiency of the Host 300 when sending commands and saves the resource occupation caused by the Host 300 sending requests for the available storage space of the NIC memory 4012. Similarly, since the network card 401 does not need to return the available storage space information each time it receives a request from the Host 300, the resource occupation of the network card 401 is correspondingly reduced.
• Specifically, when the Host 300 sends a second data read/write command in the first command queue, it does not need to obtain the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue beforehand. If the network card 401 returns a backpressure packet indicating that the available storage space in the NIC memory 4012 corresponding to the first command queue cannot cache the data to be transmitted by the second data read/write command sent by the Host 300, then after receiving the backpressure packet, the Host 300 obtains, before sending other data read/write commands (for example, the first data read/write command) in the first command queue, the information about the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue, and sends those commands only when the available storage space can cache the data they are to transmit. That is, after receiving the backpressure packet, the Host 300 performs the flow of steps 304A-312A. Optionally, the Host 300 performs the flow of steps 304A-312A only for a preset time; after the preset time has elapsed, when a data read/write command in the first command queue needs to be sent, the Host 300 sends it to the network card 401 directly.
• The preset time during which the flow of steps 304A-312A is performed may be set according to specific needs; it may be a system default time or a time previously set by an administrator. Moreover, the preset time may be changed in real time according to the actual service situation. For example, if the occupation ratio of the storage space of the NIC memory 4012 corresponding to the first command queue is high, the preset time for performing the flow of steps 304A-312A is long; if the occupation ratio is low, the preset time is short.
• The backpressure message sent by the network card 401 to the Host 300 may be a directly generated message or packet, or may be carried in a response message. For example, it may be carried in a command response message that the network card 401 returns to the Host 300, where the response message carries information indicating that the available storage space of the NIC memory 4012 corresponding to the first command queue is insufficient. Any other type of message or packet may also serve as the backpressure message sent by the network card 401 to the Host 300, as long as it indicates that the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue is insufficient to cache the data to be transmitted by the data read/write command in the first command queue sent by the Host 300. The information carried in the backpressure message indicating that the available storage space is insufficient to store the data that the Host 300 needs to store may be an error code, a preset flag, or the like.
• Optionally, the Host 300 also retransmits the second data read/write command during the flow of steps 304A-312A. That is, for the second data read/write command that could not be executed in time because the storage space of the NIC memory 4012 corresponding to the first command queue was insufficient, the Host 300 resends the second data read/write command when it determines that the available storage space of the NIC memory 4012 corresponding to the first command queue is larger than the storage space occupied by the data to be transmitted by the second data read/write command.
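The backpressure-triggered behaviour of this second manner can be sketched as below. The transport interface, its return values, and the probe window are illustrative assumptions; passing the clock in as `now` keeps the sketch deterministic, whereas a real host would use a timer as the text describes.

```python
# Sketch (assumed interface): commands are sent without any space query until
# the network card returns a backpressure message; the host then falls back to
# query-before-send (steps 304A-312A) for a preset time.

class BackpressureSender:
    def __init__(self, transport, probe_window_s=1.0):
        self.transport = transport            # assumed: send() and available()
        self.probe_window_s = probe_window_s  # preset time to keep probing
        self.probe_until = 0.0                # no probing initially

    def send(self, data_size, now):
        if now < self.probe_until:
            # Within the preset window: query available space before sending.
            if data_size > self.transport.available():
                return False                  # pause this command
        if self.transport.send(data_size) == "backpressure":
            # Cache was full: open the query-before-send window.
            self.probe_until = now + self.probe_window_s
            return False
        return True

class _FakeTransport:
    """Stand-in for the real fabric transport (illustrative only)."""
    def __init__(self, space):
        self.space = space
    def available(self):
        return self.space
    def send(self, data_size):
        return "backpressure" if data_size > self.space else "ok"

sender = BackpressureSender(_FakeTransport(space=10), probe_window_s=5.0)
first = sender.send(20, now=0.0)   # cache full: backpressure, window opens
second = sender.send(20, now=1.0)  # within window: still does not fit, paused
third = sender.send(5, now=1.0)    # within window: fits, sent
```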
• In the third possible implementation manner, the Host 300 acquires and records the real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue. Each time the Host 300 is to send the first data read/write command in the first command queue, it first determines whether the recorded real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue is greater than or equal to the storage space occupied by the data to be transmitted by the first data read/write command. If it is, the Host 300 sends the first data read/write command; otherwise, the Host 300 pauses sending the first data read/write command.
• In this way, the Host 300 sends the first data read/write command only when the storage space of the NIC memory 4012 corresponding to the first command queue can buffer the data to be transmitted by the command. Therefore, the complicated processing mechanism caused by commands in the first command queue failing to be cached because the storage space of the NIC memory 4012 corresponding to the first command queue is insufficient can be avoided.
• Specifically, the method provided by the embodiment of the present invention further includes:
• Step 304B: The Host 300 acquires and records the real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue. The Host 300 may record the acquired available storage space locally, that is, in the Host 300.
• The Host 300 may obtain the available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue when the Target 400 is powered on, and use it as the real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue. Because no data has yet been cached in the NIC memory 4012 at that point, the available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue equals its total storage space. Optionally, the Host 300 may instead obtain the available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue at any time after the Target 400 is powered on, and use it as the real-time available storage space; in this case, the available storage space obtained may be smaller than the total storage space of the storage space of the NIC memory 4012 corresponding to the first command queue.
• Step 306B: Before sending the first data read/write command in the first command queue, the Host 300 obtains the locally recorded real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue, and determines whether the storage space occupied by the data to be transmitted by the first data read/write command is less than or equal to that recorded real-time available storage space.
• The locally recorded real-time available storage space is the real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue as recorded by the Host 300. It may be expressed as the size of the space in which data can still be stored, in which case the storage space occupied by the data to be transmitted by the first data read/write command is the size of the storage space that data occupies. It may also be expressed in other forms, for example as the number of data blocks that can still be written into the storage space of the NIC memory 4012 corresponding to the first command queue, in which case the storage space occupied by the data to be transmitted by the first data read/write command is the number of data blocks to be transmitted by that command.
• Step 308B: When the storage space occupied by the data to be transmitted by the first data read/write command is less than or equal to the locally recorded real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue, the Host 300 sends the first data read/write command to the Target 400, and subtracts the storage space occupied by the data to be transmitted by the first data read/write command from the recorded real-time available storage space, thereby updating the locally recorded real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue.
• This is because the data to be transmitted by the first data read/write command will be cached in the storage space of the NIC memory 4012 corresponding to the first command queue. The locally recorded real-time available storage space therefore needs to be reduced by the storage space occupied by that data, so that the real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue is correctly recorded.
• Step 310B: When the storage space occupied by the data to be transmitted by the first data read/write command is larger than the locally recorded real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue, the Host 300 pauses sending the first data read/write command.
• In this way, the Host 300 sends the first data read/write command in the first command queue only when the storage space of the NIC memory 4012 corresponding to the first command queue can buffer the data to be transmitted. This avoids the complicated processing mechanism caused by commands in the first command queue failing to execute because the available storage space in the storage space of the NIC memory 4012 corresponding to the first command queue is insufficient.
• In the figure, step 310B is located below step 308B only for clarity of the drawing; this does not mean that step 310B and step 308B are executed sequentially. In the embodiment of the present invention, step 310B and step 308B are parallel implementation steps.
• Further, the implementation manner provided by the embodiment of the present invention includes:
• Step 312B: When the data to be transmitted by the first data read/write command cached in the NIC memory 4012 has been migrated to the destination address, the network card 401 sends a response message for completing the migration to the Host 300.
• The destination address to which the data to be transmitted by the first data read/write command is migrated differs depending on whether the first data read/write command is a write command or a read command. When the first data read/write command is a write command, the data to be transmitted is migrated to the hard disk of the Target 400; when the first data read/write command is a read command, the data to be transmitted is migrated to the Host 300.
• Step 314B: According to the received response message, the Host 300 adds the storage space occupied by the data to be transmitted by the first data read/write command back to the locally recorded real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue.
• Because the Host 300 has received the response message for completing the migration of the data to be transmitted by the first data read/write command, that data has been migrated out of the NIC memory 4012, and the storage space of the NIC memory 4012 corresponding to the first command queue gains the corresponding available storage space, namely the storage space that the data occupied. Therefore, adding that storage space back to the locally recorded real-time available storage space allows the Host 300 to correctly record the real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue.
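Steps 304B-314B amount to keeping a local counter: subtract on send, add back when the migration-complete response arrives. A minimal sketch with hypothetical names:

```python
# Sketch (assumed names): the host locally tracks the real-time available
# storage space of the NIC memory for one command queue, so no per-command
# query to the network card is needed.

class LocalSpaceTracker:
    def __init__(self, total_space):
        self.available = total_space      # step 304B: recorded, e.g. at power-on

    def try_send(self, data_size):
        """Steps 306B/308B/310B: send only if the data fits, and reserve."""
        if data_size <= self.available:
            self.available -= data_size   # step 308B: deduct on send
            return True                   # command is sent
        return False                      # step 310B: pause the command

    def on_migration_done(self, data_size):
        """Step 314B: add the space back once the data leaves the NIC cache."""
        self.available += data_size

tracker = LocalSpaceTracker(total_space=100)
sent_first = tracker.try_send(60)    # sent; local record drops to 40
sent_second = tracker.try_send(60)   # paused: only 40 recorded as available
tracker.on_migration_done(60)        # step 312B response: space released
```

The design point is that the counter is updated purely from events the host already sees (its own sends and the target's responses), which is why no extra query traffic is required.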
• When the Host 300 pauses sending the first data read/write command, it may re-execute step 306B after waiting for a preset time. The preset time may be a default preset time or one set based on the needs of the specific service. When the Host 300 re-executes step 306B, that is, again determines whether the storage space occupied by the data to be transmitted by the first data read/write command is less than or equal to the locally recorded real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue, it continues to pause if the recorded real-time available storage space is still smaller than the storage space occupied by the data to be transmitted by the first data read/write command.
• The Host 300 may record the real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue in various locations. For example, it may be recorded in a special storage space in the Host 300, such as a dedicated chip used to store this information; or it may be stored in an existing storage component of the Host 300, such as the cache of the CPU 301, the memory 302, the cache of the network card 303, or a storage space in an FPGA chip. The Host 300 may also record the real-time available storage space in multiple forms, for example in the form of a table or in the form of variables. The embodiment of the present invention does not limit the specific form in which the real-time available storage space of the storage space of the NIC memory 4012 corresponding to the first command queue is recorded.
• In the foregoing implementation manners, the data to be transmitted by the first data read/write command is cached in the storage space of the NIC memory 4012 corresponding to the first command queue, and the cached data is then migrated to the storage space corresponding to the destination address. The following describes in detail, for the cases where the first data read/write command is a write command and a read command respectively, the manner in which the network card 401 caches the data to be transmitted by the first data read/write command and the manner in which the cached data is migrated.
• Case 1: the first data read/write command is a write command. When the first data read/write command sent by the Host 300 is a Write Command, the data to be transmitted by the first data read/write command is the data to be stored.
• The Write Command carries an SGL, and the SGL includes a field, for example an entry, containing information such as the source address of the data to be stored in the Host 300, the length of the data to be stored, and the destination address of the data to be stored in the Target 400. The SGL may also include multiple fields, that is, multiple entries, each entry containing the source address of the data to be stored in the Host 300, the length of the data to be stored, and the destination address of the data to be stored in the Target 400. When the data to be stored is discontinuous in the Host 300 and exists in multiple address segments, multiple entries are required to record those address segments. The embodiment of the present invention is described by taking an SGL with one entry as an example.
• The network card 401 caches the data to be stored into the storage space of the NIC memory 4012 corresponding to the first command queue, according to the source address of the data to be stored in the Host 300 carried in the SGL of the Write Command. Optionally, the network card 401 can receive the data to be stored from the network card 303 in an RDMA manner.
• The network card 401 then modifies the Write Command: it replaces the source address of the data to be stored in the Host 300, carried by the Write Command, with the address at which the data to be stored is cached in the NIC memory 4012 corresponding to the first command queue, and sends the modified Write Command to the controller of the destination hard disk. That is, the SGL carried by the Write Command that the network card 401 sends to the controller of the destination hard disk includes information such as the address of the data to be stored in the NIC memory 4012 corresponding to the first command queue, the length of the data to be stored, and the destination address of the data to be stored in the Target 400.
• The destination hard disk is determined by the network card 401 according to the destination address, in the Target 400, of the data to be stored carried in the Write Command. Specifically, the network card 401 determines, according to the destination address in the Target 400 of the data to be stored, the hard disk in which that destination address is located, and takes that hard disk as the destination hard disk. For example, each hard disk corresponds to an address segment; the network card 401 determines, according to the destination address in the Target 400 of the data to be stored carried in the SGL of the Write Command, the address segment in which the destination address is located, and the hard disk corresponding to that address segment is the destination hard disk. After determining the destination hard disk, the network card 401 sends the modified Write Command to the controller of the destination hard disk.
  • the controller of the destination hard disk reads the data to be stored from the network card memory 4012 according to the address of the data to be stored carried in the received Write Command in the network card memory 4012, for example, by RDMA or direct memory.
  • Access (English: DMA, Direct Memory Access) reads the data that needs to be stored. And the read data that needs to be stored is written into a storage space corresponding to the destination hard disk.
  • connection between the network card 401 and the hard disk in the Target 400 may be implemented based on the NVMe over PCIe architecture. Therefore, the controller of the target hard disk in the Target 400 and the network card 401 can realize data transmission or migration through the connection and communication in the NVMe over PCIe architecture.
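The address rewrite in the steps above can be sketched as follows. This is a minimal illustration in Python: the `WriteCommand` and `SGLEntry` structures and their field names are hypothetical stand-ins for the SGL layout, not the actual NVMe wire format.

```python
from dataclasses import dataclass

@dataclass
class SGLEntry:
    source_addr: int   # where the payload currently resides (initially in the Host)
    length: int        # payload length in bytes
    dest_addr: int     # destination address of the data inside the Target

@dataclass
class WriteCommand:
    sgl: SGLEntry

def rewrite_write_command(cmd: WriteCommand, nic_buffer_addr: int) -> WriteCommand:
    """Replace the Host-side source address with the address at which the
    payload is cached in the NIC memory region of the first command queue,
    so the destination disk controller can fetch it locally (e.g. via DMA)."""
    cmd.sgl.source_addr = nic_buffer_addr
    return cmd
```

The length and destination address are deliberately left untouched: only the location from which the disk controller fetches the payload changes.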
  • When the first data read/write command is a read command, the first data read/write command sent by the Host 300 is a Read Command, and the data to be transmitted by the first data read/write command is the data to be read. The Read Command carries an SGL, and the SGL includes information such as the source address of the data to be read in the Target 400, the length of the data to be read, and the destination address in the Host 300 to which the data to be read is to be written.
  • The network card 401 modifies the Read Command: the destination address, in the Host 300, of the data to be read that is carried in the Read Command is modified to the address at which the data to be read is to be cached in the storage space of the network card memory 4012 corresponding to the first command queue, and the modified Read Command is sent to the controller of the destination hard disk. That is, the SGL carried by the Read Command that the network card 401 sends to the controller of the destination hard disk includes information such as the source address of the data to be read in the Target 400, the length of the data to be read, and the address at which the data to be read is to be cached in the storage space of the network card memory 4012 corresponding to the first command queue.
  • The controller of the destination hard disk migrates the data to be read into the storage space of the network card memory 4012 corresponding to the first command queue according to the received modified Read Command, for example by means of RDMA.
  • After the data to be read is cached in the storage space of the network card memory 4012 corresponding to the first command queue, the network card 401 sends the cached data to the Host 300 according to the destination address, in the Host 300, to which the data to be read is to be written as carried in the Read Command, for example by means of RDMA.
  • When the network card 401 and the hard disks in the Target 400 are connected based on the NVMe over PCIe architecture, the data to be read is cached into the storage space of the network card memory 4012 corresponding to the first command queue through the connection and communication modes of the NVMe over PCIe architecture.
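The two-stage relay described above — from the disk controller into the per-queue NIC memory, then from NIC memory to the Host — can be sketched as follows. Byte arrays stand in for the disk, the NIC cache region, and Host memory; in the real device both copies would be RDMA/DMA transfers rather than Python slice assignments.

```python
def serve_read(read_cmd, nic_cache, disk, host_mem):
    """Two-stage relay for a Read Command: the disk controller migrates the
    data into the per-queue NIC cache, then the NIC forwards it to the Host
    destination address."""
    src, length, host_dest = read_cmd  # (source addr, length, Host dest addr)
    # Stage 1: disk -> NIC memory (the modified Read Command points at nic_cache)
    nic_cache[:length] = disk[src:src + length]
    # Stage 2: NIC memory -> Host destination (per the original Read Command)
    host_mem[host_dest:host_dest + length] = nic_cache[:length]
```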
  • The destination hard disk is determined by the network card 401 according to the source address, in the Target 400, of the data to be read that is carried in the Read Command. Specifically, the network card 401 can determine, according to that source address, which hard disk in the Target 400 holds the data to be read, and the hard disk where the source address is located is determined to be the destination hard disk.
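The per-disk address-segment lookup used to pick the destination hard disk (for both Write and Read Commands) can be sketched as follows; the segment boundaries and disk names are hypothetical example values, not values from the embodiment.

```python
# Each hard disk owns a contiguous address segment inside the Target.
# The (start, end) pairs below are hypothetical example values.
SEGMENTS = [
    ("disk0", 0x0000_0000, 0x4000_0000),
    ("disk1", 0x4000_0000, 0x8000_0000),
    ("disk2", 0x8000_0000, 0xC000_0000),
]

def destination_disk(addr: int) -> str:
    """Return the hard disk whose address segment contains addr."""
    for disk, start, end in SEGMENTS:
        if start <= addr < end:
            return disk
    raise ValueError(f"address {addr:#x} is outside every segment")
```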
  • The modification of the Write Command or the Read Command may be implemented by a control module in the network card 401. The control module can be implemented by a physical chip (for example a processor such as an ARM, x86 or Power PC), by a software module running on a physical chip, or by one or more virtual controllers created on a physical chip through virtual machine technology. The control module can be a physical controller or an NVM Controller in the NVMe over Fabric architecture.
  • The CPU 301 in the Host 300 may perform the processes of step 300 and steps 304A-312A or steps 304B-310B; alternatively, the network card 303 in the Host 300, or another chip or logic component in the Host 300 (for example an FPGA chip), may perform these processes.
  • The foregoing step 300 and steps 304A-312A or steps 304B-310B may also be performed jointly by at least two of the CPU 301, the network card 303, and a chip or logic component of the Host 300. For example, the network card 303 performs step 300 and steps 304A-306A while the CPU 301 performs steps 308A-312A; or the network card 303 performs step 300 and steps 304B-306B while the CPU 301 performs steps 308B-310B; or the CPU 301 performs step 300 and steps 304A-306A while the network card 303 performs steps 308A-312A; or the CPU 301 performs step 300 and steps 304B-306B while the network card 303 performs steps 308B-310B; or the chip or logic component in the Host 300 performs step 300 and steps 304A-306A while the CPU 301 performs steps 308A-312A; or the network card 303 performs step 300 and steps 304B-306B while the chip or logic component in the Host 300 performs steps 308B-310B; and so on. The embodiment of the present invention does not limit the specific execution body of step 300 and steps 304A-312A or steps 304B-310B.
  • Where virtualization is used, the CPU 301 and the network card 303 respectively correspond to the CPU and the network card in the virtual machine, which are implemented by a physical CPU and a physical network card carrying the virtual function. The implementation manner is similar to the foregoing implementation manner, and details are not described herein again.
  • FIG. 4(A) is a schematic flowchart of a method for controlling data read/write commands between a control device and a storage device in an NVMe over Fabric architecture according to an embodiment of the present invention. The control method is applied to data transmission between the control device and the storage device in the NVMe over Fabric architecture. The storage device includes a data processing unit, a cache unit, and a storage unit; the data that the control device needs to read and write is stored in the storage unit; the data processing unit is configured to receive the data read/write commands sent by the control device; and the cache unit is used to cache the data to be transmitted by the data read/write commands. As shown in FIG. 4(A), the method includes:
  • Step 400A: The data processing unit receives a control command sent by the control device, where the control command includes information for dividing the storage space of the cache unit into two or more storage spaces;
  • Step 402A: The data processing unit divides the storage space of the cache unit into two or more storage spaces according to the control command, and establishes a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write control commands sent by the control device;
  • Step 404A: The data processing unit receives a first data read/write command sent by the control device, and according to the correspondence between the two or more storage spaces and the command queues, caches the data to be transmitted by the first data read/write command in the storage space of the cache unit corresponding to the first command queue, where the first data read/write command is a data read/write command in the first command queue.
  • In this way, each storage space divided in the cache unit corresponds to a different command queue, and the data to be transmitted by the first data read/write command in the first command queue is cached in the storage space corresponding to the first command queue. The storage spaces of the cache unit corresponding to different command queues are used to cache the data to be transmitted by the data read/write commands in the corresponding command queues. This avoids the problem that the data to be transmitted by the data read/write commands in one command queue occupies a large amount of the cache unit's storage space, so that the data read/write commands in other command queues cannot be executed because the storage space of the cache unit is insufficient.
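Steps 400A-404A can be sketched as a small per-queue cache manager. The partition sizes below are hypothetical; the point is only that exhausting one queue's region leaves the other queues unaffected.

```python
class PerQueueCache:
    """Divide a cache into per-queue regions so that one queue's traffic
    cannot starve another queue's commands of cache space."""

    def __init__(self, partition):
        # partition maps queue id -> region size, e.g. {"q0": 4096, "q1": 4096}
        self.capacity = dict(partition)
        self.used = {q: 0 for q in partition}

    def cache(self, queue, nbytes):
        """Cache nbytes for a command of `queue`; fail only if *this* queue's
        region is full, regardless of how full the other regions are."""
        if self.used[queue] + nbytes > self.capacity[queue]:
            return False
        self.used[queue] += nbytes
        return True

    def release(self, queue, nbytes):
        """Free space once the data has been migrated out of the cache."""
        self.used[queue] -= nbytes
```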
  • FIG. 4(B) is a schematic flowchart of another method for controlling data read/write commands between a control device and a storage device in an NVMe over Fabric architecture according to an embodiment of the present invention. The storage device includes a data processing unit, a cache unit, and a storage unit; the data that the control device needs to read and write is stored in the storage unit; the data processing unit is configured to receive the data read/write commands sent by the control device; and the cache unit is configured to cache the data to be transmitted by the data read/write commands. As shown in FIG. 4(B), the method includes:
  • Step 400B: The control device sends a control command to the data processing unit, where the control command includes information for dividing the storage space of the cache unit into two or more storage spaces, so that the data processing unit divides the storage space of the cache unit into two or more storage spaces according to the control command and establishes a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write control commands sent by the control device;
  • Step 402B: The control device sends a first data read/write command to the storage device, where the data to be transmitted by the first data read/write command is cached in the storage space of the cache unit corresponding to the first command queue, and the first data read/write command is a data read/write command in the first command queue.
  • In this way, each storage space divided in the cache unit corresponds to a different command queue, and the data to be transmitted by the first data read/write command in the first command queue is cached in the storage space corresponding to the first command queue. The storage spaces of the cache unit corresponding to different command queues are used to cache the data to be transmitted by the data read/write commands in the corresponding command queues. This avoids the problem that the data to be transmitted by the data read/write commands in one command queue occupies a large amount of the cache unit's storage space, so that the data read/write commands in other command queues cannot be executed because the storage space of the cache unit is insufficient.
  • The detailed implementation processes of the methods shown in FIG. 4(A) and FIG. 4(B) can be implemented with reference to the implementation manners shown in FIG. 2 and FIG. 3 above, and details are not described herein again. The data processing unit can be implemented with reference to the network card 401, the cache unit with reference to the network card memory 4012, the storage unit with reference to the hard disks in FIG. 2, and the control device with reference to the implementation manner of the Host 300.
  • FIG. 5 is a schematic structural diagram of a storage device 500 according to an embodiment of the present invention. The storage device 500 is a storage device in an NVMe over Fabric architecture and performs data transmission with a control device in the NVMe over Fabric architecture. The storage device 500 includes a data processing unit 501, a cache unit 502, and a storage unit 503. The data processing unit 501 is configured to receive the data read/write commands sent by the control device, and the cache unit 502 is configured to cache the data to be transmitted by the data read/write commands. The data processing unit 501 includes a processor 5011, and the processor 5011 is configured to perform the following steps:
  • receiving a control command sent by the control device, where the control command includes information for dividing the storage space of the cache unit 502 into two or more storage spaces;
  • dividing the storage space of the cache unit 502 into two or more storage spaces according to the control command, and establishing a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write control commands sent by the control device;
  • caching the data to be transmitted by a first data read/write command sent by the control device in the storage space of the cache unit 502 corresponding to the first command queue, where the first data read/write command is a data read/write command in the first command queue.
  • In the storage device 500, each storage space divided in the cache unit corresponds to a different command queue, and the data to be transmitted by the first data read/write command in the first command queue is cached in the storage space corresponding to the first command queue. The storage spaces of the cache unit corresponding to different command queues are used to cache the data to be transmitted by the data read/write commands in the corresponding command queues. This avoids the problem that the data to be transmitted by the data read/write commands in one command queue occupies a large amount of the cache unit's storage space, so that the data read/write commands in other command queues cannot be executed because the storage space of the cache unit is insufficient.
  • The detailed implementation of the storage device 500 shown in FIG. 5 may also be implemented with reference to the implementation manners shown in FIG. 2 and FIG. 3, and details are not described herein again. The data processing unit 501 can be implemented with reference to the network card 401, the cache unit 502 with reference to the network card memory 4012, and the storage unit 503 with reference to the hard disks in FIG. 2.
  • FIG. 6 is a schematic structural diagram of a control device 600 according to an embodiment of the present invention. The control device 600 is a control device in an NVMe over Fabric architecture and includes a processor 601, a network card 602, and a bus 603; the processor 601 and the network card 602 are connected by the bus 603. Data is transmitted between the control device 600 and a storage device in the NVMe over Fabric architecture. The storage device includes a data processing unit, a cache unit, and a storage unit; the data that the control device 600 needs to read and write is cached in the cache unit of the storage device and stored in the storage unit of the storage device. The processor 601 is configured to perform the following steps:
  • sending a control command to the data processing unit, where the control command includes information for dividing the storage space of the cache unit into two or more storage spaces, so that the data processing unit divides the storage space of the cache unit into two or more storage spaces according to the control command and establishes a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write control commands sent by the control device 600;
  • sending a first data read/write command to the storage device, where the data to be transmitted by the first data read/write command is cached in the storage space of the cache unit corresponding to the first command queue, and the first data read/write command is a data read/write command in the first command queue.
  • By sending the control command, the control device 600 causes each storage space divided in the cache unit to correspond to a different command queue, and the data to be transmitted by the first data read/write command in the first command queue is cached in the storage space corresponding to the first command queue. The storage spaces of the cache unit corresponding to different command queues are used to cache the data to be transmitted by the data read/write commands in the corresponding command queues. This avoids the problem that the data to be transmitted by the data read/write commands in one command queue occupies a large amount of the cache unit's storage space, so that the data read/write commands in other command queues cannot be executed because the storage space of the cache unit is insufficient.
  • The detailed implementation of the control device 600 shown in FIG. 6 may also be implemented with reference to the implementation manners shown in FIG. 2 and FIG. 3, and details are not described herein again. The control device 600 can be implemented with reference to the implementation of the Host 300.
  • FIG. 7 is a schematic structural diagram of a system for implementing data read/write command control according to an embodiment of the present invention. The system includes a control device 700 and a storage device 800, and the control device 700 and the storage device 800 implement data transmission based on an NVMe over Fabric architecture. The storage device 800 includes a data processing unit 801, a cache unit 802, and a storage unit 803; the data that the control device 700 needs to read and write is stored in the storage unit 803; the data processing unit 801 is configured to receive the data read/write commands sent by the control device 700; and the cache unit 802 is used to cache the data to be transmitted by the data read/write commands. In the system:
  • the control device 700 is configured to send a control command to the data processing unit 801, where the control command includes information for dividing the storage space of the cache unit 802 into two or more storage spaces;
  • the data processing unit 801 is configured to divide the storage space of the cache unit 802 into two or more storage spaces according to the control command, and to establish a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write control commands sent by the control device 700;
  • the data processing unit 801 is further configured to receive a first data read/write command sent by the control device 700, and according to the correspondence between the two or more storage spaces and the command queues, to cache the data to be transmitted by the first data read/write command in the storage space of the cache unit 802 corresponding to the first command queue, where the first data read/write command is a data read/write command in the first command queue.
  • In this way, each storage space divided in the cache unit 802 corresponds to a different command queue, and the data to be transmitted by the first data read/write command in the first command queue is cached in the storage space corresponding to the first command queue. The storage spaces of the cache unit 802 corresponding to different command queues are used to cache the data to be transmitted by the data read/write commands in the corresponding command queues. This avoids the problem that the data to be transmitted by the data read/write commands in one command queue occupies a large amount of the storage space of the cache unit 802, so that the data read/write commands in other command queues cannot be executed because the storage space is insufficient.
  • The data processing unit 801 can be implemented with reference to the network card 401, the cache unit 802 with reference to the network card memory 4012, the storage unit 803 with reference to the hard disks in FIG. 2, and the control device 700 with reference to the implementation of the Host 300; details are not described herein again.
  • The foregoing description takes the case in which the cache unit 802 is located in the data processing unit 801 as an example. The cache unit 802 may also be located outside the data processing unit 801; that is, the cache unit 802 in the storage device 800 may be a storage medium independent of the data processing unit 801, such as a DDR storage medium. The cache unit 802 may also be a memory resource pool formed by the storage resources of multiple data processing units in the storage device 800. The embodiment of the present invention does not limit the specific presentation form of the cache unit 802.
  • The methods or steps described in connection with the present disclosure may be implemented in hardware, or may be implemented by a processor executing software instructions. The software instructions can be composed of corresponding software modules, which can be stored in random access memory (English: RAM, Random Access Memory), flash memory, read-only memory (English: ROM, Read Only Memory), erasable programmable read-only memory (English: EPROM, Erasable Programmable ROM), electrically erasable programmable read-only memory (English: EEPROM, Electrically EPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from, and write information to, the storage medium. The storage medium can also be an integral part of the processor. The processor and the storage medium can be located in an ASIC, and the ASIC can be located in a core network interface device. The processor and the storage medium may also exist as discrete components in the core network interface device.
  • It should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other division manners in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, or an electrical, mechanical, or other form of connection.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.
  • In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention essentially, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (English: ROM, Read-Only Memory), a random access memory (English: RAM, Random Access Memory), a magnetic disk, or an optical disk.

Abstract

A control method, device, and system for data read/write commands in an NVMe over Fabric architecture. The method includes: a data processing unit receives a control command sent by a control device; according to the control command, the data processing unit divides the storage space of a cache unit into two or more storage spaces and establishes a correspondence between the two or more storage spaces and command queues; when receiving a first data read/write command in a first command queue sent by the control device, the data processing unit caches the data to be transmitted by the first data read/write command in the storage space of the cache unit corresponding to the first command queue. This avoids the problem that the data to be transmitted by the data read/write commands in one command queue occupies the storage space of the cache unit, so that the data read/write commands in other command queues cannot be executed because the storage space of the cache unit is insufficient.

Description

Control Method, Device, and System for Data Read/Write Commands in an NVMe over Fabric Architecture — Technical Field
The present invention relates to the field of information technology, and in particular to a control method, device, and system for data read/write commands in an NVMe over Fabric architecture, i.e., the non-volatile memory express (NVMe) protocol carried over a Fabric.
Background
Non-volatile memory express (English: NVMe, non-volatile memory express) is a controller interface standard that unifies the queue (English: Queue) transmission mechanism between a host (English: Host) and NVMe devices connected through the Peripheral Component Interconnect Express (English: PCIe, Peripheral Component Interconnect Express) bus, and optimizes the queue interface, among other aspects.
After the published NVMe standard for the PCIe architecture achieved great success in the industry, the industry soon hoped to extend the NVMe standard to the data center field. However, limited by the lack of large numbers of existing PCIe networks in data centers and by defects of the PCIe protocol itself (scalability, long-distance connection, etc.), the industry is promoting running the NVMe protocol over networks such as iWarp, remote direct memory access over Converged Ethernet (English: ROCE, remote direct memory access over Converged Ethernet), Infiniband, Fibre Channel (English: FC, Fiber Channel), and Omni-Path, to provide more flexible and wider applications. The industry refers to the application of running the NVMe protocol over networks such as iWarp, ROCE, Infiniband, FC, and Omni-Path as NVMe over Fabric (NOF for short).
In the NVMe over Fabric architecture, Host denotes the host, which is responsible for initiating data reads and writes; Target denotes the target storage device, which is responsible for receiving and executing the commands sent by the Host. After the Target receives a Write Command sent by the Host, the network card in the Target parses the content of the Write Command to obtain the length of the data to be transmitted by the Write Command, and allocates a corresponding storage space in the network card memory to cache the data that the Host is to transmit. After the Target's network card has cached the data to be transmitted, it migrates the cached data into the destination hard disk in the Target. When the Host reads data from the Target's hard disk through a Read Command, the implementation process is similar: the data in the Target's hard disk first needs to be cached in the network card memory, and the data cached in the network card memory is then sent to the Host.
In a concrete service implementation, the Host sends multiple commands to the Target within the same time period. When multiple commands need to be sent, the NVMe over Fabric architecture implements parallel processing of the commands through multiple queues. When there are multiple commands in multiple queues, the data to be read or written by the commands in one queue may occupy most of the storage space of the network card memory. As a result, the commands in the other queues cannot apply for enough storage space to cache the data they need to read or write and cannot be executed in time. Unexecuted commands have to wait for memory space to be released, apply again for available memory space, and so on. Such an implementation makes the way the Target's network card handles a shortage of network card memory complex and poorly maintainable.
Summary of the Invention
Embodiments of the present invention provide a control method, device, and system for data read/write commands in an NVMe over Fabric architecture, to solve the problem of the complex processing mechanism that arises when data read/write commands in other queues fail to execute for lack of sufficient cache space because a data read/write command in one queue is being executed.
In one aspect, an embodiment of the present invention provides a method for controlling data read/write commands between a control device and a storage device in an NVMe over Fabric architecture. The storage device includes a data processing unit, a cache unit, and a storage unit; the data that the control device needs to read and write is stored in the storage unit; the data processing unit is configured to receive the data read/write commands sent by the control device; and the cache unit is used to cache the data to be transmitted by the data read/write commands. The method includes the following steps:
the data processing unit receives a control command sent by the control device, where the control command includes information for dividing the storage space of the cache unit into two or more storage spaces;
the data processing unit divides the storage space of the cache unit into two or more storage spaces according to the control command, and establishes a correspondence between the two or more storage spaces and command queues, where a command queue is a queue formed by the data read/write control commands sent by the control device;
the data processing unit receives a first data read/write command sent by the control device, and according to the correspondence between the two or more storage spaces and the command queues, caches the data to be transmitted by the first data read/write command in the storage space of the cache unit corresponding to a first command queue, where the first data read/write command is a data read/write command in the first command queue.
Through the above method, each storage space divided in the cache unit corresponds to a different command queue, and the data to be transmitted by the first data read/write command in the first command queue is cached in the storage space corresponding to the first command queue. In this way, the storage spaces of the cache unit corresponding to different command queues are used respectively to cache the data to be transmitted by the data read/write commands in the corresponding command queues. This avoids the problem that the data to be transmitted by the data read/write commands in one command queue occupies a large amount of the cache unit's storage space, so that the data read/write commands in other command queues cannot be executed because the storage space of the cache unit is insufficient.
Optionally, in the NVMe over Fabric architecture, the connection and communication between the control device and the storage device may be implemented through a network such as iWarp, ROCE, Infiniband, FC, or Omni-Path.
The data processing unit in the storage device may be implemented by a network card, an independent FPGA chip, or a central processing unit (English: CPU, central processing unit) in the storage device. The cache unit in the storage device may be implemented by the network card memory, a storage unit in an FPGA chip, a cache unit on the storage device, or the memory of the CPU in the storage device; it may also be implemented by a cache resource pool composed of at least two of the network card memory, the storage unit in the FPGA chip, the cache unit on the storage device, and the memory of the CPU in the storage device.
In one possible design, the data processing unit establishing the correspondence between the two or more storage spaces and the command queues includes:
the data processing unit establishing the correspondence between the two or more storage spaces and the command queues according to correspondence information carried in the control command, where the correspondence information is the correspondence between the two or more storage spaces in the cache unit and the command queues; or,
the data processing unit establishing the correspondence between the two or more storage spaces and the command queues according to the two or more storage spaces that have been divided.
In one possible design, the method further includes:
obtaining, within a preset time, the occupancy ratio of the storage space of the cache unit corresponding to the first command queue and the occupancy ratio of the storage space of the cache unit corresponding to a second command queue;
when the occupancy ratio of the storage space of the cache unit corresponding to the first command queue is greater than a preset first threshold, and the occupancy ratio of the storage space of the cache unit corresponding to the second command queue is less than a preset second threshold, reducing the storage space of the cache unit corresponding to the second command queue and allocating the reduced storage space to the storage space of the cache unit corresponding to the first command queue, where the first threshold is greater than the second threshold.
By adjusting the storage space of the cache unit allocated to different command queues in this way, the storage space of the cache unit can be allocated flexibly according to the actual situation. This not only maximizes the use of the cache unit's resources, but also solves the problem that the data read/write commands in some command queues have a large amount of data to transmit, improving the service processing capability while avoiding resource waste.
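The threshold-driven adjustment described above might look like the following sketch. The thresholds, the step size, and the choice of moving only unused capacity are illustrative assumptions, not values from the embodiment.

```python
def rebalance(capacity, used, q_hot, q_cold, hi=0.8, lo=0.3, step=1024):
    """If q_hot's region is more than `hi` full and q_cold's is less than `lo`
    full over the sampling window, move up to `step` bytes of capacity from the
    cold queue's region to the hot one. Only unused capacity is moved."""
    if (used[q_hot] / capacity[q_hot] > hi
            and used[q_cold] / capacity[q_cold] < lo):
        movable = min(step, capacity[q_cold] - used[q_cold])
        capacity[q_cold] -= movable
        capacity[q_hot] += movable
    return capacity
```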
Optionally, in establishing the correspondence between each storage space in the cache unit and the command queues, one command queue may correspond to one storage space, or a queue group composed of two or more command queues may correspond to one storage space. This enables a more flexible allocation of the cache unit, so that the storage resources of the cache unit can be allocated across different command queues.
Optionally, the ways of dividing the storage space of the cache unit include, but are not limited to: division according to the size of the cache unit's storage space, division according to the quality of service of different command queues, or division according to the priorities of different command queues.
As an optional implementation, when the cache unit includes both the network card memory and the memory of the CPU in the storage device, the command queues corresponding to the storage space allocated from the network card memory have high priority or high QoS requirements, while the command queues corresponding to the storage space allocated from the memory of the CPU in the storage device have low priority or low QoS requirements. Because the network card memory, when used as a cache unit, caches data quickly and efficiently, allocating high-priority command queues to the storage space of the network card memory can satisfy the service requirements of high-priority commands. It can be understood that, when the cache unit includes a storage unit in an FPGA chip and the memory of the CPU in the storage device, the command queues corresponding to the storage unit in the FPGA chip may likewise have high priority or high QoS requirements, and the command queues corresponding to the storage space of the memory of the CPU in the storage device may have low priority or low QoS requirements.
Optionally, the data processing unit may also bind multiple command queues into one queue group according to a control command of the control device; the storage space in the cache unit corresponding to the command queue group is the sum of the storage spaces corresponding to each command queue in the queue group. In this way, the configuration of the storage space in the cache unit can be implemented even more flexibly, to satisfy the different demands of different command queues for the cache unit's storage space.
In one possible design, the method further includes:
the control device obtains the available storage space of the cache unit corresponding to the first command queue;
the control device judges whether the storage space occupied by first data to be transmitted by the first data read/write command is less than or equal to the available storage space of the cache unit corresponding to the first command queue;
when the storage space occupied by the first data is less than or equal to the available storage space of the cache unit corresponding to the first command queue, the control device sends the first data read/write command to the storage device;
when the storage space occupied by the first data is greater than the available storage space of the cache unit corresponding to the first command queue, the control device suspends sending the first data read/write command.
In this way, the control device sends the first data read/write command in the first command queue only when the storage space of the cache unit corresponding to the first command queue can cache the data to be transmitted. This avoids the problem of the complex processing mechanism brought about by caching commands of the first command queue when the available storage space of the cache unit corresponding to the first command queue is insufficient.
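The send/suspend decision in this design can be sketched as follows; suspending all later commands of the queue in order once one does not fit is an illustrative assumption about ordering, not a requirement stated by the embodiment.

```python
def send_when_fits(commands, available):
    """Send queued (name, size) commands in order, suspending as soon as one
    does not fit in the queue's available cache space. Returns the lists of
    sent and suspended command names."""
    sent, suspended = [], []
    for name, size in commands:
        if suspended or size > available:
            suspended.append(name)   # paused until cache space is released
        else:
            available -= size        # space is consumed by the sent command
            sent.append(name)
    return sent, suspended
```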
In one possible design, the control device obtaining the available storage space of the cache unit corresponding to the first command queue includes:
before sending the first data read/write command to the storage device, the control device sends to the data processing unit a request to obtain the available storage space of the cache unit corresponding to the first command queue, so as to obtain that available storage space.
In one possible design, before the control device sends to the data processing unit the request to obtain the available storage space of the cache unit corresponding to the first command queue, the method further includes:
the control device sends a second data read/write command to the storage device, where the data to be transmitted by the second data read/write command is greater than the available storage space of the cache unit corresponding to the first command queue;
the control device receives a backpressure message sent by the data processing unit, where the backpressure message is used to indicate that the available storage space of the cache unit corresponding to the first command queue is insufficient.
In this way, the control device does not need to send, every time it sends a data read/write command, a request to obtain the available storage space of the cache unit corresponding to the command queue in which the command to be sent resides; it sends that request only after receiving from the data processing unit a backpressure message indicating that the data cannot be cached. This saves the control device's resource consumption, and correspondingly saves the resource consumption the data processing unit would incur in returning backpressure messages.
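The backpressure-triggered query described above might be sketched like this: the Host side submits optimistically and issues the available-space request only after a backpressure message. The `Target`/`host_send` names and the single re-check are illustrative assumptions.

```python
class Target:
    """Sketch of the Target side: accepts a command if the queue's cache
    region can hold its payload, otherwise returns a backpressure message."""
    def __init__(self, available):
        self.available = available

    def submit(self, size):
        if size > self.available:
            return "backpressure"   # cache space for this queue is insufficient
        self.available -= size
        return "accepted"

    def query_available(self):
        return self.available

def host_send(target, size):
    """Optimistic Host-side send: the available-space request is issued only
    after a backpressure message is received, not before every command."""
    if target.submit(size) == "accepted":
        return "accepted"
    # Backpressure received: query available space, retry only if it now fits;
    # otherwise suspend for a preset time (not modelled here).
    if target.query_available() >= size:
        return target.submit(size)
    return "suspended"
```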
In one possible design, the method further includes:
after suspending the sending of the first data read/write command for a preset time, the control device re-obtains the available storage space of the cache unit corresponding to the first command queue, and when the storage space occupied by the first data is less than or equal to the available storage space of the cache unit corresponding to the first command queue, sends the first data read/write command to the storage device.
Optionally, the preset time for which the control device suspends sending the first data read/write command may be a system default time or a preconfigured time, and may be set flexibly according to the concrete service situation.
In one possible design, the control device performs the steps of obtaining the available storage space of the cache unit corresponding to the first command queue, and of judging whether the storage space occupied by the first data to be transmitted by the first data read/write command is less than or equal to that available storage space, only within a preset time. The preset time can be set differently according to different service scenarios. Within the preset time, the available storage space of the cache unit cannot satisfy the storage space demand of the data to be transmitted by all the data read/write commands sent by the control device; after the preset time is reached, the available storage space of the cache unit can satisfy the storage space demand of the data to be transmitted by the data read/write commands sent by the control device.
Further, after receiving the backpressure message sent by the data processing unit, the control device also retransmits the second data read/write command. That is, for the second data read/write command that could not be executed in time because the storage space of the cache unit corresponding to the first command queue was insufficient, the control device resends the second data read/write command when it judges that the storage space of the cache unit corresponding to the first command queue is greater than the data to be transmitted by the second data read/write command.
In one possible design, the available storage space of the cache unit is the locally recorded real-time available storage space of the cache unit corresponding to the first command queue.
Here, "locally" refers to the control device: the locally recorded real-time available storage space of the cache unit is the real-time available storage space of the cache unit as recorded by the control device.
Optionally, the control device may obtain and record the available storage space of the cache unit at power-on initialization of the storage device, or at any time after the power-on initialization of the storage device.
Optionally, the form of the real-time available storage space recorded by the control device may be the size of the space of the cache unit in which data can be stored, or the number of data blocks that can be written.
Optionally, the control device may store the real-time available storage space of the cache unit's storage space corresponding to the first command queue in a dedicated storage space, for example in a dedicated chip; it may also be stored in an existing storage component in the control device, for example in the cache of the control device's CPU or in the cache of the control device's network card, or in a storage space in an independent FPGA chip.
Because the control device sends the first data read/write command only when the storage space of the cache unit corresponding to the first command queue can cache the data to be transmitted by the first data read/write command, the problem of the complex processing mechanism brought about by caching commands of the first command queue when the storage space of the cache unit corresponding to the first command queue is insufficient can be avoided.
In one possible design, after sending the first data read/write command, the control device subtracts the storage space occupied by the first data from the locally recorded real-time available storage space of the cache unit corresponding to the first command queue;
after receiving from the data processing unit the response message indicating completion of the first data read/write command, the control device adds the storage space occupied by the first data back to the locally recorded real-time available storage space of the cache unit corresponding to the first command queue.
After the control device sends the first data read/write command, the data to be transmitted by the first data read/write command occupies the storage space of the cache unit corresponding to the first command queue; therefore the recorded real-time available storage space of the cache unit corresponding to the first command queue must have the storage space occupied by the first data subtracted from it. After the control device receives from the data processing unit the response message indicating completion of the first data read/write command, the first data has already been migrated out of the cache unit corresponding to the first command queue; therefore the recorded real-time available storage space must have the storage space occupied by the first data added back to it. In this way, the latest available storage space of the cache unit corresponding to the first command queue can be recorded correctly.
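The subtract-on-send / add-on-completion bookkeeping can be sketched as follows; `LocalRecord` is a hypothetical name for the control device's locally recorded real-time available space.

```python
class LocalRecord:
    """Host-side real-time record of a queue's available cache space:
    subtract on send, add back on the completion response."""

    def __init__(self, total):
        self.available = total

    def on_send(self, size):
        # The sent command's data now occupies the queue's cache region.
        self.available -= size

    def on_completion(self, size):
        # The data has been migrated out of the cache; the space is free again.
        self.available += size
```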
In one possible design, the method further includes:
after the control device has suspended sending the first data read/write command for a preset time, the control device again judges whether the storage space occupied by the first data is less than or equal to the locally recorded real-time available storage space of the cache unit corresponding to the first command queue, and when the storage space occupied by the first data is less than or equal to that locally recorded real-time available storage space, sends the first data read/write command to the storage device.
Optionally, the control device may store the real-time available storage space of the cache unit's storage space corresponding to the first command queue in a dedicated storage space, for example in a dedicated chip; it may also be stored in an existing storage component in the control device, for example in the cache of the control device's CPU or in the cache of the control device's network card, or in a storage space in an FPGA chip.
Optionally, when the first data read/write command is a write command, the first data read/write command sent by the control device is a Write Command, and the data to be transmitted by the first data read/write command is the data to be stored. The Write Command carries an SGL; the SGL includes a field, which may for example be an entry, containing information such as the source address of the data to be stored in the control device, the length of the data to be stored, and the destination address of the data to be stored in the storage device.
The data processing unit caches the data to be stored in the storage space of the cache unit corresponding to the first command queue according to the source address, in the control device, of the data to be stored carried in the SGL of the Write Command. Optionally, the data processing unit may receive the data to be stored through the network card of the control device by means of remote direct memory access (English: RDMA, remote direct memory access).
After the data to be stored is cached in the storage space of the cache unit corresponding to the first command queue, the data processing unit modifies the Write Command: the source address, in the control device, of the data to be stored carried in the Write Command is modified to the address at which the data to be stored is held in the cache unit corresponding to the first command queue, and the modified Write Command is sent to the controller of the destination hard disk. That is, the SGL carried by the Write Command that the data processing unit sends to the controller of the destination hard disk includes information such as the address at which the data to be stored is held in the cache unit corresponding to the first command queue, the length of the data to be stored, and the destination address of the data to be stored in the storage device.
After determining the destination hard disk, the data processing unit sends the modified Write Command to the controller of the destination hard disk. The controller of the destination hard disk reads the data to be stored from the cache unit according to the address, in the cache unit, of the data to be stored carried in the received Write Command, for example by means of RDMA or direct memory access (English: DMA, Direct Memory Access), and writes the read data to be stored into the storage space corresponding to the destination hard disk.
当所述第一数据读写命令是读命令时,所述控制设备发送的所述第一数据读写命令为Read Command,所述第一数据读写命令所要传输的数据是需要读取的数据。所述Read Command中携带SGL,所述SGL中包含所述需要读取的数据在所述存储设备中的源地址、所述需要读取的数据的长度、以及所述需要读取的数据要写入所述控制设备中的目的地址等信息。
所述数据处理单元收到所述Read Command后,修改所述Read Command,将所述Read Command中携带的所述需要读取的数据在所述控制设备中的目的地址,修改为所述第一命令队列对应的缓存单元的存储空间中缓存所述需要读取的数据的地址,并将修改后的Read Command发送给目的硬盘的控制器。即所述数据处理单元发送给目的硬盘控制器的Read Command携带的SGL中包括所述需要读取的数据在所述存储设备中的源地址、所述需要读取的数据的长度、以及所述第一命令队列对应的缓存单元的存储空间中缓存所述需要读取的数据的地址等信息。目的硬盘的控制器根据接收到的所述修改后的Read Command,将所述需要读取的数据迁移到所述第一命令队列对应的缓存单元的存储空间中。可选的,目的硬盘的控制器通过RDMA的方式,将所述需要读取的数据迁移到所述第一命令队列对应的缓存单元的存储空间中。
当所述需要读取的数据缓存在所述第一命令队列对应的缓存单元的存储空间后,所述数据处理单元根据所述Read Command中所述需要读取的数据要写入所述控制设备中的目的地址,将缓存的所述需要读取的数据发送给所述控制设备。可选的,所述数据处理单元通过RDMA的方式,将缓存的所述需要读取的数据发送给所述控制设备。
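读命令的上述处理流程（改写目的地址、目的硬盘控制器将数据迁移到缓存单元、再按原目的地址发送给控制设备）可概括为如下示意代码（函数与字段名均为假设）:

```python
def handle_read_command(read_cmd, cache_addr, disk_read, send_to_host):
    """处理 Read Command 的示意流程。

    disk_read(src, length) 模拟目的硬盘控制器把数据迁移到缓存单元；
    send_to_host(dest, data) 模拟数据处理单元将缓存的数据发送给控制设备。
    """
    # 1. 将命令中的目的地址改为第一命令队列对应的缓存单元中的缓存地址
    modified = dict(read_cmd, dest_addr=cache_addr)
    # 2. 目的硬盘控制器按修改后的命令，把需要读取的数据迁移到缓存单元
    data = disk_read(modified["source_addr"], modified["length"])
    # 3. 按原命令中控制设备的目的地址，把缓存的数据发送给控制设备
    send_to_host(read_cmd["dest_addr"], data)
    return data


sent = {}
handle_read_command(
    {"source_addr": 0x100, "length": 8, "dest_addr": 0x2000},
    cache_addr=0x5000,
    disk_read=lambda src, n: b"x" * n,
    send_to_host=lambda dest, d: sent.update({dest: d}),
)
assert sent == {0x2000: b"x" * 8}
```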
在一个可能的设计中，所述数据处理单元与所述存储单元之间通过基于快捷外围部件互连标准PCIe的NVMe（NVMe over PCIe）架构实现连接。
在一个可能的设计中，所述数据处理单元中包括控制器，所述控制器用于控制所述缓存单元中缓存的数据与所述存储单元之间的传输，所述控制器是NVMe over Fabric架构中的物理控制器或非易失性存储控制器。
另一方面,本发明实施例提供了一种NVMe over Fabric架构中控制设备与存储设备之间数据读写命令的控制方法,所述存储设备包括数据处理单元、缓存单元和存储单元,所述控制设备需要读写的数据存储在所述存储单元中,所述数据处理单元用于接收所述控制设备发送的数据读写命令, 所述缓存单元用于缓存所述数据读写命令所需要传输的数据;其中,所述方法包括:
所述控制设备向所述数据处理单元发送控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息,使得所述数据处理单元根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
所述控制设备向所述存储设备发送第一数据读写命令,所述第一数据读写命令所要传输的数据被缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
通过上述方法,所述控制设备通过发送控制命令,使得所述缓存单元被划分成不同的存储空间,每个存储空间对应不同的命令队列,第一命令队列中的第一数据读写命令所要传输的数据被缓存在与所述第一命令队列所对应的存储空间中。这样,不同命令队列所对应的缓存单元的存储空间,分别用于缓存对应的命令队列中数据读写命令所要传输的数据。避免了因某一个命令队列中的数据读写命令所要传输的数据占用大量的缓存单元的存储空间,导致的其它命令队列中的数据读写命令因缓存单元的存储空间不足而无法被执行的问题。
可选的,所述NVMe over Fabric架构中,所述控制设备与所述存储设备之间可以通过iWarp、ROCE、Infiniband、FC或Omni-Path等网络实现连接和通信。
所述存储设备中的数据处理单元可以由网卡、FPGA芯片或所述存储设备中的CPU来实现。所述存储设备中的缓存单元可以由网卡内存、FPGA芯片中的存储单元、存储设备上的缓存单元或所述存储设备中的CPU的内存来实现。所述存储设备中的缓存单元也可以由网卡内存、FPGA芯片中的存储单元、存储设备上的缓存单元或所述存储设备中的CPU的内存中的至少两个组成的缓存资源池来实现。
可选的,所述建立所述缓存单元中每个存储空间与命令队列之间的对应关系,可以是一个命令队列对应一个存储空间,也可以是由两个以上的命令队列组成的队列组对应一个存储空间。这样能够更灵活地实现缓存单元的分配,以实现不同命令队列对缓存单元的存储资源的分配。
可选的,所述划分缓存单元的存储空间的方式,包括但不限于:根据缓存单元的存储空间的大小划分,根据不同命令队列的服务质量划分或根据不同命令队列的优先级划分。
作为一种可选的实现方式，当缓存单元包括网卡内存和所述存储设备中的CPU的内存时，分配给网卡内存的存储空间所对应的命令队列的优先级高、QOS要求高，分配给所述存储设备中的CPU的内存的存储空间所对应的命令队列的优先级低、QOS要求低。由于网卡内存作为缓存单元时缓存数据的速度快、效率高，因此，将高优先级的命令队列分配给网卡内存的存储空间，能够满足高优先级的命令的业务需求。可以理解，当缓存单元包括FPGA芯片中的存储单元和所述存储设备中的CPU的内存时，也可以是分配给FPGA芯片中的存储单元所对应的命令队列的优先级高、QOS要求高，分配给所述存储设备中的CPU的内存的存储空间所对应的命令队列的优先级低、QOS要求低。
在一个可能的设计中,所述方法还包括:
所述控制设备获取所述第一命令队列对应的所述缓存单元的可用存储空间;
所述控制设备判断第一数据读写命令所要传输的第一数据占用的存储空间,是否小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间;
在所述第一数据占用的存储空间小于或等于所述第一命令队列对应的 所述缓存单元的可用存储空间时,发送所述第一数据读写命令给所述存储设备;
在所述第一数据占用的存储空间大于所述第一命令队列对应的所述缓存单元的可用存储空间时,暂停发送所述第一数据读写命令。
这样,所述控制设备会在第一命令队列对应的缓存单元的存储空间能够缓存需要传输的数据时,才发送所述第一命令队列中的第一数据读写命令。能够避免所述第一命令队列对应的缓存单元的存储空间中可用存储空间不足,因缓存所述第一命令队列中的命令所带来的处理机制复杂的问题。
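上述"先比较可用空间、再决定发送或暂停"的判断流程，可用如下示意代码概括（仅为便于理解的假设性示意）:

```python
def try_send(data_size, available_space, send, pause):
    """第一数据占用的存储空间不超过可用存储空间时发送命令，否则暂停发送。"""
    if data_size <= available_space:
        send()
        return True
    pause()
    return False


log = []
# 512 字节 <= 1024 字节可用空间: 发送
assert try_send(512, 1024, lambda: log.append("send"), lambda: log.append("pause"))
# 2048 字节 > 1024 字节可用空间: 暂停发送
assert not try_send(2048, 1024, lambda: log.append("send"), lambda: log.append("pause"))
assert log == ["send", "pause"]
```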
在一个可能的设计中,所述控制设备获取所述第一命令队列对应的所述缓存单元的可用存储空间包括:
所述控制设备在向所述存储设备发送第一数据读写命令之前,向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求,以获取所述第一命令队列对应的所述缓存单元的可用存储空间。
在一个可能的设计中,在所述控制设备向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求之前,所述方法还包括:
所述控制设备向所述存储设备发送第二数据读写命令,所述第二数据读写命令所要传输的数据大于所述第一命令队列对应的所述缓存单元的可用存储空间;
所述控制设备接收所述数据处理单元发送的反压消息,所述反压消息用于指示所述第一命令队列对应的所述缓存单元的可用存储空间不足。
通过上述方式,所述控制设备可以不用在每次发送数据读写命令时都发送获取待发送命令所在的命令队列所对应的所述缓存单元的存储空间的可用存储空间的请求,只在收到所述数据处理单元返回的不能缓存数据的反压消息时才发送上述获取请求。这样,能够节省所述控制设备的资源消 耗,也相应节省了数据处理单元因返回反压消息时产生的资源消耗。
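该优化方式意味着控制设备平时直接发送命令，只有在收到反压消息之后才切换为"先查询可用空间再发送"的模式，可示意如下（接口均为假设性示意）:

```python
class Sender:
    """收到反压消息后才切换为先查询再发送的模式（示意）。"""

    def __init__(self):
        self.backpressured = False

    def on_backpressure(self):
        # 收到数据处理单元的反压消息，标记可用存储空间不足
        self.backpressured = True

    def send(self, data_size, query_available):
        # 未收到反压消息时直接发送；收到后需先查询可用存储空间再判断
        if self.backpressured and data_size > query_available():
            return "paused"
        return "sent"


s = Sender()
assert s.send(2048, lambda: 1024) == "sent"    # 未反压: 直接发送
s.on_backpressure()
assert s.send(2048, lambda: 1024) == "paused"  # 反压后: 空间不足则暂停
assert s.send(512, lambda: 1024) == "sent"     # 反压后: 空间足够仍可发送
```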
在一个可能的设计中,所述缓存单元的可用存储空间是本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间。
其中,所述本地指的是所述控制设备,所述本地记录的所述缓存单元的实时可用存储空间,是所述控制设备记录的所述缓存单元的实时可用存储空间。
可选的,所述控制设备可以在所述存储设备上电初始化时,获取并记录所述缓存单元的可用存储空间。所述控制设备也可以在所述存储设备上电初始化后的任一时间,获取并记录所述缓存单元的可用存储空间。
可选的，所述控制设备记录的所述缓存单元的实时可用存储空间的形式，可以是所述缓存单元的可存储数据的空间的大小或可被写入的数据块的个数。
可选的，所述控制设备在专门的存储空间中，例如专门的芯片中，存储所述第一命令队列对应的缓存单元的存储空间的实时可用存储空间。也可以是存储在所述控制设备中已有的存储部件中，例如所述控制设备的CPU的缓存中，或所述控制设备的网卡的缓存中，还可以是在独立的FPGA芯片中的一个存储空间中，存储所述第一命令队列对应的缓存单元的存储空间的实时可用存储空间。
在一个可能的设计中,所述方法还包括:
所述控制设备在发送所述第一数据读写命令后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间减去所述第一数据占用的存储空间;
所述控制设备在接收到所述数据处理单元发送的完成所述第一数据读写命令的响应消息后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间加上所述第一数据占用的存储空间。
所述控制设备在发送所述第一数据读写命令后,所述第一数据读写命 令所要传输的数据会占用所述第一命令队列对应的所述缓存单元的存储空间。因此,需要将记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间减去所述第一数据所占用的存储空间。所述控制设备在接收到所述数据处理单元发送的完成所述第一数据读写命令的响应消息后,所述第一数据已经被迁移出所述第一命令队列对应的所述缓存单元。因此,需要将记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间加上所述第一数据所占用的存储空间。这样,能够正确记录所述第一命令队列对应的所述缓存单元最新的可用存储空间。
另一方面,本发明实施例还提供了一种存储设备,所述存储设备是NVMe over Fabric架构中的存储设备,所述存储设备与所述NVMe over Fabric架构中的控制设备之间进行数据传输,所述存储设备包括数据处理单元和缓存单元,所述数据处理单元用于接收所述控制设备发送的数据读写命令,所述缓存单元用于缓存所述数据读写命令所需要传输的数据;其中,所述数据处理单元包括处理器,所述处理器用于执行下述步骤:
接收所述控制设备发送的控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息;
根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
接收所述控制设备发送的第一数据读写命令,根据所述两个以上的存储空间与命令队列之间的对应关系,将所述第一数据读写命令所要传输的数据,缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
可选的,所述NVMe over Fabric架构中,所述控制设备与所述存储设备之间可以通过iWarp、ROCE、Infiniband、FC或Omni-Path等网络实现连接和通信。
所述存储设备中的数据处理单元可以由网卡、FPGA芯片或所述存储设备中的CPU来实现。所述存储设备中的缓存单元可以由网卡内存、FPGA芯片中的存储单元、存储设备上的缓存单元或所述存储设备中的CPU的内存来实现。所述存储设备中的缓存单元也可以由网卡内存、FPGA芯片中的存储单元、存储设备上的缓存单元或所述存储设备中的CPU的内存中的至少两个组成的缓存资源池来实现。
在一个可能的设计中,所述处理器建立所述两个以上的存储空间与命令队列之间的对应关系包括:
根据所述控制命令中携带的对应关系信息,建立所述两个以上的存储空间与命令队列之间的对应关系,所述对应关系信息是所述缓存单元中两个以上的存储空间与命令队列的对应关系;或,
根据被划分的两个以上的存储空间,建立所述两个以上的存储空间与命令队列之间的对应关系。
在一个可能的设计中,所述处理器还用于:
获取预设时间内,第一命令队列所对应的所述缓存单元的存储空间的占用比例,和第二命令队列所对应的所述缓存单元的存储空间的占用比例;
当所述第一命令队列所对应的所述缓存单元的存储空间的占用比例大于预设的第一阈值,且所述第二命令队列所对应的所述缓存单元的存储空间的占用比例小于预设的第二阈值时,减少所述第二命令队列所对应的所述缓存单元的存储空间,并将减少的所述第二命令队列所对应的所述缓存单元的存储空间,分配给所述第一命令队列所对应的所述缓存单元的存储空间;其中,所述第一阈值大于所述第二阈值。
通过上述调整不同命令队列分配的缓存单元的存储空间,能够根据实际的情况,灵活分配缓存单元的存储空间。不仅能够最大化地利用缓存单元的资源,还能够解决部分命令队列中数据读写命令要传输的数据量大的问题,在避免资源浪费的同时,提升了业务处理的能力。
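基于占用比例阈值在两个命令队列之间调整缓存单元存储空间的逻辑，可示意如下（阈值与调整幅度均为假设值，并非本发明限定的数值）:

```python
def rebalance(space, used, high=0.8, low=0.3, ratio=0.25):
    """按占用比例在两个命令队列之间调整存储空间（示意）。

    space/used 分别为 {队列: 空间大小} 和 {队列: 已占用大小}。
    当 q1 的占用比例大于第一阈值 high 且 q2 的占用比例小于第二阈值 low 时
    （high 必须大于 low），把 q2 空间的 ratio 部分划分给 q1。
    """
    r1 = used["q1"] / space["q1"]
    r2 = used["q2"] / space["q2"]
    if r1 > high and r2 < low:
        moved = int(space["q2"] * ratio)
        space["q2"] -= moved
        space["q1"] += moved
    return space


# q1 占用 90%（> 80%），q2 占用 20%（< 30%）: 把 q2 的 25% 划给 q1
space = rebalance({"q1": 100, "q2": 100}, {"q1": 90, "q2": 20})
assert space == {"q1": 125, "q2": 75}
```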
可选的,所述建立所述缓存单元中每个存储空间与命令队列之间的对应关系,可以是一个命令队列对应一个存储空间,也可以是由两个以上的命令队列组成的队列组对应一个存储空间。这样能够更灵活地实现缓存单元的分配,以实现不同命令队列对缓存单元的存储资源的分配。
可选的,所述划分缓存单元的存储空间的方式,包括但不限于:根据缓存单元的存储空间的大小划分,根据不同命令队列的服务质量划分或根据不同命令队列的优先级划分。
作为一种可选的实现方式，当缓存单元包括网卡内存和所述存储设备中的CPU的内存时，分配给网卡内存的存储空间所对应的命令队列的优先级高、QOS要求高，分配给所述存储设备中的CPU的内存的存储空间所对应的命令队列的优先级低、QOS要求低。由于网卡内存作为缓存单元时缓存数据的速度快、效率高，因此，将高优先级的命令队列分配给网卡内存的存储空间，能够满足高优先级的命令的业务需求。可以理解，当缓存单元包括FPGA芯片中的存储单元和所述存储设备中的CPU的内存时，也可以是分配给FPGA芯片中的存储单元所对应的命令队列的优先级高、QOS要求高，分配给所述存储设备中的CPU的内存的存储空间所对应的命令队列的优先级低、QOS要求低。
另一方面，本发明实施例还提供了一种控制设备，所述控制设备是NVMe over Fabric架构中的控制设备，所述控制设备包括处理器、网卡和总线，所述处理器和网卡通过总线连接，所述控制设备与NVMe over Fabric架构中的存储设备之间进行数据传输，所述存储设备包括数据处理单元、缓存单元和存储单元，所述控制设备需要读写的数据缓存在所述存储设备的缓存单元中，并存储在所述存储设备的存储单元中；其中，所述处理器用于执行下述步骤：
向所述数据处理单元发送控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息,使得所述数据处理单元 根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
向所述存储设备发送第一数据读写命令,所述第一数据读写命令所要传输的数据被缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
可选的,所述NVMe over Fabric架构中,所述控制设备与所述存储设备之间可以通过iWarp、ROCE、Infiniband、FC或Omni-Path等网络实现连接和通信。
所述存储设备中的数据处理单元可以由网卡、FPGA芯片或所述存储设备中的CPU来实现。所述存储设备中的缓存单元可以由网卡内存、FPGA芯片中的存储单元、存储设备上的缓存单元或所述存储设备中的CPU的内存来实现。所述存储设备中的缓存单元也可以由网卡内存、FPGA芯片中的存储单元、存储设备上的缓存单元或所述存储设备中的CPU的内存中的至少两个组成的缓存资源池来实现。
可选的,所述建立所述缓存单元中每个存储空间与命令队列之间的对应关系,可以是一个命令队列对应一个存储空间,也可以是由两个以上的命令队列组成的队列组对应一个存储空间。这样能够更灵活地实现缓存单元的分配,以实现不同命令队列对缓存单元的存储资源的分配。
可选的,所述划分缓存单元的存储空间的方式,包括但不限于:根据缓存单元的存储空间的大小划分,根据不同命令队列的服务质量划分或根据不同命令队列的优先级划分。
作为一种可选的实现方式，当缓存单元包括网卡内存和所述存储设备中的CPU的内存时，分配给网卡内存的存储空间所对应的命令队列的优先级高、QOS要求高，分配给所述存储设备中的CPU的内存的存储空间所对应的命令队列的优先级低、QOS要求低。由于网卡内存作为缓存单元时缓存数据的速度快、效率高，因此，将高优先级的命令队列分配给网卡内存的存储空间，能够满足高优先级的命令的业务需求。可以理解，当缓存单元包括FPGA芯片中的存储单元和所述存储设备中的CPU的内存时，也可以是分配给FPGA芯片中的存储单元所对应的命令队列的优先级高、QOS要求高，分配给所述存储设备中的CPU的内存的存储空间所对应的命令队列的优先级低、QOS要求低。
在一个可能的设计中,所述处理器还用于:
获取预设时间内,第一命令队列所对应的所述缓存单元的存储空间的占用比例,和第二命令队列所对应的所述缓存单元的存储空间的占用比例;
当所述第一命令队列所对应的所述缓存单元的存储空间的占用比例大于预设的第一阈值,且所述第二命令队列所对应的所述缓存单元的存储空间的占用比例小于预设的第二阈值时,发送调整命令给所述数据处理单元,所述调整命令用于减少所述第二命令队列所对应的所述缓存单元的存储空间,并将减少的所述第二命令队列所对应的所述缓存单元的存储空间,分配给所述第一命令队列所对应的所述缓存单元的存储空间;其中,所述第一阈值大于所述第二阈值。
通过上述调整不同命令队列分配的缓存单元的存储空间,能够根据实际的情况,灵活分配缓存单元的存储空间。不仅能够最大化地利用缓存单元的资源,还能够解决部分命令队列中数据读写命令要传输的数据量大的问题,在避免资源浪费的同时,提升了业务处理的能力。
在一个可能的设计中,所述处理器还用于执行下述步骤:
获取所述第一命令队列对应的所述缓存单元的可用存储空间;
判断第一数据读写命令所要传输的第一数据占用的存储空间,是否小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间;
在所述第一数据占用的存储空间小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间时,发送所述第一数据读写命令给所述存储 设备;
在所述第一数据占用的存储空间大于所述第一命令队列对应的所述缓存单元的可用存储空间时,暂停发送所述第一数据读写命令。
这样,所述控制设备会在第一命令队列对应的缓存单元的存储空间能够缓存需要传输的数据时,才发送所述第一命令队列中的第一数据读写命令。能够避免所述第一命令队列对应的缓存单元的存储空间中可用存储空间不足,因缓存所述第一命令队列中的命令所带来的处理机制复杂的问题。
在一个可能的设计中,所述处理器获取所述缓存单元的可用存储空间包括:
所述处理器在向所述存储设备发送第一数据读写命令之前,向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求,以获取所述第一命令队列对应的所述缓存单元的可用存储空间。
在一个可能的设计中,在所述处理器向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求之前,所述处理器还用于执行下述步骤:
向所述存储设备发送第二数据读写命令,所述第二数据读写命令所要传输的数据大于所述第一命令队列对应的所述缓存单元的可用存储空间;
接收所述数据处理单元发送的反压消息,所述反压消息用于指示所述第一命令队列对应的所述缓存单元的可用存储空间不足。
通过上述方式,所述控制设备可以不用在每次发送数据读写命令时都发送获取待发送命令所在的命令队列所对应的所述缓存单元的存储空间的可用存储空间的请求,只在收到所述数据处理单元返回的不能缓存数据的反压消息时才发送上述获取请求。这样,能够节省所述控制设备的资源消耗,也相应节省了数据处理单元因返回反压消息时产生的资源消耗。
在一个可能的设计中,所述缓存单元的可用存储空间是本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间。
其中,所述本地指的是所述控制设备,所述本地记录的所述缓存单元的实时可用存储空间,是所述控制设备记录的所述缓存单元的实时可用存储空间。
可选的,所述控制设备可以在所述存储设备上电初始化时,获取并记录所述缓存单元的可用存储空间。所述控制设备也可以在所述存储设备上电初始化后的任一时间,获取并记录所述缓存单元的可用存储空间。
可选的，所述控制设备记录的所述缓存单元的实时可用存储空间的形式，可以是所述缓存单元的可存储数据的空间的大小或可被写入的数据块的个数。
可选的，所述控制设备在专门的存储空间中，例如专门的芯片中，存储所述第一命令队列对应的缓存单元的存储空间的实时可用存储空间。也可以是存储在所述控制设备中已有的存储部件中，例如所述控制设备的CPU的缓存中，或所述控制设备的网卡的缓存中，还可以是在独立的FPGA芯片中的一个存储空间中，存储所述第一命令队列对应的缓存单元的存储空间的实时可用存储空间。
在一个可能的设计中,所述处理器还用于执行下述步骤:
在发送所述第一数据读写命令后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间减去所述第一数据占用的存储空间;
在接收到所述数据处理单元发送的完成所述第一数据读写命令的响应消息后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间加上所述第一数据占用的存储空间。
另一方面,本发明实施例提供了一种实现数据读写命令控制的系统,所述系统包括NVMe over Fabric架构中的控制设备和存储设备,所述存储设备包括数据处理单元、缓存单元和存储单元,所述控制设备需要读写的数据存储在所述存储单元中,所述数据处理单元用于接收所述控制设备发 送的数据读写命令,所述缓存单元用于缓存所述数据读写命令所需要传输的数据;其中:
所述控制设备,用于向所述数据处理单元发送控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息;
所述数据处理单元,用于根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
所述数据处理单元,还用于接收所述控制设备发送的第一数据读写命令,根据所述两个以上的存储空间与命令队列之间的对应关系,将所述第一数据读写命令所要传输的数据,缓存到第一命令队列对应的所述缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
通过上述系统,缓存单元中被划分的每个存储空间对应不同的命令队列,第一命令队列中的第一数据读写命令所要传输的数据被缓存在与所述第一命令队列所对应的存储空间中。这样,不同命令队列所对应的缓存单元的存储空间,分别用于缓存对应的命令队列中数据读写命令所要传输的数据。避免了因某一个命令队列中的数据读写命令所要传输的数据占用大量的缓存单元的存储空间,导致的其它命令队列中的数据读写命令因缓存单元的存储空间不足而无法被执行的问题。
可选的,所述NVMe over Fabric架构中,所述控制设备与所述存储设备之间可以通过iWarp、ROCE、Infiniband、FC或Omni-Path等网络实现连接和通信。
所述存储设备中的数据处理单元可以由网卡、独立FPGA芯片或所述存储设备中的CPU来实现。所述存储设备中的缓存单元可以由网卡内存、FPGA芯片中的存储单元、存储设备上的缓存单元或所述存储设备中的CPU的内存来实现。所述存储设备中的缓存单元也可以由网卡内存、FPGA芯片中的存储单元、存储设备上的缓存单元或所述存储设备中的CPU的内存中的至少两个组成的缓存资源池来实现。
可选的，所述控制设备可以是物理服务器或物理服务器上的虚拟机。所述存储设备中的存储单元可以为一个或一个以上的固态磁盘(英文：SSD，Solid State Disk)或硬盘驱动器(英文：HDD，Hard Disk Drive)。所述缓存单元可以位于所述数据处理单元中，也可以是独立于所述数据处理单元的存储介质，例如可以是独立于数据处理单元的双倍数据速率(英文：DDR，double data rate)存储器。所述缓存单元还可以是所述存储设备中多个数据处理单元的内存资源共同构成的一个内存资源池。
在一个可能的设计中,所述数据处理单元建立所述两个以上的存储空间与命令队列之间的对应关系包括:
所述数据处理单元根据所述控制命令中携带的对应关系信息,建立所述两个以上的存储空间与命令队列之间的对应关系,所述对应关系信息是所述缓存单元中两个以上的存储空间与命令队列的对应关系;或,
所述数据处理单元根据被划分的两个以上的存储空间,建立所述两个以上的存储空间与命令队列之间的对应关系。
在一个可能的设计中，所述控制设备，用于获取预设时间内，第一命令队列所对应的所述缓存单元的存储空间的占用比例，和第二命令队列所对应的所述缓存单元的存储空间的占用比例；
当所述第一命令队列所对应的所述缓存单元的存储空间的占用比例大于预设的第一阈值,且所述第二命令队列所对应的所述缓存单元的存储空间的占用比例小于预设的第二阈值时,所述控制设备发送调整命令给所述数据处理单元,所述调整命令用于减少所述第二命令队列所对应的所述缓 存单元的存储空间,并将减少的所述第二命令队列所对应的所述缓存单元的存储空间,分配给所述第一命令队列所对应的所述缓存单元的存储空间;其中,所述第一阈值大于所述第二阈值。
在一个可能的设计中,所述数据处理单元还用于获取预设时间内,第一命令队列所对应的所述缓存单元的存储空间的占用比例,和第二命令队列所对应的所述缓存单元的存储空间的占用比例;
当所述第一命令队列所对应的所述缓存单元的存储空间的占用比例大于预设的第一阈值,且所述第二命令队列所对应的所述缓存单元的存储空间的占用比例小于预设的第二阈值时,减少所述第二命令队列所对应的所述缓存单元的存储空间,并将减少的所述第二命令队列所对应的所述缓存单元的存储空间,分配给所述第一命令队列所对应的所述缓存单元的存储空间;其中,所述第一阈值大于所述第二阈值。
通过上述调整不同命令队列分配的缓存单元的存储空间,能够根据实际的情况,灵活分配缓存单元的存储空间。不仅能够最大化地利用缓存单元的资源,还能够解决部分命令队列中数据读写命令要传输的数据量大的问题,在避免资源浪费的同时,提升了业务处理的能力。
可选的,所述数据处理单元可以根据所述控制命令,建立所述缓存单元中每个存储空间与命令队列之间的对应关系,可以是一个命令队列对应一个存储空间,也可以是由两个以上的命令队列组成的队列组对应一个存储空间。这样能够更灵活地实现缓存单元的分配,以实现不同命令队列对缓存单元的存储资源的分配。
可选的,所述数据处理单元根据所述控制设备的控制命令划分缓存单元的存储空间的方式,包括但不限于:根据缓存单元的存储空间的大小划分,根据不同命令队列的服务质量划分或根据不同命令队列的优先级划分。
可选的,所述数据处理单元还可以根据所述控制设备的控制命令,将多个命令队列绑定为一个队列组,该命令队列组对应的缓存单元中的存储 空间,是这个队列组中每个命令队列对应的存储空间的总和。这样,能够进一步灵活地实现所述缓存单元中存储空间的配置,以满足不同命令队列对缓存单元的存储空间的不同需求。
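队列组对应的存储空间即组内各命令队列对应存储空间的总和，可用如下示意代码说明（假设性示意）:

```python
def group_space(queue_spaces, group):
    """队列组对应的缓存单元存储空间 = 组内各命令队列对应空间之和（示意）。"""
    return sum(queue_spaces[q] for q in group)


spaces = {"q1": 10, "q2": 15, "q3": 5}
assert group_space(spaces, ["q1", "q3"]) == 15
assert group_space(spaces, ["q1", "q2", "q3"]) == 30
```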
在一个可能的设计中,
所述控制设备,还用于获取所述第一命令队列对应的所述缓存单元的可用存储空间,判断第一数据读写命令所要传输的第一数据占用的存储空间是否小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间;
所述控制设备,还用于在所述第一数据所占用的存储空间小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间时,发送所述第一数据读写命令给所述存储设备;在所述第一数据所占用的存储空间大于所述第一命令队列对应的所述缓存单元的可用存储空间时,暂停发送所述第一数据读写命令;
所述数据处理单元,还用于接收所述控制设备发送的所述第一数据读写命令,并将所述第一数据读写命令所要传输的数据缓存在所述第一命令队列对应的所述缓存单元中。
这样,所述控制设备会在第一命令队列对应的缓存单元的存储空间能够缓存需要传输的数据时,才发送所述第一命令队列中的第一数据读写命令。能够避免所述第一命令队列对应的缓存单元的存储空间中可用存储空间不足,因缓存所述第一命令队列中的命令所带来的处理机制复杂的问题。
在一个可能的设计中,所述控制设备获取所述第一命令队列对应的所述缓存单元的可用存储空间包括:
所述控制设备在向所述存储设备发送第一数据读写命令之前,向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求,以获取所述第一命令队列对应的所述缓存单元的可用存储空间。
在一个可能的设计中,在所述控制设备向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求之前,所述控制设备还用于向所述存储设备发送第二数据读写命令,所述第二数据读写命令所要传输的数据大于所述第一命令队列对应的所述缓存单元的可用存储空间;
所述控制设备接收所述数据处理单元发送的反压消息,所述反压消息用于指示所述第一命令队列对应的所述缓存单元的可用存储空间不足。
通过上述方式,所述控制设备可以不用在每次发送数据读写命令时都发送获取待发送命令所在的命令队列,所对应的所述缓存单元的存储空间的可用存储空间的请求,只在收到所述数据处理单元返回不能缓存数据的反压消息时才发送上述获取请求,能够节省所述控制设备的资源消耗,也相应节省了数据处理单元因返回反压消息时产生的资源消耗。
在一个可能的设计中,所述控制设备还用于在暂停发送所述第一数据读写命令达到预设时间后,重新获取所述第一命令队列对应的所述缓存单元的可用存储空间,并在所述第一数据所占用的存储空间小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间时,发送所述第一数据读写命令给所述存储设备。
可选的,所述控制设备暂停发送所述第一数据读写命令的预设时间,可以是系统默认的时间或预先配置的时间。并且,所述控制设备暂停发送所述第一数据读写命令的预设时间,可以根据具体的业务情况进行灵活设定。
进一步的，所述控制设备接收到所述数据处理单元发送的反压消息后，还包括重传所述第二数据读写命令。即对于因所述第一命令队列对应的缓存单元的存储空间不足，不能及时被执行的所述第二数据读写命令，所述控制设备在判断所述第一命令队列对应的缓存单元的可用存储空间大于所述第二数据读写命令所要传输的数据所占用的存储空间时，重新发送所述第二数据读写命令。
在一个可能的设计中，所述控制设备只在预设时间内，获取所述第一命令队列对应的缓存单元的可用存储空间，以及判断第一数据读写命令所要传输的第一数据占用的存储空间是否小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间。即所述控制设备执行在发送第一数据读写命令之前获取所述缓存单元的可用存储空间的步骤达到预设时间后，若所述缓存单元的可用存储空间足够大，可以不再执行在发送第一数据读写命令之前获取所述缓存单元的可用存储空间的步骤。这样能进一步提升所述控制设备的资源利用率，以及所述数据处理单元的资源利用率。所述预设时间可以根据不同的业务场景进行不同的设置。在所述预设时间内，所述缓存单元的可用存储空间不能满足所述控制设备发送的所有数据读写命令所要传输的数据对存储空间的需求；在所述预设时间达到后，所述缓存单元的可用存储空间能满足所述控制设备发送的数据读写命令所要传输的数据对存储空间的需求。
在一个可能的设计中,所述缓存单元的可用存储空间是本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间。
其中,所述本地指的是所述控制设备,所述本地记录的所述缓存单元的实时可用存储空间,是所述控制设备记录的所述缓存单元的实时可用存储空间。
可选的,所述控制设备可以在所述存储设备上电初始化时,获取并记录所述缓存单元的可用存储空间。所述控制设备也可以在所述存储设备上电初始化后的任一时间,获取并记录所述缓存单元的可用存储空间。
可选的，所述控制设备记录的所述缓存单元的实时可用存储空间的形式，可以是所述缓存单元的可存储数据的空间的大小或可被写入的数据块的个数。
可选的，所述控制设备在专门的存储空间中，例如专门的芯片中，存储所述第一命令队列对应的缓存单元的存储空间的实时可用存储空间。也可以是存储在所述控制设备中已有的存储部件中，例如所述控制设备的CPU的缓存中，或所述控制设备的网卡的缓存中，还可以是在独立的FPGA芯片中的一个存储空间中，存储所述第一命令队列对应的缓存单元的存储空间的实时可用存储空间。
由于所述控制设备发送所述第一数据读写命令,是所述第一命令队列对应的缓存单元的存储空间能够缓存所述第一数据读写命令所要传输的数据时发送的。因此,能够避免因所述第一命令队列所对应的缓存单元的存储空间不足,而缓存所述第一命令队列中的命令所带来的复杂处理机制的问题。
在一个可能的设计中,所述控制设备还用于在发送所述第一数据读写命令后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间减去所述第一数据占用的存储空间;在接收到所述数据处理单元发送的完成所述第一数据读写命令的响应消息后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间加上所述第一数据占用的存储空间。
所述控制设备在发送所述第一数据读写命令后,所述第一数据读写命令所要传输的数据会占用所述第一命令队列对应的所述缓存单元的存储空间。因此,需要将记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间减去所述第一数据所占用的存储空间。所述控制设备在接收到所述数据处理单元发送的完成所述第一数据读写命令的响应消息后,所述第一数据已经被迁移出所述第一命令队列对应的所述缓存单元。因此,需要将记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间加上所述第一数据所占用的存储空间。这样,能够正确记录所述第一命令队列对应的所述缓存单元最新的可用存储空间。
在一个可能的设计中,所述控制设备还用于在暂停发送所述第一数据 读写命令达到预设时间后,所述控制设备再次判断所述第一数据占用的存储空间是否小于或等于本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间,并在所述第一数据所占用的存储空间小于或等于本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间时,发送所述第一数据读写命令给所述存储设备。
可选的,当所述第一数据读写命令是写命令时,所述控制设备发送的所述第一数据读写命令为Write Command,所述第一数据读写命令所要传输的数据是需要存储的数据。所述Write Command中携带SGL,所述SGL中包括一个字段,例如可以是一个entry,该字段包含所述需要存储的数据在所述控制设备中的源地址、所述需要存储的数据的长度、以及所述需要存储的数据在所述存储设备中的目的地址等信息。
所述数据处理单元根据Write Command中SGL携带的所述需要存储的数据在控制设备中的源地址,将所述需要存储的数据缓存在所述第一命令队列对应的缓存单元的存储空间中。可选的,所述数据处理单元可以RDMA的方式,通过所述控制设备中的网卡接收所述需要存储的数据。
当所述需要存储的数据缓存在所述第一命令队列对应的缓存单元的存储空间后,所述数据处理单元修改所述Write Command,将所述Write Command携带的所述需要存储的数据在所述控制设备中的源地址,修改为所述缓存单元中存储所述需要存储的数据的地址,并将修改后的Write Command发送给目的硬盘的控制器。即所述数据处理单元发送给目的硬盘的控制器的Write Command携带的SGL中包括所述缓存单元中存储所述需要存储的数据的地址,所述需要存储的数据的长度、以及所述需要存储的数据在所述存储设备中的目的地址等信息。
所述数据处理单元在确定目的硬盘后,将修改后的Write Command发送给目的硬盘的控制器。目的硬盘的控制器根据接收到的Write Command中携带的所述需要存储的数据在所述缓存单元中的地址,从所述缓存单元中读 取所述需要存储的数据,例如以RDMA或DMA的方式读取所述需要存储的数据。并将读取到的所述需要存储的数据写入目的硬盘对应的存储空间中。
当所述第一数据读写命令是读命令时,所述控制设备发送的所述第一数据读写命令为Read Command,所述第一数据读写命令所要传输的数据是需要读取的数据。所述Read Command中携带SGL,所述SGL中包含所述需要读取的数据在所述存储设备中的源地址、所述需要读取的数据的长度、以及所述需要读取的数据要写入所述控制设备中的目的地址等信息。
所述数据处理单元收到所述Read Command后,修改所述Read Command,将所述Read Command中携带的所述需要读取的数据在所述控制设备中的目的地址,修改为所述第一命令队列对应的缓存单元的存储空间中缓存所述需要读取的数据的地址,并将修改后的Read Command发送给目的硬盘的控制器。即所述数据处理单元发送给目的硬盘控制器的Read Command携带的SGL中包括所述需要读取的数据在所述存储设备中的源地址、所述需要读取的数据的长度、以及所述第一命令队列对应的缓存单元的存储空间中缓存所述需要读取的数据的地址等信息。目的硬盘的控制器根据接收到的所述修改后的Read Command,将所述需要读取的数据迁移到所述第一命令队列对应的缓存单元的存储空间中。可选的,目的硬盘的控制器通过RDMA的方式,将所述需要读取的数据迁移到所述第一命令队列对应的缓存单元的存储空间中。
当所述需要读取的数据缓存在所述第一命令队列对应的缓存单元的存储空间后,所述数据处理单元根据所述Read Command中所述需要读取的数据要写入所述控制设备中的目的地址,将缓存的所述需要读取的数据发送给所述控制设备。可选的,所述数据处理单元通过RDMA的方式,将缓存的所述需要读取的数据发送给所述控制设备。
在一个可能的设计中，所述数据处理单元与所述存储单元之间通过基于快捷外围部件互连标准PCIe的NVMe（NVMe over PCIe）架构实现连接。
在一个可能的设计中，所述数据处理单元中包括控制器，所述控制器用于控制所述缓存单元中缓存的数据与所述存储单元之间的传输，所述控制器是NVMe over Fabric架构中的物理控制器或非易失性存储控制器。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1为现有技术的NVMe over Fabric架构中一种实现方式架构示意图;
图2为本发明实施例的NVMe over Fabric架构中实现数据读写命令控制的系统结构示意图;
图3为本发明实施例提供的一种NVMe over Fabric架构中数据读写命令的控制方法流程示意图;
图4(A)为本发明实施例一种NVMe over Fabric架构中控制设备与存储设备之间数据读写命令的控制方法的流程示意图；
图4(B)为本发明实施例另一种NVMe over Fabric架构中控制设备与存储设备之间数据读写命令的控制方法的流程示意图;
图5为本发明实施例提供的一种存储设备500的结构示意图;
图6为本发明实施例提供的一种控制设备600的结构示意图;
图7为本发明实施例提供的一种实现数据读写命令控制的系统的结构示意图。
具体实施方式
下面结合附图,对本发明的实施例进行描述。
另外,本发明实施例中的术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。
参考图1，图1为现有技术中NVMe over Fabric架构中一种实现方式架构示意图。图1中包括Host 100、Target 200和Target 210。其中，Host 100是主机，主要负责发起数据的读写，例如发送数据的读写命令等。Target 200和Target 210是目标存储设备，在NVMe协议中也称为NVM Subsystem，主要负责接收并且执行主机Host 100发送的读写命令。其中，主机Host 100的具体形态包括但不限于物理服务器或物理服务器上的虚拟机，所述物理服务器可以是包括CPU、内存和网卡等组成部件的计算机设备等。Target 200可以是一个独立的物理硬盘系统，如图1所示，Target 200包括网卡201和一个以上的硬盘，网卡201和一个以上的硬盘分别连接。需要说明的是，图1中以三个硬盘为例进行说明，在具体实现时，Target 200可以包括一个以上的硬盘。Target 200中的硬盘可以是固态磁盘(英文：SSD，Solid State Disk)或硬盘驱动器(英文：HDD，Hard Disk Drive)等具备存储功能的存储介质。其中网卡201具有网络接口卡的功能，可以是NVMe over Fabric中的远端网络接口卡(英文：RNIC，Remote Network Interface Card)，网卡201通过Fabric网络与Host 100进行与数据读写命令或数据传输相关的通信。
Target210的结构与Target 200类似,包括网卡211和一个以上的硬盘。Target 210中组成部件(网卡211和硬盘等)的功能和实现方式,与Target 200中的组成部件(网卡201和硬盘)的功能和实现方式类同。在具体实现时,还可以有多个Target,图1只是示出两个Target(Target 200和Target 210)为例进行说明。
在已有技术中,Host 100向Target 200发送的命令(例如Write Command或Read Command),可能会在一个时间段内发送多个。当有多个命令需要发送时,Host 100会通过队列的形式发送这些命令,先写入队列的命令先被执行。并且,Host 100可能有多个CPU,或一个CPU中有多个线程。多个CPU或多个线程可以并行处理多个命令。因此,在NVMe over Fabric架构中,可以有多个队列来实现命令的并行处理。
通常情况下,网卡201中会包含多个队列,每个队列是Host 100发送的命令所构成的。当多个队列中有多个命令时,会存在一个队列中的命令所要读写的数据占用网卡201的网卡内存的大部分存储空间,而其它队列中的命令会因申请不到足够的网卡内存的存储空间来缓存需要读写的数据而导致命令不能及时被执行。
不能及时被执行的命令,需要等待网卡201的网卡内存的其它存储空间被释放,才能再次申请可用存储空间。这样的操作会给网卡201的设计和实现带来很多的复杂性,例如,这些复杂性至少包括下述之一:
1)当网卡201的网卡内存的存储空间小于需要传输数据所占用的存储空间时,网卡201需要暂时缓存对应的Write Command或Read Command;
2)当网卡201长时间没有申请到可用存储空间(例如网卡201的网卡内存长时间没有可用存储空间)时,需要设计一种机制老化(例如删除)长时间被缓存的Write Command或Read Command;
3)老化掉长时间缓存的Write Command或Read Command之后，还需要一种机制通知Host 100，Host 100需重新发送相关的命令或数据。
为解决上述实现机制复杂的问题,本发明实施例提供一种NVMe over Fabric架构中数据读写命令的控制方法、设备和系统。为清楚地对本发明实施例提供一种NVMe over Fabric中数据传输的方法进行说明,本发明实施例以主机Host与一个Target连接并实现数据传递为例进行说明。对于Host与多 个Target连接并实现数据传递的情况,可以参照Host与一个Target连接的情况来实现,不再赘述。
需要说明的是,作为存储设备的Target,在具体实现时,可以由网卡、独立的现场可编程门阵列(英文:FPGA,field programmable gate array)芯片或Target中的中央处理器(英文:CPU,central processing unit)来接收作为控制设备的Host发送的数据读写命令。本发明实施例将存储设备中接收控制设备发送的数据读写命令的网卡、FPGA芯片或CPU等,统称为数据处理单元。可以理解,本发明实施例中的数据处理单元,还可以是与网卡、FPGA芯片或CPU具有相同功能的单元或实体,只要能够接收作为控制设备的Host发送的数据读写命令并处理,都可以作为本发明实施例的存储设备中的数据处理单元。
当网卡作为存储设备中的数据处理单元时,网卡内存用于缓存网卡接收到的数据读写命令所要传输的数据。当FPGA作为存储设备中的数据处理单元时,FPGA中的存储单元用于缓存FPGA接收到的数据读写命令所要传输的数据。当存储设备中的CPU作为存储设备中的数据处理单元时,CPU的内存用于缓存CPU接收到的数据读写命令所要传输的数据,即通过共享CPU的内存实现数据的缓存。另外,在Target上的缓存单元,例如以DDR作为缓存的缓存设备,也可以作为网卡、FPGA或CPU的缓存。本发明实施例中将上述网卡内存、FPGA芯片中的存储单元、CPU的内存或Target上的缓存单元,统称为缓存单元。可以理解,本发明实施例中的缓存单元,还可以是与网卡内存、FPGA芯片中的存储单元或CPU的内存具有相同功能的其它存储介质,只要能够用于缓存作为控制设备的Host发送的数据读写命令所要传输的数据,都可以作为本发明实施例的存储设备中的缓存单元。并且,上述网卡内存、FPGA芯片中的存储单元、CPU的内存或Target上的缓存单元也可以组成一个缓存资源池,在具体实现时,可以由网卡、FPGA芯片或CPU中的一个或多个接收Host发送的数据读写命令,并将需要传输的数据缓 存在该缓存资源池中。
下面以网卡作为存储设备中的数据处理单元,网卡内存作为存储设备中的缓存单元,Target作为存储设备,Host作为控制设备,对本发明实施例进行说明。可以理解,对于FPGA和CPU作为数据处理单元的实现方式,可以参考网卡作为数据处理单元的实现方式来实现。对于FPGA芯片中的存储单元、Target上的缓存单元或CPU的内存作为缓存单元的实现方式或其组成的资源池的实现方式,可以参考网卡内存作为缓存单元的实现方式来实现,不再赘述。
下面以图2所示的架构和组成方式为例，对本发明实施例提供的一种NVMe over Fabric中数据读写命令的控制方法进行说明。如图2所示，Host 300与Target 400通过Fabric网络连接。具体的，Host 300与Target 400之间可以通过iWarp、ROCE、Infiniband、FC或Omni-Path等网络实现连接和通信。
Host 300包括CPU 301、内存302和网卡303等硬件组成，Target 400包括网卡401和一个以上的硬盘。Host 300是主机，主要负责发起数据的读写，例如向Target 400发送数据的读写命令等。主机Host 300的具体形态包括但不限于物理服务器或物理服务器上的虚拟机，所述物理服务器可以是包括CPU、内存和网卡等组成部件的计算机设备等。需要说明的是，在主机Host 300为物理服务器上的虚拟机的情况下，以上所述的Host 300包括CPU 301、内存302和网卡303等硬件组成指的是物理服务器分配给该虚拟机使用的CPU、内存和网卡等资源。同样的，在Target 400中的网卡401也可以为虚拟网卡，该虚拟网卡为Target 400中的物理网卡分配给该虚拟网卡使用的网卡资源。
Target 400是目标存储设备,在NVMe中也称为NVM Subsystem,主要负责接收并执行主机Host 300发送的读写命令。Target 400中的硬盘可以是SSD 或HDD等具有存储功能的介质,图2中以三个硬盘为例进行说明。网卡401包括网卡处理器4011和网卡内存4012。网卡401具有网络接口卡的功能,可以是NVMe over Fabric中的RNIC。网卡401通过NVMe over Fabric架构中的网络与Host 300进行与数据读写命令或数据传输相关的通信。
需要说明的是,图2以网卡内存4012位于网卡401中,即网卡401中包括网卡内存4012为例进行说明。在具体实现时,网卡内存4012也可以位于网卡401的外部,即Target 400中的网卡内存可以是独立于网卡401的存储介质。本发明实施例中,独立于网卡401的存储介质,可以是DDR等存储介质。作为另一种可选的实现方式,网卡401的网卡内存4012也可以是Target 400中多个网卡的内存资源共同构成的一个内存资源池。本发明实施例不限定网卡内存的具体呈现形式。对于网卡内存的其它实现方式,可以参照网卡内存4012位于网卡401中的实现方式来实现,不再赘述。
本发明实施例提供的一种NVMe over Fabric中数据读写命令的控制方法，通过将Target 400中网卡401的网卡内存4012划分成不同的存储空间，并建立每个存储空间与网卡401中的命令队列的一一对应关系。每个命令队列中的命令所要传输的数据，只能被缓存在对应的网卡内存的存储空间中。这样，能够避免因某一个队列中命令执行占用大量的网卡内存的存储空间，导致的其它队列中的命令不能及时被执行的问题。
具体的,如图3所示,本发明实施例提供的一种NVMe over Fabric架构中数据读写命令的控制方法包括:
步骤300:Host 300向网卡401发送控制命令,所述控制命令包括将所述网卡内存的存储空间划分为两个以上的存储空间的信息;
其中,Host 300作为Target 400的控制方,能够通过网络发送控制命令给Target 400,以实现对Target 400的控制。Host 300向Target 400发送的控制命令包括将网卡内存4012的存储空间划分为两个以上的存储空间的指示信 息。
其中,网卡内存4012中每个存储空间与命令队列之间的对应关系,可以是一个命令队列对应一个存储空间,也可以是由两个以上的命令队列组成的队列组对应一个存储空间。可以理解,被划分的两个以上的存储空间之间相互独立,任意两个存储空间之间互不影响。
本发明实施例中,命令队列可以是一个Host发送的命令的命令队列,不同Host发送的命令,对应不同的队列;也可以是不同CPU或不同线程发送的命令的命令队列,不同CPU发送的命令对应不同的队列,或不同线程发送的命令对应不同的队列等。
步骤302:所述网卡401根据所述控制命令,将所述网卡内存4012的存储空间划分为两个以上的存储空间,建立所述两个以上的存储空间与命令队列之间的对应关系;并接收所述Host 300发送的第一数据读写命令,根据所述两个以上的存储空间与命令队列之间的对应关系,将所述第一数据读写命令所要传输的数据,缓存到所述第一数据读写命令所在的命令队列对应的网卡内存4012的存储空间中。
通过上述方法,不同的命令队列对应不同的网卡内存的存储空间。由于Host 300发送的数据读写命令所要传输的数据被缓存在与该数据读写命令所在的命令队列所对应的存储空间中,不同命令队列所对应的网卡内存的存储空间互不干扰。避免了因某一个命令队列中的数据读写命令所要传输的数据占用大量的网卡内存的存储空间,导致的其它命令队列中的数据读写命令因网卡内存的存储空间不足而无法被执行的问题。
其中,网卡401建立所述两个以上的存储空间与命令队列之间的对应关系可以是网卡401根据所述Host 300发送的命令中携带的命令队列与两个以上的存储空间之间的对应关系的信息,建立所述两个以上的存储空间与命令队列之间的对应关系。网卡401建立所述两个以上的存储空间与命令队列 之间的对应关系也可以是网卡401根据被划分的两个以上的存储空间,建立所述两个以上的存储空间与命令队列之间的对应关系。具体的,网卡401可以根据预先设置的对应关系的模板,建立所述两个以上的存储空间与命令队列之间的对应关系。例如,模板中设置优先级高的队列对应的存储空间大,对流量要求高的队列模板中设置的存储空间大等等。网卡401可以根据具体的业务实现场景,在预先设置的模板的基础上,建立所述两个以上的存储空间与命令队列之间的对应关系。
可选的,步骤300中,Host 300将网卡内存4012的存储空间划分为多个存储空间可以有多种实现方式,包括但不限于:
方式一:Host 300根据存储空间的大小划分网卡内存4012的存储空间。在根据存储空间大小划分网卡内存4012的存储空间时,每个存储空间的大小可以相同,也可以不同。例如,网卡内存4012总的存储空间是100GB,在命令队列是10个的情况下,网卡内存4012中被划分的存储空间的大小可以是10GB;也可以是其中8个命令队列对应的存储空间都是10G,另外两个命令队列对应的存储空间分别是15G和5G。
方式二:Host 300根据控制器402中不同命令队列的服务质量(QOS,Quality of Service),分配不同的存储空间。即Host 300对QOS要求高的命令队列,分配的存储空间大,对QOS要求低的命令队列,分配的存储空间小。例如,网卡内存4012总的存储空间是100GB,在命令队列是10个的情况下,命令队列1的QOS要求高于命令队列2的QOS要求,则Host 300分配给命令队列1的存储空间的大小为15GB,分配给命令队列2的存储空间的大小为5GB。
方式三:Host 300根据不同命令队列的优先级,分配不同的存储空间。即Host 300对优先级要求高的命令队列,分配的存储空间大,对优先级要求低的命令队列,分配的存储空间小。例如,网卡内存4012总的存储空间是 100GB,在控制器402中的命令队列是10个的情况下,命令队列1的优先级高于队列2的优先级,则Host 300分配给命令队列1的存储空间的大小为15GB,分配给命令队列2的存储空间的大小为5GB。
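以方式二为例，按QOS权重比例划分存储空间的计算可示意如下（权重值仅为假设，用于复现上文中命令队列1分得15GB、命令队列2分得5GB的例子）:

```python
def divide_by_qos(total_gb, qos_weights):
    """按各命令队列的 QOS 权重比例划分缓存单元的存储空间（示意）。"""
    weight_sum = sum(qos_weights.values())
    return {q: total_gb * w / weight_sum for q, w in qos_weights.items()}


# 假设命令队列1与命令队列2的 QOS 权重为 3:1，共同划分 20GB 存储空间
alloc = divide_by_qos(20, {"queue1": 3, "queue2": 1})
assert alloc == {"queue1": 15.0, "queue2": 5.0}
```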
作为一种可选的实现方式,Host 300还可以将多个命令队列绑定为一个队列组,该命令队列组对应的网卡内存4012中的存储空间,是这个队列组中每个命令队列对应的存储空间的总和。这样,能够更灵活地实现网卡内存4012中存储空间的配置,以满足不同命令队列对网卡内存4012的存储空间的不同需求。
在具体实现时，在一段时间内，可能会出现部分命令队列中执行的数据读写命令所要传输的数据，占用较多的存储空间，而部分命令队列中执行的数据读写命令占用的存储空间较少。例如，命令队列1对应的存储空间是10GB，命令队列2对应的存储空间是12GB；在一段时间内，执行命令队列1中的数据读写命令所要缓存的数据，占用了8GB的存储空间，即占用了命令队列1对应的网卡内存4012的存储空间中的80%；而执行命令队列2中的数据读写命令所要缓存的数据占用了4.8GB的存储空间，即只占用命令队列2对应的网卡内存4012的存储空间中的40%。为更好地利用网卡内存4012的存储资源，Host 300可以临时调整，缩小命令队列2对应的网卡内存4012的存储空间，扩大命令队列1对应的网卡内存4012的存储空间。即将命令队列2对应的网卡内存4012的存储空间中的部分存储空间，划分给命令队列1。通过这种方式，Host 300可以适时地调整分配给不同命令队列对应的网卡内存的存储空间，以更灵活地满足实际业务的需要。
具体的，可以是Host 300中的CPU获取预设时间内，第一命令队列所对应的网卡内存4012的存储空间的占用比例，和第二命令队列所对应的网卡内存4012的存储空间的占用比例。当第一命令队列所对应的网卡内存4012的存储空间的占用比例大于预设的第一阈值（例如80%），且第二命令队列所对应的网卡内存4012的存储空间的占用比例小于预设的第二阈值（例如30%）时，Host 300中的CPU向网卡401发送命令，以控制网卡401增加所述第一命令队列所对应的网卡内存4012的存储空间，并相应减少所述第二命令队列所对应的网卡内存4012的存储空间。即将减少的所述第二命令队列所对应的所述网卡内存的存储空间，分配给所述第一命令队列所对应的所述网卡内存的存储空间。可选的，网卡401也可以获取不同命令队列所对应的存储空间的占用比例，并根据获取到的占用比例对不同命令队列所对应的存储空间进行调整，不再赘述。
可选的,Host 300增加所述第一命令队列所对应的网卡内存4012的存储空间的方式,可以按照固定比例增加的方式,例如在固定时间内依次增加对应的网卡内存4012的存储空间的10%,分3次完成存储空间的容量的增加。也可以按照预先设定的比例,一次增加预设的比例,例如一次增加对应的网卡内存4012的存储空间的30%等。
需要说明的是,当Host 300调整命令队列对应的网卡内存4012的存储空间的容量时,为避免网卡内存4012缓存数据的失败,网卡401暂停向网卡内存4012中缓存数据。
上述实现方式中,是以网卡内存作为缓存单元,网卡作为数据处理单元来描述的。在具体实现时,可能会存在着网卡内存与CPU的内存共同作为缓存单元的情况。在这种情况下,可以将网卡内存对应部分的命令队列,CPU的内存对应另一部分的命令队列。当不同的命令队列有不同的优先级或不同的QOS要求时,可以将网卡内存的存储空间对应优先级高或对QOS要求高的命令队列,将所述存储设备中的CPU的内存的存储空间分配对应优先级低或QOS要求低的命令队列。由于网卡内存作为缓存单元时缓存数据的速度快、效率高,因此,将高优先级的命令队列分配给网卡内存的存储空间,能够满足高优先级的命令的业务需求。
可以理解,当缓存单元包括FPGA芯片中的存储单元和所述存储设备中的CPU的内存时,也可以将FPGA芯片中的存储单元对应优先级高或对QOS要求高的命令队列,将所述存储设备中的CPU的内存的存储空间分配对应优先级低或QOS要求低的命令队列。
进一步的，对于多个命令队列中的任意一个命令队列，网卡401在缓存该命令队列中的数据读写命令所要传输的数据的时候，也可能存在该命令队列对应的网卡内存4012的存储空间不足的情况，导致该命令队列中的数据读写命令因申请不到足够的网卡内存4012的存储空间而无法及时被执行，带来复杂处理机制的问题。
为进一步提升本发明实施例提供的技术方案的技术效果,在上述步骤300和步骤302的基础上,本发明实施例进一步提供三种可能的实现方式,以解决多个命令队列中的任意一个命令队列所对应的网卡内存的存储空间不足,导致该命令队列中的命令无法被执行所带来的复杂处理机制的问题。需要说明的是,下面分别描述三种可能的实现方式是同等的实现方式。本发明实施例中用第一、第二和第三描述的描述方式,只是为了清楚描述该三种实现方式,并不代表此三种实现方式存在先后优劣顺序。并且,本发明实施例描述的第一命令队列,是命令队列中的任意一个命令队列;本发明实施例描述的第一数据读写命令,也是任意一个第一数据读写命令。
第一种可能的实现方式
在上述步骤302之后,如图3所示,本发明实施例提供的方法进一步包括:
步骤304A:Host 300在发送第一命令队列中的第一数据读写命令之前,向网卡401发送获取所述第一命令队列对应的网卡内存4012的存储空间的可用存储空间的请求;
其中,所述第一命令队列对应的网卡内存4012的存储空间中可用存储 空间,是网卡401在接收到Host 300发送的请求时,所述第一命令队列对应的网卡内存4012的存储空间中当前未被占用的存储空间。
Host 300向网卡401发送获取所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的请求,可以通过发送请求消息来实现。所述请求消息携带请求网卡401返回所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的请求。例如,所述请求消息可以是一个请求报文,该请求报文中包含获取网卡内存4012的可用存储空间的字段。本发明实施例不限定请求消息的形式,也不限定请求消息中携带的指示网卡401返回网卡内存4012可用存储空间的信息的形式。或者,Host 300也可以通过读取记录有网卡内存4012可用存储空间信息的寄存器,来获取网卡内存4012可用存储空间的信息。
步骤306A:Host 300接收所述网卡401返回的所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的信息;
具体的,可以是网卡401接收到Host 300发送的获取所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的请求后,将所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的信息,携带在响应消息中返回给Host 300。Host 300从所述响应消息中,获取所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的信息。
其中,所述网卡401返回的所述网卡内存4012的可用存储空间,是所述网卡401接收到Host 300发送的请求时网卡内存4012的可用存储空间。因此,网卡401返回给Host 300的所述网卡内存4012的可用存储空间,也是所述网卡内存4012的实时可用存储空间。
步骤308A:Host 300根据所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的信息,判断所述第一数据读写命令所要传输的数据所占用的存储空间,是否小于或等于所述第一命令队列对应的网卡内存 4012的存储空间中可用存储空间;
例如,所述网卡内存4012的可用存储空间的大小为100MB,所述需要存储的数据所占用的存储空间为50MB,在判断所述需要存储的数据所占用的存储空间是否小于或等于所述网卡内存4012可用存储空间时,可以通过判断50MB小于100MB来实现。或者,所述网卡内存4012的可用存储空间的长度为50个数据块,所述需要存储的数据所占用的存储空间为60个数据块,在判断所述需要存储的数据所占用的存储空间是否小于或等于所述网卡内存4012可用存储空间时,可以通过判断60个数据块大于50个数据块来实现。
步骤310A:当所述第一数据读写命令所要传输的数据所占用的存储空间,小于或等于所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间时,所述Host 300发送所述第一数据读写命令;
步骤312A:当所述第一数据读写命令所要传输的数据所占用的存储空间,大于所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间时,所述Host 300暂停发送所述第一数据读写命令。
附图中步骤312A位于步骤310A下方，只是为了附图清晰化的设置，并不代表步骤312A和步骤310A有先后的执行顺序。本发明实施例中，步骤312A和步骤310A是并列的实现步骤。
这样,Host 300会在第一命令队列对应的网卡内存4012的存储空间能够缓存需要传输的数据时,才发送所述第一命令队列中的第一数据读写命令。能够避免所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间不足,因缓存所述第一命令队列中的命令所带来的处理机制复杂的问题。
可选的,所述Host 300暂停发送所述第一数据读写命令达到预设时间后,可以再重新发送获取所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的请求。当所述第一数据读写命令所要传输的数据所占 用的存储空间,小于所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间时,再发送所述第一数据读写命令。
所述Host 300暂停发送所述第一数据读写命令的预设时间，可以是系统默认的时间或预先配置的时间。在所述预设时间设定的时间范围内，Host 300不执行步骤304A。具体的，可以通过在Host 300中设定定时器的方式，设定所述Host 300暂停发送所述第一数据读写命令的预设时间，Host 300在定时器设定的时间到达后再启动执行步骤304A-步骤312A。可以理解，所述Host 300暂停发送所述第一数据读写命令的预设时间，可以根据具体的业务情况进行灵活设定。
第二种可能的实现方式
本发明实施例提供的第二种可能的实现方式,是对第一种可能的实现方式的进一步优化。在本实现方式中,Host 300不用在每次发送所述第一命令队列中的数据读写命令之前,都发送获取所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的信息的请求。只是在收到网卡401发送的不能缓存所述第一命令队列中的数据读写命令所要传输数据的反压报文时,才启动上述步骤304A-步骤312A的流程。
由于在具体实现时,当所述第一命令队列对应的网卡内存4012的存储空间中当前可用的存储空间足够大,且可以缓存较多的数据时,Host 300可以不用在每次发送命令之前,都发送获取可用存储空间的请求。只在收到网卡401发送的反压报文时,才启动上述步骤304A-步骤312A的流程。这样,不仅能够有效、地解决现有技术中的技术问题,还能够进一步提升Host 300发送命令时的效率。节省Host 300因发送获取网卡内存4012可用存储空间的请求所造成的资源占用。同样的,由于网卡401也不需要每次接收Host 300发送的请求命令都要返回可用存储空间的信息,也相应节省了网卡401的资源占用。
具体的,在步骤304A之前,还包括Host 300发送第一命令队列中的第二数据读写命令,Host 300发送所述第二数据读写命令之前,不需要获取所述第一命令队列对应的网卡内存4012中存储空间的可用存储空间的信息。如果所述第二数据读写命令需要传输的数据大于所述第一命令队列对应的网卡内存4012中存储空间的可用存储空间,网卡401会产生一个反压报文并发送给Host 300。所述反压报文指示所述第一命令队列对应的网卡内存4012中存储空间的可用存储空间,不能缓存Host 300发送的第二数据命令所要传输的数据。
当Host 300接收到网卡401发送的所述反压报文后,在发送第一命令队列中的其它数据读写命令(例如第一数据读写命令)之前,先获取所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间的信息,并在判断所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间,能够缓存所述第一命令队列中的其它数据读写命令(例如第一数据读写命令)所要传输的数据时,才发送该其它数据读写命令(例如第一数据读写命令)。即Host 300接收到网卡401发送的反压报文后,再执行步骤304A-步骤312A的流程。
进一步的，在执行步骤304A-步骤312A的流程达到预设时间后，所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间，恢复到有足够空间缓存数据时，可以不用再执行步骤304A-步骤312A的流程。即在执行步骤304A-步骤312A的流程达到预设时间后，在需要发送所述第一命令队列中的数据读写命令时，Host 300直接将数据读写命令发送给网卡401。
本发明实施例中,执行步骤304A-步骤312A的流程的预设时间,可以根据需要具体的设定,可以是系统默认的时间,也可以是基于管理员预先下发的设定时间。并且,执行步骤304A-步骤312A的流程的预设时间,可以根据实际的业务情况实时的变更,例如在所述第一命令队列对应的网卡 内存4012的存储空间占用率较高的情况下,执行步骤304A-步骤312A流程的预设时间长;在所述第一命令队列对应的网卡内存4012的存储空间占用率较低的情况下,执行步骤304A-步骤312A流程的预设时间短等。
上述实施例中,网卡401向Host 300发送的反压消息,可以是直接产生的消息或报文,也可以是通过响应消息携带的消息或报文。例如可以是所述第一命令队列对应的网卡内存4012的存储空间不足时直接产生的消息或报文;也可以是网卡401向Host 300返回的命令响应消息,在响应消息中携带所述第一命令队列对应的网卡内存4012的存储空间不足的信息。其它类型的消息或报文,只要能够携带所述第一命令队列对应的网卡内存4012的存储空间的可用存储空间不足,且不能缓存Host 300发送的所述第一命令队列中的数据读写命令所要传输的数据消息,都可以作为网卡401向Host 300发送的反压报文。可选的,所述反压消息中携带的、所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间不足且不能存储Host 300需要存储的数据的信息,可以是错误码或预先设定的标识等。
进一步的，Host 300接收到网卡401发送的反压消息后，在执行步骤304A-步骤312A的过程中，还包括重传所述第二数据读写命令。即对于因所述第一命令队列对应的网卡内存4012的存储空间不足，不能及时被执行的所述第二数据读写命令，Host 300在判断所述第一命令队列对应的网卡内存4012的可用存储空间大于所述第二数据读写命令所要传输的数据所占用的存储空间时，重新发送所述第二数据读写命令。
第三种可能的实现方式
在本实现方式中,Host 300获取并记录所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间。Host 300每次在发送所述第一命令队列中的第一数据读写命令时,先判断记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间是否大于或等于所述第一数据 读写命令所要传输的数据占用的存储空间。在记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,大于或等于所述第一数据读写命令所要传输的数据占用的存储空间时,Host 300发送所述第一数据读写命令。在记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,小于所述第一数据读写命令所要传输的数据占用的存储空间时,Host 300暂停发送所述第一数据读写命令。
由于Host 300发送所述第一数据读写命令，是在所述第一命令队列对应的网卡内存4012的存储空间能够缓存所述第一数据读写命令所要传输的数据时发送的。因此，能够避免因所述第一命令队列所对应的网卡内存4012的存储空间不足，而缓存所述第一命令队列中的命令所带来的复杂处理机制的问题。
具体的,如图3所示,在上述步骤302之后,本发明实施例提供的方法进一步包括:
步骤304B:Host 300获取并记录所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间;
其中,Host 300可以将获取的网卡内存4012的可用存储空间记录在本地,即将获取的网卡内存4012的可用存储空间记录在Host 300中。
具体的，Host 300可以在Target 400上电初始化时，获取所述第一命令队列对应的网卡内存4012的存储空间的可用存储空间，作为所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间。由于在Target 400上电初始化时，网卡内存4012还未缓存有数据，因此获取到的所述第一命令队列对应的网卡内存4012的存储空间的可用存储空间，就是所述第一命令队列对应的网卡内存4012的存储空间的总存储空间。将该总存储空间作为记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间，能够最大化地利用所述第一命令队列对应的网卡内存4012的存储空间。
可选的,Host 300也可以在Target 400上电初始化后的任一时间,获取所述第一命令队列对应的网卡内存4012的存储空间的可用存储空间,作为所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间。此时获取的所述第一命令队列对应的网卡内存4012的存储空间的可用存储空间,会因缓存有数据而小于所述第一命令队列对应的网卡内存4012的存储空间的总存储空间。
步骤306B:Host 300在发送所述第一命令队列中的第一数据读写命令之前,获取本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,判断所述第一数据读写命令所要传输的数据占用的存储空间,是否小于或等于本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间;
其中,本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,就是Host 300记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间。
具体的,所述Host 300记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,可以是可存储数据的空间的大小。相应的,所述第一数据读写命令所要传输的数据占用的存储空间,可以是所述第一数据读写命令所要传输的数据所占用的存储空间的大小。在判断所述第一数据读写命令所要传输的数据占用的存储空间,是否小于或等于记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间时,可以通过判断所述第一数据读写命令所要传输的数据所占用的存储空间的大小,是否小于或等于记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间的大小来实现。
当然,也可以用其它形式表述记录的所述第一命令队列对应的网卡内 存4012的存储空间的实时可用存储空间,例如所述第一命令队列对应的网卡内存4012的存储空间可被写入的数据块的个数。相应的,所述第一数据读写命令所要传输的数据占用的存储空间,是所述第一数据读写命令所要传输的数据块的个数。在判断所述第一数据读写命令所要传输的数据占用的存储空间,是否小于或等于记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间时,可以通过判断所述第一数据读写命令所要传输的数据块个数,是否小于或等于所述第一命令队列对应的网卡内存4012的存储空间可被写入的数据块的个数来实现。
步骤308B:在所述第一数据读写命令所要传输的数据占用的存储空间,小于或等于本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间时,Host 300发送所述第一数据读写命令给Target 400,并将记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,减去所述第一数据读写命令所要传输数据占用的存储空间,得到更新后的本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间;
当Host 300发送所述第一数据读写命令后,所述第一数据读写命令所要传输的数据,会被缓存在所述第一命令队列对应的网卡内存4012的存储空间中。因此,需要将本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,减去所述第一数据读写命令所要传输的数据所占用的存储空间,才能正确记录所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间。
步骤310B:在所述第一数据读写命令所要传输的数据占用的存储空间,大于本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间时,Host 300暂停发送所述第一数据读写命令。
这样,Host 300会在第一命令队列对应的网卡内存4012的存储空间能够 缓存需要传输的数据时,才发送所述第一命令队列中的第一数据读写命令。这样能够避免所述第一命令队列对应的网卡内存4012的存储空间中可用存储空间不足,因缓存所述第一命令队列中的命令所带来的处理机制复杂的问题。
附图中步骤310B位于步骤308B下方,只是为了附图清晰化的设置,并不代表步骤310B和步骤308B有先后的执行顺序,本发明实施例中,步骤310B和步骤308B是并列的实现步骤。
在上述步骤308B之后,本发明实施例提供的实现方式还包括:
步骤312B(图中未示出):当缓存在网卡内存4012中的所述第一数据读写命令所要传输的数据被迁移到目的地址后,网卡401将完成迁移的响应消息发送给Host 300;
其中,所述第一数据读写命令所要传输的数据被迁移到目的地址,因所述第一数据读写命令是写命令还是读命令而有不同。当所述第一数据读写命令是写命令时,所述第一数据读写命令所要传输的数据被迁移到Target400的硬盘中;当所述第一数据读写命令是读命令时,所述第一数据读写命令所要传输的数据被迁移到Host 300中。
步骤314B(图中未示出):Host 300根据接收到的响应消息,将本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,加上所述第一数据读写命令所要传输的数据所占用的存储空间。
由于Host 300在收到对所述第一数据读写命令所要传输的数据完成迁移的响应消息时,所述第一数据读写命令所要传输的数据已经从网卡内存4012迁移出。所述第一命令队列对应的网卡内存4012的存储空间将会增加相应的可用存储空间,即释放所述第一数据读写命令所要传输的数据所占用的存储空间。因此,将Host 300本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,加上所述第一数据读写命令 所要传输的数据占用的存储空间,能够正确记录所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间。
上述步骤310B中,Host 300暂停发送所述第一数据读写命令,还可以在等待预设时间后再重新执行步骤306B。Host 300等待的预设时间,可以是默认的预设时间,也可以基于具体业务的需要而设置的预设时间。在达到预设时间后,Host 300再次执行步骤306B,即再次判断所述第一数据读写命令所要传输的数据占用的存储空间,是否小于或等于本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间。如果所述第一数据读写命令所要传输的数据占用的存储空间,小于或等于本地记录的所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间,则执行步骤308B。
Host 300在预设时间后再执行步骤306B,能够避免在所述第一命令队列对应的网卡内存4012的存储空间小于所述第一数据读写命令所要传输的数据时,因反复地执行判断的步骤所带来的Host 300资源的占用和消耗。可以理解,Host 300等待的预设时间可以基于实际的情况灵活调整。
上述步骤304B中,Host 300记录所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间的位置可以有多种实现方式。例如,可以是记录在Host 300中专门的存储空间中,例如专门的芯片用于存储所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间。也可以是存储在Host 300中已有的存储部件中,例如CPU 301的缓存中,内存302中或网卡303的缓存中,还可以是FPGA芯片中的一个存储空间中。
本发明实施例中,Host 300记录所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间的方式,也可以有多种实现方式,例如,以一个表格的形式记录,或一个变量的形式记录等等。本发明实施例不限定记录所述第一命令队列对应的网卡内存4012的存储空间的实时可用存储空间的具体形式。
在上述三种可能的实现方式中,无论是步骤310A还是步骤308B,在Host 300发送所述第一数据读写命令之后,所述第一数据读写命令所要传输的数据会被缓存在所述第一命令队列对应的网卡内存4012的存储空间中。并且,缓存在网卡内存4012中的所述第一数据读写命令所要传输的数据,会被迁移到目的地址对应的存储空间中。所述网卡401缓存所述第一数据读写命令所要传输的数据的方式,以及缓存在网卡内存4012中的所述第一数据读写命令所要传输的数据被迁移的方式,会因所述第一数据读写命令为写命令或读命令的不同而有不同的实现方式。
下面分别就所述第一数据读写命令是写命令和读命令两种情况,详细描述网卡401缓存所述第一数据读写命令所要传输的数据的方式,以及对缓存的数据迁移的方式。
一、所述第一数据读写命令为写命令
当所述第一数据读写命令是写命令时,所述Host 300发送的所述第一数据读写命令为Write Command,所述第一数据读写命令所要传输的数据是需要存储的数据。所述Write Command中携带SGL,所述SGL中包括一个字段,例如可以是一个entry,该字段包含所述需要存储的数据在Host 300中的源地址、所述需要存储的数据的长度、以及所述需要存储的数据在Target 400中的目的地址等信息。需要说明的是,所述SGL也可以包括多个字段,例如多个entry,每个entry都包含需要存储的数据在Host 300中的源地址、需要存储的数据的长度、以及需要存储的数据在Target 400中的目的地址等信息。当所述需要存储的数据包括多个地址段,即所述需要存储的数据在Host 300中是不连续,存在于多个地址段中时,就需要用多个entry来记录多个地址段中的数据。本发明实施例以SGL中包括一个entry为例进行说明。
网卡401根据Write Command中SGL携带的所述需要存储的数据在Host 300中的源地址,将所述需要存储的数据缓存在所述第一命令队列对应的网卡内存4012的存储空间中。可选的,网卡401可以RDMA的方式,通过网卡303接收所述需要存储的数据。
当所述需要存储的数据缓存在所述第一命令队列对应的网卡内存4012的存储空间后,网卡401修改所述Write Command,将所述Write Command携带的所述需要存储的数据在Host 300中的源地址,修改为所述第一命令队列对应的网卡内存4012中存储所述需要存储的数据的地址,并将修改后的Write Command发送给目的硬盘的控制器。即网卡401发送给目的硬盘的控制器的Write Command携带的SGL中包括所述第一命令队列对应的网卡内存4012中存储所述需要存储的数据的地址,所述需要存储的数据的长度、以及所述需要存储的数据在Target 400中的目的地址等信息。
其中,目的硬盘是网卡401根据所述Write Command中所述需要存储的数据在Target 400中的目的地址确定的。网卡401能够根据所述需要存储的数据在Target 400中的目的地址,确定所述需要存储的数据在Target 400中的哪个硬盘中,并将所述需要存储的数据在Target 400中的目的地址所在的硬盘确定为目的硬盘。在Target 400中,每个硬盘都会对应一个地址段,网卡401根据所述Write Command的SGL中所述需要存储的数据在Target 400中的目的地址,确定该目的地址所在的地址段,与该地址段对应的硬盘即为目的硬盘。
网卡401在确定目的硬盘后,将修改后的Write Command发送给目的硬盘的控制器。目的硬盘的控制器根据接收到的Write Command中携带的所述需要存储的数据在所述网卡内存4012中的地址,从网卡内存4012中读取所述需要存储的数据,例如以RDMA或直接内存访问(英文:DMA,Direct Memory Access)的方式读取所述需要存储的数据。并将读取到的所述需要存储的数据写入目的硬盘对应的存储空间中。
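网卡401对Write Command中SGL的修改,以及按地址段确定目的硬盘的过程,可以用如下Python示意代码概括(SGL的entry以字典模拟,字段名均为假设):

```python
def rewrite_write_command(entry, nic_buffer_addr):
    """将entry中需要存储的数据在Host中的源地址,替换为网卡内存中缓存该数据的地址;
    数据长度与在Target中的目的地址保持不变。"""
    modified = dict(entry)  # 不修改原entry,返回修改后的副本
    modified["source_addr"] = nic_buffer_addr
    return modified


def find_dest_disk(dest_addr, disk_ranges):
    """根据目的地址所在的地址段确定目的硬盘;disk_ranges为硬盘到[start, end)地址段的映射。"""
    for disk, (start, end) in disk_ranges.items():
        if start <= dest_addr < end:
            return disk
    return None
```

修改后的entry被发送给find_dest_disk确定出的目的硬盘的控制器,由其按网卡内存中的地址读取数据。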
本发明实施例中,网卡401与Target 400中的硬盘之间,可以基于NVMe over PCIe架构实现连接。因此,Target 400中目的硬盘的控制器与网卡401之间,可以通过NVMe over PCIe架构中连接和通信的方式,实现数据的传输或迁移。
二、所述第一数据读写命令为读命令
当所述第一数据读写命令是读命令时,所述Host 300发送的所述第一数据读写命令为Read Command,所述第一数据读写命令所要传输的数据是需要读取的数据。所述Read Command中携带SGL,所述SGL中包含所述需要读取的数据在Target 400中的源地址、所述需要读取的数据的长度、以及所述需要读取的数据要写入Host 300中的目的地址等信息。
网卡401收到所述Read Command后,修改所述Read Command,将所述Read Command中携带的所述需要读取的数据在Host 300中的目的地址,修改为所述第一命令队列对应的网卡内存4012的存储空间中缓存所述需要读取的数据的地址,并将修改后的Read Command发送给目的硬盘的控制器。即网卡401发送给目的硬盘控制器的Read Command携带的SGL中包括所述需要读取的数据在Target 400中的源地址、所述需要读取的数据的长度、以及所述第一命令队列对应的网卡内存4012的存储空间中缓存所述需要读取的数据的地址等信息。目的硬盘的控制器根据接收到的所述修改后的Read Command,将所述需要读取的数据迁移到所述第一命令队列对应的网卡内存4012的存储空间中。可选的,目的硬盘的控制器通过RDMA的方式,将所述需要读取的数据迁移到所述第一命令队列对应的网卡内存4012的存储空间中。
当所述需要读取的数据缓存在所述第一命令队列对应的网卡内存4012的存储空间后,网卡401根据所述Read Command中所述需要读取的数据要写入Host 300中的目的地址,将缓存的所述需要读取的数据发送给Host 300。 可选的,网卡401通过RDMA的方式,将缓存的所述需要读取的数据发送给Host 300。可选的,所述网卡401与Target 400中的硬盘之间基于NVMe over PCIe架构实现连接。网卡401与Target 400中目的硬盘的控制器之间,通过NVMe over PCIe架构中连接和通信的方式,将所述需要读取的数据缓存在所述第一命令队列对应的网卡内存4012的存储空间中。
其中,目的硬盘是网卡401根据所述Read Command中所述需要读取的数据在Target 400中的源地址确定的。网卡401能够根据所述需要读取的数据在Target 400中的源地址,确定所述需要读取的数据在Target 400中的哪个硬盘中,并将所述需要读取的数据在Target 400中的源地址所在的硬盘确定为目的硬盘。
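与写命令对称,网卡401对Read Command的修改可以用如下Python示意代码概括(同样以字典模拟SGL的entry,字段名均为假设):

```python
def rewrite_read_command(entry, nic_buffer_addr):
    """将entry中需要读取的数据要写入Host中的目的地址,替换为网卡内存中缓存该数据的地址;
    数据在Target中的源地址与数据长度保持不变。"""
    modified = dict(entry)
    modified["host_dest_addr"] = nic_buffer_addr
    return modified
```

目的硬盘的控制器按修改后的entry将数据迁入网卡内存,网卡401再按原目的地址将缓存的数据发送给Host。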
具体的,在上述实现网卡内存4012中数据缓存和迁移过程中,可以由网卡401中的一个控制模块来实现对所述Write Command或Read Command的修改。该控制模块可以由一个物理芯片(例如ARM、X86或Power PC等处理器)实现,也可以由运行在物理芯片上的软件模块来实现,还可以是在物理芯片上通过虚拟机技术创建的一个或多个虚拟控制器。该控制模块可以是NVMe over Fabric中的Physical controller或NVM Controller。
本发明实施例中,可以是Host 300中的CPU301执行步骤300以及步骤304A-步骤312A或步骤304B-步骤310B的过程,也可以是Host 300中的网卡303执行步骤300以及步骤304A-步骤312A或步骤304B-步骤310B的过程。还可以是Host 300中的某一芯片或逻辑部件,来执行步骤300以及步骤304A-步骤312A或步骤304B-步骤310B的过程,例如可以是FPGA芯片等执行步骤300以及步骤304A-步骤312A或步骤304B-步骤310B的过程。
在实际实现中,上述步骤300以及步骤304A-步骤312A或步骤304B-步骤310B也可以由CPU301、网卡303、Host 300中的某一芯片或逻辑部件中的至少其中一个来实现。例如,网卡303执行上述步骤300以及步骤304A-步骤306A,CPU301执行上述步骤308A-步骤312A;或网卡303执行上述步骤300以及步骤304B-步骤306B,CPU301执行上述步骤308B-步骤310B。也可以是CPU301执行上述步骤300以及步骤304A-步骤306A,网卡303执行上述步骤308A-步骤312A;或CPU301执行上述步骤300以及步骤304B-步骤306B,网卡303执行上述步骤308B-步骤310B。还可以是Host 300中的芯片或逻辑部件执行上述步骤300以及步骤304A-步骤306A,CPU301执行上述步骤308A-步骤312A;或者,网卡303执行上述步骤300以及步骤304B-步骤306B,Host 300中的芯片或逻辑部件执行上述步骤308B-步骤310B等。本发明实施例不限定具体的执行步骤300以及步骤304A-步骤312A或步骤304B-步骤310B中的执行主体的实现方式。
当上述Host 300是通过虚拟机实现时,上述CPU301、网卡303分别对应虚拟机中的CPU和网卡,虚拟机中的CPU和网卡通过承载其虚拟功能的物理CPU和物理网卡来实现。其实现方式与上述实现方式类似,不再赘述。
图4(A)为本发明实施例一种NVMe over Fabric架构中控制设备与存储设备之间数据读写命令的控制方法的流程示意图。该控制方法应用于NVMe over Fabric架构中控制设备与存储设备之间的数据传输,所述存储设备包括数据处理单元、缓存单元和存储单元,所述控制设备需要读写的数据存储在所述存储单元中,所述数据处理单元用于接收所述控制设备发送的数据读写命令,所述缓存单元用于缓存所述数据读写命令所需要传输的数据;如图4(A)所示,所述方法包括:
步骤400A:所述数据处理单元接收所述控制设备发送的控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息;
步骤402A:所述数据处理单元根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
步骤404A:所述数据处理单元接收所述控制设备发送的第一数据读写命令,根据所述两个以上的存储空间与命令队列之间的对应关系,将所述第一数据读写命令所要传输的数据,缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
通过上述方法,缓存单元中被划分的每个存储空间对应不同的命令队列,第一命令队列中的第一数据读写命令所要传输的数据被缓存在与所述第一命令队列所对应的存储空间中。这样,不同命令队列所对应的缓存单元的存储空间,分别用于缓存对应的命令队列中数据读写命令所要传输的数据。避免了因某一个命令队列中的数据读写命令所要传输的数据占用大量的缓存单元的存储空间,导致的其它命令队列中的数据读写命令因缓存单元的存储空间不足而无法被执行的问题。
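步骤400A至步骤404A中缓存单元存储空间的划分、与命令队列对应关系的建立以及数据的缓存,可以用如下Python示意代码概括(类名与数据结构均为假设,存储空间以字节容量加数据列表模拟):

```python
class CacheUnit:
    """数据处理单元按控制命令将缓存单元划分为两个以上存储空间的示意模型。"""

    def __init__(self):
        self.capacity = {}  # 命令队列ID -> 对应存储空间的大小
        self.buffers = {}   # 命令队列ID -> 已缓存的数据

    def divide(self, partition_info):
        """步骤400A/402A:按控制命令中的划分信息建立存储空间与命令队列的对应关系。"""
        for queue_id, size in partition_info.items():
            self.capacity[queue_id] = size
            self.buffers[queue_id] = []

    def cache_data(self, queue_id, data):
        """步骤404A:将某命令队列中数据读写命令所要传输的数据,缓存到对应的存储空间;
        某一队列的空间不足时,不影响其它队列对应的存储空间。"""
        used = sum(len(d) for d in self.buffers[queue_id])
        if used + len(data) > self.capacity[queue_id]:
            return False
        self.buffers[queue_id].append(data)
        return True
```

示意中一个队列写满后仅该队列被拒绝,其它队列对应的存储空间仍可正常缓存数据。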
图4(B)为本发明实施例另一种NVMe over Fabric架构中控制设备与存储设备之间数据读写命令的控制方法的流程示意图。所述存储设备包括数据处理单元、缓存单元和存储单元,所述控制设备需要读写的数据存储在所述存储单元中,所述数据处理单元用于接收所述控制设备发送的数据读写命令,所述缓存单元用于缓存所述数据读写命令所需要传输的数据;如图4(B)所示,所述方法包括:
步骤400B:所述控制设备向所述数据处理单元发送控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息,使得所述数据处理单元根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
步骤402B:所述控制设备向所述存储设备发送第一数据读写命令,所述第一数据读写命令所要传输的数据被缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
通过上述方法,缓存单元中被划分的每个存储空间对应不同的命令队列,第一命令队列中的第一数据读写命令所要传输的数据被缓存在与所述第一命令队列所对应的存储空间中。这样,不同命令队列所对应的缓存单元的存储空间,分别用于缓存对应的命令队列中数据读写命令所要传输的数据。避免了因某一个命令队列中的数据读写命令所要传输的数据占用大量的缓存单元的存储空间,导致的其它命令队列中的数据读写命令因缓存单元的存储空间不足而无法被执行的问题。
具体的,图4(A)和图4(B)所示的方法的详细实现过程,还可以参考上述图2和图3所示的实现方式来实现,不再赘述。例如,数据处理单元可以参考网卡401的方式来实现,缓存单元可以参考网卡内存4012的实现方式来实现,存储单元可以参考图2中的硬盘来实现,控制设备可以参考Host 300的实现方式来实现,具体不再赘述。
图5为本发明实施例提供的一种存储设备500的结构示意图。所述存储设备500是NVMe over Fabric架构中的存储设备,所述存储设备500与所述NVMe over Fabric架构中的控制设备之间进行数据传输,所述存储设备500包括数据处理单元501和缓存单元502,所述数据处理单元501用于接收所述控制设备发送的数据读写命令,所述缓存单元502用于缓存所述数据读写命令所需要传输的数据;其中,所述数据处理单元501包括处理器5011,所述处理器5011用于执行下述步骤:
接收所述控制设备发送的控制命令,所述控制命令包括将所述缓存单元502的存储空间划分为两个以上的存储空间的信息;
根据所述控制命令,将所述缓存单元502的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
接收所述控制设备发送的第一数据读写命令,根据所述两个以上的存储空间与命令队列之间的对应关系,将所述第一数据读写命令所要传输的数据,缓存到第一命令队列对应的缓存单元502的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
上述存储设备500,通过将缓存单元中被划分的每个存储空间对应不同的命令队列,第一命令队列中的第一数据读写命令所要传输的数据被缓存在与所述第一命令队列所对应的存储空间中。这样,不同命令队列所对应的缓存单元的存储空间,分别用于缓存对应的命令队列中数据读写命令所要传输的数据。避免了因某一个命令队列中的数据读写命令所要传输的数据占用大量的缓存单元的存储空间,导致的其它命令队列中的数据读写命令因缓存单元的存储空间不足而无法被执行的问题。
本发明实施例中图5所示的存储设备500的详细实现方式,还可以参考上述图2和图3所示的实现方式来实现,不再赘述。例如,数据处理单元501可以参考网卡401的方式来实现,缓存单元502可以参考网卡内存4012的实现方式来实现,存储单元503可以参考图2中的硬盘来实现等。
图6为本发明实施例提供的一种控制设备600的结构示意图。所述控制设备600是NVMe over Fabric架构中的控制设备,所述控制设备600包括处理器601、网卡602和总线603,所述处理器601和网卡602通过总线603连接,所述控制设备600与NVMe over Fabric架构中的存储设备之间进行数据传输,所述存储设备包括数据处理单元、缓存单元和存储单元,所述控制设备600需要读写的数据缓存在所述存储设备的缓存单元中,并存储在所述存储设备的存储单元中;其中,所述处理器601用于执行下述步骤:
向所述数据处理单元发送控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息,使得所述数据处理单元根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
向所述存储设备发送第一数据读写命令,所述第一数据读写命令所要传输的数据被缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
上述控制设备600,通过发送命令使得缓存单元被划分的每个存储空间对应不同的命令队列,第一命令队列中的第一数据读写命令所要传输的数据被缓存在与所述第一命令队列所对应的存储空间中。这样,不同命令队列所对应的缓存单元的存储空间,分别用于缓存对应的命令队列中数据读写命令所要传输的数据。避免了因某一个命令队列中的数据读写命令所要传输的数据占用大量的缓存单元的存储空间,导致的其它命令队列中的数据读写命令因缓存单元的存储空间不足而无法被执行的问题。
本发明实施例中图6所示的控制设备600的详细实现方式,还可以参考上述图2和图3所示的实现方式来实现,不再赘述。例如,控制设备600可以参考Host 300的实现方式来实现等。
图7为本发明实施例提供的一种实现数据读写命令控制的系统的结构示意图。如图7所示,所述系统包括控制设备700和存储设备800,控制设备700和存储设备800之间基于NVMe over Fabric架构实现数据传输,存储设备800包括数据处理单元801、缓存单元802和存储单元803,所述控制设备700需要读写的数据存储在所述存储单元803中,所述数据处理单元801用于接收所述控制设备700发送的数据读写命令,所述缓存单元802用于缓存所述数据读写命令所需要传输的数据;其中:
所述控制设备700,用于向所述数据处理单元801发送控制命令,所述控制命令包括将所述缓存单元802的存储空间划分为两个以上的存储空间的信息;
所述数据处理单元801,用于根据所述控制命令,将所述缓存单元802的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备700发送的数据读写控制命令所构成的队列;
所述数据处理单元801,还用于接收所述控制设备700发送的第一数据读写命令,根据所述两个以上的存储空间与命令队列之间的对应关系,将所述第一数据读写命令所要传输的数据,缓存到第一命令队列对应的缓存单元802的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
通过上述系统,缓存单元802中被划分的每个存储空间对应不同的命令队列,第一命令队列中的第一数据读写命令所要传输的数据被缓存在与所述第一命令队列所对应的存储空间中。这样,不同命令队列所对应的缓存单元802的存储空间,分别用于缓存对应的命令队列中数据读写命令所要传输的数据。避免了因某一个命令队列中的数据读写命令所要传输的数据占用大量的缓存单元802的存储空间,导致的其它命令队列中的数据读写命令因缓存单元802的存储空间不足而无法被执行的问题。
本发明实施例中图7所示的系统的详细实现方式,还可以参考上述图2和图3所示的实现方式来实现,不再赘述。例如,数据处理单元801可以参考网卡401的方式来实现,缓存单元802可以参考网卡内存4012的实现方式来实现,存储单元803可以参考图2中硬盘的实现方式来实现,控制设备700可以参考Host 300的实现方式来实现等。
需要说明的是,图7中,以缓存单元802在数据处理单元801中为例进行说明。在具体实现时,缓存单元802也可以位于数据处理单元801的外部,即存储设备800中的缓存单元802可以是独立于数据处理单元801的存储介质,例如DDR等存储介质。可选的,缓存单元802也可以是存储设备800中多个数据处理单元的存储资源共同构成的一个内存资源池。本发明实施例不限定缓存单元802的具体呈现形式。
结合本发明公开内容所描述的方法或步骤可以硬件的方式来实现,也可以是由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器(英文:Random Access Memory,RAM)、闪存、只读存储器(英文:Read Only Memory,ROM)、可擦除可编程只读存储器(英文:Erasable Programmable ROM,EPROM)、电可擦可编程只读存储器(英文:Electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、只读光盘(英文:CD-ROM)或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于核心网接口设备中。当然,处理器和存储介质也可以作为分立组件存在于核心网接口设备中。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及方法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口、装置或单元的间接耦合或通信连接,也可以是电的,机械的或其它的形式连接。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本发明实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以是两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分,或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(英文:ROM,Read-Only Memory)、随机存取存储器(英文:RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以权利要求的保护范围为准。

Claims (30)

  1. 一种基于Fabric的非易失性高速传输总线NVMe,NVMe over Fabric,架构中控制设备与存储设备之间数据读写命令的控制方法,所述存储设备包括数据处理单元、缓存单元和存储单元,所述控制设备需要读写的数据存储在所述存储单元中,所述数据处理单元用于接收所述控制设备发送的数据读写命令,所述缓存单元用于缓存所述数据读写命令所需要传输的数据;其特征在于:
    所述数据处理单元接收所述控制设备发送的控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息;
    所述数据处理单元根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
    所述数据处理单元接收所述控制设备发送的第一数据读写命令,根据所述两个以上的存储空间与命令队列之间的对应关系,将所述第一数据读写命令所要传输的数据,缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
  2. 根据权利要求1所述的方法,其特征在于,所述数据处理单元建立所述两个以上的存储空间与命令队列之间的对应关系包括:
    所述数据处理单元根据所述控制命令中携带的对应关系信息,建立所述两个以上的存储空间与命令队列之间的对应关系,所述对应关系信息是所述缓存单元中两个以上的存储空间与命令队列的对应关系;或,
    所述数据处理单元根据被划分的两个以上的存储空间,建立所述两个以上的存储空间与命令队列之间的对应关系。
  3. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    获取预设时间内,第一命令队列所对应的所述缓存单元的存储空间的占用比例,和第二命令队列所对应的所述缓存单元的存储空间的占用比例;
    当所述第一命令队列所对应的所述缓存单元的存储空间的占用比例大于预设的第一阈值,且所述第二命令队列所对应的所述缓存单元的存储空间的占用比例小于预设的第二阈值时,减少所述第二命令队列所对应的所述缓存单元的存储空间,并将减少的所述第二命令队列所对应的所述缓存单元的存储空间,分配给所述第一命令队列所对应的所述缓存单元的存储空间;其中,所述第一阈值大于所述第二阈值。
  4. 根据权利要求1-3所述的任一方法,其特征在于,所述方法还包括:
    所述控制设备获取所述第一命令队列对应的所述缓存单元的可用存储空间;
    所述控制设备判断第一数据读写命令所要传输的第一数据占用的存储空间,是否小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间;
    在所述第一数据占用的存储空间小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间时,发送所述第一数据读写命令给所述存储设备;
    在所述第一数据占用的存储空间大于所述第一命令队列对应的所述缓存单元的可用存储空间时,暂停发送所述第一数据读写命令。
  5. 根据权利要求4所述的方法,其特征在于,所述控制设备获取所述第一命令队列对应的所述缓存单元的可用存储空间包括:
    所述控制设备在向所述存储设备发送第一数据读写命令之前,向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求,以获取所述第一命令队列对应的所述缓存单元的可用存储空间。
  6. 根据权利要求5所述的方法,其特征在于,在所述控制设备向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求之前,所述方法还包括:
    所述控制设备向所述存储设备发送第二数据读写命令,所述第二数据读写命令所要传输的数据大于所述第一命令队列对应的所述缓存单元的可用存储空间;
    所述控制设备接收所述数据处理单元发送的反压消息,所述反压消息用于指示所述第一命令队列对应的所述缓存单元的可用存储空间不足。
  7. 根据权利要求4所述的方法,其特征在于,所述缓存单元的可用存储空间是本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间。
  8. 根据权利要求7所述的方法,其特征在于,所述方法还包括:
    所述控制设备在发送所述第一数据读写命令后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间减去所述第一数据占用的存储空间;
    所述控制设备在接收到所述数据处理单元发送的完成所述第一数据读写命令的响应消息后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间加上所述第一数据占用的存储空间。
  9. 根据权利要求1-8所述的任一方法,其特征在于,所述数据处理单元与所述存储单元之间通过基于快捷外围部件互连标准PCIe的NVMe,NVMe over PCIe,架构实现连接。
  10. 一种基于Fabric的非易失性高速传输总线NVMe,NVMe over Fabric,架构中控制设备与存储设备之间数据读写命令的控制方法,所述存储设备包括数据处理单元、缓存单元和存储单元,所述控制设备需要读写的数据存储在所述存储单元中,所述数据处理单元用于接收所述控制设备发送的数据读写命令,所述缓存单元用于缓存所述数据读写命令所需要传输的数据;其特征在于:
    所述控制设备向所述数据处理单元发送控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息,使得所述数据处理单元根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
    所述控制设备向所述存储设备发送第一数据读写命令,所述第一数据读写命令所要传输的数据被缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
  11. 根据权利要求10所述的方法,其特征在于,所述方法还包括:
    所述控制设备获取所述第一命令队列对应的所述缓存单元的可用存储空间;
    所述控制设备判断第一数据读写命令所要传输的第一数据占用的存储空间,是否小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间;
    在所述第一数据占用的存储空间小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间时,发送所述第一数据读写命令给所述存储设备;
    在所述第一数据占用的存储空间大于所述第一命令队列对应的所述缓存单元的可用存储空间时,暂停发送所述第一数据读写命令。
  12. 根据权利要求10所述的方法,其特征在于,所述控制设备获取所述第一命令队列对应的所述缓存单元的可用存储空间包括:
    所述控制设备在向所述存储设备发送第一数据读写命令之前,向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求,以获取所述第一命令队列对应的所述缓存单元的可用存储空间。
  13. 根据权利要求12所述的方法,其特征在于,在所述控制设备向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求之前,所述方法还包括:
    所述控制设备向所述存储设备发送第二数据读写命令,所述第二数据读写命令所要传输的数据大于所述第一命令队列对应的所述缓存单元的可用存储空间;
    所述控制设备接收所述数据处理单元发送的反压消息,所述反压消息用于指示所述第一命令队列对应的所述缓存单元的可用存储空间不足。
  14. 根据权利要求11所述的方法,其特征在于,所述缓存单元的可用存储空间是本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间。
  15. 根据权利要求14所述的方法,其特征在于,所述方法还包括:
    所述控制设备在发送所述第一数据读写命令后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间减去所述第一数据占用的存储空间;
    所述控制设备在接收到所述数据处理单元发送的完成所述第一数据读写命令的响应消息后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间加上所述第一数据占用的存储空间。
  16. 一种存储设备,所述存储设备是基于Fabric的非易失性高速传输总线NVMe,NVMe over Fabric,架构中的存储设备,所述存储设备与所述NVMe over Fabric架构中的控制设备之间进行数据传输,所述存储设备包括数据处理单元和缓存单元,所述数据处理单元用于接收所述控制设备发送的数据读写命令,所述缓存单元用于缓存所述数据读写命令所需要传输的数据;其特征在于,所述数据处理单元包括处理器,所述处理器用于执行下述步骤:
    接收所述控制设备发送的控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息;
    根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
    接收所述控制设备发送的第一数据读写命令,根据所述两个以上的存储空间与命令队列之间的对应关系,将所述第一数据读写命令所要传输的数据,缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
  17. 根据权利要求16所述的存储设备,其特征在于,所述处理器建立所述两个以上的存储空间与命令队列之间的对应关系包括:
    根据所述控制命令中携带的对应关系信息,建立所述两个以上的存储空间与命令队列之间的对应关系,所述对应关系信息是所述缓存单元中两个以上的存储空间与命令队列的对应关系;或,
    根据被划分的两个以上的存储空间,建立所述两个以上的存储空间与命令队列之间的对应关系。
  18. 根据权利要求16或17所述的存储设备,其特征在于,所述处理器还用于:
    获取预设时间内,第一命令队列所对应的所述缓存单元的存储空间的占用比例,和第二命令队列所对应的所述缓存单元的存储空间的占用比例;
    当所述第一命令队列所对应的所述缓存单元的存储空间的占用比例大于预设的第一阈值,且所述第二命令队列所对应的所述缓存单元的存储空间的占用比例小于预设的第二阈值时,减少所述第二命令队列所对应的所述缓存单元的存储空间,并将减少的所述第二命令队列所对应的所述缓存单元的存储空间,分配给所述第一命令队列所对应的所述缓存单元的存储空间;其中,所述第一阈值大于所述第二阈值。
  19. 一种控制设备,所述控制设备是基于Fabric的非易失性高速传输总线NVMe,NVMe over Fabric,架构中的控制设备,所述控制设备包括处理器、网卡和总线,所述处理器和网卡通过总线连接,所述控制设备与NVMe over Fabric架构中的存储设备之间进行数据传输,所述存储设备包括数据处理单元、缓存单元和存储单元,所述控制设备需要读写的数据缓存在所述存储设备的缓存单元中,并存储在所述存储设备的存储单元;其特征在于,所述处理器用于执行下述步骤:
    向所述数据处理单元发送控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息,使得所述数据处理单元根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
    向所述存储设备发送第一数据读写命令,所述第一数据读写命令所要传输的数据被缓存到第一命令队列对应的缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
  20. 根据权利要求19所述的控制设备,其特征在于,所述处理器还用于:
    获取预设时间内,第一命令队列所对应的所述缓存单元的存储空间的占用比例,和第二命令队列所对应的所述缓存单元的存储空间的占用比例;
    当所述第一命令队列所对应的所述缓存单元的存储空间的占用比例大于预设的第一阈值,且所述第二命令队列所对应的所述缓存单元的存储空间的占用比例小于预设的第二阈值时,发送调整命令给所述数据处理单元,所述调整命令用于减少所述第二命令队列所对应的所述缓存单元的存储空间,并将减少的所述第二命令队列所对应的所述缓存单元的存储空间,分配给所述第一命令队列所对应的所述缓存单元的存储空间;其中,所述第一阈值大于所述第二阈值。
  21. 根据权利要求19或20所述的控制设备,其特征在于,所述处理器还用于执行下述步骤:
    获取所述第一命令队列对应的所述缓存单元的可用存储空间;
    判断第一数据读写命令所要传输的第一数据占用的存储空间,是否小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间;
    在所述第一数据占用的存储空间小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间时,发送所述第一数据读写命令给所述存储设备;
    在所述第一数据占用的存储空间大于所述第一命令队列对应的所述缓存单元的可用存储空间时,暂停发送所述第一数据读写命令。
  22. 根据权利要求21所述的控制设备,其特征在于,所述处理器获取所述缓存单元的可用存储空间包括:
    所述处理器在向所述存储设备发送第一数据读写命令之前,向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求,以获取所述第一命令队列对应的所述缓存单元的可用存储空间。
  23. 根据权利要求22所述的控制设备,其特征在于,在所述处理器向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求之前,所述处理器还用于执行下述步骤:
    向所述存储设备发送第二数据读写命令,所述第二数据读写命令所要传输的数据大于所述第一命令队列对应的所述缓存单元的可用存储空间;
    接收所述数据处理单元发送的反压消息,所述反压消息用于指示所述第一命令队列对应的所述缓存单元的可用存储空间不足。
  24. 根据权利要求21所述的控制设备,其特征在于,所述缓存单元的可用存储空间是本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间。
  25. 根据权利要求24所述的控制设备,其特征在于,所述处理器还用于执行下述步骤:
    在发送所述第一数据读写命令后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间减去所述第一数据占用的存储空间;
    在接收到所述数据处理单元发送的完成所述第一数据读写命令的响应消息后,将本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间加上所述第一数据占用的存储空间。
  26. 一种实现数据读写命令控制的系统,所述系统包括基于Fabric的非易失性高速传输总线NVMe,NVMe over Fabric,架构中的控制设备和存储设备,所述存储设备包括数据处理单元、缓存单元和存储单元,所述控制设备需要读写的数据存储在所述存储单元中,所述数据处理单元用于接收所述控制设备发送的数据读写命令,所述缓存单元用于缓存所述数据读写命令所需要传输的数据;其特征在于:
    所述控制设备,用于向所述数据处理单元发送控制命令,所述控制命令包括将所述缓存单元的存储空间划分为两个以上的存储空间的信息;
    所述数据处理单元,用于根据所述控制命令,将所述缓存单元的存储空间划分为两个以上的存储空间,并建立所述两个以上的存储空间与命令队列之间的对应关系,所述命令队列是所述控制设备发送的数据读写控制命令所构成的队列;
    所述数据处理单元,还用于接收所述控制设备发送的第一数据读写命令,根据所述两个以上的存储空间与命令队列之间的对应关系,将所述第一数据读写命令所要传输的数据,缓存到第一命令队列对应的所述缓存单元的存储空间中,所述第一数据读写命令是所述第一命令队列中的数据读写命令。
  27. 根据权利要求26所述的系统,其特征在于,所述控制设备,用于获取预设时间内,第一命令队列所对应的所述缓存单元的存储空间的占用比例,和第二命令队列所对应的所述缓存单元的存储空间的占用比例;
    当所述第一命令队列所对应的所述缓存单元的存储空间的占用比例大于预设的第一阈值,且所述第二命令队列所对应的所述缓存单元的存储空间的占用比例小于预设的第二阈值时,所述控制设备发送调整命令给所述数据处理单元,所述调整命令用于减少所述第二命令队列所对应的所述缓存单元的存储空间,并将减少的所述第二命令队列所对应的所述缓存单元的存储空间,分配给所述第一命令队列所对应的所述缓存单元的存储空间;其中,所述第一阈值大于所述第二阈值;或,所述数据处理单元还用于获取预设时间内,第一命令队列所对应的所述缓存单元的存储空间的占用比例,和第二命令队列所对应的所述缓存单元的存储空间的占用比例;
    当所述第一命令队列所对应的所述缓存单元的存储空间的占用比例大于预设的第一阈值,且所述第二命令队列所对应的所述缓存单元的存储空间的占用比例小于预设的第二阈值时,减少所述第二命令队列所对应的所述缓存单元的存储空间,并将减少的所述第二命令队列所对应的所述缓存单元的存储空间,分配给所述第一命令队列所对应的所述缓存单元的存储空间;其中,所述第一阈值大于所述第二阈值。
  28. 根据权利要求26或27所述的系统,其特征在于,
    所述控制设备,还用于获取所述第一命令队列对应的所述缓存单元的可用存储空间,判断第一数据读写命令所要传输的第一数据占用的存储空间是否小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间;
    所述控制设备,还用于在所述第一数据所占用的存储空间小于或等于所述第一命令队列对应的所述缓存单元的可用存储空间时,发送所述第一数据读写命令给所述存储设备;在所述第一数据所占用的存储空间大于所述第一命令队列对应的所述缓存单元的可用存储空间时,暂停发送所述第一数据读写命令;
    所述数据处理单元,还用于接收所述控制设备发送的所述第一数据读写命令,并将所述第一数据读写命令所要传输的数据缓存在所述第一命令队列对应的所述缓存单元中。
  29. 根据权利要求28所述的系统,其特征在于,所述控制设备获取所述第一命令队列对应的所述缓存单元的可用存储空间包括:
    所述控制设备在向所述存储设备发送第一数据读写命令之前,向所述数据处理单元发送获取所述第一命令队列对应的所述缓存单元的可用存储空间的请求,以获取所述第一命令队列对应的所述缓存单元的可用存储空间。
  30. 根据权利要求28所述的系统,其特征在于,所述缓存单元的可用存储空间是本地记录的所述第一命令队列对应的所述缓存单元的实时可用存储空间。
PCT/CN2016/108600 2016-12-05 2016-12-05 NVMe over Fabric架构中数据读写命令的控制方法、设备和系统 WO2018102968A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
PCT/CN2016/108600 WO2018102968A1 (zh) 2016-12-05 2016-12-05 NVMe over Fabric架构中数据读写命令的控制方法、设备和系统
EP20191641.8A EP3825857B1 (en) 2016-12-05 2016-12-05 Method, device, and system for controlling data read/write command in nvme over fabric architecture
EP16897476.4A EP3352087B1 (en) 2016-12-05 2016-12-05 Control method for data read/write command in nvme over fabric framework, device and system
CN201680031202.XA CN108369530B (zh) 2016-12-05 2016-12-05 非易失性高速传输总线架构中数据读写命令的控制方法、设备和系统
US16/415,995 US11762581B2 (en) 2016-12-05 2019-05-17 Method, device, and system for controlling data read/write command in NVMe over fabric architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/108600 WO2018102968A1 (zh) 2016-12-05 2016-12-05 NVMe over Fabric架构中数据读写命令的控制方法、设备和系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/415,995 Continuation US11762581B2 (en) 2016-12-05 2019-05-17 Method, device, and system for controlling data read/write command in NVMe over fabric architecture

Publications (1)

Publication Number Publication Date
WO2018102968A1 true WO2018102968A1 (zh) 2018-06-14

Family

ID=62490612

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/108600 WO2018102968A1 (zh) 2016-12-05 2016-12-05 NVMe over Fabric架构中数据读写命令的控制方法、设备和系统

Country Status (4)

Country Link
US (1) US11762581B2 (zh)
EP (2) EP3352087B1 (zh)
CN (1) CN108369530B (zh)
WO (1) WO2018102968A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783209A (zh) * 2018-11-28 2019-05-21 四川商通实业有限公司 一种多级缓存提高服务器处理效率的方法及系统
CN113099490A (zh) * 2021-03-09 2021-07-09 深圳震有科技股份有限公司 一种基于5g通信的数据包传输方法和系统

Families Citing this family (18)

Publication number Priority date Publication date Assignee Title
CN109062514B (zh) * 2018-08-16 2021-08-31 郑州云海信息技术有限公司 一种基于命名空间的带宽控制方法、装置和存储介质
CN109165105A (zh) * 2018-08-17 2019-01-08 郑州云海信息技术有限公司 一种主机和物理机系统
CN111176826A (zh) * 2018-11-13 2020-05-19 北京忆芯科技有限公司 基于资源分配优化的命令处理方法
CN109582247B (zh) * 2018-12-10 2022-04-22 浪潮(北京)电子信息产业有限公司 一种主机到存储系统io的传输方法及存储系统
CN109656479B (zh) * 2018-12-11 2022-03-25 湖南国科微电子股份有限公司 一种构建存储器命令序列的方法及装置
US11366610B2 (en) 2018-12-20 2022-06-21 Marvell Asia Pte Ltd Solid-state drive with initiator mode
CN113767360A (zh) 2019-03-14 2021-12-07 马维尔亚洲私人有限公司 在驱动器级别处对非易失性存储器联网消息的终止
EP3939237B1 (en) 2019-03-14 2024-05-15 Marvell Asia Pte, Ltd. Transferring data between solid state drives (ssds) via a connection between the ssds
WO2020186270A1 (en) 2019-03-14 2020-09-17 Marvell Asia Pte, Ltd. Ethernet enabled solid state drive (ssd)
CN110515868A (zh) * 2019-08-09 2019-11-29 苏州浪潮智能科技有限公司 显示图像的方法和装置
CN112579311B (zh) * 2019-09-30 2023-11-10 华为技术有限公司 访问固态硬盘的方法及存储设备
CN112732166A (zh) * 2019-10-28 2021-04-30 华为技术有限公司 访问固态硬盘的方法及装置
EP4152163A4 (en) * 2020-06-11 2023-11-15 Huawei Technologies Co., Ltd. METHOD FOR PROCESSING METADATA IN A STORAGE DEVICE AND ASSOCIATED DEVICE
CN111897665A (zh) * 2020-08-04 2020-11-06 北京泽石科技有限公司 数据队列的处理方法及装置
JP2022076620A (ja) * 2020-11-10 2022-05-20 キオクシア株式会社 メモリシステムおよび制御方法
KR20220067872A (ko) * 2020-11-18 2022-05-25 에스케이하이닉스 주식회사 컨트롤러 및 컨트롤러의 동작방법
CN113986137A (zh) * 2021-10-28 2022-01-28 英韧科技(上海)有限公司 存储装置和存储系统
CN115858160B (zh) * 2022-12-07 2023-12-05 江苏为是科技有限公司 远程直接内存访问虚拟化资源分配方法及装置、存储介质

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101093466A (zh) * 2007-08-10 2007-12-26 杭州华三通信技术有限公司 通过缓存写数据的方法和缓存系统及装置
CN102103545A (zh) * 2009-12-16 2011-06-22 中兴通讯股份有限公司 一种数据缓存的方法、装置及系统
CN103135957A (zh) * 2013-02-01 2013-06-05 北京邮电大学 使用、管理多队列数据的共用缓存空间的方法和系统
CN104536701A (zh) * 2014-12-23 2015-04-22 记忆科技(深圳)有限公司 一种nvme协议多命令队列的实现方法及系统

Family Cites Families (24)

Publication number Priority date Publication date Assignee Title
JP2007206799A (ja) 2006-01-31 2007-08-16 Toshiba Corp データ転送装置、情報記録再生装置およびデータ転送方法
US7594060B2 (en) * 2006-08-23 2009-09-22 Sun Microsystems, Inc. Data buffer allocation in a non-blocking data services platform using input/output switching fabric
JP2009181314A (ja) 2008-01-30 2009-08-13 Toshiba Corp 情報記録装置およびその制御方法
US20090216960A1 (en) * 2008-02-27 2009-08-27 Brian David Allison Multi Port Memory Controller Queuing
CN101387943B (zh) 2008-09-08 2011-05-25 创新科存储技术(深圳)有限公司 一种存储设备以及缓存数据的方法
JP5454224B2 (ja) 2010-02-25 2014-03-26 ソニー株式会社 記憶装置および記憶システム
CN102075436B (zh) 2011-02-10 2014-09-17 华为数字技术(成都)有限公司 以太网络及其数据传输方法和装置
US9323659B2 (en) 2011-08-12 2016-04-26 Sandisk Enterprise Ip Llc Cache management including solid state device virtualization
US9026735B1 (en) * 2011-11-18 2015-05-05 Marvell Israel (M.I.S.L.) Ltd. Method and apparatus for automated division of a multi-buffer
US8554963B1 (en) 2012-03-23 2013-10-08 DSSD, Inc. Storage system with multicast DMA and unified address space
US8578106B1 (en) * 2012-11-09 2013-11-05 DSSD, Inc. Method and system for queue demultiplexor with size grouping
US9408050B2 (en) 2013-01-31 2016-08-02 Hewlett Packard Enterprise Development Lp Reducing bandwidth usage of a mobile client
US9086916B2 (en) * 2013-05-15 2015-07-21 Advanced Micro Devices, Inc. Architecture for efficient computation of heterogeneous workloads
US9785356B2 (en) 2013-06-26 2017-10-10 Cnex Labs, Inc. NVM express controller for remote access of memory and I/O over ethernet-type networks
US9785545B2 (en) 2013-07-15 2017-10-10 Cnex Labs, Inc. Method and apparatus for providing dual memory access to non-volatile memory
EP3117583A4 (en) * 2014-03-08 2017-11-01 Diamanti, Inc. Methods and systems for converged networking and storage
US20170228173A9 (en) 2014-05-02 2017-08-10 Cavium, Inc. Systems and methods for enabling local caching for remote storage devices over a network via nvme controller
US9501245B2 (en) 2014-05-02 2016-11-22 Cavium, Inc. Systems and methods for NVMe controller virtualization to support multiple virtual machines running on a host
US9507722B2 (en) 2014-06-05 2016-11-29 Sandisk Technologies Llc Methods, systems, and computer readable media for solid state drive caching across a host bus
US9740646B2 (en) 2014-12-20 2017-08-22 Intel Corporation Early identification in transactional buffered memory
KR102403489B1 (ko) * 2015-07-10 2022-05-27 삼성전자주식회사 비휘발성 메모리 익스프레스 컨트롤러에 의한 입출력 큐 관리 방법
CN106095694B (zh) 2016-06-15 2019-07-09 华为技术有限公司 数据存储方法及装置
JP2018073040A (ja) 2016-10-27 2018-05-10 東芝メモリ株式会社 メモリシステム
US10521121B2 (en) 2016-12-29 2019-12-31 Intel Corporation Apparatus, system and method for throttling a rate at which commands are accepted in a storage device

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN101093466A (zh) * 2007-08-10 2007-12-26 杭州华三通信技术有限公司 通过缓存写数据的方法和缓存系统及装置
CN102103545A (zh) * 2009-12-16 2011-06-22 中兴通讯股份有限公司 一种数据缓存的方法、装置及系统
CN103135957A (zh) * 2013-02-01 2013-06-05 北京邮电大学 使用、管理多队列数据的共用缓存空间的方法和系统
CN104536701A (zh) * 2014-12-23 2015-04-22 记忆科技(深圳)有限公司 一种nvme协议多命令队列的实现方法及系统

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN109783209A (zh) * 2018-11-28 2019-05-21 四川商通实业有限公司 一种多级缓存提高服务器处理效率的方法及系统
CN109783209B (zh) * 2018-11-28 2023-08-22 四川商通实业有限公司 一种多级缓存提高服务器处理效率的方法及系统
CN113099490A (zh) * 2021-03-09 2021-07-09 深圳震有科技股份有限公司 一种基于5g通信的数据包传输方法和系统

Also Published As

Publication number Publication date
EP3352087A1 (en) 2018-07-25
EP3825857A1 (en) 2021-05-26
US11762581B2 (en) 2023-09-19
CN108369530A (zh) 2018-08-03
CN108369530B (zh) 2021-06-15
EP3352087B1 (en) 2020-09-16
EP3352087A4 (en) 2018-08-22
EP3825857B1 (en) 2023-05-03
US20190272123A1 (en) 2019-09-05

Similar Documents

Publication Publication Date Title
WO2018102968A1 (zh) NVMe over Fabric架构中数据读写命令的控制方法、设备和系统
US10838665B2 (en) Method, device, and system for buffering data for read/write commands in NVME over fabric architecture
WO2018102967A1 (zh) NVMe over Fabric架构中数据读写命令的控制方法、存储设备和系统
US10387202B2 (en) Quality of service implementation in a networked storage system with hierarchical schedulers
CN107995129B (zh) 一种nfv报文转发方法和装置
CN108701004A (zh) 一种数据处理的系统、方法及对应装置
WO2018035856A1 (zh) 实现硬件加速处理的方法、设备和系统
US9098404B2 (en) Storage array, storage system, and data access method
US11579803B2 (en) NVMe-based data writing method, apparatus, and system
WO2021073546A1 (zh) 数据访问方法、装置和第一计算设备
WO2017141413A1 (ja) 計算機、通信ドライバ、および通信制御方法
US7730239B2 (en) Data buffer management in a resource limited environment
WO2014202003A1 (zh) 数据存储系统的数据传输方法、装置及系统
CN108197039B (zh) 一种ssd控制器混合流数据的传输方法和系统
WO2023246843A1 (zh) 数据处理方法、装置及系统
CN109167740B (zh) 一种数据传输的方法和装置
CN114415959B (zh) 一种sata磁盘动态加速访问方法和装置
US11675540B2 (en) In-line data flow for computational storage
WO2020118650A1 (zh) 快速发送写数据准备完成消息的方法、设备和系统
US20230229360A1 (en) Disaggregation computing system and method
JPWO2018173300A1 (ja) I/o制御方法およびi/o制御システム
CN116820732A (zh) 一种内存分配方法及相关产品
CN114911411A (zh) 一种数据存储方法、装置及网络设备

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2016897476

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE