WO2022143774A1 - Data access method and related device - Google Patents

Data access method and related device

Info

Publication number
WO2022143774A1
Authority
WO
WIPO (PCT)
Prior art keywords
access
access request
network device
client
multiple clients
Prior art date
Application number
PCT/CN2021/142495
Other languages
English (en)
French (fr)
Inventor
閤先军
韩兆皎
余博伟
陈灿
谭春毅
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21914506.7A (EP4261671A4)
Priority to JP2023540613A (JP2024501713A)
Publication of WO2022143774A1
Priority to US18/345,519 (US20230342087A1)


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10Program control for peripheral devices
    • G06F13/12Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
    • G06F13/124Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine
    • G06F13/128Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine for dedicated transfers to a network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306Intercommunication techniques
    • G06F15/17331Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0635Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Definitions

  • the present invention relates to the technical field of storage, and in particular, to a data access method and related equipment.
  • RDMA remote direct memory access
  • RDMA is a technology for directly accessing remote memory: data can be migrated quickly and directly from one system into the memory of a remote system without involving the operating system, which reduces the central processing unit (CPU) overhead in the data transmission process and frees memory bandwidth, thereby improving the system's service-processing performance, with high bandwidth, low latency and low CPU usage.
  • CPU central processing unit
  • the network device of the host first writes the data to the memory of the storage device through an RDMA operation, and the CPU in the storage device then needs to move the data in the memory to a persistent storage medium, such as a solid-state drive (solid state disk, SSD).
  • Moving data from memory to a persistent storage medium via the CPU consumes CPU resources, which affects communication between the host and the storage device.
  • SQ submission queue
  • CQ completion queue
  • the embodiments of the present invention disclose a data access method and related equipment, which can directly store data persistently under large-scale networking connections, reduce CPU occupation of storage devices, and expand applicable scenarios.
  • the present application provides a storage device, comprising a network device and a storage unit, where the storage unit is connected to multiple clients through the network device; the network device is configured to send the access requests of the multiple clients to one access queue of the storage unit; the storage unit is configured to execute the access requests in the access queue and return the processing results of the access requests; and the network device is further configured to return the processing result of each access request returned by the storage unit to the client corresponding to that access request.
  • the network device may be a network interface controller supporting remote direct memory access (RNIC), a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC) chip, or the like.
  • RNIC network interface controller
  • FPGA field programmable gate array
  • ASIC application specific integrated circuit
  • the network device sends the access requests from multiple clients to one access queue for processing, and returns the processing results of the access requests to the corresponding clients, so that one access queue corresponds to multiple clients. This breaks through the inherent limit on the number of access queues, supports large-scale networking connections, and expands the applicable scenarios.
  • the network device stores a correspondence between the information of the multiple clients and the access queue, and the network device is configured to send the access requests of the multiple clients to the access queue of the storage unit according to the correspondence.
  • the network device pre-stores the correspondence between the clients' information and the access queue, and sends the access requests of multiple clients to one access queue through the correspondence, so that one access queue can process the requests of multiple clients, thereby ensuring that the storage device can support large-scale networking connections.
  • the information of the multiple clients is connection information generated when the multiple clients respectively establish connections with the network device; when receiving an access request from any one of the multiple clients, the network device determines the access queue according to the connection information corresponding to the client carried in the access request and the correspondence, and sends the connection information and the access request to the access queue; the storage unit returns the connection information while returning the processing result of the access request; and the network device determines the client corresponding to the access request according to the connection information and returns the processing result to the client corresponding to the access request.
  • by sending the connection information to the access queue together with the access request, and by having the storage unit return the connection information along with the processing result, the network device can determine the client corresponding to each access request from the connection information; therefore, during large-scale networking, when multiple clients correspond to one access queue at the same time, the multiple clients can be accurately distinguished and the processing results returned to them, which effectively expands the applicable scenarios.
  • the information of the multiple clients is connection information generated when the multiple clients respectively establish connections with the network device; when the network device receives an access request from any one of the multiple clients, it allocates a local identifier to the client identifier carried in the access request, where the local identifier is used to uniquely identify the client, and establishes a correspondence between the client identifier, the local identifier and the connection information corresponding to the client; when the network device receives the processing result of the access request returned by the storage unit, it obtains the local identifier from the processing result, determines the connection information corresponding to the client according to the local identifier, and returns the processing result to the client corresponding to the connection information.
  • the client identifier is defined by the client itself, and there may be cases where the client identifiers defined by different clients are the same. Therefore, each client cannot be accurately distinguished by the client identifier.
  • the local identifier is obtained by the network device converting the client identifier of each client into a unique value; since the local identifier corresponding to each client is different, each client can be accurately distinguished by its local identifier.
  • the network device allocates a local identifier to the client identifier in the access request to uniquely identify the client, and then establishes a correspondence between the client identifier, the local identifier and the connection information corresponding to the client, which avoids the situation in which clients cannot be distinguished when different clients define the same client identifier; after the storage unit returns the processing result, the network device determines the connection information of the corresponding client according to the local identifier, so that when multiple clients correspond to one access queue at the same time in a large-scale networking connection, the multiple clients can be accurately distinguished and the processing results returned, which effectively expands the applicable scenarios.
  • a remote direct memory access (RDMA) connection is established between the multiple clients and the network device, and the connection information is a queue pair (QP) generated when the RDMA connection is established.
  • the present application provides a data access method, the method comprising: a network device receiving access requests sent by multiple clients connected to the network device, and sending the access requests to one access queue of a storage unit; the network device receiving the processing results of the access requests returned by the storage unit after it executes the access requests in the access queue; and the network device returning the processing result of each access request returned by the storage unit to the client corresponding to that access request.
  • the network device stores a correspondence between the information of the multiple clients and the access queue, and the network device sends the access requests of the multiple clients to the access queue of the storage unit according to the correspondence.
  • the information of the multiple clients is connection information generated when the multiple clients respectively establish connections with the network device; the network device sending the access requests of the multiple clients to one access queue of the storage unit includes: when an access request from any one of the multiple clients is received, determining the access queue according to the connection information corresponding to the client carried in the access request and the correspondence, and sending the connection information and the access request to the access queue; the processing result returned by the storage unit includes the connection information, and the network device returning the processing result of the access request returned by the storage unit to the client corresponding to the access request includes: the network device determining the client corresponding to the access request according to the connection information, and returning the processing result to the client corresponding to the access request.
  • the information of the multiple clients is connection information generated when the multiple clients respectively establish connections with the network device; the network device sending the access requests of the multiple clients to one access queue of the storage unit includes: when an access request from any one of the multiple clients is received, allocating a local identifier to the client identifier carried in the access request, where the local identifier is used to uniquely identify the client, establishing a correspondence between the client identifier, the local identifier and the connection information corresponding to the client, replacing the client identifier carried in the access request with the local identifier, and sending the access request to the access queue corresponding to the connection information; the network device returning the processing result of the access request returned by the storage unit to the client corresponding to the access request includes: when the network device receives the processing result of the access request returned by the storage unit, obtaining the local identifier from the processing result, determining the connection information corresponding to the client according to the local identifier, and returning the processing result to the client.
  • a remote direct memory access (RDMA) connection is established between the multiple clients and the network device, and the connection information is a queue pair (QP) generated when the RDMA connection is established.
  • the present application provides a network device, comprising: a receiving unit, configured to receive access requests sent by multiple clients connected to the network device; and a sending unit, configured to send the access requests to one access queue of a storage unit; the receiving unit is further configured to receive the processing results of the access requests returned after the storage unit executes the access requests in the access queue; and the sending unit is further configured to return the processing result of each access request returned by the storage unit to the client corresponding to that access request.
  • the network device further includes a storage unit, where the storage unit is configured to store the correspondence between the information of the multiple clients and the access queue, and the sending unit is specifically configured to send the access requests of the multiple clients to the access queue of the storage unit according to the correspondence.
  • the information of the multiple clients is connection information generated when the multiple clients establish connections with the network device respectively
  • the network device also includes a processing unit, where the processing unit is configured to, when an access request from any one of the multiple clients is received, determine the access queue according to the connection information corresponding to the client carried in the access request and the correspondence; the sending unit is specifically configured to send the connection information and the access request to the access queue; and the sending unit is further configured to determine the client corresponding to the access request according to the connection information, and return the processing result to the client corresponding to the access request.
  • the information of the multiple clients is connection information generated when the multiple clients establish connections with the network device respectively
  • the network device also includes a processing unit, where the processing unit is configured to, when an access request from any one of the multiple clients is received, allocate a local identifier to the client identifier carried in the access request, the local identifier being used to uniquely identify the client, establish a correspondence between the client identifier, the local identifier and the connection information corresponding to the client, and replace the client identifier carried in the access request with the local identifier; the sending unit is specifically configured to send the access request to the access queue corresponding to the connection information; the processing unit is further configured to, when the processing result of the access request returned by the storage unit is received, obtain the local identifier from the processing result and determine the connection information corresponding to the client according to the local identifier; and the sending unit is further configured to determine the client corresponding to the access request according to the connection information, and return the processing result to the client corresponding to the access request.
  • a remote direct memory access (RDMA) connection is established between the multiple clients and the network device, and the connection information is a queue pair (QP) generated when the RDMA connection is established.
  • the present application provides a computing device, the computing device comprising a processor and a memory, the processor and the memory being connected through an internal bus, with instructions stored in the memory; the processor calls the instructions in the memory to execute the data access method provided by the above-mentioned second aspect or any one of the implementation manners in combination with the second aspect.
  • the present application provides a computer storage medium, where a computer program is stored in the computer storage medium, and when the computer program is executed by a processor, the flow of the data access method provided by the above-mentioned second aspect or any one of the implementation manners in combination with the second aspect can be implemented.
  • the present application provides a computer program product, the computer program product comprising instructions, and when the computer program is executed by a computer, the computer can execute the flow of the data access method provided by the above-mentioned second aspect or any one of the implementation manners in combination with the second aspect.
  • FIG. 1 is a schematic diagram of writing data into a solid-state drive provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another scenario of writing data to a solid-state drive provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a connection establishment method provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a data writing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a submission queue description structure format provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a data reading method provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of another data writing method provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of another data reading method provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • a host may also be called a client, and may specifically include a physical machine, a virtual machine, a container, etc., for generating or consuming data, such as an application server, a distributed file system server, and the like.
  • the network device of the host is a device used by the host for data communication, and may specifically include a network interface controller (NIC), an RNIC, and the like.
  • NIC network interface controller
  • the access request of the host mainly includes data read and write operations, that is, the host writes the generated data into the storage unit of the storage device, or reads data from the storage unit of the storage device.
  • the storage device may also be called a server, and may specifically include a device capable of storing data in the form of external centralized storage or distributed storage, such as a storage server, a distributed database server, and the like.
  • the network device of the storage device is the device used by the storage device for data communication, and may specifically include NIC, RNIC, etc., and the storage unit of the storage device is the device used by the storage device for persistent data storage, such as SSD.
  • Submission queue (SQ) and doorbell of the solid-state drive (SSD): in the storage device, the CPU of the storage device communicates with the SSD through the NVMe protocol. During the initialization phase when the storage device is started, the CPU of the storage device establishes a submission queue (SQ) and a completion queue (CQ) for the SSD in the memory of the storage device through the NVMe protocol, and creates a doorbell in the SSD. The CPU places the commands to be sent to the SSD in the SQ and writes the position of the command in the SQ into the doorbell, and the SSD fetches the command from the SQ and executes it. The information of a completed command is stored in the completion queue; by reading this information from the completion queue, the CPU can determine which commands have completed and delete them from the submission queue.
  • SQ submission queue
  • CQ completion queue
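  • The following minimal C sketch illustrates the SQ/doorbell/CQ interaction described above; the structure names, queue depth, and field sizes are illustrative assumptions, not the exact NVMe data structures.

```c
#include <stdint.h>

/* Conceptual sketch of NVMe-style command submission: the CPU places a
 * command at the tail of the submission queue (SQ), writes the new tail
 * position into the SSD's doorbell register, and later observes completion
 * information in the completion queue (CQ). */
#define QUEUE_DEPTH 256

struct nvme_cmd { uint8_t bytes[64]; };   /* 64-byte submission queue entry (SQE) */
struct nvme_cpl { uint8_t bytes[16]; };   /* completion queue entry (CQE)         */

struct ssd_queue_pair {
    struct nvme_cmd sq[QUEUE_DEPTH];      /* submission queue in host memory      */
    struct nvme_cpl cq[QUEUE_DEPTH];      /* completion queue in host memory      */
    uint32_t sq_tail;                     /* next free SQ slot                    */
    volatile uint32_t *sq_doorbell;       /* doorbell register inside the SSD     */
};

static void submit_command(struct ssd_queue_pair *q, const struct nvme_cmd *cmd)
{
    q->sq[q->sq_tail] = *cmd;                      /* 1. place the command in the SQ           */
    q->sq_tail = (q->sq_tail + 1) % QUEUE_DEPTH;
    *q->sq_doorbell = q->sq_tail;                  /* 2. write the SQ position to the doorbell */
    /* 3. the SSD fetches the command from the SQ, executes it, and writes a
     *    CQE into the CQ, from which the CPU learns which commands completed. */
}
```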
  • the RDMA communication protocol is a set of protocol specifications followed by computing devices for RDMA operations.
  • There are currently three communication protocols that support RDMA, namely the InfiniBand (IB) protocol, the RDMA over Converged Ethernet (RoCE) protocol, and the Internet Wide Area RDMA Protocol (iWARP); these three protocols can all use the same set of APIs, but they have different physical layers and link layers.
  • when an RDMA connection is established, a send queue (SQ) is created in the network card of the host, and a receive queue (RQ) corresponding to the send queue is created in the network card of the storage device; the send queue and the receive queue form a queue pair (QP). The addresses of the queues are mapped to the virtual address space of the application, and the application can transmit data directly to the storage device through the QP, so that the data can be stored in the memory of the storage device.
  • when the host transmits data through RDMA, it first transmits the data to the memory of the storage device, and then the CPU of the storage device moves the data from the memory to the SSD.
  • FIG. 1 which shows a schematic diagram of a data writing scenario
  • the network device 1110 in the host 110 first writes data to the network device of the storage device 120 (that is, the RNIC 1240) through an RDMA operation; the RNIC 1240 then writes the data into the memory 1220 with the assistance of the RNIC driver 1211 running on the CPU 1210, after which the storage software 1212 and the SSD driver on the CPU 1210 move the data from the memory 1220 to the SSD 1230 for persistent storage.
  • after the SSD 1230 completes the persistent storage of the data, it notifies the storage software 1212 through an interrupt.
  • the CPU 1210 returns a write completion notification message to the host 110 through the RNIC 1240.
  • in this process, the CPU (including the RNIC driver, the storage software, and the SSD driver) is required to participate in the entire storage procedure, which consumes a large amount of CPU resources.
  • FIG. 2 shows another schematic diagram of a data writing scenario
  • Specifically, the application server 210, the application server 220, and the application server 230 are connected to the storage server 240 through the RDMA network, and the structure of each application server is similar; taking the application server 210 as an example, the application server 210 includes a CPU 211 and a memory 212, and is connected with an RNIC 213; the storage server 240 includes a CPU 241 and a memory 242, and is connected with an RNIC 243 and a persistent storage medium.
  • persistent storage media includes, but is not limited to, SSD 244.
  • Each application server is connected with the storage server to form a QP, so there are multiple QPs in the memory 242 of the storage server 240, such as QP1, QP2 and QP3, and each QP corresponds to the connection between the storage server 240 and an application server.
  • the SSD 244 includes multiple SQs and CQs; the maximum supported number can reach 64K, but based on memory and performance considerations, 256 are typically selected at present.
  • the application server 210 registers the memory 212 required for data communication with the RNIC 213, and the storage server 240 registers the memory 242 required for data communication with the RNIC 243, so that the RNIC 213 and the RNIC 243 can operate the memory 212 and the memory 242 through RDMA. The storage server 240 binds the SQs of the SSD 244 and the stored QPs one by one, for example binding the QP1 connected to the application server 210 with SQ1, then maps the SQ address, registers the virtual address obtained after the mapping with the RNIC 243, and the RNIC 243 sends the SQ1 address to the RNIC 213 through the RDMA connection, so that the RNIC 213 can directly and remotely operate the SQ1 address of the SSD 244.
  • when data is written, the application server 210 generates data through the CPU 211 and stores the generated data in the memory 212, then writes the data into the memory 242 of the storage server 240 through the RNIC 213, and notifies the SSD 244, according to the SQ address of the SSD 244, to move the data in the memory 242 to the SSD 244 for persistent storage. If other application servers need to write data to the storage server 240, the process is similar to the above and will not be repeated here.
  • the above solution can bypass the CPU and software of the storage server and write data directly to the SSD by binding the QPs and the SQs in the SSD one by one, but it is limited by the number of SQs in the SSD: once the number of connections exceeds the number of SQs, the above solution is no longer applicable, that is, large-scale networking scenarios cannot be supported.
  • the present application provides a data access method: when the number of connections of the storage device far exceeds the number of SQs supported by the SSD, by extending the submission queue entry (SQE) format of the SSD or converting the client identifiers of the application servers, the storage device can send the access requests of multiple clients to one access queue of the storage unit, that is, multiple connections of the storage device can be bound to one SQ, thereby supporting large-scale networking connections and expanding the applicable scenarios.
  • SQE submission queue entry (submission queue description structure format)
  • the technical solutions of the embodiments of the present application can be applied to any system that requires remote access to persistent storage media, especially large-scale networking scenarios with a large number of connections, such as distributed storage, high performance computing (HPC), and the like.
  • in distributed storage, the storage device is connected to a large number of application servers at the same time; when the storage device needs to support direct access to the SSD by each application server, the data access method provided by the embodiments of the present application can be used in the distributed storage system, so as to resolve the bandwidth bottleneck in data reading and writing and improve data read/write efficiency.
  • FIG. 3 shows a schematic diagram of a system architecture according to an embodiment of the present application.
  • the system 300 includes: an application server 310, an application server 320, an application server 330, and a storage server 340.
  • the application server 310, the application server 320, and the application server 330 are connected to the storage server 340 through an RDMA network.
  • the application server 310 includes a CPU 311 and a memory 312, and is connected with an RNIC 313.
  • the structure of the application server 320 and the application server 330 is similar to that of the application server 310; the storage server 340 includes a CPU 341 and a memory 342, and is connected with an RNIC 343 and a storage unit. It should be understood that the storage unit includes but is not limited to the SSD 344.
  • since the application server 310, the application server 320 and the application server 330 are connected to the storage server 340 at the same time, there are three QPs in the memory 342 of the storage server 340, namely QP1, QP2 and QP3; the storage server 340 binds QP1 and QP2 with SQ1, and binds QP3 with SQ2. After the above process is completed, data read and write operations can further be performed.
  • for example, taking the application server 310 writing data to the storage server 340 as an example, the application server 310 generates data through the CPU 311 and stores the generated data in the memory 312, and then uses the RNIC 313 to write the data and the data description information into the memory 342 of the storage server 340, wherein the data description information includes the starting address, data length, operation type, etc. of the data in the memory 342. The size of the data description information is preferably 64 bytes.
  • the RNIC 343 determines that the SQ corresponding to the connection QP1 of the application server 310 is SQ1 according to the preset binding relationship.
  • after the SSD 344 completes storage, it copies the QPN information into the CQE of the CQ; the storage server 340 determines the corresponding QPN according to the CQE, or the storage server 340 determines the corresponding client identifier and the corresponding QPN according to the local identifier in the CQE, then finds the corresponding QP according to the QPN, and replies a write-data-complete message to the application server 310 through the QP.
  • RNIC 313, RNIC 323, and RNIC 333 may be programmable RNICs
  • the SSD 344 is a programmable SSD, which can actively sense the completion status of the SQ and actively report it.
  • The application server 310, the application server 320, the application server 330, and the storage server 340 may each be a physical machine, a virtual machine, a container, or the like, and can be deployed on one or more computing devices (such as a central server) in a cloud environment, or on one or more computing devices (such as servers) in an edge environment.
  • a key point of the data access system shown in FIG. 3 is that the storage server can bind the connections (QPs) with multiple application servers to the same SQ at the same time, so that the access requests of multiple application servers are sent to one SQ, which breaks through the inherent limit on the number of SQs of the SSD, supports large-scale networking connections, and expands the applicable scenarios.
  • the connection establishment and memory registration process before data access will be described below, taking the connection establishment between the application server 310 and the storage server 340 as an example.
  • Other application servers are similar to the application server 310.
  • the process includes:
  • the application server 310 and the storage server 340 may establish an RDMA connection based on any protocol of IB, RoCE or IWARP.
  • the application server 310 and the storage server 340 register the memory addresses (which may be contiguous virtual memory or contiguous physical memory spaces) for data communication, and provide them to the network device as virtual continuous buffers.
  • the buffers use virtual addresses.
  • the network device is an RNIC as an example for description, and no further distinction will be made in the subsequent description.
  • the application server 310 registers the memory 312 with the RNIC 313
  • the storage server 340 registers the memory 342 with the RNIC 343 .
  • the operating systems of the application server 310 and the storage server 340 will check the permission of the registered block, and the registration process will write the mapping table of the virtual address and the physical address of the memory that needs to be registered into the RNIC.
  • the permissions of the corresponding memory area will be set, including local write, remote read, and remote write.
  • the memory registration process locks the memory pages in order to prevent them from being swapped out, and the registration process also needs to maintain the mapping between physical and virtual memory.
  • the application server 310 and the storage server 340 can register all of their own memory, or select a part of the memory for registration; when registering, the starting address and the data length of the memory to be registered are provided to the RNIC so that the RNIC can determine which memory needs to be registered.
  • each memory registration will generate a remote identifier (key) and a local identifier.
  • the remote identifier is used by the remote host to access the local memory
  • the local identifier is used by the local host to access the local memory.
  • the storage server 340 provides the remote identification generated by the memory registration to the application server 310 so that the application server 310 can remotely access the system memory 342 of the storage server 340 during the RDMA operation.
  • the same memory buffer can be registered multiple times (even with different operation permissions), and each registration will generate a different identity.
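  • As a concrete illustration of the registration step, the sketch below uses the standard libibverbs API (ibv_reg_mr); the helper name register_buffer, the chosen permissions, and the assumption that a protection domain pd already exists are illustrative, not taken from the patent.

```c
#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdio.h>

/* Register 'len' bytes starting at 'buf' with the RNIC so the region can be
 * the target of RDMA operations. The returned memory region carries a local
 * key (lkey) and a remote key (rkey); the rkey is what the storage server
 * would hand to an application server so it can access this memory remotely. */
static struct ibv_mr *register_buffer(struct ibv_pd *pd, void *buf, size_t len)
{
    int access = IBV_ACCESS_LOCAL_WRITE |
                 IBV_ACCESS_REMOTE_READ |
                 IBV_ACCESS_REMOTE_WRITE;   /* permissions chosen at registration time */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, access);
    if (!mr) {
        perror("ibv_reg_mr");
        return NULL;
    }
    printf("registered %zu bytes: lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);
    return mr;
}
```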
  • the application server and the storage server will negotiate to create a QP during the process of establishing the RDMA connection, and will create the associated send queue SQ and receive queue RQ when creating the QP.
  • after the QP is created, the application server 310 and the storage server 340 can use the QP for communication.
  • the application server 310 can remotely operate the memory 342 of the storage server 340 through RDMA.
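  • The QP creation mentioned above can be sketched with the same libibverbs API; the queue depths and the assumption that a protection domain pd and a completion queue cq already exist are illustrative.

```c
#include <infiniband/verbs.h>
#include <stdio.h>

/* Create a reliable-connection QP; its send queue and receive queue are
 * created together with it, as described above. */
static struct ibv_qp *create_queue_pair(struct ibv_pd *pd, struct ibv_cq *cq)
{
    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap = {
            .max_send_wr  = 64,   /* example send queue depth    */
            .max_recv_wr  = 64,   /* example receive queue depth */
            .max_send_sge = 1,
            .max_recv_sge = 1,
        },
        .qp_type = IBV_QPT_RC,    /* reliable connection, suitable for RDMA read/write */
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    if (!qp)
        perror("ibv_create_qp");
    else
        printf("created QP with QPN 0x%x\n", qp->qp_num);  /* the QPN used later for binding */
    return qp;
}
```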
  • the storage server 340 maps the SQ address and the doorbell address of the SSD 344 and registers them with the RNIC 343 .
  • the storage server 340 has established an SQ for the SSD 344 in the memory 342, and established a doorbell in the SSD 344, so that the CPU 341 in the storage server 340 can communicate with the SSD 344.
  • since the SQ address and the doorbell address are addresses in the kernel-mode memory address space, they cannot be directly registered with the RNIC 343; they need to be converted into user-mode virtual addresses before registration.
  • the storage server 340 maps the SQ address and the doorbell address of the SSD to logically contiguous user-mode virtual addresses, and then provides the virtual addresses obtained by the mapping to the RNIC 343 of the storage server for registration; the registration process is similar to the above-mentioned memory registration process and is not repeated here.
  • the storage server 340 may complete the mapping process in a memory mapping (memory mapping, MMAP) manner, so as to map the SQ address and the doorbell address as virtual addresses in the user mode, so as to ensure normal communication therewith.
  • MMAP memory mapping
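  • A minimal sketch of the MMAP step is shown below; the device node path, offset, and mapping length are hypothetical placeholders, since the actual kernel interface used to expose the SQ and doorbell addresses is not specified here.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical character device through which the kernel exposes the
     * SSD's SQ ring and doorbell region. */
    int fd = open("/dev/example_nvme_queues", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 4096;  /* one page covering the SQ ring and doorbell, as an example */
    void *va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (va == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* 'va' is now a logically contiguous user-mode virtual address that could
     * be registered with the RNIC (e.g. via ibv_reg_mr), as described above. */
    printf("queue region mapped at %p\n", va);

    munmap(va, len);
    close(fd);
    return 0;
}
```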
  • the storage server 340 binds the QP with the SQ of the SSD 344.
  • the SSD 344 will be assigned multiple SQ addresses during the initialization phase
  • the RNIC 343 of the storage server 340 and the RNICs of multiple application servers including the application server 310 will also create multiple QPs when establishing an RDMA connection
  • the management software in the storage server 340 binds the SQ addresses and the QPs, and sends the binding relationship to the RNIC 343 for saving.
  • since there may be multiple QPs, the storage server 340 distinguishes them accurately by numbering, that is, each QP has a unique QP number (QPN) corresponding to it.
  • the storage server 340 binds N QPs to one SQ to support large-scale networking connections, where the specific value of N can be set according to actual needs, for example, can be set to 100, which is not limited in this application.
  • the storage server 340 can identify the QP corresponding to each SQ address according to the stored binding relationship, and then can distinguish different clients or application servers.
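  • The binding relationship saved in the RNIC 343 can be pictured as a small table mapping each QPN to the SQ it is bound to; the sketch below is illustrative, and the table size and function names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BINDINGS 1024

struct qp_sq_binding {
    uint32_t qpn;     /* QP number created when the RDMA connection was established */
    uint16_t sq_id;   /* submission queue of the SSD that this QP is bound to       */
};

static struct qp_sq_binding bindings[MAX_BINDINGS];
static size_t binding_count;

/* Record that 'qpn' is bound to submission queue 'sq_id'. Several QPNs may be
 * bound to the same sq_id, which is the many-to-one binding described above. */
static int bind_qp_to_sq(uint32_t qpn, uint16_t sq_id)
{
    if (binding_count >= MAX_BINDINGS)
        return -1;
    bindings[binding_count].qpn = qpn;
    bindings[binding_count].sq_id = sq_id;
    binding_count++;
    return 0;
}

/* Look up the SQ bound to a given QPN; returns -1 if the QPN is unknown. */
static int sq_for_qpn(uint32_t qpn)
{
    for (size_t i = 0; i < binding_count; i++)
        if (bindings[i].qpn == qpn)
            return bindings[i].sq_id;
    return -1;
}
```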
  • after the above process, the application server 310 and the storage server 340 have successfully established an RDMA connection and can perform data transmission: the application server 310 can remotely operate the memory 342 of the storage server 340, and the storage server 340 can persistently store the data written by the application server 310 according to the binding relationship between the QP and the SQ.
  • the data writing process will be described in detail below, taking the application server 310 writing data to the storage server 340 as an example. As shown in FIG. 5, the process includes:
  • the application in the application server 310 generates data that needs to be written to the SSD 344 of the storage server 340 , and then stores the data in the memory 312 of the application server 310 first.
  • the RNIC 313 of the application server 310 writes the data to be written and the description information of the data to be written into the memory 342 of the storage server 340 .
  • specifically, the application in the application server 310 sends an RDMA request to the RNIC 313 of the application server 310, where the request includes the address of the data to be written in the memory 312 (for example, including the start address and the data length); the RNIC 313 then retrieves the data to be written from the memory 312 of the application server 310 according to the request, and encapsulates into a dedicated packet the address of the data to be written in the storage server 340 (including the starting address and the data length) and the remote identifier sent by the storage server 340 for operating the memory corresponding to that address; at the same time, the description information of the data to be written is also encapsulated into the dedicated packet, wherein the description information of the data to be written includes the starting address and data length of the data to be written in the storage server 340 and the data operation type (that is, a data write operation), etc.; the RNIC 313 then sends the dedicated packet to the RNIC 343 of the storage server 340 through the QP.
  • after receiving the dedicated packet, the RNIC 343 of the storage server 340 confirms whether the application server 310 has the authority to operate the memory 342 of the storage server 340 according to the remote identifier in the packet, and after confirmation, writes the data to be written in the packet into the memory corresponding to the address, and also writes the description information of the data to be written into the memory 342.
  • S503 The RNIC 343 of the storage server 340 fills the SQE corresponding to the SQ according to the QP corresponding to the data to be written and the description information of the data to be written.
  • the RNIC 343 of the storage server 340 can determine the SQ corresponding to the QP according to the binding relationship saved in advance.
  • each SQ contains one or more SQEs
  • the format of each SQE follows the NVMe protocol, and its size is 64 bytes; as shown in FIG. 6, which is a schematic diagram of an SQE format, the SQE includes a specific command field, a reserved field, an SQ identifier field, an SQ head pointer field, a status field, a command identifier field, etc.
  • the RNIC 343 of the storage server 340 fills the SQE corresponding to the SQ according to the description information of the data to be written.
  • the RNIC 343 of the storage server 340 will extend the reserved field in the SQE, and use the reserved field to save the QPN corresponding to the QP, so that the SQE carries the QPN information.
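  • The idea of carrying the QPN in a reserved area of the 64-byte SQE can be sketched as the following C structure; this layout is a simplified illustration and does not reproduce the exact NVMe command format.

```c
#include <stdint.h>

struct sqe {
    uint8_t  opcode;        /* command field, e.g. write or read                        */
    uint8_t  flags;
    uint16_t command_id;    /* command identifier field                                 */
    uint32_t nsid;          /* namespace the command targets                            */
    uint32_t qpn;           /* extended reserved field: QPN of the client's connection  */
    uint32_t rsvd;
    uint64_t metadata_ptr;
    uint64_t data_ptr[2];   /* location of the data in the storage server's memory      */
    uint64_t start_lba;     /* starting address of the data on the SSD                  */
    uint32_t data_len;      /* data length                                              */
    uint32_t cdw13;
    uint32_t cdw14;
    uint32_t cdw15;
};

/* Keep the 64-byte SQE size mentioned above. */
_Static_assert(sizeof(struct sqe) == 64, "SQE must be 64 bytes");
```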
  • S504 The RNIC 343 of the storage server 340 writes the write data notification information into the doorbell address of the SSD 344.
  • specifically, the RNIC 343 of the storage server 340 writes the write data notification information into the doorbell address of the SSD 344, wherein the write data notification information includes the SQ address into which the SQE was written, and the write data notification information is used to notify the SSD 344 to read the SQE at that SQ address, as sketched below.
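  • Ringing the doorbell (step S504) amounts to a single register write through the mapped doorbell address; the following inline helper is a sketch, and the barrier choice is an assumption about how ordering would be enforced.

```c
#include <stdint.h>

/* Write the new SQ tail position into the SSD's doorbell register (previously
 * mapped into the RNIC-visible address space). The barrier ensures the SQE
 * contents are visible before the SSD is told to fetch them. */
static inline void ring_sq_doorbell(volatile uint32_t *doorbell, uint32_t sq_tail)
{
    __sync_synchronize();   /* make the filled SQE visible first       */
    *doorbell = sq_tail;    /* notify the SSD that a new SQE is ready  */
}
```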
  • the SSD 344 reads the SQE in the SQ address according to the write data notification information in the doorbell address, and moves the data to be written from the memory 342 of the storage server 340 to the SSD 344 according to the content in the SQE.
  • specifically, the SSD 344 wakes up after the write data notification information is written into the doorbell address, reads the SQE at the SQ address contained in the write data notification information, determines that it is a data write operation, finds the data to be written in the memory 342 of the storage server 340 according to the address carried in the SQE, and moves the data to be written to the SSD 344 to complete persistent storage.
  • the data to be written is moved from the memory 342 of the storage server 340 to the SSD 344 without the participation of any software and CPU, and is directly completed by the SSD 344, which reduces the CPU occupation of the storage server 340 and effectively reduces the cost.
  • each SQ has a corresponding CQ
  • each CQ contains one or more CQEs
  • the size of each CQE is also 64 bytes, and its format is similar to that of the SQE shown in FIG. 6 above.
  • the RNIC 343 of the storage server 340 determines a QP corresponding to the QPN information according to the QPN information in the CQE, and uses the QP to notify the application server 310 that data writing is complete.
  • specifically, after receiving the write-command completion notification sent by the SSD 344, the RNIC 343 reads the CQE from the CQ to obtain the QPN information, determines the corresponding QP according to the QPN information, and then uses the QP to notify the application server 310 that the data writing is complete, thereby completing the entire data writing process.
  • the method flow shown in FIG. 5 describes in detail the process of writing data from the application server to the SSD.
  • the application server can also read data from the SSD.
  • the data reading process will be described in detail below, as shown in FIG. 7 .
  • the process includes:
  • the RNIC 313 of the application server 310 writes the description information of the data to be read into the memory 342 of the storage server 340 .
  • specifically, the application in the application server 310 generates a data read request and sends the data read request to the RNIC 313 of the application server 310, where the read request includes the address of the data to be read in the SSD 344 (including the start address and the data length) and the address at which the data is to be stored in the memory 342 of the storage server 340 after being read from the SSD 344.
  • the RNIC 313 of the application server 310 operates the memory 342 of the storage server 340 by using the stored remote identifier, and writes the description information of the data to be read into the memory 342 of the storage server 340, wherein the description information of the data to be read includes the starting address and data length of the data to be read in the SSD 344 and the data operation type (that is, a data read operation), etc.
  • the RNIC 343 of the storage server 340 fills in the SQE corresponding to the SQ according to the QP corresponding to the data to be read and the description information of the data to be read.
  • specifically, the RNIC 343 of the storage server 340 can determine the SQ corresponding to the QP according to the pre-saved binding relationship, and the RNIC 343 of the storage server 340 fills the SQE corresponding to the SQ according to the description information of the data to be read.
  • the RNIC 343 of the storage server 340 will expand the reserved field in the SQE, and use the reserved field to save the QPN corresponding to the QP, so that the SQE carries the QPN information.
  • specifically, the RNIC 343 of the storage server 340 writes the read data notification information into the doorbell address of the SSD 344, wherein the read data notification information includes the SQ address into which the SQE was written, and the read data notification information is used to notify the SSD 344 to read the SQE at that SQ address.
  • the SSD 344 reads the SQE in the SQ address according to the read data notification information in the doorbell address, and moves the data to be read from the SSD 344 to the memory 342 of the storage server 340 according to the content in the SQE.
  • specifically, the SSD 344 wakes up after the read data notification information is written into the doorbell address, reads the SQE at the SQ address contained in the read data notification information, determines that it is a data read operation, retrieves the data from the SSD 344 according to the address carried in the SQE, and moves it to the corresponding location in the memory 342 of the storage server 340.
  • the SSD 344 copies the QPN information in the SQE to the CQE of the CQ, and notifies the RNIC 343 that the read command is completed.
  • specifically, after the SSD 344 moves the data to the memory 342 of the storage server 340, it copies the QPN field in the SQE to the reserved field in the CQE, and then notifies the RNIC 343 that the read command is completed.
  • the RNIC 343 of the storage server 340 determines the QP corresponding to the QPN information according to the QPN information in the CQE, uses the QP to write the data to be read into the memory 312 of the application server 310, and notifies the application server 310 that the data read is completed.
  • specifically, the RNIC 343 reads the CQE from the CQ to obtain the QPN information, determines the corresponding QP according to the QPN information, uses the QP to write the data to be read into the memory 312 of the application server 310, and then notifies the application server 310 that the data reading is complete, thereby completing the entire data reading process.
  • another data writing method, in which multiple QPs are bound to one SQ, is described below. As shown in FIG. 8, the flow includes:
  • the storage server 340 receives the data to be written and the description information of the data to be written written into the memory 342 by the application server.
  • each application server connected to the storage server 340 uses its own QP to write the data and data description information generated by the application into the memory 342 of the storage server 340 through the RNIC.
  • for example, the application server 310 uses QP1 to write the data to be written and the description information of the data to be written into the memory 342 of the storage server 340, and the application server 320 uses QP2 to write the data to be written and the description information of the data to be written into the memory 342 of the storage server 340.
  • the description information of the data to be written includes the starting address and data length of the data to be written in the storage server 340, the type of data operation (ie, a data write operation), and the like.
  • the description information of the data to be written also carries a client identifier (ie, cid), and the client identifier is defined by each application server. Therefore, the client identifiers defined by different application servers may be the same. For example, the client identifier defined by the application server 310 is cid1, and the client identifier defined by the application server 320 is also cid1.
  • in addition, there is a correspondence between each application server's connection (QP) with the storage server 340 and the client identifier defined by that application server, that is, the corresponding QPN can be determined through the client identifier.
  • the RNIC 343 of the storage server 340 converts the client identifier of each application server into a local identifier, and establishes a client identifier and local identifier mapping table.
  • the RNIC 343 of the storage server 340 needs to convert the client identifier of each application server into a local unique identifier, so that different application servers can be accurately distinguished.
  • for example, the client identifier carried in the description information of the data to be written written into the memory 342 by the application server 310 is 00000001, the client identifier carried in the description information of the data to be written written into the memory 342 by the application server 320 is also 00000001, and the client identifier carried in the description information of the data to be written written into the memory 342 by the application server 330 is 00000101.
  • the RNIC 343 converts the received client identifier corresponding to each application server into a locally unique identifier, for example converting the client identifier corresponding to the application server 310 to 00000001, the client identifier corresponding to the application server 320 to 00000010, and the client identifier corresponding to the application server 330 to 00000011. It can be understood that, after the conversion, the identifier corresponding to each application server is unique, so the converted local identifier can be used to accurately distinguish different application servers.
  • the RNIC 343 will also establish a mapping table between the client identification and the local identification.
  • the RNIC 343 may use a hash table to record the mapping relationship between the client identification and the local identification.
  • the key is the local identifier
  • the value is the client identifier and the corresponding QPN.
  • the RNIC 343 can query the client identifier of each application server and the corresponding local identifier through the hash table.
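  • The mapping table described above can be sketched as follows; a real implementation might use a hash table keyed by the local identifier, while this illustration uses a fixed array and hypothetical field names for brevity.

```c
#include <stdint.h>

#define MAX_LOCAL_IDS 4096

struct id_mapping {
    int      in_use;
    uint16_t client_id;   /* cid chosen by the application server (may collide across servers) */
    uint32_t qpn;         /* QP on which the request arrived                                    */
};

static struct id_mapping id_table[MAX_LOCAL_IDS];
static uint16_t next_local_id = 1;

/* Allocate a locally unique identifier for an incoming request and remember
 * which cid/QPN it stands for. The returned value is what the RNIC writes
 * into the identifier field of the SQE instead of the original cid.
 * (Wrap-around and identifier reuse handling are omitted for brevity.) */
static uint16_t allocate_local_id(uint16_t client_id, uint32_t qpn)
{
    uint16_t local_id = next_local_id++ % MAX_LOCAL_IDS;
    id_table[local_id].in_use = 1;
    id_table[local_id].client_id = client_id;
    id_table[local_id].qpn = qpn;
    return local_id;
}
```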
  • S803 The RNIC 343 of the storage server 340 fills the SQE corresponding to the SQ according to the description information of the data to be written.
  • specifically, the RNIC 343 can determine the SQ corresponding to each QP according to the pre-saved binding relationship, and then fill the SQE corresponding to the SQ according to the description information of the data to be written. It is worth noting that, in the process of filling the SQE, the RNIC 343 fills the identifier field in the SQE with the local identifier corresponding to the application server; for example, for the application server 320, the RNIC 343 fills this field with 00000010 instead of 00000001.
  • S804: The RNIC 343 writes the write data notification information into the doorbell address of the SSD 344, where the write data notification information includes the SQ address into which the SQE was written, and the write data notification information is used to notify the SSD 344 to read the SQE at that SQ address.
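Steps S803 and S804 can be illustrated together by the following C sketch: the description information is copied into a simplified SQE whose identifier field is overwritten with the local identifier, and the new SQ tail is then written to the memory-mapped doorbell address of the SSD 344. The SQE layout shown is a deliberately simplified stand-in, not the exact NVMe command format referred to in FIG. 6, and all names are assumptions made for the example:

    #include <stdint.h>

    /* Deliberately simplified stand-in for a 64-byte SQE; only the fields used
     * in this illustration are shown.                                          */
    struct sqe {
        uint8_t  opcode;      /* data operation type                    */
        uint16_t identifier;  /* filled with the LOCAL identifier       */
        uint64_t addr;        /* address of the data in the memory 342  */
        uint32_t length;      /* data length                            */
    };

    struct sq {
        struct sqe        *entries;   /* SQ ring located in the memory 342      */
        uint32_t           depth;
        uint32_t           tail;
        volatile uint32_t *doorbell;  /* mapped doorbell address of the SSD 344 */
    };

    /* S803/S804: fill the next SQE from the description information and ring
     * the doorbell so that the SSD 344 fetches and executes the entry.        */
    static void post_request(struct sq *q, uint8_t op, uint64_t addr,
                             uint32_t length, uint32_t local_id)
    {
        struct sqe *e = &q->entries[q->tail];

        e->opcode     = op;
        e->identifier = (uint16_t)local_id;   /* replaces the client identifier */
        e->addr       = addr;
        e->length     = length;

        q->tail      = (q->tail + 1) % q->depth;
        *q->doorbell = q->tail;               /* write data notification        */
    }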
  • S805: The SSD 344 reads the SQE at the SQ address according to the write data notification information in the doorbell address, and moves the data to be written from the memory 342 of the storage server 340 to the SSD 344 according to the content of the SQE.
  • Specifically, the SSD 344 wakes up after receiving the write data notification information written into the doorbell address, then reads the SQE at the SQ address contained in the write data notification information and determines that it is a data write operation; according to the address carried in the SQE, the data to be written is found in the memory 342 of the storage server 340 and moved to the SSD 344 to complete persistent storage.
  • S806: After completing the persistent storage of the data, the SSD 344 fills in a CQE in the CQ corresponding to the SQ; the format of the CQE is consistent with the format of the SQE, and the CQE also contains an identifier field, which stores the local identifier corresponding to the application server. The SSD 344 then notifies the RNIC 343 that the write command is complete.
  • S807: The RNIC 343 of the storage server 340 queries the client identifier and local identifier mapping table according to the local identifier in the CQE, determines the client identifier corresponding to the local identifier and thereby the QPN corresponding to that client identifier, and uses the QP corresponding to the QPN to notify the application server that the data write is complete.
  • Specifically, after receiving the write command completion notification sent by the SSD 344, the RNIC 343 reads the CQE from the CQ to obtain the local identifier, queries the client identifier and local identifier mapping table according to the local identifier to obtain the client identifier and QPN corresponding to the local identifier, determines the corresponding QP according to the QPN, and finally uses the QP to notify the application server that the data write is complete, thereby completing the entire data write process.
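Steps S806 and S807 can be sketched in the same illustrative style: the RNIC reads the CQE, recovers the local identifier, looks it up in the mapping table to obtain the QPN, and replies on the QP identified by that QPN. Here local_id_lookup refers to the mapping sketch above, and qp_send_completion is a placeholder for whatever RDMA send primitive a concrete RNIC stack provides:

    #include <stdint.h>

    /* Simplified stand-in for a CQE; only the fields used here are shown. */
    struct cqe {
        uint16_t identifier;   /* local identifier copied in by the SSD 344 */
        uint16_t status;
    };

    /* From the mapping sketch above. */
    int local_id_lookup(uint32_t local_id, uint32_t *cid, uint32_t *qpn);

    /* Placeholder for the RDMA send primitive of a concrete RNIC stack. */
    void qp_send_completion(uint32_t qpn, uint16_t status);

    /* S806/S807: route the completion back to the right application server. */
    static int complete_write(const struct cqe *c)
    {
        uint32_t cid, qpn;

        if (local_id_lookup(c->identifier, &cid, &qpn) != 0)
            return -1;                    /* unknown local identifier           */
        (void)cid;                        /* the QPN alone selects the reply QP */

        qp_send_completion(qpn, c->status);
        return 0;
    }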
  • It can be seen that, in the process of writing the data to be written into the SSD 344, when there are multiple application servers (that is, multiple QPs) and the client identifiers defined by the application servers may be the same, the multiple QPs are bound to one SQ, the client identifier corresponding to each QP is converted into a locally unique identifier, and the converted local identifier is saved in the identifier field of the SQE. After the data write is completed, the corresponding client identifier and QP can be accurately found by querying the client identifier and local identifier mapping table according to the local identifier in the CQE, so that different application servers are accurately distinguished and the completion message is returned to the right one. This can effectively support large-scale networking connections and expands the applicable scenarios.
  • The method procedure shown in FIG. 8 describes in detail the process of writing data from the application server to the SSD. Correspondingly, the application server can also read data from the SSD. The data read process is described in detail below, as shown in FIG. 9. The process includes:
  • S901: The storage server 340 receives the description information of the data to be read written into the memory 342 by the application server.
  • Specifically, each application server uses its respective QP to write the description information of the data to be read into the memory 342 of the storage server 340 through the RNIC, where the description information of the data to be read includes the starting address and data length of the data to be read in the SSD 344 and the data operation type (i.e., a data read operation), and the like.
  • In addition, the description information of the data to be read also carries the client identifier. For the specific process, reference may be made to the relevant description in S801 above, which is not repeated here.
  • S902: The RNIC 343 of the storage server 340 converts the client identifier of each application server into a local identifier, and establishes a client identifier and local identifier mapping table.
  • the RNIC 343 can use a hash table to record the mapping relationship between the client identifier and the local identifier. For the specific process, refer to the relevant description in S802 above.
  • S903: The RNIC 343 of the storage server 340 fills the SQE corresponding to the SQ according to the description information of the data to be read.
  • Specifically, the RNIC 343 determines the SQ corresponding to each QP according to the pre-saved binding relationship, then fills the SQE corresponding to that SQ according to the description information of the data to be read, and fills the converted local identifier of the application server into the identifier field of the SQE; for the specific process, reference may be made to the relevant description in S803 above.
  • S904: The RNIC 343 writes the read data notification information into the doorbell address of the SSD 344, where the read data notification information includes the SQ address into which the SQE was written, and the read data notification information is used to notify the SSD 344 to read the SQE at that SQ address.
  • S905: The SSD 344 reads the SQE at the SQ address according to the read data notification information in the doorbell address, and moves the data to be read from the SSD 344 to the memory 342 of the storage server 340 according to the content of the SQE.
  • Specifically, the SSD 344 wakes up after receiving the read data notification information written into the doorbell address, then reads the SQE at the SQ address contained in the read data notification information and determines that it is a data read operation; according to the address carried in the SQE, the data to be read is found in the SSD 344 and moved to the memory 342 of the storage server 340.
  • S906: After completing the data movement, the SSD 344 fills in a CQE in the CQ corresponding to the SQ; the format of the CQE is consistent with the format of the SQE, and the CQE also contains an identifier field, which stores the local identifier corresponding to the application server. The SSD 344 then notifies the RNIC 343 that the read command is complete.
  • S907: The RNIC 343 of the storage server 340 queries the client identifier and local identifier mapping table according to the local identifier in the CQE, determines the client identifier corresponding to the local identifier and thereby the QPN corresponding to that client identifier, uses the QP corresponding to the QPN to write the data to be read into the memory of the application server, and then notifies the application server that the data read is complete.
  • Specifically, after receiving the read command completion notification sent by the SSD 344, the RNIC 343 reads the CQE from the CQ to obtain the local identifier, queries the client identifier and local identifier mapping table according to the local identifier to obtain the client identifier and QPN corresponding to the local identifier, determines the corresponding QP according to the QPN, and finally uses the QP to write the data to be read into the memory of the application server and notifies the application server that the data read is complete, thereby completing the entire data read process.
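For the read path the completion handling differs only in that the data moved into the memory 342 is first written back to the application server over the QP before the completion notification is sent. A minimal sketch under the same assumptions, with qp_rdma_write and qp_send_completion as placeholders for the RDMA primitives of a concrete RNIC stack and local_id_lookup referring to the mapping sketch above:

    #include <stdint.h>

    /* From the mapping sketch above. */
    int local_id_lookup(uint32_t local_id, uint32_t *cid, uint32_t *qpn);

    /* Placeholders for the RDMA primitives of a concrete RNIC stack. */
    void qp_rdma_write(uint32_t qpn, const void *local_buf, uint32_t len);
    void qp_send_completion(uint32_t qpn, uint16_t status);

    /* S907: push the read data to the application server, then signal completion. */
    static int complete_read(uint16_t local_id, const void *buf, uint32_t len)
    {
        uint32_t cid, qpn;

        if (local_id_lookup(local_id, &cid, &qpn) != 0)
            return -1;
        (void)cid;

        qp_rdma_write(qpn, buf, len);   /* data to be read -> application server memory */
        qp_send_completion(qpn, 0);     /* notify that the data read is complete        */
        return 0;
    }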
  • FIG. 10 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • As shown in FIG. 10, the network device 10 includes a receiving unit 11 and a sending unit 12, wherein:
  • a receiving unit 11, configured to receive access requests sent by multiple clients connected to the network device 10;
  • a sending unit 12, configured to send the access requests to one access queue of the storage unit;
  • the receiving unit 11 is further configured to receive the processing results of the access requests of the multiple clients returned after the storage unit executes the access requests in the access queue;
  • the sending unit 12 is further configured to return the processing result of an access request returned by the storage unit to the client corresponding to that access request.
  • In an embodiment, the network device 10 further includes a storage unit 13; the storage unit 13 is configured to store the correspondence between the information of the multiple clients and the access queue, and the sending unit 12 is specifically configured to send the access requests of the multiple clients to the access queue of the storage unit according to the mapping relationship.
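As an illustration of the correspondence held by the storage unit 13, the dispatch performed by the sending unit 12 can be pictured as a small lookup from connection information (a QPN in the RDMA case) to the bound access queue; several QPNs may map to the same queue, which is the point of the scheme. The names and the use of a plain array are assumptions made for the example:

    #include <stddef.h>
    #include <stdint.h>

    #define MAX_QPN 4096

    struct access_queue;   /* an access queue (SQ) of the storage unit, opaque here */

    /* Correspondence between client information (here the QPN of the RDMA
     * connection) and the bound access queue; several QPNs may point to the
     * same queue.                                                            */
    static struct access_queue *qpn_to_queue[MAX_QPN];

    static struct access_queue *lookup_access_queue(uint32_t qpn)
    {
        return (qpn < MAX_QPN) ? qpn_to_queue[qpn] : NULL;
    }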
  • In an embodiment, the access request includes data description information; the network device 10 further includes a processing unit 14, and the processing unit 14 is configured to fill the data description information into the SQE corresponding to the access queue, and to save the QPN information corresponding to the multiple clients in the reserved field of the SQE.
  • In an embodiment, the processing unit 14 is further configured to determine, according to the QPN information in the CQE of the completion queue corresponding to the access queue, the client corresponding to the processing result of the access request returned by the storage unit, where the QPN information in the CQE is obtained by the storage unit by copying the QPN information in the SQE after executing the access request in the access queue; the sending unit 12 is specifically configured to return the processing result to the client corresponding to the access request according to the QP corresponding to the QPN information.
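A minimal sketch of this QPN-based variant, with an assumed SQE/CQE layout rather than the exact NVMe format: the processing unit 14 stores the QPN in a reserved field of the SQE, the storage unit copies that field into the CQE on completion, and the sending unit 12 reads it back to select the QP on which to reply:

    #include <stdint.h>

    /* Assumed, simplified SQE/CQE layouts with an explicit reserved field. */
    struct sqe_r { uint8_t  opcode; uint64_t addr; uint32_t length; uint32_t reserved_qpn; };
    struct cqe_r { uint16_t status; uint32_t reserved_qpn; };

    /* Network device side: stash the QPN of the requesting client in the SQE. */
    static void fill_sqe(struct sqe_r *e, uint8_t op, uint64_t addr,
                         uint32_t len, uint32_t qpn)
    {
        e->opcode       = op;
        e->addr         = addr;
        e->length       = len;
        e->reserved_qpn = qpn;               /* QPN saved in the reserved field */
    }

    /* Storage unit side: after executing the request, copy the QPN into the CQE. */
    static void fill_cqe(struct cqe_r *c, const struct sqe_r *e, uint16_t status)
    {
        c->status       = status;
        c->reserved_qpn = e->reserved_qpn;
    }

    /* Network device side: the QPN recovered from the CQE selects the reply QP. */
    static uint32_t reply_qpn(const struct cqe_r *c) { return c->reserved_qpn; }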
  • In an embodiment, the access request includes data description information, and the data description information carries a client identifier; the processing unit 14 is further configured to convert the client identifier into a local identifier and to establish a mapping table between the client identifier and the local identifier, where the local identifier is used to uniquely identify each of the multiple clients.
  • In an embodiment, the processing unit 14 is further configured to: fill the data description information into the SQE corresponding to the access queue, where the SQE includes the local identifier; and, according to the local identifier in the CQE of the completion queue corresponding to the access queue, query the client identifier and local identifier mapping table to determine the client identifier corresponding to the local identifier and the client corresponding to the processing result of the access request returned by the storage unit. The sending unit 12 is specifically configured to return the processing result to the client corresponding to the access request according to the QP corresponding to the client identifier.
  • It should be understood that the above structure of the network device is merely an example and should not constitute a specific limitation; the units of the network device may be added, reduced or combined as required. In addition, the operations and/or functions of the units in the network device are respectively intended to implement the corresponding procedures of the methods described in FIG. 4, FIG. 5, FIG. 7, FIG. 8 and FIG. 9; for brevity, details are not described here again.
  • FIG. 11 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • the computing device 20 includes a processor 21, a communication interface 22 and a memory 23, and the processor 21, the communication interface 22 and the memory 23 are connected to each other through an internal bus 24.
  • the computing device 20 may be the network device in FIG. 3 .
  • the functions performed by the network device in FIG. 3 are actually performed by the processor 21 of the network device.
  • the processor 21 may be composed of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip.
  • the above-mentioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
  • the bus 24 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus or the like.
  • the bus 24 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 11, but it does not mean that there is only one bus or one type of bus.
  • the memory 23 may include a volatile memory, such as a random access memory (RAM); the memory 23 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 23 may also include a combination of the above types.
  • the program code may be used to implement the functional units shown in the network device 10, or to implement the method steps, with the network device as the execution body, in the method embodiments shown in FIG. 4, FIG. 5, FIG. 7, FIG. 8 and FIG. 9.
  • Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, some or all of the steps described in the above method embodiments can be implemented, and the function of any one of the functional units described in FIG. 10 can be implemented.
  • Embodiments of the present application also provide a computer program product which, when run on a computer or a processor, causes the computer or the processor to execute one or more steps of any one of the above methods. If the constituent units of the above-mentioned device are implemented in the form of software functional units and sold or used as independent products, they may be stored in the computer-readable storage medium.
  • It should also be understood that the sequence numbers of the above-mentioned processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer And Data Communications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a data access method and a related device. The method includes: receiving, by a network device, access requests sent by multiple clients connected to the network device, and sending the access requests to one access queue of a storage unit; executing, by the storage unit, the access requests in the access queue, and returning the processing results of the access requests of the multiple clients; and returning, by the network device, the processing results of the access requests returned by the storage unit to the clients corresponding to the access requests. The above method can use one access queue to process access requests sent by multiple clients, so that one access queue corresponds to multiple clients, which breaks through the inherent limit on the number of access queues, supports large-scale networking connections, and expands the applicable scenarios.

Description

一种数据访问方法及相关设备 技术领域
本发明涉及存储技术领域,尤其涉及一种数据访问方法及相关设备。
背景技术
随着近年来大数据、云计算以及人工智能等计算机信息技术的快速发展,全球互联网数据规模呈指数级增长,众多高并发、低时延应用对高性能硬件的需求催生了高性能存储器的出现。对于高性能存储器而言,由于其I/O吞吐能力强大,分布式文件系统需要分配较多的计算资源以完成数据处理与数据交换,从而导致系统的传输时延增加,限制了网络传输能力和系统性能。为了解决这个问题,远程直接内存访问(remote direct memory access,RDMA)应运而生,RDMA是一种直接进行远程内存存取的技术,即可以直接将数据从一个系统快速迁移到另一个远程系统存储器中,而不对操作系统造成任何影响,减少了中央处理器(central processing unit,CPU)参与数据传输过程的消耗,解放了内存带宽,进而提升了系统处理业务的性能,具有高带宽、低时延及低CPU占用率的特点。
目前使用RDMA在读写数据时,主机的网络设备首先通过RDMA操作将数据写入到存储设备的内存,存储设备中的CPU需要将内存中的数据再存储至持久性的存储介质,例如固态硬盘(solid state disk,SSD)中。而通过CPU将内存中的数据存储至持久性的存储介质需要消耗CPU资源,从而影响主机与存储设备之间的通信,此外,由于SSD的提交队列(submission queue,SQ)和完成队列(completion queue,CQ)资源有限,导致存储设备只能支持少量的网络设备连接,无法支持大规模网络设备连接。
因此,如何实现在大规模组网连接的场景下,主机的网络设备直接将数据存储至持久性的存储介质,减少对存储设备的CPU占用是目前亟待解决的问题。
发明内容
本发明实施例公开了一种数据访问方法及相关设备,能够在大规模组网连接下直接将数据进行持久化存储,减少对存储设备的CPU占用,扩展适用场景。
第一方面,本申请提供了一种存储设备,包括:网络设备和存储单元,所述存储单元通过所述网络设备连接至多个客户端,所述网络设备用于将所述多个客户端的访问请求发送至所述存储单元的一个访问队列;所述存储单元用于执行所述访问队列中的访问请求,并返回所述访问请求的处理结果;所述网络设备还用于将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端。
可选的,网络设备可以是支持远程直接内存访问的网络接口控制器RNIC、现场可编程门阵列(field programmable gate array,FPGA)、专用集成电路(application specific integrated circuit,ASIC)芯片等。
在本申请提供的方案中,网络设备将多个客户端的访问请求发送给一个访问队列进行处理,并将访问请求的处理结果返回给对应的客户端,实现了一个访问队列对应多个客户端, 突破了访问队列的固有数量限制,能够支持大规模组网连接,扩展了适用场景。
结合第一方面,在第一方面一种可能的实现方式中,网络设备中存储有所述多个客户端的信息与所述访问队列的对应关系,所述网络设备用于将所述多个客户端的访问请求通过所述对应关系发送至所述存储单元的所述访问队列。
在本申请提供的方案中,网络设备预先存储了客户端的信息与访问队列的对应关系,并通过该对应关系将多个客户端的访问请求发送至一个访问队列,实现了一个访问队列可以处理多个客户端的请求,从而保证存储设备能够支持大规模组网连接。
结合第一方面,在第一方面一种可能的实现方式中,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备在接收到所述多个客户端中的任意一个客户端的访问请求时,根据所述访问请求中携带的所述客户端对应的连接信息及所述对应关系确定所述访问队列;将所述连接信息及所述访问请求发送至所述访问队列;所述存储单元在返回所述访问请求的处理结果的同时,返回所述连接信息;所述网络设备根据所述连接信息确定所述访问请求对应的客户端,并将所述处理结果返回给所述访问请求对应的客户端。
在本申请提供的方案中,网络设备通过将连接信息以及访问请求同时发送至访问队列,从而对多个客户端进行准确区分,并在存储单元返回处理结果的同时返回所述连接信息,根据返回的连接信息确定所述访问请求对应的客户端,从而实现了在大规模组网连接时,多个客户端同时对应一个访问队列时,可以准确的对多个客户端进行区分并返回处理结果,有效扩展了适用场景。
结合第一方面,在第一方面一种可能的实现方式中,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备在接收到所述多个客户端中的任意一个客户端的访问请求时,为所述访问请求中携带的客户端标识分配一个本地标识,所述本地标识用于唯一标识所述客户端,并建立所述客户端标识、本地标识与所述客户端对应的连接信息的对应关系;将所述访问请求中携带的客户端标识替换为本地标识;将所述访问请求发送至所述连接信息对应的访问队列中;所述网络设备接收到所述存储单元返回的所述访问请求的处理结果时,从所述处理结果获取所述本地标识,根据所述本地标识确定所述客户端对应的连接信息,并将所述处理结果返回给所述连接信息对应的客户端。
应理解,客户端标识是由客户端自己定义的,可能存在不同的客户端所定义的客户端标识是相同的情况,因此通过客户端标识不能准确对各个客户端进行区分,本地标识是由网络设备对每个客户端的客户端标识进行转换得到的,具有唯一性,每个客户端对应的本地标识都不一样,所以通过本地标识可以准确的对各个客户端进行区分。
在本申请提供的方案中,网络设备为访问请求中的客户端标识分配一个本地标识,用以唯一标识该客户端,然后建立客户端标识、本地标识以及与所述客户端对应的连接信息的对应关系,从而避免了在不同客户端所定义的客户端标识相同时无法区分的情况,能够对多个客户端进行准确区分,并在存储单元返回处理结果之后,根据本地标识确定所述客户端对应的连接信息,从而实现了在大规模组网连接时,多个客户端同时对应一个访问队列时,可以准确的对多个客户端进行区分并返回处理结果,有效扩展了适用场景。
结合第一方面,在第一方面一种可能的实现方式中,所述多个客户端与所述网络设备建 立的为远程直接内存访问RDMA连接,所述连接信息为建立RDMA连接时生成的队列对QP。
第二方面,本申请提供了一种数据访问方法,所述方法包括:网络设备接收与所述网络设备连接的多个客户端发送的访问请求,并将所述访问请求发送至存储单元的一个访问队列;所述网络设备接收所述存储单元执行所述访问队列中的访问请求后返回的所述访问请求的处理结果;所述网络设备将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端。
结合第二方面,在第二方面的一种可能的实现方式中,所述网络设备中存储有所述多个客户端的信息与所述访问队列的对应关系,所述网络设备根据所述映射关系将所述多个客户端的访问请求发送至所述存储单元的所述访问队列。
结合第二方面,在第二方面的一种可能的实现方式中,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备将所述多个客户端的访问请求发送至所述存储单元的一个访问队列,包括:在接收到所述多个客户端中的任意一个客户端的访问请求时,根据所述访问请求中携带的所述客户端对应的连接信息及所述对应关系确定所述访问队列;将所述连接信息及所述访问请求发送至所述访问队列;所述处理单元返回的所述处理结果包括所述连接信息,所述网络设备将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端,包括:所述网络设备根据所述连接信息确定所述访问请求对应的客户端,并将所述处理结果返回给所述访问请求对应的客户端。
结合第二方面,在第二方面的一种可能的实现方式中,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备将所述多个客户端的访问请求发送至所述存储单元的一个访问队列,包括:在接收到所述多个客户端中的任意一个客户端的访问请求时,为所述访问请求中携带的客户端标识分配一个本地标识,所述本地标识用于唯一标识所述客户端,并建立所述客户端标识、本地标识与所述客户端对应的连接信息的对应关系;将所述访问请求中携带的客户端标识替换为本地标识;将所述访问请求发送至所述连接信息对应的访问队列中;所述网络设备将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端,包括:所述网络设备接收到所述存储单元返回的所述访问请求的处理结果时,从所述处理结果获取所述本地标识,根据所述本地标识确定所述客户端对应的连接信息,并将所述处理结果返回给所述连接信息对应的客户端。
结合第二方面,在第二方面的一种可能的实现方式中,所述多个客户端与所述网络设备建立的为远程直接内存访问RDMA连接,所述连接信息为建立RDMA连接时生成的队列对QP。
第三方面,本申请提供了一种网络设备,包括:接收单元,用于接收与所述网络设备连接的多个客户端发送的访问请求;发送单元,用于将所述访问请求发送至存储单元的一个访问队列;所述接收单元,还用于接收所述存储单元执行所述访问队列中的访问请求后返回的所述访问请求的处理结果;所述发送单元,还用于将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端。
结合第三方面,在第三方面的一种可能的实现方式中,所述网络设备还包括存储单元,所述存储单元,用于存储所述多个客户端的信息与所述访问队列的对应关系,所述发送单元, 具体用于:根据所述映射关系将所述多个客户端的访问请求发送至所述存储单元的所述访问队列。
结合第三方面,在第三方面的一种可能的实现方式中,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备还包括处理单元,所述处理单元,用于在接收到所述多个客户端中的任意一个客户端的访问请求时,根据所述访问请求中携带的所述客户端对应的连接信息及所述对应关系确定所述访问队列;所述发送单元,具体用于将所述连接信息及所述访问请求发送至所述访问队列;所述发送单元,还用于根据所述连接信息确定所述访问请求对应的客户端,并将所述处理结果返回给所述访问请求对应的客户端。
结合第三方面,在第三方面的一种可能的实现方式中,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备还包括处理单元,所述处理单元,用于在接收到所述多个客户端中的任意一个客户端的访问请求时,为所述访问请求中携带的客户端标识分配一个本地标识,所述本地标识用于唯一标识所述客户端,并建立所述客户端标识、本地标识与所述客户端对应的连接信息的对应关系,将所述访问请求中携带的客户端标识替换为本地标识;所述发送单元,具体用于将所述访问请求发送至所述连接信息对应的访问队列中;所述处理单元,还用于在接收到所述存储单元返回的所述访问请求的处理结果时,从所述处理结果获取所述本地标识,根据所述本地标识确定所述客户端对应的连接信息;所述发送单元,还用于根据所述连接信息确定所述访问请求对应的客户端,并将所述处理结果返回给所述访问请求对应的客户端。
结合第三方面,在第三方面的一种可能的实现方式中,所述多个客户端与所述网络设备建立的为远程直接内存访问RDMA连接,所述连接信息为建立RDMA连接时生成的队列对QP。
第四方面,本申请提供了一种计算设备,所述计算设备包括处理器和存储器,所述处理器和所述存储器通过内部总线相连,所述存储器中存储有指令,所述处理器调用所述存储器中的指令以执行上述第而方面以及结合上述第二方面中的任意一种实现方式所提供的数据访问的方法。
第五方面,本申请提供了一种计算机存储介质,所述计算机存储介质存储有计算机程序,当所述计算机程序被处理器执行时,可以实现上述第二方面以及结合上述第二方面中的任意一种实现方式所提供的数据访问方法的流程。
第六方面,本申请提供了一种计算机程序产品,该计算机程序包括指令,当该计算机程序被计算机执行时,使得计算机可以执行上述第二方面以及结合上述第二方面中的任意一种实现方式所提供的数据访问方法的流程。
附图说明
为了更清楚地说明本发明实施例技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施例,对于本领域普通技术 人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种数据写入固态硬盘的示意图;
图2是本申请实施例提供的另一种数据写入固态硬盘的示意图;
图3是本申请实施例提供的一种系统架构的示意图;
图4是本申请实施例提供的一种连接建立方法的流程示意图;
图5是本申请实施例提供的一种数据写入方法的流程示意图;
图6是本申请实施例提供的一种提交队列描述结构格式示意图;
图7是本申请实施例提供的一种数据读取方法的流程示意图;
图8是本申请实施例提供的另一种数据写入方法的流程示意图;
图9是本申请实施例提供的另一种数据读取方法的流程示意图;
图10是本申请实施例提供的一种网络设备的结构示意图;
图11是本申请实施例提供的一种计算设备的结构示意图。
具体实施方式
下面结合附图对本申请实施例中的技术方案进行清楚、完整的描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。
首先,结合附图对本申请中所涉及的部分用语和相关技术进行解释说明,以便于本领域技术人员理解。
主机又可以称为客户端,具体可以包括物理机、虚拟机、容器等,用于产生或消费数据,例如应用服务器、分布式文件系统服务器等。
主机的网络设备为主机用于数据通信的设备,具体可以包括网络接口控制器(network interface controller,NIC)、RNIC等。
主机的访问请求主要包括数据读写操作,即主机将产生的数据写入到存储设备的存储单元中,或者从存储设备的存储单元中读取数据。
存储设备又可以称为服务端,具体可以包括外置集中式存储或分布式存储等形态的能够存储数据的设备,例如存储服务器、分布式数据库服务器等。
存储设备的网络设备为存储设备用于数据通信的设备,具体可以包括NIC、RNIC等,存储设备的存储单元为存储设备用于进行数据持久化存储的设备,例如SSD等。
固态存储(SSD)的提交队列(submission queue,SQ)和门铃(doorbell):在存储设备中,存储设备的CPU和SSD通过NVMe协议通信,在存储设备启动时的初始化阶段,存储设备的CPU会通过所述NVMe协议在所述存储设备的内存中为所述SSD建立提交队列(submission queue,SQ)和完成队列(completion queue,CQ),在SSD中创建门铃,CPU把发送给SSD的命令存储在SQ中,并将所述命令在所述SQ的位置写在所述门铃中,并通过所述SSD从所述SQ中获取命令执行,当所述SSD执行完一个命令后,将执行完的命令的信息存储在所述完成队列中,CPU通过读取所述完成队列中的执行完的命令的信息,即可确定执行完成的命令,并将执行完成的命令从所述发送队列中删除。
RDMA通信协议是一种用于进行RDMA操作的计算设备所遵循的一套协议规范,目前有三种支持RDMA的通信协议,分别是无限带宽(infiniBand,IB)协议、以太网(RDMA over  converged ethernet,RoCE)协议、因特网广域(internet wide area RDMA protocal,IWARP)协议,这三种协议都可以使用同一套API来使用,但它们有着不同的物理层和链路层。在主机设备与存储设备之间通过RDMA进行通信时,会在主机的网卡中创建发送队列(send queue,SQ),相应的会在存储设备的网卡中创建与该发送队列相对应的接收队列(receive queue,RQ),发送队列和接收队列形成队列对(queue pair,QP),将所述队列的地址映射给应用的虚拟地址,应用即可直接通过所述QP将数据传输至所述存储设备的网卡中,进而可以将数据存储于存储设备的内存中。
目前在主机通过RDMA传输数据的时候,会首先将数据传输至存储设备的内存中,然后通过存储设备的CPU将数据从内存搬移至SSD。如图1所示,其示出了一种数据写入场景示意图,主机110中的网络设备1110首先通过RDMA操作将数据写入到存储设备120的网络设备(即RNIC1240中),然后RNIC1240在CPU1210中的RNIC驱动1211协助下将数据写入到内存1220中,存储设备120中的存储软件1212通过事件或中断的方式感知到数据被写入到内存1220中,之后CPU1210通过SSD驱动1213控制SSD1230将数据从内存1220搬移至SSD1230中进行持久化存储,SSD1230在完成对数据持久化存储之后,通过中断的方式告知存储软件1212,最后CPU1210通过RNIC1240向主机110返回写完成通知消息。
由于在对数据进行持久化存储时,需要CPU(包括RNIC驱动、存储软件和SSD驱动)的参与才能完成整个存储过程,这样将消耗大量CPU资源。
为了减少对CPU的占用,降低处理时延,将数据直接写入到SSD中,可以利用SSD的SQ地址和主机与存储设备之间的QP进行一一绑定从而实现。如图2所示,其示出了另一种数据写入场景示意图,应用服务器210、应用服务器220、应用服务器230通过RDMA网络与存储服务器240进行连接,每个应用服务器的结构都是类似的,以应用服务器210为例,应用服务器210包括CPU211和内存212,并连接有RNIC213;存储服务器240包括CPU241和内存242,并连接有RNIC243和持久化存储介质,这里以SSD244为例进行说明,应理解,持久化存储介质包括但不限于SSD244。每一个应用服务器与存储服务器连接形成一个QP,因此存储服务器240的内存242中存在多个QP,例如QP1、QP2和QP3,每一个QP对应存储服务器240和一个应用服务器的连接,此外,SSD244中包括多个SQ和CQ,其最大数量可支持到64k,但基于内存和性能考虑,目前典型选用256个。以应用服务器210为例说明SQ和QP的具体绑定过程,应用服务器210将数据通信需要的内存212注册给RNIC213,存储服务器240将数据通信需要的内存242注册给RNIC243,以使得RNIC213和RNIC243可以通过RDMA方式操作内存242和内存212,同时存储服务器240将SSD244的SQ与所保存的QP进行一一绑定,例如将与应用服务器210连接的QP1与SQ1进行绑定,然后对SQ1的地址进行映射,并将映射后得到的虚拟地址注册给RNIC243,RNIC243将SQ1地址通过RDMA连接发送给RNIC213,以使得RNIC213可以直接远程操作SSD244的SQ1地址。在数据写入时,应用服务器210通过CPU211产生数据并将产生的数据存储至内存212中,然后通过RNIC213将数据写入存储服务器240的内存242中,并根据SSD244的SQ地址通知SSD244将内存242中的数据搬移至SSD244中进行持久化存储,若其它应用服务器需要向存储服务器240写入数据,其过程与上述类似,在此不再赘述。
需要说明的是,上述方案通过将QP与SSD中的SQ进行一一绑定,能够绕过存储服务器的CPU和软件参与,直接将数据写入SSD中,但是受限于SSD中的SQ数量,当连接数量过多时,上述方案将不再适用,即无法支持大规模组网场景。
基于上述,本申请提供了一种数据访问方法,在存储设备的连接数量远远超过SSD所支持的SQ数量时,通过扩展SSD的提交队列描述结构格式(SQE)或对应用服务器的客户端标识进行转换,以使得存储设备可以将多个客户端的访问请求发送至存储单元的一个访问队列,即存储设备的多个连接可以绑定至一个SQ,从而支持大规模组网连接,扩展了适用场景。
本申请实施例的技术方案可以应用于任何需要远程访问持久性存储介质的系统中,尤其是连接数较多的大规模组网场景,例如,分布式存储、高性能计算(high performance computing,HPC)等。例如,在分布式存储中,存储设备将同时连接大量的应用服务器,存储设备需要支持每个应用服务器直接访问SSD时,分布式存储系统中可使用本申请实施例提供的数据访问方法,从而可以解决数据在进行读写时存在的带宽瓶颈,提高数据读写效率。
图3示出了本申请实施例的一种系统架构的示意图。如图3所示,该系统300包括:应用服务器310、应用服务器320、应用服务器330和存储服务器340,应用服务器310、应用服务器320和应用服务器330通过RDMA网络与存储服务器220进行连接,应用服务器310包括CPU311和内存312,并连接有RNIC213,应用服务器320和应用服务器330的结构与应用服务器310类似;存储服务器340包括CPU341和内存342,并连接有RNIC343和存储单元,这里以SSD344为例进行说明,应理解,存储单元包括但不限于SSD344。由于应用服务器310、应用服务器320和应用服务器330同时与存储服务器340连接,所以存储服务器340的内存342中存在三个QP,即QP1、QP2和QP3,存储服务器240将QP1和QP2与SQ1进行绑定,将QP3与SQ2进行绑定。在完成上述流程之后,可以进一步进行数据的读写操作,例如以应用服务器310向存储服务器340写入数据为例,应用服务器310通过CPU311产生数据并将产生的数据存储至内存312中,然后通过RNIC313将数据以及数据描述信息写入存储服务器340的内存242中,其中,数据描述信息包括数据在内存242中的起始地址、数据长度、操作类型等,为与NVMe协议进行兼容和匹配,数据描述信息的大小优选为64字节,存储服务器340在接收到应用服务器310写入的数据之后,RNIC343根据预先设定的绑定关系,确定与应用服务器310连接QP1对应的SQ为SQ1,应用服务器340根据数据描述信息,填充SQ1对应的提交队列描述结构格式(SQE)字段,并利用该SQE中的保留(reserved)字段保存QP编号(QP number,QPN),即QP1对应的编号,或者是,将数据描述信息中携带的客户端标识转换为本地标识,并将其保存在SQE中,然后通知SSD344将内存342中的数据搬移至SSD344中进行持久化存储,SSD344在完成存储之后,将QPN信息拷贝到CQ的CQE内,存储服务器340根据CQE确定相应的QPN,或者是,存储服务器340根据CQE中的本地标识确定对应的客户端标识及相应的QPN,进而根据QPN找到对应的QP,然后通过该QP向应用服务器310回复写数据完成消息。
在本申请实施例中,RNIC313、RNIC323和RNI333可以是可编程RNIC,SSD344是可编程SSD,可以主动感知SQ的完成状态并主动上报,应用服务器310、应用服务器320、应用服务器330和存储服务器340包括物理机、虚拟机、容器等形态,可以部署在云环境上的一个或 多个计算设备(例如中心服务器),或者边缘环境中的一个或多个计算设备(例如服务器)上。
可以看出,图3所示的数据访问系统与图2所示的数据访问系统相比,存储服务器可以将与多个应用服务器的连接(QP)同时绑定至同一个SQ,将多个应用服务器的访问请求发送至一个SQ,突破了SSD的SQ固有数量限制,能够支持大规模组网连接,扩展了适用场景。
结合图3所示的系统架构的示意图,下面将结合图4描述本申请实施例提供的数据访问方法。首先对数据访问之前的连接建立和内存注册过程进行描述,以应用服务器310与存储服务器340建立连接为例进行说明,其它应用服务器与应用服务器310类似,如图4所示,该流程包括:
S401:应用服务器310和存储服务器340建立RDMA连接。
可选的,应用服务器310和存储服务器340可以是基于IB、RoCE或IWARP任一个协议建立RDMA连接。
具体地,应用服务器310和存储服务器340将需要进行数据通信的内存地址(可以是连续的虚拟内存或者连续的物理内存空间)进行注册,提供给网络设备作为虚拟的连续缓冲区,缓冲区使用虚拟地址,为了便于理解和叙述,本申请实施例中以网络设备为RNIC为例进行说明,在后续描述中不再做进一步区分。例如,应用服务器310将内存312注册给RNIC313,存储服务器340将内存342注册给RNIC343。应理解,在注册时,应用服务器310和存储服务器340的操作系统会检查被注册块的许可,注册进程将需要被注册的内存的虚拟地址与物理地址的映射表写入RNIC,此外,在注册内存时,对应内存区域的权限将会被设定,权限包括本地写、远程读、远程写等。注册后内存注册进程锁定了内存页,为了防止内存页被替换出去,注册进程需要同时保持物理和虚拟内存的映射。
可选的,应用服务器310和存储服务器340在进行内存注册时,可以将自身所有的内存进行注册,或者是随机选取部分内存进行注册,在进行注册时,将需要注册的内存的起始地址和数据长度提供给RNIC,以使得RNIC可以确定需要注册的内存。
值得说明的是,每个内存注册都会对应生成一个远程标识(key)和一个本地标识,远程标识用于远端主机访问本地内存,本地标识用于本地主机访问本地内存。例如,在接收数据操作的期间,存储服务器340将内存注册产生的远程标识提供给应用服务器310,以使得应用服务器310在RDMA操作期间可以远程访问存储服务器340的系统内存342。另外,同一内存缓冲区可以被多次注册(甚至设置不同的操作权限),并且每次注册都会生成不同的标识。
进一步的,应用服务器和存储服务器在建立RDMA连接的过程中将会协商创建QP,在创建QP时将会创建关联的发送队列SQ和接收队列RQ,在创建完成之后,应用服务器310和存储服务器340可以利用QP进行通信。
可以理解,在应用服务器310和存储服务器340建立RDMA连接之后,应用服务器310可以通过RDMA方式远程操作存储服务器340的内存342。
S402:存储服务器340对SSD344的SQ地址和doorbell地址进行映射并将其注册给RNIC343。
具体地,在存储服务器340的初始化阶段,存储服务器340在内存342中已经为SSD344建立了SQ,且在SSD344中建立了doorbell,以实现存储服务器340中的CPU341与SSD344 进行通信。需要说明的是,该SQ地址和doorbell地址是内核态的内存地址空间中的地址,不能直接注册给RNIC343,需要将其转换为用户态的虚拟地址才能进行注册。
进一步的,存储服务器340将SSD的SQ地址和doorbell地址映射为逻辑连续的用户态虚拟地址,然后将映射得到的虚拟地址提供给存储服务器的RNIC343进行注册,其注册过程与上述内存注册过程类似,在此不再赘述。可选的,存储服务器340可以是以内存映射(memory mapping,MMAP)的方式完成映射过程,从而将SQ地址和doorbell地址映射为用户态的虚拟地址,保证可以与其进行正常通信。
S403:存储服务器340将QP与SSD344的SQ进行绑定。
具体地,SSD344在初始化阶段将会被分配多个SQ地址,存储服务器340的RNIC343和应用服务器310在内的多个应用服务器的RNIC在建立RDMA连接时也将会创建多个QP,存储服务器340中的管理软件将SQ地址和QP进行绑定,并将绑定关系发送给RNIC343进行保存。
需要说明的是,对于每一个应用服务器与存储服务器340的连接,存储服务器340可以通过编号等方式对其进行准确区分,即对于每一个QP,其存在唯一一个与之对应的QP编号(QP number,QPN)。
进一步的,存储服务器340将N个QP与一个SQ进行绑定以支持大规模组网连接,其中N的具体数值可以根据实际需要进行设置,例如可以设置为100,本申请对此不作限定。
可以看出,在存储服务器340将SQ地址和QP进行绑定之后,存储服务器340可以根据存储的绑定关系辨别与各个SQ地址对应的QP,进而可以区分不同的客户端或应用服务器。
可以理解,通过执行图4所示的方法流程,应用服务器310和存储服务器340可以成功建立RDMA连接并进行数据传输,应用服务器310可以远程操作存储服务器340的内存342,存储服务器340可以根据QP与SQ的绑定关系,将应用服务器310写入的数据进行持久化存储。
结合图3所示的系统架构以及图4所示的连接建立方法流程,下面将对数据写流程进行详细描述,以应用服务器310写入数据到存储服务器340为例,如图5所示,该流程包括:
S501:应用服务器310中的应用将待写入数据写入到本地内存中。
具体地,应用服务器310中的应用产生需要写入存储服务器340的SSD344的数据,然后将该数据首先保存在应用服务器310的内存312中。
S502:应用服务器310的RNIC313将待写入数据和待写入数据描述信息写入到存储服务器340的内存342中。
具体地,应用服务器310中的应用将RDMA请求发送到应用服务器310的RNIC313,该请求中包括待写入数据在内存312中的地址(例如包括起始地址和数据长度),然后RNIC313根据该请求从应用服务器310的内存312中取出待写入数据,并把待写入数据在存储服务器340中的地址(包括起始地址和数据长度)以及存储服务器340发送的操作该地址对应的内存的远程标识封装到专用报文,同时,将待写入数据描述信息也封装到该专用报文中,其中,待写入数据描述信息包括待写入数据在存储服务器340中的起始地址和数据长度以及数据操作类型(即数据写操作)等,然后将该专用报文通过QP发送至存储服务器340的RNIC343。存储服务器340的RNIC343接收到专用报文之后,根据报文中的远程标识确认应用服务器310是否具备操作存储服务器340的内存342的权限,在确认之后,将待写入数据写入到报文中 的地址对应的内存中,并将待写入数据描述信息也写入内存342中。
S503:存储服务器340的RNIC343根据待写入数据对应的QP和待写入数据描述信息,填充SQ对应的SQE。
具体地,应用服务器310将待写入数据和待写入数据描述信息通过QP写入到存储服务器340的内存342之后,存储服务器340的RNIC343可以根据预先保存的绑定关系确定与该QP对应的SQ,每个SQ包含一个或多个SQE,每个SQE的格式都遵循NVMe协议规定,其大小为64字节,如图6所示,是一种SQE格式示意图,其包括具体命令字段、保留字段、SQ标识符字段、SQ头指针字段、状态字段、命令标识符字段等,存储服务器340的RNIC343根据待写入数据描述信息填充该SQ对应的SQE。
值得说明的是,存储服务器340的RNIC343在填充SQE时,将扩展SQE中的保留字段,利用保留字段保存该QP对应的QPN,以使得SQE携带QPN信息。
S504:存储服务器340的RNIC343将写数据通知信息写入到SSD344的doorbell地址中。
具体地,存储服务器340的RNIC343将写数据通知信息写入到SSD344的doorbell地址中,其中,写数据通知信息包括写入SQE的SQ地址,写数据通知信息用于通知SSD344去读取该SQ地址中的SQE。
S505:SSD344根据doorbell地址中的写数据通知信息,读取SQ地址中的SQE,并根据SQE中的内容将待写入数据从存储服务器340的内存342中搬移至SSD344。
具体地,SSD344在接收到写入doorbell地址中的写数据通知信息后被唤醒,然后读取写数据通知信息中包含的SQ地址中的SQE,确定是数据写操作,然后根据SQE中携带的地址从存储服务器340的内存342中找到待写入数据,并将待写入数据搬移至SSD344,完成持久化存储。
可以看出,待写入数据从存储服务器340的内存342搬移至SSD344不需要任何软件和CPU的参与,直接由SSD344完成,减少了对存储服务器340的CPU占用,且有效降低了成本。
S506:SSD344在完成数据持久化存储之后,将SQE中的QPN信息拷贝至CQ的CQE中,并通知RNIC343写命令完成。
具体地,在NVMe中,每一个SQ都有一个CQ与之对应,且每一个CQ包含一个或多个CQE,每个CQE的大小也为64字节,其格式与上述图6所示的SQE格式类似,SSD344在完成数据持久化存储之后,将SQE中的QPN字段拷贝至CQE中的保留字段,然后通知RNIC343写命令完成。
S507:存储服务器340的RNIC343根据CQE中的QPN信息确定与该QPN信息对应的QP,并利用该QP通知应用服务器310数据写完成。
具体地,RNIC343在接收到SSD344发送的写命令完成通知后,从CQ中读取CQE从而获得QPN信息,然后根据该QPN信息确定与之对应的QP,然后利用该QP通知应用服务器310数据写完成,从而完成整个数据写流程。
可以看出,在将待写入数据写入到SSD344的过程中,当存在多个QP时,通过将多个QP与一个SQ进行绑定,并利用SQE中的保留字段保存QPN,在完成数据写入之后,可以通过CQE中的QPN准确找到相应的QP,从而回复完成消息,可以有效支持大规模组网连接,扩展了适用场景。
图5所述的方法流程详细阐述了数据从应用服务器写入SSD的过程,相应的,应用服务器还可以从SSD中读取数据,下面将对数据读流程进行详细描述,如图7所示,该流程包括:
S701:应用服务器310的RNIC313将待读取数据描述信息写入到存储服务器340的内存342中。
具体地,应用服务器310中的应用产生数据读请求,然后将该数据读请求发送到应用服务器310的RNIC313,该读请求中包括待读取数据在SSD344中的地址(包括起始地址和数据长度)以及数据从SSD344读出之后存储于存储服务器340的内存342中的地址。
进一步的,应用服务器310的RNIC313利用保存的远程标识操作存储服务器340的内存342,将待读取数据描述信息写入到存储服务器340的内存342中,其中,待读取数据描述信息包括待读取数据在SSD344中的起始地址和地址长度、待读取数据需要存储于存储服务器340的内存342中的地址以及数据操作类型(即数据读操作)RNIC343根据待写入数据对应的QP和待写入数据描述信息,填充SQ对应的SQE。
S702:存储服务器340的RNIC343根据待读取数据对应的QP和待读取数据描述信息,填充SQ对应的SQE。
具体地,应用服务器310将待读取数据描述信息通过QP写入到存储服务器340的内存342之后,存储服务器340的RNIC343可以根据预先保存的绑定关系确定与该QP对应的SQ,存储服务器340的RNIC343根据待读取数据描述信息填充该SQ对应的SQE。
同理,存储服务器340的RNIC343在填充SQE时,将扩展SQE中的保留字段,利用保留字段保存该QP对应的QPN,以使得SQE携带QPN信息。
S703:存储服务器340的RNIC343将读数据通知信息写入到SSD344的doorbell地址中。
具体地,存储服务器340的RNIC343将读数据通知信息写入到SSD344的doorbell地址中,其中,读数据通知信息包括写入SQE的SQ地址,读数据通知信息用于通知SSD344去读取该SQ地址中的SQE。
S704:SSD344根据doorbell地址中的读数据通知信息,读取SQ地址中的SQE,并根据SQE中的内容将待读取数据从SSD344搬移至存储服务器340的内存342中。
具体地,SSD344在接收到写入doorbell地址中的读数据通知信息后被唤醒,然后读取读数据通知信息中包含的SQ地址中的SQE,确定是数据读操作,然后根据SQE中携带的地址从SSD344中取出数据,并将该数据搬移至存储服务器340对应的内存342中。
S705:SSD344在完成数据搬移之后,将SQE中的QPN信息拷贝至CQ的CQE中,并通知RNIC343读命令完成。
具体地,SSD344将数据搬移至存储服务器340的内存342中之后,将SQE中的QPN字段拷贝至CQE中的保留字段,然后通知RNIC343读命令完成。
S706:存储服务器340的RNIC343根据CQE中的QPN信息确定与该QPN信息对应的QP,利用该QP将待读取数据写入到应用服务器310的内存312中,并通知应用服务器310数据读完成。
具体地,RNIC343在接收到SSD344发送的读命令完成通知后,从CQ中读取CQE从而获得QPN信息,然后根据该QPN信息确定与之对应的QP,然后利用该QP将待读取数据写入到应用服务器310的内存312中,然后通知应用服务器310数据读完成,从而完成整个数据读 流程。
需要说明的是,图7所示的方法实施例与图5所示的方法实施例基于同一思想,在具体实现过程中可以相互参照,为了简洁,在此不再赘述。
结合图3所示的系统架构以及图4所示的连接建立方法流程,下面将对另一种数据写入的方法进行详细描述,如图8所示,该流程包括:
S801:存储服务器340接收应用服务器写入内存342中的待写入数据和待写入数据描述信息。
具体地,与存储服务器340连接的各个应用服务器利用各自的QP将应用产生的数据及数据描述信息通过RNIC写入存储服务器340的内存342中,例如,应用服务器310利用QP1将待写入数据和待写入数据描述信息写入存储服务器340的内存342中、应用服务器320利用QP2将待写入数据和待写入数据描述信息写入存储服务器340的内存342中。其中,待写入数据描述信息包括待写入数据在存储服务器340中的起始地址和数据长度以及数据操作类型(即数据写操作)等。
需要说明的是,待写入数据描述信息中还携带着客户端标识(即cid),该客户端标识由每个应用服务器自己定义,因此,不同的应用服务器所定义的客户端标识可能相同,例如应用服务器310所定义的客户端标识为cid1,应用服务器320所定义的客户端标识也为cid1。
应理解,对于每一个应用服务器来说,其与存储服务器340的连接(QP)和自身所定义的客户端标识存在对应关系,即通过该客户端标识可以确定与之对应的QPN。
S802:存储服务器340的RNIC343将各个应用服务器的客户端标识转换为本地标识,并建立客户端标识与本地标识映射表。
具体地,由于客户端标识是由应用服务器自己定义的,可能存在相同的情况,因此通过客户端标识无法对不同的应用服务器进行准确区分。所以,存储服务器340的RNIC343需要将各个应用服务器的客户端标识转换为本地唯一标识,从而可以准确区分不同的应用服务器。
示例性的,应用服务器310写入内存342中的待写入数据描述信息中所携带的客户端标识为00000001,应用服务器320写入内存342中的待写入数据描述信息中所携带的客户端标识也为00000001,应用服务器310写入内存342中的待写入数据描述信息中所携带的客户端标识为00000101,RNIC343对接收到的各个应用服务器所对应的客户端标识进行转换,将其转换为本地唯一标识,例如,将应用服务器310所对应的客户端标识转换为00000001,将应用服务器320所对应的客户端标识转换为00000010,将应用服务器330所对应的客户端标识转换为00000011,可以理解,在经过转换后,每个应用服务器所对应的标识是唯一的,可以利用转换后的本地标识对不同的应用服务器进行准确区分。
此外,RNIC343在完成标识转换之后,还将建立客户端标识与本地标识映射表,可选的,RNIC343可以利用哈希表记录客户端标识与本地标识的映射关系,在该哈希表中,其关键字(key)为本地标识,其值(value)为客户端标识以及与之对应的QPN,RNIC343可以通过该哈希表查询到各个应用服务器的客户端标识以及与之对应的本地标识。
S803:存储服务器340的RNIC343根据待写入数据描述信息,填充SQ对应的SQE。
具体地,存储服务器340在接收到各个应用服务器利用各自的QP将应用产生的数据及数据描述信息通过RNIC写入存储服务器340的内存342中之后,RNIC343可以根据预先保存的 绑定关系确定各个QP对应的SQ,然后根据待写入数据描述信息填充SQ对应的SQE,值得说明的是,在填充SQE的过程中,对于SQE中的标识符字段,RNIC343将对其进行更改,将应用服务器所对应的本地标识填充至该字段中,例如,对于应用服务器320来说,RNIC343会将00000010填充至该字段中,而不是将00000001填充至该字段中。
S804:存储服务器340的RNIC343将写数据通知信息写入到SSD344的doorbell地址中。
具体地,RNIC343将写数据通知信息写入到SSD344的doorbell地址中,其中,写数据通知信息包括写入SQE的SQ地址,写数据通知信息用于通知SSD344去读取该SQ地址中的SQE。
S805:SSD344根据doorbell地址中的写数据通知信息,读取SQ地址中的SQE,并根据SQE中的内容将待写入数据从存储服务器340的内存342中搬移至SSD344。
具体地,SSD344在接收到写入doorbell地址中的写数据通知信息后被唤醒,然后读取写数据通知信息中包含的SQ地址中的SQE,确定是数据写操作,然后根据SQE中携带的地址从存储服务器340的内存342中找到待写入数据,并将待写入数据搬移至SSD344,完成持久化存储。
S806:SSD344在完成数据持久化存储之后,通知RNIC343写命令完成。
具体地,SSD344在完成数据持久化存储之后,在该SQ对应的CQ中填入CQE,CQE的格式与SQE格式一致,且CQE中也包含标识符字段,该字段保存了应用服务器对应的本地标识,然后通知RNIC343写命令完成。
S807:存储服务器340的RNIC343根据CQE中的本地标识,查询客户端标识与本地标识映射表,确定与该本地标识对应的客户端标识,从而确定与该客户端标识对应的QPN,并利用该QPN对应的QP通知应用服务器数据写完成。
具体地,RNIC343在接收到SSD344发送的写命令完成通知后,从CQ中读取CQE从而获得本地标识,然后根据该本地标识查询客户端标识与本地标识映射表,得到与该本地标识对应的客户端标识和QPN,然后根据该QPN确定相应的QP,最后利用该QP通知应用服务器数据写完成,从而完成整个数据写流程。
可以看出,在将待写入数据写入到SSD334的过程中,当存在多个应用服务器(即多个QP),且各个应用服务器所定义的客户端标识可能相同时,通过将多个QP与一个SQ进行绑定,并将各个QP对应的客户端标识转换为本地唯一标识,将转换后的本地标识保存至SQE中的标识符字段,在完成数据写入之后,可以根据CQE中的本地标识,通过查询客户端标识与本地标识映射表准确找到相应的客户端标识和QP,对不同应用服务器进行准确区分,从而向其回复完成消息。这样可以有效支持大规模组网连接,扩展了适用场景。
图8所述的方法流程详细阐述了数据从应用服务器写入SSD的过程,相应的,应用服务器还可以从SSD中读取数据,下面将对数据读流程进行详细描述,如图9所示,该流程包括:
S901:存储服务器340接收应用服务器写入内存342中的待读取数据描述信息。
具体地,应用服务器利用各自的QP将待读取数据描述信息通过RNIC写入存储服务器340的内存342中,其中,待读取数据描述信息包括待读取数据在SSD344中的起始地址和数据长度以及数据操作类型(即数据读操作)等,此外,待读取数据描述信息还携带着客户端标识。其具体过程可以参照上述S801中的相关描述,在此不再赘述。
S902:存储服务器340的RNIC343将各个应用服务器的客户端标识转换为本地标识,并建立客户端标识与本地标识映射表。
具体地,RNIC343在将各个应用服务器的客户端标识转换为本地唯一标识之后,可以利用哈希表记录客户端标识与本地标识的映射关系,其具体过程参照上述S802中的相关描述。
S903:存储服务器340的RNIC343根据待读取数据描述信息,填充SQ对应的SQE。
具体地,RNIC343根据预先保存的绑定关系确定各个QP对应的SQ,然后根据待读取数据描述信息填充SQ对应的SQE,并将转换后的应用服务器的本地标识填充至SQE中的标识符字段,其具体过程可以参照上述S803中的相关描述。
S904:存储服务器340的RNIC343将读数据通知信息写入到SSD344的doorbell地址中。
具体地,RNIC343将读数据通知信息写入到SSD344的doorbell地址中,其中,读数据通知信息包括写入SQE的SQ地址,读数据通知信息用于通知SSD344去读取该SQ地址中的SQE。
S905:SSD344根据doorbell地址中的读数据通知信息,读取SQ地址中的SQE,并根据SQE中的内容将待读取数据从SSD344搬移至存储服务器340的内存342中。
具体地,SSD344在接收到写入doorbell地址中的读数据通知信息后被唤醒,然后读取读数据通知信息中包含的SQ地址中的SQE,确定是数据读操作,然后根据SQE中携带的地址从SSD344中找到待读取数据,并将待读取数据搬移至存储服务器340的内存342中。
S906:SSD344在完成数据搬移之后,通知RNIC343读命令完成。
具体地,SSD344在完成数据搬移之后,在该SQ对应的CQ中填入CQE,CQE的格式与SQE格式一致,且CQE中也包含标识符字段,该字段保存了应用服务器对应的本地标识,然后通知RNIC343读命令完成。
S907:存储服务器340的RNIC343根据CQE中的本地标识,查询客户端标识与本地标识映射表,确定与该本地标识对应的客户端标识,从而确定与该客户端标识对应的QPN,并利用该QPN对应的QP将待读取数据写入到应用服务器的内存中,然后通知应用服务器数据读完成。
具体地,RNIC343在接收到SSD344发送的读命令完成通知后,从CQ中读取CQE从而获得本地标识,然后根据该本地标识查询客户端标识与本地标识映射表,得到与该本地标识对应的客户端标识和QPN,然后根据该QPN确定相应的QP,最后利用该QP将待读取数据写入到应用服务器的内存中,并通知应用服务器数据读完成,从而完成整个数据读流程。
应理解,图9所示的方法实施例与图5所示的方法实施例基于同一思想,在具体实现过程中可以相互参照,为了简洁,在此不再赘述。
上述详细阐述了本申请实施例的方法,为了便于更好的实施本申请实施例的上述方案,相应地,下面还提供用于配合实施上述方案的相关设备。
参见图10,图10是本申请实施例提供的一种网络设备的结构示意图。如图10所示,该网络设备10包括接收单元11和发送单元12。其中,
接收单元11,用于接收与所述网络设备10连接的多个客户端发送的访问请求;
发送单元12,用于将所述访问请求发送至存储单元的一个访问队列;
所述接收单元11,还用于接收所述存储单元执行所述访问队列中的访问请求后返回的所 述多个客户端的访问请求的处理结果;
所述发送单元12,还用于将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端。
作为一个实施例,所述网络设备10还包括存储单元13,所述存储单元13,用于存储所述多个客户端的信息与所述访问队列的对应关系,所述发送单元12,具体用于:根据所述映射关系将所述多个客户端的访问请求发送至所述存储单元的所述访问队列。
作为一个实施例,所述访问请求包括数据描述信息,所述网络设备10还包括处理单元14,所述处理单元14,用于将所述数据描述信息填充至所述访问队列对应的SQE中,并将所述多个客户端对应的QPN信息保存至所述SQE的保留字段中。
作为一个实施例,所述处理单元14,还用于根据所述访问队列对应的完成队列所对应的CQE中的QPN信息确定与存储单元返回的访问请求的处理结果对应的客户端,所述CQE中的QPN信息为所述存储单元在执行所述访问队列中的访问请求后通过拷贝所述SQE中的QPN信息得到;所述发送单元12,具体用于:根据所述QPN信息对应的QP将所述处理结果返回给所述访问请求对应的客户端。
作为一个实施例,所述访问请求包括数据描述信息,所述数据描述信息携带客户端标识,所述处理单元14,还用于将所述客户端标识转换为本地标识,并建立所述客户端标识与所述本地标识映射表,其中,所述本地标识用于对所述多个客户端进行唯一标识。
作为一个实施例,所述处理单元14,还用于:将所述数据描述信息填充至所述访问队列对应的SQE中,其中,所述SQE中包括所述本地标识;根据所述访问队列对应的完成队列所对应的CQE中的本地标识,查询所述客户端标识与本地标识映射表,确定与所述本地标识对应的客户端标识及与存储单元返回的访问请求的处理结果对应的客户端;所述发送单元12,具体用于:根据所述客户端标识对应的QP将所述处理结果返回给所述访问请求对应的客户端。
应理解,上述网络设备的结构仅仅作为一种示例,不应构成具体的限定,可以根据需要对网络设备的各个单元进行增加、减少或合并。此外,网络设备中的各个单元的操作和/或功能分别为了实现上述图4、图5、图7、图8和图9所描述的方法的相应流程,为了简洁,在此不再赘述。
参见图11,图11是本申请实施例提供的一种计算设备的结构示意图。如图11所示,该计算设备20包括:处理器21、通信接口22以及存储器23,所述处理器21、通信接口22以及存储器23通过内部总线24相互连接。
所述计算设备20可以是图3中的网络设备。图3中的网络设备所执行的功能实际上是由所述网络设备的处理器21来执行。
所述处理器21可以由一个或者多个通用处理器构成,例如中央处理器(central processing unit,CPU),或者CPU和硬件芯片的组合。上述硬件芯片可以是专用集成电路(application-specific integrated circuit,ASIC)、可编程逻辑器件(programmable logic device,PLD)或其组合。上述PLD可以是复杂可编程逻辑器件(complex programmable logic device,CPLD)、现场可编程逻辑门阵列(field-programmable gate array,FPGA)、通用阵列逻辑(generic array logic,GAL)或其任意组合。
总线24可以是外设部件互连标准(peripheral component interconnect,PCI)总线或 扩展工业标准结构(extended industry standard architecture,EISA)总线等。所述总线24可以分为地址总线、数据总线、控制总线等。为便于表示,图11中仅用一条粗线表示,但不表示仅有一根总线或一种类型的总线。
存储器23可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM);存储器23也可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM)、快闪存储器(flash memory)、硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD);存储器23还可以包括上述种类的组合。程序代码可以是用来实现网络设备10所示的功能单元,或者用于实现图4、图5、图7、图8和图9所示的方法实施例中以网络设备为执行主体的方法步骤。
本申请实施例还提供一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时,可以实现上述方法实施例中记载的任意一种的部分或全部步骤,以及实现上述图10所描述的任意一个功能单元的功能。
本申请实施例还提供了一种计算机程序产品,当其在计算机或处理器上运行时,使得计算机或处理器执行上述任一个方法中的一个或多个步骤。上述所涉及的设备的各组成单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在所述计算机可读取存储介质中。
在上述实施例中,对各个实施例的描述各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。
还应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。

Claims (17)

  1. 一种存储设备,其特征在于,包括:网络设备及存储单元,
    所述存储单元通过所述网络设备连接至多个客户端,所述网络设备用于将所述多个客户端的访问请求发送至所述存储单元的一个访问队列;
    所述存储单元用于执行所述访问队列中的访问请求,并返回所述访问请求的处理结果;
    所述网络设备还用于将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端。
  2. 如权利要求1所述的存储设备,其特征在于,所述网络设备中存储有所述多个客户端的信息与所述访问队列的对应关系,所述网络设备用于将所述多个客户端的访问请求通过所述对应关系发送至所述存储单元的所述访问队列。
  3. 如权利要求1或2所述的存储设备,其特征在于,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备在用于将所述多个客户端的访问请求发送至所述存储单元的一个访问队列时,具体用于:
    在接收到所述多个客户端中的任意一个客户端的访问请求时,根据所述访问请求中携带的所述客户端对应的连接信息及所述对应关系确定所述访问队列;
    将所述连接信息及所述访问请求发送至所述访问队列;
    所述存储单元在返回所述访问请求的处理结果的同时,返回所述连接信息;
    所述网络设备根据所述连接信息确定所述访问请求对应的客户端,并将所述处理结果返回给所述访问请求对应的客户端。
  4. 如权利要求1或2所述的存储设备,其特征在于,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备在用于将所述多个客户端的访问请求发送至所述存储单元的一个访问队列时,具体用于:
    在接收到所述多个客户端中的任意一个客户端的访问请求时,为所述访问请求中携带的客户端标识分配一个本地标识,所述本地标识用于唯一标识所述客户端,并建立所述客户端标识、本地标识与所述客户端对应的连接信息的对应关系;
    将所述访问请求中携带的客户端标识替换为本地标识;
    将所述访问请求发送至所述连接信息对应的访问队列中;
    所述网络设备接收到所述存储单元返回的所述访问请求的处理结果时,从所述处理结果获取所述本地标识,根据所述本地标识确定所述客户端对应的连接信息,并将所述处理结果返回给所述连接信息对应的客户端。
  5. 如权利要求3或4所述的存储设备,其特征在于,所述多个客户端与所述网络设备建立的为远程直接内存访问RDMA连接,所述连接信息为建立RDMA连接时生成的队列对QP。
  6. 一种数据访问方法,其特征在于,所述方法包括:
    网络设备接收与所述网络设备连接的多个客户端发送的访问请求,并将所述访问请求发送至存储单元的一个访问队列;
    所述网络设备接收所述存储单元执行所述访问队列中的访问请求后返回的所述访问请求的处理结果;
    所述网络设备将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端。
  7. 如权利要求6所述的方法,其特征在于,所述网络设备中存储有所述多个客户端的信息与所述访问队列的对应关系,所述通过网络设备将所述多个客户端的访问请求发送至存储单元的一个访问队列,包括:
    所述网络设备根据所述映射关系将所述多个客户端的访问请求发送至所述存储单元的所述访问队列。
  8. 如权利要求6或7所述的方法,其特征在于,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备将所述多个客户端的访问请求发送至所述存储单元的一个访问队列,包括:
    在接收到所述多个客户端中的任意一个客户端的访问请求时,根据所述访问请求中携带的所述客户端对应的连接信息及所述对应关系确定所述访问队列;
    将所述连接信息及所述访问请求发送至所述访问队列;
    所述处理单元返回的所述处理结果包括所述连接信息,所述网络设备将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端,包括:
    所述网络设备根据所述连接信息确定所述访问请求对应的客户端,并将所述处理结果返回给所述访问请求对应的客户端。
  9. 如权利要求6或7所述的方法,其特征在于,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备将所述多个客户端的访问请求发送至所述存储单元的一个访问队列,包括:
    在接收到所述多个客户端中的任意一个客户端的访问请求时,为所述访问请求中携带的客户端标识分配一个本地标识,所述本地标识用于唯一标识所述客户端,并建立所述客户端标识、本地标识与所述客户端对应的连接信息的对应关系;
    将所述访问请求中携带的客户端标识替换为本地标识;
    将所述访问请求发送至所述连接信息对应的访问队列中;
    所述网络设备将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端,包括:
    所述网络设备接收到所述存储单元返回的所述访问请求的处理结果时,从所述处理结果获取所述本地标识,根据所述本地标识确定所述客户端对应的连接信息,并将所述处理结果返回给所述连接信息对应的客户端。
  10. 如权利要求8或9所述的方法,其特征在于,所述多个客户端与所述网络设备建立的为远程直接内存访问RDMA连接,所述连接信息为建立RDMA连接时生成的队列对QP。
  11. 一种网络设备,其特征在于,包括:
    接收单元,用于接收与所述网络设备连接的多个客户端发送的访问请求;
    发送单元,用于将所述访问请求发送至存储单元的一个访问队列;
    所述接收单元,还用于接收所述存储单元执行所述访问队列中的访问请求后返回的所述访问请求的处理结果;
    所述发送单元,还用于将所述存储单元返回的访问请求的处理结果返回给所述访问请求对应的客户端。
  12. 如权利要求11所述的网络设备,其特征在于,所述网络设备还包括存储单元,
    所述存储单元,用于存储所述多个客户端的信息与所述访问队列的对应关系,
    所述发送单元,具体用于:
    根据所述映射关系将所述多个客户端的访问请求发送至所述存储单元的所述访问队列。
  13. 如权利要求11或12所述的网络设备,其特征在于,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备还包括处理单元,
    所述处理单元,用于在接收到所述多个客户端中的任意一个客户端的访问请求时,根据所述访问请求中携带的所述客户端对应的连接信息及所述对应关系确定所述访问队列;
    所述发送单元,具体用于将所述连接信息及所述访问请求发送至所述访问队列;
    所述发送单元,还用于根据所述连接信息确定所述访问请求对应的客户端,并将所述处理结果返回给所述访问请求对应的客户端。
  14. 如权利要求11或12所述的网络设备,其特征在于,所述多个客户端的信息为所述多个客户端分别与所述网络设备建立连接时生成的连接信息,所述网络设备还包括处理单元,
    所述处理单元,用于在接收到所述多个客户端中的任意一个客户端的访问请求时,为所述访问请求中携带的客户端标识分配一个本地标识,所述本地标识用于唯一标识所述客户端,并建立所述客户端标识、本地标识与所述客户端对应的连接信息的对应关系,将所述访问请求中携带的客户端标识替换为本地标识;
    所述发送单元,具体用于将所述访问请求发送至所述连接信息对应的访问队列中;
    所述处理单元,还用于在接收到所述存储单元返回的所述访问请求的处理结果时,从所述处理结果获取所述本地标识,根据所述本地标识确定所述客户端对应的连接信息;
    所述发送单元,还用于根据所述连接信息确定所述访问请求对应的客户端,并将所述处理结果返回给所述访问请求对应的客户端。
  15. 如权利要求13或14所述的网络设备,其特征在于,所述多个客户端与所述网络设 备建立的为远程直接内存访问RDMA连接,所述连接信息为建立RDMA连接时生成的队列对QP。
  16. 一种计算设备,其特征在于,所述计算设备包括存储器和处理器,所述处理器执行所述存储器中存储的计算机指令,使得所述计算设备执行权利要求6-10任一项所述的方法。
  17. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,当所述计算机程序被处理器执行时实现权利要求6-10任一项所述的方法的功能。
PCT/CN2021/142495 2020-12-31 2021-12-29 一种数据访问方法及相关设备 WO2022143774A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP21914506.7A EP4261671A4 (en) 2020-12-31 2021-12-29 DATA ACCESS METHOD AND ASSOCIATED DEVICE
JP2023540613A JP2024501713A (ja) 2020-12-31 2021-12-29 データアクセス方法および関連デバイス
US18/345,519 US20230342087A1 (en) 2020-12-31 2023-06-30 Data Access Method and Related Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011645307.9A CN114691026A (zh) 2020-12-31 2020-12-31 一种数据访问方法及相关设备
CN202011645307.9 2020-12-31

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/345,519 Continuation US20230342087A1 (en) 2020-12-31 2023-06-30 Data Access Method and Related Device

Publications (1)

Publication Number Publication Date
WO2022143774A1 true WO2022143774A1 (zh) 2022-07-07

Family

ID=82135714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142495 WO2022143774A1 (zh) 2020-12-31 2021-12-29 一种数据访问方法及相关设备

Country Status (5)

Country Link
US (1) US20230342087A1 (zh)
EP (1) EP4261671A4 (zh)
JP (1) JP2024501713A (zh)
CN (1) CN114691026A (zh)
WO (1) WO2022143774A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861082B (zh) * 2023-03-03 2023-04-28 无锡沐创集成电路设计有限公司 一种低延时图片拼接系统及方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216811A (zh) * 2007-01-05 2008-07-09 英业达股份有限公司 存储设备指定存取顺序的方法
US8595385B1 (en) * 2013-05-28 2013-11-26 DSSD, Inc. Method and system for submission queue acceleration
CN107818056A (zh) * 2016-09-14 2018-03-20 杭州华为数字技术有限公司 一种队列管理方法及装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9467512B2 (en) * 2012-01-17 2016-10-11 Intel Corporation Techniques for remote client access to a storage medium coupled with a server
US10509592B1 (en) * 2016-07-26 2019-12-17 Pavilion Data Systems, Inc. Parallel data transfer for solid state drives using queue pair subsets

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216811A (zh) * 2007-01-05 2008-07-09 英业达股份有限公司 存储设备指定存取顺序的方法
US8595385B1 (en) * 2013-05-28 2013-11-26 DSSD, Inc. Method and system for submission queue acceleration
CN107818056A (zh) * 2016-09-14 2018-03-20 杭州华为数字技术有限公司 一种队列管理方法及装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PENG LONG-GEN YOU HONG-TAO YIN WAN-WANG: "Research of Message Scalable Technology over InfiniBand Network", COMPUTER SCIENCE, vol. 40, no. 3, 15 March 2013 (2013-03-15), pages 104 - 106+120, XP055947764 *
See also references of EP4261671A4 *

Also Published As

Publication number Publication date
US20230342087A1 (en) 2023-10-26
EP4261671A4 (en) 2024-05-29
JP2024501713A (ja) 2024-01-15
EP4261671A1 (en) 2023-10-18
CN114691026A (zh) 2022-07-01

Similar Documents

Publication Publication Date Title
US10642777B2 (en) System and method for maximizing bandwidth of PCI express peer-to-peer (P2P) connection
WO2022017475A1 (zh) 一种数据访问方法及相关设备
WO2019161557A1 (zh) 一种通信的方法及装置
US7668841B2 (en) Virtual write buffers for accelerated memory and storage access
WO2017114091A1 (zh) 一种nas数据访问的方法、系统及相关设备
CN112291293B (zh) 任务处理方法、相关设备及计算机存储介质
US11025564B2 (en) RDMA transport with hardware integration and out of order placement
WO2022007470A1 (zh) 一种数据传输的方法、芯片和设备
US11068412B2 (en) RDMA transport with hardware integration
US20220222016A1 (en) Method for accessing solid state disk and storage device
US20240039995A1 (en) Data access system and method, device, and network adapter
CN114201268B (zh) 一种数据处理方法、装置、设备及可读存储介质
WO2023098050A1 (zh) 远程数据访问方法及装置
WO2022143774A1 (zh) 一种数据访问方法及相关设备
CN116185553A (zh) 数据迁移方法、装置及电子设备
CN110471627B (zh) 一种共享存储的方法、系统及装置
US11675510B2 (en) Systems and methods for scalable shared memory among networked devices comprising IP addressable memory blocks
WO2022199357A1 (zh) 数据处理方法及装置、电子设备、计算机可读存储介质
WO2015062390A1 (zh) 虚拟机迁移方法、装置及系统
WO2017177400A1 (zh) 一种数据处理方法及系统
CN116032498A (zh) 一种内存区域注册方法、装置及设备
CN114490463A (zh) 一种保序执行写请求的方法及网络设备
US11979459B1 (en) Configuration of data connections between a host and a shared network adapter
WO2024051259A1 (zh) 数据处理方法及装置
WO2024041140A1 (zh) 数据处理方法、加速器及计算设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21914506

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023540613

Country of ref document: JP

ENP Entry into the national phase

Ref document number: 2021914506

Country of ref document: EP

Effective date: 20230712

NENP Non-entry into the national phase

Ref country code: DE