WO2023174341A1 - Data read-write method, and device, storage node and storage medium - Google Patents

Info

Publication number
WO2023174341A1
Authority
WO
WIPO (PCT)
Prior art keywords
read
storage node
write
storage
network link
Prior art date
Application number
PCT/CN2023/081675
Other languages
French (fr)
Chinese (zh)
Inventor
金浩
屠要峰
韩银俊
许军宁
陈正华
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023174341A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • the present disclosure relates to the field of communications, and in particular, to a data reading and writing method, device, storage node and storage medium.
  • a distributed storage system usually contains multiple storage nodes.
  • Each storage node contains one or more storage devices that support the NVMe (non-volatile memory express, non-volatile memory host controller interface specification) storage layer protocol.
  • Each storage node provides a logical address space.
  • the target space of IO (read and write) operations may be located on any one or more storage nodes.
  • the cluster topology may be updated at any time.
  • the present disclosure provides a data reading and writing method, device, storage node and storage medium to solve the problem of long network transmission paths and the introduction of new network delays when the second server forwards requests and response results when accessing a distributed storage cluster.
  • the second server includes a proxy server.
  • a data reading and writing method is provided, which is applied to a client device.
  • The method includes: sending a first read and write request to a first storage node through a first network link established with the first storage node, where the first storage node is any storage node in the storage system; and receiving a read and write response from a second storage node through a second network link established with the second storage node, and using the read and write response as the response to the first read and write request.
  • The second storage node is a node, determined by the first storage node, that can execute the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
  • a data reading and writing method is provided, which is applied to a second storage node.
  • The method includes: receiving a first read and write request from a first storage node, where the first storage node is any storage node in the storage system; and, in response to the first read and write request, returning a read and write response to the client device through the second network link established between the client device and the second storage node.
  • The second network link and the first network link belong to the same storage protocol channel.
  • The first network link is the network link established between the client device and the first storage node.
  • a data reading and writing system including: a client device, a first storage node and a second storage node.
  • The client device is configured to send a first read and write request to the first storage node through the first network link established with the first storage node, where the first storage node is any storage node in the storage system, and to receive the read and write response from the second storage node through the second network link established with the second storage node, using the read and write response as the response to the first read and write request.
  • The second storage node is a node, determined by the first storage node, that is capable of executing the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
  • a client device including: a first sending unit and a first receiving unit.
  • The first sending unit is configured to send the first read and write request to the first storage node through the first network link established with the first storage node, where the first storage node is any storage node in the storage system.
  • The first receiving unit is configured to receive a read and write response from the second storage node through the second network link established with the second storage node and to use the read and write response as the response to the first read and write request. The second storage node is a node, determined by the first storage node, that can execute the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
  • a second storage node including: a second receiving unit and a second sending unit.
  • The second receiving unit is configured to receive the first read and write request from the first storage node, where the first storage node is any storage node in the storage system; the second sending unit is configured to return, in response to the first read and write request, a read and write response to the client device through the second network link established between the client device and the second storage node.
  • The second network link and the first network link belong to the same storage protocol channel.
  • The first network link is the network link established between the client device and the first storage node.
  • An electronic device including: a processor, a memory and a communication bus, wherein the processor and the memory communicate with each other through the communication bus; the memory is used to store computer programs; and the processor is used to execute the program stored in the memory to implement the data reading and writing method described in the first aspect or the data reading and writing method described in the second aspect.
  • a computer-readable storage medium which stores a computer program.
  • When the computer program is executed by a processor, the data reading and writing method described in the first aspect or the data reading and writing method described in the second aspect is implemented.
  • Figure 1 is a schematic flow chart of the data reading and writing method in the present disclosure
  • Figure 2 is another schematic flow chart of the data reading and writing method in the present disclosure
  • Figure 3 is a schematic structural diagram of the data reading and writing system in the present disclosure
  • Figure 4 is a schematic diagram of the link layering principle between nodes and clients in the distributed storage system in the present disclosure
  • Figure 5 is an interaction flow chart between node X, node Y and the client device in the distributed storage system in the present disclosure;
  • Figure 6 is a schematic structural diagram of a client device in the present disclosure.
  • Figure 7 is a schematic structural diagram of the second storage node in the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device in the present disclosure.
  • a distributed storage system usually contains multiple storage nodes.
  • Each storage node contains one or more storage devices that support the NVMe (Non-Volatile Memory Express, the non-volatile memory host controller interface specification) interface specification.
  • Multiple storage nodes provide a logical address space.
  • the target space of IO (read and write) operations may be located on any one or more storage nodes.
  • the topology of the cluster may be updated at any time.
  • the first method is that the client device copies a cluster partition table and can calculate the target storage node to be accessed.
  • the client device directly establishes an NVMe link with the target storage device to implement IO operations.
  • This method requires a customized NVMe client device that synchronously updates the cluster's partition information and routing calculation rules in real time. The client device and the cluster services are highly coupled, and the implementation cost is very high.
  • The second method is that the client device sends an IO request to the first server.
  • The first server determines the second server where the request target address is located and forwards the IO request to the second server. After the second server completes the IO request, it needs to notify the first server, and then the first server sends the response result to the client device.
  • The response message in this method must travel from the second server to the first server before the first server sends it to the client device, so the network transmission path is long and new network delays are introduced.
  • the NVMe client device and each storage node in the distributed storage system have matching network links.
  • An NVMe Target server can only use the network link that matches the NVMe client device to send read and write requests or receive read and write responses. If the link used by the NVMe client device to send read and write requests and the link used to receive read and write responses do not belong to the same path, then even if the read and write requests sent match the read and write responses received, the NVMe client device cannot identify them.
  • the NVMe Target server and the NVMe client device establish network links through queue mapping.
  • the NVMe client device includes the NVMe layer and the network layer.
  • the NVMe layer includes the submission queue and the completion queue
  • the network layer includes the submission queue and the completion queue.
  • Each NVMe Target server in the distributed storage system also includes an NVMe layer and a network layer.
  • the NVMe layer includes a submission queue and a completion queue
  • the network layer includes a submission queue and a completion queue.
  • The network link used for requests between the NVMe client device and the NVMe Target server is established as follows: on the NVMe client device side, the submission queue of the NVMe layer is mapped to the submission queue of the network layer; the submission queue of the network layer on the NVMe client device side is connected to the completion queue of the network layer on the NVMe Target server through network transmission; and on the NVMe Target server side, the completion queue of the network layer is mapped to the completion queue of the NVMe layer.
  • The network link used for responses between the NVMe client device and the NVMe Target server is established as follows: on the NVMe client device side, the completion queue of the NVMe layer is mapped to the completion queue of the network layer; the completion queue of the network layer is connected to the submission queue of the network layer of the NVMe Target server through network transmission; and on the NVMe Target server side, the submission queue of the network layer is mapped to the submission queue of the NVMe layer.
  • The NVMe layer submission queue of the NVMe client device sends a read and write request to the network layer submission queue, and the network layer submission queue sends the read and write request to the completion queue of the network layer of the NVMe Target server; after the completion queue of the network layer of the NVMe Target server receives the read and write request, it sends the read and write request to the completion queue of the NVMe layer.
  • The NVMe layer submission queue of the NVMe Target server sends a read and write response to the network layer submission queue, and the network layer submission queue sends the read and write response to the completion queue of the network layer of the NVMe client device; after receiving the read and write response, the completion queue of the network layer of the NVMe client device sends the read and write response to the completion queue of the NVMe layer.
  • The NVMe layer of each NVMe client device is set up with only one submission queue and one completion queue, and the network layer is likewise set up with only one submission queue and one completion queue. Therefore, one NVMe client device can be bound to only one NVMe Target server and can interact only with that NVMe Target server. To interact with other NVMe Target servers, requests must be forwarded through the bound NVMe Target server.
  • A read and write request sent by the NVMe client device to the first NVMe Target server can only be answered by the first NVMe Target server, which sends the read and write response to the NVMe client device through its own NVMe layer submission queue and network layer submission queue. If the read and write response is instead sent to the NVMe client device by the second NVMe Target server through its NVMe layer submission queue and network layer submission queue, the NVMe client device will not be able to recognize the read and write response.
  • Therefore, when the read and write response corresponding to the read and write request can only be obtained through the second NVMe Target server, the read and write response must be forwarded to the NVMe client device through the first NVMe Target server so that the NVMe client device can recognize it.
  • the present disclosure provides a data reading and writing method, which can be applied to client devices.
  • the method may include the following steps 101 to 102.
  • Step 101: Send a first read and write request to the first storage node through the first network link established with the first storage node.
  • The first storage node is any storage node in the storage system.
  • Step 102: Receive a read and write response from the second storage node through the second network link established with the second storage node, and use the read and write response as a response to the first read and write request.
  • The second storage node is a node, determined by the first storage node, that can execute the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
  • a read-write response from the second storage node is received, and the read-write response is used as a response to the first read-write request according to the storage protocol layer identifier.
  • The first network link and the second network link are both underlying network links of the same storage protocol layer link channel.
  • the read and write requests in this embodiment include read requests and/or write requests, and accordingly, the read and write responses include read responses and/or write responses.
  • When the read and write request sent by the client device is a read request, the storage node returns a read response; when the read and write request sent by the client device is a write request, the storage node returns a write response.
  • the network link established between the client device and the storage node can be used to send requests and receive responses at the same time.
  • the client device can send a read and write request to the first storage node through the first network link, and the first storage node can also return a read and write response to the client device through the first network link.
  • When the network links between the client device and different storage nodes are aggregated into the same storage protocol channel and the client device sends a read and write request to one storage node, the client device can recognize the read and write response no matter which storage node it comes from. Specifically in this embodiment, since the first network link and the second network link belong to the same storage protocol channel, the client device can still recognize the read and write response although it comes from the second storage node.
  • The data reading and writing method includes: the client establishes with the storage cluster a storage protocol layer link and network layer links such as the first network link and the second network link; the storage protocol layer selects any network link to send an IO read and write request and receives the IO read and write response from any network link.
  • the storage protocol layer identifier of the read and write response matches the storage protocol layer identifier of the read and write request.
  • This embodiment provides the following two methods to aggregate the first network link and the second network link to implement the same storage protocol channel.
  • In the first method, different storage nodes still have uniquely matching submission queues and completion queues at the NVMe layer of the client device, but an added storage protocol management layer marks these submission and completion queues to indicate that read and write requests can be sent to different storage nodes through the submission queue and that any read and write response received by the completion queue can be identified.
  • Although the storage protocol management layer marking method can aggregate the first network link and the second network link into one storage protocol channel, it requires adding the storage protocol management layer, which results in a relatively large workload.
  • In the second method, this embodiment arranges that the storage nodes that have a network link relationship with the client device are mapped to the same completion queue of the NVMe layer of the client device, and that those storage nodes are likewise mapped to the same submission queue of the NVMe layer of the client device.
  • The focus is on enabling the client device to identify the read and write response returned by the second storage node. Therefore, when the first network link and the second network link belong to the same storage protocol channel, the first storage node and the second storage node are mapped to the same completion queue of the NVMe layer of the client device. In this way, no matter which storage node sends the read and write response, the NVMe layer of the client device receives it from the same completion queue and can identify it.
  • A first completion queue and a second completion queue in the network layer are obtained, where the first completion queue is used to receive read and write responses from the first storage node and the second completion queue is used to receive read and write responses from the second storage node.
  • The completion queue of the storage protocol layer is also obtained; it is used to receive a read-write response from the first completion queue or a read-write response from the second completion queue.
  • The first completion queue and the second completion queue are then mapped to the completion queue of the storage protocol layer.
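The completion-queue mapping described above can be sketched in a few lines of Python. This is a minimal illustration only, not the disclosed implementation: the class names `ProtocolCompletionQueue` and `NetworkCompletionQueue`, the node names, and the dictionary-shaped response are all hypothetical, and real NVMe over Fabrics queues are not modelled.

```python
from collections import deque


class ProtocolCompletionQueue:
    """Single storage-protocol-layer completion queue (hypothetical name)."""

    def __init__(self):
        self._entries = deque()

    def push(self, response):
        self._entries.append(response)

    def pop(self):
        # Return the oldest response, or None when the queue is empty.
        return self._entries.popleft() if self._entries else None


class NetworkCompletionQueue:
    """Per-link network-layer completion queue mapped onto one upper queue."""

    def __init__(self, node_id, upper):
        self.node_id = node_id
        self._upper = upper  # the shared protocol-layer completion queue

    def on_receive(self, response):
        # Every network-layer queue forwards into the same upper queue,
        # so the protocol layer sees one stream regardless of the sender.
        self._upper.push({"from_node": self.node_id, **response})


protocol_cq = ProtocolCompletionQueue()
first_cq = NetworkCompletionQueue("node-1", protocol_cq)   # link to first node
second_cq = NetworkCompletionQueue("node-2", protocol_cq)  # link to second node

# A response arriving on the second link still lands in the shared queue.
second_cq.on_receive({"session_id": 7, "status": "ok"})
delivered = protocol_cq.pop()
```

Because both network-layer queues feed one protocol-layer queue, the upper layer does not need to know which link carried a given response.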
  • Because the client device may send multiple read and write requests within a short period of time, this embodiment may verify the read-write response returned by the second storage node before using it as the response to the first read-write request, in order to determine whether the received read-write response really answers the first read-write request.
  • Before using the read-write response as a response to the first read-write request, the storage protocol layer session identifier in the read-write response and the storage protocol layer session identifier in the first read-write request are parsed, and it is determined that the two identifiers are the same.
  • When the session identifier in the read-write response and the session identifier in the first read-write request are the same, the read-write response and the first read-write request belong to the same session, so the received read-write response matches the first read-write request; that is, the read-write response is the response to the first read-write request.
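As a rough illustration of the session-identifier check, the sketch below keeps outstanding requests in a table keyed by identifier and looks up each incoming response there. The class name `PendingRequests` and the `session_id` field are assumptions made for this example; the disclosure does not specify how the identifier is encoded.

```python
class PendingRequests:
    """Tracks outstanding requests by storage-protocol-layer session identifier."""

    def __init__(self):
        self._pending = {}

    def register(self, session_id, request):
        # Remember the request until a response with the same identifier arrives.
        self._pending[session_id] = request

    def match(self, response):
        """Return the request this response answers, or None if nothing matches."""
        return self._pending.pop(response.get("session_id"), None)


tracker = PendingRequests()
tracker.register(7, {"op": "read", "target": "block-42"})
tracker.register(8, {"op": "write", "target": "block-43"})

# The response may arrive on any network link; only the identifier is compared.
matched = tracker.match({"session_id": 7, "data": b"payload"})
unmatched = tracker.match({"session_id": 99})
```

A response whose identifier matches no outstanding request is simply not accepted as the answer to any request in this sketch.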
  • So that the second storage node can subsequently be accessed directly, reducing the probability of request forwarding, this embodiment may, after determining the response to the first read-write request, establish a mapping relationship between the second storage node and the client device. In an exemplary embodiment, it may also establish a mapping relationship between the second storage node and the target data of the read-write request.
  • In one embodiment, the identifier of the second storage node is extracted from the read and write response and, based on this identifier, a mapping relationship between the second storage node and the client device is established; the mapping relationship indicates that the read and write response data is stored in the second storage node. In another embodiment, the identifier of the second storage node is extracted from the read and write response and, based on this identifier, a mapping relationship between the second storage node and the target data of the read and write request is established; the mapping relationship likewise indicates that the read and write response data is stored in the second storage node.
  • The data stored in the second storage node may be hotspot data; that is, the mapping relationship indicates that the hotspot data is stored in the second storage node. Therefore, when the client device needs to access the hotspot data, it can send an access request directly to the second storage node in order to reduce the probability of request forwarding.
  • The client device may access the read-write response data in the second storage node as follows: based on the mapping relationship, a second read-write request is generated, which is used to access the read-write response data; the second read-write request is sent to the second storage node through the second network link, and the read-write response returned by the second storage node in response to the second read-write request is received.
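The mapping relationship and the direct second request can be pictured as a small routing cache on the client. The sketch below is an assumption-laden illustration: `RoutingCache`, the `node_id` response field and the target name `block-42` are hypothetical, and cache invalidation on topology change is left out.

```python
class RoutingCache:
    """Remembers which storage node answered for a given target (hypothetical)."""

    def __init__(self, default_node):
        self._default_node = default_node  # node used when nothing is known yet
        self._location = {}                # target -> node identifier

    def learn(self, target, response):
        # Extract the responding node's identifier and record the mapping.
        node_id = response.get("node_id")
        if node_id is not None:
            self._location[target] = node_id

    def choose_node(self, target):
        # Send directly to the known owner, otherwise fall back to the default.
        return self._location.get(target, self._default_node)


cache = RoutingCache(default_node="node-1")
first_choice = cache.choose_node("block-42")   # nothing learned yet

# The response to the first request came from node-2, so record that.
cache.learn("block-42", {"node_id": "node-2", "session_id": 7})
second_choice = cache.choose_node("block-42")  # now goes straight to node-2
```

With the mapping in place, a second request for the same target skips the forwarding hop through the first node.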
  • the first network link and the second network link belong to the same storage protocol channel.
  • the first storage node and the second storage node are mapped to the same submission queue of the NVMe layer of the client device.
  • A first submission queue and a second submission queue in the network layer are obtained, where the first submission queue is used to send read and write requests to the first storage node and the second submission queue is used to send read and write requests to the second storage node.
  • The submission queue of the storage protocol layer is also obtained; it is used to send read and write requests to the first submission queue and the second submission queue.
  • The first submission queue and the second submission queue are then mapped to the submission queue of the storage protocol layer.
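On the submission side, one storage-protocol-layer queue fans out to several network-layer submission queues. The following sketch assumes a round-robin choice when no node is specified; that policy, the class name `ProtocolSubmissionQueue`, and the use of plain lists as network-layer queues are illustrative assumptions, since the text only says that any network link may be selected.

```python
from itertools import cycle


class ProtocolSubmissionQueue:
    """One protocol-layer submission queue mapped to N network-layer queues."""

    def __init__(self, link_queues):
        self._links = link_queues                       # node id -> list used as a queue
        self._round_robin = cycle(sorted(link_queues))  # assumed selection policy

    def submit(self, request, node_id=None):
        # Use the requested node if given, otherwise pick the next link in turn.
        chosen = node_id if node_id is not None else next(self._round_robin)
        self._links[chosen].append(request)
        return chosen


links = {"node-1": [], "node-2": []}
sq = ProtocolSubmissionQueue(links)

first = sq.submit({"session_id": 1})                     # round-robin choice
second = sq.submit({"session_id": 2})                    # next link in turn
third = sq.submit({"session_id": 3}, node_id="node-2")   # explicit target
```

The explicit `node_id` argument corresponds to the case where the client already knows which node holds the target data.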
  • In this embodiment, the first read and write request is sent to the first storage node through the first network link established with the first storage node, where the first storage node is any storage node in the storage system; the read and write response from the second storage node is received through the second network link established with the second storage node and is used as the response to the first read and write request.
  • The second storage node is a node, determined by the first storage node, that is capable of executing the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
  • Because the storage protocol layer identification information of the read-write request and the read-write response is the same, the second storage node can send the read-write response to the client device after executing the first read-write request, even though the client device sent the first read-write request to the first storage node, and the client device can recognize that the read-write response answers the first read-write request.
  • the present disclosure provides a data reading and writing method, which can be applied to the second storage node; as shown in Figure 2, the method can include the following steps 201 to 202.
  • Step 201: Receive a first read and write request from a first storage node, where the first storage node is any storage node in the storage system.
  • Step 202: In response to the first read and write request, return a read and write response to the client device through the second network link established between the client device and the second storage node.
  • The second network link and the first network link belong to the same storage protocol channel, where the first network link is the network link established between the client device and the first storage node.
  • the second network link and the first network link are both underlying network links of the same storage protocol channel.
  • The method further includes: receiving a second read-write request from the client device through the second network link, the second read-write request being used to access the read-write response, where the read-write response includes data of the second storage node.
  • a read and write response is returned to the client device over the second network link.
  • the present disclosure provides a data reading and writing system.
  • The system mainly includes a client device 301 and a storage cluster; the storage cluster includes a first storage node 302, a second storage node 303 and other storage nodes. The client device 301 establishes a storage protocol layer link channel with the storage cluster and establishes network layer links with each of the storage nodes in the storage cluster.
  • The client device 301 is configured to send a first read and write request to the first storage node 302 through the first network link established with the first storage node 302, where the first storage node 302 is any storage node in the storage system, and to receive the read and write response from the second storage node 303 through the second network link established with the second storage node 303, using the read and write response as the response to the first read and write request.
  • The second storage node 303 is a node, determined by the first storage node 302, that is capable of executing the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
  • The client device 301 can send read and write requests to the second storage node 303 through the second network link established with the second storage node 303, and receive the read and write response of the second storage node 303 through the same second network link.
  • In an exemplary embodiment, the first network link and the second network link are both underlying network links of the same storage protocol channel.
  • the following takes the client device as an NVMe client device and the storage node as an NVMe Target server as an example to describe the application environment in this disclosure.
  • the existing NVM Express over Fabrics Revision 1.1a specification requires that the NVMe layer IO queue and the underlying network link have a one-to-one correspondence.
  • This disclosure proposes to decouple the storage layer from the network layer, so that the NVMe protocol channel and the network links support a 1:N mapping relationship (N not less than 1).
  • NVMe request messages and response messages support transmission on different network links.
  • The client establishes network links with all servers in the distributed cluster, and these network connections are mapped to the same storage layer NVMe protocol channel. One NVMe path is thus mapped to multiple network links, thereby solving the problem of standard NVMe clients accessing distributed storage clusters and obtaining higher performance and a better user experience.
  • This disclosure is different from the existing MultiPath (multipath) function. Multipath means that there are multiple path channels at the NVMe protocol level. According to the protocol specification requirements, an NVMe link is selected to implement IO interaction.
  • FIG. 4 a schematic diagram of the link principle between nodes and clients in the distributed storage system as shown in Figure 4 is given.
  • the client and all server nodes in the distributed storage system are respectively Create a network link, aggregate multiple network links to implement an NVMe access path, and the client reads and writes the entire cluster through an NVMe path.
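The 1:N relationship between one NVMe path and several network links can be pictured with a small data structure. This is only a sketch under assumptions: the class names `NetworkLink` and `NvmePath`, the function `connect_all`, and the server names are hypothetical and are not taken from the disclosure or from any NVMe library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NetworkLink:
    """One fabric connection to one server (hypothetical model)."""
    fid: int      # link identifier, assumed to be a small integer
    server: str   # server name or address, illustrative only


@dataclass
class NvmePath:
    """A single storage-layer path backed by N >= 1 network links."""
    name: str
    links: Dict[int, NetworkLink] = field(default_factory=dict)

    def attach(self, link: NetworkLink) -> None:
        # Every link is registered under the same storage protocol channel.
        self.links[link.fid] = link


def connect_all(name: str, servers: List[str]) -> NvmePath:
    """Open one link per server and aggregate them under one path."""
    path = NvmePath(name)
    for fid, server in enumerate(servers):
        path.attach(NetworkLink(fid=fid, server=server))
    return path


path = connect_all("nvme-path-0", ["node-x", "node-y", "node-z"])
```

In this model the storage layer only ever sees `path`; which of the three links carries a given message is a network-layer decision.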
  • Figure 5 is an interaction flow chart between node X, node Y and client device in the distributed storage system.
  • Step 501: The client sends an IO request to node X. Node X is required to hold a data partition table and be able to determine the location of the target space.
  • Step 502: Node X forwards the request message to storage node Y. At this point, the NVMe session life cycle on node X ends.
  • Step 503: Storage node Y receives the NVMe request, parses and processes the NVMe message, and executes the corresponding IO request.
  • Step 504: Node Y sends the NVMe response message to the client.
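The four steps can be traced with a short simulation. It is a sketch rather than the disclosed implementation: the partition table contents, the LBA ranges, and the function names `owner_of` and `handle_io` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical partition table: (first_lba, last_lba) -> owning node.
PARTITIONS: Dict[Tuple[int, int], str] = {
    (0, 999): "X",
    (1000, 1999): "Y",
}


def owner_of(lba: int) -> str:
    """Look up which node holds the target space for this LBA."""
    for (first, last), node in PARTITIONS.items():
        if first <= lba <= last:
            return node
    raise ValueError(f"LBA {lba} is outside the volume")


def handle_io(entry_node: str, lba: int) -> List[str]:
    """Return the message hops of one IO request, following steps 501 to 504."""
    trace = [f"client -> {entry_node}: request"]             # step 501
    target = owner_of(lba)
    if target != entry_node:
        trace.append(f"{entry_node} -> {target}: forward")   # step 502
    trace.append(f"{target}: execute IO")                    # step 503
    trace.append(f"{target} -> client: response")            # step 504
    return trace


trace = handle_io("X", 1500)
```

Note that node X never appears again after the forward: the response travels from node Y straight to the client.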
  • this disclosure makes the following extensions to the NVMe client device and the NVMe Target server respectively.
  • NVMe client device side: According to the NVMe protocol definition, the client establishes a network link with the server through the connect command, and the submission queue and completion queue of the NVMe protocol are mapped to the submission queue and completion queue of the network link, respectively.
  • The connect command specifies the server address list, and the client establishes network links with all service nodes. Taking RDMA as an example, all links share the same RDMA Protection Domain (PD), and the servers share the client's memory region (Memory Region) through the RDMA network.
  • When the client sends a message, it uses a reserved field of the NVMe message header to carry the link identifier FID (Fabric ID).
  • the network layer sends a standard NVMe request message to the correct server based on the FID carried in the message header.
  • the completion queue notifies the NVMe layer protocol stack of messages received by all links, which are then received and processed by the client.
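A minimal sketch of FID-based routing follows. The header layout here is purely illustrative and is not the real NVMe capsule format; the field order, the one-byte FID, and the names `HEADER`, `pack_request` and `route` are all assumptions.

```python
import struct
from typing import Callable, Dict

# Illustrative layout only (NOT the real NVMe format):
# opcode (1 byte), FID in a reserved byte, command id (2 bytes), LBA (8 bytes).
HEADER = struct.Struct("<BBHQ")


def pack_request(opcode: int, fid: int, command_id: int, lba: int) -> bytes:
    """Build a request header that carries the link identifier."""
    return HEADER.pack(opcode, fid, command_id, lba)


def route(message: bytes, links: Dict[int, Callable[[bytes], None]]) -> int:
    """Read the FID from the header and hand the message to that link."""
    _opcode, fid, _command_id, _lba = HEADER.unpack_from(message)
    links[fid](message)
    return fid


# Two fake links that simply record what was sent on them.
sent = {0: [], 1: []}
links = {fid: sent[fid].append for fid in sent}

msg = pack_request(opcode=0x02, fid=1, command_id=7, lba=4096)
chosen = route(msg, links)
```

The point of the sketch is that the network layer needs nothing beyond the FID to pick a link; the rest of the message passes through unchanged.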
  • the enhancement of the client in this disclosure also includes the address mapping table caching function.
  • the response messages of all read and write requests are returned by the target node.
  • When the client receives the response, it immediately updates the mapping between the target address of the request and the target node. Subsequent accesses to that address can be sent directly to the target node, which reduces the probability of request forwarding and improves read and write efficiency.
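The address mapping cache might look like the sketch below. The disclosure does not state the granularity of the mapping, so the fixed extent size, the class name `TargetCache`, and its method names are assumptions.

```python
from typing import Dict


class TargetCache:
    """Remembers which node answered for each extent (hypothetical granularity)."""

    def __init__(self, extent_blocks: int = 1024) -> None:
        self.extent_blocks = extent_blocks
        self._owner: Dict[int, str] = {}

    def _extent(self, lba: int) -> int:
        return lba // self.extent_blocks

    def record_response(self, lba: int, responder: str) -> None:
        # Called when a response arrives: the responder is the target node.
        self._owner[self._extent(lba)] = responder

    def pick_node(self, lba: int, default: str) -> str:
        # Use the cached target if known, otherwise fall back to any node.
        return self._owner.get(self._extent(lba), default)


cache = TargetCache()
first = cache.pick_node(lba=5000, default="node-x")    # nothing cached yet
cache.record_response(lba=5000, responder="node-y")    # response came from node-y
second = cache.pick_node(lba=5000, default="node-x")   # now goes straight to node-y
```

A stale entry is harmless in this scheme: the request is simply forwarded again, and the next response refreshes the cache.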
  • NVMe Target server side: The distributed storage cluster implements a logical storage volume and provides external storage volume services through NVMe Target. This disclosure enhances the NVMe Target server side.
  • a fabric link dedicated to the NVMe Target service is established between NVMe Target servers. This link supports forwarding NVMe requests to other nodes as they are, so the NVMe Target server can receive NVMe requests from the client and other NVMe Target servers at the same time.
  • each NVMe Target server can calculate the server node where the target is located based on the request.
  • the NVMe Target server only sends NVMe response messages to the client, regardless of whether the request is forwarded by other nodes.
  • This disclosure extends the client and server of the NVMe-oF protocol.
  • The protocol layer messages transmitted by the network layer are consistent with the existing standard protocols. Therefore, the enhanced NVMe client of this disclosure is compatible with a standard NVMe-oF Target server, and a standard NVMe client can access the Target server enhanced by this disclosure.
  • the network layer is not limited to RDMA and is compatible with network types supported by the standard NVMe protocol.
  • The client device mainly includes a first sending unit 601 and a first receiving unit 602.
  • The first sending unit 601 is configured to send a first read and write request to the first storage node through the first network link established with the first storage node, the first storage node being any storage node in the storage system. The first receiving unit 602 is configured to receive a read and write response from the second storage node through the second network link established with the second storage node, and to use the read and write response as the response to the first read and write request.
  • The second storage node is a node, determined by the first storage node, that is capable of executing the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
  • the client device includes an IO client device.
  • the client device includes: a storage protocol layer sending unit, a storage protocol layer receiving unit, a first network layer sending unit, a first network layer receiving unit, a second network layer sending unit, a second network layer receiving unit.
  • The storage protocol layer sending unit sends the first read and write request; the first network layer sending unit sends the first read and write request to the first storage node; the second network layer receiving unit receives the read and write response from the second storage node; and the storage protocol layer receiving unit determines the read and write response to be the response to the first read and write request.
  • the second storage node is a node determined by the first storage node that can execute the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
  • The client device includes: a storage protocol layer sending unit, a storage protocol layer receiving unit, a first sending unit of the network layer, a first receiving unit of the network layer, a second sending unit of the network layer, a second receiving unit of the network layer, etc.
  • the storage protocol layer sending unit selects the first sending unit of the network layer to send the IO read and write request.
  • the storage protocol layer receiving unit supports the second receiving unit of the network layer to receive the storage layer read and write response.
  • the first sending unit of the network layer is any sending unit of the network layer.
  • the second receiving unit of the network layer is any receiving unit of the network layer.
  • read and write requests include storage protocol layer information and network layer link information.
  • The second receiving unit of the network layer passes the read and write response from the second storage node to the storage protocol layer receiving unit. The storage protocol layer receiving unit parses the storage protocol layer identifier and the network layer identifier of the read and write response, and compares the storage protocol layer identifier of the response with that of the first read and write request. If they match, the read and write response received by the second receiving unit of the network layer is the response to the first read and write request sent by the first sending unit of the network layer.
  • the storage protocol layer records the second network layer identifier and establishes a mapping relationship between the read and write request target data and the second network link identifier.
  • the first network link and the second network link are any network link channels of the storage protocol layer.
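The matching rule described above (compare storage protocol layer identifiers, ignore which link the response arrived on, then record the responding link) can be sketched as follows. The field name `command_id` stands in for the storage protocol layer identifier and, like the class names, is an assumption.

```python
from typing import Dict, NamedTuple


class Request(NamedTuple):
    command_id: int   # storage protocol layer identifier (assumed name)
    link_id: int      # network link the request was sent on
    lba: int          # target address of the request


class Response(NamedTuple):
    command_id: int
    link_id: int      # network link the response arrived on
    status: int


class ProtocolLayer:
    def __init__(self) -> None:
        self.pending: Dict[int, Request] = {}
        self.link_for_lba: Dict[int, int] = {}

    def submit(self, request: Request) -> None:
        self.pending[request.command_id] = request

    def complete(self, response: Response) -> Request:
        # Match on the storage-layer identifier only; the link may differ.
        request = self.pending.pop(response.command_id)
        # Record which link answered for this target address.
        self.link_for_lba[request.lba] = response.link_id
        return request


layer = ProtocolLayer()
layer.submit(Request(command_id=7, link_id=0, lba=4096))
matched = layer.complete(Response(command_id=7, link_id=1, status=0))
```

Here the request left on link 0 and the response came back on link 1, yet the two are paired because they share the same storage-layer identifier.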
  • The client device is further configured to: after using the read and write response as the response to the first read and write request, extract the identifier of the second storage node from the read and write response; and, based on the identifier of the second storage node, establish a mapping relationship between the second storage node and the client device, the mapping relationship indicating that the read and write response is stored in the second storage node.
  • The client device is further configured to: after establishing the mapping relationship between the second storage node and the client device based on the identifier of the second storage node, generate a second read and write request based on the mapping relationship, the second read and write request being used to access the read and write response; send the second read and write request to the second storage node through the second network link; and receive the read and write response returned by the second storage node in response to the second read and write request.
  • The client device is further configured to: before sending the second read and write request to the second storage node through the second network link, obtain the first submission queue and the second submission queue in the network layer, where the first submission queue is used to send read and write requests to the first storage node and the second submission queue is used to send read and write requests to the second storage node; obtain the submission queue in the storage protocol layer, which is used to send read and write requests to the first submission queue and the second submission queue; and map the first submission queue and the second submission queue of the network layer to the submission queue of the storage protocol layer.
  • The client device is further configured to: before receiving the read and write response from the second storage node through the second network link established with the second storage node, obtain the first completion queue and the second completion queue in the network layer, where the first completion queue is used to receive read and write responses from the first storage node and the second completion queue is used to receive read and write responses from the second storage node; obtain the completion queue of the storage protocol layer, which is used to receive read and write responses from the first completion queue or the second completion queue of the network layer; and map the first completion queue and the second completion queue of the network layer to the completion queue of the storage protocol layer.
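The queue mapping can be illustrated with plain in-memory queues. This sketch makes no claim about how real RDMA or NVMe queues are implemented; `QueueMap`, its attribute names, and the string commands are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


class QueueMap:
    """One storage-layer queue pair mapped onto per-link network-layer queues."""

    def __init__(self, link_ids: List[int]) -> None:
        self.net_sq: Dict[int, Deque[str]] = {i: deque() for i in link_ids}
        self.net_cq: Dict[int, Deque[str]] = {i: deque() for i in link_ids}
        self.proto_cq: Deque[str] = deque()

    def submit(self, link_id: int, command: str) -> None:
        # The single protocol-layer submission queue fans out to one link queue.
        self.net_sq[link_id].append(command)

    def poll(self) -> None:
        # Completions from every link drain into the one protocol-layer queue.
        for cq in self.net_cq.values():
            while cq:
                self.proto_cq.append(cq.popleft())


queues = QueueMap([0, 1])
queues.submit(0, "read lba=4096")                          # sent on link 0
queues.net_cq[1].append("completion for read lba=4096")   # arrives on link 1
queues.poll()
```

After `poll()`, the storage protocol layer finds the completion in its own queue without needing to know that it arrived on a different link from the one used for submission.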
  • The client device is further configured to: before using the read and write response as the response to the first read and write request, parse the session identifier in the read and write response and the session identifier in the first read and write request, and confirm that the two session identifiers are the same.
  • the present disclosure provides a second storage node.
  • the second storage node mainly includes: a second receiving unit 701 and a second sending unit 702.
  • the second receiving unit 701 is configured to receive the first read and write request from the first storage node, which is any storage node in the storage system; wherein the second receiving unit includes an internal network layer receiving unit.
  • The second sending unit 702 is configured to, in response to the first read and write request, return a read and write response to the client device through the second network link established between the client device and the second storage node. The second network link and the first network link belong to the same storage protocol channel, and the first network link is the network link established between the client device and the first storage node.
  • the second sending unit includes a client network layer sending unit.
  • a second storage node includes: a cluster internal network layer sending unit, a cluster internal network layer receiving unit, a client network layer sending unit, and a client network layer receiving unit.
  • The cluster internal network layer receiving unit receives the first read and write request from the first storage node, the first storage node being any storage node in the storage system; the client network layer sending unit returns a read and write response to the client device through the second network link established between the client device and the second storage node.
  • The second network link belongs to the storage protocol layer link channel established between the client and the storage cluster.
  • the second storage node is any storage node in the cluster.
  • The second storage node is further configured to: after returning, in response to the first read and write request, a read and write response to the client device through the second network link established between the client device and the second storage node, receive a second read and write request from the client device through the second network link, the second read and write request being used to access the read and write response; and, in response to the second read and write request, return the read and write response to the client device through the second network link.
  • the present disclosure also provides an electronic device.
  • The electronic device mainly includes: a processor 801, a memory 802 and a communication bus 803.
  • The processor 801 and the memory 802 communicate with each other through the communication bus 803.
  • the memory 802 stores a program that can be executed by the processor 801.
  • The processor 801 executes the program stored in the memory 802 to implement the following steps: sending a first read and write request to the first storage node through the first network link established with the first storage node, the first storage node being any storage node in the storage system; and receiving the read and write response from the second storage node through the second network link established with the second storage node, and using the read and write response as the response to the first read and write request, where the second storage node is a node, determined by the first storage node, that is capable of executing the first read and write request, and the first network link and the second network link belong to the same storage protocol channel;
  • the second network link and the first network link belong to the same storage protocol channel.
  • the first network link is the network link established between the client device and the first storage node.
  • the communication bus 803 mentioned in the above electronic equipment may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc.
  • the communication bus 803 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in Figure 8, but it does not mean that there is only one bus or one type of bus.
  • the memory 802 may include random access memory (RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory may also be at least one storage device located remotely from the aforementioned processor 801.
  • The processor 801 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • A computer-readable storage medium is also provided. The computer-readable storage medium stores a computer program.
  • When the computer program is run on a computer, it causes the computer to execute the data reading and writing methods described in the above embodiments.
  • the solution provided by this disclosure establishes multiple network links with multiple storage nodes in the storage cluster and simultaneously establishes a storage protocol layer link; multiple network links belong to the same storage protocol link.
  • The method provided by the present disclosure sends a first read and write request to the first storage node through the first network link established with the first storage node, the first storage node being any storage node in the storage system; and, through the second network link established with the second storage node, receives the read and write response from the second storage node and uses it as the response to the first read and write request.
  • The second storage node is a node, determined by the first storage node, that is capable of executing the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
  • After the client device sends the first read and write request to the first storage node, the request can still be executed by the second storage node, which then sends the read and write response to the client device, and the client device can recognize that the read and write response is for the first read and write request.
  • the computer program product includes one or more computer instructions.
  • the computer instructions when loaded and executed on a computer, produce processes or functions in accordance with the present disclosure, in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, microwave) means.
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the available media may be magnetic media (such as floppy disks, hard disks, magnetic tapes, etc.), optical media (such as DVDs), or semiconductor media (such as solid state drives).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present disclosure relates to a data read-write method, and a device, a storage node and a storage medium. The method comprises: by means of a first network link which is established with a first storage node, sending a first read-write request to the first storage node; and by means of a second network link which is established with a second storage node, receiving a read-write response from the second storage node, and using the read-write response as a response to the first read-write request, wherein the first network link and the second network link belong to the same storage protocol channel.

Description

Data reading and writing method, device, storage node and storage medium
Cross-reference to related applications
This disclosure claims priority to Chinese patent application CN202210258761.1, titled "Data reading and writing method, device, storage node and storage medium" and filed on March 16, 2022, the entire content of which is incorporated into this disclosure by reference.
Technical field
The present disclosure relates to the field of communications, and in particular to a data reading and writing method, device, storage node and storage medium.
Background
A distributed storage system usually contains multiple storage nodes, each of which contains one or more storage devices that support the NVMe (non-volatile memory express, the non-volatile memory host controller interface specification) storage layer protocol. The storage nodes together provide one logical address space, and the target space of an IO (read and write) operation may be located on any one or more storage nodes. In addition, the cluster topology may be updated at any time. All of this creates technical problems for access by standard client devices: for example, a standard client cannot obtain the routing information of the storage cluster in time, and in proxy mode the network transmission path is long and the network latency is high.
Summary
The present disclosure provides a data reading and writing method, device, storage node and storage medium to solve the problem that, when a distributed storage cluster is accessed, forwarding requests and response results through a second server lengthens the network transmission path and introduces additional network latency. The second server includes a proxy server.
In a first aspect, a data reading and writing method is provided, applied to a client device. The method includes: sending a first read and write request to a first storage node through a first network link established with the first storage node, the first storage node being any storage node in the storage system; and receiving a read and write response from a second storage node through a second network link established with the second storage node, and using the read and write response as the response to the first read and write request. The second storage node is a node, determined by the first storage node, that is capable of executing the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
In a second aspect, a data reading and writing method is provided, applied to a second storage node. The method includes: receiving a first read and write request from a first storage node, the first storage node being any storage node in the storage system; and, in response to the first read and write request, returning a read and write response to the client device through a second network link established between the client device and the second storage node. The second network link and the first network link belong to the same storage protocol channel, and the first network link is the network link established between the client device and the first storage node.
In a third aspect, a data reading and writing system is provided, including: a client device, a first storage node and a second storage node. The client device is configured to send a first read and write request to the first storage node through a first network link established with the first storage node, the first storage node being any storage node in the storage system; and to receive a read and write response from the second storage node through a second network link established with the second storage node, and use the read and write response as the response to the first read and write request. The second storage node is a node, determined by the first storage node, that is capable of executing the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
In a fourth aspect, a client device is provided, including: a first sending unit and a first receiving unit. The first sending unit is configured to send a first read and write request to a first storage node through a first network link established with the first storage node, the first storage node being any storage node in the storage system. The first receiving unit is configured to receive a read and write response from a second storage node through a second network link established with the second storage node, and to use the read and write response as the response to the first read and write request. The second storage node is a node, determined by the first storage node, that is capable of executing the first read and write request; the first network link and the second network link belong to the same storage protocol channel.
In a fifth aspect, a second storage node is provided, including: a second receiving unit and a second sending unit. The second receiving unit is configured to receive a first read and write request from a first storage node, the first storage node being any storage node in the storage system. The second sending unit is configured to, in response to the first read and write request, return a read and write response to the client device through a second network link established between the client device and the second storage node. The second network link and the first network link belong to the same storage protocol channel, and the first network link is the network link established between the client device and the first storage node.
In a sixth aspect, an electronic device is provided, including: a processor, a memory and a communication bus, where the processor and the memory communicate with each other through the communication bus; the memory is used to store a computer program; and the processor is used to execute the program stored in the memory to implement the data reading and writing method described in the first aspect or the data reading and writing method described in the second aspect.
In a seventh aspect, a computer-readable storage medium is provided, which stores a computer program. When the computer program is executed by a processor, it implements the data reading and writing method described in the first aspect or the data reading and writing method described in the second aspect.
Description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure.
To illustrate the technical solutions of the present disclosure or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Figure 1 is a schematic flow chart of the data reading and writing method in the present disclosure;
Figure 2 is another schematic flow chart of the data reading and writing method in the present disclosure;
Figure 3 is a schematic structural diagram of the data reading and writing system in the present disclosure;
Figure 4 is a schematic diagram of the layered link principle between nodes and the client in the distributed storage system in the present disclosure;
Figure 5 is a flow chart of the interaction among node X, node Y and the client device in the distributed storage system in the present disclosure;
Figure 6 is a schematic structural diagram of the client device in the present disclosure;
Figure 7 is a schematic structural diagram of the second storage node in the present disclosure;
Figure 8 is a schematic structural diagram of the electronic device in the present disclosure.
Detailed description
To make the purpose, technical solutions and advantages of the present disclosure clearer, the technical solutions in the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this disclosure without creative effort fall within the scope of protection of this disclosure.
It should be noted that the terms "first", "second", etc. in the description, the claims and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It is to be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion: for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
分布式存储系统中,通常包含多个存储节点,每一个存储节点包含一个或多个支持NVMe(non-volatile memory express,非易失性内存主机控制器接口规范)接口规范的存储设备,多个存储节点提供一个逻辑地址空间,IO(读写)操作的目标空间可能位于任何一个或多个存储节点,另外集群的拓扑结构随时可能更新,这些都为标准的客户端设备访问带来引入新的网络延迟等技术问题。目前,NVMe客户端设备访问分布式存储系统的相关技术主要有两种。A distributed storage system usually contains multiple storage nodes. Each storage node contains one or more storage devices that support the NVMe (non-volatile memory express, non-volatile memory host controller interface specification) interface specification. Multiple Storage nodes provide a logical address space. The target space of IO (read and write) operations may be located on any one or more storage nodes. In addition, the topology of the cluster may be updated at any time. These all introduce new issues to standard client device access. Technical issues such as network delays. Currently, there are two main technologies for NVMe client devices to access distributed storage systems.
In the first, the client device keeps a copy of the cluster partition table and can therefore calculate the target storage node to be accessed; the client device then establishes an NVMe link directly with the target storage device to perform IO operations. This method requires a customized NVMe client device that synchronizes the cluster's partition information and routing calculation rules in real time. The client device is tightly coupled with the cluster service, and the implementation cost is very high.
In the second, the client device sends an IO request to a first server; the first server determines the second server on which the requested target address is located and forwards the IO request to the second server. After completing the IO request, the second server must notify the first server, and the first server then sends the response result. In this method the response message has to be sent from the second server to the first server and then from the first server to the client device, so the network transmission path is long and additional network latency is introduced.
In the related art, the NVMe client device has a matching network link with each storage node in the distributed storage system (hereinafter referred to as an NVMe Target server). For a given NVMe Target server, read/write requests can be sent and read/write responses received only over the network link matched between that server and the NVMe client device. If the link over which the NVMe client device sends a read/write request and the link over which it receives the read/write response do not belong to the same path, the NVMe client device cannot recognize the response, even if the response matches the request that was sent.
The NVMe Target server and the NVMe client device establish a network link by means of queue mapping.
The NVMe client device includes an NVMe layer and a network layer; the NVMe layer includes a submission queue and a completion queue, and the network layer likewise includes a submission queue and a completion queue. Each NVMe Target server in the distributed storage system also includes an NVMe layer and a network layer, each of which likewise includes a submission queue and a completion queue.
From the perspective of sending a read/write request, the network link between the NVMe client device and the NVMe Target server is established as follows: on the NVMe client device side, the submission queue of the NVMe layer is mapped to the submission queue of the network layer, and the submission queue of the client-side network layer is connected, via network transmission, to the completion queue of the network layer of the NVMe Target server; on the NVMe Target server side, the completion queue of the network layer is mapped to the completion queue of the NVMe layer.
From the perspective of receiving a read/write response, the network link between the NVMe client device and the NVMe Target server is established as follows: on the NVMe client device side, the completion queue of the NVMe layer is mapped to the completion queue of the network layer, and the completion queue of the network layer is connected, via network transmission, to the submission queue of the network layer of the NVMe Target server; on the NVMe Target server side, the submission queue of the network layer is mapped to the submission queue of the NVMe layer.
When the NVMe client device sends a read/write request to the NVMe Target server, the submission queue of the NVMe layer of the NVMe client device sends the read/write request to the submission queue of the network layer, which in turn sends the request to the completion queue of the network layer of the NVMe Target server. After receiving the read/write request, the completion queue of the network layer of the NVMe Target server sends it to the completion queue of the NVMe layer.
When the NVMe Target server sends a read/write response to the NVMe client device, the submission queue of the NVMe layer of the NVMe Target server sends the read/write response to the submission queue of the network layer, which in turn sends the response to the completion queue of the network layer of the NVMe client device. After receiving the read/write response, the completion queue of the network layer of the NVMe client device sends it to the completion queue of the NVMe layer.
In the related art, the NVMe layer of each NVMe client device is provided with only one submission queue and one completion queue, and the network layer is likewise provided with only one submission queue and one completion queue. An NVMe client device can therefore be bound to only one NVMe Target server and can interact only with that server; any interaction with other NVMe Target servers has to be forwarded through the bound NVMe Target server.
Specifically, for a read/write request that the NVMe client device sends to a first NVMe Target server, the read/write response can be sent to the NVMe client device only by the first NVMe Target server, through the submission queue of its own NVMe layer and the submission queue of its own network layer. If the read/write response were instead sent by a second NVMe Target server through the submission queues of its NVMe layer and network layer, the NVMe client device would be unable to recognize it. Therefore, when the read/write response corresponding to a read/write request can be obtained only from the second NVMe Target server, the response must be forwarded to the NVMe client device by the first NVMe Target server before the NVMe client device can recognize it.
The present disclosure provides a data read/write method that can be applied to a client device.
As shown in Figure 1, the method may include the following steps 101 to 102.
Step 101: Send a first read/write request to a first storage node through a first network link established with the first storage node, where the first storage node is any storage node in the storage system.
Step 102: Receive a read/write response from a second storage node through a second network link established with the second storage node, and take the read/write response as the response to the first read/write request, where the second storage node is a node that is determined by the first storage node and is able to execute the first read/write request, and the first network link and the second network link belong to the same storage protocol channel.
In an exemplary embodiment, the read/write response from the second storage node is received and, according to a storage-protocol-layer identifier, is taken as the response to the first read/write request. In an exemplary embodiment, the first network link and the second network link are both underlying network links of the same storage-protocol-layer link channel.
It should be understood that the read/write request in this embodiment includes a read request and/or a write request and, correspondingly, the read/write response includes a read response and/or a write response. When the read/write request sent by the client device is a read request, the storage node returns a read response; when it is a write request, the storage node returns a write response.
In this embodiment, a network link established between the client device and a storage node can be used both to send requests and to receive responses. For example, the client device can send a read/write request to the first storage node through the first network link, and the first storage node can also return a read/write response to the client device through the first network link.
It should be understood that when the network links between the client device and different storage nodes are aggregated into the same storage protocol channel, then once the client device has sent a read/write request to one storage node, it can recognize the read/write response regardless of which storage node the response comes from. In this embodiment specifically, because the first network link and the second network link belong to the same storage protocol channel, the client device can still recognize the read/write response even though it comes from the second storage node.
In an exemplary embodiment, the data read/write method includes: the client establishes with the storage cluster one storage-protocol-layer link and multiple network-layer links, such as the first network link and the second network link; the storage protocol layer selects any network link to send an IO read/write request and receives the IO read/write response from any network link; and the storage-protocol-layer identifier of the read/write response matches that of the read/write request.
This embodiment provides the following two ways of aggregating the first network link and the second network link into the same storage protocol channel.
In the first way, a storage protocol management layer is added above the NVMe layer of the client device to manage the NVMe layer. In a specific implementation, each storage node still has a uniquely matching submission queue and completion queue in the NVMe layer of the client device, but the added storage protocol management layer can mark the submission queues and completion queues, to indicate that read/write requests can be sent to different storage nodes through a given submission queue and that any read/write response received by a given completion queue can be recognized.
Although marking by a storage protocol management layer does allow the first network link and the second network link to be aggregated into one storage protocol channel, it requires adding the storage protocol management layer, which entails a relatively large amount of work.
In the second way, to improve processing efficiency, this embodiment arranges that the storage nodes having a network link with the client device are all mapped to one and the same completion queue in the NVMe layer of the client device, and are likewise all mapped to one and the same submission queue in the NVMe layer of the client device.
It should be understood that the focus of this embodiment is to enable the client device to recognize the read/write response returned by the second storage node. Accordingly, to make the first network link and the second network link belong to the same storage protocol channel, the first storage node and the second storage node are mapped to the same completion queue in the NVMe layer of the client device. In this way, regardless of which storage node sends the read/write response, the same completion queue in the NVMe layer of the client device receives and recognizes it.
In a specific implementation, in one embodiment, a first completion queue and a second completion queue in the network layer are obtained, where the first completion queue is used to receive read/write responses from the first storage node and the second completion queue is used to receive read/write responses from the second storage node; a completion queue of the storage protocol layer is obtained, where the completion queue of the storage protocol layer is used to receive read/write responses from the first completion queue or from the second completion queue; and the first completion queue and the second completion queue are mapped to the completion queue of the storage protocol layer.
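By way of illustration only, the mapping of two network-layer completion queues onto one storage-protocol-layer completion queue can be sketched in Python as follows. The class names, the link names, and the dictionary form of a response are assumptions made for this sketch and are not part of the present disclosure; a real implementation would sit on an NVMe-oF transport rather than on in-memory queues.

```python
from collections import deque


class ProtocolCompletionQueue:
    """Hypothetical storage-protocol-layer completion queue shared by all links."""

    def __init__(self):
        self._entries = deque()

    def post(self, link_name, response):
        # The originating link is recorded for inspection only; it plays no
        # part in deciding whether the response can be recognized.
        self._entries.append((link_name, response))

    def poll(self):
        # Return the oldest entry, or None when the queue is empty.
        return self._entries.popleft() if self._entries else None


class NetworkCompletionQueue:
    """Hypothetical per-link network-layer completion queue."""

    def __init__(self, link_name, protocol_cq):
        self.link_name = link_name
        self._protocol_cq = protocol_cq  # the mapping to the shared queue

    def on_receive(self, response):
        # Every network-layer queue forwards into the same protocol-layer queue.
        self._protocol_cq.post(self.link_name, response)


protocol_cq = ProtocolCompletionQueue()
first_cq = NetworkCompletionQueue("link-1", protocol_cq)   # first storage node
second_cq = NetworkCompletionQueue("link-2", protocol_cq)  # second storage node

second_cq.on_receive({"session_id": 7, "status": "ok"})
first_cq.on_receive({"session_id": 8, "status": "ok"})
drained = [protocol_cq.poll(), protocol_cq.poll(), protocol_cq.poll()]
```

In this sketch the storage protocol layer polls a single queue, so a response delivered over the second network link is seen in the same way as one delivered over the first.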
In this embodiment, the client device may send multiple read/write requests within a short period of time. To determine whether a received read/write response is the response to the first read/write request, the read/write response returned by the second storage node may be verified before it is taken as the response to the first read/write request.
In one embodiment, before the read/write response is taken as the response to the first read/write request, the storage-protocol-layer session identifier in the read/write response and the storage-protocol-layer session identifier in the first read/write request are parsed, and it is determined that the two session identifiers are the same.
It should be understood that when the session identifier in the read/write response is the same as the session identifier in the first read/write request, the response and the request belong to the same session. The received read/write response can then be confirmed to match the first read/write request, that is, the read/write response is the response to the first read/write request.
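The session-identifier check can be sketched as a table of outstanding requests keyed by session identifier. This is a minimal sketch under stated assumptions: the `PendingRequests` class and the `session_id` field name are hypothetical, and the disclosure does not specify how the identifier is encoded.

```python
class PendingRequests:
    """Hypothetical table of outstanding requests, keyed by session identifier."""

    def __init__(self):
        self._pending = {}

    def register(self, session_id, request):
        # Remember the request until a response with the same identifier arrives.
        self._pending[session_id] = request

    def match(self, response):
        # Return the request answered by this response, or None when no
        # outstanding request carries the same session identifier.
        return self._pending.pop(response.get("session_id"), None)


pending = PendingRequests()
pending.register(7, {"op": "read", "address": 0})
pending.register(8, {"op": "read", "address": 150})

# The response may arrive over any network link; only the identifier is compared.
matched = pending.match({"session_id": 8, "data": b"hello"})
unknown = pending.match({"session_id": 99, "data": b""})
duplicate = pending.match({"session_id": 8, "data": b"hello"})
```

Removing the entry on a successful match means a repeated response with the same identifier is not accepted twice; that behavior is a design choice of this sketch, not something the disclosure states.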
So that the second storage node can be accessed directly when the data of the read/write response to the first read/write request is requested again, thereby reducing the probability of request forwarding, this embodiment may also establish a mapping relationship between the second storage node and the client device after the response to the first read/write request has been determined. In an exemplary embodiment, a mapping relationship between the second storage node and the target data of the read/write request may also be established after the response to the first read/write request has been determined.
In one embodiment, an identifier of the second storage node is extracted from the read/write response and, based on that identifier, a mapping relationship between the second storage node and the client device is established, where the mapping relationship indicates that the read/write response data is stored on the second storage node. In another embodiment, the identifier of the second storage node is extracted from the read/write response and, based on that identifier, a mapping relationship between the second storage node and the target data of the read/write request is established, where the mapping relationship indicates that the read/write response data is stored on the second storage node.
In application, the data stored on the second storage node may specifically be hotspot data; that is, the mapping relationship indicates that hotspot data is stored on the second storage node. When the client device needs to access the hotspot data, it can therefore send the access request directly to the second storage node, which reduces the probability of request forwarding.
In one embodiment, the client device may access the read/write response data on the second storage node as follows: based on the mapping relationship, a second read/write request is generated, the second read/write request being used to access the read/write response; the second read/write request is sent to the second storage node through the second network link, and the read/write response returned by the second storage node in response to the second read/write request is received.
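As a hedged illustration of the mapping relationship, the following Python sketch records, for each request target, the node identifier carried in the response and uses it to route a later request directly. The `NodeMappingCache` class, the `node_id` field, and the use of a plain address as the target key are assumptions made for this sketch only.

```python
class NodeMappingCache:
    """Hypothetical client-side cache: request target -> node that answered."""

    def __init__(self, default_node):
        self._default_node = default_node
        self._node_for_target = {}

    def learn(self, target, response):
        # Extract the responding node's identifier and record the mapping.
        self._node_for_target[target] = response["node_id"]

    def route(self, target):
        # Use the learned node when there is one; otherwise fall back to the
        # default node, which may have to forward the request.
        return self._node_for_target.get(target, self._default_node)


cache = NodeMappingCache(default_node="node-1")
before = cache.route(150)  # nothing learned yet

# Response to the first request, returned by the second storage node.
cache.learn(150, {"session_id": 7, "node_id": "node-2", "data": b"hello"})
after = cache.route(150)   # the second request can go straight to node-2
other = cache.route(10)    # unrelated target still uses the default node
```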
In this embodiment, so that the client device can send read/write requests both to the first storage node and to the second storage node, the first storage node and the second storage node are mapped to the same submission queue in the NVMe layer of the client device when the first network link and the second network link are made to belong to the same storage protocol channel.
In a specific implementation, in one embodiment, a first submission queue and a second submission queue in the network layer are obtained, where the first submission queue is used to send read/write requests to the first storage node and the second submission queue is used to send read/write requests to the second storage node; a submission queue of the storage protocol layer is obtained, where the submission queue of the storage protocol layer is used to send read/write requests to the first submission queue and the second submission queue; and the first submission queue and the second submission queue are mapped to the submission queue of the storage protocol layer.
In the technical solution provided by this embodiment, a first read/write request is sent to a first storage node through a first network link established with the first storage node, where the first storage node is any storage node in the storage system; a read/write response is received from a second storage node through a second network link established with the second storage node and is taken as the response to the first read/write request, where the second storage node is a node that is determined by the first storage node and is able to execute the first read/write request; and the first network link and the second network link belong to the same storage protocol channel. Because the two network links belong to the same storage protocol channel, the storage-protocol-layer identification information of the read/write request and of the read/write response is the same. Therefore, although the client device sends the first read/write request to the first storage node, the second storage node can execute the request and send the read/write response to the client device, and the client device can recognize that the response is for the first read/write request.
The present disclosure provides a data read/write method that can be applied to a second storage node. As shown in Figure 2, the method may include the following steps 201 to 202.
Step 201: Receive a first read/write request from a first storage node, where the first storage node is any storage node in the storage system.
Step 202: In response to the first read/write request, return a read/write response to the client device through a second network link established between the client device and the second storage node, where the second network link and a first network link belong to the same storage protocol channel, and the first network link is the network link established between the client device and the first storage node.
In an exemplary embodiment, the second network link and the first network link are both underlying network links of the same storage protocol channel.
In an exemplary embodiment, the method further includes: receiving, through the second network link, a second read/write request from the client device, where the second read/write request is used to access the read/write response and the read/write response includes data of the second storage node; and, in response to the second read/write request, returning the read/write response to the client device through the second network link.
Based on the same concept, the present disclosure provides a data read/write system. For the specific implementation of the system, reference may be made to the description of the method embodiments; repeated content is not described again. As shown in Figure 3, the system mainly includes a client device 301 and a storage cluster, where the storage cluster includes multiple storage nodes such as a first storage node 302 and a second storage node 303. The client device 301 establishes one storage-protocol-layer link channel with the storage cluster and establishes a network-layer link with each storage node in the storage cluster. The client device 301 is configured to: send a first read/write request to the first storage node 302 through a first network link established with the first storage node 302, where the first storage node 302 is any storage node in the storage system; and receive a read/write response from the second storage node 303 through a second network link established with the second storage node 303, and take the read/write response as the response to the first read/write request, where the second storage node 303 is a node that is determined by the first storage node 302 and is able to execute the first read/write request, and the first network link and the second network link belong to the same storage protocol channel.
Similarly, the client device 301 can send a read/write request to the second storage node 303 through the second network link established with the second storage node 303, and receive the read/write response from the second storage node 303 through the same second network link. In an exemplary embodiment, the first network link and the second network link are both underlying network links of the same storage protocol channel.
The application environment of the present disclosure is described below, taking as an example the case in which the client device is an NVMe client device and the storage node is an NVMe Target server.
The existing NVM Express over Fabrics Revision 1.1a specification requires a one-to-one correspondence between NVMe-layer IO queues and underlying network links. The present disclosure proposes decoupling the storage layer from the network layer: the NVMe protocol and network links support a 1:N mapping (N not less than 1), NVMe request messages and response messages may be transmitted over different network links, and the client establishes network links with all servers in the distributed cluster. These network links are mapped to the same storage-layer NVMe protocol channel, so that one NVMe path is mapped onto multiple network links, thereby solving the problems of a standard NVMe client accessing a distributed storage cluster and achieving higher performance and a better user experience. The present disclosure differs from the existing MultiPath (multipath) function: with multipath, multiple path channels exist at the NVMe protocol level, and one NVMe link is selected, as required by the protocol specification, to carry out the IO interaction.
Based on the above description of the inventive concept, Figure 4 shows a schematic diagram of the principle by which nodes and a client are linked in the distributed storage system. In Figure 4, the client creates a network link with each server node in the distributed storage system, the multiple network links are aggregated into one NVMe access path, and the client reads and writes the entire cluster through that one NVMe path.
Referring to Figure 5, Figure 5 is a flowchart of the interaction among node X, node Y, and the client device in the distributed storage system.
Step 501: The client sends an IO request to node X. Node X is required to hold the data partition table so that it can determine the location of the target space.
Step 502: Node X receives and parses the IO request, calculates from the requested address space that the target node is storage node Y, and forwards the request message to node Y; the life cycle of this NVMe session on node X then ends.
Step 503: Storage node Y receives the NVMe request, parses and processes the NVMe message, and executes the corresponding IO request.
Step 504: Node Y sends the NVMe response message to the client.
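The interaction among the client, node X, and node Y can be sketched as a small in-memory simulation. This is an illustrative sketch only: the `Client` and `StorageNode` classes, the partition table of `(start, end, owner)` tuples, and the dictionary request format are assumptions, and direct method calls stand in for the network links and the inter-node forwarding link.

```python
class Client:
    """Records which node each response was delivered by."""

    def __init__(self):
        self.responses = []

    def deliver(self, from_node, response):
        self.responses.append((from_node, response))


class StorageNode:
    def __init__(self, name, partition_table, client):
        self.name = name
        self.partition_table = partition_table  # hypothetical (start, end, owner)
        self.client = client
        self.peers = {}   # other nodes, reachable over the forwarding link
        self.blocks = {}  # data held by this node, keyed by address

    def owner_of(self, address):
        for start, end, owner in self.partition_table:
            if start <= address < end:
                return owner
        raise KeyError(f"address {address} is not in the partition table")

    def handle(self, request):
        owner = self.owner_of(request["address"])
        if owner != self.name:
            # Forward the request unchanged; this node keeps no session state
            # and will not see the response.
            self.peers[owner].handle(request)
            return
        # Execute the IO request locally.
        if request["op"] == "write":
            self.blocks[request["address"]] = request["data"]
            data = None
        else:
            data = self.blocks.get(request["address"])
        # Respond to the client directly, not through the forwarding node.
        self.client.deliver(self.name,
                            {"session_id": request["session_id"], "data": data})


table = [(0, 100, "X"), (100, 200, "Y")]
client = Client()
node_x = StorageNode("X", table, client)
node_y = StorageNode("Y", table, client)
node_x.peers["Y"] = node_y
node_y.peers["X"] = node_x

# Both requests are sent to node X, but address 150 belongs to node Y.
node_x.handle({"session_id": 1, "op": "write", "address": 150, "data": b"hello"})
node_x.handle({"session_id": 2, "op": "read", "address": 150})
```

In this sketch both responses reach the client from node Y, and node X never stores the data or relays a response.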
To realize the above inventive concept, the present disclosure extends the NVMe client device and the NVMe Target server as follows.
NVMe client device side: As defined by the NVMe protocol, the client establishes a network link with the server through the connect command, and the submission queue and completion queue of the NVMe protocol are mapped to the submission queue and completion queue of the network link, respectively. According to the present disclosure, the connect command specifies a list of server addresses, and the client establishes network links with all server nodes. Taking RDMA as an example, all links share the same RDMA protection domain (PD), and the servers share the client's memory region through the RDMA network.
When the client sends a message, it uses a reserved field of the NVMe message header to store a link identifier FID (Fabric ID), and the network layer sends the standard NVMe request message to the correct server according to the FID carried in the message header.
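The FID-based selection of a network link can be sketched as a lookup from FID to link. This is a sketch under loudly stated assumptions: the message is modeled as a dictionary with a `header` entry rather than as a real NVMe capsule, the `Link` and `NetworkLayer` classes are hypothetical, and the sketch passes the message on unchanged because the disclosure does not say whether the FID is cleared before transmission.

```python
class Link:
    """Hypothetical network link to one storage node."""

    def __init__(self, fid, node_name):
        self.fid = fid
        self.node_name = node_name
        self.sent = []

    def send(self, message):
        # Stand-in for transmitting the message over the fabric.
        self.sent.append(message)


class NetworkLayer:
    """Hypothetical client network layer holding one link per storage node."""

    def __init__(self, links):
        self._links_by_fid = {link.fid: link for link in links}

    def submit(self, message):
        # Read the FID from the (modeled) header and pick the matching link.
        fid = message["header"]["fid"]
        link = self._links_by_fid[fid]
        link.send(message)
        return link.node_name


link_x = Link(fid=1, node_name="node-X")
link_y = Link(fid=2, node_name="node-Y")
network = NetworkLayer([link_x, link_y])

sent_to = network.submit(
    {"header": {"fid": 2, "session_id": 7}, "op": "read", "address": 150}
)
```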
When the client receives messages, the multiple network links share one completion queue. Messages received on any link are reported through that completion queue to the NVMe-layer protocol stack and are then received and processed by the client.
本公开对客户端的增强还包括地址映射表缓存功能,所有读写请求的响应消息都由目标节点返回,客户端接收响应时立即更新该请求的目标地址、目标节点的映射关系,后续对该地址的访问可以直接向目标节点发起请求,减少请求转发的概率,可以大大提升读写效率。The enhancement of the client in this disclosure also includes the address mapping table caching function. The response messages of all read and write requests are returned by the target node. When the client receives the response, it immediately updates the target address of the request and the mapping relationship of the target node. Subsequently, the address Access can directly initiate requests to the target node, reducing the probability of request forwarding and greatly improving read and write efficiency.
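The client-side behaviour described above (FID carried in the header, one completion queue shared by all links, and an address mapping cache updated on each response) can be sketched roughly as follows. The names `CachingClient`, `EXTENT_SIZE`, `route_cache`, and the dictionary-based header are illustrative assumptions; the disclosure does not define the cache granularity or the header layout.

```python
from collections import deque

EXTENT_SIZE = 1024  # assumed granularity of the address mapping cache


class CachingClient:
    def __init__(self, links, default_fid):
        self.links = links                # FID -> callable that sends on that link (hypothetical)
        self.default_fid = default_fid    # link used when no mapping is cached yet
        self.route_cache = {}             # extent index -> FID of the node that answered
        self.completion_queue = deque()   # single queue shared by every network link
        self.pending = {}                 # command id -> target address of the request

    def submit(self, command_id, address):
        # Pick the cached link for this address, or fall back to the default link.
        fid = self.route_cache.get(address // EXTENT_SIZE, self.default_fid)
        self.pending[command_id] = address
        header = {"command_id": command_id, "fid": fid, "address": address}
        self.links[fid](header)           # the network layer routes by the FID in the header
        return fid

    def on_link_receive(self, fid, command_id):
        # Every link posts its completions into the same shared queue.
        self.completion_queue.append((fid, command_id))

    def poll(self):
        completed = []
        while self.completion_queue:
            fid, command_id = self.completion_queue.popleft()
            address = self.pending.pop(command_id)
            # Remember which link answered, so later requests skip the forwarding hop.
            self.route_cache[address // EXTENT_SIZE] = fid
            completed.append(command_id)
        return completed
```

After one forwarded request has completed, a second request to the same extent is submitted on the link of the node that answered, so no forwarding is needed.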
NVMe Target server side: The distributed storage cluster implements a logical storage volume and exposes it as a storage volume service through NVMe Target. The present disclosure enhances the NVMe Target server as follows.
First, a fabric link dedicated to the NVMe Target service is established between NVMe Target servers. This link supports forwarding NVMe requests unchanged to other nodes, so an NVMe Target server can receive NVMe requests both from clients and from other NVMe Target servers.
Second, each NVMe Target server can compute, from a request, the server node on which the target resides.
Third, an NVMe Target server sends NVMe response messages only to the client, regardless of whether the request was relayed by another node.
The present disclosure extends the client and the server of the NVMe-oF protocol, while the protocol-layer messages carried by the network layer remain consistent with the existing standard protocol. Therefore, the enhanced NVMe client of the present disclosure is compatible with a standard NVMe-oF Target server, and a standard NVMe client can access the enhanced Target server of the present disclosure.
The improvements that the present disclosure makes to the NVMe client and server implementations are not limited to RDMA at the network layer; they are compatible with the network types supported by the standard NVMe protocol.
Based on the same concept, the present disclosure provides a client device. For its specific implementation, refer to the description of the method embodiments; repeated details are omitted. As shown in Figure 6, the client device mainly includes a first sending unit 601 and a first receiving unit 602.
The first sending unit 601 is configured to send a first read/write request to a first storage node through a first network link established with the first storage node, the first storage node being any storage node in the storage system. The first receiving unit 602 is configured to receive a read/write response from a second storage node through a second network link established with the second storage node, and to treat the read/write response as the response to the first read/write request, the second storage node being the node determined by the first storage node to be capable of executing the first read/write request. The first network link and the second network link belong to the same storage protocol channel. In an exemplary embodiment, the client device includes an IO client device.
In an exemplary embodiment, the client device includes: a storage-protocol-layer sending unit, a storage-protocol-layer receiving unit, a network-layer first sending unit, a network-layer first receiving unit, a network-layer second sending unit, and a network-layer second receiving unit.
The storage-protocol-layer sending unit issues the first read/write request, and the network-layer first sending unit sends it to the first storage node. The network-layer second receiving unit receives the read/write response from the second storage node, and the storage-protocol-layer receiving unit determines that the read/write response is the response to the first read/write request. The second storage node is the node determined by the first storage node to be capable of executing the first read/write request; the first network link and the second network link belong to the same storage protocol channel.
In an exemplary embodiment, the client device includes: a storage-protocol-layer sending unit, a storage-protocol-layer receiving unit, a network-layer first sending unit, a network-layer first receiving unit, a network-layer second sending unit, a network-layer second receiving unit, and so on. The storage-protocol-layer sending unit selects the network-layer first sending unit to send the IO read/write request, and the storage-protocol-layer receiving unit supports the network-layer second receiving unit in receiving the storage-layer read/write response. The network-layer first sending unit is any sending unit of the network layer, and the network-layer second receiving unit is any receiving unit of the network layer. The read/write request contains storage-protocol-layer information and network-layer link information.
In an exemplary embodiment, the network-layer second receiving unit passes the read/write response from the second storage node to the storage-protocol-layer receiving unit. The storage-protocol-layer receiving unit parses the storage-protocol-layer identifier and the network-layer identifier of the read/write response and compares the storage-protocol-layer identifier of the response with that of the first read/write request; when they match, the read/write response received by the network-layer second receiving unit is the response to the first read/write request sent by the network-layer second sending unit. The storage protocol layer records the second network-layer identifier and establishes a mapping between the target data of the read/write request and the second network link identifier. The first network link and the second network link are any network link channels of the storage protocol layer.
In an exemplary embodiment, the client device is further configured to: after treating the read/write response as the response to the first read/write request, extract the identifier of the second storage node from the read/write response; and, based on the identifier of the second storage node, establish a mapping between the second storage node and the client device, the mapping indicating that the read/write response is stored on the second storage node.
In an exemplary embodiment, the client device is further configured to: after establishing the mapping between the second storage node and the client device based on the identifier of the second storage node, generate a second read/write request based on the mapping, the second read/write request being used to access the read/write response; and send the second read/write request to the second storage node through the second network link, and receive the read/write response returned by the second storage node in response to the second read/write request.
In an exemplary embodiment, the client device is further configured to: before sending the second read/write request to the second storage node through the second network link, obtain a first submission queue and a second submission queue in the network layer, the first submission queue being used to send read/write requests to the first storage node and the second submission queue being used to send read/write requests to the second storage node; obtain a submission queue in the storage protocol layer, which is used to send read/write requests to the first submission queue and the second submission queue; and map the network-layer first submission queue and the network-layer second submission queue to the submission queue of the storage protocol layer.
In an exemplary embodiment, the client device is further configured to: before receiving the read/write response from the second storage node through the second network link established with the second storage node, obtain a first completion queue and a second completion queue in the network layer, the first completion queue being used to receive read/write responses from the first storage node and the second completion queue being used to receive read/write responses from the second storage node; obtain a completion queue of the storage protocol layer, which is used to receive read/write responses from the network-layer first completion queue or the network-layer second completion queue; and map the network-layer first completion queue and the network-layer second completion queue to the completion queue of the storage protocol layer.
In an exemplary embodiment, the client device is further configured to: before treating the read/write response as the response to the first read/write request, parse the session identifier in the read/write response and the session identifier in the first read/write request; and determine that the session identifier in the read/write response is the same as the session identifier in the first read/write request.
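The session-identifier check in the last embodiment might look like the sketch below. The `pending` table, the dictionary field names, and the use of a `command_id` to find the outstanding request are assumptions made for illustration; the disclosure only states that the session identifiers of the response and of the first read/write request are compared.

```python
def accept_response(pending, response):
    """Return the request that `response` answers, or None if it is not accepted.

    `pending` maps a command id to the outstanding request (hypothetical layout).
    """
    request = pending.get(response["command_id"])
    if request is None:
        return None  # no outstanding request with this command id
    if request["session_id"] != response["session_id"]:
        return None  # different storage-protocol session: not our response
    # The identifiers match, so the response is treated as the answer to the
    # request, regardless of which network link delivered it.
    del pending[response["command_id"]]
    return request
```

Because the check uses only storage-protocol-layer fields, it does not matter that the request left on the first network link and the response arrived on the second.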
Based on the same concept, the present disclosure provides a second storage node. For its specific implementation, refer to the description of the method embodiments; repeated details are omitted. As shown in Figure 7, the second storage node mainly includes a second receiving unit 701 and a second sending unit 702.
The second receiving unit 701 is configured to receive a first read/write request from a first storage node, the first storage node being any storage node in the storage system; the second receiving unit includes an internal-network-layer receiving unit. The second sending unit 702 is configured to, in response to the first read/write request, return a read/write response to the client device through a second network link established between the client device and the second storage node. The second network link and the first network link belong to the same storage protocol channel, the first network link being the network link established between the client device and the first storage node. The second sending unit includes a client-network-layer sending unit.
In an exemplary embodiment, a second storage node includes: a cluster-internal-network-layer sending unit, a cluster-internal-network-layer receiving unit, a client-network-layer sending unit, and a client-network-layer receiving unit. The cluster-internal-network-layer receiving unit receives the first read/write request from the first storage node, the first storage node being any storage node in the storage system. The client-network-layer sending unit returns the read/write response to the client device through the second network link established between the client device and the second storage node. The second network link belongs to the storage-protocol-layer link channel established between the client and the storage cluster. The second storage node is any storage node in the cluster.
In an exemplary embodiment, the second storage node is further configured to: after returning the read/write response to the client device, in response to the first read/write request, through the second network link established between the client device and the second storage node, receive a second read/write request from the client device through the second network link, the second read/write request being used to access the read/write response; and, in response to the second read/write request, return the read/write response to the client device through the second network link.
Based on the same concept, the present disclosure also provides an electronic device. As shown in Figure 8, the electronic device mainly includes a processor 801, a memory 802 and a communication bus 803, where the processor 801 and the memory 802 communicate with each other through the communication bus 803. The memory 802 stores a program executable by the processor 801, and the processor 801 executes the program stored in the memory 802 to implement the following steps: sending a first read/write request to a first storage node through a first network link established with the first storage node, the first storage node being any storage node in the storage system; receiving a read/write response from a second storage node through a second network link established with the second storage node, and treating the read/write response as the response to the first read/write request, the second storage node being the node determined by the first storage node to be capable of executing the first read/write request; the first network link and the second network link belong to the same storage protocol channel;
or,
receiving a first read/write request from a first storage node, the first storage node being any storage node in the storage system; and, in response to the first read/write request, returning a read/write response to the client device through a second network link established between the client device and the second storage node, the second network link and the first network link belonging to the same storage protocol channel, the first network link being the network link established between the client device and the first storage node.
The communication bus 803 mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 803 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, it is drawn as a single thick line in Figure 8, but this does not mean that there is only one bus or one type of bus.
The memory 802 may include random access memory (RAM) and may also include non-volatile memory, for example at least one disk storage device. In an exemplary embodiment, the memory may also be at least one storage device located remotely from the processor 801.
The processor 801 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present disclosure, a computer-readable storage medium is also provided. The computer-readable storage medium stores a computer program which, when run on a computer, causes the computer to execute the data read/write method described in the above embodiments.
In the solution provided by the present disclosure, multiple network links are established with multiple storage nodes in the storage cluster while a single storage-protocol-layer link is established; the multiple network links belong to the same storage protocol link.
Compared with the prior art, the above technical solution provided by the present disclosure has the following advantages. In the method provided by the present disclosure, a first read/write request is sent to a first storage node through a first network link established with the first storage node, the first storage node being any storage node in the storage system; a read/write response is received from a second storage node through a second network link established with the second storage node and is treated as the response to the first read/write request, the second storage node being the node determined by the first storage node to be capable of executing the first read/write request; and the first network link and the second network link belong to the same storage protocol channel. Because the two network links belong to the same storage protocol channel, even though the client device sends the first read/write request to the first storage node, the second storage node can execute the request and send the read/write response to the client device, and the client device can recognize that the response is for the first read/write request.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the present disclosure are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), a semiconductor medium (such as a solid-state drive), or the like.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are only specific embodiments of the present disclosure, provided to enable those skilled in the art to understand or implement it. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims (13)

  1. A data read/write method, applied to a client device, the method comprising:
    sending a first read/write request to a first storage node through a first network link established with the first storage node, the first storage node being any storage node in a storage system;
    receiving a read/write response from a second storage node through a second network link established with the second storage node, and treating the read/write response as the response to the first read/write request, the second storage node being the node determined by the first storage node to be capable of executing the first read/write request; wherein the first network link and the second network link belong to the same storage protocol channel.
  2. The method according to claim 1, wherein, after treating the read/write response as the response to the first read/write request, the method further comprises:
    extracting an identifier of the second storage node from the read/write response;
    establishing, based on the identifier of the second storage node, a mapping between the second storage node and the client device, the mapping indicating that the read/write response is stored on the second storage node.
  3. The method according to claim 2, wherein, after establishing the mapping between the second storage node and the client device based on the identifier of the second storage node, the method further comprises:
    generating a second read/write request based on the mapping, the second read/write request being used to access the read/write response;
    sending the second read/write request to the second storage node through the second network link, and receiving the read/write response returned by the second storage node.
  4. The method according to claim 3, wherein, before sending the second read/write request to the second storage node through the second network link, the method further comprises:
    obtaining a first submission queue and a second submission queue in a network layer, the first submission queue being used to send read/write requests to the first storage node and the second submission queue being used to send read/write requests to the second storage node; and obtaining a submission queue in a storage protocol layer, the submission queue of the storage protocol layer being used to send read/write requests to the first submission queue and the second submission queue;
    mapping the first submission queue and the second submission queue to the submission queue of the storage protocol layer.
  5. The method according to claim 1, wherein, before receiving the read/write response from the second storage node through the second network link established with the second storage node, the method further comprises:
    obtaining a first completion queue and a second completion queue in a network layer, the first completion queue being used to receive read/write responses from the first storage node and the second completion queue being used to receive read/write responses from the second storage node; and obtaining a completion queue of a storage protocol layer, the completion queue of the storage protocol layer being used to receive read/write responses from the first completion queue or the second completion queue;
    mapping the first completion queue and the second completion queue to the completion queue of the storage protocol layer.
  6. The method according to claim 1, wherein, before treating the read/write response as the response to the first read/write request, the method further comprises:
    parsing a storage-protocol-layer session identifier in the read/write response and a storage-protocol-layer session identifier in the first read/write request;
    determining that the storage-protocol-layer session identifier in the read/write response is the same as the storage-protocol-layer session identifier in the first read/write request.
  7. A data read/write method, applied to a second storage node, the method comprising:
    receiving a first read/write request from a first storage node, the first storage node being any storage node in a storage system;
    in response to the first read/write request, returning a read/write response to a client device through a second network link established between the client device and the second storage node, wherein the second network link and a first network link belong to the same storage protocol channel, the first network link being the network link established between the client device and the first storage node.
  8. The method according to claim 7, wherein, after returning the read/write response to the client device, in response to the first read/write request, through the second network link established between the client device and the second storage node, the method further comprises:
    receiving a second read/write request from the client device through the second network link, the second read/write request being used to access the read/write response;
    in response to the second read/write request, returning the read/write response to the client device through the second network link.
  9. A data read/write system, comprising:
    a client device, a first storage node and a second storage node;
    the client device being configured to send a first read/write request to the first storage node through a first network link established with the first storage node, the first storage node being any storage node in the storage system; and to receive a read/write response from the second storage node through a second network link established with the second storage node and treat the read/write response as the response to the first read/write request, the second storage node being the node determined by the first storage node to be capable of executing the first read/write request; wherein the first network link and the second network link belong to the same storage protocol channel.
  10. A client device, comprising:
    a first sending unit configured to send a first read-write request to a first storage node through a first network link established with the first storage node, the first storage node being any storage node in a storage system; and
    a first receiving unit configured to receive a read-write response from a second storage node through a second network link established with the second storage node, and to use the read-write response as the response to the first read-write request, the second storage node being a node that is determined by the first storage node and is capable of executing the first read-write request, wherein the first network link and the second network link belong to the same storage protocol channel.
  11. A second storage node, comprising:
    a second receiving unit configured to receive a first read-write request from a first storage node, the first storage node being any storage node in a storage system; and
    a second sending unit configured to, in response to the first read-write request, return a read-write response to a client device through a second network link established between the client device and the second storage node, wherein the second network link and a first network link belong to the same storage protocol channel, and the first network link is a network link established between the client device and the first storage node.
  12. An electronic device, comprising a processor, a memory and a communication bus, wherein the processor and the memory communicate with each other through the communication bus;
    the memory is configured to store a computer program; and
    the processor is configured to execute the program stored in the memory to implement the data read-write method of any one of claims 1-6 or the data read-write method of claims 7-8.
  13. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the data read-write method of any one of claims 1-6 or the data read-write method of claims 7-8.
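
For illustration only, the request path recited in claims 7 and 9 can be sketched as a small in-process Python simulation: the client sends a request over its link to an arbitrary first storage node, that node determines a second storage node able to execute the request, and the second node answers over its own link in the same storage protocol channel. This is a minimal sketch, not the applicant's implementation. All names (`StorageNode`, `Client`, `channel_id`, `blk-7` and so on) are hypothetical, and plain function calls stand in for real network links.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    name: str
    blocks: dict = field(default_factory=dict)  # key -> bytes held by this node
    peers: list = field(default_factory=list)   # other nodes in the storage system

    def receive(self, channel_id, request):
        """Act as the first storage node: find a node able to execute the request."""
        for node in [self, *self.peers]:
            if request["key"] in node.blocks:
                return node.execute(channel_id, request)
        raise KeyError(request["key"])

    def execute(self, channel_id, request):
        """Act as the second storage node: answer over this node's own link."""
        return {
            "request_id": request["id"],
            "channel_id": channel_id,
            "link": self.name,  # the link the response travels on
            "data": self.blocks[request["key"]],
        }


@dataclass
class Client:
    channel_id: str
    links: dict = field(default_factory=dict)  # node name -> StorageNode, one link each

    def read(self, first_node, key):
        request = {"id": 1, "op": "read", "key": key}
        response = self.links[first_node].receive(self.channel_id, request)
        # Accept the reply only if it arrives on a link of the same channel.
        if response["link"] not in self.links or response["channel_id"] != self.channel_id:
            raise RuntimeError("response arrived outside the storage protocol channel")
        return response


node_a = StorageNode("node-a")
node_b = StorageNode("node-b", blocks={"blk-7": b"payload"})
node_a.peers.append(node_b)
client = Client("channel-1", links={"node-a": node_a, "node-b": node_b})
response = client.read("node-a", "blk-7")
```

Here the request enters at `node-a`, which holds no data for `blk-7`, and the response carries `node-b` as its link, so the client treats a reply arriving on the second link as the answer to the request it sent on the first.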
PCT/CN2023/081675 2022-03-16 2023-03-15 Data read-write method, and device, storage node and storage medium WO2023174341A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210258761.1 2022-03-16
CN202210258761.1A CN116804908A (en) 2022-03-16 2022-03-16 Data reading and writing method, device, storage node and storage medium

Publications (1)

Publication Number Publication Date
WO2023174341A1 (en) 2023-09-21

Family

ID=88022398

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081675 WO2023174341A1 (en) 2022-03-16 2023-03-15 Data read-write method, and device, storage node and storage medium

Country Status (2)

Country Link
CN (1) CN116804908A (en)
WO (1) WO2023174341A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102111448A (en) * 2011-01-13 2011-06-29 华为技术有限公司 Data prefetching method of DHT memory system and node and system
CN108701004A (en) * 2017-01-25 2018-10-23 华为技术有限公司 A kind of system of data processing, method and corresponding intrument
US10244069B1 (en) * 2015-12-24 2019-03-26 EMC IP Holding Company LLC Accelerated data storage synchronization for node fault protection in distributed storage system
CN110286849A (en) * 2019-05-10 2019-09-27 深圳物缘科技有限公司 The data processing method and device of data-storage system
CN113014662A (en) * 2021-03-11 2021-06-22 联想(北京)有限公司 Data processing method and storage system based on NVMe-oF protocol

Also Published As

Publication number Publication date
CN116804908A (en) 2023-09-26

Similar Documents

Publication Publication Date Title
US11487690B2 (en) Universal host and non-volatile memory express storage domain discovery for non-volatile memory express over fabrics
US20210084537A1 (en) Load balance method and apparatus thereof
WO2020186909A1 (en) Virtual network service processing method, apparatus and system, and controller and storage medium
US7969989B2 (en) High performance ethernet networking utilizing existing fibre channel arbitrated loop HBA technology
US11544001B2 (en) Method and apparatus for transmitting data processing request
US10574477B2 (en) Priority tagging based solutions in fc sans independent of target priority tagging capability
CN112130748B (en) Data access method, network card and server
US20100306387A1 (en) Network interface device
US8527661B1 (en) Gateway for connecting clients and servers utilizing remote direct memory access controls to separate data path from control path
US11489921B2 (en) Kickstart discovery controller connection command
US20220222016A1 (en) Method for accessing solid state disk and storage device
JP7126021B2 (en) Configure an OpenFlow instance
US20190158627A1 (en) Method and device for generating forwarding information
WO2020134144A1 (en) Data or message forwarding method, node, and system
WO2022011563A1 (en) Internet of things configuration method and apparatus, computer device, and storage medium
WO2017185322A1 (en) Storage network element discovery method and device
WO2018107433A1 (en) Information processing method and device
WO2021175105A1 (en) Connection method and apparatus, device, and storage medium
WO2020187124A1 (en) Data processing method and device
WO2023174341A1 (en) Data read-write method, and device, storage node and storage medium
US9077741B2 (en) Establishing communication between entities in a shared network
TW201006191A (en) UPnP/DLNA device support apparatus, system, and method
CN111865801B (en) Virtio port-based data transmission method and system
WO2015123986A1 (en) Data recording method and system, and access server
WO2024001549A9 (en) Address configuration method and electronic device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23769843

Country of ref document: EP

Kind code of ref document: A1