WO2024109240A1 - NVMF storage cluster node interconnection method, apparatus, device and non-volatile readable storage medium - Google Patents

NVMF storage cluster node interconnection method, apparatus, device and non-volatile readable storage medium

Info

Publication number
WO2024109240A1
WO2024109240A1 (PCT application PCT/CN2023/116232)
Authority
WO
WIPO (PCT)
Prior art keywords
node
nodes
subsystem
nvme
nvmf
Prior art date
Application number
PCT/CN2023/116232
Other languages
English (en)
French (fr)
Inventor
苑忠科
张凯
Original Assignee
苏州元脑智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2024109240A1

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 — Network arrangements or protocols for supporting network services or applications
    • H04L 67/14 — Session management
    • H04L 67/104 — Peer-to-peer [P2P] networks
    • H04L 67/1074 — Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H04L 67/1078 — Resource delivery mechanisms
    • H04L 67/1097 — Protocols for distributed storage of data in networks, e.g. network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L 69/08 — Protocols for interworking; protocol conversion
    • H04L 69/16 — Implementation or adaptation of Internet protocol [IP], transmission control protocol [TCP] or user datagram protocol [UDP]
    • H04L 69/163 — In-band adaptation of TCP data exchange; in-band control procedures
    • Y02D 30/00 — Reducing energy consumption in communication networks

Definitions

  • the present application relates to the field of storage cluster communication technology, and in particular to an NVMF storage cluster node interconnection method, apparatus, device and non-volatile readable storage medium.
  • FC: Fibre Channel; PCIe: Peripheral Component Interconnect Express.
  • the purpose of this application is to provide an NVMF storage cluster node interconnection method, apparatus, device and non-volatile readable storage medium, which can improve the performance and flexibility of storage cluster node interconnection.
  • the optional solutions are as follows:
  • the present application discloses an NVMF storage cluster node interconnection method, which is applied to any node in the storage cluster, including:
  • establishing a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and creating multiple NVMe IO queues in the process of establishing the first communication connection; wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to the node;
  • based on the first communication connection and the second communication connection, the node's own subsystem NVMe qualified name and local port are bound with the port and subsystem NVMe qualified name of other nodes to obtain a connection information pair; wherein the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node;
  • cluster service transmission is performed based on the connection information pair.
  • before establishing a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, the method further includes:
  • performing node discovery on other nodes and obtaining subsystem NVMe qualified names of other nodes includes:
  • node discovery is performed on other nodes based on the NVMF protocol, and the subsystem NVMe qualified name returned by the other nodes is obtained, including:
  • the connect fabric command is used to notify the preset roce driver to establish an RDMA management queue, and then create an NVMe management queue with the discovery controller in other nodes.
  • the discovery controller is configured through the fabric/admin command to obtain the discovery log page; the discovery log page includes the subsystem NVMe qualified names of other nodes discovered.
  • the preset roce driver is a driver created based on the roce protocol, wherein the roce protocol is a protocol for implementing RDMA transmission based on the udp protocol.
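As a non-normative illustration, the discovery flow described above (RDMA management queue, NVMe management queue, then fetching the discovery log page) might be sketched as follows; all class and method names are hypothetical stand-ins, not a real driver API.

```python
class RoceDriver:
    """Stand-in for the preset roce driver (RDMA transport over the udp protocol)."""
    def create_rdma_queue(self, kind):
        # a real driver would set up an RDMA queue pair here
        return f"rdma-{kind}-queue"

class DiscoveryController:
    """Stand-in for the discovery controller in a remote node."""
    def __init__(self, subnqns):
        self._log_page = [{"subnqn": n} for n in subnqns]
    def get_discovery_log_page(self):
        return list(self._log_page)

def discover_subnqns(driver, discovery_ctrl):
    # connect fabric command: the roce driver first establishes the RDMA admin queue
    rdma_admin = driver.create_rdma_queue("admin")
    # the NVMe management queue is then created on top of it (modeled as a pairing)
    nvme_admin = ("nvme-admin-queue", rdma_admin)
    # fabric/admin commands would configure the discovery controller at this point;
    # the discovery log page lists the subsystem NVMe qualified names of peer nodes
    return [entry["subnqn"] for entry in discovery_ctrl.get_discovery_log_page()]
```

The sketch only preserves the ordering of the steps; queue creation and controller configuration are no-ops here.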
  • establishing a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes includes:
  • use the connect fabric command to notify the preset roce driver to establish an RDMA management queue, then create an NVMe management queue with the preset controller in other nodes, complete the authentication operation through the authentication command, and configure the preset controller through the fabric/admin command; use the connect fabric command to notify the preset roce driver to establish an RDMA IO queue, then create an NVMe IO queue with the preset controller, and complete the authentication operation through the authentication command.
  • node discovery is performed on other nodes based on the NVMF protocol, and the subsystem NVMe qualified name returned by the other nodes is obtained, including:
  • when the transmission protocol is TCP (Transmission Control Protocol): a TCP connection is established with other nodes through the connect fabric command, and the subsystem NVMe qualified name returned by other nodes is obtained through the TCP connection.
  • establishing a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes includes:
  • Establish a TCP connection with other nodes through the connect fabric command, then create an NVMe management queue with the preset controller in other nodes, complete the authentication operation through the authentication command, and configure the preset controller through the fabric/admin command; create an NVMe IO queue with the preset controller, and complete the authentication operation through the authentication command.
  • performing node discovery on other nodes and obtaining subsystem NVMe qualified names of other nodes includes: obtaining configuration information, wherein the configuration information includes the IP address and subsystem NVMe qualified name of other nodes;
  • performing node discovery on other nodes and obtaining subsystem NVMe qualified names of other nodes includes:
  • publishing the IP (Internet Protocol) address, port, and subsystem NVMe qualified name of this node to other nodes through the link layer discovery protocol, and obtaining the subsystem NVMe qualified names of other nodes returned by other nodes.
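The link-layer publication described above amounts to advertising an (IP, port, subnqn) triple to neighbors. A minimal sketch, assuming a JSON payload carried in a vendor-specific LLDP TLV (the real encoding is not specified in this application):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class NodeAdvertisement:
    """Fields a node might publish via the extended lldp mechanism (names assumed)."""
    ip: str
    port: int
    subnqn: str

def encode_advertisement(adv: NodeAdvertisement) -> bytes:
    # serialize for carriage inside a vendor-specific LLDP TLV (encoding assumed)
    return json.dumps(asdict(adv)).encode()

def decode_advertisement(raw: bytes) -> NodeAdvertisement:
    # inverse of encode_advertisement: rebuild the advertisement record
    return NodeAdvertisement(**json.loads(raw.decode()))
```

A receiving node would decode each neighbor's advertisement to learn the subnqn to connect to.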
  • performing node discovery on other nodes and obtaining subsystem NVMe qualified names of other nodes includes: performing node discovery on other nodes based on the mDNS protocol and obtaining the subsystem NVMe qualified names of other nodes.
  • the method further includes:
  • the identities of other nodes include: not being a storage node, being a storage node and a node of the present storage cluster, being a storage node and a node of another storage cluster, and being a storage node and a node that has not joined the storage cluster.
  • querying node information of other nodes through NVME instructions includes:
  • using the NVMe identify command to query the controller information and obtain the subnqn, product information, and model information of other nodes as node information;
  • using the NVMe identify command to query the cluster node structure (cluster node CNS), and obtaining the cluster uuid, node uuid, port information, platform information, version information and other business identification information of other nodes as node information.
  • identifying the identities of other nodes based on the node information includes:
  • the identities of the other nodes are identified based on the cluster UUID, node UUID, port information, platform information, version information and other business identification information of the other nodes.
  • different service configurations are performed based on different identities, including:
  • if the identities of other nodes are storage nodes, different configurations of cluster services are performed according to the different identities.
  • an NVMF storage cluster node interconnection device, which is applied to any node in the storage cluster, including:
  • a communication connection establishment module is configured to establish a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and create multiple NVMe IO queues in the process of establishing the first communication connection; wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to the node;
  • a connection information binding module is configured to bind its own subsystem NVMe qualified name, local port, and the port and subsystem NVMe qualified name of other nodes based on the first communication connection and the second communication connection to obtain a connection information pair; wherein the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node;
  • the cluster service transmission module is configured to perform cluster service transmission based on the connection information pair.
  • the present application discloses an NVMF storage node, which is applied to an NVMF storage cluster with multiple nodes, including:
  • a communication connection establishment module is configured to establish a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and create multiple NVMe IO queues in the process of establishing the first communication connection; wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to the node;
  • a connection information binding module is configured to bind its own subsystem NVMe qualified name, local port, and the port and subsystem NVMe qualified name of other nodes based on the first communication connection and the second communication connection to obtain a connection information pair; wherein the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node;
  • the cluster service transmission module is configured to perform cluster service transmission based on the connection information pair.
  • the present application discloses an electronic device, including a memory and a processor, wherein:
  • a memory arranged to store a computer program
  • the processor is configured to execute a computer program to implement the aforementioned NVMF storage cluster node interconnection method.
  • the present application discloses a non-volatile readable storage medium, which is configured to store a computer program, wherein the computer program implements the aforementioned NVMF storage cluster node interconnection method when executed by a processor.
  • The present application establishes a first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and creates multiple NVMe IO queues in the process of establishing the first communication connection, wherein each node has a preset subsystem and a subsystem NVMe qualified name that uniquely corresponds to the node. Based on the first communication connection and the second communication connection, the node binds its own subsystem NVMe qualified name and local port with the port and subsystem NVMe qualified name of other nodes to obtain a connection information pair, wherein the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node. Cluster service transmission is then performed based on the connection information pair.
  • each node has a preset subsystem for node interconnection, and there is a unique subsystem NVMe qualified name.
  • Any node establishes a first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and binds the two opposite connection information together as a connection information pair to realize bidirectional cluster interconnection business flow.
  • Transmission can use NVMe multiple queues to improve the performance of storage cluster node interconnection, and is adapted to both the roce (RDMA over Converged Ethernet, i.e., RDMA (Remote Direct Memory Access) over converged Ethernet) transmission protocol and the TCP transmission protocol. It can use the RDMA characteristics and the high bandwidth of the transmission network to improve performance, while the TCP protocol improves the distance and flexibility of topology deployment.
  • FIG. 1 is a flow chart of an NVMF storage cluster node interconnection method disclosed in an embodiment of the present application;
  • FIG. 2 is a schematic diagram of an optional NVMF storage cluster node interconnection disclosed in an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an optional node discovery and connection process disclosed in an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of an NVMF storage cluster node interconnection device disclosed in an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
  • The FC interconnection solution has performance bottlenecks due to the SCSI single queue and bandwidth limitations, while the PCIe interconnection method has limitations in cable length and topology flexibility. Therefore, how to improve the performance and flexibility of storage cluster node interconnection is a problem that urgently needs to be solved.
  • the present application provides an NVMF (NVMe over Fabric, a remote access technology based on NVMe; a protocol standard proposed by the NVMe (Non-Volatile Memory Express, non-volatile memory host controller interface specification) standards organization to apply NVMe to various fabric networks, defining how to use various common transport layer protocols to implement NVMe functions) storage cluster node interconnection solution, which can improve the performance and flexibility of storage cluster node interconnection.
  • an embodiment of the present application discloses an NVMF storage cluster node interconnection method, which is applied to any node in the storage cluster, including:
  • Step S11: Establish a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and create multiple NVMe IO queues in the process of establishing the first communication connection; wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to the node.
  • the embodiment of the present application can perform node discovery on other nodes and obtain the subsystem NVMe qualified name of other nodes.
  • node discovery may be performed on other nodes based on the NVMF protocol, and the subsystem NVMe qualified name returned by the other nodes may be obtained.
  • the connect fabric command can be used to notify the preset roce driver to establish an RDMA (Remote Direct Memory Access) management queue, and then create an NVMe management queue with the discovery controller in other nodes.
  • the discovery controller is configured through the fabric/admin command to obtain the discovery log page; the discovery log page includes the subsystem NVMe qualified names of other nodes discovered.
  • in some embodiments, the transmission protocol used by the first communication connection is the TCP protocol.
  • a TCP connection can be established with other nodes through the connect fabric command, and the subsystem NVMe qualified name returned by other nodes can be obtained through the TCP connection.
  • configuration information may be obtained, wherein the configuration information includes the IP address of other nodes and the subsystem NVMe qualified name. That is, the target node is specified through the configuration information.
  • the IP address, port, and subsystem NVMe qualified name of the node can be published to other nodes through the link layer discovery protocol, and the subsystem NVMe qualified names of other nodes returned by other nodes can be obtained.
  • node discovery may be performed on other nodes based on the mDNS protocol and the subsystem NVMe qualified names of other nodes may be obtained.
  • the process of establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes is: notify the preset roce driver to establish the RDMA management queue through the connect fabric instruction, and then create the NVMe management queue with the preset controller in other nodes, complete the authentication operation through the authentication instruction, and configure the preset controller through the fabric/admin instruction; notify the preset roce driver to establish multiple RDMA IO queues through the connect fabric instruction, and then create multiple NVMe IO queues with the preset controller, and complete the authentication operation through the authentication instruction.
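The admin-then-IO ordering of the sequence above can be traced with a small sketch; the step labels are purely illustrative, not protocol identifiers.

```python
def establish_first_connection(io_queue_count: int) -> list:
    """Trace of the roce connection sequence: admin path first, then IO queues."""
    steps = [
        "connect_fabric:rdma_admin_queue",    # roce driver sets up the RDMA admin queue
        "create:nvme_admin_queue",            # NVMe management queue with the preset controller
        "authenticate:admin",                 # authentication command on the admin queue
        "fabric_admin:configure_controller",  # fabric/admin configuration of the controller
    ]
    for i in range(io_queue_count):
        steps += [
            f"connect_fabric:rdma_io_queue_{i}",  # RDMA IO queue via connect fabric
            f"create:nvme_io_queue_{i}",          # NVMe IO queue with the preset controller
            f"authenticate:io_{i}",               # authentication command per IO queue
        ]
    return steps
```

The point of the trace is that the management queue is fully set up and the controller configured before any IO queue is created.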
  • the preset roce driver is a driver created based on the roce protocol
  • roce is a protocol that implements RDMA transmission based on the udp protocol, and is applied to NVMe over RDMA.
  • the process of establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes is as follows: establishing a TCP connection with other nodes through the connect fabric instruction, then creating an NVMe management queue with the preset controller in other nodes, completing the authentication operation through the authentication command, and configuring the preset controller through the fabric/admin command; creating multiple NVMe IO queues with the preset controller, and completing the authentication operation through the authentication command.
  • a first communication connection is established with the preset subsystem in other nodes, that is, a first communication connection is established with the preset controller in the preset subsystem in other nodes.
  • each node serves as both the initiator end of the nvmf protocol and the nvmf subsystem (i.e., the preset subsystem) target end.
  • Each node uses a unique subnqn (i.e., the subsystem NVMe qualified name) to create a node-wide nvmf subsystem dedicated to node interconnection, which is isolated from the node business path to avoid mutual influence.
  • each node initiates an nvmf connection to all nodes, that is, the initiator of each node connects to the target end of other nodes, i.e., the subsystem, and the initiator of other nodes also initiates a connection to the target end of this node.
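The all-to-all initiator/target relationship described above can be expressed as the set of ordered node pairs; a minimal sketch:

```python
from itertools import permutations

def full_mesh_connections(node_ids):
    """Each node's initiator connects to every other node's subsystem target,
    so every ordered (initiator, target) pair of distinct nodes appears once."""
    return set(permutations(node_ids, 2))
```

For n nodes this yields n*(n-1) directed connections, i.e. both directions between every pair, which is what the connection-pair binding later relies on.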
  • the discovery method supports manual configuration, extended lldp protocol, mDNS and nvmf discovery.
  • the manual configuration method is to configure the IP and subnqn of each node to indicate the target end of nvmf.
  • the extended lldp publishes the IP, port and subsystem information to other nodes through the link layer discovery protocol.
  • the nvmf discovery method follows the standard protocol to implement the nvme discovery service and the link and interaction process of the discovery controller.
  • the initiator on each node completes, toward the target end, the connection, authentication, attribute acquisition and configuration of the management queue, the enabling of the controller, and the creation and authentication of the io queues.
  • the embodiment of the present application can query the node information of other nodes through NVMe commands, identify the identity of other nodes based on the node information, and perform different service configurations based on different identities; if a node is a storage node, different configurations of cluster services are performed according to its identity.
  • the identity may include not being a storage node, being a storage node and a node of this cluster, being a storage node and a node of another cluster, and being a storage node that has not joined the cluster.
  • the initiator can obtain the manufacturer and product information of the other end through the nvme standard and custom instructions, and obtain the cluster UUID (i.e., Universally Unique Identifier), node UUID, platform version and other business identification information, and complete the identity identification through the above information. If it is a storage node, configure segmented cluster services for it.
  • the NVMe identify command can be used to query the controller information, obtain the subnqn, product information, and model information of the peer device to preliminarily identify whether it is a storage node. If so, continue to use the NVMe identify command to query the customized cluster node CNS (i.e., cluster node structure) to obtain the cluster uuid, node uuid, port information, platform information, version information, and other business identification information of the peer node, and identify it according to the internal logic. There may be problems with the preliminary identification, and the final identification result shall prevail.
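The customized cluster node CNS is vendor-defined and its layout is not given in this application. Assuming, purely for illustration, a fixed layout of two 16-byte UUIDs followed by a 2-byte port count, parsing the returned page might look like:

```python
import struct
import uuid

# Hypothetical layout for the customized "cluster node" CNS page:
# 16-byte cluster uuid, 16-byte node uuid, 2-byte little-endian port count.
# The real structure is vendor-defined; this layout is an assumption.
CNS_FORMAT = "<16s16sH"

def parse_cluster_node_cns(payload: bytes) -> dict:
    cluster_raw, node_raw, port_count = struct.unpack_from(CNS_FORMAT, payload)
    return {
        "cluster_uuid": str(uuid.UUID(bytes=cluster_raw)),
        "node_uuid": str(uuid.UUID(bytes=node_raw)),
        "port_count": port_count,
    }
```

The identification logic would then compare the parsed cluster uuid against the local one to decide the peer's identity.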
  • the final identification results include: not a storage node, a storage node and a node in this cluster, a storage node and a node in another cluster, and a storage node and a node that has not joined the cluster.
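The four outcomes above suggest a simple classification over the queried cluster uuid; the decision order below is an assumption for illustration, not the application's internal logic.

```python
from enum import Enum

class NodeIdentity(Enum):
    NOT_STORAGE_NODE = "not a storage node"
    THIS_CLUSTER = "storage node in this cluster"
    OTHER_CLUSTER = "storage node in another cluster"
    NOT_JOINED = "storage node not joined to a cluster"

def classify_node(is_storage_node, peer_cluster_uuid, local_cluster_uuid):
    # Decision order inferred from the four listed outcomes (assumed logic).
    if not is_storage_node:
        return NodeIdentity.NOT_STORAGE_NODE
    if peer_cluster_uuid is None:          # no cluster membership reported
        return NodeIdentity.NOT_JOINED
    if peer_cluster_uuid == local_cluster_uuid:
        return NodeIdentity.THIS_CLUSTER
    return NodeIdentity.OTHER_CLUSTER
```

Each result maps to a different downstream handler, as the next paragraph describes (platform module vs. cluster communication module).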
  • the node identity attributes obtained above are bound to the link object of nvmf. If the peer is identified as not being a storage node, the connection information is reported to the platform module for use by other services. If it is identified as a storage node, it is handed over to the cluster communication module, which performs different cluster service configurations according to the node's identity and prepares for cluster node communication.
  • Step S12: Based on the first communication connection and the second communication connection, the subsystem NVMe qualified name, local port and the port and subsystem NVMe qualified name of other nodes are bound to obtain a connection information pair; wherein the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node.
  • the process of establishing the second communication connection may refer to the process of establishing the first communication connection, which will not be described in detail here.
  • the two nodes bind the two connection information in opposite directions together through the local subnqn, local port, peer subnqn, and peer port to identify them as a "cluster interconnection connection pair", thereby realizing two-way transmission of cluster interconnection service flows.
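The pair-binding rule above (the local subnqn and port of one connection must mirror the peer subnqn and port of the other) can be sketched as follows; the field names are assumed:

```python
def bind_connection_pair(conn_a: dict, conn_b: dict):
    """Bind two opposite-direction connections into a cluster interconnection
    connection pair. Each connection carries local_subnqn/local_port and
    peer_subnqn/peer_port entries (field names assumed for illustration)."""
    is_mirror = (
        conn_a["local_subnqn"] == conn_b["peer_subnqn"]
        and conn_a["local_port"] == conn_b["peer_port"]
        and conn_a["peer_subnqn"] == conn_b["local_subnqn"]
        and conn_a["peer_port"] == conn_b["local_port"]
    )
    # only mirrored connections form a pair usable for bidirectional service flows
    return (conn_a, conn_b) if is_mirror else None
```

Once bound, the pair carries cluster interconnection service flows in both directions over the two underlying NVMF connections.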
  • Step S13: Perform cluster service transmission based on the connection information pair.
  • FIG. 2 is a schematic diagram of an optional NVMF storage cluster node interconnection disclosed in an embodiment of the present application, including an NVMe over fabric connection process and a storage node identification process.
  • FIG. 3 is a schematic diagram of an optional node discovery and connection process disclosed in an embodiment of the present application. NVMe over fabric connection process: the NVMF initiator implements the discovery function and uses the well-known nqn to establish a connection with the RDMA NVMe node device.
  • the process is: notify the roce driver to establish an RDMA queue pair (admin) (i.e., the RDMA management queue) through the connect fabric command, then create an NVMe admin queue pair (i.e., the NVMe management queue) with the discovery controller, configure the discovery controller through the fabric/admin command, and obtain the discovery log page.
  • the NVMF initiator uses the discovered nqn (i.e., subnqn) to establish a connection with the RDMA NVMe node device.
  • the process is as follows: the connect fabric command notifies the roce driver to establish an RDMA queue pair (admin), then creates an NVMe admin queue pair with the preset controller (i.e., the NVMe controller in FIG. 3), completes the authentication operation through the Authentication Send/Receive command, and configures the preset controller through the fabric/admin command; the connect fabric command notifies the roce driver to establish RDMA queue pairs (io) (i.e., RDMA IO queues), then creates NVMe io queue pairs (i.e., NVMe IO queues) with the preset controller, and completes the authentication operation through the Authentication Send/Receive command.
  • notify the node identification module to perform node identity identification.
  • the storage node identification process may include: on each node, after the NVMf initiator completes the fabric connection, it reports the connection information to the node identification module, which is responsible for identifying the identity information of the connection object.
  • use the NVMe identify command to query the controller information and obtain the subnqn, product information and model information of the peer device for preliminary identification of whether it is a storage node. If so, continue to use the NVMe identify command to query the customized cluster node structure, obtain the cluster uuid, node uuid, port information, platform information, version information and other business identification information of the peer node, and identify it according to the internal logic.
  • the final identification results include: not a storage node, a storage node and a node of this cluster, a storage node and a node of another cluster, a storage node and a node that has not joined the cluster.
  • the above obtained node identity attributes are bound to the link object of nvmf. If it is not identified as a storage node, the connection information is reported to the platform module for other business use. If it is identified as a storage node, it is handed over to the cluster communication module to make different configurations for cluster business according to the different identities of the nodes, and prepare for cluster node communication.
  • NVMf creates a dedicated subsystem for interconnecting storage nodes in the local scope.
  • when the initiator connects to the controller of the subsystem and queries the controller and node information, the subsystem is responsible for generating the relevant information and returning it to the peer through protocol interaction.
  • the nvmf subsystem module determines that the two connection information are the same pair of interconnected node ports based on the local nqn (i.e., the local subsystem NVMe qualified name), the custom implemented local nvme port (local port), the remote nqn (the peer subsystem NVMe qualified name) and the custom implemented remote nvme port (peer port) in the two information, and then binds them together for node communication.
  • NVMe over fabric node interconnection in the embodiment of the present application is not limited to a specific fabric transmission protocol, and supports NVMe over ROCE, NVMe over TCP and NVMe over FC.
  • The embodiment of the present application establishes a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and creates multiple NVMe IO queues in the process of establishing the first communication connection, wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to the node. Based on the first communication connection and the second communication connection, the node binds its own subsystem NVMe qualified name and local port with the port and subsystem NVMe qualified name of other nodes to obtain a connection information pair, wherein the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node. Cluster service transmission is then performed based on the connection information pair.
  • each node has a preset subsystem for node interconnection, and has a unique subsystem NVMe qualified name.
  • Any node establishes a first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and binds the two connection information in opposite directions together as a connection information pair to achieve two-way transmission of cluster interconnection service flows. Transmission can utilize NVMe multi-queues to improve the performance of storage cluster node interconnection, and is adapted to both the roce transmission protocol and the TCP transmission protocol: the former can utilize the RDMA characteristics and the high bandwidth of the transmission network to improve performance, while the latter can improve the distance and flexibility of topology deployment.
  • an NVMF storage cluster node interconnection device which is applied to any node in the storage cluster, including:
  • the communication connection establishing module 11 is configured to establish a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and create multiple NVMe IO queues in the process of establishing the first communication connection; wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to the node;
  • the connection information binding module 12 is configured to bind its own subsystem NVMe qualified name, local port, and the port and subsystem NVMe qualified name of other nodes based on the first communication connection and the second communication connection to obtain a connection information pair; wherein the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node;
  • the cluster service transmission module 13 is configured to perform cluster service transmission based on the connection information pair.
  • in the embodiment of the present application, a first communication connection is established with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and multiple NVMe IO queues are created in the process of establishing the first communication connection. Each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to that node. Based on the first communication connection and a second communication connection, the node's own subsystem NVMe qualified name and local port are bound with the ports and subsystem NVMe qualified names of the other nodes to obtain a connection information pair, where the second communication connection is a communication connection initiated by another node based on the NVMF protocol and established with the arbitrary node. Cluster service transmission is then performed based on the connection information pair.
  • each node has a preset subsystem for node interconnection, and has a unique subsystem NVMe qualified name.
  • Any node establishes a first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and binds the two pieces of connection information in opposite directions together as a connection information pair, achieving two-way transmission of cluster interconnection service flows. This allows NVMe multi-queues to be used to improve the performance of storage cluster node interconnection. The scheme is adapted to both the RoCE transport protocol and the TCP transport protocol: it can exploit the RDMA characteristics and the high bandwidth of the transport network to improve performance, while the TCP protocol can improve the distance and flexibility of topology deployment.
  • the device also includes a node discovery module, which is configured to perform node discovery on other nodes and obtain the subsystem NVMe qualified names of other nodes.
  • the node discovery module is configured to perform node discovery on other nodes based on the NVMF protocol and obtain the subsystem NVMe qualified names returned by other nodes.
  • the node discovery module is configured to notify the preset RoCE driver to establish an RDMA management queue through the connect fabric instruction, then create an NVMe management queue with the discovery controller in other nodes, configure the discovery controller through the fabric/admin instruction, and obtain the discovery log page; wherein the discovery log page includes the subsystem NVMe qualified names of the discovered other nodes.
  • the node discovery module is configured to establish a TCP connection with other nodes through the connect fabric instruction, and obtain the subsystem NVMe qualified names returned by other nodes through the TCP connection.
  • the node discovery module is configured to obtain configuration information, wherein the configuration information includes the IP addresses of other nodes and the subsystem NVMe qualified names.
  • the node discovery module is configured to publish the IP address, port and subsystem NVMe qualified name of the node to other nodes through a link layer discovery protocol, and obtain the subsystem NVMe qualified names of other nodes returned by other nodes.
  • the node discovery module is configured to perform node discovery on other nodes based on the mDNS protocol and obtain the subsystem NVMe qualified names of other nodes.
  • the communication connection establishment module 11 is configured to notify the preset RoCE driver to establish an RDMA management queue through the connect fabric instruction, then create an NVMe management queue with the preset controller in other nodes, complete the authentication operation through the authentication instruction, and configure the preset controller through the fabric/admin instruction; and to notify the preset RoCE driver to establish multiple RDMA IO queues through the connect fabric instruction, then create multiple NVMe IO queues with the preset controller, and complete the authentication operation through the authentication instruction.
  • the communication connection establishment module 11 is configured to establish a TCP connection with other nodes through the connect fabric instruction, and then create an NVMe management queue with a preset controller in the other node, complete the authentication operation through the authentication instruction, and configure the preset controller through the fabric/admin instruction; create multiple NVMe IO queues with the preset controller, and complete the authentication operation through the authentication instruction.
  • the device also includes:
  • the identity recognition module is configured to query the node information of other nodes through NVMe instructions, and to recognize the identities of the other nodes based on the node information;
  • the configuration module is configured to perform different business configurations based on different identities.
  • the configuration module is configured to, if the identity of another node is a storage node, perform different configurations of cluster services according to the different identities.
  • an embodiment of the present application discloses an NVMF storage node, which is applied to an NVMF storage cluster with multiple nodes, including:
  • a communication connection establishment module is configured to establish a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and create multiple NVMe IO queues in the process of establishing the first communication connection; wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to the node;
  • a connection information binding module is configured to bind its own subsystem NVMe qualified name, local port, and the port and subsystem NVMe qualified name of other nodes based on the first communication connection and the second communication connection to obtain a connection information pair; wherein the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node;
  • the cluster service transmission module is configured to perform cluster service transmission based on the connection information pair.
  • the embodiment of the present application discloses an electronic device 20, including a processor 21 and a memory 22;
  • the memory 22 is configured to store a computer program;
  • the processor 21 is configured to execute the computer program to implement the following steps:
  • a first communication connection is established with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and multiple NVMe IO queues are created in the process of establishing the first communication connection; wherein, each node has a preset subsystem, and there is a subsystem NVMe qualified name uniquely corresponding to the node; based on the first communication connection and the second communication connection, its own subsystem NVMe qualified name, local port, and the port and subsystem NVMe qualified name of other nodes are bound to obtain a connection information pair; wherein, the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node; cluster service transmission is performed based on the connection information pair.
  • in the embodiment of the present application, a first communication connection is established with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and multiple NVMe IO queues are created in the process of establishing the first communication connection. Each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to that node. Based on the first communication connection and a second communication connection, the node's own subsystem NVMe qualified name and local port are bound with the ports and subsystem NVMe qualified names of the other nodes to obtain a connection information pair, where the second communication connection is a communication connection initiated by another node based on the NVMF protocol and established with the arbitrary node. Cluster service transmission is then performed based on the connection information pair.
  • each node has a preset subsystem for node interconnection, and has a unique subsystem NVMe qualified name.
  • Any node establishes a first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and binds the two pieces of connection information in opposite directions together as a connection information pair, achieving two-way transmission of cluster interconnection service flows. This allows NVMe multi-queues to be used to improve the performance of storage cluster node interconnection. The scheme is adapted to both the RoCE transport protocol and the TCP transport protocol: it can exploit the RDMA characteristics and the high bandwidth of the transport network to improve performance, while the TCP protocol can improve the distance and flexibility of topology deployment.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: before establishing a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, node discovery is performed on the other nodes and the subsystem NVMe qualified names of the other nodes are obtained.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: performing node discovery on other nodes based on the NVMF protocol, and obtaining the subsystem NVMe qualified names returned by the other nodes.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: notify the preset RoCE driver to establish an RDMA management queue through the connect fabric instruction, then create an NVMe management queue with the discovery controller in other nodes, configure the discovery controller through the fabric/admin instruction, and obtain the discovery log page; wherein the discovery log page includes the subsystem NVMe qualified names of the discovered other nodes.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: use the connect fabric instruction to notify the preset RoCE driver to establish an RDMA management queue, then create an NVMe management queue with the preset controller in other nodes, complete the authentication operation through the authentication instruction, and configure the preset controller through the fabric/admin instruction; use the connect fabric instruction to notify the preset RoCE driver to establish multiple RDMA IO queues, then create multiple NVMe IO queues with the preset controller, and complete the authentication operation through the authentication instruction.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: establish a TCP connection with other nodes through the connect fabric instruction, and obtain the subsystem NVMe qualified names returned by the other nodes through the TCP connection.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: establish a TCP connection with other nodes through the connect fabric instruction, then create an NVMe management queue with the preset controller in the other node, complete the authentication operation through the authentication instruction, and configure the preset controller through the fabric/admin instruction; create multiple NVMe IO queues with the preset controller, and complete the authentication operation through the authentication instruction.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: obtaining configuration information, wherein the configuration information includes the IP addresses and subsystem NVMe qualified names of other nodes.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: the IP address, port, and subsystem NVMe qualified name of the node are published to other nodes through the link layer discovery protocol, and the subsystem NVMe qualified names of other nodes returned by the other nodes are obtained.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: performing node discovery on other nodes based on the mDNS protocol and obtaining the subsystem NVMe qualified names of the other nodes.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: querying the node information of other nodes through NVMe instructions; identifying the identities of the other nodes based on the node information, and performing different service configurations based on the different identities.
  • when the processor 21 executes the computer program stored in the memory 22, the following steps can be implemented: if the identity of another node is a storage node, different configurations of cluster services are performed according to the different identities.
  • the memory 22, as a carrier for storing resources, may be a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the storage may be temporary or permanent.
  • the electronic device 20 also includes a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26; wherein the power supply 23 is configured to provide working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not limited here; the input/output interface 25 is configured to obtain external input data or output data to the outside, and its interface type can be selected according to application needs, which is not limited here.
  • an embodiment of the present application discloses a non-volatile readable storage medium, which is configured to store a computer program, wherein when the computer program is executed by a processor, the following steps are implemented:
  • a first communication connection is established with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, and multiple NVMe IO queues are created in the process of establishing the first communication connection; wherein, each node has a preset subsystem, and there is a subsystem NVMe qualified name uniquely corresponding to the node; based on the first communication connection and the second communication connection, its own subsystem NVMe qualified name, local port, and the port and subsystem NVMe qualified name of other nodes are bound to obtain a connection information pair; wherein, the second communication connection is a communication connection initiated by other nodes based on the NVMF protocol and established with the arbitrary node; cluster service transmission is performed based on the connection information pair.
  • in the embodiment of the present application, a first communication connection is established with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and multiple NVMe IO queues are created in the process of establishing the first communication connection. Each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to that node. Based on the first communication connection and a second communication connection, the node's own subsystem NVMe qualified name and local port are bound with the ports and subsystem NVMe qualified names of the other nodes to obtain a connection information pair, where the second communication connection is a communication connection initiated by another node based on the NVMF protocol and established with the arbitrary node. Cluster service transmission is then performed based on the connection information pair.
  • each node has a preset subsystem for node interconnection, and has a unique subsystem NVMe qualified name.
  • Any node establishes a first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and binds the two pieces of connection information in opposite directions together as a connection information pair, achieving two-way transmission of cluster interconnection service flows. This allows NVMe multi-queues to be used to improve the performance of storage cluster node interconnection. The scheme is adapted to both the RoCE transport protocol and the TCP transport protocol: it can exploit the RDMA characteristics and the high bandwidth of the transport network to improve performance, while the TCP protocol can improve the distance and flexibility of topology deployment.
  • the following steps can be implemented: before establishing a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified name of other nodes, node discovery is performed on other nodes and the subsystem NVMe qualified name of other nodes is obtained.
  • the following steps can be implemented: performing node discovery on other nodes based on the NVMF protocol, and obtaining the subsystem NVMe qualified name returned by other nodes.
  • the following steps can be implemented: notify the preset RoCE driver to establish an RDMA management queue through the connect fabric instruction, then create an NVMe management queue with the discovery controller in other nodes, configure the discovery controller through the fabric/admin instruction, and obtain the discovery log page; wherein the discovery log page includes the subsystem NVMe qualified names of the discovered other nodes.
  • the following steps can be implemented: notify the preset RoCE driver to establish an RDMA management queue through the connect fabric instruction, then create an NVMe management queue with the preset controller in other nodes, complete the authentication operation through the authentication instruction, and configure the preset controller through the fabric/admin instruction; notify the preset RoCE driver to establish multiple RDMA IO queues through the connect fabric instruction, then create multiple NVMe IO queues with the preset controller, and complete the authentication operation through the authentication instruction.
  • the following steps can be implemented: establish a TCP connection with other nodes through the connect fabric instruction, and obtain the subsystem NVMe qualified name returned by other nodes through the TCP connection.
  • the following steps can be implemented: establish a TCP connection with other nodes through the connect fabric instruction, then create an NVMe management queue with the preset controller in the other node, complete the authentication operation through the authentication instruction, and configure the preset controller through the fabric/admin instruction; create multiple NVMe IO queues with the preset controller, and complete the authentication operation through the authentication instruction.
  • the following steps can be implemented: obtaining configuration information, wherein the configuration information includes the IP addresses of other nodes and the subsystem NVMe qualified names.
  • the following steps can be implemented: the IP address, port and subsystem NVMe qualified name of the node are published to other nodes through the link layer discovery protocol, and the subsystem NVMe qualified names of other nodes are obtained from other nodes.
  • the following steps can be implemented: performing node discovery on other nodes based on the mDNS protocol and obtaining the subsystem NVMe qualified names of other nodes.
  • the following steps can be implemented: querying the node information of other nodes through NVMe instructions; identifying the identities of the other nodes based on the node information, and performing different service configurations based on the different identities.
  • the following steps can be implemented: if the identity of other nodes is a storage node, different configurations of cluster services are performed according to different identities.
  • each embodiment is described in a progressive manner, and each embodiment focuses on the differences from other embodiments.
  • the same or similar parts between the embodiments can be referred to each other.
  • for the devices disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the description is relatively simple, and for relevant parts, reference may be made to the description of the method part.
  • the steps of the methods or algorithms described in the embodiments disclosed herein may be implemented directly using hardware, software modules executed by a processor, or a combination of the two.
  • the software modules may reside in random access memory (RAM), internal memory, read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of non-volatile readable storage medium known in the technical field.


Abstract

The present application discloses an NVMF storage cluster node interconnection method, apparatus, and device, and a non-volatile readable storage medium, applied to the technical field of storage cluster communication, comprising: establishing a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and creating multiple NVMe IO queues in the process of establishing the first communication connection, wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to that node; binding the node's own subsystem NVMe qualified name and local port with the ports and subsystem NVMe qualified names of the other nodes based on the first communication connection and a second communication connection to obtain a connection information pair, wherein the second communication connection is a communication connection initiated by another node based on the NVMF protocol and established with the arbitrary node; and performing cluster service transmission based on the connection information pair. The method can improve the performance and flexibility of storage cluster node interconnection.

Description

NVMF storage cluster node interconnection method, apparatus, and device, and non-volatile readable storage medium
Cross-reference to related application
This application claims priority to Chinese patent application No. 202211487379.4, filed with the Chinese Patent Office on November 25, 2022 and entitled "NVMF storage cluster node interconnection method, apparatus, device, and medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the technical field of storage cluster communication, and in particular to an NVMF storage cluster node interconnection method, apparatus, and device, and a non-volatile readable storage medium.
Background
At present, nodes of a storage cluster are usually interconnected using FC (Fibre Channel) or PCIe (Peripheral Component Interconnect Express) links. FC interconnection schemes suffer performance bottlenecks due to the single-queue and bandwidth limitations of SCSI (Small Computer System Interface), while PCIe interconnection is limited in cable length and topology flexibility. These problems in the related art have not been solved.
Summary of the invention
In view of this, the purpose of the present application is to provide an NVMF storage cluster node interconnection method, apparatus, and device, and a non-volatile readable storage medium, which can improve the performance and flexibility of storage cluster node interconnection. The optional solutions are as follows:
In a first aspect, the present application discloses an NVMF storage cluster node interconnection method, applied to any node in a storage cluster, comprising:
establishing a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and creating multiple NVMe IO queues in the process of establishing the first communication connection, wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to that node;
binding the node's own subsystem NVMe qualified name and local port with the ports and subsystem NVMe qualified names of the other nodes based on the first communication connection and a second communication connection to obtain a connection information pair, wherein the second communication connection is a communication connection initiated by another node based on the NVMF protocol and established with the arbitrary node; and
performing cluster service transmission based on the connection information pair.
In some embodiments, before establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, the method further comprises:
performing node discovery on the other nodes and obtaining the subsystem NVMe qualified names of the other nodes.
In some embodiments, performing node discovery on the other nodes and obtaining the subsystem NVMe qualified names of the other nodes comprises:
performing node discovery on the other nodes based on the NVMF protocol, and obtaining the subsystem NVMe qualified names returned by the other nodes.
In some embodiments, performing node discovery on the other nodes based on the NVMF protocol and obtaining the subsystem NVMe qualified names returned by the other nodes comprises:
notifying a preset RoCE driver through the connect fabric instruction to establish an RDMA management queue, then creating an NVMe management queue with the discovery controller in the other node, configuring the discovery controller through the fabric/admin instruction, and obtaining a discovery log page, wherein the discovery log page includes the subsystem NVMe qualified names of the discovered other nodes.
In some embodiments, the preset RoCE driver is a driver created based on the RoCE protocol, wherein the RoCE protocol is a protocol that implements RDMA transmission over the UDP protocol.
In some embodiments, establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes comprises:
notifying the preset RoCE driver through the connect fabric instruction to establish an RDMA management queue, then creating an NVMe management queue with the preset controller in the other node, completing the authentication operation through the authentication instruction, and configuring the preset controller through the fabric/admin instruction; and notifying the preset RoCE driver through the connect fabric instruction to establish RDMA IO queues, then creating NVMe IO queues with the preset controller, and completing the authentication operation through the authentication instruction.
In some embodiments, performing node discovery on the other nodes based on the NVMF protocol and obtaining the subsystem NVMe qualified names returned by the other nodes comprises:
establishing a TCP (Transmission Control Protocol) connection with the other nodes through the connect fabric instruction, and obtaining the subsystem NVMe qualified names returned by the other nodes through the TCP connection.
In some embodiments, establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes comprises:
establishing a TCP connection with the other nodes through the connect fabric instruction, then creating an NVMe management queue with the preset controller in the other node, completing the authentication operation through the authentication instruction, and configuring the preset controller through the fabric/admin instruction; and creating NVMe IO queues with the preset controller, and completing the authentication operation through the authentication instruction.
In some embodiments, performing node discovery on the other nodes and obtaining the subsystem NVMe qualified names of the other nodes comprises:
obtaining configuration information, wherein the configuration information includes the IP addresses and subsystem NVMe qualified names of the other nodes.
In some embodiments, performing node discovery on the other nodes and obtaining the subsystem NVMe qualified names of the other nodes comprises:
publishing the IP (Internet Protocol) address, port, and subsystem NVMe qualified name of the present node to the other nodes through the Link Layer Discovery Protocol, and obtaining the subsystem NVMe qualified names of the other nodes returned by the other nodes.
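As a rough, non-normative sketch of this exchange: each node serializes its own (IP, port, subsystem NQN) record for publication and builds a peer table from the records it receives. The JSON encoding, field names, and example NQNs below are illustrative stand-ins, not real LLDP TLVs:

```python
# Hypothetical sketch of the extended-LLDP style discovery exchange described
# above: each node advertises its IP address, port, and subsystem NQN, and
# builds a peer table from the advertisements it receives.

import json

def build_advertisement(ip: str, port: int, subnqn: str) -> bytes:
    """Serialize this node's discovery record for link-layer publication."""
    return json.dumps({"ip": ip, "port": port, "subnqn": subnqn}).encode()

def collect_peer(advert: bytes, peers: dict) -> dict:
    """Record a peer's subsystem NQN keyed by its (ip, port) endpoint."""
    rec = json.loads(advert.decode())
    peers[(rec["ip"], rec["port"])] = rec["subnqn"]
    return peers

peers = {}
collect_peer(build_advertisement("10.0.0.2", 4420, "nqn.2022-11.example:node-b"), peers)
```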
In some embodiments, performing node discovery on the other nodes and obtaining the subsystem NVMe qualified names of the other nodes comprises:
performing node discovery on the other nodes based on the mDNS (Multicast DNS, where DNS is the Domain Name System) protocol and obtaining the subsystem NVMe qualified names of the other nodes.
In some embodiments, after establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, the method further comprises:
querying node information of the other nodes through NVMe instructions; and
identifying the identities of the other nodes based on the node information, and performing different service configurations based on the different identities.
In some embodiments, the identities of the other nodes include: not a storage node; a storage node belonging to the present storage cluster; a storage node belonging to another storage cluster; and a storage node that has not joined a storage cluster.
In some embodiments, querying node information of the other nodes through NVMe instructions comprises:
querying controller information using the NVMe identify instruction to obtain the subnqn, product information, and model information of the other node as node information; and
querying the cluster node structure (cluster node CNS) using the NVMe identify instruction to obtain the cluster UUID, node UUID, port information, platform information, version information, and other service identification information of the other node as node information.
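The fields of the customized cluster node structure listed above can be modeled, purely illustratively, as a record type; the real structure would be a vendor-defined fixed binary layout, which this sketch does not reproduce:

```python
from dataclasses import dataclass

# Illustrative Python model of the vendor-defined "cluster node CNS" data
# structure queried via NVMe identify, holding the fields the text lists.

@dataclass
class ClusterNodeCNS:
    cluster_uuid: str    # UUID of the storage cluster the node belongs to
    node_uuid: str       # UUID uniquely identifying the node itself
    port_info: str       # interconnect port identification
    platform_info: str   # hardware/platform description
    version_info: str    # software version of the storage stack
    service_flags: int   # other service identification bits

cns = ClusterNodeCNS("c1", "n7", "port0", "platformX", "5.0", 0b01)
```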
In some embodiments, identifying the identities of the other nodes based on the node information comprises:
preliminarily identifying whether another node is a storage node based on the subnqn, product information, and model information of the other node, to obtain a preliminary identification result; and
in a case where the preliminary identification result indicates that the other node is a storage node, identifying the identity of the other node based on the cluster UUID, node UUID, port information, platform information, version information, and other service identification information of the other node.
In some embodiments, performing different service configurations based on the different identities comprises:
if the identity of another node is a storage node, performing different configurations of cluster services according to the different identities.
In a second aspect, the present application discloses an NVMF storage cluster node interconnection apparatus, applied to any node in a storage cluster, comprising:
a communication connection establishment module, configured to establish a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and create multiple NVMe IO queues in the process of establishing the first communication connection, wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to that node;
a connection information binding module, configured to bind the node's own subsystem NVMe qualified name and local port with the ports and subsystem NVMe qualified names of the other nodes based on the first communication connection and a second communication connection to obtain a connection information pair, wherein the second communication connection is a communication connection initiated by another node based on the NVMF protocol and established with the arbitrary node; and
a cluster service transmission module, configured to perform cluster service transmission based on the connection information pair.
In a third aspect, the present application discloses an NVMF storage node, applied to an NVMF storage cluster with multiple nodes, comprising:
a communication connection establishment module, configured to establish a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and create multiple NVMe IO queues in the process of establishing the first communication connection, wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to that node;
a connection information binding module, configured to bind the node's own subsystem NVMe qualified name and local port with the ports and subsystem NVMe qualified names of the other nodes based on the first communication connection and a second communication connection to obtain a connection information pair, wherein the second communication connection is a communication connection initiated by another node based on the NVMF protocol and established with the arbitrary node; and
a cluster service transmission module, configured to perform cluster service transmission based on the connection information pair.
In a fourth aspect, the present application discloses an electronic device, comprising a memory and a processor, wherein:
the memory is configured to store a computer program; and
the processor is configured to execute the computer program to implement the aforementioned NVMF storage cluster node interconnection method.
In a fifth aspect, the present application discloses a non-volatile readable storage medium, configured to store a computer program, wherein the computer program, when executed by a processor, implements the aforementioned NVMF storage cluster node interconnection method.
It can be seen that, in the present application, a first communication connection is established with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and multiple NVMe IO queues are created in the process of establishing the first communication connection, wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to that node; the node's own subsystem NVMe qualified name and local port are bound with the ports and subsystem NVMe qualified names of the other nodes based on the first communication connection and a second communication connection to obtain a connection information pair, wherein the second communication connection is a communication connection initiated by another node based on the NVMF protocol and established with the arbitrary node; and cluster service transmission is then performed based on the connection information pair. That is, in the present application, each node has a preset subsystem dedicated to node interconnection and a unique subsystem NVMe qualified name. Any node establishes a first communication connection with the preset subsystems in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and binds the two pieces of connection information in opposite directions together as a connection information pair, achieving two-way transmission of cluster interconnection service flows. NVMe multi-queues can thus be used to improve the performance of storage cluster node interconnection. The scheme is adapted to the RoCE (RDMA over Converged Ethernet, where RDMA is Remote Direct Memory Access) transport protocol and the TCP transport protocol: it can exploit RDMA characteristics and the high bandwidth of the transport network to improve performance, while the TCP protocol can improve the distance and flexibility of topology deployment.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of an NVMF storage cluster node interconnection method disclosed in an embodiment of the present application;
Fig. 2 is a schematic diagram of an optional NVMF storage cluster node interconnection disclosed in an embodiment of the present application;
Fig. 3 is a schematic diagram of an optional node discovery and connection process disclosed in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an NVMF storage cluster node interconnection apparatus disclosed in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
At present, nodes of a storage cluster are usually interconnected using FC or PCIe links. FC interconnection schemes suffer performance bottlenecks due to SCSI's single-queue and bandwidth limitations, while PCIe interconnection is limited in cable length and topology flexibility. Therefore, how to improve the performance and flexibility of storage cluster node interconnection is a problem to be solved urgently. To this end, the present application provides an NVMF (NVMe over fabric, a remote access technology based on NVMe; a protocol standard proposed by the NVMe (Non-Volatile Memory express) standards organization for applying NVMe to various fabric networks, which defines ways of implementing NVMe functionality over various common transport-layer protocols) storage cluster node interconnection scheme, which can improve the performance and flexibility of storage cluster node interconnection.
Referring to Fig. 1, an embodiment of the present application discloses an NVMF storage cluster node interconnection method, applied to any node in a storage cluster, comprising:
Step S11: establishing a first communication connection with a preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, and creating multiple NVMe IO queues in the process of establishing the first communication connection, wherein each node has a preset subsystem and a subsystem NVMe qualified name uniquely corresponding to that node.
Before establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, the embodiment of the present application may perform node discovery on the other nodes and obtain the subsystem NVMe qualified names of the other nodes.
In a first implementation, node discovery may be performed on the other nodes based on the NVMF protocol, and the subsystem NVMe qualified names returned by the other nodes may be obtained.
If the transport protocol adopted by the first communication connection is the RoCE protocol, a preset RoCE driver may be notified through the connect fabric instruction to establish an RDMA (Remote Direct Memory Access) management queue, then an NVMe management queue is created with the discovery controller in the other node, the discovery controller is configured through the fabric/admin instruction, and a discovery log page is obtained, wherein the discovery log page includes the subsystem NVMe qualified names of the discovered other nodes.
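As a rough illustration of what consuming the discovery log page looks like, the sketch below parses simplified (transport, address, service id, subnqn) entries. A real NVMe-oF discovery log page entry is a fixed-layout binary record, which this deliberately does not reproduce; all example values are assumptions:

```python
# Minimal sketch of extracting subsystem NQNs and addresses from a
# discovery log page, using simplified tuple entries instead of the
# real fixed 1024-byte binary entry format.

def parse_discovery_log(entries):
    """Return the subsystem NQN and address of every discovered subsystem."""
    targets = []
    for transport, addr, svcid, subnqn in entries:
        targets.append({"transport": transport, "addr": addr,
                        "svcid": svcid, "subnqn": subnqn})
    return targets

log = [("rdma", "10.0.0.2", "4420", "nqn.2022-11.example:node-b"),
       ("rdma", "10.0.0.3", "4420", "nqn.2022-11.example:node-c")]
targets = parse_discovery_log(log)
```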
If the transport protocol adopted by the first communication connection is the TCP protocol, a TCP connection may be established with the other nodes through the connect fabric instruction, and the subsystem NVMe qualified names returned by the other nodes are obtained through the TCP connection.
In a second implementation, configuration information may be obtained, wherein the configuration information includes the IP addresses and subsystem NVMe qualified names of the other nodes. That is, the target nodes are specified through the configuration information.
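A minimal sketch of this manual-configuration path; the "ip subnqn" line format is an assumption for illustration only, not a format defined by the application:

```python
# Hypothetical sketch of the manual-configuration discovery path: the
# administrator lists each target node's IP address and subsystem NQN,
# and the initiator simply reads that list.

def parse_node_config(text):
    """Parse 'ip subnqn' lines into discovery targets, skipping comments."""
    targets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ip, subnqn = line.split()
        targets[ip] = subnqn
    return targets

cfg = """
# cluster interconnect targets
10.0.0.2 nqn.2022-11.example:node-b
10.0.0.3 nqn.2022-11.example:node-c
"""
targets = parse_node_config(cfg)
```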
In a third implementation, the IP address, port, and subsystem NVMe qualified name of the present node may be published to the other nodes through the Link Layer Discovery Protocol, and the subsystem NVMe qualified names of the other nodes returned by the other nodes are obtained.
In a fourth implementation, node discovery may be performed on the other nodes based on the mDNS protocol and the subsystem NVMe qualified names of the other nodes may be obtained.
Further, if the transport protocol adopted by the first communication connection is the RoCE protocol, the process of establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes is as follows: the preset RoCE driver is notified through the connect fabric instruction to establish an RDMA management queue, then an NVMe management queue is created with the preset controller in the other node, the authentication operation is completed through the authentication instruction, and the preset controller is configured through the fabric/admin instruction; the preset RoCE driver is then notified through the connect fabric instruction to establish multiple RDMA IO queues, multiple NVMe IO queues are created with the preset controller, and the authentication operation is completed through the authentication instruction. The preset RoCE driver is a driver created based on the RoCE protocol, where RoCE is a protocol that implements RDMA transmission over UDP and is applied to NVMe over RDMA.
Moreover, if the transport protocol adopted by the first communication connection is the TCP protocol, the process of establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes is as follows: a TCP connection is established with the other nodes through the connect fabric instruction, then an NVMe management queue is created with the preset controller in the other node, the authentication operation is completed through the authentication instruction, and the preset controller is configured through the fabric/admin instruction; multiple NVMe IO queues are then created with the preset controller, and the authentication operation is completed through the authentication instruction.
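Whichever transport is used, the ordering described above is the same: one admin queue is set up, authenticated, and used to configure the controller, and only then are the IO queues created and each authenticated in turn. A minimal sketch of that ordering (the event names are illustrative labels, not actual driver APIs):

```python
# Sketch of the connection-establishment order described above: admin
# queue first (queue pair, NVMe admin queue, authentication, controller
# configuration), then one queue pair + NVMe IO queue + authentication
# per IO queue.

def connect_sequence(num_io_queues):
    events = []
    # Admin path
    events += ["rdma_admin_qp", "nvme_admin_queue", "authenticate",
               "configure_controller"]
    # IO path: each IO queue is created and authenticated separately
    for i in range(num_io_queues):
        events += [f"rdma_io_qp_{i}", f"nvme_io_queue_{i}", "authenticate"]
    return events

trace = connect_sequence(2)
```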
It can be understood that establishing the first communication connection with the preset subsystem in another node based on the NVMF protocol and the subsystem NVMe qualified name of that node means establishing the first communication connection with the preset controller in the preset subsystem of that node.
In the present application, each node acts both as an NVMF initiator and as an NVMF subsystem (i.e. preset subsystem) side. Each node uses a unique subnqn (i.e. subsystem NVMe qualified name) to create a node-scoped NVMF subsystem dedicated to node interconnection, isolated from the node's service paths to avoid mutual interference. When connecting nodes, each node initiates NVMF connections to all other nodes; that is, the initiator of each node connects to the target side (the subsystem) of the other nodes, and the initiators of the other nodes likewise initiate connections to the target side of the present node. The process of discovery and connection between storage nodes through the NVMe over fabric protocol is thus realized. The discovery methods support manual configuration, the extended LLDP protocol, mDNS, and NVMF discovery: in the manual configuration method, the IP and subnqn of each node are configured to specify the NVMF target side; the extended LLDP method publishes the IP, port, and subsystem information to the other nodes through the Link Layer Discovery Protocol; the NVMF discovery method follows the standard protocol to implement the NVMe discovery service and the linking and interaction process with the discovery controller. Between nodes, the initiator performs, towards the target side, the connection of the management queue, authentication, acquisition and configuration of attributes, enabling of the controller, and the creation and authentication of IO queues.
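The full-mesh behaviour described above, with every node acting as both initiator and target, means each ordered pair of distinct nodes carries exactly one initiator-to-target connection. A small sketch with illustrative subsystem NQNs:

```python
# Sketch of the full-mesh connection pattern: every node initiates one
# NVMF connection to every other node's dedicated interconnect subsystem,
# and in turn accepts connections from every other node's initiator.

def full_mesh_connections(subnqns):
    """Return every (initiator, target) pair between distinct nodes."""
    return [(src, dst) for src in subnqns for dst in subnqns if src != dst]

nodes = ["nqn.example:a", "nqn.example:b", "nqn.example:c"]
conns = full_mesh_connections(nodes)
```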
Further, after establishing the first communication connection with the preset subsystem in other nodes based on the NVMF protocol and the subsystem NVMe qualified names of the other nodes, the embodiment of the present application may query node information of the other nodes through NVMe instructions, identify the identities of the other nodes based on the node information, and perform different service configurations based on the different identities. Moreover, if a node is a storage node, different configurations of cluster services are performed according to the different identities. The identities may include: not a storage node; a storage node belonging to the present cluster; a storage node belonging to another cluster; and a storage node that has not joined a cluster. The initiator may obtain the peer's vendor and product information through NVMe standard and custom instructions, obtain the cluster UUID (Universally Unique Identifier), node UUID, platform version, and other service identification information, and complete identity recognition through the above information. If the peer is a storage node, subdivided cluster services are configured for it.
In an optional implementation, the NVMe identify instruction may be used to query controller information, and the subnqn, product information, and model information of the peer device are obtained to preliminarily identify whether it is a storage node. If it is, the NVMe identify instruction continues to be used to query the customized cluster node CNS (i.e. cluster node structure) to obtain the peer node's cluster UUID, node UUID, port information, platform information, version information, and other service identification information, and identification is performed according to internal logic. The preliminary identification may be imperfect, and the final identification result prevails. The final identification results include: not a storage node; a storage node belonging to the present cluster; a storage node belonging to another cluster; and a storage node that has not joined a cluster. The node identity attributes obtained above are bound to the NVMF connection object. If the peer is identified as not being a storage node, the connection information is reported to the platform module for use by other services. If it is identified as a storage node, it is handed over to the cluster communication module, which performs different configurations of cluster services according to the node's identity and prepares for cluster node communication.
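The two-stage identification logic above can be sketched as follows. The matching values (the product string, the cluster UUID) and the helper name `identify_peer` are assumptions for illustration, not the actual internal logic of the application:

```python
# Sketch of the two-stage identification described above. Stage one uses
# identify-controller fields (subnqn, product info) to guess whether the
# peer is a storage node; stage two, run only on that guess, uses the
# custom cluster-node fields to classify the peer into the four results.

LOCAL_CLUSTER_UUID = "cluster-1"  # illustrative UUID of this node's cluster

def identify_peer(ctrl_info, cns):
    """Return one of the four identities the text enumerates."""
    looks_like_storage = (ctrl_info.get("product") == "storage-array"
                          and ctrl_info.get("subnqn", "").startswith("nqn."))
    if not looks_like_storage or cns is None:
        return "not_storage_node"
    if not cns.get("cluster_uuid"):
        return "storage_node_not_in_cluster"   # has not joined any cluster
    if cns["cluster_uuid"] == LOCAL_CLUSTER_UUID:
        return "storage_node_in_this_cluster"
    return "storage_node_in_other_cluster"

ident = identify_peer({"product": "storage-array", "subnqn": "nqn.x:b"},
                      {"cluster_uuid": "cluster-1"})
```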
Step S12: binding the node's own subsystem NVMe qualified name and local port with the ports and subsystem NVMe qualified names of the other nodes based on the first communication connection and the second communication connection to obtain a connection information pair, wherein the second communication connection is a communication connection initiated by another node based on the NVMF protocol and established with the arbitrary node.
For the establishment process of the second communication connection, reference may be made to that of the first communication connection, which is not repeated here.
It should be pointed out that, since an NVMF connection follows a directed initiator-to-target mode, the two nodes use the local subnqn, the local port, the peer subnqn, and the peer port to bind the two pieces of connection information in opposite directions together and identify them as a "cluster interconnection connection pair", achieving two-way transmission of cluster interconnection service flows.
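A minimal sketch of this pairing rule, under the assumption that each connection is recorded as a (local nqn, local port, remote nqn, remote port) tuple: two connections form a pair exactly when each one's local endpoint is the other's remote endpoint.

```python
# Sketch of the "cluster interconnection connection pair" matching rule:
# two connections are bound together when they describe the same link
# seen from opposite directions.

def is_connection_pair(c1, c2):
    """c1/c2 are (local_nqn, local_port, remote_nqn, remote_port) tuples."""
    return ((c1[0], c1[1]) == (c2[2], c2[3])
            and (c1[2], c1[3]) == (c2[0], c2[1]))

outbound = ("nqn.a", 1, "nqn.b", 2)   # this node -> peer
inbound  = ("nqn.b", 2, "nqn.a", 1)   # peer -> this node
paired = is_connection_pair(outbound, inbound)
```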
步骤S13:基于连接信息对进行集群业务传输。
下面以本NVMe over rdma为例详细的阐述本申请提供的方案:参见图2所示,图2为本申请实施例公开的一种可选的NVMF存储集群节点互联示意图,包括NVMe over fabric连接过程以及存储节点识别过程。
参见图3所示,图3为本申请实施例公开的一种可选的节点发现以及连接过程示意图,NVMe over fabric连接过程:NVMF发起端实现发现功能,使用well-known nqn与RDMA NVME节点设备建立连接。过程为:通过connect fabric指令通知roce驱动建立RDMA queue pair(admin)(即RDMA管理队列),然后与discover controller(即发现控制器)创建NVMe admin queue pair(即NVMe管理队列),通过fabric/admin指令配置discover controller,获取discover log page(即发现日志页)。另外,按需决定是否建立持久连接。NVMF发起端使用发现的nqn(即subnqn)与RDMA NVMe节点设备建立连接。过程为:connect fabric指令通知roce驱动建立RDMA queue pair(admin),然后与预设controller(即预设控制器,图3中的NVMe控制器)创建NVMe admin queue pair,通过Authentication Send/Recive指令完成鉴权操作,通过fabric/admin指令配置预设controller;connect fabric指令通知roce驱动建立RDMA queue pairs(io)(即RDMA IO队列),然后与预设controller创建NVMe io queue pairs(即NVMe IO队列),通过Authentication Send/Receive指令完成鉴权操作。当初始化就绪后,通知节点识别模块进行节点身份识别。
The storage node identification process may include the following. On each node, after the NVMF initiator completes the fabric connection, it reports the connection information to the node identification module, which is responsible for recognizing the identity of the connected peer. First, the NVMe Identify command is used to query controller information, obtaining the peer device's subnqn, product information, and model information for a preliminary determination of whether it is a storage node. If so, the NVMe Identify command is then used to query the custom cluster node structure, obtaining the peer node's cluster UUID, node UUID, port information, platform information, version information, and other service identification information, from which internal logic performs the identification. The final result is one of: not a storage node; a storage node belonging to the local cluster; a storage node belonging to another cluster; or a storage node that has not yet joined a cluster. The node identity attributes thus obtained are bound to the NVMF connection object. If the peer is identified as not a storage node, the connection information is reported to the platform module for use by other services. If it is identified as a storage node, it is handed to the cluster communication module, which configures cluster services according to the node's identity and prepares for cluster node communication.
On each node, NVMF creates a locally scoped subsystem dedicated to storage node interconnection. When an initiator connects to this subsystem's controller and queries controller and node information, the subsystem is responsible for generating the relevant information and returning it to the peer through the protocol interaction. When a new NVMF connection appears, whether an outbound connection from the initiator to a target or an inbound connection accepted by the target, the NVMF subsystem module checks the local NQN (i.e., the local subsystem NVMe Qualified Name), the custom local NVMe port, the remote NQN (i.e., the remote subsystem NVMe Qualified Name), and the custom remote NVMe port in the two connection records; if they identify the same pair of mutually connected node ports, the two records are bound together and provided for node communication.
Moreover, the NVMe over Fabrics node interconnection of embodiments of this application is not tied to any particular fabric transport protocol; NVMe over RoCE, NVMe over TCP, and NVMe over FC are all supported.
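Because the interconnect logic is transport-agnostic, only the transport type differs between RoCE (RDMA), TCP, and FC. As a sketch, an nvme-cli style connect invocation changes only its `-t` argument (the address, service ID, and NQN below are placeholders):

```python
def nvme_connect_cmd(transport, traddr, trsvcid, subnqn):
    """Build an nvme-cli style 'nvme connect' command line; the same
    shape works for the rdma, tcp, and fc transports."""
    if transport not in ("rdma", "tcp", "fc"):
        raise ValueError(f"unsupported transport: {transport}")
    return ["nvme", "connect",
            "-t", transport,   # transport type
            "-a", traddr,      # transport address
            "-s", str(trsvcid),  # transport service ID (e.g. port)
            "-n", subnqn]      # target subsystem NQN
```

For example, switching a deployment from RoCE to TCP (for longer-distance or more flexible topologies) only changes the first argument.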
It can thus be seen that, in embodiments of this application, a first communication connection is established with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, and multiple NVMe IO queues are created while establishing the first communication connection, where each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to it. Based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port are bound together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair, where the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node; cluster service transmission is then performed based on the connection information pair. In other words, in embodiments of this application every node has a preset subsystem for node interconnection and a unique subsystem NVMe Qualified Name; any node establishes a first communication connection with the preset subsystems of other nodes based on the NVMF protocol and their subsystem NVMe Qualified Names, and binds the two opposite-direction connection records together as a connection information pair, enabling bidirectional transmission of cluster interconnect traffic. This exploits NVMe multi-queue capability to improve the performance of storage cluster node interconnection, and is compatible with both the RoCE and TCP transport protocols, so RDMA features and the high bandwidth of the transport network can be used to improve performance, while TCP extends the distance and flexibility of topology deployment.
Referring to FIG. 4, an embodiment of this application discloses an NVMF storage cluster node interconnection apparatus, applied to any node in a storage cluster, including:
a communication connection establishment module 11, configured to establish a first communication connection with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, and to create multiple NVMe IO queues while establishing the first communication connection, where each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to that node;
a connection information binding module 12, configured to bind, based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair, where the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node;
a cluster service transmission module 13, configured to perform cluster service transmission based on the connection information pair.
It can thus be seen that, in embodiments of this application, a first communication connection is established with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, and multiple NVMe IO queues are created while establishing the first communication connection, where each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to it. Based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port are bound together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair, where the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node; cluster service transmission is then performed based on the connection information pair. In other words, in embodiments of this application every node has a preset subsystem for node interconnection and a unique subsystem NVMe Qualified Name; any node establishes a first communication connection with the preset subsystems of other nodes based on the NVMF protocol and their subsystem NVMe Qualified Names, and binds the two opposite-direction connection records together as a connection information pair, enabling bidirectional transmission of cluster interconnect traffic. This exploits NVMe multi-queue capability to improve the performance of storage cluster node interconnection, and is compatible with both the RoCE and TCP transport protocols, so RDMA features and the high bandwidth of the transport network can be used to improve performance, while TCP extends the distance and flexibility of topology deployment.
Further, the apparatus also includes a node discovery module, configured to perform node discovery on other nodes and obtain the subsystem NVMe Qualified Names of the other nodes.
In a first implementation, the node discovery module is configured to perform node discovery on other nodes based on the NVMF protocol and obtain the subsystem NVMe Qualified Names returned by the other nodes. In some embodiments, the node discovery module is configured to instruct, via a connect fabric command, a preset RoCE driver to establish an RDMA admin queue, then create an NVMe admin queue with the discovery controller of the other node, configure the discovery controller via fabric/admin commands, and retrieve the discovery log page, where the discovery log page includes the subsystem NVMe Qualified Names of the discovered nodes. In other embodiments, the node discovery module is configured to establish a TCP connection with the other node via a connect fabric command and obtain, over the TCP connection, the subsystem NVMe Qualified Name returned by the other node.
In a second implementation, the node discovery module is configured to obtain configuration information, where the configuration information includes the IP addresses and subsystem NVMe Qualified Names of the other nodes.
In a third implementation, the node discovery module is configured to publish this node's IP address, port, and subsystem NVMe Qualified Name to other nodes via the Link Layer Discovery Protocol, and to obtain the subsystem NVMe Qualified Names returned by the other nodes.
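A sketch of the extended-LLDP publication in the third implementation: the node's IP, port, and subnqn are packed into an organization-specific TLV payload that peers parse back out. The JSON encoding here is purely illustrative; a real implementation would define a binary TLV format per IEEE 802.1AB:

```python
import json

def encode_discovery_tlv(ip, port, subnqn):
    """Pack the fields a peer needs to initiate an NVMe-oF connection
    into a (hypothetical) org-specific LLDP TLV payload."""
    return json.dumps({"ip": ip, "port": port, "subnqn": subnqn}).encode()

def decode_discovery_tlv(payload):
    """Recover the peer's connection parameters from the TLV payload."""
    d = json.loads(payload.decode())
    return d["ip"], d["port"], d["subnqn"]
```

A receiving node decodes the TLV and then feeds the recovered (IP, port, subnqn) triple to its NVMF initiator, exactly as with manually configured targets.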
In a fourth implementation, the node discovery module is configured to perform node discovery on other nodes based on the mDNS protocol and obtain the subsystem NVMe Qualified Names of the other nodes.
In some embodiments, the communication connection establishment module 11 is configured to instruct, via a connect fabric command, the preset RoCE driver to establish an RDMA admin queue, then create an NVMe admin queue with the preset controller of the other node, complete the authentication operation via authentication commands, and configure the preset controller via fabric/admin commands; and to instruct, via a connect fabric command, the preset RoCE driver to establish multiple RDMA IO queues, then create multiple NVMe IO queues with the preset controller and complete the authentication operation via authentication commands.
In other embodiments, the communication connection establishment module 11 is configured to establish a TCP connection with the other node via a connect fabric command, then create an NVMe admin queue with the preset controller of the other node, complete the authentication operation via authentication commands, and configure the preset controller via fabric/admin commands; and to create multiple NVMe IO queues with the preset controller and complete the authentication operation via authentication commands.
Further, the apparatus also includes:
an identity recognition module, configured to query the node information of other nodes via NVMe commands and to identify the identities of the other nodes based on that information;
a configuration module, configured to perform different service configuration according to the different identities.
In addition, the configuration module is configured to, if the other node's identity is a storage node, configure cluster services differently according to its specific identity.
Further, an embodiment of this application discloses an NVMF storage node, applied to an NVMF storage cluster having multiple nodes, including:
a communication connection establishment module, configured to establish a first communication connection with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, and to create multiple NVMe IO queues while establishing the first communication connection, where each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to that node;
a connection information binding module, configured to bind, based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair, where the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node;
a cluster service transmission module, configured to perform cluster service transmission based on the connection information pair.
Referring to FIG. 5, an embodiment of this application discloses an electronic device 20, including a processor 21 and a memory 22, where the memory 22 is configured to store a computer program, and the processor 21 is configured to execute the computer program to implement the following steps:
establishing a first communication connection with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, and creating multiple NVMe IO queues while establishing the first communication connection, where each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to that node; binding, based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair, where the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node; and performing cluster service transmission based on the connection information pair.
It can thus be seen that, in embodiments of this application, a first communication connection is established with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, and multiple NVMe IO queues are created while establishing the first communication connection, where each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to it. Based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port are bound together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair, where the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node; cluster service transmission is then performed based on the connection information pair. In other words, in embodiments of this application every node has a preset subsystem for node interconnection and a unique subsystem NVMe Qualified Name; any node establishes a first communication connection with the preset subsystems of other nodes based on the NVMF protocol and their subsystem NVMe Qualified Names, and binds the two opposite-direction connection records together as a connection information pair, enabling bidirectional transmission of cluster interconnect traffic. This exploits NVMe multi-queue capability to improve the performance of storage cluster node interconnection, and is compatible with both the RoCE and TCP transport protocols, so RDMA features and the high bandwidth of the transport network can be used to improve performance, while TCP extends the distance and flexibility of topology deployment.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: before establishing the first communication connection with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, performing node discovery on other nodes and obtaining the subsystem NVMe Qualified Names of the other nodes.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: performing node discovery on other nodes based on the NVMF protocol, and obtaining the subsystem NVMe Qualified Names returned by the other nodes.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: instructing, via a connect fabric command, a preset RoCE driver to establish an RDMA admin queue, then creating an NVMe admin queue with the discovery controller of the other node, configuring the discovery controller via fabric/admin commands, and retrieving the discovery log page, where the discovery log page includes the subsystem NVMe Qualified Names of the discovered nodes.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: instructing, via a connect fabric command, the preset RoCE driver to establish an RDMA admin queue, then creating an NVMe admin queue with the preset controller of the other node, completing the authentication operation via authentication commands, and configuring the preset controller via fabric/admin commands; and instructing, via a connect fabric command, the preset RoCE driver to establish multiple RDMA IO queues, then creating multiple NVMe IO queues with the preset controller and completing the authentication operation via authentication commands.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: establishing a TCP connection with the other node via a connect fabric command, and obtaining, over the TCP connection, the subsystem NVMe Qualified Name returned by the other node.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: establishing a TCP connection with the other node via a connect fabric command, then creating an NVMe admin queue with the preset controller of the other node, completing the authentication operation via authentication commands, and configuring the preset controller via fabric/admin commands; and creating multiple NVMe IO queues with the preset controller and completing the authentication operation via authentication commands.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: obtaining configuration information, where the configuration information includes the IP addresses and subsystem NVMe Qualified Names of the other nodes.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: publishing this node's IP address, port, and subsystem NVMe Qualified Name to other nodes via the Link Layer Discovery Protocol, and obtaining the subsystem NVMe Qualified Names returned by the other nodes.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: performing node discovery on other nodes based on the mDNS protocol and obtaining the subsystem NVMe Qualified Names of the other nodes.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: querying the node information of the other node via NVMe commands; identifying the identity of the other node based on the node information, and performing different service configuration according to different identities.
In this embodiment, when the processor 21 executes the computer subprogram stored in the memory 22, the following steps may be implemented: if the other node's identity is a storage node, configuring cluster services differently according to its specific identity.
In addition, the memory 22, as the carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, or an optical disc, and the storage may be transient or persistent.
Furthermore, the electronic device 20 also includes a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The power supply 23 is configured to supply the operating voltage for each hardware device of the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, following any communication protocol applicable to the technical solution of this application, which is not limited here; the input/output interface 25 is configured to acquire external input data or output data to the outside, and its interface type may be selected according to application needs, which is not limited here.
Further, an embodiment of this application discloses a non-volatile readable storage medium configured to store a computer program, where the computer program, when executed by a processor, implements the following steps:
establishing a first communication connection with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, and creating multiple NVMe IO queues while establishing the first communication connection, where each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to that node; binding, based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair, where the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node; and performing cluster service transmission based on the connection information pair.
It can thus be seen that, in embodiments of this application, a first communication connection is established with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, and multiple NVMe IO queues are created while establishing the first communication connection, where each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to it. Based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port are bound together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair, where the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node; cluster service transmission is then performed based on the connection information pair. In other words, in embodiments of this application every node has a preset subsystem for node interconnection and a unique subsystem NVMe Qualified Name; any node establishes a first communication connection with the preset subsystems of other nodes based on the NVMF protocol and their subsystem NVMe Qualified Names, and binds the two opposite-direction connection records together as a connection information pair, enabling bidirectional transmission of cluster interconnect traffic. This exploits NVMe multi-queue capability to improve the performance of storage cluster node interconnection, and is compatible with both the RoCE and TCP transport protocols, so RDMA features and the high bandwidth of the transport network can be used to improve performance, while TCP extends the distance and flexibility of topology deployment.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: before establishing the first communication connection with the preset subsystem of another node based on the NVMF protocol and that node's subsystem NVMe Qualified Name, performing node discovery on other nodes and obtaining the subsystem NVMe Qualified Names of the other nodes.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: performing node discovery on other nodes based on the NVMF protocol, and obtaining the subsystem NVMe Qualified Names returned by the other nodes.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: instructing, via a connect fabric command, a preset RoCE driver to establish an RDMA admin queue, then creating an NVMe admin queue with the discovery controller of the other node, configuring the discovery controller via fabric/admin commands, and retrieving the discovery log page, where the discovery log page includes the subsystem NVMe Qualified Names of the discovered nodes.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: instructing, via a connect fabric command, the preset RoCE driver to establish an RDMA admin queue, then creating an NVMe admin queue with the preset controller of the other node, completing the authentication operation via authentication commands, and configuring the preset controller via fabric/admin commands; and instructing, via a connect fabric command, the preset RoCE driver to establish multiple RDMA IO queues, then creating multiple NVMe IO queues with the preset controller and completing the authentication operation via authentication commands.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: establishing a TCP connection with the other node via a connect fabric command, and obtaining, over the TCP connection, the subsystem NVMe Qualified Name returned by the other node.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: establishing a TCP connection with the other node via a connect fabric command, then creating an NVMe admin queue with the preset controller of the other node, completing the authentication operation via authentication commands, and configuring the preset controller via fabric/admin commands; and creating multiple NVMe IO queues with the preset controller and completing the authentication operation via authentication commands.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: obtaining configuration information, where the configuration information includes the IP addresses and subsystem NVMe Qualified Names of the other nodes.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: publishing this node's IP address, port, and subsystem NVMe Qualified Name to other nodes via the Link Layer Discovery Protocol, and obtaining the subsystem NVMe Qualified Names returned by the other nodes.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: performing node discovery on other nodes based on the mDNS protocol and obtaining the subsystem NVMe Qualified Names of the other nodes.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: querying the node information of the other node via NVMe commands; identifying the identity of the other node based on the node information, and performing different service configuration according to different identities.
In this embodiment, when the computer subprogram stored in the non-volatile readable storage medium is executed by a processor, the following steps may be implemented: if the other node's identity is a storage node, configuring cluster services differently according to its specific identity.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts may refer to the description of the method.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of non-volatile readable storage medium known in the art.
The NVMF storage cluster node interconnection method, apparatus, device, and non-volatile readable storage medium provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help understand the method of this application and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the optional implementations and the scope of application according to the idea of this application. In summary, the content of this specification should not be construed as limiting this application.

Claims (20)

  1. An NVMF storage cluster node interconnection method, applied to any node in a storage cluster, comprising:
    establishing a first communication connection with a preset subsystem of another node based on the NVMF protocol and the other node's subsystem NVMe Qualified Name, and creating multiple NVMe IO queues while establishing the first communication connection; wherein each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to that node;
    binding, based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair; wherein the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node;
    performing cluster service transmission based on the connection information pair.
  2. The NVMF storage cluster node interconnection method according to claim 1, wherein before establishing the first communication connection with the preset subsystem of the other node based on the NVMF protocol and the other node's subsystem NVMe Qualified Name, the method further comprises:
    performing node discovery on other nodes and obtaining the subsystem NVMe Qualified Names of the other nodes.
  3. The NVMF storage cluster node interconnection method according to claim 2, wherein performing node discovery on other nodes and obtaining the subsystem NVMe Qualified Names of the other nodes comprises:
    performing node discovery on other nodes based on the NVMF protocol, and obtaining the subsystem NVMe Qualified Names returned by the other nodes.
  4. The NVMF storage cluster node interconnection method according to claim 3, wherein performing node discovery on other nodes based on the NVMF protocol and obtaining the subsystem NVMe Qualified Names returned by the other nodes comprises:
    instructing, via a connect fabric command, a preset roce driver to establish a remote direct memory access (RDMA) admin queue, then creating an NVMe admin queue with a discovery controller of the other node, configuring the discovery controller via fabric/admin commands, and retrieving a discovery log page; wherein the discovery log page comprises the subsystem NVMe Qualified Names of the discovered nodes.
  5. The NVMF storage cluster node interconnection method according to claim 4, wherein the preset roce driver is a driver created based on the roce protocol, and the roce protocol is a protocol that implements RDMA transmission over the udp protocol.
  6. The NVMF storage cluster node interconnection method according to claim 4, wherein establishing the first communication connection with the preset subsystem of the other node based on the NVMF protocol and the other node's subsystem NVMe Qualified Name comprises:
    instructing, via a connect fabric command, the preset roce driver to establish an RDMA admin queue, then creating an NVMe admin queue with a preset controller of the other node, completing an authentication operation via authentication commands, and configuring the preset controller via fabric/admin commands; and instructing, via a connect fabric command, the preset roce driver to establish multiple RDMA IO queues, then creating multiple NVMe IO queues with the preset controller, and completing an authentication operation via authentication commands.
  7. The NVMF storage cluster node interconnection method according to claim 3, wherein performing node discovery on other nodes based on the NVMF protocol and obtaining the subsystem NVMe Qualified Names returned by the other nodes comprises:
    establishing a TCP connection with the other node via a connect fabric command, and obtaining, over the TCP connection, the subsystem NVMe Qualified Name returned by the other node.
  8. The NVMF storage cluster node interconnection method according to claim 7, wherein establishing the first communication connection with the preset subsystem of the other node based on the NVMF protocol and the other node's subsystem NVMe Qualified Name comprises:
    establishing a TCP connection with the other node via a connect fabric command, then creating an NVMe admin queue with a preset controller of the other node, completing an authentication operation via authentication commands, and configuring the preset controller via fabric/admin commands; and creating multiple NVMe IO queues with the preset controller and completing an authentication operation via authentication commands.
  9. The NVMF storage cluster node interconnection method according to claim 2, wherein performing node discovery on other nodes and obtaining the subsystem NVMe Qualified Names of the other nodes comprises:
    obtaining configuration information, wherein the configuration information comprises the IP addresses and subsystem NVMe Qualified Names of the other nodes.
  10. The NVMF storage cluster node interconnection method according to claim 2, wherein performing node discovery on other nodes and obtaining the subsystem NVMe Qualified Names of the other nodes comprises:
    publishing this node's IP address, port, and subsystem NVMe Qualified Name to other nodes via the Link Layer Discovery Protocol, and obtaining the subsystem NVMe Qualified Names returned by the other nodes.
  11. The NVMF storage cluster node interconnection method according to claim 2, wherein performing node discovery on other nodes and obtaining the subsystem NVMe Qualified Names of the other nodes comprises:
    performing node discovery on other nodes based on the mDNS protocol and obtaining the subsystem NVMe Qualified Names of the other nodes.
  12. The NVMF storage cluster node interconnection method according to any one of claims 1 to 11, wherein after establishing the first communication connection with the preset subsystem of the other node based on the NVMF protocol and the other node's subsystem NVMe Qualified Name, the method further comprises:
    querying node information of the other node via NVMe commands;
    identifying the identity of the other node based on the node information, and performing different service configuration according to different identities.
  13. The NVMF storage cluster node interconnection method according to claim 12, wherein the identity of the other node comprises: not a storage node; a storage node belonging to this storage cluster; a storage node belonging to another storage cluster; and a storage node that has not joined a storage cluster.
  14. The NVMF storage cluster node interconnection method according to claim 12, wherein querying node information of the other node via NVMe commands comprises:
    using the NVMe identify command to query controller information, and obtaining the other node's subnqn, product information, and model information as the node information;
    using the NVMe identify command to query a cluster node structure (cluster node CNS), and obtaining the other node's cluster uuid, node uuid, port information, platform information, version information, and other service identification information as the node information.
  15. The NVMF storage cluster node interconnection method according to claim 12, wherein identifying the identity of the other node based on the node information comprises:
    preliminarily identifying, based on the other node's subnqn, product information, and model information, whether the other node is a storage node, to obtain a preliminary identification result;
    in a case where the preliminary identification result indicates that the other node is a storage node, identifying the identity of the other node based on the other node's cluster uuid, node uuid, port information, platform information, version information, and other service identification information.
  16. The NVMF storage cluster node interconnection method according to claim 12, wherein performing different service configuration according to the different identities comprises:
    if the identity of the other node is a storage node, configuring cluster services differently according to the specific identity.
  17. An NVMF storage cluster node interconnection apparatus, applied to any node in a storage cluster, comprising:
    a communication connection establishment module, configured to establish a first communication connection with a preset subsystem of another node based on the NVMF protocol and the other node's subsystem NVMe Qualified Name, and to create multiple NVMe IO queues while establishing the first communication connection; wherein each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to that node;
    a connection information binding module, configured to bind, based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair; wherein the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node;
    a cluster service transmission module, configured to perform cluster service transmission based on the connection information pair.
  18. An NVMF storage node, applied to an NVMF storage cluster having multiple nodes, comprising:
    a communication connection establishment module, configured to establish a first communication connection with a preset subsystem of another node based on the NVMF protocol and the other node's subsystem NVMe Qualified Name, and to create multiple NVMe IO queues while establishing the first communication connection; wherein each node has a preset subsystem and a subsystem NVMe Qualified Name uniquely corresponding to that node;
    a connection information binding module, configured to bind, based on the first communication connection and a second communication connection, the node's own subsystem NVMe Qualified Name and local port together with the other node's port and subsystem NVMe Qualified Name to obtain a connection information pair; wherein the second communication connection is a communication connection initiated by the other node based on the NVMF protocol and established with this node;
    a cluster service transmission module, configured to perform cluster service transmission based on the connection information pair.
  19. An electronic device, comprising a memory and a processor, wherein:
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program to implement the NVMF storage cluster node interconnection method according to any one of claims 1 to 16.
  20. A non-volatile readable storage medium, configured to store a computer program, wherein the computer program, when executed by a processor, implements the NVMF storage cluster node interconnection method according to any one of claims 1 to 16.
PCT/CN2023/116232 2022-11-25 2023-08-31 NVMF storage cluster node interconnection method and apparatus, device, and non-volatile readable storage medium WO2024109240A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211487379.4 2022-11-25
CN202211487379.4A CN115550377B (zh) 2022-11-25 2022-11-25 NVMF storage cluster node interconnection method, apparatus, device, and medium

Publications (1)

Publication Number Publication Date
WO2024109240A1 true WO2024109240A1 (zh) 2024-05-30

Family

ID=84719985

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/116232 WO2024109240A1 (zh) 2022-11-25 2023-08-31 一种nvmf存储集群节点互联方法、装置、设备及非易失性可读存储介质

Country Status (2)

Country Link
CN (1) CN115550377B (zh)
WO (1) WO2024109240A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550377B (zh) * 2022-11-25 2023-03-07 苏州浪潮智能科技有限公司 一种nvmf存储集群节点互联方法、装置、设备及介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108351813A (zh) * 2015-12-21 2018-07-31 Intel Corporation Method and apparatus for enabling individual NVMe input/output (IO) queues on differing network addresses of an NVMe controller
CN113407307A (zh) * 2021-06-11 2021-09-17 Suzhou Inspur Intelligent Technology Co., Ltd. Port extension method and apparatus, device, and computer-readable storage medium
CN113872955A (zh) * 2021-09-23 2021-12-31 Suzhou Inspur Intelligent Technology Co., Ltd. Network connection method and apparatus, computer device, and storage medium
CN115550377A (zh) * 2022-11-25 2022-12-30 Suzhou Inspur Intelligent Technology Co., Ltd. NVMF storage cluster node interconnection method, apparatus, device, and medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012103705A1 (zh) * 2011-06-24 2012-08-09 Huawei Technologies Co., Ltd. Computer subsystem and computer system
CN105912275A (zh) * 2016-04-27 2016-08-31 Huawei Technologies Co., Ltd. Method and apparatus for establishing a connection in a non-volatile storage system
CN111722786A (zh) * 2019-03-21 2020-09-29 Alibaba Group Holding Limited NVMe device-based storage system
CN112130748B (zh) * 2019-06-24 2022-07-19 Huawei Technologies Co., Ltd. Data access method, network interface card, and server
US11113001B2 (en) * 2019-08-30 2021-09-07 Hewlett Packard Enterprise Development Lp Fabric driven non-volatile memory express subsystem zoning
CN111459417B (zh) * 2020-04-26 2023-08-18 National University of Defense Technology Lock-free transmission method and system for NVMe-oF storage networks
US11503140B2 (en) * 2020-12-11 2022-11-15 Western Digital Technologies, Inc. Packet processing by programmable network interface
CN113014662A (zh) * 2021-03-11 2021-06-22 Lenovo (Beijing) Co., Ltd. Data processing method and NVMe-oF protocol-based storage system

Also Published As

Publication number Publication date
CN115550377B (zh) 2023-03-07
CN115550377A (zh) 2022-12-30
