CN111459417A - NVMeoF storage network-oriented lock-free transmission method and system - Google Patents


Publication number
CN111459417A
Authority
CN
China
Prior art keywords
nvmeof
linked list
queue
sending
head node
Prior art date
Legal status
Granted
Application number
CN202010338868.8A
Other languages
Chinese (zh)
Other versions
CN111459417B (en)
Inventor
李琼
宋振龙
赵曦
谢徐超
谢旻
袁远
黎铁军
肖立权
魏登萍
任静
李世杰
陈浩稳
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202010338868.8A
Publication of CN111459417A
Application granted
Publication of CN111459417B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 - Improving I/O performance
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 - Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0659 - Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5011 - Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 - Allocation of resources to service a request, the resource being the memory
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a lock-free transmission method and system for an NVMeoF storage network. The host creates one NVMeoF queue per CPU core and applies for a blank memory region in array format for each NVMeoF queue. When a command packet arrives, it is added to the array-format blank memory of the corresponding NVMeoF queue and cached through that queue's independent linked list; each NVMeoF queue is then polled, and the cached command packets are sent to the target over the network. By adding a linked-list cache to every NVMeoF queue and polling the multiple linked lists, the invention removes the fierce contention of many NVMeoF queues for a single send-list lock, eliminating the I/O bottleneck under heavy I/O pressure.

Description

NVMeoF storage network-oriented lock-free transmission method and system
Technical Field
The invention relates to remote storage and storage network technology, and in particular to a lock-free transmission method and system for the NVMeoF storage network.
Background
The NVMe protocol is designed for novel high-speed non-volatile memories (such as flash memory and 3D XPoint). The efficient combination of the PCIe interface and the NVMe protocol reduces I/O protocol stack overhead and storage access latency, improves I/O throughput and bandwidth, and is widely used in data centers.
However, limited by the scalability of the PCIe bus, the NVMe protocol is not suitable for large-scale cross-network remote storage access. The NVMeoF storage network protocol, developed by extending NVMe over RDMA networks, allows a host to communicate with remote NVMe devices in a storage system across various network fabrics. It provides an effective technical path for data centers to build scalable, high-performance network storage systems and has become a future development trend.
NVMeoF (NVMe over Fabrics) can be implemented over different link-layer and physical-layer protocols; the carrier networks include InfiniBand, RoCE, iWARP, and Fibre Channel, as well as RDMA networks based on customized protocols, such as the customized high-speed interconnect adopted in the Tianhe supercomputers.
The transmission flow of I/O commands in an RDMA-based NVMeoF network storage system is shown in FIG. 1. A command packet (CMD Capsule) is the encapsulated form of the I/O request command in FIG. 1; besides the basic command ID, operation, cache address, and command parameters, it contains optional additional SGLs or command data. A response packet (RSP Capsule) is the encapsulated form of the I/O response in FIG. 1; besides the basic command parameters, SQ (send queue) head pointer, command status, and command ID, it contains optional command data. An NVMeoF queue (nvme_fabric_queue) encapsulates information such as the queue's ID number, queue size, and NVMe command messages, and a send linked list (send_list) is a linked list used to store command packet or response packet pointers.
NVMeoF queues are created according to the number of cores of the current server CPU, so with a multi-core CPU the traditional connection mode is a "multi-producer, single-consumer" model built on a single send linked list, as shown in FIG. 2. The I/O command packets issued by the multiple NVMeoF queues on the host side must all pass through one interconnect network interface card (NIC for short), so the send linked list must be locked: each command packet can be appended to the tail of the send list only after acquiring the lock, and then waits for the NIC to transmit it. Likewise, an I/O response packet generated by a target-side queue must acquire the lock before it can be inserted into the send list. With many cores and processes, contention for this list lock is very frequent; it severely degrades I/O request processing efficiency, slows the delivery of requests from host to target and of responses from target to host, and prevents the underlying high-speed NVMe storage devices from reaching their full performance. Lock contention is especially intense under high I/O pressure, leading to a significant I/O bottleneck.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the I/O bottleneck caused by fierce lock contention under high I/O pressure in the prior art, the invention provides a lock-free transmission method and system for the NVMeoF storage network.
In order to solve the above technical problem, the invention adopts the following technical scheme:
a lock-free transmission method facing an NVMeOF storage network comprises the implementation steps of:
1) the method comprises the steps that a host end creates NVMeOF queues with the same number according to the number of CPU cores, and applies for a blank memory with an array format for each NVMeOF queue;
2) when the command packet arrives, adding the command packet into a blank memory of an array format corresponding to the NVMeOF queue and caching the command packet through an independent linked list; and polling each NVMeOF queue, and sending the cached command packet to the target end through the network.
Optionally, caching through an independent linked list in step 2) specifically means that a command packet added to the array-format blank memory of an NVMeoF queue is cached at the tail of that queue's send linked list; sending the cached command packets to the target over the network specifically means sending the command packet at the head of the send linked list to the target over the network.
Optionally, the host further manages the array-format blank memory of each NVMeoF queue with a linked list:
1. after the array-format blank memory of each NVMeoF queue is applied for, it is added to a management linked list;
2. when a command packet arrives, the head node of the management linked list is taken out and deleted from the management linked list; under excessive I/O pressure, if the management linked list is empty, a section of temporary memory space is applied for to store the command packet;
3. the head node taken from the management linked list is assigned, and the message content of the command packet is stored at the address the head node points to;
4. the assigned head node, which is now in no management linked list, is appended to the tail of the send linked list of the corresponding NVMeoF queue;
5. when the network card polls an NVMeoF queue, it takes out the head node of that queue's send linked list and deletes the node from the send linked list;
6. the network card sends out the message content stored at the address the head node points to;
7. after sending completes, the address the head node points to is cleared and the node is appended back to the tail of the management linked list, ensuring the applied memory is reusable; if the address in the head node is a temporarily applied memory address, it is released immediately after the network card finishes sending.
Optionally, after step 1), the method further includes a step in which the target initializes its NVMeoF queues: creating the same number of NVMeoF queues as on the host and applying for a blank memory region in array format for each NVMeoF queue.
Optionally, step 2) is followed by a step in which the target sends a response packet after receiving the command packet: when a response packet arrives, it is added to the array-format blank memory of the corresponding NVMeoF queue and cached through an independent linked list; each NVMeoF queue is polled, and the cached response packets are sent to the host over the network.
Optionally, when the target sends a response packet after receiving a command packet, caching through an independent linked list specifically means that a response packet added to the array-format blank memory of an NVMeoF queue is cached at the tail of that queue's send linked list; sending the cached response packets to the host over the network specifically means sending the response packet at the head of the send linked list to the host over the network.
Optionally, the target further manages the array-format blank memory of each NVMeoF queue with a linked list:
1. after the array-format blank memory of each NVMeoF queue is applied for, it is added to a management linked list;
2. when a response packet arrives, the head node of the management linked list is taken out and deleted from the management linked list; under excessive I/O pressure, if the management linked list is empty, a section of temporary memory space is applied for to store the response packet;
3. the head node taken from the management linked list is assigned, and the message content of the response packet is stored at the address the head node points to;
4. the assigned head node, which is now in no management linked list, is appended to the tail of the send linked list of the corresponding NVMeoF queue;
5. when the network card polls an NVMeoF queue, it takes out the head node of that queue's send linked list and deletes the node from the send linked list;
6. the network card sends out the message content stored at the address the head node points to;
7. after sending completes, the address the head node points to is cleared and the node is appended back to the tail of the management linked list, ensuring the applied memory is reusable; if the address in the head node is a temporarily applied memory address, it is released immediately after the network card finishes sending.
In addition, the invention also provides a lock-free transmission system for an NVMeoF storage network, comprising a host end and a target end, wherein the host end is programmed or configured to execute the steps of the aforementioned lock-free transmission method for the NVMeoF storage network, or the target end is programmed or configured to execute the steps of the aforementioned lock-free transmission method for the NVMeoF storage network.
In addition, the invention also provides a lock-free transmission system for an NVMeoF storage network, comprising a computing device that is programmed or configured to execute the steps of the aforementioned lock-free transmission method for the NVMeoF storage network, or whose memory stores a computer program programmed or configured to execute that method.
Furthermore, the invention also provides a computer-readable storage medium having stored thereon a computer program programmed or configured to execute the aforementioned lock-free transmission method for the NVMeoF storage network.
Compared with the prior art, the invention has the following advantages: the host creates one NVMeoF queue per CPU core and applies for a blank memory region in array format for each NVMeoF queue; when a command packet arrives, it is added to the array-format blank memory of the corresponding NVMeoF queue and cached through an independent linked list; the network card then polls each NVMeoF queue and sends the cached command packets to the target over the network. By adding a linked-list cache to every NVMeoF queue and polling the multiple linked lists, the invention removes the fierce contention of many NVMeoF queues for a single send-list lock, eliminating the I/O bottleneck under heavy I/O pressure.
Drawings
FIG. 1 is a schematic diagram of the conventional NVMeoF I/O command processing flow.
FIG. 2 shows the conventional NVMeoF I/O packet transmission mode.
FIG. 3 is a schematic diagram of the basic principle of the method according to an embodiment of the present invention.
FIG. 4 is a diagram of the "single-producer, single-consumer" model for NVMeoF I/O message transmission in an embodiment of the present invention.
FIG. 5 is a flowchart illustrating linked-list management of the arrays in an embodiment of the present invention.
FIG. 6 shows the improved NVMeoF I/O message transmission mechanism in an embodiment of the present invention.
Detailed Description
As shown in FIG. 3 and FIG. 4, the implementation steps of the lock-free transmission method for the NVMeoF storage network in this embodiment include:
1) the host creates one NVMeoF queue per CPU core and applies for a blank memory region in array format for each NVMeoF queue;
2) when a command packet arrives, it is added to the array-format blank memory of the corresponding NVMeoF queue and cached through an independent linked list; the network card polls each NVMeoF queue and sends the cached command packets to the target over the network.
As shown in FIG. 3 and FIG. 4, in the lock-free transmission method of this embodiment, when the host and the target establish a connection, the host creates NVMeoF queues according to the number of CPU cores, and the target creates the corresponding number of queues according to the host's NVMeoF queue count. Each queue applies for a section of array-format blank memory when it is created, and when a command packet or response packet arrives, it is stored in the memory of the corresponding queue. The network card then polls each queue and sends the buffered messages one by one, so its connection mode is a "single-producer, single-consumer" model. No I/O instruction ever contends for a send-list lock, eliminating the inefficiency caused by fierce lock contention.
As an optional implementation, this embodiment uses a send linked list to organize the command packets waiting to be sent in each NVMeoF queue, which improves the network card's polling efficiency. In this embodiment, caching through an independent linked list in step 2) specifically means that a command packet added to the array-format blank memory of an NVMeoF queue is cached at the tail of that queue's send linked list; sending the cached command packets to the target over the network specifically means sending the command packet at the head of the send linked list to the target over the network.
When the upper-layer I/O pressure is too high, the host may receive messages faster than the network card can send them, causing memory overflow. To solve this, as shown in FIG. 5, the host further manages the array-format blank memory of each NVMeoF queue with a linked list:
1. after the array-format blank memory of each NVMeoF queue is applied for, it is added to a management linked list;
2. when a command packet arrives, the head node of the management linked list is taken out and deleted from the management linked list; under excessive I/O pressure, if the management linked list is empty, a section of temporary memory space is applied for to store the command packet;
3. the head node taken from the management linked list is assigned, and the message content of the command packet is stored at the address the head node points to;
4. the assigned head node, which is now in no management linked list, is appended to the tail of the send linked list of the corresponding NVMeoF queue;
5. when the network card polls an NVMeoF queue, it takes out the head node of that queue's send linked list and deletes the node from the send linked list;
6. the network card sends out the message content stored at the address the head node points to;
7. after sending completes, the address the head node points to is cleared and the node is appended back to the tail of the management linked list, ensuring the applied memory is reusable; if the address in the head node is a temporarily applied memory address, it is released immediately after the network card finishes sending.
As an optional implementation, the I/O pressure indicator can be the number of I/Os per unit time; other feasible I/O pressure metrics can also be chosen as needed.
Referring to FIG. 4, in this embodiment, after step 1) the method further includes a step in which the target initializes its NVMeoF queues: creating the same number of NVMeoF queues as on the host and applying for a blank memory region in array format for each NVMeoF queue.
In this embodiment, step 2) is further followed by a step in which the target sends a response packet after receiving the command packet: when a response packet arrives, it is added to the array-format blank memory of the corresponding NVMeoF queue and cached through an independent linked list; each NVMeoF queue is polled, and the cached response packets are sent to the host over the network.
In this embodiment, when the target sends a response packet after receiving a command packet, caching through an independent linked list specifically means that a response packet added to the array-format blank memory of an NVMeoF queue is cached at the tail of that queue's send linked list; sending the cached response packets to the host over the network specifically means sending the response packet at the head of the send linked list to the host over the network.
When the upper-layer I/O pressure is too high, the target may likewise receive messages faster than the network card can send them, causing memory overflow. To solve this, referring to FIG. 5, the target in this embodiment also manages the array-format blank memory of each NVMeoF queue with a linked list:
1. after the array-format blank memory of each NVMeoF queue is applied for, it is added to a management linked list;
2. when a response packet arrives, the head node of the management linked list is taken out and deleted from the management linked list; under excessive I/O pressure, if the management linked list is empty, a section of temporary memory space is applied for to store the response packet;
3. the head node taken from the management linked list is assigned, and the message content of the response packet is stored at the address the head node points to;
4. the assigned head node, which is now in no management linked list, is appended to the tail of the send linked list of the corresponding NVMeoF queue;
5. when the network card polls an NVMeoF queue, it takes out the head node of that queue's send linked list and deletes the node from the send linked list;
6. the network card sends out the message content stored at the address the head node points to;
7. after sending completes, the address the head node points to is cleared and the node is appended back to the tail of the management linked list, ensuring the applied memory is reusable; if the address in the head node is a temporarily applied memory address, it is released immediately after the network card finishes sending.
In summary, the network card "single-producer, single-consumer" model finally obtained by the lock-free transmission method of this embodiment is shown in FIG. 6. As FIG. 6 shows, each NVMeoF queue corresponds to one send linked list, which avoids multiple NVMeoF queues competing for a single send linked list and realizes the network card's "single-producer, single-consumer" model. At the same time, when memory runs short, the method temporarily applies for a section of memory at the host or the target and releases it after use, effectively solving the memory overflow problem. By adding a linked-list cache to every NVMeoF queue and polling the multiple linked lists, the method removes the fierce contention of many NVMeoF queues for a single send-list lock, eliminating the I/O bottleneck under heavy I/O pressure.
In addition, this embodiment also provides an NVMeoF storage network-oriented lock-free transmission system, which includes a host end and a target end, where the host end is programmed or configured to execute the steps of the aforementioned NVMeoF storage network-oriented lock-free transmission method, or the target end is programmed or configured to execute the steps of the aforementioned NVMeoF storage network-oriented lock-free transmission method.
In addition, the present embodiment also provides a lock-free transmission system for an NVMeoF storage network, which includes a computing device programmed or configured to execute the steps of the lock-free transmission method for the NVMeoF storage network.
In addition, the present embodiment also provides a lock-free transmission system for an NVMeoF storage network, which includes a computing device, where a memory of the computing device stores a computer program programmed or configured to execute the lock-free transmission method for the NVMeoF storage network.
Furthermore, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, which is programmed or configured to execute the aforementioned lock-free transmission method for the NVMeoF storage network.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions; instructions executed via a processor of a computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement those functions. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, causing a series of operational steps to be performed so that the instructions executing on the apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; all technical solutions under the idea of the present invention belong to its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the present invention are also considered to be within its protection scope.

Claims (10)

1. A lock-free transmission method oriented to an NVMeoF storage network, characterized by comprising the following implementation steps:
1) a host end creates NVMeoF queues equal in number to the CPU cores, and allocates a blank memory block in array format for each NVMeoF queue;
2) when a command packet arrives, the host end adds it to the blank array-format memory corresponding to the NVMeoF queue and caches it through an independent linked list; each NVMeoF queue is polled, and the cached command packets are sent to the target end through the network.
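The queue layout in claim 1 can be sketched as follows. This is an illustrative Python model, not the patent's implementation; names such as `NvmeofQueue` and `QUEUE_DEPTH` are assumptions. The key property is that each CPU core owns exactly one queue backed by a preallocated array, so enqueue and dequeue on a given queue never contend and need no lock.

```python
# Illustrative sketch of claim 1 (hypothetical names, not the patent's code):
# one NVMeoF queue per CPU core, each backed by a preallocated array of
# command-packet slots, plus a per-queue list caching packets for sending.
import os
from collections import deque

QUEUE_DEPTH = 128  # assumed depth of the preallocated array


class NvmeofQueue:
    def __init__(self):
        # "blank memory in array format": a fixed array of empty slots
        self.slots = [None] * QUEUE_DEPTH
        # independent linked list caching packets awaiting transmission
        # (a deque stands in for the patent's send linked list)
        self.send_list = deque()

    def cache_command(self, index, packet):
        # place the packet into its array slot, then link it at the tail
        self.slots[index] = packet
        self.send_list.append(index)

    def poll_and_send(self, transmit):
        # take the head of the send list and transmit it over the network
        if not self.send_list:
            return False
        index = self.send_list.popleft()
        transmit(self.slots[index])
        self.slots[index] = None  # slot becomes reusable
        return True


def create_host_queues(ncores=None):
    # claim 1 step 1): as many NVMeoF queues as CPU cores
    ncores = ncores or os.cpu_count()
    return [NvmeofQueue() for _ in range(ncores)]
```

Because each core touches only its own queue, the structure is lock-free in the sense the claims use: no queue state is ever shared between cores, so no mutual exclusion is required.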
2. The NVMeoF storage network-oriented lock-free transmission method of claim 1, wherein caching through the independent linked list in step 2) specifically means caching the command packet, once added to the blank array-format memory corresponding to the NVMeoF queue, at the tail of the sending linked list corresponding to that NVMeoF queue; and sending the cached command packet to the target end through the network specifically means sending the command packet at the head of the sending linked list to the target end through the network.
3. The NVMeoF storage network-oriented lock-free transmission method of claim 2, wherein the host end further comprises a step of managing the blank array-format memory of each NVMeoF queue with a linked list: 1. after the blank array-format memory of each NVMeoF queue is allocated, its entries are added to a management linked list; 2. when a command packet arrives, the head node of the management linked list is taken out and deleted from the management linked list; if the management linked list is empty because the I/O pressure is excessive, a temporary section of memory is allocated to store the command packet; 3. the head node taken from the management linked list is assigned, and the message content of the command packet is stored at the address the head node points to; 4. the assigned head node, which now belongs to no management linked list, is added to the tail of the sending linked list corresponding to the NVMeoF queue; 5. when the network card polls an NVMeoF queue, the head node of that queue's sending linked list is taken out and deleted from the sending linked list; 6. the network card sends out the message content stored at the address the head node points to; 7. after sending completes, the address the head node points to is cleared and the node is added back to the tail of the management linked list, ensuring the allocated memory is reusable; and if the address in the head node is a temporarily allocated memory address, it is released immediately after the network card finishes sending.
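The node lifecycle of claim 3 can be sketched as follows, again as a hypothetical Python model (class names `Node` and `QueueMemory` are illustrative). Preallocated nodes circulate between the management (free) linked list and the sending linked list; only when the free list is exhausted under excessive I/O pressure is a temporary node allocated, and it is released right after transmission rather than recycled:

```python
# Illustrative sketch of claim 3 (hypothetical names): preallocated nodes
# circulate between a management (free) linked list and a sending linked
# list; under excessive I/O pressure a temporary node is allocated and
# released immediately after the network card sends it.
from collections import deque


class Node:
    __slots__ = ("payload", "temporary")

    def __init__(self, temporary=False):
        self.payload = None     # stands in for "the address the node points to"
        self.temporary = temporary


class QueueMemory:
    def __init__(self, depth):
        # step 1: after allocation, every array entry joins the management list
        self.mgmt_list = deque(Node() for _ in range(depth))
        self.send_list = deque()

    def on_packet(self, packet):
        # step 2: take the head of the management list; if it is empty
        # (excessive I/O pressure), allocate a temporary node instead
        node = self.mgmt_list.popleft() if self.mgmt_list else Node(temporary=True)
        node.payload = packet        # step 3: store the message content
        self.send_list.append(node)  # step 4: tail of the sending list

    def poll(self, transmit):
        # steps 5-6: take the sending-list head and transmit its content
        if not self.send_list:
            return
        node = self.send_list.popleft()
        transmit(node.payload)
        node.payload = None          # step 7: clear the stored content...
        if not node.temporary:
            self.mgmt_list.append(node)  # ...and recycle preallocated nodes
        # temporary nodes are simply dropped (released) after sending
```

Recycling nodes through the management list keeps the steady-state path allocation-free, while the temporary-node fallback prevents packet loss when bursts exceed the preallocated depth.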
4. The NVMeoF storage network-oriented lock-free transmission method of claim 1, further comprising, after step 1), a step in which the target end initializes its NVMeoF queues: creating NVMeoF queues matching those at the host end, and allocating a blank memory block in array format for each NVMeoF queue.
5. The NVMeoF storage network-oriented lock-free transmission method of claim 4, further comprising, after step 2), a step in which the target end sends a response packet after receiving the command packet: when a response packet arrives, it is added to the blank array-format memory corresponding to the NVMeoF queue and cached through an independent linked list; each NVMeoF queue is polled, and the cached response packets are sent to the host end through the network.
6. The NVMeoF storage network-oriented lock-free transmission method of claim 5, wherein, when the target end sends the response packet after receiving the command packet, caching through the independent linked list specifically means caching the response packet, once added to the blank array-format memory corresponding to the NVMeoF queue, at the tail of the sending linked list corresponding to that NVMeoF queue; and sending the cached response packet to the host end through the network specifically means sending the response packet at the head of the sending linked list to the host end through the network.
7. The NVMeoF storage network-oriented lock-free transmission method of claim 6, wherein the target end further comprises a step of managing the blank array-format memory of each NVMeoF queue with a linked list: 1. after the blank array-format memory of each NVMeoF queue is allocated, its entries are added to a management linked list; 2. when a response packet arrives, the head node of the management linked list is taken out and deleted from the management linked list; if the management linked list is empty because the I/O pressure is excessive, a temporary section of memory is allocated to store the response packet; 3. the head node taken from the management linked list is assigned, and the message content of the response packet is stored at the address the head node points to; 4. the assigned head node, which now belongs to no management linked list, is added to the tail of the sending linked list corresponding to the NVMeoF queue; 5. when the network card polls an NVMeoF queue, the head node of that queue's sending linked list is taken out and deleted from the sending linked list; 6. the network card sends out the message content stored at the address the head node points to; 7. after sending completes, the address the head node points to is cleared and the node is added back to the tail of the management linked list, ensuring the allocated memory is reusable; and if the address in the head node is a temporarily allocated memory address, it is released immediately after the network card finishes sending.
8. An NVMeoF storage network-oriented lock-free transmission system, comprising a host end and a target end, wherein the host end is programmed or configured to perform the steps of the NVMeoF storage network-oriented lock-free transmission method of any one of claims 1 to 3, or the target end is programmed or configured to perform the steps of the NVMeoF storage network-oriented lock-free transmission method of any one of claims 4 to 7.
9. An NVMeoF storage network-oriented lock-free transmission system comprising a computing device, wherein the computing device is programmed or configured to perform the steps of the NVMeoF storage network-oriented lock-free transmission method of any one of claims 1 to 7, or wherein a computer program programmed or configured to perform the NVMeoF storage network-oriented lock-free transmission method of any one of claims 1 to 7 is stored in a memory of the computing device.
10. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the NVMeoF storage network-oriented lock-free transmission method of any one of claims 1 to 7.
CN202010338868.8A 2020-04-26 2020-04-26 Non-lock transmission method and system for NVMeoF storage network Active CN111459417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010338868.8A CN111459417B (en) 2020-04-26 2020-04-26 Non-lock transmission method and system for NVMeoF storage network

Publications (2)

Publication Number Publication Date
CN111459417A true CN111459417A (en) 2020-07-28
CN111459417B CN111459417B (en) 2023-08-18

Family

ID=71683815

Country Status (1)

Country Link
CN (1) CN111459417B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328178A * 2020-11-05 2021-02-05 苏州浪潮智能科技有限公司 Method and device for processing IO queue full state of solid state disk
CN113176896A * 2021-03-19 2021-07-27 中盈优创资讯科技有限公司 Method for randomly taking out object based on single-in single-out lock-free queue
CN114328317A * 2021-11-30 2022-04-12 苏州浪潮智能科技有限公司 Method, device and medium for improving communication performance of storage system
CN114328317B * 2021-11-30 2023-07-14 苏州浪潮智能科技有限公司 Method, device and medium for improving communication performance of storage system
CN115550377A * 2022-11-25 2022-12-30 苏州浪潮智能科技有限公司 NVMF (NVMe over Fabrics) storage cluster node interconnection method, device, equipment and medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1051472A (en) * 1996-05-31 1998-02-20 Internatl Business Mach Corp <Ibm> Method and device for transmitting and receiving packet
CN1859325A (en) * 2006-02-14 2006-11-08 华为技术有限公司 News transfer method based on chained list process
WO2007109920A1 (en) * 2006-03-27 2007-10-04 Zte Corporation A method for constructing and using a memory pool
CN101248618A (en) * 2005-12-07 2008-08-20 中兴通讯股份有限公司 Flow control transport protocol output stream queue management and data transmission processing method
CN106302238A (en) * 2015-05-13 2017-01-04 深圳市中兴微电子技术有限公司 A kind of queue management method and device
DE102017104817A1 (en) * 2016-04-13 2017-10-19 Samsung Electronics Co., Ltd. System and method for a high performance lockable scalable target
CN107924289A (en) * 2015-10-26 2018-04-17 株式会社日立制作所 Computer system and access control method
CN108694021A (en) * 2017-04-03 2018-10-23 三星电子株式会社 The system and method for configuring storage device using baseboard management controller
US20180337991A1 (en) * 2017-05-18 2018-11-22 Intel Corporation NON-VOLATILE MEMORY EXPRESS OVER FABRIC (NVMeOF) USING VOLUME MANAGEMENT DEVICE
US10440145B1 (en) * 2016-09-13 2019-10-08 Amazon Technologies, Inc. SDK for reducing unnecessary polling of a network service
US20200065269A1 (en) * 2018-08-24 2020-02-27 Samsung Electronics Co., Ltd. NVMeoF Messages Between a Host and a Target

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱佳平, 李琼, 宋振龙, 董德尊, 欧洋, 徐炜遐: "NVMeoF网络存储协议及硬件卸载技术研究" (Research on the NVMeoF Network Storage Protocol and Hardware Offloading Technology) *

Also Published As

Publication number Publication date
CN111459417B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN111459417B (en) Non-lock transmission method and system for NVMeoF storage network
US20200314181A1 (en) Communication with accelerator via RDMA-based network adapter
CN108268208B (en) RDMA (remote direct memory Access) -based distributed memory file system
EP1581875B1 (en) Using direct memory access for performing database operations between two or more machines
US8539199B2 (en) Hash processing in a network communications processor architecture
US20110225168A1 (en) Hash processing in a network communications processor architecture
US11489945B2 (en) TCP packet processing method, toe component, and network device
US11025564B2 (en) RDMA transport with hardware integration and out of order placement
CN102404212A (en) Cross-platform RDMA (Remote Direct Memory Access) communication method based on InfiniBand
US11068412B2 (en) RDMA transport with hardware integration
CN106598752B (en) Remote zero-copy method
CN111966446B (en) RDMA virtualization method in container environment
EP1554644A2 (en) Method and system for tcp/ip using generic buffers for non-posting tcp applications
JP6492083B2 (en) System and method for managing and supporting a virtual host bus adapter (vHBA) over InfiniBand (IB), and system and method for supporting efficient use of buffers with a single external memory interface
CN113127139B (en) Memory allocation method and device based on DPDK of data plane development kit
CN116049085A (en) Data processing system and method
CN113572582B (en) Data transmission and retransmission control method and system, storage medium and electronic device
CN110519180A (en) Network card virtualization queue scheduling method and system
US9137167B2 (en) Host ethernet adapter frame forwarding
CN115509644B (en) Computing power unloading method and device, electronic equipment and storage medium
CN111541624B (en) Space Ethernet buffer processing method
CN111586040B (en) High-performance network data receiving method and system
CN107615259A (en) A kind of data processing method and system
TWI835345B (en) Packet forwarding apparatus with buffer recycling and associated packet forwarding method
US20230179545A1 (en) Packet forwarding apparatus with buffer recycling and associated packet forwarding method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant