CN109388592B - Employing multiple queuing structures within a user space storage driver to increase speed - Google Patents


Info

Publication number
CN109388592B
CN109388592B (application CN201710650398.7A)
Authority
CN
China
Prior art keywords
user space
storage
queue
driver
dispatch
Prior art date
Legal status
Active
Application number
CN201710650398.7A
Other languages
Chinese (zh)
Other versions
CN109388592A (en)
Inventor
吕烁
王文俊
Current Assignee
EMC Corp
Original Assignee
EMC IP Holding Co LLC
Priority date
Filing date
Publication date
Application filed by EMC IP Holding Co LLC
Priority to CN201710650398.7A
Priority to US16/050,591
Publication of CN109388592A
Application granted
Publication of CN109388592B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1642Handling requests for interconnection or transfer for access to memory bus based on arbitration with request queuing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1479Generic software techniques for error detection or fault masking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1652Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • G06F13/1663Access to shared memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0688Non-volatile semiconductor memory arrays
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/545Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system

Abstract

The present disclosure relates to employing multiple queuing structures within a user space storage driver to increase speed. Improved techniques improve performance in a multi-core data storage system, while allowing portability and fast failover in the event of a failure of the driver stack, by having the data storage system employ several queues to reduce lock contention. Queuing is performed at two levels, each level having several queues, within a user space scheduling driver running within a user space container. The user space scheduling driver dequeues to a user space management driver, which communicates with a kernel-based hardware driver by way of a kernel auxiliary driver. Apparatus, systems, and computer program products for performing similar methods are also provided.

Description

Employing multiple queuing structures within a user space storage driver to increase speed
Technical Field
Embodiments of the present disclosure relate generally to the field of data storage, and more particularly to employing multiple queuing structures within a user space storage driver to increase speed.
Background
A data storage system is an arrangement of hardware and software that typically includes one or more storage processors coupled to an array of non-volatile data storage devices, such as magnetic disk drives, electronic flash drives, and/or optical drives. The storage processors service host input/output (I/O) operations received from host machines. A received I/O operation specifies a storage object (e.g., a logical disk or "LUN") to be written, read, created, or deleted. The storage processors run software that manages incoming I/O operations and performs various data processing tasks to organize and protect host data received from the hosts and stored on the non-volatile data storage devices.
Some data storage systems employ a storage stack to process and convert I/O operations from one format to another to increase speed and versatility. Once an I/O operation is converted into a set of underlying I/O operations for a physical section of storage on a storage drive, these underlying I/O operations may be queued and executed according to various policies to ensure fairness and improve efficiency.
Disclosure of Invention
Unfortunately, conventional data storage systems utilizing several parallel processing cores may experience performance limitations when a large number of underlying I/O operations are directed to physical drives in a short period of time. This is mainly due to lock contention on queues between several processing cores. This contention may become more significant when using modern flash-based drives capable of handling several concurrent I/O operations, as those devices are capable of handling hundreds of thousands of I/O operations (or more) per second, which can easily overwhelm a single queue with lock contention issues.
Therefore, it would be desirable to reduce performance degradation due to locking. This result may be accomplished by the data storage system employing several queues to reduce lock contention. It would also be desirable to perform this queuing within a user space driver running within a user space container, to allow portability and fast failover to a new user space container in the event of a failure of the driver stack. This may be done by performing the queuing at two levels, each level having several queues, within a user space scheduling driver running within the user space container. The user space scheduling driver may dequeue to a user space management driver, which communicates with a kernel-based hardware driver by way of a kernel auxiliary driver.
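For concreteness, a minimal C sketch of such a two-level structure follows. All of the names below (io_cmd, cmd_queue, device_queues, push_newest, pop_oldest, submit_from_core) are hypothetical illustrations and are not taken from the patent or any published source code; the point is simply that each core enqueues onto its own first-level queue, so two cores rarely contend on the same lock. The later sketches in the detailed description build on these same hypothetical types.

    #include <pthread.h>

    /* Hypothetical representation of one underlying storage command. */
    struct io_cmd {
        unsigned long  lba;       /* starting logical block address */
        unsigned long  nblocks;   /* request length in blocks       */
        int            write;     /* nonzero for a write            */
        struct io_cmd *next;
    };

    /* A FIFO protected by its own lock.  (The patent text enqueues at the
     * "head" and dequeues from the "tail"; this sketch uses oldest/newest
     * naming, which is the same FIFO behavior.) */
    struct cmd_queue {
        pthread_mutex_t lock;
        struct io_cmd  *oldest, *newest;
    };

    void push_newest(struct cmd_queue *q, struct io_cmd *cmd)
    {
        pthread_mutex_lock(&q->lock);
        cmd->next = NULL;
        if (q->newest) q->newest->next = cmd; else q->oldest = cmd;
        q->newest = cmd;
        pthread_mutex_unlock(&q->lock);
    }

    struct io_cmd *pop_oldest(struct cmd_queue *q)
    {
        pthread_mutex_lock(&q->lock);
        struct io_cmd *cmd = q->oldest;
        if (cmd) {
            q->oldest = cmd->next;
            if (!q->oldest) q->newest = NULL;
        }
        pthread_mutex_unlock(&q->lock);
        return cmd;
    }

    /* Per-device pair of queue levels: one first-level queue per core and a
     * (typically smaller) set of second-level dispatch queues. */
    struct device_queues {
        int               ncores;     /* first-level queues, one per core */
        int               ndispatch;  /* second-level dispatch queues     */
        struct cmd_queue *per_core;   /* array of length ncores           */
        struct cmd_queue *dispatch;   /* array of length ndispatch        */
    };

    /* Level one: a core submits only to its own queue, so the submission
     * path sees essentially no cross-core lock contention. */
    void submit_from_core(struct device_queues *dq, int core,
                          struct io_cmd *cmd)
    {
        push_newest(&dq->per_core[core], cmd);
    }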
In one embodiment, a method of processing storage requests for a storage device of a computing device having multiple processing cores (hereinafter "cores") is performed. The method includes: (a) sending, by a first storage driver operating within a user space of the computing device, a storage request initiated by a first core of the computing device to a first user space queue, the first user space queue being dedicated to storage requests from the first core; (b) sending, by the first storage driver operating within user space, a storage request initiated by a second core of the computing device to a second user space queue, the second user space queue being dedicated to storage requests from the second core, the second core being different from the first core, and the second user space queue being different from the first user space queue; (c) sending, by the first storage driver operating within user space, storage requests from the first user space queue and the second user space queue to a set of user space dispatch queues, the first user space queue and the second user space queue not belonging to the set of user space dispatch queues; (d) sending, by the first storage driver operating within user space, storage requests from the set of user space dispatch queues to a second storage driver operating within the user space of the computing device, the second storage driver being different from the first storage driver; and (e) sending, by the second storage driver operating within user space and by way of a kernel-assisted function, the storage requests received from the first storage driver to a hardware device driver of the storage device for execution by the storage device, the hardware device driver of the storage device operating within a kernel of the computing device. Apparatus, systems, and computer program products for performing similar methods are also provided.
The foregoing summary is presented for illustrative purposes to aid the reader in readily understanding the exemplary features presented herein. However, the foregoing summary is not intended to illustrate the claimed elements or to limit embodiments thereof in any way.
Drawings
The foregoing and other features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same or similar parts throughout the different views.
FIG. 1 is a block diagram depicting example systems and apparatus for use in conjunction with various embodiments.
Fig. 2 is a flow diagram depicting an example method of various embodiments.
Detailed Description
Embodiments are directed to techniques for improving performance in a multi-core data storage system while allowing portability and fast failover in the event of a failure of the driver stack. This may be accomplished by the data storage system employing several queues to reduce lock contention. Queuing is performed at two levels, each level having several queues, within a user space scheduling driver running within a user space container. The user space scheduling driver dequeues to a user space management driver, which communicates with a kernel-based hardware driver by way of a kernel auxiliary driver.
Fig. 1 depicts an example environment 30 that includes a computing device 32 acting as a data storage system (DSS). The DSS computing device 32 may be any kind of computing device, such as, for example, a personal computer, a workstation, a server computer, an enterprise server, a DSS rack server, a laptop computer, a tablet computer, a smart phone, a mobile computer, and so on. Typically, the DSS computing device 32 is a DSS rack server.
The DSS computing device 32 includes network interface circuitry 34, processing circuitry 36, memory 40, storage interface circuitry 42, and persistent data storage drives 44 (depicted as storage drive 44A, optional storage drive 44B, etc.). The DSS computing device 32 may also include other components, such as interconnect circuitry, as is known in the art.
The network interface circuitry 34 may include one or more Ethernet cards, cellular modems, Fibre Channel (FC) adapters, wireless fidelity (Wi-Fi) network adapters, and/or other devices for connecting to a network (not depicted). The network interface circuitry 34 allows the DSS computing device 32 to communicate with one or more host devices (not depicted) capable of sending data storage commands to the DSS computing device 32 for fulfillment.
The processing circuitry 36 may be any kind of processor or collection of processors configured to perform operations, such as, for example, a microprocessor, a multi-core microprocessor, a digital signal processor, a system on a chip, a collection of electronic circuits, a similar kind of controller, or any combination of the above. The processing circuitry 36 includes a plurality of processing cores 38 (depicted as cores 38(1), 38(2), 38(3), etc.). Each core 38 may be a distinct physical core, or it may be a virtual core (e.g., due to hyper-threading). Thus, for example, if the DSS computing device 32 includes two microprocessors, each having four physical cores with hyper-threading enabled, the DSS computing device 32 has a total of sixteen cores 38. In some embodiments, the DSS computing device 32 may be built as a set of two or more storage processors (SPs, not depicted), each mounted on a separate board, each SP having its own network interface circuitry 34, processing circuitry 36, memory 40, and storage interface circuitry 42, but sharing the storage drives 44 between them. In such embodiments, a high-speed inter-SP bus may connect the SPs. More than one SP may be installed in the DSS computing device 32 for redundancy and performance reasons. In these embodiments, each SP may be considered independently for purposes of this disclosure.
The persistent storage drives 44 may include any kind of persistent storage devices, such as, for example, hard disk drives, solid-state storage devices (SSDs), flash drives, etc. In a typical embodiment, one or more of the storage drives 44 (e.g., storage drive 44A) is an SSD or flash drive having multiple channels 46 (depicted as channels 46A-i, 46A-ii, etc.) that allow more than one storage operation to be performed by the storage drive 44A simultaneously.
The storage interface circuitry 42 controls and provides access to persistent storage drives 44. The storage interface circuit 42 may include, for example, SCSI, SAS, ATA, SATA, FC, M.2, and/or other similar controllers and ports.
The memory 40 may be any kind of digital system memory, such as, for example, random access memory (RAM). The memory 40 stores an operating system (OS) kernel 50 in operation (e.g., a Linux, UNIX, Windows, macOS, or similar OS kernel). The memory 40 also includes a user space portion 48 within which non-kernel OS applications (not depicted), user applications (not depicted), and data (not depicted) may be stored.
As is known in the art, the kernel 50, and applications that run within the kernel 50, have direct access to the hardware of the DSS computing device 32. Applications running within user space 48 may access the hardware only by means of system calls to the kernel 50. Although functions may execute faster if implemented within the kernel 50 (e.g., as hardware drivers, such as the storage hardware driver 66 and the auxiliary driver 64), the more complex the kernel 50 becomes, the more likely it is to crash, which may require a full reboot of the DSS computing device 32 (or SP) and cause a significant amount of downtime. It is therefore desirable to implement complex functions (such as a complex storage driver stack) within user space 48 to avoid crashing the kernel 50. It is also desirable to implement the storage driver stack entirely within its own user space container 52, so that, in the event of a crash, only the user space container 52 needs to be restarted while other applications running within user space 48 continue to operate, which further reduces downtime. The use of the user space container 52 also allows easy portability between SPs. Implementation within user space 48 also allows the kernel 50 to be upgraded easily, without the driver stack having to be recompiled and re-tested against each new OS version prior to upgrading.
As depicted, user space drivers 54, 56, 58, 60, and 62 implement a storage stack for storage operations on the DSS computing device 32. The upper file system driver 54 is configured to receive file-based storage requests 74 from hosts, each file-based storage request 74 being directed to a file system (not depicted) that is ultimately backed by storage from one or more of the storage drives 44 of the DSS computing device 32. The upper file system driver 54 translates those requests 74 into block-based storage requests 76 directed to specific blocks of storage of the volume or logical disk (not depicted) on which the file system resides. The upper file system driver 54 sends those block-based storage requests 76 down to the mapping driver 56.
In some embodiments, the mapping driver 56 implements the logical volume using a container file (not depicted) of an underlying file system (not depicted). Thus, in these embodiments, the mapping driver 56 translates a block-based request 76 into a file-based request 78 aimed at the container file. The mapping driver 56 also implements the underlying file system on a second virtual volume (not depicted) made up of segments of storage (not depicted) drawn from one or more of the storage drives 44. In some embodiments, the mapping driver 56 also provides address translation (e.g., due to deduplication), RAID, and other services. Thus, the mapping driver 56 translates the file-based request 78 back into a block-based request 80 directed to a particular storage drive 44, which it sends down the storage stack to the multi-core cache 58.
The multi-core cache 58 is a layer of the driver stack that uses a dedicated portion (not depicted) of the memory 40 (which may include some persistent or battery-backed memory, not depicted) to store data related to block-based storage requests for fast performance, optimized for execution by several cores 38 operating in parallel. Typically, once a block-based storage request 80 has been placed within the multi-core cache 58 (and either placed in persistent memory or mirrored to a backup copy on another SP), it may be acknowledged up the stack, allowing the host to continue as if the original storage request 74 had been fully executed, even though the data has not yet been flushed to its final backing store on a storage drive 44. The cores 38 then operate to flush the cached data to the storage drives 44.
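As a minimal sketch of this acknowledge-early, flush-later behavior (reusing the hypothetical io_cmd, device_queues, and submit_from_core names from the sketch in the Disclosure section; cache_entry and acknowledge_up_the_stack are likewise illustrative stand-ins, not the actual multi-core cache 58):

    #include <stddef.h>

    struct io_cmd;                       /* from the earlier sketch          */
    struct device_queues;
    void submit_from_core(struct device_queues *dq, int core,
                          struct io_cmd *cmd);
    void acknowledge_up_the_stack(struct io_cmd *req);  /* hypothetical hook */

    /* One dirty-tracked cache entry; the real multi-core cache 58 also
     * handles persistence, peer-SP mirroring, invalidation, and so on. */
    struct cache_entry {
        struct io_cmd *cmd;     /* the underlying block-based request 80 */
        int            dirty;
    };

    /* I/O path: once the data is safely cached (and persisted or mirrored),
     * acknowledge up the stack so the host can continue immediately. */
    void cache_write_and_ack(struct cache_entry *e, struct io_cmd *req)
    {
        e->cmd   = req;
        e->dirty = 1;
        /* ...persist locally or mirror to the peer SP here (not shown)... */
        acknowledge_up_the_stack(req);
    }

    /* Flush path, run later on core `core`: drain dirty entries into
     * underlying storage commands 82 aimed at that core's dedicated queue. */
    void flush_from_core(struct device_queues *dq, int core,
                         struct cache_entry *entries, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (!entries[i].dirty)
                continue;
            submit_from_core(dq, core, entries[i].cmd);
            entries[i].dirty = 0;
        }
    }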
Each core 38 is configured to perform this flushing by sending one or more underlying data storage commands 82, each directed to a particular address range of a particular storage drive 44, to the next layer of the storage stack, the scheduling driver 60. For any given storage drive 44A, the scheduling driver 60 (which may also be referred to as a dispatch driver or physical package driver) schedules the execution of the commands 82 for that storage drive 44A, which is important when there are many simultaneous commands 82 from several different cores 38 aimed at the same storage drive 44A.
Each core 38 sends its respective storage commands 82 (depicted as storage commands 82(1), 82(2), 82(3) corresponding to cores 38(1), 38(2), 38(3), respectively) to a particular per-core queue 68 (depicted as per-core user space queues 68A(1), 68A(2), 68A(3), etc.). Each per-core queue 68 is dedicated to storage commands 82 from a particular core 38(x) for a particular storage drive 44Y. For each particular storage drive 44Y, there may be as many per-core queues 68 as there are cores 38 within the processing circuitry 36. In some embodiments, some cores 38 of the processing circuitry 36 may not be configured to send storage commands 82 to flush the multi-core cache 58 (e.g., some cores 38 may be dedicated to other tasks). In these embodiments, the number of per-core queues 68 for each particular storage drive 44Y may instead be limited to the number of cores 38 available to send storage commands 82. Thus, for example, if the processing circuitry 36 includes four cores 38 but only three of those cores 38(1), 38(2), 38(3) are available to send storage commands 82, there will be three per-core user space queues 68A(1), 68A(2), 68A(3) for the storage drive 44A.
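Continuing the hypothetical device_queues structure from the earlier sketch, one plausible initialization (an assumption for illustration, not a policy required by the patent) allocates one first-level queue per core that is allowed to flush, and one dispatch queue per storage operation the drive can handle concurrently:

    #include <pthread.h>
    #include <stdlib.h>

    int init_device_queues(struct device_queues *dq,
                           int flushing_cores, int drive_concurrency)
    {
        dq->ncores    = flushing_cores;     /* e.g. 3 of 4 cores may flush  */
        dq->ndispatch = drive_concurrency;  /* e.g. 2 channels on drive 44A */
        dq->per_core  = calloc((size_t)flushing_cores, sizeof *dq->per_core);
        dq->dispatch  = calloc((size_t)drive_concurrency,
                               sizeof *dq->dispatch);
        if (!dq->per_core || !dq->dispatch) {
            free(dq->per_core);
            free(dq->dispatch);
            return -1;
        }
        for (int i = 0; i < flushing_cores; i++)
            pthread_mutex_init(&dq->per_core[i].lock, NULL);
        for (int i = 0; i < drive_concurrency; i++)
            pthread_mutex_init(&dq->dispatch[i].lock, NULL);
        return 0;
    }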
Upon receipt of each storage command 82, the scheduling driver 60 places it into the appropriate per-core user space queue 68 (at the head of that queue). In some embodiments, the scheduling driver 60 may perform various reordering and/or merging operations on the storage commands 82 within each per-core user space queue 68. In some embodiments, the scheduling driver 60 may perform load balancing operations by transferring storage commands 82 between the per-core user space queues 68.
The scheduling driver 60 dequeues storage commands 82 from the tail of each per-core user space queue 68A for a particular storage drive 44A to the appropriate user space dispatch queue 70 of a set of such dispatch queues 70A (depicted as dispatch queues 70A-i, 70A-ii, etc.) associated with that particular storage drive 44A. If there are as many dispatch queues 70A for a particular storage drive 44A as there are per-core user space queues 68A, then each per-core user space queue 68A is dequeued directly into the head of a dedicated dispatch queue 70A. Thus, for example, if there were only two per-core user space queues 68A(1), 68A(2), the per-core user space queue 68A(1) would be dequeued from its tail to the head of dispatch queue 70A-i, and the per-core user space queue 68A(2) would be dequeued from its tail to the head of dispatch queue 70A-ii. As depicted, however, since there are three per-core user space queues 68A(1), 68A(2), 68A(3) but only two dispatch queues 70A-i, 70A-ii, the per-core user space queue 68A(1) is dequeued from its tail by sending storage commands 84 to the head of dispatch queue 70A-i, while the per-core user space queues 68A(2), 68A(3) are alternately dequeued from their respective tails by sending storage commands 86a, 86b, respectively, to the head of dispatch queue 70A-ii.
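One way to realize the three-onto-two arrangement just described (68A(1) feeding 70A-i, with 68A(2) and 68A(3) alternating into 70A-ii) is the static assignment sketched below; the patent does not prescribe any particular formula, so this is only an illustrative policy built on the hypothetical types from the earlier sketch.

    /* Map per-core queue index c onto a dispatch queue index.  With three
     * per-core queues and two dispatch queues this yields 0->0, 1->1, 2->1,
     * matching the arrangement described for FIG. 1. */
    int dispatch_index_for(const struct device_queues *dq, int c)
    {
        return (c < dq->ndispatch) ? c : dq->ndispatch - 1;
    }

    /* First-level drain pass: pop one command from each per-core queue and
     * push it onto its assigned dispatch queue.  Because the loop visits
     * 68A(2) and 68A(3) in turn, their commands (86a, 86b) naturally
     * alternate into 70A-ii. */
    void drain_per_core_to_dispatch(struct device_queues *dq)
    {
        for (int c = 0; c < dq->ncores; c++) {
            struct io_cmd *cmd = pop_oldest(&dq->per_core[c]);
            if (cmd)
                push_newest(&dq->dispatch[dispatch_index_for(dq, c)], cmd);
        }
    }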
The scheduling driver 60 generally refrains from performing reordering, merging, and load balancing operations on the dispatch queues 70.
The scheduling driver 60 dequeues storage commands 88 for the respective channels 46A of a particular storage drive 44A from the tail of each dispatch queue 70A for that storage drive 44A. Thus, for example, as depicted, the scheduling driver 60 dequeues storage commands 88-i for channel 46A-i of storage drive 44A from the tail of dispatch queue 70A-i and dequeues storage commands 88-ii for channel 46A-ii of storage drive 44A from the tail of dispatch queue 70A-ii. It should be appreciated that, in some embodiments, instead of dequeuing toward a particular channel 46, the scheduling driver 60 may simply dequeue from the dispatch queues 70A for a particular storage drive 44A in a round-robin fashion, relying on a built-in queue (not depicted) of that storage drive 44A to perform operations in parallel across its various channels 46A.
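A corresponding second-level drain might look like the sketch below, in which each dispatch queue feeds the channel with the same index (70A-i toward 46A-i, 70A-ii toward 46A-ii); issue_cmd is a hypothetical stand-in for handing the command onward to the management driver 62. Because exactly one command is taken from each dispatch queue per pass, the same loop also behaves as a simple round-robin when the drive's internal queuing spreads the work across channels itself.

    /* Second-level drain: one command per dispatch queue per pass; the
     * channel index simply mirrors the dispatch-queue index in this sketch. */
    void drain_dispatch_queues(struct device_queues *dq,
                               void (*issue_cmd)(int channel,
                                                 struct io_cmd *cmd))
    {
        for (int d = 0; d < dq->ndispatch; d++) {
            struct io_cmd *cmd = pop_oldest(&dq->dispatch[d]);
            if (cmd)
                issue_cmd(d, cmd);  /* e.g. command 88-i toward channel 46A-i */
        }
    }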
It should be appreciated that the scheduling driver 60 does not dequeue the storage commands 88 directly to the channels 46A (or directly to the storage drive 44A), because user space drivers (such as the scheduling driver 60) cannot communicate directly with the hardware. In addition, there are further management tasks performed by the intermediary management driver 62, such as link initialization, link services, PHY management, and I/O support. Because the management driver 62 also runs within user space 48, it communicates with a storage hardware driver 66 running in the kernel 50 by way of a kernel auxiliary driver 64, which forwards certain communications between the management driver 62 within user space 48 and the storage hardware driver 66A within the kernel 50 that would otherwise be prohibited (or impossible) across the barrier between user space 48 and the kernel 50. See, e.g., U.S. Patent No. 9,612,756, issued April 4, 2017, the entire contents and teachings of which are incorporated herein by reference. Thus, the management driver 62 forwards the storage commands 88-i, 88-ii as corresponding storage commands 90-i, 90-ii to the storage hardware driver 66A for the storage drive 44A. Finally, the storage hardware driver 66A forwards the storage commands 90-i, 90-ii as respective storage commands 92-i, 92-ii to the storage drive 44A or to its respective channels 46A-i, 46A-ii.
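The patent relies on the kernel auxiliary driver 64 (described in the incorporated U.S. Patent No. 9,612,756) and does not reproduce the user/kernel interface here. Purely as an illustration of how a user space management driver can hand a command across the user/kernel barrier, the sketch below uses a character-device ioctl; the device path, ioctl number, and payload layout are entirely hypothetical and are not taken from the patent.

    #include <fcntl.h>
    #include <sys/ioctl.h>

    /* Entirely hypothetical payload handed across the user/kernel boundary. */
    struct aux_submit {
        int           device;      /* which storage drive, e.g. 44A */
        int           channel;     /* which channel, e.g. 46A-i     */
        unsigned long lba;
        unsigned long nblocks;
        int           write;
    };

    #define AUX_DRIVER_DEV  "/dev/aux_driver"               /* hypothetical */
    #define AUX_IOC_SUBMIT  _IOW('a', 1, struct aux_submit) /* hypothetical */

    /* Management-driver side: open the auxiliary device once... */
    int open_aux_driver(void)
    {
        return open(AUX_DRIVER_DEV, O_RDWR);
    }

    /* ...then forward each storage command 88/90 to the kernel.  The ioctl
     * crosses the user/kernel barrier; the storage hardware driver 66A then
     * issues the command to the drive as command 92. */
    int forward_to_kernel(int aux_fd, const struct aux_submit *cmd)
    {
        return ioctl(aux_fd, AUX_IOC_SUBMIT, cmd);
    }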
In some embodiments, the memory 40 may also include a persistent storage portion (not depicted). The persistent storage portion of the memory 40 may be made up of one or more persistent storage devices, such as, for example, disks. The persistent storage portion of the memory 40, or the persistent storage drives 44, are configured to store programs and data even when the DSS computing device 32 is powered off. The OS, applications, and drivers 54, 56, 58, 60, 62, 64, 66 are typically stored in the persistent storage portion of the memory 40 or on the persistent storage drives 44 so that they may be loaded into the system portion of the memory 40 upon a system reboot. These applications and drivers 54, 56, 58, 60, 62, 64, 66, when stored in non-transitory form either in the volatile or persistent portion of the memory 40 or on the persistent storage drives 44, form a computer program product. The processing circuitry 36 running one or more of these applications and drivers 54, 56, 58, 60, 62, 64, 66 thus forms a specialized circuit constructed and arranged to carry out the various processes described herein.
FIG. 2 illustrates an example method 100 performed by the various drivers 54, 56, 58, 60, 62, 64, 66 of the storage stack and/or the kernel 50. It should be understood that, whenever a piece of software (e.g., one of the drivers 54, 56, 58, 60, 62, 64, 66, the kernel 50, etc.) is described as performing a method, process, step, or function, what is meant is that the computing device (e.g., the DSS computing device 32) on which that piece of software is running performs the method, process, step, or function when executing the software on its processing circuitry 36. It should also be understood that one or more of the steps or sub-steps of method 100 may be omitted in some embodiments. Similarly, in some embodiments, one or more steps or sub-steps may be combined together or performed in a different order. The method 100 is performed by the DSS computing device 32.
Step 110 and step 120 (and in embodiments where step 130 is performed, step 130) may be performed in parallel. Parallel execution means that the order of execution of these steps 110, 120 (and 130) is not important; they may be performed simultaneously in an overlapping manner, or either may be performed before or after the other steps.
In step 110, the scheduling driver 60, operating within user space 48 (and, in some embodiments, more particularly within the dedicated user space container 52), sends a storage request (e.g., a storage command 82(1)) initiated by the first core 38(1) and directed toward a particular storage drive 44A to the first user space per-core queue 68A(1), the first user space per-core queue 68A(1) being dedicated to storage requests 82(1) from the first core 38(1) for that storage drive 44A.
In step 120, the scheduling driver 60, operating within user space 48 (and, in some embodiments, more particularly within the dedicated user space container 52), sends a storage request (e.g., a storage command 82(2)) initiated by the second core 38(2) and directed toward the particular storage drive 44A to the second user space per-core queue 68A(2), the second user space per-core queue 68A(2) being dedicated to storage requests 82(2) from the second core 38(2) for that storage drive 44A.
In optional step 130 (which may be omitted in a system having only two cores 38(1), 38(2), or having only two cores 38(1), 38(2) that are allowed to process flushes from the multi-core cache 58), the scheduling driver 60, operating within user space 48 (and, in some embodiments, more particularly within the dedicated user space container 52), sends a storage request (e.g., a storage command 82(3)) initiated by the third core 38(3) and directed toward the particular storage drive 44A to the third user space per-core queue 68A(3), the third user space per-core queue 68A(3) being dedicated to storage requests 82(3) from the third core 38(3) for that storage drive 44A.
In some embodiments, the scheduling driver 60 may perform step 140, in which the contents of the various per-core queues 68 may be modified for efficiency reasons. Typically, step 140 is omitted for any per-core queues 68A associated with storage drives 44A that are SSDs or flash-based (or that otherwise have minimal latency for random seeks). Step 140 may include one or more of sub-steps 142, 144, 146.
In sub-step 142, the scheduling driver 60 reorders the storage commands 82(x) stored in a per-core queue 68A(x). For example, if there are two different storage commands 82(x)-I and 82(x)-II directed to sections of the storage drive 44A that are physically close to each other, the scheduling driver 60 may reorder the queue 68A(x) so that those two storage commands 82(x)-I, 82(x)-II are executed consecutively, without an intervening storage command 82(x)-III directed to a distant section.
In sub-step 144, the scheduling driver 60 merges storage commands 82(x) stored in a per-core queue 68A(x). For example, if there are two different storage commands 82(x)-I and 82(x)-II that are both write commands directed to contiguous sections of the storage drive 44A, the scheduling driver 60 may merge these two storage commands 82(x)-I, 82(x)-II into a single storage command 82(x)-IV that writes to the merged, larger section.
In sub-step 146, the scheduling driver 60 load balances between the per-core queues 68A. For example, if per-core queue 68A(1) has 1000 pending storage requests 82(1) in it and per-core queue 68A(2) has only 17 pending storage requests 82(2) in it, the scheduling driver 60 may transfer some of the pending storage requests 82(1) from per-core queue 68A(1) to per-core queue 68A(2).
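The three optional efficiency passes of step 140 can be pictured with the short sketches below, again built on the hypothetical io_cmd and cmd_queue types from the earlier sketch. For brevity the reordering works on an array snapshot of a queue's pending commands, and the load balancer is handed the two queue depths explicitly; none of this is taken verbatim from the patent.

    #include <stdlib.h>

    /* Sub-step 142: reorder by starting LBA so that requests aimed at
     * physically adjacent sections run back to back (an elevator-style pass). */
    static int cmp_lba(const void *a, const void *b)
    {
        const struct io_cmd *x = *(const struct io_cmd *const *)a;
        const struct io_cmd *y = *(const struct io_cmd *const *)b;
        return (x->lba > y->lba) - (x->lba < y->lba);
    }

    void reorder_by_lba(struct io_cmd **cmds, size_t n)
    {
        qsort(cmds, n, sizeof *cmds, cmp_lba);
    }

    /* Sub-step 144: merge two writes covering contiguous sections into one
     * larger write; returns nonzero if merged (command b is then redundant). */
    int try_merge(struct io_cmd *a, struct io_cmd *b)
    {
        if (a->write && b->write && a->lba + a->nblocks == b->lba) {
            a->nblocks += b->nblocks;
            return 1;
        }
        return 0;
    }

    /* Sub-step 146: crude load balancing, moving commands from a long queue
     * to a short one until their depths are roughly equal. */
    void balance(struct cmd_queue *busy, struct cmd_queue *idle,
                 size_t busy_depth, size_t idle_depth)
    {
        while (busy_depth > idle_depth + 1) {
            struct io_cmd *cmd = pop_oldest(busy);
            if (!cmd)
                break;
            push_newest(idle, cmd);
            busy_depth--;
            idle_depth++;
        }
    }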
In step 150, the scheduling driver 60, operating within user space 48 (and, in some embodiments, more particularly within the dedicated user space container 52), sends storage requests (e.g., storage commands 84, 86a, 86b) from the user space per-core queues 68A to the set of user space dispatch queues 70A for the storage drive 44A.
In one embodiment, if there is only a single user space dispatch queue 70A-i for the storage drive 44A, the scheduling driver 60 dequeues the storage commands 84, 86a, 86b from all of the user space per-core queues 68A to that single user space dispatch queue 70A-i for the storage drive 44A.
Alternatively, if there are at least two user space dispatch queues 70A-i, 70A-ii, then sub-steps 156 and 157 (and possibly 158) are performed.
In sub-step 156, the scheduling driver 60 dequeues storage commands 84 from at least one user space per-core queue 68A(1) to a user space dispatch queue 70A-i for the storage drive 44A, while in sub-step 157, the scheduling driver 60 dequeues storage commands 86a from a different user space per-core queue 68A(2) to a different user space dispatch queue 70A-ii for the storage drive 44A. If there are more user space per-core queues 68A than user space dispatch queues 70A for the storage drive 44A, then, in sub-step 158, the scheduling driver 60 dequeues storage commands 86b from the third user space per-core queue 68A(3) to the same user space dispatch queue 70A-ii as in sub-step 157.
In step 160, the scheduling driver 60, operating within user space 48 (and, in some embodiments, more particularly within the dedicated user space container 52), sends storage requests (e.g., storage commands 88) from the set of user space dispatch queues 70A to another driver operating within user space 48 (i.e., the management driver 62). In some embodiments, the storage commands 88 are sent from a particular dispatch queue 70A to a corresponding dispatch queue (not depicted) within the management driver 62. In other embodiments, in optional sub-step 165, the scheduling driver 60 dispatches the storage commands 88 from the respective user space dispatch queues 70A to the management driver 62 (which has only a single queue therein, not depicted) in an alternating manner according to a fairness policy.
In step 170, the other driver operating within user space 48 (i.e., the management driver 62) sends the storage requests 88 received from the scheduling driver 60, as storage requests 90, to the storage hardware driver 66A within the kernel 50 by way of the kernel auxiliary driver 64. The storage hardware driver 66A can then pass these storage requests 90 on to the storage drive 44A as storage requests 92 for execution by the storage drive 44A. In some embodiments, the storage hardware driver 66A passes certain storage requests 92-i to the first channel 46A-i and other storage requests 92-ii to the second channel 46A-ii. In other embodiments, a local queue (not depicted) within the storage drive 44A distributes the storage requests 92 among the channels 46A for execution as they become available.
Accordingly, techniques have been presented for improving performance in a multi-core data storage system 32 while allowing portability and fast failover in the event of a failure of the driver stack. This may be accomplished by the data storage system 32 employing several queues 68, 70 to reduce lock contention. Queuing is performed at two levels, using the per-core queues 68 and the dispatch queues 70, within the user space scheduling driver 60 running within the user space container 52. The user space scheduling driver 60 dequeues to a user space management driver 62, which communicates with a kernel-based storage hardware driver 66 by way of a kernel auxiliary driver 64.
As used throughout this document, the words "comprising," "including," "containing," and "having" are intended to describe certain items, steps, elements, or aspects of something in an open-ended fashion. Also, as used herein, and unless a specific statement is made to the contrary, the word "set" means one or more of something. This is true regardless of whether the phrase "set of" is followed by a singular or plural object and regardless of whether it is conjugated with a singular or plural verb. Moreover, although ordinal expressions (such as "first," "second," "third," and so on) may be used herein as adjectives, such ordinal expressions are used for identification purposes and, unless specifically indicated, are not intended to imply any ordering or sequence. Thus, for example, a "second" event may take place before or after a "first event," or even if the first event never occurs. In addition, the recitation herein of a particular element, feature, or act as being a "first" such element, feature, or act should not be construed as requiring that a "second" or other such element, feature, or act also be present. Rather, the "first" item may be the only one. Although certain embodiments are disclosed herein, it should be understood that these are provided by way of example only and that the invention is not limited to these particular embodiments.
While various embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the appended claims.
For example, while various embodiments have been described as methods, software embodying these methods is also included. Accordingly, one embodiment includes a tangible, non-transitory computer-readable storage medium (such as, for example, a hard disk, a floppy disk, an optical disk, a flash memory, etc.) programmed with instructions that, when executed by a computer or a set of computers, cause one or more of the methods described in the various embodiments to be performed. Another embodiment includes a computer programmed to perform one or more of the methods described in the various embodiments.
Moreover, it is to be understood that all embodiments that have been described may be combined with each other in all possible combinations, except insofar as such combinations have been explicitly excluded.
Finally, even if a technique, method, apparatus, or other concept is specifically labeled as "conventional," Applicant makes no admission that such technique, method, apparatus, or other concept is actually prior art under 35 U.S.C. § 102 or 35 U.S.C. § 103, such determination being a legal determination that depends upon many factors, not all of which are known to Applicant at this time.

Claims (17)

1. A method of processing a storage request for a storage device of a computing device having a plurality of processing cores (hereinafter "cores"), the method comprising:
sending, by a first storage driver operating within a user space of the computing device, a storage request initiated by a first core of the computing device to a first user space queue, the first user space queue dedicated to storage requests from the first core;
sending, by the first storage driver operating within user space, a storage request initiated by a second core of the computing device to a second user space queue, the second user space queue dedicated to storage requests from the second core, the second core being different from the first core, and the second user space queue being different from the first user space queue;
sending, by the first storage driver operating in user space, storage requests from the first user space queue and the second user space queue to a set of user space dispatch queues to which the first user space queue and the second user space queue do not belong;
sending, by the first storage driver operating within user space, a storage request from the set of user space dispatch queues to a second storage driver operating within the user space of the computing device, the second storage driver being different from the first storage driver; and
sending, by the second storage driver operating within user space and by way of a kernel-assisted function, the storage request received from the first storage driver to a hardware device driver of the storage device for execution by the storage device, the hardware device driver of the storage device operating within a kernel of the computing device.
2. The method of claim 1,
wherein the storage device is configured to concurrently process a plurality of storage requests;
wherein the set of user space dispatch queues comprises a plurality of user space dispatch queues; and is
Wherein sending storage requests from the first user space queue and the second user space queue to the set of user space dispatch queues comprises:
sending a storage request from the first user space queue to a first user space dispatch queue of the set of user space dispatch queues; and
sending a storage request from the second user space queue to a second user space dispatch queue of the set of user space dispatch queues, the second user space dispatch queue being different from the first user space dispatch queue.
3. The method of claim 2, wherein the method further comprises:
sending, by the first storage driver operating within user space, a storage request initiated by a third core of the computing device to a third user space queue, the third user space queue dedicated to storage requests from the third core, the third core being different from the first core and the second core, the third user space queue being different from the first user space queue and the second user space queue, and the third user space queue not belonging to the set of user space dispatch queues; and
sending, by the first storage driver operating within user space, a storage request from the third user space queue to the second user space dispatch queue for execution by the storage device, wherein sending the storage request from the third user space queue to the second user space dispatch queue is interleaved with sending the storage request from the second user space queue to the second user space dispatch queue.
4. The method of claim 2, wherein a number of user space dispatch queues in the plurality of user space dispatch queues is equal to a number of storage requests that the storage device is configured to process concurrently.
5. The method of claim 2, wherein sending the storage requests from the set of user space dispatch queues to the second storage driver operating within user space comprises dispatching storage requests from each of the plurality of user space dispatch queues to the second storage driver operating within user space in an alternating manner according to a fairness policy.
6. The method of claim 1, wherein sending the storage requests from the set of user space dispatch queues to the second storage driver operating within user space comprises dispatching storage requests from each user space dispatch queue of the plurality of user space dispatch queues without reordering or merging the storage requests.
7. The method of claim 1, wherein the first storage driver operating within user space is configured to reorder storage requests within the first user space queue and the second user space queue for efficiency.
8. The method of claim 7, wherein the first storage driver operating within user space is configured to merge contiguous storage requests within the first user space queue and the second user space queue for efficiency.
9. The method of claim 1, wherein the method further comprises load balancing storage requests between the first user space queue and the second user space queue for efficiency.
10. The method of claim 1,
wherein the storage device is one of a plurality of storage devices of the computing device; and
wherein the second storage driver operating within user space is configured to manage the plurality of storage devices.
11. The method of claim 1, wherein the kernel assist function is configured to pass hardware interrupts between the hardware device driver of the storage device operating within the kernel and the second storage driver running within user space.
12. A computing apparatus for processing storage requests for a storage device of the computing apparatus, the apparatus comprising processing circuitry having a plurality of processing cores (hereinafter "cores") coupled to a memory, configured to:
sending, by a first storage driver operating within a user space of the computing device, a storage request initiated by a first core of the computing device to a first user space queue, the first user space queue dedicated to storage requests from the first core;
sending, by the first storage driver operating within user space, a storage request initiated by a second core of the computing device to a second user space queue, the second user space queue dedicated to storage requests from the second core, the second core being different from the first core, and the second user space queue being different from the first user space queue;
sending, by the first storage driver operating in user space, storage requests from the first user space queue and the second user space queue to a set of user space dispatch queues to which the first user space queue and the second user space queue do not belong;
sending, by the first storage driver operating within user space, a storage request from the set of user space dispatch queues to a second storage driver operating within the user space of the computing device, the second storage driver being different from the first storage driver; and
sending, by the second storage driver operating within user space and by way of a kernel-assisted function, the storage request received from the first storage driver to a hardware device driver of the storage device for execution by the storage device, the hardware device driver of the storage device operating within a kernel of the computing apparatus.
13. The apparatus of claim 12,
wherein the storage device is configured to concurrently process a plurality of storage requests;
wherein the set of user space dispatch queues comprises a plurality of user space dispatch queues; and is
Wherein sending storage requests from the first user space queue and the second user space queue to the set of user space dispatch queues comprises:
sending a storage request from the first user space queue to a first user space dispatch queue of the set of user space dispatch queues; and
sending a storage request from the second user space queue to a second user space dispatch queue of the set of user space dispatch queues, the second user space dispatch queue being different from the first user space dispatch queue.
14. The apparatus of claim 12, wherein sending the storage requests from the set of user space dispatch queues to the second storage driver operating within user space comprises dispatching storage requests from each user space dispatch queue of the plurality of user space dispatch queues without reordering or merging the storage requests.
15. A non-transitory computer-readable storage medium for processing storage requests for a storage device of a computing device having multiple processing cores (hereinafter "cores"), the non-transitory computer-readable storage medium storing a set of instructions that, when executed by the computing device, cause the computing device to:
sending, by a first storage driver operating within a user space of the computing device, a storage request initiated by a first core of the computing device to a first user space queue, the first user space queue dedicated to storage requests from the first core;
sending, by the first storage driver operating within user space, a storage request initiated by a second core of the computing device to a second user space queue, the second user space queue dedicated to storage requests from the second core, the second core being different from the first core, and the second user space queue being different from the first user space queue;
sending, by the first storage driver operating in user space, storage requests from the first user space queue and the second user space queue to a set of user space dispatch queues to which the first user space queue and the second user space queue do not belong;
sending, by the first storage driver operating within user space, a storage request from the set of user space dispatch queues to a second storage driver operating within the user space of the computing device, the second storage driver being different from the first storage driver; and
sending, by the second storage driver operating within user space and by way of a kernel-assisted function, the storage request received from the first storage driver to a hardware device driver of the storage device for execution by the storage device, the hardware device driver of the storage device operating within a kernel of the computing device.
16. The storage medium of claim 15,
wherein the storage device is configured to concurrently process a plurality of storage requests;
wherein the set of user space dispatch queues comprises a plurality of user space dispatch queues; and is
Wherein sending storage requests from the first user space queue and the second user space queue to the set of user space dispatch queues comprises:
sending a storage request from the first user space queue to a first user space dispatch queue of the set of user space dispatch queues; and
sending a storage request from the second user space queue to a second user space dispatch queue of the set of user space dispatch queues, the second user space dispatch queue being different from the first user space dispatch queue.
17. The storage medium of claim 15, wherein sending the storage requests from the set of user space dispatch queues to the second storage driver operating within user space comprises dispatching storage requests from each of the plurality of user space dispatch queues without reordering or merging the storage requests.
CN201710650398.7A 2017-08-02 2017-08-02 Using multiple queuing structures within user space storage drives to increase speed Active CN109388592B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710650398.7A CN109388592B (en) 2017-08-02 2017-08-02 Using multiple queuing structures within user space storage drives to increase speed
US16/050,591 US10795611B2 (en) 2017-08-02 2018-07-31 Employing multiple queueing structures within a userspace storage driver to increase speed

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710650398.7A CN109388592B (en) 2017-08-02 2017-08-02 Using multiple queuing structures within user space storage drives to increase speed

Publications (2)

Publication Number Publication Date
CN109388592A CN109388592A (en) 2019-02-26
CN109388592B (en) 2022-03-29

Family

ID=65229429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710650398.7A Active CN109388592B (en) 2017-08-02 2017-08-02 Using multiple queuing structures within user space storage drives to increase speed

Country Status (2)

Country Link
US (1) US10795611B2 (en)
CN (1) CN109388592B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11513855B2 (en) * 2020-04-07 2022-11-29 EMC IP Holding Company, LLC System and method for allocating central processing unit (CPU) cores for system operations

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103392171A (en) * 2010-12-13 2013-11-13 超威半导体公司 Graphics processing dispatch from user mode
CN103582877A (en) * 2010-12-15 2014-02-12 超威半导体公司 Computer system interrupt handling
CN103608767A (en) * 2011-06-23 2014-02-26 微软公司 Programming interface for data communications
CN105683905A (en) * 2013-11-01 2016-06-15 高通股份有限公司 Efficient hardware dispatching of concurrent functions in multicore processors, and related processor systems, methods, and computer-readable media
US9378047B1 (en) * 2013-09-18 2016-06-28 Emc Corporation Efficient communication of interrupts from kernel space to user space using event queues

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8452901B1 (en) 2011-12-30 2013-05-28 Emc Corporation Ordered kernel queue for multipathing events
US9304936B2 (en) * 2013-12-09 2016-04-05 International Business Machines Corporation Bypassing a store-conditional request around a store queue
WO2016183028A2 (en) * 2015-05-10 2016-11-17 Apl Software Inc. Methods and architecture for enhanced computer performance

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103392171A (en) * 2010-12-13 2013-11-13 超威半导体公司 Graphics processing dispatch from user mode
CN103582877A (en) * 2010-12-15 2014-02-12 超威半导体公司 Computer system interrupt handling
CN103608767A (en) * 2011-06-23 2014-02-26 微软公司 Programming interface for data communications
US9378047B1 (en) * 2013-09-18 2016-06-28 Emc Corporation Efficient communication of interrupts from kernel space to user space using event queues
CN105683905A (en) * 2013-11-01 2016-06-15 高通股份有限公司 Efficient hardware dispatching of concurrent functions in multicore processors, and related processor systems, methods, and computer-readable media

Also Published As

Publication number Publication date
CN109388592A (en) 2019-02-26
US10795611B2 (en) 2020-10-06
US20190042158A1 (en) 2019-02-07

Similar Documents

Publication Publication Date Title
US10042563B2 (en) Segmenting read requests and interleaving segmented read and write requests to reduce latency and maximize throughput in a flash storage device
US9317204B2 (en) System and method for I/O optimization in a multi-queued environment
US8725906B2 (en) Scalable data storage architecture and methods of eliminating I/O traffic bottlenecks
US10459661B2 (en) Stream identifier based storage system for managing an array of SSDs
US9465555B2 (en) Method and apparatus for efficient processing of disparate data storage commands
US8886845B1 (en) I/O scheduling system and method
US11262945B2 (en) Quality of service (QOS) system and method for non-volatile memory express devices
US20090119463A1 (en) System and article of manufacture for dumping data in processing systems to a shared storage
US9699093B2 (en) Migration of virtual machine based on proximity to peripheral device in NUMA environment
US10459662B1 (en) Write failure handling for a memory controller to non-volatile memory
US20220308764A1 (en) Enhanced Storage Protocol Emulation in a Peripheral Device
US10318178B1 (en) Accelerating copy of zero-filled data extents
US20110154165A1 (en) Storage apparatus and data transfer method
US10831684B1 (en) Kernal driver extension system and method
CN109388592B (en) Using multiple queuing structures within user space storage drives to increase speed
US11093175B1 (en) Raid data storage device direct communication system
US10154113B2 (en) Computer system
US20110173372A1 (en) Method and apparatus for increasing file copy performance on solid state mass storage devices
US11733926B2 (en) Command sequencing for read operations by solid-state drives
US11231881B2 (en) Raid data storage device multi-step command coordination system
US11093180B2 (en) RAID storage multi-operation command system
US11422740B2 (en) Raid storage-device-assisted data update system
US11003391B2 (en) Data-transfer-based RAID data update system
US11003378B2 (en) Memory-fabric-based data-mover-enabled memory tiering system
US9317419B1 (en) System and method for thin provisioning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant