US20230244390A1 - Collecting quality of service statistics for in-use child physical functions of multiple physical function non-volatile memory devices - Google Patents
- Publication number
- US20230244390A1 (US17/588,204)
- Authority
- US
- United States
- Prior art keywords
- physical function
- child
- mfnd
- child physical
- qos
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3034—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
- G06F3/0605—Improving or facilitating administration, e.g. storage management by facilitating the interaction with a user or administrator
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0631—Configuration or reconfiguration of storage systems by allocating resources to storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0653—Monitoring storage devices or systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
Definitions
- NVMe Non-Volatile Memory Express
- PCIe Peripheral Component Interconnect Express
- PFs PCIe physical functions
- MFNDs multiple physical function non-volatile memory devices
- In a MFND, one PF, which might be referred to herein as a “parent PF,” can act as a parent controller to receive and execute administrative commands.
- Other physical functions on a MFND, which might be referred to herein as “child PFs” or “children PFs,” can act as child controllers that behave similarly to standard NVMe controllers.
- a MFND can enable the efficient sharing of input/output (“I/O”) resources between virtual machines (“VMs”) or bare metal instances.
- VMs virtual machines
- For example, child PFs can be directly assigned to and utilized by different VMs through various direct hardware access technologies, such as HYPER-V NVMe Direct or Discrete Device Assignment (“DDA”).
- DDA Discrete Device Assignment
- the child PFs exposed by a single MFND can appear as multiple, separate physical devices to individual VMs. This allows individual VMs to directly utilize a portion of the available non-volatile storage space provided by a MFND with reduced central processing unit (“CPU”) and hypervisor overhead.
- CPU central processing unit
- Existing MFNDs, however, have limitations that restrict aspects of their functionality when used with VMs in the manner described above. As one specific example, it might not be possible to obtain detailed information regarding a VM's usage of the resources allocated to it by a MFND. Consequently, system administrators might not know when a VM is over- or under-utilizing the resources provided by a MFND and, as a result, might not be able to make informed decisions regarding reallocating those MFND-provided resources or provisioning new MFND-provided resources. Current MFNDs can also suffer from other technical limitations, some of which are described in detail below.
- MFNDs can be configured to collect QoS statistics, referred to herein as “child PF QoS statistics,” for in-use child physical functions that describe the utilization of resources provided by the child PFs to VMs.
- the collected child PF QoS statistics can then be utilized to inform decisions regarding reallocation of MFND-provided resources and provisioning of new MFND-provided resources, thereby making more efficient utilization of MFND hardware.
- the child PF QoS statistics can also be collected in a manner that reduces the performance impact of collecting the child PF QoS statistics on the MFND and minimizes the use of non-volatile memory.
- Other technical benefits not specifically mentioned herein can also be realized through implementations of the disclosed subject matter.
- the disclosed technologies include functionality for collecting QoS statistics for in-use child PFs of a MFND.
- a host computing device creates a child PF on a MFND and configures the child PF on the MFND to provide a specified QoS level to an associated VM executing on the host computing device.
- the host computing device also enables the MFND to collect child PF QoS statistics for the child PF.
- the collected child PF QoS statistics describe the utilization of resources provided by child PFs to assigned VMs.
- the MFND provides the child PF QoS statistics from the MFND to the host computing device. As discussed above, the collected child PF QoS statistics can then be utilized to inform decisions regarding reallocation of MFND-provided resources, provisioning of new MFND-provided resources, and potentially other types of decisions.
- the specified QoS level defines maximum read input/output (“I/O”) operations per second (“IOPS”) and maximum write IOPS for the child PF.
- I/O input/output
- the child PF QoS statistics for the child PF might specify a percentage of the maximum read IOPS and the maximum write IOPS provided by the child PF to the VM assigned to the child PF during a specified monitoring period.
- the specified QoS level might also, or alternately, define a maximum read bandwidth and a maximum write bandwidth for the child PF.
- the child PF QoS statistics for the child PF might specify a percentage of the maximum read bandwidth and the maximum write bandwidth provided by the child PF to the VM assigned to the child PF during the specified monitoring period.
- the child PF QoS statistics for the child PF specify the percentage of read operations and write operations performed by the child PF during the specified monitoring period.
- the child PF QoS statistics for the child PF might also, or alternately, specify a size of I/O workloads performed by the child PF on behalf of an assigned VM during the monitoring period.
- the child PF QoS statistics might also, or alternately, specify an amount of the storage capacity of a non-volatile memory device on the MFND that is in use by a VM.
- Other types of child PF QoS statistics can be collected in the manner described herein in other embodiments.
- the host computing device specifies the duration of a QoS statistics monitor period and the duration of a QoS statistics swap bucket period to the MFND.
- the MFND is further configured to store the child physical function QoS statistics in a log, which might be referred to herein as the “active log,” during the duration of the QoS statistics monitor period.
- When the QoS statistics swap bucket period elapses, the MFND swaps the active log with another log, which might be referred to herein as the “save log.”
- the MFND provides the child PF QoS statistics from the MFND to the host computing device from the save log.
- the MFND also generates a notification, such as an asynchronous event, to the host computing device when the QoS statistics swap bucket period elapses.
- the host computing device may request the child PF QoS statistics from the MFND.
- FIG. 1 is a computing architecture diagram that shows aspects of the configuration and operation of a MFND that can implement the embodiments disclosed herein, according to one embodiment;
- FIG. 2 A is a computing architecture diagram showing aspects of one mechanism disclosed herein for creating child PFs on a MFND, according to one embodiment;
- FIG. 2 B is a computing architecture diagram showing aspects of one mechanism disclosed herein for setting the QoS for a child PF on a MFND, according to one embodiment;
- FIG. 2 C is a computing architecture diagram showing aspects of one mechanism disclosed herein for enabling the collection of child PF QoS statistics by a MFND, according to one embodiment;
- FIG. 2 D is a computing architecture diagram showing aspects of one mechanism disclosed herein for retrieving child PF QoS statistics from a MFND, according to one embodiment;
- FIG. 3 is a computing architecture diagram showing aspects of one mechanism disclosed herein for swapping a child PF QoS statistics active log and a child PF QoS statistics save log on a MFND, according to one embodiment;
- FIG. 4 is a data structure diagram showing an illustrative configuration for a child PF QoS statistics log maintained by a MFND, according to one embodiment;
- FIG. 5 is a flow diagram showing a routine that illustrates aspects of a method for configuring child PFs on a MFND, according to one embodiment disclosed herein;
- FIG. 6 is a flow diagram showing a routine that illustrates aspects of a method for collecting QoS statistics for in-use child PFs of a MFND, according to one embodiment disclosed herein;
- FIG. 7 is a computer architecture diagram showing an illustrative computer hardware and software architecture for a computing device that can act as a host for a MFND that implements aspects of the technologies presented herein;
- FIG. 8 is a computing network architecture diagram showing an illustrative configuration for a computing environment in which computing devices hosting MFNDs implementing the disclosed technologies can be utilized.
- MFNDs implementing the disclosed technologies can collect child PF QoS statistics for in-use child PFs that describe the utilization of resources provided by the child PFs to VMs.
- the child PF QoS statistics can be collected and stored in a manner that reduces the performance impact of collecting the child PF QoS statistics on the MFND and minimizes the use of volatile and non-volatile memory.
- child PF QoS statistics can be utilized to inform decisions regarding reallocation of MFND-provided resources and provisioning of new MFND-provided resources, thereby making more efficient utilization of MFND hardware.
- Other technical benefits not specifically mentioned herein can also be realized through implementations of the disclosed subject matter.
- FIG. 1 is a computing architecture diagram that shows aspects of the configuration and operation of a MFND 102 that can implement the embodiments disclosed herein, according to one embodiment.
- the MFND 102 is an NVMe Specification-compliant device in some embodiments.
- the MFND 102 can be hosted by a host computing device 100 (which might be referred to herein simply as a “host”), such as a server computer operating in a distributed computing network such as that described below with reference to FIG. 8 .
- NVMe is an open logical device interface specification for accessing non-volatile storage media.
- an NVMe device is accessible via a PCIe bus.
- an NVMe device is accessible via a network or other packet-based interface.
- the NVMe Specification defines a register interface, command set, and collection of features for PCIe-based solid-state storage devices (“SSDs”) with the goals of high performance and interoperability across a broad range of non-volatile memory subsystems.
- SSDs solid-state storage devices
- the NVMe Specification does not stipulate the ultimate usage model, such as solid-state storage, main memory, cache memory or backup memory.
- NVMe provides an alternative to the Small Computer System Interface (“SCSI”) standard and the Advanced Technology Attachment (“ATA”) standard for connecting and transmitting data between a host computing device 100 and a peripheral target storage device.
- SATA Serial ATA
- SAS Serial Attached SCSI
- HDDs hard disk drives
- NVMe was designed for use with faster media.
- Benefits of NVMe-based PCIe SSDs over SAS-based and SATA-based SSDs include reduced latency in the host software stack, higher IOPS and potentially lower power consumption, depending on the form factor and the number of PCIe lanes in use.
- NVMe can support SSDs that use different types of non-volatile memory, including NAND flash and the 3D XPOINT technology developed by INTEL and MICRON TECHNOLOGY. Supported form factors include add-in PCIe cards, M.2 SSDs and U.2 2.5-inch SSDs. NVMe reference drivers are available for a variety of operating systems, including the WINDOWS and LINUX operating systems. Accordingly, it is to be appreciated that the MFND 102 described herein is not limited to a particular type of non-volatile memory, form factor, or operating system.
- the MFND 102 described herein includes capabilities for exposing multiple PFs 112 A- 112 N to the host computing device 100 .
- Each of the PFs 112 A- 112 N is an independent NVMe controller in one embodiment.
- the PFs 112 A- 112 N are other types of controllers in other embodiments.
- At least a plurality of PFs 112 A- 112 N are independent NVMe controllers and at least one distinct PF 112 A- 112 N is a non-NVMe controller in other embodiments.
- One PF 112 A in the MFND 102 acts as a parent controller.
- the parent PF 112 A is the privileged PCIe function zero of the MFND 102 .
- the parent controller might be configured as another PCIe function number in other embodiments.
- the parent PF 112 A and child PFs described below might also be device types other than NVMe devices in some embodiments.
- the parent PF 112 A can act as a parent controller to receive and execute administrative commands 110 generated by a root partition 108 .
- the parent PF 112 A can manage child PFs 112 B- 112 N such as, for example, by creating, deleting, modifying, and querying the child PFs 112 B- 112 N.
- the child PFs 112 B- 112 N might be referred to herein interchangeably as “child PFs 112 B- 112 N” or “child controllers 112 B- 112 N.”
- the child PFs 112 B- 112 N are regular PCIe physical functions of the MFND 102 .
- the child PFs 112 B- 112 N can behave like regular and independent NVMe controllers.
- the child controllers 112 B- 112 N can also support the administrative and I/O commands defined by the NVMe Specification.
- I/O resources provided by the MFND 102 can be efficiently shared between VMs 104 A- 104 N.
- child PFs 112 B- 112 N can be directly assigned to different VMs 104 A- 104 N, respectively, through various direct hardware access technologies such as HYPER-V NVMe Direct or DDA.
- the child PFs 112 B- 112 N exposed by a single MFND 102 can appear as multiple, separate physical devices to individual VMs 104 A- 104 N, respectively. This allows individual VMs 104 A- 104 N to directly utilize a respective portion of the available storage space provided by a non-volatile memory device 103 on the MFND 102 with reduced CPU and hypervisor 106 overhead.
- the host computing device 100 operates in a distributed computing network, such as that described below with regard to FIG. 8 . Additionally, the host computing device 100 executes a host agent 116 and a management application programming interface (“API”) 118 in order to enable access to aspects of the functionality disclosed herein in some embodiments.
- API application programming interface
- the host agent 116 can receive commands from other components, such as other components in a distributed computing network such as that described below with regard to FIG. 8 , and make calls to the management API 118 to implement the commands.
- the management API 118 can issue administrative commands to the parent PF 112 A to perform the various functions described herein. Details regarding various methods exposed by the management API 118 to the host agent 116 for implementing the functionality disclosed herein are described below.
- the MFND 102 has two modes of operation: regular user mode and super administrator mode.
- In regular user mode, only read-only functions can be executed.
- The non-read-only management functions described herein (e.g. setting the QoS level for a PF) must be executed in super administrator mode. If an attempt is made to execute these functions in regular user mode, an error (which might be referred to herein as an “ERROR_ACCESS_DENIED” error) will be returned.
- the API 118 exposes methods for getting the device operation mode (which might be referred to herein as the “GetDeviceOperationMode” method) and switching the device operation mode (which might be referred to herein as the “SwitchDeviceOperationMode” method) in some embodiments.
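The two operation modes described above can be pictured with a small in-memory model. The following is a minimal Python sketch, not the actual management API 118. The class `MockMfnd`, its Python method names, the `AccessDeniedError` exception, and the assumption that a device starts in regular user mode are all hypothetical. Only the two mode names, the GetDeviceOperationMode and SwitchDeviceOperationMode methods, and the ERROR_ACCESS_DENIED error come from the text.

```python
from enum import Enum


class DeviceOperationMode(Enum):
    REGULAR_USER = "regular_user"
    SUPER_ADMINISTRATOR = "super_administrator"


class AccessDeniedError(Exception):
    """Hypothetical stand-in for the ERROR_ACCESS_DENIED error."""


class MockMfnd:
    """Hypothetical in-memory stand-in for a MFND; not a real driver API."""

    def __init__(self):
        # Assumption: the device starts in regular user mode.
        self._mode = DeviceOperationMode.REGULAR_USER
        self._qos_levels = {}

    def get_device_operation_mode(self):
        # Mirrors the GetDeviceOperationMode method named in the text.
        return self._mode

    def switch_device_operation_mode(self, mode):
        # Mirrors the SwitchDeviceOperationMode method named in the text.
        self._mode = mode

    def query_qos_level(self, child_pf_id):
        # Read-only function: permitted in either mode.
        return self._qos_levels.get(child_pf_id)

    def set_qos_level(self, child_pf_id, level):
        # Non-read-only function: rejected unless in super administrator mode.
        if self._mode is not DeviceOperationMode.SUPER_ADMINISTRATOR:
            raise AccessDeniedError("ERROR_ACCESS_DENIED")
        self._qos_levels[child_pf_id] = level
```

In this sketch a caller that wants to change a QoS level first switches the device to super administrator mode. A call made in regular user mode fails without modifying any state.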
- MFNDs 102 have limitations that restrict aspects of their functionality when used with VMs 104 in the manner described above. As one specific example, it might not be possible to obtain detailed information regarding a VM's 104 usage of the resources allocated to it by a MFND 102 . Consequently, system administrators might not know when a VM 104 is over or under-utilizing the resources provided by a MFND 102 and, as a result, might not be able to make informed decisions regarding reallocating those MFND-provided resources or provisioning new MFND-provided resources.
- the technologies presented herein address these and potentially other technical considerations by enabling collection of QoS statistics for in-use child PFs 112 B- 112 N of a MFND 102 . Additional details regarding these aspects will be provided below.
- FIG. 2 A is a computing architecture diagram showing aspects of one mechanism disclosed herein for creating child PFs 112 on a MFND 102 , according to one embodiment.
- the host agent 116 can create a new child PF 112 B on the MFND 102 by calling an appropriate method exposed by the management API 118 .
- the management API 118 issues a command 110 A to the parent PF 112 A to create the desired child PF 112 B.
- the MFND 102 creates the child PF 112 B.
- a VM 104 A may be assigned to the child PF 112 B. Additional details regarding the creation of child PFs 112 B on a MFND 102 and assignment of a VM 104 A to a child PF 112 B will be provided below with regard to FIG. 5 .
- FIG. 2 B is a computing architecture diagram showing aspects of one mechanism disclosed herein for setting the QoS level for a child PF 112 B on a MFND 102 , according to one embodiment.
- the MFND 102 can provide functionality for managing the QoS level provided by the child PFs 112 B.
- implementations of the disclosed technologies can also enable a host agent 116 to query and modify the QoS level provided by a child PF 112 B of a MFND 102 .
- the MFND 102 supports multiple storage service level agreements (“SLAs”). Each SLA defines a different QoS level to be provided by a PF 112 A- 112 N.
- QoS levels that can be supported by child PFs 112 on the MFND 102 include, but are not limited to, a “reserve mode” wherein a child PF 112 is allocated at least a specified minimum amount of bandwidth and IOPS, a “limit mode” wherein a child PF 112 is allocated at most a specified maximum amount of bandwidth and IOPS, and a “mixed mode” wherein a child PF 112 is allocated at least a specified minimum amount of bandwidth and IOPS but at most a specified maximum amount of bandwidth and IOPS.
- Other QoS levels can be implemented in other embodiments.
- the embodiments disclosed herein allow the parent PF 112 A to individually define the QoS level for each child PF 112 B- 112 N in a single MFND 102 .
- the parent PF 112 A might define the minimum and/or maximum bandwidth and/or IOPS to be supported by each child PF 112 B- 112 N.
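As an illustration of the reserve, limit, and mixed modes, the following Python sketch derives the mode from which bounds are present. The type names `QosBounds` and `QosSettings`, the field names, the MB/s unit, and the check that a minimum may not exceed a maximum are assumptions made for this sketch. The text does not specify how the QoS settings 202 are laid out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QosBounds:
    """A bandwidth and IOPS bound; the units are an assumption of this sketch."""
    bandwidth_mb_per_s: int
    iops: int


@dataclass(frozen=True)
class QosSettings:
    """Hypothetical container for the QoS settings of one child PF."""
    minimum: Optional[QosBounds] = None  # guaranteed floor ("reserve")
    maximum: Optional[QosBounds] = None  # enforced ceiling ("limit")

    def mode(self) -> str:
        if self.minimum is not None and self.maximum is not None:
            # Assumed sanity check: the floor must not exceed the ceiling.
            if (self.minimum.bandwidth_mb_per_s > self.maximum.bandwidth_mb_per_s
                    or self.minimum.iops > self.maximum.iops):
                raise ValueError("minimum exceeds maximum")
            return "mixed"
        if self.minimum is not None:
            return "reserve"
        if self.maximum is not None:
            return "limit"
        raise ValueError("no QoS bounds specified")
```

For example, `QosSettings(maximum=QosBounds(500, 5000)).mode()` evaluates to `"limit"`, because only a ceiling is given.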
- the host agent 116 can call a method exposed by the management API 118 .
- the management API 118 issues a command 110 B to the parent physical function 112 A that includes QoS settings 202 for a child PF 112 B.
- the child PF 112 B then utilizes the QoS settings 202 when processing requests from an assigned VM 104 A.
- One illustrative method for modifying the settings of child PFs 112 B- 112 N (which might be referred to herein as the “UpdateChildPhysicalFunctionSettings” method) takes an identifier (e.g. a handle) to a MFND 102 , an identifier (e.g. a serial number) of a child PF 112 , and a pointer to a data structure containing the QoS settings 202 for the child PF 112 as input.
- the data structure can include data specifying the resources (e.g. the amount of storage space, namespaces, and interrupt vectors that the identified PF 112 is to use) and the QoS level that are to be assigned to the identified child PF 112 .
- the UpdateChildPhysicalFunctionSettings method returns a success message if the supplied settings were successfully applied to the identified child PF 112 and otherwise returns an error code.
- An illustrative method for querying the settings of child PFs 112 B- 112 N (which might be referred to herein as the “QueryChildPhysicalFunctionSettings” method) takes an identifier (e.g. a handle) for a MFND 102 and an identifier (e.g. a serial number) of a child PF 112 as input.
- the QueryChildPhysicalFunctionSettings method returns a pointer to a data structure containing the current settings of the identified child PF 112 .
- a data structure can include data specifying the resources (e.g. the amount of storage space, namespaces, and interrupt vectors that the PF 112 can use) and QoS settings 202 that are currently assigned to the identified child PF 112 .
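The update and query methods described above can be sketched as a pair of calls keyed by a device handle and a child PF serial number. This is a hypothetical Python model, not the management API 118 itself. The class `MockManagementApi`, the `create_child_pf` helper, the `ChildPfSettings` fields, and the numeric return codes are assumptions. The text states only that the update method returns a success message or an error code and that the query method returns the current settings.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

SUCCESS = 0                   # hypothetical success code
ERROR_CHILD_PF_NOT_FOUND = 1  # hypothetical error code


@dataclass(frozen=True)
class ChildPfSettings:
    """Hypothetical layout of the settings structure for one child PF."""
    storage_bytes: int
    namespaces: int
    interrupt_vectors: int
    qos_mode: str


class MockManagementApi:
    """In-memory stand-in keyed by (MFND handle, child PF serial number)."""

    def __init__(self):
        self._settings: Dict[Tuple[int, str], ChildPfSettings] = {}

    def create_child_pf(self, mfnd_handle: int, serial: str,
                        settings: ChildPfSettings) -> None:
        # Helper assumed for this sketch so that there is something to update.
        self._settings[(mfnd_handle, serial)] = settings

    def update_child_physical_function_settings(
            self, mfnd_handle: int, serial: str,
            settings: ChildPfSettings) -> int:
        key = (mfnd_handle, serial)
        if key not in self._settings:
            return ERROR_CHILD_PF_NOT_FOUND
        self._settings[key] = settings
        return SUCCESS

    def query_child_physical_function_settings(
            self, mfnd_handle: int, serial: str) -> Optional[ChildPfSettings]:
        # Returns the current settings, or None for an unknown child PF.
        return self._settings.get((mfnd_handle, serial))
```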
- FIG. 2 C is a computing architecture diagram showing aspects of one mechanism disclosed herein for enabling the collection of child PF QoS statistics 210 by a MFND 102 , according to one embodiment.
- the host agent 116 can configure the MFND 102 to collect child PF QoS statistics 210 by calling an appropriate method on the management API 118 .
- the management API 118 issues a command 110 C to the parent physical function 112 A instructing the MFND 102 to enable the collection of the child PF QoS statistics 210 .
- the MFND 102 stores the child PF QoS statistics 210 in a child PF statistics log 208 in one embodiment. Details regarding the configuration and use of the child PF statistics log 208 will be provided below with respect to FIGS. 3 and 4 .
- a single command 110 C can be utilized to enable collection of child PF QoS statistics 210 for all in-use child PFs 112 B- 112 N.
- per child PF 112 commands 110 C can be issued to enable collection of child QoS statistics 210 by individual child PFs 112 B- 112 N.
- the command 110 C specifies a QoS statistics monitor period 204 and a QoS statistics swap bucket period 206 in some embodiments.
- the QoS statistics monitor period 204 specifies the duration of a monitoring period during which the MFND 102 is to collect the child PF QoS statistics 210 .
- the QoS statistics monitor period 204 is specified in seconds with a minimum value of 60 seconds and increments of 30 seconds.
- the QoS statistics monitor period 204 might be specified in other ways in other embodiments.
- the QoS statistics swap bucket period 206 defines a period of time after which the MFND 102 is to swap an “active log” with a “save log.” In these embodiments, the MFND 102 is further configured to store the child physical function QoS statistics 210 in the active log during the duration of the QoS statistics monitor period 204 . In one embodiment, the QoS statistics swap bucket period 206 is specified in minutes, with a minimum value of 30 minutes and a maximum value of 1440 minutes. The QoS statistics swap bucket period 206 might be specified in other ways in other embodiments. Additional details regarding the contents and use of the active and save logs will be provided below with regard to FIGS. 3 and 4 .
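The limits stated for the two periods in this embodiment can be checked before a command is issued. The following Python sketch is illustrative only. The function and constant names are hypothetical, and it assumes that "increments of 30 seconds" means the valid monitor periods are 60, 90, 120, and so on.

```python
MONITOR_PERIOD_MIN_SECONDS = 60
MONITOR_PERIOD_STEP_SECONDS = 30
SWAP_BUCKET_MIN_MINUTES = 30
SWAP_BUCKET_MAX_MINUTES = 1440


def validate_monitor_period(seconds: int) -> int:
    """Return the monitor period if valid, else raise ValueError."""
    if seconds < MONITOR_PERIOD_MIN_SECONDS:
        raise ValueError("monitor period must be at least 60 seconds")
    # Assumed reading: steps of 30 seconds counted from the 60-second minimum.
    if (seconds - MONITOR_PERIOD_MIN_SECONDS) % MONITOR_PERIOD_STEP_SECONDS != 0:
        raise ValueError("monitor period must be in 30-second increments")
    return seconds


def validate_swap_bucket_period(minutes: int) -> int:
    """Return the swap bucket period if valid, else raise ValueError."""
    if not SWAP_BUCKET_MIN_MINUTES <= minutes <= SWAP_BUCKET_MAX_MINUTES:
        raise ValueError("swap bucket period must be 30 to 1440 minutes")
    return minutes
```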
- FIG. 2 D is a computing architecture diagram showing aspects of one mechanism disclosed herein for retrieving child PF QoS statistics 210 from a MFND 102 , according to one embodiment.
- FIG. 2 D will be described in conjunction with FIG. 3 , which is a computing architecture diagram showing aspects of one mechanism disclosed herein for swapping a child PF QoS statistics active log 302 , which might be referred to simply as the “active log 302 ,” and a child PF QoS statistics save log 304 , which might be referred to simply as the “save log 304 ,” on a MFND 102 , according to one embodiment.
- the child PF QoS statistics log 208 is implemented using two separate logs, the child PF QoS statistics active log 302 and the child PF QoS statistics save log 304 , in some embodiments.
- when the QoS statistics swap bucket period 206 elapses, the MFND 102 swaps the active log 302 with the save log 304 and clears the active log 302 . This can be performed as an atomic operation in order to avoid corruption of the logs 302 and 304 .
- the MFND 102 provides the child PF QoS statistics 210 from the save log 304 in response to requests 308 received from the host computing device 100 .
- the MFND 102 also provides functionality for enabling the host computing device 100 to retrieve the contents of the active log 302 in some embodiments.
- the MFND 102 also generates a notification, such as an asynchronous event 306 , to the host computing device 100 when the QoS statistics swap bucket period 206 elapses.
- the host computing device 100 may issue a command 110 D to the MFND 102 to retrieve the child PF QoS statistics 210 from the MFND 102 .
- the MFND 102 retrieves the child PF QoS statistics 210 from the save log 304 and returns the child PF QoS statistics 210 to the host 100 in response to the command 110 D.
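The active log and save log mechanism described above resembles a double-buffering pattern. The following is a minimal host-side model of that pattern, not device firmware: the class `QosLogPair`, its method names, and the use of a lock to model the atomic swap are all assumptions made for illustration.

```python
import threading
from typing import Callable, List, Optional


class QosLogPair:
    """Hypothetical model of the active log 302 and the save log 304."""

    def __init__(self, on_swap: Optional[Callable[[], None]] = None) -> None:
        self._lock = threading.Lock()
        self._active: List[dict] = []
        self._save: List[dict] = []
        self._on_swap = on_swap   # stands in for the asynchronous event 306

    def append(self, entry: dict) -> None:
        """Store one monitor period's statistics in the active log."""
        with self._lock:
            self._active.append(entry)

    def swap(self) -> None:
        """Swap the logs and clear the active log as one locked step."""
        with self._lock:
            self._save = self._active
            self._active = []
        if self._on_swap is not None:
            self._on_swap()       # notify the host that statistics are ready

    def read_save(self) -> List[dict]:
        """Return a copy of the save log (the response to command 110D)."""
        with self._lock:
            return list(self._save)

    def read_active(self) -> List[dict]:
        """Return a copy of the active log, for embodiments that expose it."""
        with self._lock:
            return list(self._active)
```

Because readers only ever see a copy taken under the same lock as the swap, a request that races with the swap observes either the previous save log or the new one, never a mixture.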
- the host agent 116 might provide the child PF QoS statistics 210 to a remote management system 212 or another component.
- the specified QoS level defines maximum read IOPS and maximum write IOPS for a child PF 112 B.
- the child PF QoS statistics 210 for the child PF 112 B specify the maximum read IOPS and the maximum write IOPS provided by the child PF 112 B to the VM 104 A assigned to the child PF 112 B during the QoS statistics monitor period 204 .
- the maximum read IOPS and the maximum write IOPS are specified as a percentage of the maximum read IOPS and the maximum write IOPS specified by the QoS level for the child PF 112 B in some embodiments.
- the maximum read IOPS and the maximum write IOPS can be expressed using only a single byte, thereby saving space on the non-volatile memory device 103 .
- the specified QoS level might also, or alternately, define a maximum read bandwidth and a maximum write bandwidth for the child PF 112 B.
- the child PF QoS statistics 210 for the child PF 112 B specify the maximum read bandwidth and the maximum write bandwidth provided by the child PF 112 B to the VM 104 A assigned to the child PF 112 B during the specified QoS statistics monitor period 204 .
- the maximum read bandwidth and the maximum write bandwidth are specified as a percentage of the maximum read bandwidth and the maximum write bandwidth specified by the QoS level for the child PF 112 B.
- the maximum read bandwidth and the maximum write bandwidth can be expressed using only a single byte, thereby saving space on the non-volatile memory device 103 .
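The single-byte percentage representation described above can be sketched as follows. The function name `to_percent_byte`, the round-to-nearest behavior, and the clamp at 255 for observed values that exceed the configured limit are assumptions, since the text does not specify rounding or overflow handling.

```python
def to_percent_byte(observed: int, limit: int) -> int:
    """Express an observed peak as a whole percentage of the QoS limit.

    The result always fits in one unsigned byte (0..255).
    """
    if limit <= 0:
        raise ValueError("QoS limit must be positive")
    if observed < 0:
        raise ValueError("observed value cannot be negative")
    # Integer arithmetic with round-half-up avoids floating point surprises.
    pct = (100 * observed + limit // 2) // limit
    return min(pct, 255)   # assumption: saturate rather than wrap


# Example: a peak of 45,000 read IOPS against a 60,000 IOPS QoS limit.
peak_pct = to_percent_byte(45_000, 60_000)
encoded = bytes([peak_pct])   # one byte on the device instead of a wide counter
```

The same helper would apply to bandwidth, since both statistics are expressed relative to the limit set by the QoS level.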
- the child PF QoS statistics 210 for the child PF 112 B specify a percentage of read operations and write operations performed by the child PF 112 B during the specified QoS statistics monitor period 204 .
- the child PF QoS statistics 210 for the child PF 112 B might also, or alternately, specify a size of I/O workloads performed by the child PF 112 B on behalf of an assigned VM 104 A during the QoS statistics monitor period 204 .
- the child PF QoS statistics 210 might also, or alternately, specify an amount of the storage capacity of a non-volatile memory device 103 on the MFND 102 that is in use by a VM 104 A.
- the amount of the storage capacity of a non-volatile memory device 103 on the MFND 102 that is in use by a VM 104 A may be obtained from the MFND 102 by issuing an identified child controller command to a child PF 112 B to retrieve the Namespace Utilization field (“NUSE”) defined by the NVMe Specification.
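As an illustration of how a host might read NUSE once it has the Identify Namespace data for a child PF, the sketch below parses the first three fields of that structure. It assumes the NVMe layout in which NSZE, NCAP, and NUSE are consecutive 8-byte little-endian values at byte offsets 0, 8, and 16, each counted in logical blocks. The function names are hypothetical, and the logical block size is passed in by the caller rather than decoded from the LBA format fields.

```python
import struct


def parse_namespace_sizes(identify_ns: bytes) -> dict:
    """Extract NSZE, NCAP, and NUSE (in logical blocks) from Identify data."""
    if len(identify_ns) < 24:
        raise ValueError("Identify Namespace data is too short")
    nsze, ncap, nuse = struct.unpack_from("<QQQ", identify_ns, 0)
    return {"nsze": nsze, "ncap": ncap, "nuse": nuse}


def used_bytes(nuse_blocks: int, lba_size_bytes: int) -> int:
    """Convert NUSE from logical blocks to bytes for a given block size."""
    return nuse_blocks * lba_size_bytes
```

How the Identify data is obtained from the child PF is outside this sketch; only the decoding step is shown.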
- Other types of child PF QoS statistics 210 such as but not limited to read/write I/O command latency and bytes written to media, can be collected in the manner described herein in other embodiments.
- FIG. 4 is a data structure diagram showing an illustrative configuration for the child PF QoS statistics log 208 maintained by a MFND 102 , according to one embodiment.
- the child PF QoS statistics log 208 includes the fields 402 A- 402 P in the illustrated embodiment.
- it is to be appreciated that the illustrated configuration is merely illustrative and that other types and configurations of data might be utilized.
- a single child PF QoS statistics log 208 might store the child PF QoS statistics 210 for all of the in-use child PFs 112 B- 112 N on a MFND 102 or separate child PF QoS statistics logs 208 might be maintained for each of the in-use child PFs 112 B- 112 N.
- the field 402 A stores data indicating a version number identified with the format of the child PF QoS statistics log 208 .
- the version number might be modified following changes to the format of the child PF QoS statistics log 208 .
- the field 402 B stores a sequence number that is incremented whenever an active log 302 is generated (i.e., after each QoS statistics swap bucket period 206 elapses). When the value reaches 255 and a new active log 302 is generated, the value is reset to zero.
- the field 402 C stores data identifying the number of log entries in the child PF QoS statistics log 208 . As described in greater detail below, the log entries are stored in the fields 402 G- 402 J.
- the field 402 D stores data identifying the child PF QoS statistics monitor period 204 and the field 402 E stores data identifying the child PF QoS statistics swap bucket period 206 described above.
- the field 402 F stores a timestamp associated with the first log entry in the child PF QoS statistics log 208 .
- the timestamp uses the data format for a timestamp as defined by the NVMe Specification. If the host computing device 100 does not set the timestamp, this field contains the time since the MFND 102 last powered up.
- the fields 402 G- 402 J contain log entries containing the child PF QoS statistics 210 .
- each log entry includes fields 402 M- 402 P specifying the maximum read IOPS percentage, the maximum write IOPS percentage, the maximum read bandwidth percentage, and the maximum write bandwidth percentage during the monitoring period, respectively.
- the log entries can include other types of child PF QoS statistics 210 , some of which were described above, in other embodiments.
- the field 402 K contains a version number for the log entries and the field 402 L stores a globally unique identifier (“GUID”) associated with the log entries.
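One possible byte layout for the log fields described above is sketched below. Every field width and the ordering of the fields are assumptions made for illustration; the description names the fields 402 A- 402 P but does not give their sizes. The one-byte sequence number is suggested by the reset to zero after 255, and the four one-byte percentages follow from the single-byte encoding described earlier.

```python
import struct
import uuid

# Assumed layout: 402A version, 402B sequence, 402C entry count,
# 402D monitor period (s), 402E swap bucket period (min), 402F timestamp.
HEADER = struct.Struct("<BBHHH8s")
ENTRY = struct.Struct("<BBBB")      # 402M..402P: four one-byte percentages
TRAILER = struct.Struct("<B16s")    # 402K entry version, 402L GUID


def next_sequence(seq: int) -> int:
    """Advance the field 402B sequence number, resetting to zero after 255."""
    return 0 if seq == 255 else seq + 1


def pack_log(version, seq, monitor_s, swap_min, timestamp, entries,
             entry_version, guid):
    """Serialize a log; entries are (rd_iops, wr_iops, rd_bw, wr_bw) tuples."""
    header = HEADER.pack(version, seq, len(entries), monitor_s, swap_min,
                         timestamp)
    body = b"".join(ENTRY.pack(*e) for e in entries)
    return header + body + TRAILER.pack(entry_version, guid.bytes)


def unpack_log(blob: bytes) -> dict:
    """Parse a log produced by pack_log back into its fields."""
    version, seq, count, monitor_s, swap_min, ts = HEADER.unpack_from(blob, 0)
    entries = [ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
               for i in range(count)]
    tail = HEADER.size + count * ENTRY.size
    entry_version, guid_bytes = TRAILER.unpack_from(blob, tail)
    return {"version": version, "sequence": seq, "monitor_s": monitor_s,
            "swap_min": swap_min, "timestamp": ts, "entries": entries,
            "entry_version": entry_version, "guid": uuid.UUID(bytes=guid_bytes)}
```

With these assumed widths, each log entry costs four bytes, so a full day of 60 second monitor periods would occupy under 6 KB.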
- FIG. 5 is a flow diagram showing a routine 500 that illustrates aspects of a method for configuring child PFs 112 on a MFND 102 , according to one embodiment disclosed herein. It should be appreciated that the logical operations described herein with regard to FIG. 5 , and the other FIGS., can be implemented (1) as a sequence of computer implemented acts or program modules running on a computing device and/or (2) as interconnected machine logic circuits or circuit modules within a computing device.
- the routine 500 begins at operation 502 , where the host agent 116 can enumerate some or all of the MFND devices 102 that are present in a host 100 .
- One particular method (which might be referred to herein as the “GetMFNDList” method) for enumerating the MFND devices 102 connected to a host 100 returns device paths of all MFND devices 102 connected to a host 100 . If no MFND devices 102 are connected, or none are enumerated, the GetMFNDList method returns an error code.
- the routine 500 proceeds to operation 504 , where the host agent 116 can enumerate the child PFs 112 B- 112 N that are currently present on a MFND device 102 identified at operation 502 .
- One method (which might be referred to herein as the “GetChildPhysicalFunctionList” method) for enumerating the child PFs 112 B- 112 N on a MFND device 102 takes an identifier (e.g. a handle) for a particular MFND device 102 as input and returns adapter serial numbers of all child PFs 112 B- 112 N on the identified device.
- the routine 500 proceeds to operation 506 , where the host agent 116 can determine the capabilities of the MFND device 102 identified at operation 502 . For example, the host agent 116 can determine the maximum number of child PFs 112 B- 112 N supported by the MFND device 102 .
- One method for getting the capabilities of a MFND device 102 takes an identifier (e.g. a handle) for a particular MFND device 102 as input and returns a device capability structure that specifies the capabilities of the identified device.
- the device capability structure includes data identifying the maximum and available child PFs 112 B- 112 N, I/O queue pair count, interrupt count, namespace count, storage size, bandwidth, and IOPS of the identified device.
- the device capability structure might include additional or alternate data in other embodiments.
- routine 500 can proceed from operation 506 to operation 508 , where child PFs 112 B- 112 N can be created or deleted on the MFND device 102 .
- the MFND 102 has only one PF 112 , the parent PF 112 A, which is reserved for receiving administrative commands 110 from the root partition 108 .
- the child PFs 112 B- 112 N are first created.
- the newly created child PFs 112 B- 112 N will appear to the host 100 following a reboot.
- One method for creating child PFs 112 B- 112 N (which might be referred to herein as the “CreateChildPhysicalFunction” method) takes an identifier (e.g. a handle) to a MFND 102 and a pointer to a data structure containing the settings for the new child PF 112 as input.
- the data structure can include data specifying the resources (e.g.
- the CreateChildPhysicalFunction method returns an identifier (e.g. a serial number) for the new child PF 112 as output if it completes successfully.
- Child PFs 112 B- 112 N and their settings will persist across reboots of the host 100 , so the maximum number of child PFs 112 B- 112 N to be supported may be initially created to avoid rebooting the host 100 in the future. If a MFND 102 already has child PFs 112 B- 112 N, either as a result of a manufacturing configuration or previous user configuration, additional child PFs 112 B- 112 N can be created or deleted in order to configure the MFND 102 with the desired number of child PFs 112 B- 112 N to be supported.
- One method for deleting child PFs 112 B- 112 N (which might be referred to herein as the “DeleteChildPhysicalFunction” method) takes an identifier for a MFND 102 (e.g. a handle) and the serial number for the child PF 112 to be deleted as input.
- the DeleteChildPhysicalFunction method returns a success message if the identified child PF 112 was successfully deleted and otherwise returns an error code.
- routine 500 proceeds from operation 510 to operation 512 , where the QoS levels for the newly created child PFs 112 B- 112 N are set in the manner described above with regard to FIG. 2 B .
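The enumeration, creation, and deletion methods described above can be modeled with a small in-memory stand-in. This is a sketch only: `FakeMfnd`, `ChildPfSettings`, `MfndError`, the serial number format, and the device path used in the usage note are hypothetical, and Python exceptions take the place of the error codes that the described methods return.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


class MfndError(Exception):
    """Stands in for the error codes returned by the described methods."""


@dataclass
class ChildPfSettings:
    """Hypothetical subset of the settings passed when creating a child PF."""
    storage_bytes: int = 0
    io_queue_pairs: int = 0


class FakeMfnd:
    """In-memory stand-in for one MFND 102 (not a real driver interface)."""

    def __init__(self, max_child_pfs: int) -> None:
        self.max_child_pfs = max_child_pfs
        self._children: Dict[str, ChildPfSettings] = {}
        self._serials = (f"CPF{n:04d}" for n in itertools.count(1))

    def get_child_physical_function_list(self) -> List[str]:
        """Return the serial numbers of all child PFs on this device."""
        return sorted(self._children)

    def create_child_physical_function(self, settings: ChildPfSettings) -> str:
        """Create a child PF and return its serial number."""
        if len(self._children) >= self.max_child_pfs:
            raise MfndError("maximum number of child PFs reached")
        serial = next(self._serials)
        self._children[serial] = settings
        return serial

    def delete_child_physical_function(self, serial: str) -> None:
        """Delete the child PF identified by its serial number."""
        if serial not in self._children:
            raise MfndError(f"no child PF with serial {serial}")
        del self._children[serial]


def get_mfnd_list(devices: Dict[str, FakeMfnd]) -> List[str]:
    """Return the device paths of all connected MFNDs, or fail if none."""
    if not devices:
        raise MfndError("no MFND devices connected")
    return sorted(devices)
```

A host agent built on such an interface might call `get_mfnd_list({"/dev/mfnd0": FakeMfnd(max_child_pfs=8)})`, then create child PFs up to the device maximum in one pass so that no later reboot is needed.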
- routine 500 proceeds from operation 512 to operation 514 , where the MFND 102 enables the collection of child PF QoS statistics 210 for in-use child PFs 112 B- 112 N of the MFND 102 in the manner described above with regard to FIGS. 2 C and 3 and in further detail below with regard to FIG. 6 .
- the routine 500 proceeds from operation 514 to operation 516 , where the child PFs 112 B- 112 N provided by a MFND 102 can be assigned to VMs 104 A- 104 N.
- newly created child PFs 112 B- 112 N have zero storage size, minimal flexible resources, and no defined QoS level.
- newly created child PFs 112 B- 112 N may have a default QoS level, a default amount of storage, and/or default configurations for other resources.
- the host 100 might need to provision the resources (NVM space, I/O queue pair count, QoS level, etc.) to a child PF 112 B- 112 N before it can be assigned to a VM 104 using DDA, HYPER-V NVMe Direct, or another direct storage assignment technology.
- the child PFs 112 B- 112 N can also be securely erased before assignment to a VM 104 . There is no host reboot involved in this workflow.
- One method for securely erasing child PFs 112 B- 112 N (which might be referred to herein as the “SecureEraseChildPhysicalFunction” method) takes an identifier for a MFND 102 (e.g. a handle) and the serial number for the child PF 112 to be erased as input.
- the SecureEraseChildPhysicalFunction method returns a success message if the identified child PF 112 was successfully erased and otherwise returns an error code.
- the routine 500 then proceeds from operation 516 to operation 518 , where it ends.
- FIG. 6 is a flow diagram showing a routine 600 that illustrates aspects of a method for collecting child PF QoS statistics 210 for in-use child PFs 112 of a MFND 102 , according to one embodiment disclosed herein.
- the routine 600 begins at operation 602 , where the MFND 102 determines whether collection of child PF QoS statistics 210 has been enabled in the manner described above. If the collection of child PF QoS statistics 210 has been enabled, the routine 600 proceeds from operation 602 to operation 604 .
- the MFND 102 collects the child PF QoS statistics 210 in the manner described above.
- the routine 600 then proceeds from operation 604 to operation 606 , where the MFND 102 determines whether the QoS statistics monitor period 204 has elapsed. If the QoS statistics monitor period 204 has not elapsed, the routine 600 proceeds from operation 606 back to operation 604 , where the MFND 102 can continue to collect the child PF QoS statistics 210 in the manner described above. If the QoS statistics monitor period 204 has elapsed, the routine 600 proceeds from operation 606 to operation 608 .
- the MFND 102 stores the child PF QoS statistics 210 in a log entry in the child PF QoS statistics active log 302 in the manner described above.
- the routine 600 then proceeds from operation 608 to operation 610 , where the MFND 102 determines whether the QoS statistics swap bucket period 206 has elapsed. If the QoS statistics swap bucket period 206 has not elapsed, the routine 600 proceeds back to operation 604 , where the MFND 102 continues to collect child PF QoS statistics 210 and store the child PF QoS statistics 210 in entries in the active log 302 in the manner described above.
- if the QoS statistics swap bucket period 206 has elapsed, the routine 600 proceeds from operation 610 to operation 612 , where the MFND 102 atomically swaps the active log 302 and the save log 304 and clears the active log 302 in the manner described above.
- the routine 600 then proceeds from operation 612 to operation 614 , where the MFND 102 generates a notification, such as an asynchronous event 306 , to the host computing device 100 to inform the host computing device 100 that child PF QoS statistics 210 are available from the MFND 102 .
- the host computing device 100 might subsequently transmit a command 110 D requesting the child PF QoS statistics 210 .
- the MFND 102 responds to the request with child PF QoS statistics 210 retrieved from the save log 304 .
- following operation 614 , the routine 600 ends.
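The control flow of the routine 600 can be approximated with a small simulation. The sketch below makes several assumptions: one sample arrives per simulated second, each log entry holds the peak sample of its monitor period, and the loop keeps running after a swap rather than ending. The function name `run_routine_600` is hypothetical.

```python
from typing import Iterable, List, Tuple


def run_routine_600(samples: Iterable[int], monitor_period_s: int,
                    swap_period_s: int, enabled: bool = True
                    ) -> Tuple[List[int], List[int], List[int]]:
    """Simulate collection; returns (active log, save log, event times)."""
    active: List[int] = []
    save: List[int] = []
    events: List[int] = []
    if not enabled:                               # operation 602
        return active, save, events
    peak = 0
    for t, value in enumerate(samples, start=1):  # one sample per second
        peak = max(peak, value)                   # operation 604: collect
        if t % monitor_period_s == 0:             # operation 606: period over?
            active.append(peak)                   # operation 608: log entry
            peak = 0
            if t % swap_period_s == 0:            # operation 610: bucket over?
                save, active = active, []         # operation 612: swap, clear
                events.append(t)                  # operation 614: notify host
    return active, save, events


# Ten one-second samples, a 3 second monitor period, a 6 second swap period.
active_log, save_log, event_times = run_routine_600(
    [1, 5, 3, 2, 2, 9, 4, 1, 1, 7], monitor_period_s=3, swap_period_s=6)
```

In this run the save log holds the peaks of the first two monitor periods, the active log holds the third, and the final sample is still being accumulated when the input ends.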
- FIG. 7 is a computer architecture diagram showing an illustrative computer hardware and software architecture for a data processing system 700 that can act as a host 100 for a MFND 102 that implements aspects of the technologies presented herein.
- the architecture illustrated in FIG. 7 can be utilized to implement a server computer, a mobile phone, an e-reader, a smartphone, a desktop computer, an AR/VR device, a tablet computer, a laptop computer, or another type of computing device that acts as a host 100 for the MFND 102 .
- the data processing system 700 illustrated in FIG. 7 includes a central processing unit 702 (“CPU”), a system memory 704 , including a random-access memory 706 (“RAM”) and a read-only memory (“ROM”) 708 , and a system bus 710 that couples the memory 704 to the CPU 702 .
- the data processing system 700 further includes a mass storage device 712 for storing an operating system 722 , application programs, and other types of programs.
- the mass storage device 712 might store the host agent 116 and the management API 118 .
- the mass storage device 712 can also be configured to store other types of programs and data.
- the mass storage device 712 is connected to the CPU 702 through a mass storage controller (not shown) connected to the bus 710 .
- the mass storage device 712 and its associated computer readable media provide non-volatile storage for the data processing system 700 .
- computer readable media can be any available computer storage media or communication media that can be accessed by the data processing system 700 .
- Communication media includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media.
- modulated data signal means a signal that has one or more of its characteristics changed or set in a manner so as to encode information in the signal.
- communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
- computer storage media can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, digital versatile disks (“DVD”), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and which can be accessed by the data processing system 700 .
- the data processing system 700 can operate in a networked environment using logical connections to remote computers through a network such as the network 720 .
- the data processing system 700 can connect to the network 720 through a network interface unit 716 connected to the bus 710 . It should be appreciated that the network interface unit 716 can also be utilized to connect to other types of networks and remote computer systems.
- the data processing system 700 can also include an input/output controller 718 for receiving and processing input from a number of other devices, including a keyboard, mouse, touch input, an electronic stylus (not shown in FIG. 7 ), or a physical sensor such as a video camera. Similarly, the input/output controller 718 can provide output to a display screen or other type of output device (also not shown in FIG. 7 ).
- the software components described herein, when loaded into the CPU 702 and executed, can transform the CPU 702 and the overall data processing system 700 from a general-purpose computing device into a special-purpose computing device customized to facilitate the functionality presented herein.
- the CPU 702 can be constructed from any number of transistors or other discrete circuit elements, which can individually or collectively assume any number of states. More specifically, the CPU 702 can operate as a finite-state machine, in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions can transform the CPU 702 by specifying how the CPU 702 transitions between states, thereby transforming the transistors or other discrete hardware elements constituting the CPU 702 .
- Encoding the software modules presented herein can also transform the physical structure of the computer readable media presented herein.
- the specific transformation of physical structure depends on various factors, in different implementations of this description. Examples of such factors include, but are not limited to, the technology used to implement the computer readable media, whether the computer readable media is characterized as primary or secondary storage, and the like.
- the computer readable media is implemented as semiconductor-based memory
- the software disclosed herein can be encoded on the computer readable media by transforming the physical state of the semiconductor memory.
- the software can transform the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory.
- the software can also transform the physical state of such components in order to store data thereupon.
- the computer readable media disclosed herein can be implemented using magnetic or optical technology.
- the software presented herein can transform the physical state of magnetic or optical media, when the software is encoded therein. These transformations can include altering the magnetic characteristics of particular locations within given magnetic media. These transformations can also include altering the physical features or characteristics of particular locations within given optical media, to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.
- the architecture shown in FIG. 7 for the data processing system 700 can be utilized to implement other types of computing devices, including hand-held computers, video game devices, embedded computer systems, mobile devices such as smartphones, tablets, and AR/VR devices, and other types of computing devices known to those skilled in the art. It is also contemplated that the data processing system 700 might not include all of the components shown in FIG. 7 , can include other components that are not explicitly shown in FIG. 7 , or can utilize an architecture completely different than that shown in FIG. 7 .
- FIG. 8 is a computing network architecture diagram showing an illustrative configuration for a distributed computing environment 800 in which computing devices hosting MFNDs 102 implementing the disclosed technologies can be utilized.
- the distributed computing environment 800 includes a computing environment 802 operating on, or in communication with, a network 804 .
- client devices 806 A- 806 N (hereinafter referred to collectively and/or generically as “clients 806 ”) can communicate with the computing environment 802 via the network 804 and/or other connections (not illustrated in FIG. 8 ).
- the clients 806 include a computing device 806 A such as a laptop computer, a desktop computer, or other computing device; a tablet computing device 806 B; a mobile computing device 806 C such as a smartphone, an on-board computer, or other mobile computing device; or a server computer 806 D.
- any number of devices 806 can communicate with the computing environment 802 .
- An example computing architecture for the devices 806 is illustrated and described above with reference to FIG. 7 . It should be understood that the illustrated devices 806 and computing architectures illustrated and described herein are illustrative only and should not be construed as being limited in any way.
- the computing environment 802 includes application servers 808 , data storage 810 , and one or more network interfaces 812 .
- the functionality of the application servers 808 can be provided by one or more server computers that are executing as part of, or in communication with, the network 804 .
- the application servers 808 can host various services, VMs, portals, and/or other resources.
- the application servers 808 can also be implemented using host computing devices 100 that include MFNDs 102 configured in the manner described herein.
- the application servers 808 host one or more virtual machines 104 for hosting applications, network services, or for providing other functionality. It should be understood that this configuration is illustrative only and should not be construed as being limiting in any way.
- the application servers 808 can also host or provide access to one or more portals, link pages, web sites, network services, and/or other information sites, such as web portals 816 .
- the application servers 808 also include one or more mailbox services 818 and one or more messaging services 820 .
- the mailbox services 818 can include electronic mail (“email”) services.
- the mailbox services 818 also can include various personal information management (“PIM”) services including, but not limited to, calendar services, contact management services, collaboration services, and/or other services.
- the messaging services 820 can include, but are not limited to, instant messaging services, chat services, forum services, and/or other communication services.
- the application servers 808 also might include one or more social networking services 822 .
- the social networking services 822 can include various social networking services including, but not limited to, services for sharing or posting status updates, instant messages, links, photos, videos, and/or other information; services for commenting or displaying interest in articles, products, blogs, or other resources; and/or other services. Other services are possible and are contemplated.
- the social networking services 822 also can include commenting, blogging, and/or micro blogging services. Other services are possible and are contemplated.
- the application servers 808 also can host other network services, applications, portals, and/or other resources (“other resources”) 824 .
- the other resources 824 can include, but are not limited to, document sharing, rendering, or any other functionality.
- the computing environment 802 can include data storage 810 .
- the functionality of the data storage 810 is provided by one or more databases operating on, or in communication with, the network 804 .
- the functionality of the data storage 810 also can be provided by one or more server computers configured to host data for the computing environment 802 .
- the data storage 810 can include, host, or provide one or more real or virtual data stores 826 A- 826 N (hereinafter referred to collectively and/or generically as “datastores 826 ”).
- the datastores 826 are configured to host data used or created by the application servers 808 and/or other data. Although not illustrated in FIG. 8 , the datastores 826 also can host or store web page documents, word processing documents, presentation documents, data structures, and/or other data utilized by any application program or another module. Aspects of the datastores 826 might be associated with a service for storing files.
- the computing environment 802 can communicate with, or be accessed by, the network interfaces 812 .
- the network interfaces 812 can include various types of network hardware and software for supporting communications between two or more computing devices including, but not limited to, the clients 806 and the application servers 808 . It should be appreciated that the network interfaces 812 also might be utilized to connect to other types of networks and/or computer systems.
- the distributed computing environment 800 described herein can implement aspects of at least some of the software elements described herein with any number of virtual computing resources and/or other distributed computing functionality that can be configured to execute any aspects of the software components disclosed herein.
- Clause 1 A computer-implemented method comprising: creating a child physical function on a multiple physical function non-volatile memory device (MFND); configuring the child physical function on the MFND to provide a specified Quality of Service (QoS) level; collecting child physical function QoS statistics for the child physical function; and providing the child physical function QoS statistics from the MFND to a host computing device.
- Clause 2 The computer-implemented method of clause 1, wherein the specified QoS level defines maximum read input/output operations per second (IOPS) and maximum write IOPS for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read IOPS and the maximum write IOPS provided by the child physical function during a monitoring period.
- Clause 3 The computer-implemented method of any of clauses 1 or 2, wherein the specified QoS level defines a maximum read bandwidth and a maximum write bandwidth for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read bandwidth and the maximum write bandwidth provided by the child physical function during a monitoring period.
- Clause 4 The computer-implemented method of any of clauses 1-3, wherein the child physical function QoS statistics for the child physical function specify a percentage of read operations and write operations performed by the child physical function during a monitoring period.
- Clause 5 The computer-implemented method of any of clauses 1-4, wherein the child physical function QoS statistics for the child physical function specify a size of input/output (I/O) workloads performed by the child physical function during a monitoring period.
- Clause 6 The computer-implemented method of any of clauses 1-5, wherein the MFND comprises a non-volatile memory device, and wherein the QoS statistics for the child physical function specify an amount of the non-volatile memory device utilized.
- Clause 7 The computer-implemented method of any of clauses 1-6, wherein the host computing device specifies a QoS statistics monitor period and a QoS statistics swap bucket period to the MFND, and wherein the method further comprises: storing the child physical function QoS statistics in an active log during the QoS statistics monitor period; and swapping the active log with a save log when the QoS statistics swap bucket period elapses, wherein the child physical function QoS statistics are provided from the MFND to the host computing device from the save log.
- Clause 8 The computer-implemented method of any of clauses 1-7, further comprising generating an asynchronous event from the MFND to the host computing device when the QoS statistics swap bucket period elapses.
- Clause 9 A multiple physical function non-volatile memory device (MFND) comprising: a non-volatile memory device; a parent physical function; and a child physical function configured to provide a Quality of Service (QoS) level specified by a host computing device configured to perform read or write operations on the non-volatile memory device, wherein the MFND is configured to collect child physical function QoS statistics for the child physical function, and provide the child physical function QoS statistics to the host computing device.
- Clause 10 The MFND of clause 9, wherein the specified QoS level defines maximum read input/output operations per second (IOPS) and maximum write IOPS for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read IOPS and the maximum write IOPS provided by the child physical function during a monitoring period.
- Clause 11 The MFND of any of clauses 9 or 10, wherein the specified QoS level defines a maximum read bandwidth and a maximum write bandwidth for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read bandwidth and the maximum write bandwidth provided by the child physical function during a monitoring period.
- Clause 14 The MFND of any of clauses 9-13, wherein the QoS statistics for the child physical function specify an amount of the non-volatile memory device utilized.
- Clause 15 The MFND of any of clauses 9-14, wherein the host computing device specifies a QoS statistics monitor period and a QoS statistics swap bucket period to the MFND, and wherein the MFND is further configured to: store the child physical function QoS statistics in an active log during the QoS statistics monitor period; swap the active log with a save log when the QoS statistics swap bucket period elapses; and generate an asynchronous event from the MFND to the host computing device when the QoS statistics swap bucket period elapses, wherein the child physical function QoS statistics are provided from the MFND to the host computing device from the save log.
- Clause 16 A non-transitory computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors, cause the one or more processors to: create a child physical function on a multiple physical function non-volatile memory device (MFND); configure the child physical function on the MFND to provide a specified Quality of Service (QoS) level; collect child physical function QoS statistics for the child physical function; and provide the child physical function QoS statistics from the MFND to a host computing device.
- Clause 17 The non-transitory computer-readable storage medium of clause 16, wherein the specified QoS level defines maximum read input/output operations per second (IOPS) and maximum write IOPS for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read IOPS and the maximum write IOPS provided by the child physical function during a monitoring period.
- Clause 18 The non-transitory computer-readable storage medium of any of clauses 16 or 17, wherein the specified QoS level defines a maximum read bandwidth and a maximum write bandwidth for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read bandwidth and the maximum write bandwidth provided by the child physical function during a monitoring period.
- Clause 19 The non-transitory computer-readable storage medium of any of clauses 16-18, wherein the child physical function QoS statistics comprise statistics selected from the group consisting of a percentage of read operations and write operations performed by the child physical function during a monitoring period, a size of input/output (I/O) workloads performed by the child physical function during a monitoring period, and an amount of the non-volatile memory device utilized.
- Clause 20 The non-transitory computer-readable storage medium of any of clauses 16-19, wherein the host computing device specifies a QoS statistics monitor period and a QoS statistics swap bucket period to the MFND, and wherein the non-transitory computer-readable storage medium has further computer-executable instructions stored thereupon to: store the child physical function QoS statistics in an active log during the QoS statistics monitor period; swap the active log with a save log when the QoS statistics swap bucket period elapses; and generate an asynchronous event from the MFND to the host computing device when the QoS statistics swap bucket period elapses, wherein the child physical function QoS statistics are provided from the MFND to the host computing device from the save log.
- computer-executable instructions include routines, programs, objects, modules, components, data structures, and the like that perform particular functions or implement particular abstract data types.
- the order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be executed in any order, combined in any order, subdivided into multiple sub-operations, and/or executed in parallel to implement the described processes.
- the described processes can be performed by resources associated with one or more device(s) such as one or more internal or external CPUs or GPUs, and/or one or more instances of hardware logic such as FPGAs, DSPs, or other types of accelerators.
- All of the methods and processes described above may be embodied in, and fully automated via, software code modules executed by one or more general purpose computers or processors.
- the code modules may be stored in any type of computer-readable storage medium or other computer storage device. Some or all of the methods may alternatively be embodied in specialized computer hardware.
Description
- Non-Volatile Memory Express (“NVMe”) is an open host controller interface and storage protocol specification for accessing non-volatile storage devices attached via a Peripheral Component Interconnect Express (“PCIe”) bus. Certain NVMe devices can expose multiple PCIe physical functions (“PFs”), such as independent NVMe controllers. These types of devices might be referred to herein as multiple physical function non-volatile memory devices (“MFNDs”).
- In a MFND, one PF, which might be referred to herein as a “parent PF,” can act as a parent controller to receive and execute administrative commands. Other physical functions on a MFND, which might be referred to herein as “child PFs” or “children PFs,” can act as child controllers that behave similarly to standard NVMe controllers. Through this mechanism, a MFND can enable the efficient sharing of input/output (“I/O”) resources between virtual machines (“VMs”) or bare metal instances. For example, child PFs can be directly assigned to and utilized by different VMs through various direct hardware access technologies, such as HYPER-V NVMe Direct or Discrete Device Assignment (“DDA”).
- Through the mechanism described above, the child PFs exposed by a single MFND can appear as multiple, separate physical devices to individual VMs. This allows individual VMs to directly utilize a portion of the available non-volatile storage space provided by a MFND with reduced central processing unit (“CPU”) and hypervisor overhead.
- Existing MFNDs, however, have limitations that restrict aspects of their functionality when used with VMs in the manner described above. As one specific example, it might not be possible to obtain detailed information regarding a VM's usage of the resources allocated to it by a MFND. Consequently, system administrators might not know when a VM is over or under-utilizing the resources provided by a MFND and, as a result, might not be able to make informed decisions regarding reallocating those MFND-provided resources or provisioning new MFND-provided resources. Current MFNDs can also suffer from other technical limitations, some of which are described in detail below.
- It is with respect to these and other technical challenges that the disclosure made herein is presented.
- Technologies are disclosed herein for collecting Quality of Service (“QoS”) statistics for in-use child PFs on MFNDs. Through implementations of the disclosed technologies, MFNDs can be configured to collect QoS statistics, referred to herein as “child PF QoS statistics,” for in-use child physical functions that describe the utilization of resources provided by the child PFs to VMs. The collected child PF QoS statistics can then be utilized to inform decisions regarding reallocation of MFND-provided resources and provisioning of new MFND-provided resources, thereby making more efficient utilization of MFND hardware. The child PF QoS statistics can also be collected in a manner that reduces the performance impact of collecting the child PF QoS statistics on the MFND and minimizes the use of non-volatile memory. Other technical benefits not specifically mentioned herein can also be realized through implementations of the disclosed subject matter.
- As discussed briefly above and in further detail below, the disclosed technologies include functionality for collecting QoS statistics for in-use child PFs of an MFND. In order to provide this functionality, a host computing device creates a child PF on a MFND and configures the child PF on the MFND to provide a specified QoS level to an associated VM executing on the host computing device. The host computing device also enables the MFND to collect child PF QoS statistics for the child PF.
- The collected child PF QoS statistics describe the utilization of resources provided by child PFs to assigned VMs. The MFND provides the child PF QoS statistics from the MFND to the host computing device. As discussed above, the collected child PF QoS statistics can then be utilized to inform decisions regarding reallocation of MFND-provided resources, provisioning of new MFND-provided resources, and potentially other types of decisions.
- In one embodiment, the specified QoS level defines maximum read input/output (“I/O”) operations per second (“IOPS”) and maximum write IOPS for the child PF. In this embodiment, the child PF QoS statistics for the child PF might specify a percentage of the maximum read IOPS and the maximum write IOPS provided by the child PF to the VM assigned to the child PF during a specified monitoring period.
- The specified QoS level might also, or alternately, define a maximum read bandwidth and a maximum write bandwidth for the child PF. In this case, the child PF QoS statistics for the child PF might specify a percentage of the maximum read bandwidth and the maximum write bandwidth provided by the child PF to the VM assigned to the child PF during the specified monitoring period.
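- The percentage-of-maximum statistics described above can be illustrated with a short calculation. This is a minimal sketch, not the device's actual accounting: the function names, the averaging over the monitoring period, and the sample numbers are all assumptions made for illustration.

```python
def average_rate(total: float, monitor_period_s: float) -> float:
    """Average rate (e.g. IOPS or bytes/s) over a monitoring period."""
    if monitor_period_s <= 0:
        raise ValueError("monitoring period must be positive")
    return total / monitor_period_s


def utilization_percent(observed_rate: float, configured_max: float) -> float:
    """Observed rate expressed as a percentage of the configured QoS maximum."""
    if configured_max <= 0:
        raise ValueError("configured maximum must be positive")
    return 100.0 * observed_rate / configured_max


# Hypothetical numbers: a child PF limited to 10,000 read IOPS completes
# 150,000 reads during a 60-second monitoring period.
read_iops = average_rate(150_000, 60)            # 2500.0 IOPS on average
read_pct = utilization_percent(read_iops, 10_000)  # 25.0 percent of the maximum
```

- The same two functions apply unchanged to bandwidth if `total` is a byte count and `configured_max` is a maximum bandwidth in bytes per second.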
- In other embodiments, the child PF QoS statistics for the child PF specify the percentage of read operations and write operations performed by the child PF during the specified monitoring period. The child PF QoS statistics for the child PF might also, or alternately, specify a size of I/O workloads performed by the child PF on behalf of an assigned VM during the monitoring period. The child PF QoS statistics might also, or alternately, specify an amount of the storage capacity of a non-volatile memory device on the MFND that is in use by a VM. Other types of child PF QoS statistics can be collected in the manner described herein in other embodiments.
- In some embodiments, the host computing device specifies the duration of a QoS statistics monitor period and the duration of a QoS statistics swap bucket period to the MFND. In these embodiments, the MFND is further configured to store the child physical function QoS statistics in a log, which might be referred to herein as the “active log,” during the duration of the QoS statistics monitor period. When the QoS statistics swap bucket period elapses, the MFND swaps the active log with another log, which might be referred to herein as the “save log.” In these embodiments, the MFND provides the child PF QoS statistics from the MFND to the host computing device from the save log. In some embodiments, the MFND also generates a notification, such as an asynchronous event, to the host computing device when the QoS statistics swap bucket period elapses. In response thereto, the host computing device may request the child PF QoS statistics from the MFND.
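- The active log and save log arrangement described above resembles a double-buffering scheme, which can be sketched as follows. This is an illustrative model only: the class name, the counter names, and the callback standing in for the asynchronous event are hypothetical, and a real device would perform the swap in firmware.

```python
from typing import Callable, Dict, Optional


class ChildPfQosLogs:
    """Toy model of a child PF QoS statistics active log and save log."""

    def __init__(self, on_swap: Optional[Callable[[], None]] = None) -> None:
        self.active: Dict[str, int] = {}
        self.save: Dict[str, int] = {}
        self._on_swap = on_swap  # stands in for the asynchronous event

    def record(self, counter: str, amount: int = 1) -> None:
        """Accumulate a statistic in the active log."""
        self.active[counter] = self.active.get(counter, 0) + amount

    def swap(self) -> None:
        """Called when the swap bucket period elapses.

        The filled active log becomes the save log and a fresh, empty
        active log takes its place, done in a single assignment.
        """
        self.active, self.save = {}, self.active
        if self._on_swap is not None:
            self._on_swap()

    def read_save_log(self) -> Dict[str, int]:
        """Return a copy of the save log, as the host would retrieve it."""
        return dict(self.save)


events = []
logs = ChildPfQosLogs(on_swap=lambda: events.append("swap"))
logs.record("read_ops", 3)
logs.record("write_ops")
logs.swap()               # swap bucket period elapses
logs.record("read_ops")   # collection continues in the new active log
snapshot = logs.read_save_log()
```

- Serving host requests from the save log means the host reads a stable snapshot while collection continues uninterrupted in the active log.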
- It should be appreciated that the above-described subject matter can be implemented as a computer-controlled apparatus, a computer-implemented method, a computing device, or as an article of manufacture such as a computer-readable medium. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.
- This Summary is provided to introduce a brief description of some aspects of the disclosed technologies in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
- FIG. 1 is a computing architecture diagram that shows aspects of the configuration and operation of a MFND that can implement the embodiments disclosed herein, according to one embodiment;
- FIG. 2A is a computing architecture diagram showing aspects of one mechanism disclosed herein for creating child PFs on a MFND, according to one embodiment;
- FIG. 2B is a computing architecture diagram showing aspects of one mechanism disclosed herein for setting the QoS for a child PF on a MFND, according to one embodiment;
- FIG. 2C is a computing architecture diagram showing aspects of one mechanism disclosed herein for enabling the collection of child PF QoS statistics by a MFND, according to one embodiment;
- FIG. 2D is a computing architecture diagram showing aspects of one mechanism disclosed herein for retrieving child PF QoS statistics from a MFND, according to one embodiment;
- FIG. 3 is a computing architecture diagram showing aspects of one mechanism disclosed herein for swapping a child PF QoS statistics active log and a child PF QoS statistics save log on a MFND, according to one embodiment;
- FIG. 4 is a data structure diagram showing an illustrative configuration for a child PF QoS statistics log maintained by a MFND, according to one embodiment;
- FIG. 5 is a flow diagram showing a routine that illustrates aspects of a method for configuring child PFs on a MFND, according to one embodiment disclosed herein;
- FIG. 6 is a flow diagram showing a routine that illustrates aspects of a method for collecting QoS statistics for in-use child PFs of a MFND, according to one embodiment disclosed herein;
- FIG. 7 is a computer architecture diagram showing an illustrative computer hardware and software architecture for a computing device that can act as a host for a MFND that implements aspects of the technologies presented herein; and
- FIG. 8 is a computing network architecture diagram showing an illustrative configuration for a computing environment in which computing devices hosting MFNDs implementing the disclosed technologies can be utilized.
- The following detailed description is directed to technologies for collecting QoS statistics for in-use child PFs on MFNDs. As discussed briefly above, MFNDs implementing the disclosed technologies can collect child PF QoS statistics for in-use child PFs that describe the utilization of resources provided by the child PFs to VMs. The child PF QoS statistics can be collected and stored in a manner that reduces the performance impact of collecting the child PF QoS statistics on the MFND and minimizes the use of volatile and non-volatile memory.
- Through the use of the disclosed functionality, child PF QoS statistics can be utilized to inform decisions regarding reallocation of MFND-provided resources and provisioning of new MFND-provided resources, thereby making more efficient utilization of MFND hardware. Other technical benefits not specifically mentioned herein can also be realized through implementations of the disclosed subject matter.
- While the subject matter described herein is presented in the general context of NVMe multiple physical function devices, those skilled in the art will recognize that the technologies disclosed herein can be used with other types of multiple physical function devices, including other types of multiple physical function non-volatile memory devices. Those skilled in the art will also appreciate that the subject matter described herein can be practiced with various computer system configurations, including host computers in a distributed computing environment, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, computing or processing systems embedded in devices (such as wearable computing devices, automobiles, home automation etc.), minicomputers, mainframe computers, and the like.
- In the following detailed description, references are made to the accompanying drawings that form a part hereof, and which are shown by way of illustration specific configurations or examples. Referring now to the drawings, in which like numerals represent like elements throughout the several FIGS., aspects of various technologies for collecting QoS statistics for in-use child PFs on MFNDs will be described.
- FIG. 1 is a computing architecture diagram that shows aspects of the configuration and operation of a MFND 102 that can implement the embodiments disclosed herein, according to one embodiment. As discussed briefly above, the MFND 102 is an NVMe Specification-compliant device in some embodiments. The MFND 102 can be hosted by a host computing device 100 (which might be referred to herein simply as a “host”), such as a server computer operating in a distributed computing network such as that described below with reference to FIG. 8.
- As also discussed briefly above, NVMe is an open logical device interface specification for accessing non-volatile storage media. In some embodiments, an NVMe device is accessible via a PCIe bus. In other embodiments, an NVMe device is accessible via a network or other packet-based interface. The NVMe Specification defines a register interface, command set, and collection of features for PCIe-based solid-state storage devices (“SSDs”) with the goals of high performance and interoperability across a broad range of non-volatile memory subsystems. The NVMe Specification does not stipulate the ultimate usage model, such as solid-state storage, main memory, cache memory or backup memory.
- NVMe provides an alternative to the Small Computer System Interface (“SCSI”) standard and the Advanced Technology Attachment (“ATA”) standard for connecting and transmitting data between a host computing device 100 and a peripheral target storage device. The ATA command set in use with Serial ATA (“SATA”) SSDs and the SCSI command set for Serial Attached SCSI (“SAS”) SSDs were developed at a time when hard disk drives (“HDDs”) and tape were the primary storage media. NVMe was designed for use with faster media.
- The main benefits of NVMe-based PCIe SSDs over SAS-based and SATA-based SSDs are reduced latency in the host software stack, higher IOPS and potentially lower power consumption, depending on the form factor and the number of PCIe lanes in use.
- NVMe can support SSDs that use different types of non-volatile memory, including NAND flash and the 3D XPOINT technology developed by INTEL and MICRON TECHNOLOGY. Supported form factors include add-in PCIe cards, M.2 SSDs and U.2 2.5-inch SSDs. NVMe reference drivers are available for a variety of operating systems, including the WINDOWS and LINUX operating systems. Accordingly, it is to be appreciated that the MFND 102 described herein is not limited to a particular type of non-volatile memory, form factor, or operating system.
- As described briefly above, the MFND 102 described herein includes capabilities for exposing multiple PFs 112A-112N to the host computing device 100. Each of the PFs 112A-112N is an independent NVMe controller in one embodiment. The PFs 112A-112N are other types of controllers in other embodiments. At least a plurality of PFs 112A-112N are independent NVMe controllers and at least one distinct PF 112A-112N is a non-NVMe controller in other embodiments.
- One PF 112A in the MFND 102, which might be referred to herein as the “parent PF 112A” or the “parent controller 112A,” acts as a parent controller. In one embodiment, for instance, the parent PF 112A is the privileged PCIe function zero of the MFND 102. In this regard, it is to be appreciated that the parent controller might be configured as another PCIe function number in other embodiments. The parent PF 112A and child PFs described below might also be device types other than NVMe devices in some embodiments.
- The parent PF 112A can act as a parent controller to receive and execute administrative commands 110 generated by a root partition 108. In particular, and as described in greater detail below, the parent PF 112A can manage child PFs 112B-112N such as, for example, by creating, deleting, modifying, and querying the child PFs 112B-112N. The child PFs 112B-112N might be referred to herein interchangeably as “child PFs 112B-112N” or “child controllers 112B-112N.”
- The child PFs 112B-112N are regular PCIe physical functions of the MFND 102. The child PFs 112B-112N can behave like regular and independent NVMe controllers. The child controllers 112B-112N can also support the administrative and I/O commands defined by the NVMe Specification.
- Through the use of the multiple PFs 112A-112N exposed by the MFND 102, I/O resources provided by the MFND 102 can be efficiently shared between VMs 104A-104N. For instance, child PFs 112B-112N can be directly assigned to different VMs 104A-104N, respectively, through various direct hardware access technologies such as HYPER-V NVMe Direct or DDA. In this way, the child PFs 112B-112N exposed by a single MFND 102 can appear as multiple, separate physical devices to individual VMs 104A-104N, respectively. This allows individual VMs 104A-104N to directly utilize a respective portion of the available storage space provided by a non-volatile memory device 103 on the MFND 102 with reduced CPU and hypervisor 106 overhead.
- In some configurations, the host computing device 100 operates in a distributed computing network, such as that described below with regard to FIG. 8. Additionally, the host computing device 100 executes a host agent 116 and a management application programming interface (“API”) 118 in order to enable access to aspects of the functionality disclosed herein in some embodiments.
- The host agent 116 can receive commands from other components, such as other components in a distributed computing network such as that described below with regard to FIG. 8, and make calls to the management API 118 to implement the commands. In particular, the management API 118 can issue administrative commands to the parent PF 112A to perform the various functions described herein. Details regarding various methods exposed by the management API 118 to the host agent 116 for implementing the functionality disclosed herein are described below.
- In some embodiments, the MFND 102 has two modes of operation: regular user mode and super administrator mode. In regular user mode, only read-only functions can be executed. The non-read-only management functions described herein (e.g. set the QoS level for a PF, etc.) must be executed in the super administrator mode. If an attempt is made to execute these functions in regular user mode, an error (which might be referred to herein as an “ERROR_ACCESS_DENIED” error) will be returned. The API 118 exposes methods for getting the device operation mode (which might be referred to herein as the “GetDeviceOperationMode” method) and switching the device operation mode (which might be referred to herein as the “SwitchDeviceOperationMode” method) in some embodiments.
- As discussed briefly above, existing MFNDs 102 have limitations that restrict aspects of their functionality when used with VMs 104 in the manner described above. As one specific example, it might not be possible to obtain detailed information regarding a VM's 104 usage of the resources allocated to it by a MFND 102. Consequently, system administrators might not know when a VM 104 is over or under-utilizing the resources provided by a MFND 102 and, as a result, might not be able to make informed decisions regarding reallocating those MFND-provided resources or provisioning new MFND-provided resources. The technologies presented herein address these and potentially other technical considerations by enabling collection of QoS statistics for in-use child PFs 112B-112N of a MFND 102. Additional details regarding these aspects will be provided below.
- FIG. 2A is a computing architecture diagram showing aspects of one mechanism disclosed herein for creating child PFs 112 on a MFND 102, according to one embodiment. As shown in FIG. 2A, the host agent 116 can create a new child PF 112B on the MFND 102 by calling an appropriate method exposed by the management API 118. In response thereto, the management API 118 issues a command 110A to the parent PF 112A to create the desired child PF 112B. The MFND 102, in turn, creates the child PF 112B. Thereafter, a VM 104A may be assigned to the child PF 112B. Additional details regarding the creation of child PFs 112B on a MFND 102 and assignment of a VM 104A to a child PF 112B will be provided below with regard to FIG. 5.
- FIG. 2B is a computing architecture diagram showing aspects of one mechanism disclosed herein for setting the QoS level for a child PF 112B on a MFND 102, according to one embodiment. As discussed briefly above, once a child PF 112B has been created in the manner described above with regard to FIG. 2A, the MFND 102 can provide functionality for managing the QoS level provided by the child PFs 112B. For example, and without limitation, implementations of the disclosed technologies can also enable a host agent 116 to query and modify the QoS level provided by a child PF 112B of a MFND 102.
- In some embodiments, the MFND 102 supports multiple storage service level agreements (“SLAs”). Each SLA defines a different QoS level to be provided by a PF 112A-112N. QoS levels that can be supported by child PFs 112 on the MFND 102 include, but are not limited to, a “reserve mode” wherein a child PF 112 is allocated at least a specified minimum amount of bandwidth and IOPS, a “limit mode” wherein a child PF 112 is allocated at most a specified maximum amount of bandwidth and IOPS, and a “mixed mode” wherein a child PF 112 is allocated at least a specified minimum amount of bandwidth and IOPS but at most a specified maximum amount of bandwidth and IOPS. Other QoS levels can be implemented in other embodiments.
- The embodiments disclosed herein allow the parent PF 112A to individually define the QoS level for each child PF 112B-112N in a single MFND 102. For instance, the parent PF 112A might define the minimum and/or maximum bandwidth and/or IOPS to be supported by each child PF 112B-112N. In order to provide this functionality, the host agent 116 can call a method exposed by the management API 118. In response to such a call, the management API 118 issues a command 110B to the parent physical function 112A that includes QoS settings 202 for a child PF 112B. The child PF 112B then utilizes the QoS settings 202 when processing requests from an assigned VM 104A.
- One illustrative method for modifying the settings of child PFs 112B-112N (which might be referred to herein as the “UpdateChildPhysicalFunctionSettings” method) takes an identifier (e.g. a handle) to a MFND 102, an identifier (e.g. a serial number) of a child PF 112, and a pointer to a data structure containing the QoS settings 202 for the child PF 112 as input. The data structure can include data specifying the resources (e.g. the amount of storage space, namespaces, and interrupt vectors that the identified PF 112 is to use) and the QoS level that are to be assigned to the identified child PF 112. The UpdateChildPhysicalFunctionSettings method returns a success message if the supplied settings were successfully applied to the identified child PF 112 and otherwise returns an error code.
- An illustrative method for querying the settings of child PFs 112B-112N (which might be referred to herein as the “QueryChildPhysicalFunctionSettings” method) takes an identifier (e.g. a handle) for a MFND 102 and an identifier (e.g. a serial number) of a child PF 112 as input. The QueryChildPhysicalFunctionSettings method returns a pointer to a data structure containing the current settings of the identified child PF 112. As discussed above, such a data structure can include data specifying the resources (e.g. the amount of storage space, namespaces, and interrupt vectors that the PF 112 can use) and QoS settings 202 that are currently assigned to the identified child PF 112.
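- The update and query methods and the three QoS modes described above can be modeled with a small in-memory mock. This sketch is not the management API 118 itself: the class names, field names, and Python method names are hypothetical, only IOPS limits are modeled (bandwidth would follow the same pattern), and resource settings such as namespaces and interrupt vectors are omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class QosMode(Enum):
    RESERVE = "reserve"  # at least a specified minimum
    LIMIT = "limit"      # at most a specified maximum
    MIXED = "mixed"      # at least a minimum, at most a maximum


@dataclass(frozen=True)
class QosSettings:
    mode: QosMode
    min_iops: Optional[int] = None
    max_iops: Optional[int] = None

    def validate(self) -> None:
        """Check that the bounds required by the chosen mode are present."""
        if self.mode in (QosMode.RESERVE, QosMode.MIXED) and self.min_iops is None:
            raise ValueError("this mode requires min_iops")
        if self.mode in (QosMode.LIMIT, QosMode.MIXED) and self.max_iops is None:
            raise ValueError("this mode requires max_iops")
        if self.mode is QosMode.MIXED and self.min_iops > self.max_iops:
            raise ValueError("min_iops must not exceed max_iops")


class MockMfnd:
    """In-memory stand-in for a device, keyed by child PF serial number."""

    def __init__(self) -> None:
        self._settings: Dict[str, QosSettings] = {}

    def update_child_pf_settings(self, serial: str, settings: QosSettings) -> None:
        settings.validate()
        self._settings[serial] = settings

    def query_child_pf_settings(self, serial: str) -> QosSettings:
        return self._settings[serial]


device = MockMfnd()
device.update_child_pf_settings(
    "PF-0001", QosSettings(QosMode.MIXED, min_iops=1_000, max_iops=10_000)
)
current = device.query_child_pf_settings("PF-0001")
```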
- FIG. 2C is a computing architecture diagram showing aspects of one mechanism disclosed herein for enabling the collection of child PF QoS statistics 210 by a MFND 102, according to one embodiment. As shown in FIG. 2C, the host agent 116 can configure the MFND 102 to collect child PF QoS statistics 210 by calling an appropriate method on the management API 118. In response thereto, the management API 118 issues a command 110C to the parent physical function 112A instructing the MFND 102 to enable the collection of the child PF QoS statistics 210. The MFND 102 stores the child PF QoS statistics 210 in a child PF statistics log 208 in one embodiment. Details regarding the configuration and use of the child PF statistics log 208 will be provided below with respect to FIGS. 3 and 4.
- In one embodiment a single command 110C can be utilized to enable collection of child PF QoS statistics 210 for all in-use child PFs 112B-112N. Alternately, per-child PF 112 commands 110C can be issued to enable collection of child PF QoS statistics 210 by individual child PFs 112B-112N.
- As also shown in FIG. 2C, the command 110C specifies a QoS statistics monitor period 204 and a QoS statistics swap bucket period 206 in some embodiments. As will be described in greater detail below, the QoS statistics monitor period 204 specifies the duration of a monitoring period during which the MFND 102 is to collect the child PF QoS statistics 210. In one embodiment, the QoS statistics monitor period 204 is specified in seconds with a minimum value of 60 seconds and increments of 30 seconds. The QoS statistics monitor period 204 might be specified in other ways in other embodiments.
- The QoS statistics swap bucket period 206 defines a period of time after which the MFND 102 is to swap an “active log” with a “save log.” In these embodiments, the MFND 102 is further configured to store the child physical function QoS statistics 210 in the active log during the duration of the QoS statistics monitor period 204. In one embodiment, the QoS statistics swap bucket period 206 is specified in minutes, with a minimum value of 30 minutes and a maximum value of 1440 minutes. The QoS statistics swap bucket period 206 might be specified in other ways in other embodiments. Additional details regarding the contents and use of the active and save logs will be provided below with regard to FIGS. 3 and 4.
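- The period constraints given for this embodiment can be expressed as a simple host-side check before issuing the enable command. This is a sketch under stated assumptions: the function and constant names are hypothetical, and the 30-second increments are assumed to be counted from the 60-second minimum (so 60, 90, 120, and so on are valid).

```python
MIN_MONITOR_PERIOD_S = 60    # minimum monitor period, in seconds
MONITOR_PERIOD_STEP_S = 30   # monitor period increment, in seconds
MIN_SWAP_BUCKET_MIN = 30     # minimum swap bucket period, in minutes
MAX_SWAP_BUCKET_MIN = 1440   # maximum swap bucket period, in minutes (24 hours)


def validate_periods(monitor_period_s: int, swap_bucket_period_min: int) -> None:
    """Raise ValueError if either period falls outside the described ranges."""
    if monitor_period_s < MIN_MONITOR_PERIOD_S:
        raise ValueError("monitor period must be at least 60 seconds")
    if (monitor_period_s - MIN_MONITOR_PERIOD_S) % MONITOR_PERIOD_STEP_S != 0:
        raise ValueError("monitor period must be in 30-second increments")
    if not MIN_SWAP_BUCKET_MIN <= swap_bucket_period_min <= MAX_SWAP_BUCKET_MIN:
        raise ValueError("swap bucket period must be 30 to 1440 minutes")


# A 90-second monitor period with a 60-minute swap bucket period is accepted.
validate_periods(90, 60)
```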
FIG. 2D is a computing architecture diagram showing aspects of one mechanism disclosed herein for retrieving child PF QoS statistics 210 from a MFND 102, according to one embodiment. FIG. 2D will be described in conjunction with FIG. 3, which is a computing architecture diagram showing aspects of one mechanism disclosed herein for swapping a child PF QoS statistics active log 302, which might be referred to simply as the “active log 302,” and a child PF QoS statistics save log 304, which might be referred to simply as the “save log 304,” on a MFND 102, according to one embodiment. - As described briefly above, the child PF QoS statistics log 208 is implemented using two separate logs, the child PF QoS statistics
active log 302 and the child PF QoS statistics save log 304, in some embodiments. When the QoS statistics swap bucket period 206 described above elapses, the MFND 102 swaps the active log 302 with the save log 304 and clears the active log 302. This can be performed as an atomic operation in order to avoid corruption of the logs. - The MFND 102 provides the child PF QoS statistics 210 from the save log 304 in response to requests 308 received from the host computing device 100. The MFND 102 also provides functionality for enabling the host computing device 100 to retrieve the contents of the active log 302 in some embodiments. - In some embodiments, the
MFND 102 also generates a notification, such as an asynchronous event 306, to the host computing device 100 when the QoS statistics swap bucket period 206 elapses. In response to receiving the notification, the host computing device 100 may issue a command 110D to the MFND 102 to retrieve the child PF QoS statistics 210 from the MFND 102. In response thereto, the MFND 102 retrieves the child PF QoS statistics 210 from the save log 304 and returns them to the host 100. In turn, the host agent 116 might provide the child PF QoS statistics 210 to a remote management system 212 or another component. - In one embodiment, the specified QoS level defines maximum read IOPS and maximum write IOPS for a
child PF 112B. In this embodiment, the child PF QoS statistics 210 for the child PF 112B specify the maximum read IOPS and the maximum write IOPS provided by the child PF 112B to the VM 104A assigned to the child PF 112B during the QoS statistics monitor period 204. - The maximum read IOPS and the maximum write IOPS are specified as a percentage of the maximum read IOPS and the maximum write IOPS specified by the QoS level for the
child PF 112B in some embodiments. Expressing each observed maximum as a percentage of the corresponding maximum specified by the QoS level allows the maximum read IOPS and the maximum write IOPS to each be expressed using only a single byte, thereby saving space on the non-volatile memory device 103. - The specified QoS level might also, or alternately, define a maximum read bandwidth and a maximum write bandwidth for the
child PF 112B. In this case, the child PF QoS statistics 210 for the child PF 112B specify the maximum read bandwidth and the maximum write bandwidth provided by the child PF 112B to the VM 104A assigned to the child PF 112B during the specified QoS statistics monitor period 204. - In some embodiments, the maximum read bandwidth and the maximum write bandwidth are specified as a percentage of the maximum read bandwidth and the maximum write bandwidth specified by the QoS level for the
child PF 112B. Expressing each observed maximum as a percentage of the corresponding maximum specified by the QoS level allows the maximum read bandwidth and the maximum write bandwidth to each be expressed using only a single byte, thereby saving space on the non-volatile memory device 103. - In other embodiments, the child
PF QoS statistics 210 for the child PF 112B specify a percentage of read operations and write operations performed by the child PF 112B during the specified QoS statistics monitor period 204. The child PF QoS statistics 210 for the child PF 112B might also, or alternately, specify a size of I/O workloads performed by the child PF 112B on behalf of an assigned VM 104A during the QoS statistics monitor period 204. - The child
PF QoS statistics 210 might also, or alternately, specify an amount of the storage capacity of a non-volatile memory device 103 on the MFND 102 that is in use by a VM 104A. In these embodiments, the amount of storage capacity in use may be obtained from the MFND 102 by issuing an identified child controller command to a child PF 112B to retrieve the Namespace Utilization field (“NUSE”) defined by the NVMe Specification. Other types of child PF QoS statistics 210, such as but not limited to read/write I/O command latency and bytes written to media, can be collected in the manner described herein in other embodiments. -
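The single-byte percentage representation described above can be sketched as follows. This is a minimal illustration only: the function name `to_percent_byte` is hypothetical, and both the rounding to the nearest integer and the clamping to the range of one byte are assumptions, since the description does not state how out-of-range or fractional values are handled.

```python
def to_percent_byte(observed_max: float, qos_limit: float) -> int:
    """Express an observed maximum as a percentage of the QoS-level limit.

    Assumptions (not stated in the description): the result is rounded to
    the nearest integer and clamped to 0..255 so it always fits in one byte.
    """
    if qos_limit <= 0:
        raise ValueError("QoS limit must be positive")
    percent = round(100 * observed_max / qos_limit)
    return max(0, min(255, percent))


# A child PF limited to 10,000 read IOPS that peaked at 7,500 read IOPS
# during the monitor period is recorded as the single byte value 75.
encoded = bytes([to_percent_byte(7_500, 10_000)])
```

Storing the ratio rather than the raw IOPS or bandwidth figure is what keeps each statistic to one byte regardless of how large the configured limit is.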
FIG. 4 is a data structure diagram showing an illustrative configuration for the child PF QoS statistics log 208 maintained by a MFND 102, according to one embodiment. As shown in FIG. 4, the child PF QoS statistics log 208 includes the fields 402A-402P in the illustrated embodiment. In this regard, it is to be appreciated that the illustrated configuration is merely illustrative, and that other types and configurations of data might be utilized. It is to be further appreciated that a single child PF QoS statistics log 208 might store the child PF QoS statistics 210 for all of the in-use child PFs 112B-112N on a MFND 102 or separate child PF QoS statistics logs 208 might be maintained for each of the in-use child PFs 112B-112N. - The
field 402A stores data indicating a version number identified with the format of the child PF QoS statistics log 208. The version number might be modified following changes to the format of the child PF QoS statistics log 208. - The
field 402B stores a sequence number that is incremented whenever an active log 302 is generated (i.e., after each QoS statistics swap bucket period 206 elapses). When the value reaches 255 and a new active log 302 is generated, the value is reset to zero. - The
field 402C stores data identifying the number of log entries in the child PF QoS statistics log 208. As described in greater detail below, the log entries are stored in the fields 402G-402J. - The
field 402D stores data identifying the child PF QoS statistics monitor period 204 and the field 402E stores data identifying the child PF QoS statistics swap bucket period 206 described above. The field 402F stores a timestamp associated with the first log entry in the child PF QoS statistics log 208. In one embodiment, the timestamp uses the data format for a timestamp as defined by the NVMe Specification. If the host computing device 100 does not set the timestamp, this field contains the time since the MFND 102 last powered up. - As discussed briefly above, the
fields 402G-402J contain log entries containing the child PF QoS statistics 210. In the illustrated example, for instance, each log entry includes fields 402M-402P specifying the maximum read IOPS percentage, the maximum write IOPS percentage, the maximum read bandwidth percentage, and the maximum write bandwidth percentage during the monitoring period, respectively. As discussed above, the log entries can include other types of child PF QoS statistics 210, some of which were described above, in other embodiments. The field 402K contains a version number for the log entries and the field 402L stores a globally unique identifier (“GUID”) associated with the log entries. -
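The log layout described above can be modeled, purely for illustration, with the following Python sketch. The class and attribute names are hypothetical, and no field widths or byte offsets are implied, because the description does not specify them; only the field roles (402A-402P) and the wrap of the sequence number after 255 are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogEntry:
    entry_version: int        # field 402K
    guid: str                 # field 402L
    max_read_iops_pct: int    # field 402M
    max_write_iops_pct: int   # field 402N
    max_read_bw_pct: int      # field 402O
    max_write_bw_pct: int     # field 402P


@dataclass
class ChildPfQosStatsLog:
    version: int                  # field 402A: log format version
    sequence: int                 # field 402B: wraps to 0 after 255
    monitor_period: int           # field 402D
    swap_bucket_period: int       # field 402E
    first_entry_timestamp: int    # field 402F
    entries: List[LogEntry] = field(default_factory=list)  # fields 402G-402J

    @property
    def entry_count(self) -> int:  # field 402C
        return len(self.entries)


def next_sequence(current: int) -> int:
    """Sequence number for a newly generated active log."""
    return 0 if current >= 255 else current + 1
```

A host-side parser could use the version in field 402A to decide how to interpret the remaining fields, which is one reason to keep the format version separate from the per-entry version in field 402K.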
FIG. 5 is a flow diagram showing a routine 500 that illustrates aspects of a method for configuring child PFs 112 on aMFND 102, according to one embodiment disclosed herein. It should be appreciated that the logical operations described herein with regard toFIG. 5 , and the other FIGS., can be implemented (1) as a sequence of computer implemented acts or program modules running on a computing device and/or (2) as interconnected machine logic circuits or circuit modules within a computing device. - The particular implementation of the technologies disclosed herein is a matter of choice dependent on the performance and other requirements of the computing device. Accordingly, the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These states, operations, structural devices, acts and modules can be implemented in hardware, software, firmware, in special-purpose digital logic, and any combination thereof. It should be appreciated that more or fewer operations can be performed than shown in the FIGS. and described herein. These operations can also be performed in a different order than those described herein.
- The routine 500 begins at
operation 502, where the host agent 116 can enumerate some or all of the MFND devices 102 that are present in a host 100. One particular method (which might be referred to herein as the “GetMFNDList” method) for enumerating the MFND devices 102 connected to a host 100 returns device paths of all MFND devices 102 connected to a host 100. If no MFND devices 102 are connected, or none are enumerated, the GetMFNDList method returns an error code. - From
operation 502, the routine 500 proceeds to operation 504, where the host agent 116 can enumerate the child PFs 112B-112N that are currently present on a MFND device 102 identified at operation 502. One method (which might be referred to herein as the “GetChildPhysicalFunctionList” method) for enumerating the child PFs 112B-112N on a MFND device 102 takes an identifier (e.g. a handle) for a particular MFND device 102 as input and returns adapter serial numbers of all child PFs 112B-112N on the identified device. - From
operation 504, the routine 500 proceeds to operation 506, where the host agent 116 can determine the capabilities of the MFND device 102 identified at operation 502. For example, the host agent 116 can determine the maximum number of child PFs 112B-112N supported by the MFND device 102. - One method (which might be referred to herein as the “GetDeviceCapability” method) for getting the capabilities of a
MFND device 102 takes an identifier (e.g. a handle) for a particular MFND device 102 as input and returns a device capability structure that specifies the capabilities of the identified device. In one embodiment, the device capability structure includes data identifying the maximum and available child PFs 112B-112N, I/O queue pair count, interrupt count, namespace count, storage size, bandwidth, and IOPS of the identified device. The device capability structure might include additional or alternate data in other embodiments. - Once the capabilities of the
MFND device 102 have been determined, the routine 500 can proceed from operation 506 to operation 508, where child PFs 112B-112N can be created or deleted on the MFND device 102. By default, the MFND 102 has only one PF 112, the parent PF 112A, which is reserved for receiving administrative commands 110 from the root partition 108. - In order to assign
individual child PFs 112B-112N to VMs 104A-104N, the child PFs 112B-112N are first created. The newly created child PFs 112B-112N will appear to the host 100 following a reboot. One method for creating child PFs 112B-112N (which might be referred to herein as the “CreateChildPhysicalFunction” method) takes an identifier (e.g. a handle) to a MFND 102 and a pointer to a data structure containing the settings for the new child PF 112 as input. The data structure can include data specifying the resources (e.g. the amount of storage space, namespaces, and interrupt vectors that the new PF 112 can use) and QoS level that are to be assigned to the new child PF 112. The CreateChildPhysicalFunction method returns an identifier (e.g. a serial number) for the new child PF 112 as output if it completes successfully. -
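The creation flow described above can be sketched with an in-memory stand-in for a MFND 102. Everything here is hypothetical: the class `FakeMfnd`, its method names, the settings fields, and the serial number format are illustrative assumptions and do not reflect the actual signatures of the management API 118. The sketch only mirrors two behaviors stated in the text, namely that creation returns an identifier for the new child PF and that new child PFs appear to the host only after a reboot.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChildPfSettings:
    """Hypothetical settings structure passed when creating a child PF."""
    storage_bytes: int
    namespaces: int
    interrupt_vectors: int
    qos_level: str


@dataclass
class FakeMfnd:
    """In-memory stand-in for a MFND; not a real driver interface."""
    max_child_pfs: int
    visible: Dict[str, ChildPfSettings] = field(default_factory=dict)
    pending: Dict[str, ChildPfSettings] = field(default_factory=dict)
    next_id: int = 1

    def get_child_physical_function_list(self) -> List[str]:
        # Only child PFs that the host can currently see are listed.
        return sorted(self.visible)

    def create_child_physical_function(self, settings: ChildPfSettings) -> str:
        if len(self.visible) + len(self.pending) >= self.max_child_pfs:
            raise RuntimeError("maximum number of child PFs reached")
        serial = f"CHILD-{self.next_id:04d}"  # made-up serial format
        self.next_id += 1
        self.pending[serial] = settings
        return serial

    def reboot_host(self) -> None:
        # Newly created child PFs become visible after the host reboots.
        self.visible.update(self.pending)
        self.pending.clear()
```

Separating pending from visible child PFs in the stand-in makes the reboot requirement explicit, which is why creating all needed child PFs up front avoids later reboots.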
Child PFs 112B-112N and their settings will persist across reboots of the host 100, so the maximum number of child PFs 112B-112N to be supported may be initially created to avoid rebooting the host 100 in the future. If a MFND 102 already has child PFs 112B-112N, either as a result of a manufacturing configuration or previous user configuration, additional child PFs 112B-112N can be created or deleted in order to configure the MFND 102 with the desired number of child PFs 112B-112N to be supported. - One method for deleting
child PFs 112B-112N (which might be referred to herein as the “DeleteChildPhysicalFunction” method) takes an identifier for a MFND 102 (e.g. a handle) and the serial number for the child PF 112 to be deleted as input. The DeleteChildPhysicalFunction method returns a success message if the identified child PF 112 was successfully deleted and otherwise returns an error code. - Once the
host 100 has rebooted, the routine 500 proceeds from operation 510 to operation 512, where the QoS levels for the newly created child PFs 112B-112N are set in the manner described above with regard to FIG. 2B. Once the QoS levels have been set for the child PFs 112B-112N, the routine 500 proceeds from operation 512 to operation 514, where the MFND 102 enables the collection of child PF QoS statistics 210 for in-use child PFs 112B-112N of the MFND 102 in the manner described above with regard to FIGS. 2C and 3 and in further detail below with regard to FIG. 6. - The routine 500 proceeds from
operation 514 to operation 516, where the child PFs 112B-112N provided by a MFND 102 can be assigned to VMs 104A-104N. As described briefly above, in some embodiments newly created child PFs 112B-112N have zero storage size, minimal flexible resources, and no defined QoS level. In other embodiments, newly created child PFs 112B-112N may have a default QoS level, a default amount of storage, and/or default configurations for other resources. Accordingly, the host 100 might need to provision the resources (NVM space, I/O queue pair count, QoS level, etc.) to a child PF 112B-112N before it can be assigned to a VM 104 using DDA, HYPER-V NVMe Direct, or another direct storage assignment technology. - The
child PFs 112B-112N can also be securely erased before assignment to a VM 104. There is no host reboot involved in this workflow. One method for securely erasing child PFs 112B-112N (which might be referred to herein as the “SecureEraseChildPhysicalFunction” method) takes an identifier for a MFND 102 (e.g. a handle) and the serial number for the child PF 112 to be erased as input. The SecureEraseChildPhysicalFunction method returns a success message if the identified child PF 112 was successfully erased and otherwise returns an error code. The routine 500 then proceeds from operation 516 to operation 518, where it ends. -
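The provisioning requirement described above for operation 516 can be sketched as a simple precondition check. The names `ChildPf`, `is_provisioned`, and `assign_to_vm` are hypothetical, and the specific rule that storage, I/O queue pairs, and a QoS level must all be set is an assumption chosen to model the embodiment in which a newly created child PF starts with zero storage and no defined QoS level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildPf:
    """Hypothetical host-side view of a child PF."""
    serial: str
    storage_bytes: int = 0            # newly created: zero storage size
    io_queue_pairs: int = 0
    qos_level: Optional[str] = None   # newly created: no defined QoS level
    assigned_vm: Optional[str] = None

    def is_provisioned(self) -> bool:
        return (self.storage_bytes > 0
                and self.io_queue_pairs > 0
                and self.qos_level is not None)


def assign_to_vm(pf: ChildPf, vm_id: str) -> None:
    """Assign a child PF to a VM only after it has been provisioned."""
    if not pf.is_provisioned():
        raise ValueError(f"child PF {pf.serial} must be provisioned first")
    if pf.assigned_vm is not None:
        raise ValueError(f"child PF {pf.serial} is already assigned")
    pf.assigned_vm = vm_id
```

In embodiments where newly created child PFs receive default resources and a default QoS level, the same check would simply pass without an explicit provisioning step.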
FIG. 6 is a flow diagram showing a routine 600 that illustrates aspects of a method for collecting child PF QoS statistics 210 for in-use child PFs 112 of a MFND 102, according to one embodiment disclosed herein. The routine 600 begins at operation 602, where the MFND 102 determines whether collection of child PF QoS statistics 210 has been enabled in the manner described above. If the collection of child PF QoS statistics 210 has been enabled, the routine 600 proceeds from operation 602 to operation 604. - At
operation 604, the MFND 102 collects the child PF QoS statistics 210 in the manner described above. The routine 600 then proceeds from operation 604 to operation 606, where the MFND 102 determines whether the QoS statistics monitor period 204 has elapsed. If the QoS statistics monitor period 204 has not elapsed, the routine 600 proceeds from operation 606 back to operation 604, where the MFND 102 can continue to collect the child PF QoS statistics 210 in the manner described above. If the QoS statistics monitor period 204 has elapsed, the routine 600 proceeds from operation 606 to operation 608. - At
operation 608, the MFND 102 stores the child PF QoS statistics 210 in a log entry in the child PF QoS statistics active log 302 in the manner described above. The routine 600 then proceeds from operation 608 to operation 610, where the MFND 102 determines whether the QoS statistics swap bucket period 206 has elapsed. If the QoS statistics swap bucket period 206 has not elapsed, the routine 600 proceeds back to operation 604, where the MFND 102 continues to collect child PF QoS statistics 210 and store the child PF QoS statistics 210 in entries in the active log 302 in the manner described above. - If the QoS statistics swap
bucket period 206 has elapsed, the routine 600 proceeds from operation 610 to operation 612, where the MFND 102 atomically swaps the active log 302 and the save log 304 and clears the active log 302 in the manner described above. The routine 600 then proceeds from operation 612 to operation 614, where the MFND 102 generates a notification, such as an asynchronous event 306, to the host computing device 100 to inform the host computing device 100 that child PF QoS statistics 210 are available from the MFND 102. As discussed above, the host computing device 100 might subsequently transmit a command 110D requesting the child PF QoS statistics 210. The MFND 102 responds to the request with child PF QoS statistics 210 retrieved from the save log 304. The routine 600 then ends following operation 614. -
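The flow of routine 600 can be approximated with the following Python simulation. It is a sketch under stated assumptions, not firmware: the class `QosLogs` and its methods are hypothetical, time advances one second per call to `tick`, a single statistic (a maximum IOPS percentage) stands in for the full set of child PF QoS statistics 210, and the atomic swap is modeled as a single tuple assignment.

```python
from typing import Callable, List


class QosLogs:
    """Toy model of the active log 302 and the save log 304."""

    def __init__(self, monitor_period_s: int, swap_bucket_period_s: int,
                 notify: Callable[[int], None]) -> None:
        self.monitor_period_s = monitor_period_s
        self.swap_bucket_period_s = swap_bucket_period_s
        self.notify = notify              # stands in for asynchronous event 306
        self.active: List[int] = []
        self.save: List[int] = []
        self.sequence = 0
        self._window: List[int] = []
        self._elapsed_s = 0

    def tick(self, iops_percent: int) -> None:
        """Advance one second with one sample (operation 604)."""
        self._window.append(iops_percent)
        self._elapsed_s += 1
        if self._elapsed_s % self.monitor_period_s != 0:   # operation 606
            return
        self.active.append(max(self._window))              # operation 608
        self._window = []
        if self._elapsed_s % self.swap_bucket_period_s == 0:  # operation 610
            # Operation 612: swap the logs and clear the active log.
            self.save, self.active = self.active, []
            self.sequence = (self.sequence + 1) % 256
            self.notify(self.sequence)                     # operation 614

    def read_save_log(self) -> List[int]:
        """What the host would receive in response to a command 110D."""
        return list(self.save)


events: List[int] = []
logs = QosLogs(monitor_period_s=2, swap_bucket_period_s=4, notify=events.append)
for sample in [10, 20, 5, 7]:
    logs.tick(sample)
```

The periods are shortened to 2 and 4 seconds purely to keep the demonstration small; they are below the minimums described for one embodiment. Because the host only reads the save log, collection into the active log can continue undisturbed while the host retrieves the previous bucket.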
FIG. 7 is a computer architecture diagram showing an illustrative computer hardware and software architecture for adata processing system 700 that can act as ahost 100 for aMFND 102 that implements aspects of the technologies presented herein. In particular, the architecture illustrated inFIG. 7 can be utilized to implement a server computer, mobile phone, an e-reader, a smartphone, a desktop computer, an AR/VR device, a tablet computer, a laptop computer, or another type of computing device that acts as ahost 100 for theMFND 102. - The
data processing system 700 illustrated inFIG. 7 includes a central processing unit 702 (“CPU”), asystem memory 704, including a random-access memory 706 (“RAM”) and a read-only memory (“ROM”) 708, and asystem bus 710 that couples thememory 704 to theCPU 702. A basic input/output system (“BIOS” or “firmware”) containing the basic routines that help to transfer information between elements within thedata processing system 700, such as during startup, can be stored in theROM 708. Thedata processing system 700 further includes amass storage device 712 for storing anoperating system 722, application programs, and other types of programs. For example, themass storage device 712 might store thehost agent 116 and themanagement API 118. Themass storage device 712 can also be configured to store other types of programs and data. - The
mass storage device 712 is connected to the CPU 702 through a mass storage controller (not shown) connected to the bus 710. The mass storage device 712 and its associated computer readable media provide non-volatile storage for the data processing system 700. Although the description of computer readable media contained herein refers to a mass storage device, such as a hard disk, CD-ROM drive, DVD-ROM drive, or USB storage key, it should be appreciated by those skilled in the art that computer readable media can be any available computer storage media or communication media that can be accessed by the data processing system 700. - Communication media includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics changed or set in a manner so as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
- By way of example, and not limitation, computer storage media can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. For example, computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, digital versatile disks (“DVD”), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and which can be accessed by the
data processing system 700. For purposes of the claims, the phrase “computer storage medium,” and variations thereof, does not include waves or signals per se or communication media. - According to various configurations, the
data processing system 700 can operate in a networked environment using logical connections to remote computers through a network such as thenetwork 720. Thedata processing system 700 can connect to thenetwork 720 through anetwork interface unit 716 connected to thebus 710. It should be appreciated that thenetwork interface unit 716 can also be utilized to connect to other types of networks and remote computer systems. Thedata processing system 700 can also include an input/output controller 718 for receiving and processing input from a number of other devices, including a keyboard, mouse, touch input, an electronic stylus (not shown inFIG. 7 ), or a physical sensor such as a video camera. Similarly, the input/output controller 718 can provide output to a display screen or other type of output device (also not shown inFIG. 7 ). - It should be appreciated that the software components described herein, when loaded into the
CPU 702 and executed, can transform theCPU 702 and the overalldata processing system 700 from a general-purpose computing device into a special-purpose computing device customized to facilitate the functionality presented herein. TheCPU 702 can be constructed from any number of transistors or other discrete circuit elements, which can individually or collectively assume any number of states. More specifically, theCPU 702 can operate as a finite-state machine, in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions can transform theCPU 702 by specifying how theCPU 702 transitions between states, thereby transforming the transistors or other discrete hardware elements constituting theCPU 702. - Encoding the software modules presented herein can also transform the physical structure of the computer readable media presented herein. The specific transformation of physical structure depends on various factors, in different implementations of this description. Examples of such factors include, but are not limited to, the technology used to implement the computer readable media, whether the computer readable media is characterized as primary or secondary storage, and the like. For example, if the computer readable media is implemented as semiconductor-based memory, the software disclosed herein can be encoded on the computer readable media by transforming the physical state of the semiconductor memory. For instance, the software can transform the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory. The software can also transform the physical state of such components in order to store data thereupon.
- As another example, the computer readable media disclosed herein can be implemented using magnetic or optical technology. In such implementations, the software presented herein can transform the physical state of magnetic or optical media, when the software is encoded therein. These transformations can include altering the magnetic characteristics of particular locations within given magnetic media. These transformations can also include altering the physical features or characteristics of particular locations within given optical media, to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.
- In light of the above, it should be appreciated that many types of physical transformations take place in the
data processing system 700 in order to store and execute the software components presented herein. It also should be appreciated that the architecture shown inFIG. 7 for thedata processing system 700, or a similar architecture, can be utilized to implement other types of computing devices, including hand-held computers, video game devices, embedded computer systems, mobile devices such as smartphones, tablets, and AR/VR devices, and other types of computing devices known to those skilled in the art. It is also contemplated that thedata processing system 700 might not include all of the components shown inFIG. 7 , can include other components that are not explicitly shown inFIG. 7 , or can utilize an architecture completely different than that shown inFIG. 7 . -
FIG. 8 is a computing network architecture diagram showing an illustrative configuration for a distributed computing environment 800 in which computing devices hosting MFNDs 102 implementing the disclosed technologies can be utilized. According to various implementations, the distributed computing environment 800 includes a computing environment 802 operating on, or in communication with, a network 804. One or more client devices 806A-806N (hereinafter referred to collectively and/or generically as “clients 806”) can communicate with the computing environment 802 via the network 804 and/or other connections (not illustrated in FIG. 8). - In one illustrated configuration, the clients 806 include a
computing device 806A such as a laptop computer, a desktop computer, or other computing device; a tablet computing device 806B; a mobile computing device 806C such as a smartphone, an on-board computer, or other mobile computing device; or a server computer 806D. It should be understood that any number of devices 806 can communicate with the computing environment 802. An example computing architecture for the devices 806 is illustrated and described above with reference to FIG. 7. It should be understood that the illustrated devices 806 and computing architectures illustrated and described herein are illustrative only and should not be construed as being limiting in any way. - In the illustrated configuration, the
computing environment 802 includes application servers 808, data storage 810, and one or more network interfaces 812. According to various implementations, the functionality of the application servers 808 can be provided by one or more server computers that are executing as part of, or in communication with, the network 804. The application servers 808 can host various services, VMs, portals, and/or other resources. The application servers 808 can also be implemented using host computing devices 100 that include MFNDs 102 configured in the manner described herein. - In the illustrated configuration, the
application servers 808 host one or morevirtual machines 104 for hosting applications, network services, or for providing other functionality. It should be understood that this configuration is illustrative only and should not be construed as being limiting in any way. Theapplication servers 808 can also host or provide access to one or more portals, link pages, web sites, network services, and/or other information sites, such asweb portals 816. - According to various implementations, the
application servers 808 also include one ormore mailbox services 818 and one ormore messaging services 820. The mailbox services 818 can include electronic mail (“email”) services. The mailbox services 818 also can include various personal information management (“PIM”) services including, but not limited to, calendar services, contact management services, collaboration services, and/or other services. Themessaging services 820 can include, but are not limited to, instant messaging services, chat services, forum services, and/or other communication services. - The
application servers 808 also might include one or more social networking services 822. Thesocial networking services 822 can include various social networking services including, but not limited to, services for sharing or posting status updates, instant messages, links, photos, videos, and/or other information; services for commenting or displaying interest in articles, products, blogs, or other resources; and/or other services. Other services are possible and are contemplated. - The
social networking services 822 also can include commenting, blogging, and/or micro blogging services. Other services are possible and are contemplated. As shown inFIG. 8 , theapplication servers 808 also can host other network services, applications, portals, and/or other resources (“other resources”) 824. Theother resources 824 can include, but are not limited to, document sharing, rendering, or any other functionality. - As mentioned above, the
computing environment 802 can includedata storage 810. According to various implementations, the functionality of thedata storage 810 is provided by one or more databases operating on, or in communication with, the network 804. The functionality of thedata storage 810 also can be provided by one or more server computers configured to host data for thecomputing environment 802. Thedata storage 810 can include, host, or provide one or more real orvirtual data stores 826A-826N (hereinafter referred to collectively and/or generically as “datastores 826”). - The datastores 826 are configured to host data used or created by the
application servers 808 and/or other data. Although not illustrated inFIG. 8 , the datastores 826 also can host or store web page documents, word processing documents, presentation documents, data structures, and/or other data utilized by any application program or another module. Aspects of the datastores 826 might be associated with a service for storing files. - The
computing environment 802 can communicate with, or be accessed by, the network interfaces 812. The network interfaces 812 can include various types of network hardware and software for supporting communications between two or more computing devices including, but not limited to, the clients 806 and theapplication servers 808. It should be appreciated that the network interfaces 812 also might be utilized to connect to other types of networks and/or computer systems. - It should be understood that the distributed
computing environment 800 described herein can implement aspects of at least some of the software elements described herein with any number of virtual computing resources and/or other distributed computing functionality that can be configured to execute any aspects of the software components disclosed herein. - It should be further understood that the disclosure presented herein also encompasses the subject matter set forth in the following clauses:
- Clause 1. A computer-implemented method, comprising: creating a child physical function on a multiple physical function non-volatile memory device (MFND); configuring the child physical function on the MFND to provide a specified Quality of Service (QoS) level; collecting child physical function QoS statistics for the child physical function; and providing the child physical function QoS statistics from the MFND to a host computing device.
- Clause 2. The computer-implemented method of clause 1, wherein the specified QoS level defines maximum read input/output operations per second (IOPS) and maximum write IOPS for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read IOPS and the maximum write IOPS provided by the child physical function during a monitoring period.
- Clause 3. The computer-implemented method of any of clauses 1 or 2, wherein the specified QoS level defines a maximum read bandwidth and a maximum write bandwidth for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read bandwidth and the maximum write bandwidth provided by the child physical function during a monitoring period.
- Clause 4. The computer-implemented method of any of clauses 1-3, wherein the child physical function QoS statistics for the child physical function specify a percentage of read operations and write operations performed by the child physical function during a monitoring period.
- Clause 5. The computer-implemented method of any of clauses 1-4, wherein the child physical function QoS statistics for the child physical function specify a size of input/output (I/O) workloads performed by the child physical function during a monitoring period.
- Clause 6. The computer-implemented method of any of clauses 1-5, wherein the MFND comprises a non-volatile memory device, and wherein the QoS statistics for the child physical function specify an amount of the non-volatile memory device utilized.
- Clause 7. The computer-implemented method of any of clauses 1-6, wherein the host computing device specifies a QoS statistics monitor period and a QoS statistics swap bucket period to the MFND, and wherein the method further comprises: storing the child physical function QoS statistics in an active log during the QoS statistics monitor period; and swapping the active log with a save log when the QoS statistics swap bucket period elapses, wherein the child physical function QoS statistics are provided from the MFND to the host computing device from the save log.
- Clause 8. The computer-implemented method of any of clauses 1-7, further comprising generating an asynchronous event from the MFND to the host computing device when the QoS statistics swap bucket period elapses.
- Clause 9. A multiple physical function non-volatile memory device (MFND), comprising: a non-volatile memory device; a parent physical function; and a child physical function configured to provide a Quality of Service (QoS) level specified by a host computing device configured to perform read or write operations on the non-volatile memory device, wherein the MFND is configured to collect child physical function QoS statistics for the child physical function, and provide the child physical function QoS statistics to the host computing device.
- Clause 10. The MFND of clause 9, wherein the specified QoS level defines maximum read input/output operations per second (IOPS) and maximum write IOPS for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read IOPS and the maximum write IOPS provided by the child physical function during a monitoring period.
- Clause 11. The MFND of any of clauses 9 or 10, wherein the specified QoS level defines a maximum read bandwidth and a maximum write bandwidth for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read bandwidth and the maximum write bandwidth provided by the child physical function during a monitoring period.
- Clause 12. The MFND of any of clauses 9-11, wherein the child physical function QoS statistics for the child physical function specify a percentage of read operations and write operations performed by the child physical function during a monitoring period.
- Clause 13. The MFND of any of clauses 9-12, wherein the child physical function QoS statistics for the child physical function specify a size of input/output (I/O) workloads performed by the child physical function during a monitoring period.
- Clause 14. The MFND of any of clauses 9-13, wherein the QoS statistics for the child physical function specify an amount of the non-volatile memory device utilized.
- Clause 15. The MFND of any of clauses 9-14, wherein the host computing device specifies a QoS statistics monitor period and a QoS statistics swap bucket period to the MFND, and wherein the MFND is further configured to: store the child physical function QoS statistics in an active log during the QoS statistics monitor period; swap the active log with a save log when the QoS statistics swap bucket period elapses; and generate an asynchronous event from the MFND to the host computing device when the QoS statistics swap bucket period elapses, wherein the child physical function QoS statistics are provided from the MFND to the host computing device from the save log.
- Clause 16. A non-transitory computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors, cause the one or more processors to: create a child physical function on a multiple physical function non-volatile memory device (MFND); configure the child physical function on the MFND to provide a specified Quality of Service (QoS) level; collect child physical function QoS statistics for the child physical function; and provide the child physical function QoS statistics from the MFND to a host computing device.
- Clause 17. The non-transitory computer-readable storage medium of clause 16, wherein the specified QoS level defines maximum read input/output operations per second (IOPS) and maximum write IOPS for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read IOPS and the maximum write IOPS provided by the child physical function during a monitoring period.
- Clause 18. The non-transitory computer-readable storage medium of any of clauses 16 or 17, wherein the specified QoS level defines a maximum read bandwidth and a maximum write bandwidth for the child physical function, and wherein the child physical function QoS statistics for the child physical function specify a percentage of the maximum read bandwidth and the maximum write bandwidth provided by the child physical function during a monitoring period.
- Clause 19. The non-transitory computer-readable storage medium of any of clauses 16-18, wherein the child physical function QoS statistics comprise statistics selected from the group consisting of a percentage of read operations and write operations performed by the child physical function during a monitoring period, a size of input/output (I/O) workloads performed by the child physical function during a monitoring period, and an amount of the non-volatile memory device utilized.
- Clause 20. The non-transitory computer-readable storage medium of any of clauses 16-19, wherein the host computing device specifies a QoS statistics monitor period and a QoS statistics swap bucket period to the MFND, and wherein the non-transitory computer-readable storage medium has further computer-executable instructions stored thereupon to: store the child physical function QoS statistics in an active log during the QoS statistics monitor period; swap the active log with a save log when the QoS statistics swap bucket period elapses; and generate an asynchronous event from the MFND to the host computing device when the QoS statistics swap bucket period elapses, wherein the child physical function QoS statistics are provided from the MFND to the host computing device from the save log.
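The statistics and log handling recited in the clauses above can be modeled in a few lines of Python. This is only an illustrative sketch, not the MFND firmware or any real NVMe interface: the names `QosLevel`, `QosLog`, `ChildPfQosMonitor`, `record`, `swap`, and `report` are hypothetical, the host callback stands in for the asynchronous event, and the sketch assumes for simplicity that the swap bucket period coincides with the end of one monitor period (the clauses do not fix that relationship).

```python
from dataclasses import dataclass


@dataclass
class QosLevel:
    """Host-specified limits for one child physical function (hypothetical layout)."""
    max_read_iops: int
    max_write_iops: int


@dataclass
class QosLog:
    """Raw operation counters accumulated during one monitoring period."""
    reads: int = 0
    writes: int = 0


class ChildPfQosMonitor:
    """Toy model of the active-log / save-log scheme described in clauses 7 and 8."""

    def __init__(self, level, monitor_period_s, on_async_event=None):
        self.level = level
        self.monitor_period_s = monitor_period_s
        self.on_async_event = on_async_event  # assumed stand-in for the async event
        self.active = QosLog()  # receives new counts during the monitor period
        self.save = QosLog()    # last completed period; the host reads this log

    def record(self, is_read):
        """Count one completed I/O operation in the active log."""
        if is_read:
            self.active.reads += 1
        else:
            self.active.writes += 1

    def swap(self):
        """Swap the logs when the swap bucket period elapses, then notify the host."""
        self.active, self.save = self.save, self.active
        self.active = QosLog()  # start the next period from zeroed counters
        if self.on_async_event is not None:
            self.on_async_event()

    def report(self):
        """Derive percentage statistics from the save log only."""
        s = self.save
        total = s.reads + s.writes
        read_iops = s.reads / self.monitor_period_s
        write_iops = s.writes / self.monitor_period_s
        return {
            "read_iops_pct_of_max": 100.0 * read_iops / self.level.max_read_iops,
            "write_iops_pct_of_max": 100.0 * write_iops / self.level.max_write_iops,
            "read_op_pct": 100.0 * s.reads / total if total else 0.0,
            "write_op_pct": 100.0 * s.writes / total if total else 0.0,
        }
```

In this sketch the host never sees a log that is still being updated: counters go into the active log, and `report` reads only the save log produced by the most recent `swap`. A caller would invoke `record` per completed operation, call `swap` from a timer, and call `report` after the callback fires.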
- Although the technologies presented herein have been described in language specific to structural features and/or methodological acts, it is to be understood that the appended claims are not necessarily limited to the features or acts described. Rather, the features and acts are described as example implementations of such technologies. Moreover, the above-described subject matter may be implemented as a computer-controlled apparatus, a computer process, a computing system, or as an article of manufacture such as a computer-readable storage medium.
- The operations of the example methods presented herein are illustrated in individual blocks and summarized with reference to those blocks. The methods are illustrated as logical flows of blocks, each block of which can represent one or more operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the operations represent computer-executable instructions stored on one or more computer-readable media that, when executed by one or more processors, enable the one or more processors to perform the recited operations.
- Generally, computer-executable instructions include routines, programs, objects, modules, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be executed in any order, combined in any order, subdivided into multiple sub-operations, and/or executed in parallel to implement the described processes. The described processes can be performed by resources associated with one or more device(s) such as one or more internal or external CPUs or GPUs, and/or one or more instances of hardware logic such as FPGAs, DSPs, or other types of accelerators.
- All of the methods and processes described above may be embodied in, and fully automated via, software code modules executed by one or more general purpose computers or processors. The code modules may be stored in any type of computer-readable storage medium or other computer storage device. Some or all of the methods may alternatively be embodied in specialized computer hardware.
- Conditional language such as, among others, “can,” “could,” “might” or “may,” unless specifically stated otherwise, is understood within the context to present that certain examples include, while other examples do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that certain features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without user input or prompting, whether certain features, elements and/or steps are included or are to be performed in any particular example. Conjunctive language such as the phrase “at least one of X, Y or Z,” unless specifically stated otherwise, is to be understood to present that an item, term, etc. may be either X, Y, or Z, or a combination thereof.
- Any routine descriptions, elements or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or elements in the routine. Alternate implementations are included within the scope of the examples described herein in which elements or functions may be deleted, or executed out of order from that shown or discussed, including substantially synchronously or in reverse order, depending on the functionality involved as would be understood by those skilled in the art.
- It should be emphasized that many variations and modifications may be made to the above-described examples, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/588,204 US12056372B2 (en) | 2022-01-28 | 2022-01-28 | Collecting quality of service statistics for in-use child physical functions of multiple physical function non-volatile memory devices |
PCT/US2022/048330 WO2023146605A1 (en) | 2022-01-28 | 2022-10-31 | Collecting quality of service statistics for in-use child physical functions of multiple physical function non-volatile memory devices |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/588,204 US12056372B2 (en) | 2022-01-28 | 2022-01-28 | Collecting quality of service statistics for in-use child physical functions of multiple physical function non-volatile memory devices |
Publications (2)
Publication Number | Publication Date |
---|---|
US20230244390A1 (en) | 2023-08-03 |
US12056372B2 (en) | 2024-08-06 |
Family
ID=84421353
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/588,204 Active 2042-06-05 US12056372B2 (en) | 2022-01-28 | 2022-01-28 | Collecting quality of service statistics for in-use child physical functions of multiple physical function non-volatile memory devices |
Country Status (2)
Country | Link |
---|---|
US (1) | US12056372B2 (en) |
WO (1) | WO2023146605A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130297907A1 (en) * | 2012-01-18 | 2013-11-07 | Samsung Electronics Co., Ltd. | Reconfigurable storage device |
US20210132860A1 (en) * | 2019-11-01 | 2021-05-06 | Microsoft Technology Licensing, Llc | Management of multiple physical function non-volatile memory devices |
US20210342245A1 (en) * | 2020-05-04 | 2021-11-04 | EMC IP Holding Company LLC | Method and Apparatus for Adjusting Host QOS Metrics Based on Storage System Performance |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20210105201A (en) | 2020-02-18 | 2021-08-26 | 삼성전자주식회사 | Storage device configured to support multi-hosts and operation method thereof |
2022
- 2022-01-28: US application US17/588,204, patent US12056372B2 (en), status: Active
- 2022-10-31: WO application PCT/US2022/048330, patent WO2023146605A1 (en), status: unknown
Also Published As
Publication number | Publication date |
---|---|
WO2023146605A1 (en) | 2023-08-03 |
US12056372B2 (en) | 2024-08-06 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11237761B2 (en) | Management of multiple physical function nonvolatile memory devices | |
CN105893139B (en) | Method and device for providing storage service for tenant in cloud storage environment | |
US10521393B2 (en) | Remote direct memory access (RDMA) high performance producer-consumer message processing | |
US10324754B2 (en) | Managing virtual machine patterns | |
US9507636B2 (en) | Resource management and allocation using history information stored in application's commit signature log | |
US20180196603A1 (en) | Memory Management Method, Apparatus, and System | |
JP2023036774A (en) | Access control method of shared memory, access control device of shared memory, electronic apparatus, and autonomous vehicle | |
CN110750221B (en) | Volume cloning method, apparatus, electronic device and machine-readable storage medium | |
US20240220334A1 (en) | Data processing method in distributed system, and related system | |
US20230055511A1 (en) | Optimizing clustered filesystem lock ordering in multi-gateway supported hybrid cloud environment | |
EP3167370B1 (en) | Stream based event processing utilizing virtual streams and processing agents | |
US10785295B2 (en) | Fabric encapsulated resilient storage | |
US10976934B2 (en) | Prioritizing pages to transfer for memory sharing | |
JP7431490B2 (en) | Data migration in hierarchical storage management systems | |
US12056372B2 (en) | Collecting quality of service statistics for in-use child physical functions of multiple physical function non-volatile memory devices | |
US10447607B2 (en) | System and method for dequeue optimization using conditional iteration | |
US10346424B2 (en) | Object processing | |
CN114741165A (en) | Processing method of data processing platform, computer equipment and storage device | |
US11977785B2 (en) | Non-volatile memory device-assisted live migration of virtual machine data | |
WO2018188416A1 (en) | Data search method and apparatus, and related devices | |
US9251100B2 (en) | Bitmap locking using a nodal lock | |
US12093528B2 (en) | System and method for managing data access in distributed systems | |
US11870668B1 (en) | System and method for managing data processing systems and hosted devices | |
KR101440605B1 (en) | User device having file system gateway unit and method for accessing to stored data | |
CN115794296A (en) | Link cloning method, system, equipment and storage medium based on hardware unloading |
Legal Events
Code | Title | Description |
---|---|---|
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEE, SCOTT CHAO-CHUEH;KOU, LEI;SHAH, MONISH SHANTILAL;AND OTHERS;SIGNING DATES FROM 20220121 TO 20220128;REEL/FRAME:061320/0424 |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
ZAAA | Notice of allowance and fees due | Free format text: ORIGINAL CODE: NOA |
ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=. |
STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |