US20200296182A1 - Using machine learning to improve input/output performance of an application - Google Patents
- Publication number
- US20200296182A1 (application US16/353,153)
- Authority
- US
- United States
- Prior art keywords
- operations
- modifying
- computing device
- data
- selected application
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H04L67/2819—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/30—Profiles
- H04L67/303—Terminal profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/349—Performance evaluation by tracing or monitoring for interfaces, buses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06F18/2178—Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
- G06F18/2185—Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor the supervisor being an automated module, e.g. intelligent oracle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G06K9/6256—
-
- G06K9/6264—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/564—Enhancement of application control based on intercepted application data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
Definitions
- This invention relates generally to computing devices and, more particularly, to improving input/output (I/O) performance of one or more applications executing on a computing device.
- An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information.
- information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated.
- the variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications.
- information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
- a workstation may be used to execute applications that use a large amount of computing resources, such as, for example, central processing unit (CPU) cycles, memory, storage, graphics processing unit (GPU) cycles, and the like.
- computing resource-intensive applications include Adobe® Illustrator®, Adobe® After Effects®, Adobe® Media Encoder®, Adobe® Photoshop®, Adobe® Premiere®, Autodesk® AutoCAD®, Avid Media Composer®, ANSYS Fluent, ANSYS Workbench, Sonar Cakewalk, and the like.
- Storage in a system is a stack of software and hardware that includes file systems, volumes and volume managers, class drivers and I/O drivers, and disk subsystems.
- Disk subsystem technologies such as non-volatile memory express (NVME), rotating media (e.g., disk drives), tiered storage (e.g., drives with built-in cache) and others have different interfaces and capabilities that result in complex interactions with application workloads.
- the physical attributes of a system, such as single or multiple physical disk devices (e.g., configured as a Redundant Array of Independent Disks (RAID)) and logical volumes, create many I/O patterns in a workstation.
- Storage devices are unaware of workload variations (e.g., read or write rates) and treat all I/O equally, e.g., without regard to which application provides the I/O workload.
- Some of the variation in load may be due to the translation layers, handled by the operating system, between application I/O and disk I/O.
- I/O requests and interactions with the data cache may require adjusting memory and cache parameters, as well as physical device parameters, for each application workload.
- a workstation manufacturer may provide predefined profiles with each workstation that configure various parameters associated with the workstation's resources to improve performance for popular applications. For example, a profile may modify parameters, such as cache size, a location where a temporary file is created, memory allocation size, page size, and the like, to improve I/O performance.
- the manufacturer is only able to provide predefined profiles for popular applications because the manufacturer cannot test all available applications.
- the manufacturer can only test applications that have been available for an amount of time sufficient to enable the manufacturer to perform testing and create the corresponding profile.
- the predefined profiles may be created to improve performance for commonly performed tasks for popular applications. For example, the manufacturer may create a profile to improve the performance of a particular set of tasks performed using a particular application. However, if the user performs a different set of tasks using the particular application, then the predefined profile may provide minimal performance improvement or may even degrade performance. Thus, providing predefined profiles for popular applications may not significantly improve the performance of a relatively new application, a new version of an existing application, or an application that is not considered a popular application.
- a computing device may perform various operations.
- the operations may include receiving, via a user interface, a user selection of a particular application from a plurality of applications to create a selected application.
- the operations may include determining that the selected application is executing and accessing (e.g., performing operations to) an input/output stack of the computing device.
- the operations may include gathering, over a predetermined interval of time, data associated with the selected application that is performing the operations to the input/output stack. After gathering the data, the operations may include performing an analysis of the data and determining, by a classifier and based at least in part on the analysis, a particular workload type from a predefined set of workload types that is associated with the selected application.
- the classifier may be trained using multiple hardware platforms, multiple storage configurations, multiple workloads, and the predefined plurality of profiles to classify a workload based on input/output operations performed by a particular application and to identify a profile to increase performance of the input/output operations.
- the operations may include selecting a particular profile from a plurality of predefined profiles based at least in part on the particular workload type, and modifying, based on the particular profile, a plurality of parameters to create a plurality of modified parameters.
- the modified parameters may reduce an execution time of performing the operations to the input/output stack.
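The last two operations (selecting a profile and modifying parameters based on it) can be pictured as overlaying a profile's settings onto the current configuration. The sketch below assumes a profile is simply a named dictionary of parameter overrides; the `Profile` class, `apply_profile` function, and every parameter name and value are hypothetical illustrations, loosely modeled on the cache size, page size, temporary-file location, and allocation size examples given above.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical I/O profile: a name plus parameter overrides."""
    name: str
    overrides: dict = field(default_factory=dict)


def apply_profile(defaults: dict, profile: Profile) -> dict:
    """Return a modified copy of `defaults` with the profile's overrides applied.

    Parameters the profile does not mention keep their default values, and the
    `defaults` dict itself is not mutated, so the original configuration remains
    available if the profile needs to be rolled back.
    """
    unknown = set(profile.overrides) - set(defaults)
    if unknown:
        raise KeyError(f"profile {profile.name!r} sets unknown parameters: {sorted(unknown)}")
    modified = dict(defaults)
    modified.update(profile.overrides)
    return modified


# Hypothetical parameter names and values, for illustration only.
defaults = {"cache_size_mb": 256, "page_size_kb": 4, "temp_dir": "C:/Temp", "alloc_size_kb": 64}
sequential_read = Profile("sequential-read", {"cache_size_mb": 1024, "alloc_size_kb": 256})
modified = apply_profile(defaults, sequential_read)
```

Keeping the defaults untouched is a design choice of this sketch, not something the text specifies; it simply makes it easy to compare execution times before and after the profile is applied.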
- FIG. 1 is a block diagram of a computing device executing a classifier (e.g., a machine learning algorithm) that gathers data associated with an application and selects a profile to configure resources of the computing device, according to some embodiments.
- FIG. 2 is a block diagram illustrating training a classifier, according to some embodiments.
- FIG. 3 is a block diagram illustrating examples of variables used to train a machine learning algorithm, according to some embodiments.
- FIG. 4 is a flowchart of a process that includes training a classifier, according to some embodiments.
- FIG. 5 is a flowchart of a process that includes configuring parameters associated with an I/O system based on a profile, according to some embodiments.
- FIG. 6 illustrates an example configuration of a computing device that can be used to implement the systems and techniques described herein.
- FIG. 7 is a block diagram illustrating classifying a workload of an app, according to some embodiments.
- an information handling system may include any instrumentality or aggregate of instrumentalities operable to compute, calculate, determine, classify, process, transmit, receive, retrieve, originate, switch, store, display, communicate, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes.
- an information handling system may be a personal computer (e.g., desktop or laptop), tablet computer, mobile device (e.g., personal digital assistant (PDA) or smart phone), server (e.g., blade server or rack server), a network storage device, or any other suitable device and may vary in size, shape, performance, functionality, and price.
- the information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communicating with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, touchscreen and/or video display. The information handling system may also include one or more buses operable to transmit communications between the various hardware components.
- the systems and techniques described herein use machine learning to increase application (“app”) performance related to input/output (I/O), such as read operations and write operations.
- the I/O in a computing device may have multiple layers, collectively referred to as an I/O stack.
- the computing device may use virtual memory to enable multiple apps to share a physical storage device (e.g., a disk drive). Portions (e.g., pages) of virtual memory may be stored in main memory (e.g., random access memory (RAM)) and swapped back and forth with the physical storage device.
- the physical storage device may use a high-speed memory, known as a cache, to increase throughput when apps access the physical storage device.
- when an app performs a write, the app may write to virtual memory in RAM; the data is then sent to the physical storage device and held in the device's cache before being written to the physical storage device.
- an operating system may provide a file system to enable apps to perform I/O.
- the file system may use blocks having a particular size such that a large file is stored as multiple blocks.
- the multiple blocks may be located at different locations in the physical storage device and the file system may keep track of the locations of each of the multiple blocks such that the apps are unaware that the large file is being stored as multiple blocks in multiple (e.g., non-contiguous) locations.
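The block bookkeeping described above can be illustrated with a toy block map. This is a minimal sketch, assuming a fixed 4,096-byte block size and a plain list of free block numbers; real file systems use more elaborate structures (extents, bitmaps, and so on), and `allocate`, `locate`, and `BLOCK_SIZE` are hypothetical names.

```python
import math

BLOCK_SIZE = 4096  # assumed file-system block size in bytes


def allocate(file_size: int, free_blocks: list) -> list:
    """Choose as many blocks as the file needs from the front of the free list.

    The chosen block numbers need not be contiguous; the file system records this
    list so applications can still treat the file as a single byte range.
    """
    needed = math.ceil(file_size / BLOCK_SIZE)
    if needed > len(free_blocks):
        raise OSError("not enough free blocks")
    return free_blocks[:needed]


def locate(offset: int, block_map: list) -> tuple:
    """Translate a byte offset within the file into (physical block, offset in block)."""
    index, within = divmod(offset, BLOCK_SIZE)
    return block_map[index], within


# A 10,000-byte file needs three blocks, which here are non-contiguous.
block_map = allocate(10_000, free_blocks=[7, 8, 42, 90, 91])
```

An app reading byte 9,000 of the file never sees that the byte lives in physical block 42; `locate` performs that translation on its behalf.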
- the systems and techniques described herein perform an analysis of how an application accesses the I/O stack and, based on the analysis, characterize the application as having a particular type of workload.
- the systems and techniques select a profile that is designed to improve I/O (e.g., increased throughput, faster execution, lower latency, and the like) for the particular type of workload.
- Each particular type of workload has particular characteristics (e.g., read/write (R/W) ratio, queue depth, block size, and the like) with regard to how the app accesses the I/O stack.
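To make these characteristics concrete, the sketch below derives a read/write ratio, a mean block (request) size, and a peak queue depth from a small I/O trace. It assumes each request is recorded with its operation type, size, issue time, and completion time; the `IoRequest` record and `characterize` function are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class IoRequest:
    op: str       # "R" for a read, "W" for a write
    size: int     # request size in bytes
    start: float  # time the request was issued, in seconds
    end: float    # time the request completed, in seconds


def characterize(trace: list) -> dict:
    """Summarize an I/O trace as read/write ratio, mean request size, and peak queue depth."""
    if not trace:
        raise ValueError("empty trace")
    reads = sum(1 for r in trace if r.op == "R")
    writes = len(trace) - reads
    avg_block = sum(r.size for r in trace) / len(trace)
    # Sweep issue (+1) and completion (-1) events in time order; the running sum is the
    # number of outstanding requests. At equal timestamps, -1 sorts before +1, so a
    # request finishing exactly when another starts is not counted as overlapping.
    events = sorted([(r.start, 1) for r in trace] + [(r.end, -1) for r in trace])
    depth = peak = 0
    for _, delta in events:
        depth += delta
        peak = max(peak, depth)
    return {
        "rw_ratio": reads / writes if writes else float("inf"),
        "avg_block_bytes": avg_block,
        "peak_queue_depth": peak,
    }


trace = [
    IoRequest("R", 4096, 0.0, 0.3),
    IoRequest("R", 4096, 0.1, 0.4),
    IoRequest("W", 65536, 0.2, 0.5),
    IoRequest("R", 4096, 0.6, 0.7),
]
stats = characterize(trace)
```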
- the systems and techniques monitor what the app is doing to the I/O stack and identify a particular profile with settings that improve I/O performance when accessing the I/O stack (e.g., the settings improve performance as compared to the default configuration used by the operating system and the application when initially installed).
- each profile is designed to improve a particular type of I/O workload rather than a particular type of app, making the systems and techniques application agnostic.
- an app's I/O workload is monitored and an appropriate profile selected.
- the profile may configure parameters associated with the application to improve throughput, reduce execution time, and the like.
- the system and techniques create a classifier using a machine learning algorithm such as, for example, Random Forest, Neural Network, or the like.
- Combinations of different hardware platforms and different storage configurations are used to execute different types of workloads with different types of profiles, and data associated with the workload characteristics is gathered.
- the data is used to train the classifier to identify which profile (among multiple profiles that were tested) provides the highest performance (e.g., fastest execution time) for a particular workload executing on a particular hardware platform having a particular storage configuration.
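One way to read this training step is as label generation: for every combination of workload, platform, and storage configuration, the profile with the fastest measured execution time becomes the target the classifier learns to predict. The sketch below assumes lab results are available as simple tuples; `best_profiles` and all workload, platform, and profile names are hypothetical.

```python
def best_profiles(results: list) -> dict:
    """Map each (workload, platform, storage) combination to its fastest profile.

    `results` holds one (workload, platform, storage, profile, exec_seconds) tuple per
    lab run; the returned mapping serves as the training label for each combination.
    """
    best = {}
    for workload, platform, storage, profile, seconds in results:
        key = (workload, platform, storage)
        if key not in best or seconds < best[key][1]:
            best[key] = (profile, seconds)
    return {key: profile for key, (profile, _) in best.items()}


runs = [  # hypothetical lab measurements
    ("video-encode", "platform-A", "SATA SSD", "P1", 120.0),
    ("video-encode", "platform-A", "SATA SSD", "P2", 95.0),
    ("video-encode", "platform-A", "NVMe SSD", "P1", 70.0),
    ("video-encode", "platform-A", "NVMe SSD", "P2", 74.0),
]
labels = best_profiles(runs)
```

In this made-up data the same workload on the same platform favors a different profile depending on the storage type, which is why the storage configuration is part of the key.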
- the characteristics may include (1) logical and physical I/O operation counters, such as, for example, I/O Read Operations/sec, I/O Write Operations/sec, and I/O Data Operations/sec, (2) process parameters, such as, for example, Virtual Bytes, Cache Copy Read Hits %, Cache Copy Reads/Sec, and Page File Bytes/sec, and (3) caching and operating system (OS) information, such as, for example, Cache Copy Read Hits %, Cache Copy Reads/Sec, and Page File Bytes, and the like.
- the classifier identifies which profile, from among the multiple tested profiles, provides the most efficient usage of I/O resources to achieve faster throughput, faster execution time, and the like, thereby reducing the impact on the user.
- the classifier may identify a subset (e.g., top N, where 0 < N < 100) of the up to 1,000 I/O related parameters.
- the subset is typically between about 50 and about 100 parameters, and is often about 70 parameters.
- the subset of parameters may be those parameters that have the highest influence on increasing throughput, e.g., the parameters that, when varied, cause the largest change (e.g., improvement) and provide the most “bang for the buck”.
- the subset of parameters is later used when the classifier is deployed to classify a workload.
- One reason for determining the subset of parameters is that monitoring up to 1,000 parameters on a user's computing device would significantly slow down execution of apps.
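Once each parameter has an influence score, choosing the subset to monitor in the field is a top-N selection. The sketch below assumes the scores are already available as a dictionary; the function name and the scores themselves are hypothetical, although the counter names come from the examples in this description.

```python
def top_n_parameters(importance: dict, n: int) -> list:
    """Return the n parameter names with the highest importance scores.

    Ties are broken alphabetically so the selected subset is deterministic.
    """
    if not 0 < n <= len(importance):
        raise ValueError("n must be between 1 and the number of parameters")
    return sorted(importance, key=lambda name: (-importance[name], name))[:n]


# Hypothetical importance scores for a few of the counters named in the text.
importance = {
    r"Cache\Copy Reads/sec": 0.21,
    r"Cache\Dirty Pages": 0.05,
    r"LogicalDisk(Total)\% Disk Time": 0.34,
    r"PhysicalDisk(Total)\Avg. Disk Read Queue Length": 0.12,
}
subset = top_n_parameters(importance, 2)
```

In practice N would be in the range discussed above (roughly 50 to 100), so only a small fraction of the full parameter set is sampled on the user's device.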
- the subset of parameters may include, for example, Cache\Async Copy Reads/sec, Cache\Copy Read Hits %, Cache\Copy Reads/sec, Cache\Data Map Hits %, Cache\Data Maps/sec, Cache\Dirty Page Threshold, Cache\Dirty Pages, Cache\MDL Read Hits %, Cache\MDL Reads/sec, Cache\Pin Read Hits %, Cache\Pin Reads/sec, Cache\Read Aheads/sec, LogicalDisk(Total)\% Disk Read Time, LogicalDisk(Total)\% Disk Time, LogicalDisk(Total)\% Disk Write Time, LogicalDisk(Total)\Avg. Disk Bytes/Read, PhysicalDisk(Total)\Avg. Disk Bytes/Write, and PhysicalDisk(Total)\Avg. Disk Read Queue Length.
- a performance improvement software application (e.g., Dell® Precision Optimizer or the like) that includes the classifier may be deployed (e.g., installed) on workstations (e.g., information handling systems).
- the software application may provide a user interface (UI) that enables a user to select one or more apps (e.g., Adobe® Illustrator®, Adobe® After Effects®, Adobe® Media Encoder®, Adobe® Photoshop®, Adobe® Premiere®, Autodesk® AutoCAD®, Avid Media Composer®, ANSYS Fluent, ANSYS Workbench, Sonar Cakewalk, and the like).
- the software application may gather data associated with the subset of I/O parameters for a predetermined period of time (e.g., 15, 30, 45, 60 minutes or the like). For each of the selected apps, the classifier may use the gathered data to characterize the workload for each of the selected apps, select a profile that corresponds to the workload, and apply the profile by configuring various I/O related parameters (e.g., size of cache, size of pagefile, location of temporary files, and the like) of the workstation. For example, the classifier may have identified a particular set of workloads (e.g., between 10 and 50 different types of workloads) that encompass the majority of the workloads presented by different applications. To illustrate, in some cases, the classifier may identify about 25 workloads and 25 corresponding profiles.
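Gathering data "for a predetermined period of time" implies reducing many periodic counter samples to a single feature vector the classifier can consume. The sketch below assumes counters are polled at a fixed interval and summarized by their mean and peak; the polling interval, the choice of summary statistics, and the helper names are all assumptions, not details from the patent.

```python
from statistics import mean


def samples_in_window(window_minutes: int, poll_seconds: int) -> int:
    """Number of polling ticks that fit in the monitoring window."""
    return window_minutes * 60 // poll_seconds


def summarize(samples: list, counters: list) -> dict:
    """Reduce per-tick counter samples to one feature vector (mean and peak per counter)."""
    if not samples:
        raise ValueError("no samples were gathered during the window")
    return {
        c: {"mean": mean(s[c] for s in samples), "max": max(s[c] for s in samples)}
        for c in counters
    }


counters = [r"Cache\Copy Reads/sec", r"LogicalDisk(Total)\% Disk Time"]
samples = [  # three hypothetical polling ticks
    {counters[0]: 120.0, counters[1]: 35.0},
    {counters[0]: 180.0, counters[1]: 55.0},
    {counters[0]: 150.0, counters[1]: 45.0},
]
features = summarize(samples, counters)
```

With a 15-minute window and an assumed 5-second polling interval, `samples_in_window(15, 5)` gives 180 ticks per selected app.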
- the training data for the classifier may be gathered using different types of storage devices with different types of interfaces. For example, different amounts of RAM, different types of storage, and the like may be used to generate the training data.
- the different types of storage may include mechanical disk drives and solid-state drives (SSD) that use different types of interfaces, such as serial ATA (SATA), Non-Volatile Memory Express (NVMe), or the like.
- the classifier may provide recommendations to improve performance, such as, for example, "Increasing RAM from 8 GB to 16 GB will provide up to X % improvement in execution times", "Switching from a first type of storage device (e.g., mechanical disk drive) to a second type of storage device (e.g., SSD) may provide up to Y % improvement in execution times, and switching to a third type of storage device (e.g., NVME) may provide up to Z % improvement in execution times." (where X, Y, Z > 0)
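The percentages X, Y, and Z are not specified, but one plausible way to fill them in is the relative reduction in execution time measured for the same workload on two configurations, i.e. (t_baseline - t_candidate) / t_baseline × 100. The sketch below assumes such lab timings exist; the timings and function names are hypothetical.

```python
def improvement_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage reduction in execution time relative to the baseline run."""
    if baseline_s <= 0:
        raise ValueError("baseline execution time must be positive")
    return (baseline_s - candidate_s) / baseline_s * 100.0


def recommendation(change: str, baseline_s: float, candidate_s: float) -> str:
    """Format a recommendation string from two measured execution times."""
    pct = improvement_pct(baseline_s, candidate_s)
    return f"{change} may provide up to {pct:.0f}% improvement in execution times"


# Hypothetical timings: 200 s with 8 GB of RAM, 150 s with 16 GB of RAM.
msg = recommendation("Increasing RAM from 8 GB to 16 GB", 200.0, 150.0)
```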
- the systems and techniques described herein provide a way to characterize, at runtime, an application workload and improve the performance, in terms of storage I/O (e.g., the storage stack, from the file system to the physical device).
- a machine learning system is used to gather data associated with an app's workload (e.g., the way in which the app performs I/O), select a similar predetermined workload to the app's workload from a set of predefined workloads, select a corresponding profile to the similar predetermined workload, and configure parameters associated with the I/O (e.g., operating system parameters, device driver parameters, app parameters, device parameters, and the like) to improve the app's I/O performance.
- the performance improvements may result in the same task executing faster after the profile is applied as compared to before the profile is applied, executing more tasks in a particular period of time, and the like.
- the systems and techniques improve (e.g., optimize) native application performance by analyzing data across layers of the storage stack including the physical disk, cache, logical disk, memory and pagefile to allow the application to make the best use of the relevant computing resources (e.g., storage-related resources).
- a classifier is trained in a non-production environment (e.g., a lab environment).
- a set of parameters (e.g., up to 1,000 and, in some cases, around 700)
- the workloads are executed on different types of platforms having different configurations and different storage types.
- a platform may be a particular motherboard version with a particular chip set.
- the configurations may vary based on, for example, the type of processor (e.g., Intel® i3, i5, i7, and the like), the processor generation, the clock speed, the amount of RAM, the amount of storage, the type of storage (e.g., mechanical, SSD, or the like), the storage interface (e.g., SATA-3, SATA-6, NVME, or the like), and the like.
- the set of parameters used to characterize the workload may include variables and counters, such as, for example, (1) logical and physical I/O operation counters, such as I/O Read Operations/sec, I/O Write Operations/sec, and I/O Data Operations/sec, (2) process parameters, such as, for example, Virtual Bytes, Cache Copy Read Hits %, Cache Copy Reads/Sec, and Page File Bytes/sec, and (3) caching and OS information, such as, for example, Cache Copy Read Hits %, Cache Copy Reads/Sec, and Page File Bytes, and the like.
- the parameters are measured when the application is performing different tasks (e.g., workloads) and used to determine the I/O profile of the application.
- the influence of each of the parameters is ranked using Mean Decrease Gini (e.g., based on Random Forest), and a subset of the parameters is selected based on the ranking (e.g., the N highest ranked parameters are selected, N>0).
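Mean Decrease Gini in a Random Forest sums, over every split in every tree, the weighted drop in Gini impurity attributable to each parameter. Reproducing a full forest would be long, so the sketch below uses a deliberately simplified stand-in: it scores each parameter by the largest Gini decrease a single threshold split on that parameter can achieve. It illustrates the impurity arithmetic behind the ranking but is not Mean Decrease Gini itself; all names and data are hypothetical.

```python
from collections import Counter


def gini(labels: list) -> float:
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_decrease(values: list, labels: list) -> float:
    """Largest weighted Gini decrease that one threshold split on this parameter achieves."""
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_parameters(samples: dict, labels: list):
    """Rank parameters by their best single-split Gini decrease, highest first."""
    scores = {name: best_split_decrease(vals, labels) for name, vals in samples.items()}
    ranking = sorted(scores, key=lambda name: (-scores[name], name))
    return ranking, scores


labels = ["seq", "seq", "rand", "rand"]  # hypothetical workload label for each lab run
samples = {
    "read_bytes_per_sec": [900, 850, 100, 120],
    "cache_hit_pct": [50, 60, 55, 52],
}
ranking, scores = rank_parameters(samples, labels)
```

Here `read_bytes_per_sec` separates the two workload classes perfectly (decrease 0.5), while `cache_hit_pct` only partially does (decrease 1/6), so it ranks lower.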
- the workloads may be grouped based on a similarity of the parameters to consolidate and reduce the number of workloads. For example, if a number (e.g., M>0) of the highest ranked parameters are similar, e.g., within a predetermined range, then the workloads may be grouped into a single workload, e.g., a particular workload in which a first parameter is within a first range, a second parameter is within a second range, and so on.
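The grouping rule above (merge workloads whose top-ranked parameters fall within predetermined ranges of each other) can be sketched as a greedy pass. This assumes a per-parameter tolerance and uses each group's first member as its representative; both choices, along with the names and numbers, are hypothetical, and a greedy pass is order-dependent.

```python
def group_workloads(workloads: dict, params: list, tolerance: dict) -> list:
    """Greedily merge workloads whose top-ranked parameters are all within tolerance.

    Each group is represented by its first member; a workload joins the first group
    whose representative is within tolerance[p] on every parameter p in params.
    """
    groups = []  # list of (representative values, member names)
    for name, values in workloads.items():
        for rep_values, members in groups:
            if all(abs(values[p] - rep_values[p]) <= tolerance[p] for p in params):
                members.append(name)
                break
        else:
            groups.append((values, [name]))
    return [members for _, members in groups]


workloads = {  # hypothetical measured values for the M highest-ranked parameters
    "render-A": {"rw_ratio": 4.0, "avg_kb": 64},
    "render-B": {"rw_ratio": 4.2, "avg_kb": 60},
    "db-log": {"rw_ratio": 0.2, "avg_kb": 8},
    "render-C": {"rw_ratio": 3.9, "avg_kb": 70},
}
groups = group_workloads(workloads, ["rw_ratio", "avg_kb"], {"rw_ratio": 0.5, "avg_kb": 8})
```

Four measured workloads collapse into two workload types here, which is the consolidation effect the text describes.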
- the parameters for a workload may be determined and compared with the workloads identified in the non-production environment to determine a closest matching workload.
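"Closest matching workload" is not defined precisely in the text. One simple interpretation, assumed here, is a nearest-neighbor lookup using Euclidean distance after dividing each parameter by a scale factor so that large-valued counters do not dominate. The reference workloads, scales, and function name are hypothetical.

```python
import math


def closest_workload(measured: dict, reference: dict, scale: dict) -> str:
    """Return the reference workload with the smallest scaled Euclidean distance."""
    def distance(profile: dict) -> float:
        return math.sqrt(sum(((measured[p] - profile[p]) / scale[p]) ** 2 for p in scale))

    return min(reference, key=lambda name: distance(reference[name]))


reference = {  # hypothetical workloads identified in the lab
    "sequential-read": {"rw_ratio": 9.0, "avg_kb": 256},
    "random-mixed": {"rw_ratio": 1.0, "avg_kb": 8},
    "write-heavy": {"rw_ratio": 0.2, "avg_kb": 64},
}
scale = {"rw_ratio": 10.0, "avg_kb": 256.0}
measured = {"rw_ratio": 7.5, "avg_kb": 192}
match = closest_workload(measured, reference, scale)
```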
- benchmarking tools may be used to classify workloads based on a particular set of characteristics, such as, for example, a ratio of reads to writes, block size, and the like.
- a correlational analysis may be used to rank a dependency level between characteristics to enable each workload type to be uniquely identified based on the measured characteristics of a workload.
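The text does not name the correlation measure. Assuming Pearson correlation, the sketch below ranks every pair of characteristics by the absolute value of their coefficient, so strongly dependent (and therefore partly redundant) pairs surface first. The characteristic names and samples are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant characteristic")
    return cov / (sx * sy)


def rank_dependencies(samples: dict) -> list:
    """All characteristic pairs, ordered by |correlation| from strongest to weakest."""
    pairs = [((a, b), pearson(samples[a], samples[b]))
             for a, b in combinations(sorted(samples), 2)]
    return sorted(pairs, key=lambda item: -abs(item[1]))


samples = {  # hypothetical per-run measurements
    "read_ops": [10, 20, 30, 40],
    "read_bytes": [40, 80, 120, 160],
    "write_ops": [5, 3, 6, 4],
}
ranked = rank_dependencies(samples)
```

In this data `read_bytes` is an exact multiple of `read_ops`, so that pair tops the ranking; keeping both would add little information for telling workload types apart.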
- a machine learning classifier is trained to classify a workload based on measuring a set of parameters over a period of time and select a profile corresponding to the workload.
- the data used to train the classifier is generated by executing different types of workloads across multiple platforms using variations of multiple parameters.
- the classifier identifies parameter settings that increase the performance of particular types of workloads in terms of bandwidth, input/output operations per second (IOPS), latency, and the like.
- each workload from multiple workloads is executed on different hardware configurations and the performance measured (e.g., read performance, write performance, read/write performance, and the like).
- the process of executing different workloads is repeated while varying the values of the performance variables to create a tree structure for the resulting data.
- the tree structure is used by the machine learning algorithm to make a decision as to the configuration that provides the highest performance for a particular application that provides a particular type of I/O workload.
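At runtime, using such a tree amounts to walking from the root to a leaf by comparing measured parameters against split thresholds. The sketch below hand-builds a tiny tree for illustration; in the described system the structure and thresholds would be learned from the lab data, and every parameter name, threshold, and profile name shown here is hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    profile: str


@dataclass
class Node:
    param: str
    threshold: float
    left: "Tree"   # followed when the measured value is <= threshold
    right: "Tree"  # followed when the measured value is > threshold


Tree = Union[Leaf, Node]


def select_profile(tree: Tree, measured: dict) -> str:
    """Walk from the root to a leaf using the measured parameters; return its profile."""
    while isinstance(tree, Node):
        tree = tree.left if measured[tree.param] <= tree.threshold else tree.right
    return tree.profile


tree = Node("rw_ratio", 1.5,
            left=Node("avg_kb", 16, left=Leaf("random-small-io"),
                      right=Leaf("write-heavy-large-io")),
            right=Node("cache_copy_read_hits_pct", 80, left=Leaf("sequential-read-large-cache"),
                       right=Leaf("read-cached-default")))
measured = {"rw_ratio": 6.0, "avg_kb": 128, "cache_copy_read_hits_pct": 45}
chosen = select_profile(tree, measured)
```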
- the trained machine learning model (such as supervised or deep learning) is deployed on client devices.
- the trained classifier is installed and shipped with systems (e.g., workstations, such as Dell® Precision) that may be used to execute applications that use a significant amount of computing resources, such as storage I/O.
- a UI enables a user to select one or more apps. When one of the selected apps is executing, the subset of parameters (e.g., that most influence I/O operations) are measured for a predetermined period of time. The data gathered by monitoring the subset of parameters is used with a decision tree to identify a profile with configuration settings to improve performance. Thus, a profile may be selected at runtime to improve performance for a particular application.
- a computing device may include one or more processors and one or more non-transitory computer readable media storing instructions executable by the one or more processors to perform various operations.
- the operations may include displaying a user interface (UI) and receiving, via the UI, a user selection of a particular application from a plurality of applications to create a selected application.
- the plurality of applications may include, for example, Adobe® Illustrator®, Adobe® After Effects®, Adobe® Media Encoder®, Adobe® Photoshop®, Adobe® Premiere®, Autodesk® AutoCAD®, Avid Media Composer®, ANSYS Fluent, ANSYS Workbench, Sonar Cakewalk, or the like.
- the operations may include determining that the selected application is executing and performing operations to an input/output stack of the computing device.
- the input/output stack may include: (i) a file system used by the computing device, (ii) a random-access memory used by the computing device, (iii) a logical storage used by the operating system, (iv) a cache allocated in the random-access memory by the operating system, (v) a pagefile used by the operating system, and (vi) physical storage accessible to the operating system.
- the operations may include gathering, over a predetermined interval of time (e.g., fifteen minutes, thirty minutes, forty-five minutes, sixty minutes, or the like) data associated with the selected application that is performing the operations to the input/output stack.
- the operations may include performing an analysis of the data and determining, by a classifier and based at least in part on the analysis, a particular workload type from a predefined set of workload types that is associated with the selected application.
- the operations may include ordering, according to frequency of occurrence, the operations performed by the selected application to the input/output stack, determining a subset of the operations comprising a plurality of most frequently performed operations performed by the selected application to the input/output stack, comparing the subset of the operations to frequent operations associated with each of the predefined set of workload types, and determining that the subset of the operations associated with the selected application matches the frequent operations associated with the particular workload type.
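The ordering-and-matching operations above can be sketched in Python. This is a minimal sketch, not the patented implementation: the operation names, the hypothetical `workload_signatures` mapping (workload type to its frequent operations), the value of `k`, and the rule that the workload type with the largest overlap "matches" are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def top_operations(ops: Sequence[str], k: int) -> List[str]:
    """Order operations by frequency of occurrence and keep the k most frequent."""
    # Counter.most_common breaks ties by first occurrence in the trace.
    return [op for op, _count in Counter(ops).most_common(k)]


def match_workload_type(
    ops: Sequence[str],
    workload_signatures: Dict[str, Sequence[str]],
    k: int = 3,
) -> Optional[str]:
    """Return the predefined workload type whose frequent operations best match.

    "Best match" is assumed here to mean the largest overlap between the
    app's top-k operations and each workload type's frequent operations.
    """
    subset = set(top_operations(ops, k))
    best_type: Optional[str] = None
    best_overlap = -1
    for workload_type, frequent_ops in workload_signatures.items():
        overlap = len(subset & set(frequent_ops))
        if overlap > best_overlap:  # the first type seen wins on ties
            best_type, best_overlap = workload_type, overlap
    return best_type
```

For example, a trace dominated by cache writes, process reads, and logical disk reads matches a hypothetical `"workload_type_M"` whose frequent operations are exactly those three, rather than a type built around system file reads.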
- the classifier may be trained using multiple hardware platforms, multiple storage configurations, multiple workloads, and the predefined plurality of profiles to classify a workload based on input/output operations performed by a particular application and to identify a profile to increase performance of the input/output operations.
- the operations may include selecting a particular profile from a plurality of predefined profiles based at least in part on the particular workload type, and modifying, based on the particular profile, a plurality of parameters to create a plurality of modified parameters.
- the modified parameters may reduce an execution time of performing the operations to the input/output stack.
- modifying the plurality of parameters to create the plurality of modified parameters comprises at least one of: modifying a process priority associated with the application (e.g., to a highest process priority), modifying a power plan of the operating system (e.g., to a high-performance power plan), modifying (e.g., enabling or disabling) a hyperthreading feature associated with the one or more processors, modifying (e.g., enabling or disabling) a core parking feature associated with the one or more processors, modifying (e.g., enabling or disabling) a compression feature to compress data stored in the random-access memory, modifying (e.g., enabling or disabling) a page combining feature of the operating system to remove duplicates of content stored in the random-access memory, modifying (e.g., enabling or disabling) a vertical synchronization feature associated with synchronizing a frame rate output of the selected application with a monitor refresh rate of a display device associated with the computing device, or
- FIG. 1 is a block diagram 100 of a computing device executing a classifier (e.g., a machine learning algorithm) that gathers data associated with an application and selects a profile to configure resources of the computing device, according to some embodiments.
- a computing device 102 may include one or more applications 104 and a performance improvement tool 106 .
- the computing device 102 may be a workstation, such as a Dell® Precision workstation (e.g., a laptop or a desktop).
- the applications (“apps”) 104 may include one or more applications, such as, for example, Adobe® Illustrator®, Adobe® After Effects®, Adobe® Media Encoder®, Adobe® Photoshop®, Adobe® Premiere®, Autodesk® AutoCAD®, Avid Media Composer®, ANSYS Fluent, ANSYS Workbench, Sonar Cakewalk, or the like.
- the performance improvement tool 106 may be an application, such as, for example, Dell® Precision Optimizer or similar.
- the performance improvement tool 106 may provide an app selection user interface (UI) 108 to enable a user to select one or more apps, such as apps 110 ( 1 ) to app 110 (N) (N>0) from among the apps 104 .
- apps 110 may be a subset of the applications 104 .
- the UI 108 may enable a user to select up to five apps 110 .
- the performance improvement tool 106 may determine when one of the apps 110 , such as the app 110 (N), is being executed by the computing device 102 and monitor the input/output (I/O) requests 112 ( 1 ) to 112 (P) (P>0) from the app 110 (N) to an I/O stack 114 .
- the I/O stack 114 may include a file system 116 , a memory 118 , a logical storage 120 , a cache 122 , a pagefile 123 , and physical storage 124 .
- the pagefile 123 is used in paging, a type of memory management scheme by which the computing device 102 stores and retrieves data from the physical storage 124 for use in the main memory (e.g., RAM) 118 .
- An operating system 148 may retrieve data from physical storage 124 in same-size blocks called pages, which are backed by the pagefile 123 .
- Paging is part of the virtual memory feature of the operating system 148 that enables the applications 104 to use more memory than is physically available in the main memory (e.g., RAM) 118 .
- the I/O stack 114 illustrated in FIG. 1 is merely an example and the particular layers of the I/O stack 114 may vary in different implementations based on the operating system 148 , the file system 116 , the physical storage 124 , and the like.
- a workstation running a different operating system 148 , such as Linux® (or another Unix® variant), may have different layers in the I/O stack 114 than what is illustrated in FIG. 1 .
- the performance improvement tool 106 may determine when one of the apps 110 , e.g., the app 110 (N), is executing and monitor the I/O requests 112 ( 1 ) to 112 (P) to the I/O stack 114 for a predetermined amount of time (e.g., 15, 30, 45, 60 minutes or the like). During this time, the performance improvement tool 106 may gather the data 126 (N) to characterize the workload presented by the app 110 (N) and then select one of the profiles 132 . The process of gathering the data 126 (N) associated with the app 110 (N) and selecting one of the profiles 132 is typically done once, e.g., after the user opens the app selection UI 108 and selects one or more of the apps 110 from the applications 104 .
- the process of gathering the data 126 (N) may degrade the performance of the app 110 (N)
- the process is performed when the user desires a performance improvement in the app 110 (N).
- after a profile has been selected, the process of gathering the data 126 (N) may not be performed again automatically.
- the user may subsequently instruct the performance improvement tool 106 to gather the data 126 (N) associated with the app 110 (N) again and select one of the profiles 132 by opening the app selection UI 108 and selecting the app 110 (N).
- when new profiles become available, the user may install the new profiles and instruct the performance improvement tool 106 to gather the data 126 (N) associated with the app 110 (N) and select one of the profiles 132 by opening the app selection UI 108 and selecting the app 110 (N).
- if the user modifies the configuration of the computing device 102 , for example by adding more memory (e.g., RAM), adding another storage device with a faster interface (e.g., NVME instead of SATA), or adding a storage device with more cache and/or faster I/O characteristics, the user may instruct the performance improvement tool 106 to gather the data 126 (N) associated with the app 110 (N) again and select one of the profiles 132 by opening the app selection UI 108 and selecting the app 110 (N).
- the performance improvement tool 106 may monitor the input/output (I/O) requests 112 for each of the selected apps when each of the selected apps is executing (e.g., the first time each of the selected apps is being executed) and gather data 126 ( 1 ) associated with how the app 110 ( 1 ) uses the I/O stack 114 and gather data 126 (N) associated with how the app 110 (N) uses the I/O stack 114 .
- the data 126 (N) may be gathered across each of the layers of the I/O stack 114 .
- the gathered data 126 may include between about 50 and 100 I/O-related parameters (e.g., preferably about 70).
- the subset of parameters may include, for example, Cache\Async Copy Reads/sec, Cache\Copy Read Hits %, Cache\Copy Reads/sec, Cache\Data Map Hits %, Cache\Data Maps/sec, Cache\Dirty Page Threshold, Cache\Dirty Pages, Cache\MDL Read Hits %, Cache\MDL Reads/sec, Cache\Pin Read Hits %, Cache\Pin Reads/sec, Cache\Read Aheads/sec, LogicalDisk(Total)\% Disk Read Time, LogicalDisk(Total)\% Disk Time, LogicalDisk(Total)\% Disk Write Time, LogicalDisk(Total)\Avg. Disk Bytes/Read, PhysicalDisk(Total)\Avg. Disk Bytes/Write, PhysicalDisk(Total)\Avg. Disk Read Queue Length.
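Gathering the data 126 can be sketched as periodic sampling of the selected counters over the predetermined interval. This is a minimal sketch under stated assumptions: `read_counters` is a hypothetical callable (for example, a wrapper around the operating system's performance-counter interface) that returns one snapshot keyed by counter path, the sampling interval is a free choice, and summarizing each counter by its mean is an assumption, since the source does not say how samples are aggregated.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def gather_counter_data(
    read_counters: Callable[[], Dict[str, float]],
    duration_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, float]:
    """Sample I/O counters every interval_s seconds for duration_s seconds.

    read_counters is a hypothetical callable returning one snapshot of the
    monitored counters. sleep is injectable so the loop can be tested quickly.
    Returns the mean value of each counter across all snapshots.
    """
    samples: List[Dict[str, float]] = []
    elapsed = 0.0
    while elapsed < duration_s:
        samples.append(read_counters())
        sleep(interval_s)
        elapsed += interval_s
    if not samples:
        return {}
    # Average each counter across all snapshots to summarize the workload.
    return {name: mean(s[name] for s in samples) for name in samples[0]}
```

A fifteen-minute interval sampled once per second would correspond to `duration_s=900` and `interval_s=1`.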
- a trained classifier 128 may analyze the data 126 and identify a predefined workload type 130 from among workload types 130 ( 1 ) to 130 (M) (M>0, M not necessarily equal to N) that is closest (e.g., most similar) to the type 129 of a workload (e.g., determined based on the data 126 ) that the selected apps 110 present to the I/O stack 114 .
- M may be between 10 and 30, such as about 25 different types of predefined workloads.
- the classifier 128 may analyze the data 126 (N) associated with the app 110 (N) and determine that the I/O requests 112 to the I/O stack 114 present a workload type 129 (N) that is similar (e.g., closest) to the predetermined workload type 130 (M) and select the profile 132 (M).
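The notion of the "closest" predefined workload type can be illustrated with a nearest-centroid rule. This is a swapped-in stand-in for the trained classifier 128, not the classifier itself: it assumes each workload type is summarized by a hypothetical centroid vector of normalized counter values, and that the app's data 126 (N) has been reduced to a feature vector of the same length.

```python
import math
from typing import Dict, Sequence


def closest_workload_type(
    features: Sequence[float], centroids: Dict[str, Sequence[float]]
) -> str:
    """Return the predefined workload type whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda wtype: math.dist(features, centroids[wtype]))


def select_profile(
    features: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    profiles: Dict[str, str],
) -> str:
    """Map the closest workload type (e.g., 130(M)) to its profile (e.g., 132(M))."""
    return profiles[closest_workload_type(features, centroids)]
```

`math.dist` requires Python 3.8 or later and vectors of equal length; a real classifier (e.g., a random forest) would replace the distance rule while keeping the same workload-to-profile lookup.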
- the performance improvement tool 106 may apply the settings in the profile 132 (M) to the computing device 102 to improve the performance of the app 110 (N), e.g., as related to accessing the I/O stack 114 .
- the profile 132 (M) may modify various parameters associated with the I/O stack 114 , an operating system 148 of the computing device 102 , the app 110 (N), another set of parameters, or any combination thereof. Applying the profile 132 (M) causes an increase in the speed at which the I/O requests 112 are executed, thereby reducing execution time and increasing throughput for the app 110 (N).
- the profile 132 (M) may modify the size of caches, queues, counters, and other data structures in the I/O stack to improve the execution time of the I/O requests 112 of the app 110 (N).
- the profile 132 (M) may modify parameters 134 of the file system 116 , such as the type of file system (e.g., FAT, exFAT, NTFS, or the like), cluster size, volume size, whether compression is used and if so, what type of compression is used, encryption, and the like.
- the profile 132 (M) may modify parameters 136 of the memory 118 , such as how much of the memory 118 is allocated for paging, and other memory-related settings.
- the profile 132 (M) may modify parameters 138 associated with the logical storage 120 , such as how the logical storage 120 is implemented.
- the profile 132 (M) may modify parameters 140 associated with the cache 122 , such as a size of the cache 122 , under what conditions the contents of the cache 122 are written to the physical storage 124 , and the like.
- the profile 132 (M) may modify parameters 142 associated with the pagefile 123 , such as a size of the pagefile 123 , under what conditions paging occurs, and the like.
- the profile 132 (M) may modify parameters 144 associated with the physical storage 124 .
- the profile 132 (M) may modify various parameters 146 of the app 110 (N), such as the location of a temporary file, the size of various internal caches and queues, and the like.
- a video editor application may enable a location of a temporary file to be specified. If the temporary file is located on the same storage device as the app 110 (N), then I/O requests to access portions of the application software and I/O requests to access the temporary file are placed in the same queue because they access the same storage device.
- the profile 132 (M) may modify the parameters 146 to locate the temporary file (e.g., video file(s), photo file(s), audio file(s), illustration file(s), and the like) on a second storage device while the app 110 (N) executes on a first storage device.
- I/O requests to access portions of the application software are placed in a first queue associated with the first storage device and I/O requests to access the temporary file are placed in a second queue associated with the second storage device. In this way, the app I/O requests and the temporary file I/O requests can be performed substantially in parallel.
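The benefit of moving the temporary file can be illustrated with an idealized timing model. This sketch assumes each I/O request has a fixed service time, each storage device serves its own queue one request at a time, and separate devices operate fully in parallel; real devices overlap requests internally, so the numbers are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def completion_time(requests: Iterable[Tuple[str, float]]) -> float:
    """Idealized time to finish all requests, given (device, service_time) pairs.

    Each device serves its own queue one request at a time, and different
    devices work in parallel, so the total is the busiest device's queue time.
    """
    busy: Dict[str, float] = defaultdict(float)
    for device, service_time in requests:
        busy[device] += service_time
    return max(busy.values(), default=0.0)
```

With four 2 ms application requests and six 2 ms temporary-file requests, a single shared device needs 20 ms, while placing the temporary file on a second device needs max(8, 12) = 12 ms under this model.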
- the profiles 132 may modify one or more parameters 150 of the operating system 148 to improve performance of the apps 110 .
- the parameters 150 may include process priorities 152 , a power plan 154 , Vsync 156 , hyperthreading 158 , core parking 160 , superfetch 162 , cache VMEM 164 , memory compression 166 , page combining 168 , and other parameters 170 .
- the process priorities 152 may include a priority level, e.g., high, normal, or low, associated with each process (e.g., an instance of a software application).
- the power plan 154 may be one of multiple plans, such as, for example, a high-performance plan, a balanced plan, and a power save plan.
- Vsync 156 refers to Vertical Synchronization (Vsync), a display option to synchronize the frame rate output by an application (e.g., via a graphics card) with the monitor refresh rate. Because a graphics processor executes as fast as possible, it may output extremely high frame rates, e.g., faster than the display device is capable of displaying. Enabling Vsync caps (e.g., throttles) the frame rate at the monitor's refresh rate and may avoid excessive strain on the graphics processor.
- Hyperthreading 158 is a simultaneous multithreading feature that enables each physical core of a central processing unit (CPU) to execute more than one thread at a time, improving the parallelization of computations (doing multiple tasks at once).
- with core parking 160 , cores of the CPU that do not have threads scheduled for execution are parked (e.g., placed in a low power state to conserve power). A parked core may take time to be unparked before it can execute a thread, thereby causing a delay. Thus, turning core parking 160 off may increase performance because an app does not wait for a parked core to become available to execute a thread associated with the app.
- SuperFetch 162 is a pre-fetch feature of a memory manager of the operating system 148 that caches frequently accessed data in RAM, rather than leaving it on a storage device, because data can be retrieved from the cache faster than from the storage device.
- Cache virtual memory (vMem) 164 is memory that is allocated for virtualization (e.g., that provided by VMware® or similar software). For example, virtual memory addresses associated with each virtual machine are translated to physical memory addresses.
- Memory compression 166 is a memory management technique that utilizes data compression to reduce the size or number of paging requests to and from the storage.
- Page combining 168 is a technique to free up memory (RAM) in which the operating system 148 analyzes the content of memory, locates duplicate content, and keeps a single copy of particular content while removing duplicate content from the memory.
- these are merely examples of parameters that can be modified and other parameters 170 may be changed depending on the operating system (e.g., Windows, MacOS, iOS, Android, Linux, and the like), the operating system version, and so on.
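One way to represent a profile 132 is as a record of target values for the operating-system parameters 150 described above, so that applying it only touches settings that actually differ. The field names, allowed values, and defaults below are illustrative assumptions, not the contents of Table 1 or Table 2, and actually applying each change would require operating-system-specific calls that are not shown.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Profile:
    """Hypothetical subset of the operating-system parameters 150 a profile may set.

    Field names, allowed values, and defaults are illustrative assumptions.
    """
    process_priority: str = "normal"   # e.g., "high", "normal", "low"
    power_plan: str = "balanced"       # e.g., "high_performance", "balanced", "power_save"
    vsync: bool = True
    hyperthreading: bool = True
    core_parking: bool = True
    superfetch: bool = True
    memory_compression: bool = True
    page_combining: bool = True


def settings_to_change(current: Profile, target: Profile) -> Dict[str, Any]:
    """Return only the parameters whose target value differs from the current one."""
    cur, tgt = asdict(current), asdict(target)
    return {name: value for name, value in tgt.items() if cur[name] != value}
```

For example, `settings_to_change(Profile(), Profile(process_priority="high", core_parking=False))` equals `{"process_priority": "high", "core_parking": False}`, leaving every other parameter untouched.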
- Table 1 illustrates examples of the application and operating system parameters that each of the profiles 132 may modify.
- the profile 132 (M) associated with the app 110 (N) may configure the parameters as illustrated in Table 2.
- the performance improvement tool 106 may provide recommendations 172 to improve performance of one or more of the apps 110 .
- the recommendations 172 may include “Increasing RAM from 8 GB to 16 GB will provide up to X % improvement in execution times for app 110 (N)”, “For app 110 (N), switching from a first type of storage device (e.g., mechanical disk drive) to a second type of storage device (e.g., SSD) may provide up to Y % improvement in execution times, and switching to a third type of storage device (e.g., NVME) may provide up to Z % improvement in execution times.” (X, Y, and Z>0).
- the recommendations 172 may include “Upgrading to the latest Precision workstation with 4.2 GHz i7 processor, 16 GB RAM, and 256 GB NVME memory will yield an improvement of X % for app 110 (N).”
- a performance improvement tool may provide a UI to enable a user to select one or more apps.
- the tool detects that one of the selected apps is being executed (e.g., for the first time after being selected)
- the tool gathers data as to how the I/O requests of the selected app affect the I/O stack.
- the data is gathered for a predetermined period of time, such as 15, 30, 45, 60 minutes or the like.
- a classifier analyzes the gathered data and characterizes the workload presented by the I/O requests of the app to the I/O stack.
- the classifier compares the workload of the app to multiple predefined workloads and identifies a closest (e.g., most similar) workload from among the predefined workloads.
- the classifier selects a profile that corresponds to the closest workload.
- the process of gathering data, analyzing the data, identifying a most similar workload, and selecting a corresponding profile is repeated for each selected app.
- a profile is associated with an app
- each time the app is executed the associated profile is applied.
- the profile is applied by modifying various parameters associated with (e.g., that affect) the I/O stack, enabling the I/O stack to execute the I/O requests from the app in less time, thereby reducing execution time and increasing throughput.
- when a first app is executed, the associated first profile is applied.
- when a second app is executed, an associated second profile is selected and applied to reconfigure the parameters to improve throughput while the second app is executing.
- FIG. 2 is a block diagram 200 illustrating training a classifier, according to some embodiments.
- the classifier 128 of FIG. 1 may be created and trained using the process 200 .
- the classifier is created.
- the classifier may include software instructions that implement one or more machine learning algorithms (e.g., Random Forest, Neural Networks, or the like).
- the classifier may be trained using training data 206 .
- the training data 206 may include data that has been pre-classified (e.g., by a human, by another classifier, or a combination thereof).
- the classifier may be used to classify test data 210 .
- the test data 210 may have been pre-classified by a human, by another classifier, or a combination thereof.
- An accuracy with which the classifier classified the test data 210 may be determined. If the accuracy does not satisfy a desired accuracy, then the classifier may be tuned, at 212 , to achieve a desired accuracy.
- the desired accuracy may be a predetermined threshold, such as ninety percent, ninety-five percent, ninety-nine percent, and the like.
- the classifier may be further tuned by modifying the algorithms based on the results of classifying the test data 210 . Steps 208 and 212 may be repeated (e.g., iteratively) until the accuracy of the classifier satisfies the desired accuracy.
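The test-and-tune cycle of FIG. 2 can be sketched as a loop that stops once accuracy on the pre-classified test data 210 meets the threshold. This is a minimal sketch: `train` and `tune` are hypothetical callables standing in for whatever learning algorithm (e.g., Random Forest) is used, a model is treated as any callable mapping a sample to a workload-type label, and `max_rounds` is an added safeguard not described in the source.

```python
from typing import Any, Callable, Sequence, Tuple

Model = Callable[[Any], str]          # maps one sample to a workload-type label
Labeled = Sequence[Tuple[Any, str]]   # pre-classified (sample, label) pairs


def accuracy(model: Model, labeled: Labeled) -> float:
    """Fraction of pre-classified samples the model labels correctly."""
    correct = sum(1 for sample, label in labeled if model(sample) == label)
    return correct / len(labeled)


def train_until_accurate(
    train: Callable[[Labeled], Model],
    tune: Callable[[Model, Labeled], Model],
    training_data: Labeled,
    test_data: Labeled,
    threshold: float = 0.95,
    max_rounds: int = 10,
) -> Tuple[Model, float]:
    """Train, then alternate testing and tuning (212) until accurate enough."""
    model = train(training_data)
    score = accuracy(model, test_data)
    rounds = 0
    while score < threshold:
        if rounds == max_rounds:
            raise RuntimeError(f"accuracy {score:.2f} is below {threshold}")
        model = tune(model, test_data)
        score = accuracy(model, test_data)
        rounds += 1
    return model, score
```

Because tuning is driven by the test data 210, the model can become biased toward it, which is why a separate check such as `accuracy(model, verification_data)` on the verification data 216 follows.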
- the process may proceed to 214 where the accuracy of the classifier may be verified using verification data 216 .
- the verification data 216 may have been pre-classified by a human, by another classifier, or a combination thereof.
- the verification process may be performed at 214 to determine whether the classifier exhibits any bias towards the training data 206 and/or the test data 210 .
- the verification data 216 may be data that is different from both the test data 210 and the training data 206 .
- the trained classifier 128 may be used to identify the set of workload types 130 , with each of the workload types 130 affecting the I/O stack 114 in a different way as compared to others of the workload types 130 .
- the training data 206 , the test data 210 , and the verification data 216 may include data gathered by using different hardware platforms 220 , with each of the hardware platforms 220 having different storage configurations 222 , such as different amounts of RAM, different amounts of physical storage, different types (e.g., mechanical, SSD, and the like) of physical storage, and different storage interfaces (e.g., SATA, NVME, network attached storage (NAS), or the like).
- the data 206 , 210 , 216 may be gathered based on performing multiple execution runs 228 using different workloads 224 on various hardware platforms 220 and storage configurations 222 using different profiles 226 .
- the classifier 128 may associate the profile 132 (M) with the workload type 130 (M) because, among the profiles 226 , the profile 132 (M) provides the fastest execution of I/O requests for the workload type 130 (M).
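Associating each workload type with the profile that gives the fastest I/O execution can be sketched as a group-and-minimize step over the execution runs 228. This assumes each run is summarized as a hypothetical (workload type, profile, I/O execution time) tuple and that times measured on different hardware platforms and storage configurations are simply averaged; both are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (workload type, profile, measured I/O execution time in seconds)
Run = Tuple[str, str, float]


def best_profile_per_workload(runs: Iterable[Run]) -> Dict[str, str]:
    """For each workload type, pick the profile with the lowest mean I/O execution time."""
    times: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for workload, profile, seconds in runs:
        times[(workload, profile)].append(seconds)

    best: Dict[str, Tuple[str, float]] = {}
    for (workload, profile), samples in times.items():
        avg = mean(samples)
        if workload not in best or avg < best[workload][1]:
            best[workload] = (profile, avg)
    return {workload: profile for workload, (profile, _avg) in best.items()}
```

Averaging is the simplest aggregation; weighting runs by how common each platform is among deployed systems would be an equally plausible design choice.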
- FIG. 3 is a block diagram 300 illustrating examples of variables used to train a machine learning algorithm, according to some embodiments.
- FIG. 3 illustrates examples of the variables influencing I/O throughput.
- the variables may include system file reads 302 , I/O reads 304 , copy cache reads 306 , process reads 308 , cache writes 310 , logical disk reads 312 , system file writes 314 , disk read queue length 316 , disk write queue length 318 , and cache data flush 320 .
- a correlational analysis may be used to rank the dependency level between I/O related variables, such as the variables identified in FIG. 3 .
- the variables may be ranked, as illustrated in FIG. 3 , according to each variable's dependency level, e.g., how much the variable influences I/O, as measured by mean decrease Gini.
- the mean decrease Gini is a measure of how much each variable contributes to the homogeneity (purity) of the nodes and leaves in the resulting random forest. Splits on variables that produce purer nodes yield a larger decrease in Gini impurity, indicating a greater influence.
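The quantity behind mean decrease Gini can be made concrete with the impurity decrease of a single split. For a node whose samples have class proportions p_k, the Gini impurity is 1 - sum_k p_k^2; a variable's mean decrease Gini accumulates the weighted decreases over every node that splits on that variable, averaged over the trees of the forest. The sketch below computes only the per-split building block, and the class labels ("read", "write") are hypothetical.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_k p_k**2; 0.0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(
    parent: Sequence[str], left: Sequence[str], right: Sequence[str]
) -> float:
    """Weighted decrease in Gini impurity when parent is split into left and right."""
    n = len(parent)
    return (
        gini_impurity(parent)
        - len(left) / n * gini_impurity(left)
        - len(right) / n * gini_impurity(right)
    )
```

A split that cleanly separates two read-heavy runs from two write-heavy runs reduces impurity from 0.5 to 0.0 (a decrease of 0.5), while a split leaving both children mixed yields no decrease, so the first variable would rank higher.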
- a subset of the variables, e.g., the top X variables (e.g., 10 < X < 100) having the highest mean decrease Gini, may be selected.
- the subset of variables is used to gather the data 126 associated with each of the selected apps 110 .
- monitoring 700 different I/O related variables when gathering the data 126 is impractical in a runtime environment because the execution speed of the apps 110 would slow down significantly.
- monitoring the top X variables is sufficient to characterize the workload of each of the apps 110 because the top X variables have the largest influence (e.g., greatest mean decrease Gini) over I/O.
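Selecting the top X variables reduces to sorting by importance and truncating. This is a minimal sketch; the variable names and importance values in the usage example are placeholders, not measurements from FIG. 3.

```python
from typing import Dict, List


def top_variables(importance: Dict[str, float], x: int) -> List[str]:
    """Return the x variables with the highest mean decrease Gini, highest first.

    Ties keep their original (insertion) order because sorted() is stable.
    """
    return sorted(importance, key=importance.__getitem__, reverse=True)[:x]
```

For example, `top_variables({"system_file_reads": 0.30, "io_reads": 0.25, "cache_writes": 0.10}, 2)` returns `["system_file_reads", "io_reads"]`; only those counters would then be monitored at runtime.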
- the variables 302 , 304 , 306 , 308 , and 310 may be measured and used to characterize the I/O workload.
- the workload may be identified as most similar to the workload type 130 ( 1 ) and assigned the profile 132 ( 1 ).
- the profile 132 ( 1 ) may configure operating system and application variables to improve the execution time for system file reads 302 and I/O reads 304 .
- the workload may be identified as most similar to the workload type 130 (M) and assigned the profile 132 (M).
- the profile 132 (M) may configure operating system and application variables to improve the execution time for cache writes 310 and process reads 308 . In this way, each profile reduces the execution time for the most influential variables.
- each block represents one or more operations that can be implemented in hardware, software, or a combination thereof.
- the blocks represent computer-executable instructions that, when executed by one or more processors, cause the processors to perform the recited operations.
- computer-executable instructions include routines, programs, objects, modules, components, data structures, and the like that perform particular functions or implement particular abstract data types.
- the order in which the blocks are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
- the processes 400 and 500 are described with reference to FIGS. 1, 2, and 3 , as described above, although other models, frameworks, systems and environments may be used to implement this process.
- FIG. 4 is a flowchart of a process 400 that includes training a classifier, according to some embodiments.
- the process 400 may be performed in a non-production environment to create the classifier 128 of FIG. 1 .
- a hardware platform may be selected.
- a storage configuration may be selected.
- a workload may be selected.
- a profile may be selected.
- the workload may be executed for a particular period of time.
- data associated with the I/O stack may be gathered.
- the gathered data, the platform, the configuration, the workload, and the profile may be stored.
- the classifier may be trained, as described in FIG. 2 , using the data gathered in 402 , 404 , 406 , 408 , 410 , 412 , 414 , 416 , 418 , 420 , and 422 .
- the classifier may be used to identify a particular profile from multiple profiles that provides a fastest I/O execution for each workload.
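The selection loops of process 400 (platform, storage configuration, workload, profile, then execute, gather, and store) can be sketched as an enumeration over every combination. `run_and_measure` is a hypothetical hook that configures the system, applies the profile, executes the workload for the chosen period, and returns the gathered I/O stack data; in practice the outer loops may correspond to physically different machines rather than a single program.

```python
import itertools
from typing import Any, Callable, Dict, List, Sequence

Measure = Callable[[str, str, str, str], Dict[str, float]]


def collect_training_records(
    platforms: Sequence[str],
    storage_configs: Sequence[str],
    workloads: Sequence[str],
    profiles: Sequence[str],
    run_and_measure: Measure,
) -> List[Dict[str, Any]]:
    """Run every platform/config/workload/profile combination and store the results."""
    records: List[Dict[str, Any]] = []
    for platform, config, workload, profile in itertools.product(
        platforms, storage_configs, workloads, profiles
    ):
        # Hypothetical hook: execute the workload under this profile and
        # return the data gathered from the I/O stack.
        data = run_and_measure(platform, config, workload, profile)
        records.append(
            {"platform": platform, "storage_config": config,
             "workload": workload, "profile": profile, "data": data}
        )
    return records
```

The resulting records can serve both as training data for the classifier and as the input for picking the fastest profile per workload.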
- FIG. 5 is a flowchart of a process 500 that includes configuring parameters associated with an I/O system based on a profile, according to some embodiments.
- the process 500 may be performed by the performance improvement tool 106 of FIG. 1 .
- the performance improvement tool may be installed on a computing device (e.g., a workstation, such as Dell® Precision).
- the tool may include predefined workloads and corresponding profiles.
- the performance improvement tool 106 may include the workload types 130 and the profiles 132 .
- the tool may display a UI.
- a selection of one or more apps (e.g., for which performance is to be improved, e.g., “optimized”) may be received via the UI.
- the performance improvement tool 106 may provide the app selection UI 108 to enable a user to select one or more apps, such as apps 110 ( 1 ) to app 110 (N) (N>0) from among the apps 104 .
- the process may determine that one of the selected apps is executing (e.g., after having been selected via the UI).
- data associated with how the app uses the I/O stack may be gathered for a predetermined period of time.
- the performance improvement tool 106 may monitor the input/output (I/O) requests 112 for a particular one of the selected apps 110 when the selected app is executing and gather data 126 associated with how the selected app uses the I/O stack 114 .
- the data 126 may be gathered across each of the layers of the I/O stack 114 for a predetermined period of time (e.g., 15, 30, 45, 60 minutes or the like).
- an analysis of the data may be performed using a machine learning algorithm (e.g., a classifier).
- a closest predefined workload may be determined.
- a profile corresponding to the closest predefined workload may be selected.
- the process may configure one or more parameters of the computing device based on the profile.
- the classifier 128 may analyze the data 126 (N) associated with the app 110 (N) and determine that the I/O requests 112 to the I/O stack 114 present a workload that is similar (e.g., closest) to the predetermined workload type 130 (M) and select the profile 132 (M).
- the performance improvement tool 106 may apply the settings in the profile 132 (M) to the computing device 102 to improve the performance of the app 110 (N), e.g., as related to accessing the I/O stack 114 .
- FIG. 6 illustrates an example configuration of the computing device 102 that can be used to implement the systems and techniques described herein.
- the computing device 600 may include one or more processors 602 (e.g., central processing unit (CPU), graphics processing unit (GPU), and the like), a memory 604 , communication interfaces 606 , at least one display device 608 , other input/output (I/O) devices 610 (e.g., keyboard, trackball, and the like), and one or more mass storage devices 612 (e.g., disk drive, solid state disk drive, or the like), configured to communicate with each other, such as via one or more system buses 614 or other suitable connections.
- system buses 614 may include multiple buses, such as a memory device bus, a storage device bus (e.g., serial ATA (SATA) and the like), data buses (e.g., universal serial bus (USB) and the like), video signal buses (e.g., ThunderBolt®, DVI, HDMI, and the like), power buses, etc.
- the processors 602 are one or more hardware devices that may include a single processing unit or a number of processing units, all of which may include single or multiple computing units or multiple cores.
- the processors 602 may include a graphics processing unit (GPU) that is integrated into the CPU or the GPU may be a separate processor device from the CPU.
- the processors 602 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, graphics processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
- the processors 602 may be configured to fetch and execute computer-readable instructions stored in the memory 604 , mass storage devices 612 , or other computer-readable media.
- Memory 604 and mass storage devices 612 are examples of computer storage media (e.g., memory storage devices) for storing instructions that can be executed by the processors 602 to perform the various functions described herein.
- memory 604 may include both volatile memory and non-volatile memory (e.g., random access memory (RAM), read only memory (ROM), or the like) devices.
- mass storage devices 612 may include hard disk drives, solid-state drives, removable media (e.g., secure digital (SD) cards), including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CD, DVD), a storage array, a network attached storage, a storage area network, or the like.
- Both memory 604 and mass storage devices 612 may be collectively referred to as memory or computer storage media herein and may be any type of non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that can be executed by the processors 602 as a particular machine configured for carrying out the operations and functions described in the implementations herein.
- the computing device 600 may include one or more communication interfaces 606 for exchanging data via a network 618 .
- the communication interfaces 606 can facilitate communications within a wide variety of networks and protocol types, including wired networks (e.g., Ethernet, DOCSIS, DSL, Fiber, USB etc.) and wireless networks (e.g., WLAN, GSM, CDMA, 802.11, Bluetooth, Wireless USB, ZigBee, cellular, satellite, etc.), the Internet and the like.
- Communication interfaces 606 can also provide communication with external storage, such as a storage array, network attached storage, storage area network, cloud storage, or the like.
- the display device 608 may be used for displaying content (e.g., information and images) to users.
- Other I/O devices 610 may be devices that receive various inputs from a user and provide various outputs to the user, and may include a keyboard, a touchpad, a mouse, a printer, audio input/output devices, and so forth.
- the computer storage media may be used to store software and data.
- the computer storage media may be used to store the apps 104 , the performance improvement tool 106 (including the recommendations 172 , the app selection UI 108 , the classifier 128 , the selected apps 110 , the data 126 , the workload types 130 , and the profiles 132 ), the I/O stack 114 , and the operating system 148 , as well as other applications (e.g., device drivers) and other data.
- the performance improvement tool 106 may enable a user to select the apps 110 , from among the apps 104 , causing the tool 106 to identify a type of workload that the selected apps present to the I/O stack 114 and select a corresponding one of the profiles 132 .
- the performance improvement tool 106 may determine when one of the apps 110 is being executed by the computing device 102 and monitor the I/O requests from the selected app to the I/O stack 114 for a predetermined amount of time (e.g., 15, 30, 45, 60 minutes or the like). During this time, the performance improvement tool 106 may gather the data 126 to characterize the workload presented by the selected app and select one of the profiles 132 .
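- after the data 126 has been gathered and one of the profiles 132 has been selected for an app, the process of gathering the data 126 may not subsequently be performed for that app.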
- the user may instruct the performance improvement tool 106 to gather the data 126 associated with one of the apps 110 and automatically (e.g., without human interaction) select one of the profiles 132 by opening the app selection UI 108 and selecting one of the apps 110 .
- the user may download new profiles 620 from a server 616 using the network 618 .
- the performance improvement tool 106 may monitor the input/output (I/O) requests 112 for each of the selected apps when each of the selected apps is executing (e.g., the first time each of the selected apps is being executed) and gather data 126 associated with how the selected app uses the I/O stack 114 .
- the data 126 may be gathered across each of the layers of the I/O stack 114 .
- the classifier 128 may analyze the data 126 and identify one of predefined workload types 130 that is closest (e.g., most similar) to the type of workload that the selected app presents to the I/O stack 114 .
- the performance improvement tool 106 may apply the settings in the corresponding profile 132 to the computing device 102 to improve the performance of the selected app.
- the profiles 132 may modify various parameters associated with the I/O stack 114 , the operating system 148 , the selected app 110 , another set of parameters, or any combination thereof. Applying the selected profile 132 causes an increase in the speed at which the I/O requests 112 are executed, thereby reducing execution time and increasing throughput for the selected app.
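The per-app flow described above can be sketched in Python. The tool monitors a selected app the first time it runs, classifies its workload, applies the matching profile, and skips data gathering on later runs. This is a minimal sketch under stated assumptions, not the tool's implementation. `ProfileCache`, `gather`, `classify`, and `apply_profile` are hypothetical names. The caller is assumed to supply the callables that sample I/O counters, run the classifier 128, and change system settings.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]  # one snapshot of I/O counters (hypothetical shape)


class ProfileCache:
    """Hypothetical per-app cache: gather data once, then reuse the chosen profile."""

    def __init__(
        self,
        gather: Callable[[str, float], List[Sample]],
        classify: Callable[[List[Sample]], str],
        apply_profile: Callable[[str], None],
        interval_minutes: float = 30.0,
    ) -> None:
        self._gather = gather
        self._classify = classify
        self._apply = apply_profile
        self._interval = interval_minutes
        self._profiles: Dict[str, str] = {}  # app name -> selected profile name

    def on_app_started(self, app: str) -> str:
        """Call when a selected app is detected executing; returns the applied profile."""
        if app not in self._profiles:
            # First execution: monitor the app's I/O for the predetermined
            # interval, then classify the resulting workload.
            samples = self._gather(app, self._interval)
            self._profiles[app] = self._classify(samples)
        # Later executions skip data gathering and reuse the stored profile.
        profile = self._profiles[app]
        self._apply(profile)
        return profile

    def reset(self, app: str) -> None:
        """Forget an app's profile so the next run gathers data again."""
        self._profiles.pop(app, None)
```

Injecting the three callables keeps the caching logic independent of how counters are sampled or settings are applied. `reset` corresponds to the user re-running the tool after changing how an app is used.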
- FIG. 7 is a block diagram 700 illustrating classifying a workload of an app, according to some embodiments.
- the performance tool 106 may gather data 704 associated with how the app 702 uses the I/O stack 114 .
- the data 704 may include a set of operations 706 (e.g., read, write, and the like) performed to the I/O stack 114 .
- the performance tool 106 may determine how frequently each operation 708 is performed.
- the performance tool 106 may determine that the app 702 performs operations 708 ( 1 ) to 708 (R) (R>0) with a corresponding frequency, e.g., that the operation 708 ( 1 ) is performed with a frequency of 710 ( 1 ) and the operation 708 (R) is performed with a frequency of 710 (R).
- the classifier 128 may identify a subset 712 of the set of operations 706 that includes the most frequently performed operations from the set of operations 706 . For example, the operations 708 ( 1 ) to 708 (S) with corresponding frequencies 710 ( 1 ) to 710 (S), where S≤R, may be selected for the subset 712 .
- for example, the operations that are performed more frequently than a particular threshold (e.g., performed at least V times per second) may be selected for the subset 712 .
- the classifier 128 may determine (e.g., classify), based on the subset 712 , a type 714 (e.g., one of the types 129 of FIG. 1 ) of the workload that the app 702 presents to the I/O stack 114 . For example, the classifier 128 may determine that the type 714 of the app 702 is most similar to a workload type 718 (e.g., one of the workload types 130 of FIG. 1 ). For example, the workload type 718 may be associated with frequent operations 720 , e.g., operations 722 ( 1 ) to 722 (T) (T>0) having a corresponding frequency 724 ( 1 ) to 724 (T), respectively.
- the subset 712 may be most similar to the frequent operations 720 , e.g., the operations 708 may be similar (or identical) to the operations 722 and the frequencies 710 may be similar (or identical) to the frequencies 724 .
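- for example, if the subset 712 includes a particular type of read operation and a particular type of write operation, then the frequent operations 720 may include the particular type of read operation and the particular type of write operation.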
- the classifier 128 may select the profile 716 (e.g., one of the profiles 132 of FIG. 1 ).
- the performance improvement tool 106 may configure various parameters associated with the computing device 102 of FIG. 1 based on the profile 716 to improve execution (e.g., reduce execution time) of the subset 712 .
- the data 704 gathered by monitoring the operations 708 performed by the app 702 to the I/O stack 114 can be classified by the classifier 128 as the type 714 of the workload associated with the app 702 .
- the classifier 128 determines that the type 714 is most similar to the workload type 718 , selects the corresponding profile 716 and configures the parameters of the computing device to improve performance of the app 702 .
- the performance of the app 702 can be improved even when the app 702 is a new application or a new version of an existing application, or when the user uses the app 702 in a way that is different from how other users use the app 702 .
- the profile 716 is thus selected according to the way in which the app 702 is used. Further, if the user changes the way in which the app 702 is used, the user can re-run the performance improvement tool 106 to monitor the new way in which the app 702 is being used, characterize the type of workload, and select a different profile based on the new usage.
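The FIG. 7 steps can be sketched in Python. The sketch counts how often each operation is performed and keeps the operations performed at least V times per second as the subset. It then picks the predefined workload type whose frequent operations are closest. This is an illustrative sketch, not the classifier 128 itself. The function names are hypothetical. Euclidean distance over per-second frequencies is an assumed similarity measure, since the text does not specify one.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Iterable, Mapping


def operation_frequencies(ops: Iterable[str], seconds: float) -> Dict[str, float]:
    """Count how often each operation type occurs, expressed per second."""
    counts = Counter(ops)
    return {op: n / seconds for op, n in counts.items()}


def frequent_subset(freqs: Mapping[str, float], threshold: float) -> Dict[str, float]:
    """Keep only operations performed at least `threshold` (V) times per second."""
    return {op: f for op, f in freqs.items() if f >= threshold}


def distance(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Euclidean distance over the union of operation types (missing ops count as 0)."""
    keys = set(a) | set(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def closest_workload(subset: Mapping[str, float],
                     workload_types: Mapping[str, Mapping[str, float]]) -> str:
    """Return the predefined workload type whose frequent operations are nearest."""
    return min(workload_types, key=lambda name: distance(subset, workload_types[name]))
```

For example, 600 sequential reads, 300 random writes, and 6 metadata operations observed over 60 seconds give 10, 5, and 0.1 operations per second. With V = 1, the metadata operation is dropped from the subset. The returned workload type would then be used as a key to look up the corresponding profile.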
- the term “module,” as used herein, can represent program code (and/or declarative-type instructions) that performs specified tasks or operations when executed on a processing device or devices (e.g., CPUs or processors).
- the program code can be stored in one or more computer-readable memory devices or other computer storage devices.
- this disclosure provides various example implementations, as described and as illustrated in the drawings. However, this disclosure is not limited to the implementations described and illustrated herein, but can extend to other implementations, as would be known or as would become known to those skilled in the art. Reference in the specification to “one implementation,” “this implementation,” “these implementations” or “some implementations” means that a particular feature, structure, or characteristic described is included in at least one implementation, and the appearances of these phrases in various places in the specification are not necessarily all referring to the same implementation.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
- This invention relates generally to computing devices and, more particularly, to improving input/output (I/O) performance of one or more applications executing on a computing device.
- As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system (IHS) generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes, thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
- One type of information handling system is a workstation. A workstation may be used to execute applications that use large amounts of computing resources, such as, for example, central processing unit (CPU) cycles, memory, storage, graphics processing unit (GPU) cycles, and the like. Examples of such computing resource-intensive applications include Adobe® Illustrator®, Adobe® After Effects®, Adobe® Media Encoder®, Adobe® Photoshop®, Adobe® Premiere®, Autodesk® AutoCAD®, Avid Media Composer®, ANSYS Fluent, ANSYS Workbench, Sonar Cakewalk, and the like.
- The way in which apps access and use storage may vary based on numerous parameters. Diverse attributes of storage in a system, in conjunction with the variability of workloads, create complex interactions with storage input/output (I/O). Storage in a system is a stack of software and hardware that includes file systems, volumes and volume managers, class drivers and I/O drivers, and disk subsystems. The way in which storage and memory (e.g., data cache) interact with persistent and non-persistent memory may vary in each system and according to each application workload. Disk subsystem technologies such as non-volatile memory express (NVMe), rotating media (e.g., disk drives), tiered storage (e.g., drives with built-in cache), and others have different interfaces and capabilities that result in complex interactions with application workloads. The physical attributes of a system, such as single or multiple physical disk devices (e.g., configured as a Redundant Array of Independent Disks (RAID)) and logical volumes, create many I/O patterns in a workstation. Storage devices are unaware of workload variations (e.g., read or write rates) and treat all I/O equally, e.g., without regard to which application provides the I/O workload. Some of the variation in load may be due to translation layers, handled by the operating system, between application I/O and disk I/O. For example, I/O requests and interactions with the data cache may require adjusting memory and cache parameters, as well as physical device parameters, for each application workload.
- To improve performance for applications, a workstation manufacturer may provide predefined profiles with each workstation that configure various parameters associated with the workstation's resources to improve performance for popular applications. For example, a profile may modify parameters, such as cache size, a location where a temporary file is created, memory allocation size, page size, and the like, to improve I/O performance. However, such an approach has several limitations. First, the manufacturer is only able to provide predefined profiles for popular applications because the manufacturer cannot test all available applications. Second, the manufacturer can only test applications that have been available for an amount of time sufficient to enable the manufacturer to perform testing and create the corresponding profile. Thus, if a relatively new application, a new version of an existing application, or an application that is not widely used is being executed on a workstation, then a corresponding predefined profile may not be available. Third, the predefined profiles may be created to improve performance for commonly performed tasks for popular applications. For example, the manufacturer may create a profile to improve the performance of a particular set of tasks performed using a particular application. However, if the user performs a different set of tasks using the particular application, then the predefined profile may provide minimal performance improvement or may even degrade performance. Thus, providing predefined profiles for popular applications may not significantly improve the performance of a relatively new application, a new version of an existing application, or an application that is not considered a popular application.
- This Summary provides a simplified form of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features and should therefore not be used for determining or limiting the scope of the claimed subject matter.
- In some examples, a computing device may perform various operations. For example, the operations may include receiving, via a user interface, a user selection of a particular application from a plurality of applications to create a selected application. The operations may include determining that the selected application is executing and accessing (e.g., performing operations to) an input/output stack of the computing device. The operations may include gathering, over a predetermined interval of time, data associated with the selected application that is performing the operations to the input/output stack. After gathering the data, the operations may include performing an analysis of the data and determining, by a classifier and based at least in part on the analysis, a particular workload type from a predefined set of workload types that is associated with the selected application. The classifier may be trained using multiple hardware platforms, multiple storage configurations, multiple workloads, and the predefined plurality of profiles to classify a workload based on input/output operations performed by a particular application and to identify a profile to increase performance of the input/output operations. The operations may include selecting a particular profile from a plurality of predefined profiles based at least in part on the particular workload type, and modifying, based on the particular profile, a plurality of parameters to create a plurality of modified parameters. The modified parameters may reduce an execution time of performing the operations to the input/output stack.
- A more complete understanding of the present disclosure may be obtained by reference to the following Detailed Description when taken in conjunction with the accompanying Drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items.
- FIG. 1 is a block diagram of a computing device executing a classifier (e.g., a machine learning algorithm) that gathers data associated with an application and selects a profile to configure resources of the computing device, according to some embodiments.
- FIG. 2 is a block diagram illustrating training a classifier, according to some embodiments.
- FIG. 3 is a block diagram illustrating examples of variables used to train a machine learning algorithm, according to some embodiments.
- FIG. 4 is a flowchart of a process that includes training a classifier, according to some embodiments.
- FIG. 5 is a flowchart of a process that includes configuring parameters associated with an I/O system based on a profile, according to some embodiments.
- FIG. 6 illustrates an example configuration of a computing device that can be used to implement the systems and techniques described herein.
- FIG. 7 is a block diagram illustrating classifying a workload of an app, according to some embodiments.
- For purposes of this disclosure, an information handling system (IHS) may include any instrumentality or aggregate of instrumentalities operable to compute, calculate, determine, classify, process, transmit, receive, retrieve, originate, switch, store, display, communicate, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes. For example, an information handling system may be a personal computer (e.g., desktop or laptop), tablet computer, mobile device (e.g., personal digital assistant (PDA) or smart phone), server (e.g., blade server or rack server), a network storage device, or any other suitable device and may vary in size, shape, performance, functionality, and price. The information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communicating with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, touchscreen and/or video display. The information handling system may also include one or more buses operable to transmit communications between the various hardware components.
- The systems and techniques described herein use machine learning to increase application (“app”) performance related to input/output (I/O), such as read operations and write operations. The I/O in a computing device may have multiple layers, collectively referred to as an I/O stack. The computing device may use virtual memory to enable multiple apps to share a physical storage device (e.g., a disk drive). Portions (e.g., pages) of virtual memory may be stored in main memory (e.g., random access memory (RAM)) and swapped back and forth with the physical storage device. The physical storage device may use a high-speed memory, known as a cache, to increase throughput when apps access the physical storage device. For example, when an app performs a write, the app may write to virtual memory in RAM, which is then sent to the physical storage device and stored in the cache, before being written to the physical storage device. In addition, an operating system may provide a file system to enable apps to perform I/O. The file system may use blocks having a particular size such that a large file is stored as multiple blocks. The multiple blocks may be located at different locations in the physical storage device and the file system may keep track of the locations of each of the multiple blocks such that the apps are unaware that the large file is being stored as multiple blocks in multiple (e.g., non-contiguous) locations.
- The systems and techniques described herein perform an analysis of how an application accesses the I/O stack and, based on the analysis, characterize the application as having a particular type of workload. The systems and techniques select a profile that is designed to improve I/O performance (e.g., increase throughput, speed up execution, lower latency, and the like) for the particular type of workload. Each particular type of workload has particular characteristics (e.g., read/write (R/W) ratio, queue depth, block size, and the like) with regard to how the app accesses the I/O stack. The systems and techniques monitor what the app is doing to the I/O stack and identify a particular profile with settings that improve I/O performance when accessing the I/O stack (e.g., the settings improve performance as compared to the default configuration used by the operating system and the application when initially installed). Thus, each profile is designed to improve a particular type of I/O workload rather than a particular type of app, making the systems and techniques application agnostic. Rather than having a first profile for a first application, a second profile for a second application, and so on, an app's I/O workload is monitored and an appropriate profile is selected. In this way, new apps, less popular apps, and recently released versions of apps can be immediately supported by selecting a profile based on the application's I/O workload (e.g., rather than the name of the app). In some cases, the profile may configure parameters associated with the application to improve throughput, reduce execution time, and the like.
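The workload characteristics named above (R/W ratio, queue depth, block size) can be summarized from a trace of observed I/O requests. The sketch below is a minimal illustration under stated assumptions. The `IORequest` record format and the `characterize` function are hypothetical. Averaging block size and queue depth over the trace is one simple choice, not necessarily what the tool measures.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IORequest:
    """One observed I/O request (hypothetical record format)."""
    op: str            # "read" or "write"
    size_bytes: int    # transfer (block) size of the request
    queue_depth: int   # outstanding requests when this one was issued


@dataclass(frozen=True)
class WorkloadCharacteristics:
    read_write_ratio: float   # reads per write (inf if there are no writes)
    avg_block_size: float     # mean transfer size in bytes
    avg_queue_depth: float    # mean number of outstanding requests


def characterize(requests: Sequence[IORequest]) -> WorkloadCharacteristics:
    """Summarize an I/O trace into R/W ratio, average block size, and queue depth."""
    if not requests:
        raise ValueError("need at least one request")
    reads = sum(1 for r in requests if r.op == "read")
    writes = sum(1 for r in requests if r.op == "write")
    ratio = reads / writes if writes else float("inf")
    n = len(requests)
    return WorkloadCharacteristics(
        read_write_ratio=ratio,
        avg_block_size=sum(r.size_bytes for r in requests) / n,
        avg_queue_depth=sum(r.queue_depth for r in requests) / n,
    )
```

Because the summary depends only on how the app uses the I/O stack, two different apps with the same summary would map to the same profile. This is the application-agnostic property described above.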
- The systems and techniques create a classifier using a machine learning algorithm such as, for example, Random Forest, Neural Network, or the like. Combinations of different hardware platforms and different storage configurations are used to execute different types of workloads using different types of profiles, and data associated with the workload characteristics is gathered. The data is used to train the classifier to identify which profile (among multiple profiles that were tested) provides the highest performance (e.g., fastest execution time) for a particular workload executing on a particular hardware platform having a particular storage configuration. The workload characteristics (e.g., parameters) that are analyzed may include up to 1,000 different I/O related variables across layers of the storage stack including the physical disk, cache, logical disk, memory, and pagefile. For example, the characteristics may include (1) logical and physical I/O operation counters such as, for example, I/O Read Operations/sec, I/O Write Operation, I/O Read Operations/sec, I/O Write Operations/sec, I/O Data Operations/sec, (2) process parameters, such as, for example, Virtual Bytes, Cache Copy Read Hits %, Cache Copy Reads/Sec, Page File Bytes/sec, (3) caching and operating system (OS) information, such as, for example, Cache Copy Read Hits %, Cache Copy Reads/Sec, and Page File Bytes, and the like. The classifier identifies which profile from among multiple tested profiles provides the most efficient usage of I/O resources to achieve faster throughput, faster execution time, and the like.
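One step of the training process described above can be sketched in Python. For every combination of hardware platform, storage configuration, and workload, the sketch labels the profile with the fastest measured execution time. The lab-record layout and the name `best_profile_labels` are hypothetical. In the described approach, these labels would be paired with the measured workload characteristics to train the classifier (e.g., a Random Forest). That training step is not shown.

```python
from typing import Dict, Iterable, Tuple

# (platform, storage_config, workload, profile, execution_seconds): hypothetical lab record
Run = Tuple[str, str, str, str, float]
Key = Tuple[str, str, str]


def best_profile_labels(runs: Iterable[Run]) -> Dict[Key, str]:
    """For each (platform, storage config, workload), keep the profile with the fastest run."""
    best: Dict[Key, Tuple[str, float]] = {}
    for platform, storage, workload, profile, seconds in runs:
        key = (platform, storage, workload)
        if key not in best or seconds < best[key][1]:
            best[key] = (profile, seconds)
    # Drop the timing: the winning profile name becomes the training label.
    return {key: profile for key, (profile, _) in best.items()}
```

Keying on the full (platform, storage, workload) triple reflects the text's point that the best profile can differ between, for example, an NVMe system and a SATA SSD system running the same workload.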
- After the classifier is trained, the classifier may identify a subset (e.g., top N, where 0<N<100) of the up to 1,000 I/O related parameters. The subset typically contains between about 50 and about 100 parameters (e.g., about 70 parameters). The subset of parameters may be those parameters that have the highest influence on increasing throughput, e.g., the parameters that, when varied, cause the largest change (e.g., improvement) and provide the most “bang for the buck”. The subset of parameters is later used when the classifier is deployed to classify a workload. One reason for determining the subset is that monitoring up to 1,000 parameters on a user's computing device would significantly slow down execution of apps. In contrast, monitoring the subset of I/O parameters for a predetermined period of time (e.g., 30, 60, 90, 120 minutes, or the like) when a particular app is being used reduces the impact on the user. The subset of parameters may include, for example, Cache\Async Copy Reads/sec, Cache\Copy Read Hits %, Cache\Copy Reads/sec, Cache\Data Map Hits %, Cache\Data Maps/sec, Cache\Dirty Page Threshold, Cache\Dirty Pages, Cache\MDL Read Hits %, Cache\MDL Reads/sec, Cache\Pin Read Hits %, Cache\Pin Reads/sec, Cache\Read Aheads/sec, LogicalDisk(Total)\% Disk Read Time, LogicalDisk(Total)\% Disk Time, LogicalDisk(Total)\% Disk Write Time, LogicalDisk(Total)\Avg. Disk Bytes/Write, LogicalDisk(Total)\Avg. Disk Queue Length, LogicalDisk(Total)\Avg. Disk Read Queue Length, LogicalDisk(Total)\Avg. Disk sec/Transfer, LogicalDisk(Total)\Avg. 
Disk Write Queue Length, LogicalDisk(Total)\Current Disk Queue Length, LogicalDisk(Total)\Disk Bytes/sec, LogicalDisk(Total)\Disk Read Bytes/sec, LogicalDisk(Total)\Disk Transfers/sec, LogicalDisk(Total)\Disk Write Bytes/sec, Memory\% Committed Bytes In Use, Memory\Available Bytes, Memory\Available KBytes, Memory\Available MBytes, Memory\Cache Bytes, Memory\Cache Faults/sec, Memory\Committed Bytes, Memory\Free & Zero Page List Bytes, Memory\Free System Page Table Entries, Memory\Modified Page List Bytes, Memory\Page Faults/sec, Memory\Page Reads/sec, Memory\Page Writes/sec, Memory\Pages Input/sec, Memory\Pages Output/sec, Memory\Pages/sec, Memory\Pool Nonpaged Allocs, Memory\Pool Nonpaged Bytes, Memory\Pool Paged Allocs, Memory\Pool Paged Bytes, Memory\Pool Paged Resident Bytes, Memory\Standby Cache Core Bytes, Memory\Standby Cache Normal Priority Bytes, Memory\Standby Cache Reserve Bytes, Memory\System Cache Resident Bytes, Memory\System Driver Resident Bytes, Memory\System Driver Total Bytes, PhysicalDisk(Total)\% Disk Read Time, PhysicalDisk(Total)\% Disk Time, PhysicalDisk(Total)\% Disk Write Time, PhysicalDisk(Total)\Avg. Disk Bytes/Read, PhysicalDisk(Total)\Avg. Disk Bytes/Write, PhysicalDisk(Total)\Avg. Disk Read Queue Length, PhysicalDisk(Total)\Avg. Disk sec/Transfer, PhysicalDisk(Total)\Avg. 
Disk Write Queue Length, PhysicalDisk(Total)\Current Disk Queue Length, Process(beast)\% Privileged Time, Process(beast)\Elapsed Time, Process(beast)\% Processor Time, Process(beast)\IO Data Bytes/sec, Process(beast)\IO Data Operations/sec, Process(beast)\IO Other Bytes/sec, Process(beast)\IO Other Operations/sec, Process(beast)\IO Read Bytes/sec, Process(beast)\IO Read Operations/sec, Process(beast)\IO Write Bytes/sec, Process(beast)\IO Write Operations/sec, Process(beast)\Page File Bytes, Process(beast)\Page File Bytes Peak, Process(beast)\Page Faults/sec, Process(beast)\Priority Base, Process(beast)\Thread Count, Process(beast)\Pool Nonpaged Bytes, Process(beast)\Pool Paged Bytes, Process(beast)\Private Bytes, Process(beast)\Virtual Bytes, Process(beast)\Virtual Bytes Peak, Process(beast)\Working Set, Process(beast)\Working Set—Private, Process(beast)\Working Set Peak, Processor(Total)\DPC Rate, Processor(Total)\DPCs Queued/sec, Processor(Total)\Interrupts/sec, System\Context Switches/sec, System\File Read Bytes/sec, System\File Read Operations/sec, System\File Write Bytes/sec, System\File Write Operations/sec, System\Processor Queue Length, and System\System Up Time.
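On the user's machine, only the counters in the subset need to be sampled. The sketch below assumes one simple way to turn the samples gathered over the monitoring interval into a classifier input: average each selected counter across all snapshots. The averaging, the snapshot dictionary shape, and the name `feature_vector` are assumptions. How the counters themselves are read (e.g., from the operating system's performance counters) is not shown.

```python
from statistics import fmean
from typing import Iterable, List, Mapping, Sequence


def feature_vector(snapshots: Iterable[Mapping[str, float]],
                   subset: Sequence[str]) -> List[float]:
    """Average each counter in `subset` over all snapshots; other counters are ignored."""
    rows = list(snapshots)
    if not rows:
        raise ValueError("no snapshots gathered")
    # A counter missing from a snapshot is treated as 0.0 (an assumption of this sketch).
    return [fmean(row.get(name, 0.0) for row in rows) for name in subset]
```

Using a fixed `subset` order ensures that each position in the vector always refers to the same counter. A trained classifier expects exactly this.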
- After the classifier has been trained, a performance improvement software application (e.g., Dell® Precision Optimizer or the like) that includes the classifier may be deployed (e.g., installed) on workstations (e.g., information handling systems). The software application may provide a user interface (UI) that enables a user to select one or more apps (e.g., Adobe® Illustrator®, Adobe® After Effects®, Adobe® Media Encoder®, Adobe® Photoshop®, Adobe® Premiere®, Autodesk® AutoCAD®, Avid Media Composer®, ANSYS Fluent, ANSYS Workbench, Sonar Cakewalk, and the like). After the user selects one or more apps, the software application may monitor apps executing on the workstation. When the software application determines that one of the selected apps is executing, the software application may gather data associated with the subset of I/O parameters for a predetermined period of time (e.g., 15, 30, 45, 60 minutes or the like). For each of the selected apps, the classifier may use the gathered data to characterize the workload for each of the selected apps, select a profile that corresponds to the workload, and apply the profile by configuring various I/O related parameters (e.g., size of cache, size of pagefile, location of temporary files, and the like) of the workstation. For example, the classifier may have identified a particular set of workloads (e.g., between 10 and 50 different types of workloads) that encompass the majority of the workloads presented by different applications. To illustrate, in some cases, the classifier may identify about 25 workloads and 25 corresponding profiles.
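Applying a profile amounts to overlaying the profile's settings (e.g., cache size, pagefile size, location of temporary files) on the current parameters, which produces the modified parameters. The profile names, parameter keys, and values below are hypothetical placeholders. Writing the modified values back to the operating system, drivers, or apps is outside this sketch.

```python
from typing import Dict, Mapping, Union

Value = Union[int, str]

# Hypothetical profiles keyed by workload type; names and values are illustrative only.
PROFILES: Dict[str, Dict[str, Value]] = {
    "large-sequential": {"cache_size_mb": 2048, "temp_dir": "D:\\fast\\tmp"},
    "small-random": {"cache_size_mb": 512, "pagefile_size_mb": 16384},
}


def apply_profile(current: Mapping[str, Value], workload_type: str) -> Dict[str, Value]:
    """Return a modified copy of `current` with the matching profile's settings overlaid."""
    modified = dict(current)  # leave the caller's current settings untouched
    modified.update(PROFILES[workload_type])
    return modified


def changed_keys(current: Mapping[str, Value], modified: Mapping[str, Value]) -> Dict[str, Value]:
    """List only the parameters whose values the profile actually changed."""
    return {k: v for k, v in modified.items() if current.get(k) != v}
```

Returning a copy rather than mutating in place keeps the original settings available, for example, for logging which parameters the profile changed.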
- The training data for the classifier may be gathered using different types of storage devices with different types of interfaces. For example, different amounts of RAM, different types of storage, and the like may be used to generate the training data. The different types of storage may include mechanical disk drives and solid-state drives (SSD) that use different types of interfaces, such as serial ATA (SATA), Non-Volatile Memory Express (NVMe), or the like. When the classifier is deployed (e.g., after being trained), the classifier may provide recommendations to improve performance, such as, for example, “Increasing RAM from 8 GB to 16 GB will provide up to X % improvement in execution times”, “Switching from a first type of storage device (e.g., mechanical disk drive) to a second type of storage device (e.g., SSD) may provide up to Y % improvement in execution times, and switching to a third type of storage device (e.g., NVME) may provide up to Z % improvement in execution times.” (X, Y, and Z>0)
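The X, Y, and Z percentages in such recommendations could be derived from execution times measured in the lab for the same workload on different configurations. The improvement is the reduction in execution time relative to the current configuration. The sketch below is an assumption about how the figures might be computed and worded. The function names and sample timings are hypothetical and are not results from the source.

```python
from typing import Dict, List


def improvement_percent(baseline_s: float, candidate_s: float) -> float:
    """Percent reduction in execution time when moving from baseline to candidate."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_s - candidate_s) * 100.0 / baseline_s


def recommendations(current: str, measured: Dict[str, float]) -> List[str]:
    """Build a recommendation for every configuration measured faster than `current`."""
    base = measured[current]
    out: List[str] = []
    # Fastest configurations first.
    for config, seconds in sorted(measured.items(), key=lambda kv: kv[1]):
        if config != current and seconds < base:
            pct = improvement_percent(base, seconds)
            out.append(f"Switching from {current} to {config} may provide up to "
                       f"{pct:.0f}% improvement in execution times")
    return out
```

For example, with hypothetical timings of 200 s (mechanical drive), 120 s (SATA SSD), and 80 s (NVMe SSD), the sketch reports up to 60% for NVMe and up to 40% for the SATA SSD.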
- Thus, the systems and techniques described herein provide a way to characterize an application workload at runtime and improve the application's storage I/O performance (e.g., across the storage stack, from the file system to the physical device). A machine learning system is used to gather data associated with an app's workload (e.g., the way in which the app performs I/O), select, from a set of predefined workloads, the predefined workload most similar to the app's workload, select the profile corresponding to that predefined workload, and configure parameters associated with the I/O (e.g., operating system parameters, device driver parameters, app parameters, device parameters, and the like) to improve the app's I/O performance. The performance improvements may result in the same task executing faster after the profile is applied than before, more tasks being executed in a particular period of time, and the like. The systems and techniques improve (e.g., optimize) native application performance by analyzing data across layers of the storage stack, including the physical disk, cache, logical disk, memory, and pagefile, to allow the application to make the best use of the relevant computing resources (e.g., storage-related resources).
- A classifier is trained in a non-production environment (e.g., a lab environment). A set of parameters (e.g., up to 1,000, in some cases around 700) is used to characterize each workload. The workloads are executed on different types of platforms having different configurations and different types of storage. For example, a platform may be a particular motherboard version with a particular chipset. For each platform, the configurations may vary based on, for example, the type of processor (e.g., Intel® i3, i5, i7, and the like), the processor generation, the clock speed, the amount of RAM, the amount of storage, the type of storage (e.g., mechanical, SSD, or the like), the storage interface (e.g., SATA-3, SATA-6, NVMe, or the like), and the like. The set of parameters used to characterize the workload may include variables and counters, such as, for example, (1) logical and physical I/O operation counters, such as I/O Read Operations/sec, I/O Write Operations/sec, and I/O Data Operations/sec, (2) process parameters, such as, for example, Virtual Bytes, Cache Copy Read Hits %, Cache Copy Reads/sec, and Page File Bytes/sec, and (3) caching and O/S information, such as, for example, Cache Copy Read Hits %, Cache Copy Reads/sec, and Page File Bytes. The parameters are measured while the application performs different tasks (e.g., workloads) and are used to determine the I/O profile of the application. The influence of each parameter is ranked using Mean Decrease Gini (e.g., based on a Random Forest), and a subset of the parameters is selected based on the ranking (e.g., the N highest-ranked parameters are selected, N > 0).
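A full Mean Decrease Gini ranking sums each parameter's Gini impurity decreases over every split in every tree of a random forest (scikit-learn exposes an impurity-based version of this ranking as `RandomForestClassifier.feature_importances_`). The standard-library sketch below is a deliberately simplified stand-in: it scores each parameter by the best Gini decrease of a single threshold split on that parameter, then keeps the N highest-ranked parameters. The counter names and sample values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_gini_decrease(values, labels):
    """Largest weighted Gini decrease achievable by one threshold split on a single parameter."""
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


def top_n_parameters(samples, labels, n):
    """Rank parameters (the keys of each sample) by Gini decrease and keep the N highest."""
    names = list(samples[0])
    scores = {name: best_split_gini_decrease([s[name] for s in samples], labels) for name in names}
    return sorted(names, key=scores.__getitem__, reverse=True)[:n]


# Hypothetical lab samples: counter values per run, labeled with the workload that produced them.
samples = [
    {"Disk Read Bytes/sec": 900, "Page Faults/sec": 10, "Context Switches/sec": 5},
    {"Disk Read Bytes/sec": 850, "Page Faults/sec": 30, "Context Switches/sec": 5},
    {"Disk Read Bytes/sec": 100, "Page Faults/sec": 12, "Context Switches/sec": 5},
    {"Disk Read Bytes/sec": 120, "Page Faults/sec": 35, "Context Switches/sec": 5},
]
labels = ["read_heavy", "read_heavy", "write_heavy", "write_heavy"]
print(top_n_parameters(samples, labels, 2))  # ['Disk Read Bytes/sec', 'Page Faults/sec']
```

Here the read-bytes counter separates the two workloads perfectly, the page-fault counter only partially, and the constant context-switch counter not at all, so it is ranked last.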
- In some cases, the workloads may be grouped based on a similarity of the parameters to consolidate and reduce the number of workloads. For example, if a number (e.g., M>0) of the highest-ranked parameters are similar, e.g., within a predetermined range, then the workloads may be grouped into a single workload, e.g., a particular workload in which a first parameter is within a first range, a second parameter is within a second range, and so on. At runtime, the parameters for a workload may be determined and compared with the workloads identified in the non-production environment to determine a closest matching workload. Because each application may have different I/O behavior, benchmarking tools may be used to classify workloads based on a particular set of characteristics, such as, for example, a ratio of reads to writes, block size, and the like. A correlational analysis may be used to rank the dependency level between characteristics so that each workload type can be uniquely identified based on the measured characteristics of a workload. Based on application behavior with regard to I/O (e.g., system storage), a machine learning classifier is trained to classify a workload based on a set of parameters measured over a period of time and to select a profile corresponding to the workload.
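The consolidation step could be sketched as a greedy grouping: a workload joins an existing group when each of its M highest-ranked parameters is within a relative tolerance of the group's first member. The 10 % tolerance, the workload names, and the parameter values below are assumptions for illustration.

```python
def within_range(a, b, rel_tol):
    """True if every parameter in a is within rel_tol (a fraction) of the same parameter in b."""
    return all(abs(x - y) <= rel_tol * max(abs(x), abs(y)) for x, y in zip(a, b))


def group_workloads(workloads, rel_tol=0.10):
    """Greedily merge workloads whose top-M parameter vectors are similar.

    workloads maps a workload name to a tuple of its M highest-ranked parameter values.
    Returns a list of groups; the first name in each group acts as its representative.
    """
    groups = []
    for name, vector in workloads.items():
        for group in groups:
            if within_range(vector, workloads[group[0]], rel_tol):
                group.append(name)
                break
        else:  # no existing group is close enough
            groups.append([name])
    return groups


# Hypothetical (Disk Read Bytes/sec, Page Faults/sec) averages for three lab workloads.
lab_workloads = {
    "export_4k": (800.0, 20.0),
    "export_1080p": (760.0, 21.0),
    "cad_open": (150.0, 300.0),
}
print(group_workloads(lab_workloads))  # [['export_4k', 'export_1080p'], ['cad_open']]
```

Greedy grouping depends on input order; a clustering algorithm would give order-independent groups at the cost of more code.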
- The data used to train the classifier is generated by executing different types of workloads across multiple platforms while varying multiple parameters. The classifier identifies parameter settings that increase the performance of particular types of workloads in terms of bandwidth, input/output operations per second (IOPS), latency, and the like. For example, each workload from multiple workloads is executed on different hardware configurations and the performance is measured (e.g., read performance, write performance, read/write performance, and the like). The process of executing different workloads is repeated while varying the values of the performance variables, and the resulting data is organized into a tree structure. The machine learning algorithm uses the tree structure to decide which configuration provides the highest performance for a particular application that presents a particular type of I/O workload. At runtime, the trained machine learning model (e.g., a supervised or deep learning model) is deployed on client devices. The trained machine learning model is a predictive model, e.g., Y = F(X1, . . . , XN), where a profile Y is selected (e.g., predicted) as a function of the workload parameters X1 through XN.
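The source organizes the lab results into a tree structure. As a simpler illustration of the resulting profile-selection mapping at the level of workload types, the sketch below just records, for each workload type, the profile with the lowest mean execution time across the measured runs. Averaging repeated runs and the workload and profile names are assumptions.

```python
from collections import defaultdict


def best_profiles(runs):
    """Map each workload type to the profile with the lowest mean execution time.

    runs: iterable of (workload_type, profile, execution_seconds) tuples measured in the lab,
    possibly repeated across hardware platforms and storage configurations.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (workload, profile) -> [sum of seconds, run count]
    for workload, profile, seconds in runs:
        totals[(workload, profile)][0] += seconds
        totals[(workload, profile)][1] += 1

    best = {}  # workload -> (profile, mean seconds)
    for (workload, profile), (total, count) in totals.items():
        mean = total / count
        if workload not in best or mean < best[workload][1]:
            best[workload] = (profile, mean)
    return {workload: profile for workload, (profile, _) in best.items()}


# Hypothetical lab runs on several platforms.
runs = [
    ("sequential_read", "P1", 10.0),
    ("sequential_read", "P1", 12.0),
    ("sequential_read", "P2", 9.0),
    ("random_write", "P1", 20.0),
    ("random_write", "P2", 25.0),
]
print(best_profiles(runs))  # {'sequential_read': 'P2', 'random_write': 'P1'}
```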
- The trained classifier is installed on, and shipped with, systems (e.g., workstations, such as Dell® Precision) that may be used to execute applications that use a significant amount of computing resources, such as storage I/O. A UI enables a user to select one or more apps. When one of the selected apps is executing, the subset of parameters (e.g., those that most influence I/O operations) is measured for a predetermined period of time. The data gathered by monitoring the subset of parameters is used with a decision tree to identify a profile with configuration settings that improve performance. Thus, a profile may be selected at runtime to improve performance for a particular application.
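The source uses a decision tree for this runtime step; the sketch below substitutes a simpler nearest-centroid match, in line with the closest-matching-workload comparison described earlier. It averages each monitored counter over the sampling window, scales each difference by a per-counter scale (assumed to come from the lab data), and picks the predefined workload type with the smallest distance. All names, centroids, and scales are hypothetical.

```python
import math
from statistics import fmean


def summarize(samples, counters):
    """Average each monitored counter over the samples taken during the monitoring window."""
    return [fmean(sample[c] for sample in samples) for c in counters]


def closest_workload(features, centroids, scales):
    """Return the predefined workload type whose centroid is nearest (scaled Euclidean distance)."""
    def distance(centroid):
        return math.sqrt(sum(((f - c) / s) ** 2 for f, c, s in zip(features, centroid, scales)))
    return min(centroids, key=lambda name: distance(centroids[name]))


# Hypothetical monitored counters, lab centroids, per-counter scales, and profile table.
counters = ["Disk Read Bytes/sec", "Page Faults/sec"]
window = [
    {"Disk Read Bytes/sec": 700.0, "Page Faults/sec": 40.0},
    {"Disk Read Bytes/sec": 900.0, "Page Faults/sec": 60.0},
]
centroids = {"sequential_read": [850.0, 45.0], "random_write": [150.0, 300.0]}
scales = [100.0, 10.0]
profile_for = {"sequential_read": "P2", "random_write": "P1"}

features = summarize(window, counters)  # [800.0, 50.0]
selected = profile_for[closest_workload(features, centroids, scales)]
print(selected)  # P2
```

Scaling matters because raw counters differ by orders of magnitude; without it, byte-rate counters would dominate the distance.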
- As an example, a computing device may include one or more processors and one or more non-transitory computer-readable media storing instructions executable by the one or more processors to perform various operations. For example, the operations may include displaying a user interface (UI) and receiving, via the UI, a user selection of a particular application from a plurality of applications to create a selected application. The plurality of applications may include, for example, Adobe® Illustrator®, Adobe® After Effects®, Adobe® Media Encoder®, Adobe® Photoshop®, Adobe® Premiere®, Autodesk® AutoCAD®, Avid Media Composer®, ANSYS Fluent, ANSYS Workbench, Sonar Cakewalk, or the like. The operations may include determining that the selected application is executing and performing operations to an input/output stack of the computing device. The input/output stack may include: (i) a file system used by the computing device, (ii) a random-access memory used by the computing device, (iii) a logical storage used by the operating system, (iv) a cache allocated in the random-access memory by the operating system, (v) a pagefile used by the operating system, and (vi) physical storage accessible to the operating system. The operations may include gathering, over a predetermined interval of time (e.g., fifteen minutes, thirty minutes, forty-five minutes, sixty minutes, or the like), data associated with the selected application that is performing the operations to the input/output stack. After gathering the data, the operations may include performing an analysis of the data and determining, by a classifier and based at least in part on the analysis, a particular workload type from a predefined set of workload types that is associated with the selected application.
In some cases, the operations may include ordering, according to frequency of occurrence, the operations performed by the selected application to the input/output stack, determining a subset of the operations comprising a plurality of most frequently performed operations performed by the selected application to the input/output stack, comparing the subset of the operations to frequent operations associated with each of the predefined set of workload types, and determining that the subset of the operations associated with the selected application matches the frequent operations associated with the particular workload type. The classifier may be trained using multiple hardware platforms, multiple storage configurations, multiple workloads, and the predefined plurality of profiles to classify a workload based on input/output operations performed by a particular application and to identify a profile to increase performance of the input/output operations. The operations may include selecting a particular profile from a plurality of predefined profiles based at least in part on the particular workload type, and modifying, based on the particular profile, a plurality of parameters to create a plurality of modified parameters. The modified parameters may reduce an execution time of performing the operations to the input/output stack. 
For example, modifying the plurality of parameters to create the plurality of modified parameters comprises at least one of: modifying a process priority associated with the application (e.g., to a highest process priority), modifying a power plan of the operating system (e.g., to a high-performance power plan), modifying (e.g., enabling or disabling) a hyperthreading feature associated with the one or more processors, modifying (e.g., enabling or disabling) a core parking feature associated with the one or more processors, modifying (e.g., enabling or disabling) a compression feature to compress data stored in the random-access memory, modifying (e.g., enabling or disabling) a page combining feature of the operating system to remove duplicates of content stored in the random-access memory, modifying (e.g., enabling or disabling) a vertical synchronization feature associated with synchronizing a frame rate output of the selected application with a monitor refresh rate of a display device associated with the computing device, or modifying (e.g., enabling or disabling) a pre-fetch feature associated with the one or more processors to store frequently accessed data in a random-access memory of the computing device.
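The frequency-ordering comparison described above could be sketched as follows. Counting operations with `collections.Counter`, keeping the k most frequent, and scoring overlap with Jaccard similarity are assumptions; the source does not specify how a match is measured, and the operation and workload names are hypothetical.

```python
from collections import Counter


def frequent_operations(operations, k):
    """Order I/O operations by frequency of occurrence and keep the k most frequent."""
    return [op for op, _ in Counter(operations).most_common(k)]


def jaccard(a, b):
    """Size of the intersection divided by the size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_workload_type(operations, frequent_ops_by_type, k=3):
    """Return the workload type whose frequent operations best overlap the app's top-k operations."""
    app_top = set(frequent_operations(operations, k))
    return max(frequent_ops_by_type,
               key=lambda t: jaccard(app_top, set(frequent_ops_by_type[t])))


# Hypothetical trace of the selected app's I/O operations and per-type frequent operations.
trace = ["read_4k"] * 5 + ["read_64k"] * 3 + ["flush"] * 2 + ["write_4k"]
types = {
    "small_random_read": ["read_4k", "read_64k", "flush"],
    "bulk_write": ["write_1m", "write_4k", "flush"],
}
print(frequent_operations(trace, 3))      # ['read_4k', 'read_64k', 'flush']
print(match_workload_type(trace, types))  # small_random_read
```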
FIG. 1 is a block diagram 100 of a computing device executing a classifier (e.g., a machine learning algorithm) that gathers data associated with an application and selects a profile to configure resources of the computing device, according to some embodiments. A computing device 102 may include one or more applications 104 and a performance improvement tool 106. For example, the computing device 102 may be a workstation, such as a Dell® Precision workstation (e.g., a laptop or a desktop). The applications (“apps”) 104 may include one or more applications, such as, for example, Adobe® Illustrator®, Adobe® After Effects®, Adobe® Media Encoder®, Adobe® Photoshop®, Adobe® Premiere®, Autodesk® AutoCAD®, Avid Media Composer®, ANSYS Fluent, ANSYS Workbench, Sonar Cakewalk, or the like. The performance improvement tool 106 may be an application, such as, for example, Dell® Precision Optimizer or similar.
- The performance improvement tool 106 may provide an app selection user interface (UI) 108 to enable a user to select one or more apps, such as apps 110(1) to 110(N) (N>0), from among the apps 104. Thus, the apps 110 may be a subset of the applications 104. For example, in some cases, the UI 108 may enable a user to select up to five apps 110.
- After the user has selected the apps 110, the performance improvement tool 106 may determine when one of the apps 110, such as the app 110(N), is being executed by the computing device 102 and monitor the input/output (I/O) requests 112(1) to 112(P) (P>0) from the app 110(N) to an I/O stack 114. The I/O stack 114 may include a file system 116, a memory 118, a logical storage 120, a cache 122, a pagefile 123, and physical storage 124. The pagefile 123 is used in paging, a memory management scheme by which the computing device 102 stores data in, and retrieves data from, the physical storage 124 for use in the main memory (e.g., RAM) 118. An operating system 148 may retrieve data from the physical storage 124 in same-size blocks called pages (e.g., pages stored in the pagefile 123). Paging is part of the virtual memory feature of the operating system 148 that enables the applications 104 to use more memory than is physically available in the memory 118. The I/O stack 114 illustrated in FIG. 1 is merely an example, and the particular layers of the I/O stack 114 may vary in different implementations based on the operating system 148, the file system 116, the physical storage 124, and the like. For example, a workstation running a different operating system 148, such as Linux® (or another Unix® variant), may have different layers in the I/O stack 114 than those illustrated in FIG. 1.
- The performance improvement tool 106 may determine when one of the apps 110, e.g., the app 110(N), is executing and monitor the I/O requests 112(1) to 112(P) to the I/O stack 114 for a predetermined amount of time (e.g., 15, 30, 45, or 60 minutes or the like). During this time, the performance improvement tool 106 may gather the data 126(N) to characterize the workload presented by the app 110(N) and then select one of the profiles 132. The process of gathering the data 126(N) associated with the app 110(N) and selecting one of the profiles 132 is typically done once, e.g., after the user opens the app selection UI 108 and selects one or more of the apps 110 from the applications 104. Because the process of gathering the data 126(N) may degrade the performance of the app 110(N), the process is performed when the user desires a performance improvement in the app 110(N). Typically, after one of the profiles 132 has been selected and associated with the app 110(N), the process of gathering the data 126(N) may not subsequently be performed. However, the user may instruct the performance improvement tool 106 to gather the data 126(N) associated with the app 110(N) and select one of the profiles 132 by opening the app selection UI 108 and selecting the app 110(N). For example, if a provider of the performance improvement tool 106 improves the profiles 132 and makes new profiles available, the user may install the new profiles and instruct the performance improvement tool 106 to gather the data 126(N) associated with the app 110(N) and select one of the profiles 132 by opening the app selection UI 108 and selecting the app 110(N). As another example, the user may modify the configuration of the computing device 102 by adding more memory (e.g., RAM), adding another storage device with a faster interface (e.g., NVMe instead of SATA), adding a storage device with more cache and/or faster I/O characteristics, or the like. After modifying the configuration of the computing device 102, the user may instruct the performance improvement tool 106 to gather the data 126(N) associated with the app 110(N) and select one of the profiles 132 by opening the app selection UI 108 and selecting the app 110(N).
- The
performance improvement tool 106 may monitor the input/output (I/O) requests 112 for each of the selected apps when each of the selected apps is executing (e.g., the first time each of the selected apps is executed) and gather data 126(1) associated with how the app 110(1) uses the I/O stack 114 and data 126(N) associated with how the app 110(N) uses the I/O stack 114. The data 126(N) may be gathered across each of the layers of the I/O stack 114. For example, in a Microsoft® Windows® environment, the gathered data 126 may include between about 50 and 100 I/O-related parameters (e.g., preferably about 70). The subset of parameters may include, for example, Cache\Async Copy Reads/sec, Cache\Copy Read Hits %, Cache\Copy Reads/sec, Cache\Data Map Hits %, Cache\Data Maps/sec, Cache\Dirty Page Threshold, Cache\Dirty Pages, Cache\MDL Read Hits %, Cache\MDL Reads/sec, Cache\Pin Read Hits %, Cache\Pin Reads/sec, Cache\Read Aheads/sec, LogicalDisk(Total)\% Disk Read Time, LogicalDisk(Total)\% Disk Time, LogicalDisk(Total)\% Disk Write Time, LogicalDisk(Total)\Avg. Disk Bytes/Write, LogicalDisk(Total)\Avg. Disk Queue Length, LogicalDisk(Total)\Avg. Disk Read Queue Length, LogicalDisk(Total)\Avg. Disk sec/Transfer, LogicalDisk(Total)\Avg. Disk Write Queue Length, LogicalDisk(Total)\Current Disk Queue Length, LogicalDisk(Total)\Disk Bytes/sec, LogicalDisk(Total)\Disk Read Bytes/sec, LogicalDisk(Total)\Disk Transfers/sec, LogicalDisk(Total)\Disk Write Bytes/sec, Memory\% Committed Bytes In Use, Memory\Available Bytes, Memory\Available KBytes, Memory\Available MBytes, Memory\Cache Bytes, Memory\Cache Faults/sec, Memory\Committed Bytes, Memory\Free & Zero Page List Bytes, Memory\Free System Page Table Entries, Memory\Modified Page List Bytes, Memory\Page Faults/sec, Memory\Page Reads/sec, Memory\Page Writes/sec, Memory\Pages Input/sec, Memory\Pages Output/sec, Memory\Pages/sec, Memory\Pool Nonpaged Allocs, Memory\Pool Nonpaged Bytes, Memory\Pool Paged Allocs, Memory\Pool Paged Bytes, Memory\Pool Paged Resident Bytes, Memory\Standby Cache Core Bytes, Memory\Standby Cache Normal Priority Bytes, Memory\Standby Cache Reserve Bytes, Memory\System Cache Resident Bytes, Memory\System Driver Resident Bytes, Memory\System Driver Total Bytes, PhysicalDisk(Total)\% Disk Read Time, PhysicalDisk(Total)\% Disk Time, PhysicalDisk(Total)\% Disk Write Time, PhysicalDisk(Total)\Avg. Disk Bytes/Read, PhysicalDisk(Total)\Avg. Disk Bytes/Write, PhysicalDisk(Total)\Avg. Disk Read Queue Length, PhysicalDisk(Total)\Avg. Disk sec/Transfer, PhysicalDisk(Total)\Avg. Disk Write Queue Length, PhysicalDisk(Total)\Current Disk Queue Length, Process(beast)\% Privileged Time, Process(beast)\Elapsed Time, Process(beast)\% Processor Time, Process(beast)\IO Data Bytes/sec, Process(beast)\IO Data Operations/sec, Process(beast)\IO Other Bytes/sec, Process(beast)\IO Other Operations/sec, Process(beast)\IO Read Bytes/sec, Process(beast)\IO Read Operations/sec, Process(beast)\IO Write Bytes/sec, Process(beast)\IO Write Operations/sec, Process(beast)\Page File Bytes, Process(beast)\Page File Bytes Peak, Process(beast)\Page Faults/sec, Process(beast)\Priority Base, Process(beast)\Thread Count, Process(beast)\Pool Nonpaged Bytes, Process(beast)\Pool Paged Bytes, Process(beast)\Private Bytes, Process(beast)\Virtual Bytes, Process(beast)\Virtual Bytes Peak, Process(beast)\Working Set, Process(beast)\Working Set - Private, Process(beast)\Working Set Peak, Processor(Total)\DPC Rate, Processor(Total)\DPCs Queued/sec, Processor(Total)\Interrupts/sec, System\Context Switches/sec, System\File Read Bytes/sec, System\File Read Operations/sec, System\File Write Bytes/sec, System\File Write Operations/sec, System\Processor Queue Length, and System\System Up Time. Of course, in other environments that use a different operating system, data associated with other I/O-related parameters may be gathered.
- A trained
classifier 128 may analyze the data 126 and identify a predefined workload type 130 from among workload types 130(1) to 130(M) (M>0, M not necessarily equal to N) that is closest (e.g., most similar) to the type 129 of a workload (e.g., determined based on the data 126) that the selected apps 110 present to the I/O stack 114. To illustrate, M may be between 10 and 30, such as about 25 different types of predefined workloads. For example, the classifier 128 may analyze the data 126(N) associated with the app 110(N), determine that the I/O requests 112 to the I/O stack 114 present a workload type 129(N) that is similar (e.g., closest) to the predetermined workload type 130(M), and select the profile 132(M). The performance improvement tool 106 may apply the settings in the profile 132(M) to the computing device 102 to improve the performance of the app 110(N), e.g., as related to accessing the I/O stack 114. For example, the profile 132(M) may modify various parameters associated with the I/O stack 114, an operating system 148 of the computing device 102, the app 110(N), another set of parameters, or any combination thereof. Applying the profile 132(M) increases the speed at which the I/O requests 112 are executed, thereby reducing execution time and increasing throughput for the app 110(N).
- The profile 132(M) may modify the size of caches, queues, counters, and other data structures in the I/O stack to improve the execution time of the I/O requests 112 of the app 110(N). For example, the profile 132(M) may modify parameters 134 of the file system 116, such as the type of file system (e.g., FAT, exFAT, NTFS, or the like), cluster size, volume size, whether compression is used and, if so, what type of compression is used, encryption, and the like. The profile 132(M) may modify parameters 136 of the memory 118, such as how much of the memory 118 is allocated for paging, and other memory-related settings. The profile 132(M) may modify parameters 138 associated with the logical storage 120, such as how the logical storage 120 is implemented. The profile 132(M) may modify parameters 140 associated with the cache 122, such as a size of the cache 122, under what conditions the contents of the cache 122 are written to the physical storage 124, and the like. The profile 132(M) may modify parameters 142 associated with the pagefile 123, such as a size of the pagefile 123, under what conditions paging occurs, and the like. The profile 132(M) may modify parameters 144 associated with the physical storage 124. The profile 132(M) may modify various parameters 146 of the app 110(N), such as the location of a temporary file, the size of various internal caches and queues, and the like. For example, a video editor application may enable a location of a temporary file to be specified. If the temporary file is located on the same storage device as the app 110(N), then I/O requests to access portions of the application software and I/O requests to access the temporary file are placed in the same queue because they access the same storage device. In a system with two storage devices, the profile 132(M) may modify the parameters 146 to locate the temporary file (e.g., video file(s), photo file(s), audio file(s), illustration file(s), and the like) on a second storage device while the app 110(N) executes from a first storage device. By locating the temporary file on the second storage device, I/O requests to access portions of the application software are placed in a first queue associated with the first storage device, and I/O requests to access the temporary file are placed in a second queue associated with the second storage device. In this way, the app I/O requests and the temporary file I/O requests can be performed substantially in parallel.
- The
profiles 132 may modify one or more parameters 150 of the operating system 148 to improve performance of the apps 110. The parameters 150 may include process priorities 152, a power plan 154, Vsync 156, hyperthreading 158, core parking 160, superfetch 162, cache VMEM 164, memory compression 166, page combining 168, and other parameters 170. The process priorities 152 may include a priority level, e.g., high, normal, or low, associated with each process (e.g., an instance of a software application). The power plan 154 may be one of multiple plans, such as, for example, a high-performance plan, a balanced plan, and a power save plan. Of course, by varying various power features of the computing device 102, the power plan 154 may be selected from more than just the three plans provided as examples. Vsync 156 refers to Vertical Synchronization (Vsync), a display option that synchronizes the frame rate output by an application (e.g., via a graphics card) with the monitor refresh rate. Because a graphics processor executes as fast as possible, it may output extremely high frame rates, e.g., faster than the display device is capable of displaying. Enabling Vsync caps (e.g., throttles) the frame rate at the monitor's refresh rate and may avoid excessive strain on the graphics processor. Because Vsync makes frames wait until the monitor is ready, enabling Vsync can cause a slight delay in responding to input, such as keypresses, mouse input, and the like. Hyperthreading 158 provides simultaneous multithreading to improve parallelization of computations (doing multiple tasks at once) performed by multiple cores of a central processing unit (CPU). In core parking 160, cores of the CPU that do not have threads scheduled for execution are parked (e.g., placed in a low-power state to conserve power). A parked core may take time to be unparked before it can execute a thread, thereby causing a delay. Thus, turning off core parking 160 may increase performance because an app does not wait for a parked core to become available to execute a thread associated with the app. SuperFetch 162 is a pre-fetch feature of a memory manager of the operating system 148 that caches frequently accessed data in RAM rather than leaving it on a storage device, because data can be retrieved from RAM faster than from the storage device. Cache virtual memory (vMem) 164 is memory that is allocated for virtualization (e.g., such as that provided by VMware® or similar software). For example, virtual memory addresses associated with each virtual machine are translated to physical memory addresses. Memory compression 166 is a memory management technique that uses data compression to reduce the size or number of paging requests to and from the storage. Page combining 168 is a technique to free up memory (RAM) in which the operating system 148 analyzes the content of memory, locates duplicate content, and keeps a single copy of particular content while removing the duplicates from memory. Of course, these are merely examples of parameters that can be modified, and other parameters 170 may be changed depending on the operating system (e.g., Windows, MacOS, iOS, Android, Linux, and the like), the operating system version, and so on.
- Table 1 illustrates examples of the application and operating system parameters that each of the profiles 132 may modify.
TABLE 1

| Parameter | Applied | Possible values |
|---|---|---|
| Process priority | Per process | High, normal, or low |
| Power plan | System wide | High performance, balanced, or power save |
| Vsync | System wide | On/off |
| Hyperthreading | System wide | On/off |
| Core parking | System wide | On/off |
| Superfetch | System wide | On/off |
| CacheVMEM | System wide | On/off |
| Memory Compression | System wide | On/off |
| Page Combining | System wide | On/off |

- For a particular app 110(N) (e.g., AutoCAD) executing on a particular platform having a particular hardware configuration, the profile 132(M) associated with the app 110(N) may configure the parameters as illustrated in Table 2.

TABLE 2

| Setting | Value |
|---|---|
| Process priority | High |
| Power plan | High performance |
| Vsync | Off |
| Hyperthreading | Off |
| Core parking | Off |
| Superfetch | On |
| CacheVMEM | On |
| Memory Compression | Off |
| Page Combining | On |

- The
performance improvement tool 106 may provide recommendations 172 to improve performance of one or more of the apps 110. For example, the recommendations 172 may include “Increasing RAM from 8 GB to 16 GB will provide up to X % improvement in execution times for app 110(N)” or “For app 110(N), switching from a first type of storage device (e.g., mechanical disk drive) to a second type of storage device (e.g., SSD) may provide up to Y % improvement in execution times, and switching to a third type of storage device (e.g., NVMe) may provide up to Z % improvement in execution times” (X, Y, and Z > 0). The recommendations 172 may also include “Upgrading to the latest Precision workstation with a 4.2 GHz i7 processor, 16 GB RAM, and 256 GB NVMe storage will yield an improvement of X % for app 110(N).”
- Thus, a performance improvement tool may provide a UI to enable a user to select one or more apps. When the tool detects that one of the selected apps is being executed (e.g., for the first time after being selected), the tool gathers data on how the I/O requests of the selected app affect the I/O stack. The data is gathered for a predetermined period of time, such as 15, 30, 45, or 60 minutes or the like. A classifier analyzes the gathered data and characterizes the workload presented by the I/O requests of the app to the I/O stack. The classifier compares the workload of the app to multiple predefined workloads and identifies a closest (e.g., most similar) workload from among the predefined workloads. The classifier selects a profile that corresponds to the closest workload. The process of gathering data, analyzing the data, identifying a most similar workload, and selecting a corresponding profile is repeated for each selected app. After a profile is associated with an app, the associated profile is applied each time the app is executed. The profile is applied by modifying various parameters associated with (e.g., that affect) the I/O stack. Applying the profile modifies the parameters associated with the I/O stack so that the I/O stack executes the I/O requests from the app in less time, thereby reducing execution time and increasing throughput. For example, after a profile has been associated with each of the selected apps, when a first app begins to execute, the associated first profile is applied. When the user exits the first app and initiates execution of a second app, an associated second profile is selected and applied to reconfigure the parameters to improve throughput while the second app is executing.
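The per-app association and apply-on-launch behavior can be sketched with a small registry. The allowed values follow Table 1 and the AutoCAD profile follows Table 2; the second app's profile, the class and method names, and the fact that `on_app_launch` only records settings (instead of calling operating-system APIs) are all assumptions.

```python
from dataclasses import dataclass, field

ON_OFF = {"On", "Off"}
# Allowed values per parameter, following Table 1 (process priority is per process,
# the other parameters are system wide).
ALLOWED = {
    "Process priority": {"High", "Normal", "Low"},
    "Power plan": {"High performance", "Balanced", "Power save"},
    "Vsync": ON_OFF,
    "Hyperthreading": ON_OFF,
    "Core parking": ON_OFF,
    "Superfetch": ON_OFF,
    "CacheVMEM": ON_OFF,
    "Memory Compression": ON_OFF,
    "Page Combining": ON_OFF,
}


def validate(profile):
    """Reject settings that Table 1 does not list as possible values."""
    for name, value in profile.items():
        if value not in ALLOWED.get(name, set()):
            raise ValueError(f"{name}={value!r} is not an allowed setting")
    return profile


@dataclass
class ProfileRegistry:
    """Hypothetical store mapping each selected app to its profile."""
    profiles: dict = field(default_factory=dict)
    active: dict = field(default_factory=dict)  # settings currently in effect

    def associate(self, app, profile):
        self.profiles[app] = validate(dict(profile))

    def on_app_launch(self, app):
        """Apply the app's profile, if any; a real tool would call OS-specific APIs here."""
        if app in self.profiles:
            self.active = dict(self.profiles[app])
        return self.active


# The AutoCAD profile from Table 2; the second profile is hypothetical.
registry = ProfileRegistry()
registry.associate("AutoCAD", {
    "Process priority": "High", "Power plan": "High performance", "Vsync": "Off",
    "Hyperthreading": "Off", "Core parking": "Off", "Superfetch": "On",
    "CacheVMEM": "On", "Memory Compression": "Off", "Page Combining": "On",
})
registry.associate("Premiere", {"Process priority": "High", "Vsync": "On"})
print(registry.on_app_launch("AutoCAD")["Hyperthreading"])  # Off
print(registry.on_app_launch("Premiere")["Vsync"])          # On
```

Validating when a profile is associated, rather than when it is applied, surfaces a bad profile before the user's app is running.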
FIG. 2 is a block diagram 200 illustrating training a classifier, according to some embodiments. For example, the classifier 128 of FIG. 1 may be created and trained using the process 200.
- At 202, the classifier is created. For example, software instructions that implement one or more machine learning algorithms (e.g., Random Forest, Neural Networks, or the like) may be written to create the classifier.
- At 204, the classifier may be trained using training data 206. The training data 206 may include data that has been pre-classified (e.g., by a human, by another classifier, or a combination thereof).
- At 208, the classifier may be used to classify test data 210. The test data 210 may have been pre-classified by a human, by another classifier, or a combination thereof. An accuracy with which the classifier classified the test data 210 may be determined. If the accuracy does not satisfy a desired accuracy, then the classifier may be tuned, at 212, to achieve the desired accuracy. The desired accuracy may be a predetermined threshold, such as ninety percent, ninety-five percent, ninety-nine percent, and the like. For example, if the classifier was eighty percent accurate in classifying the test data and the desired accuracy is ninety percent, then the classifier may be further tuned by modifying the algorithms based on the results of classifying the test data 210. Steps 208 and 212 may be repeated (e.g., iteratively) until the accuracy of the classifier satisfies the desired accuracy.
- When the accuracy of the classifier in classifying the test data 210 satisfies the desired accuracy, the process may proceed to 214, where the accuracy of the classifier may be verified using verification data 216. The verification data 216 may have been pre-classified by a human, by another classifier, or a combination thereof. The verification process may be performed at 214 to determine whether the classifier exhibits any bias towards the training data 206 and/or the test data 210. For example, the verification data 216 may be data that is different from both the test data 210 and the training data 206. After verifying, at 214, that the accuracy of the classifier satisfies the desired accuracy, the trained classifier 128 may be used to identify the set of workload types 130, with each of the workload types 130 affecting the I/O stack 114 in a different way as compared to others of the workload types 130.
- The training data 206, the test data 210, and the verification data 216 may include data gathered using different hardware platforms 220, with each of the hardware platforms 220 having different storage configurations 222, such as different amounts of RAM, different amounts of physical storage, different types (e.g., mechanical, SSD, and the like) of physical storage, and different storage interfaces (e.g., SATA, NVMe, network-attached storage (NAS), or the like). The data 206, 210, and 216 may be gathered by executing different workloads 224 on various hardware platforms 220 and storage configurations 222 using different profiles 226. The classifier 128 may associate the profile 132(M) with the workload type 130(M) because, among the profiles 226, the profile 132(M) provides the fastest execution of I/O requests for the workload type 130(M).
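Steps 204 through 214 can be sketched as a loop that trains, tests against the desired accuracy, tunes, and then verifies. Here, tuning is modeled as trying the next candidate setting (the k of a toy one-dimensional k-nearest-neighbors classifier), and a candidate that fails verification is treated like one that fails testing; both choices, and all data values, are assumptions for illustration rather than the classifier the source describes.

```python
from collections import Counter


def accuracy(model, data):
    """Fraction of (feature, label) pairs that the model classifies correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def train_knn(k, training):
    """Toy 1-D k-nearest-neighbors classifier (a stand-in for the real learning algorithm)."""
    def model(x):
        nearest = sorted(training, key=lambda pair: abs(pair[0] - x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return model


def train_until_accurate(train, candidates, training, test, verification, target=0.90):
    """Train (204), test (208), tune (212), and verify (214) until the target accuracy is met."""
    for setting in candidates:
        model = train(setting, training)             # 204: train
        if accuracy(model, test) < target:           # 208: test
            continue                                 # 212: tune by trying the next setting
        if accuracy(model, verification) >= target:  # 214: verify against unseen data
            return setting, model
    raise RuntimeError("no candidate setting reached the desired accuracy")


# Hypothetical pre-classified data: (read fraction of I/O requests, workload label).
training = [(0.9, "read_heavy"), (0.8, "read_heavy"), (0.85, "read_heavy"),
            (0.6, "write_heavy"), (0.1, "write_heavy"), (0.2, "write_heavy"), (0.3, "write_heavy")]
test = [(0.62, "read_heavy"), (0.15, "write_heavy")]
verification = [(0.88, "read_heavy"), (0.25, "write_heavy")]

k, model = train_until_accurate(train_knn, [1, 3, 5], training, test, verification)
print(k)  # 3
```

With k = 1 the noisy training point at 0.6 misclassifies the first test sample, so the loop tunes to k = 3, which passes both the test and verification checks.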
FIG. 3 is a block diagram 300 illustrating examples of variables used to train a machine learning algorithm, according to some embodiments. FIG. 3 illustrates examples of the variables influencing I/O throughput, such as system file reads 302, I/O reads 304, copy cache reads 306, process reads 308, cache writes 310, logical disk reads 312, system file writes 314, disk read queue length 316, disk write queue length 318, and cache data flush 320.
- A correlational analysis may be used to rank the dependency level between I/O-related variables, such as the variables identified in FIG. 3. For example, when collecting the data 206, 210, and 216 used to train the classifier 128 of FIG. 1, approximately 700 different I/O-related variables may be monitored. The variables may be ranked according to each variable's dependency level, e.g., how much the variable influences I/O, based on mean decrease Gini, as illustrated in FIG. 3. The mean decrease Gini measures how much each variable contributes to the homogeneity (purity) of the nodes and leaves in the resulting random forest, summed over the splits that use the variable and averaged over the trees. Variables whose splits produce purer nodes have a higher mean decrease Gini, indicating a greater influence. A subset of the variables, e.g., the top X variables (e.g., 10<X<100) having the highest mean decrease Gini, may be selected and used to gather the data 126 associated with each of the selected apps 110. Selecting a subset is useful for two reasons. First, monitoring 700 different I/O-related variables when gathering the data 126 is impractical in a runtime environment because the execution speed of the apps 110 would slow down significantly. Second, monitoring the top X variables is sufficient to characterize the workload of each of the apps 110 because the top X variables have the largest influence (e.g., greatest mean decrease Gini) over I/O.
- For example, in
FIG. 3 , assume the top fivevariables apps 110 and the selected one of theapps 110 begins execution, thevariables - In the flow diagram of
FIGS. 4 and 5 , each block represents one or more operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions that, when executed by one or more processors, cause the processors to perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, modules, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the blocks are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes. For discussion purposes, theprocesses FIGS. 1, 2, and 3 , as described above, although other models, frameworks, systems and environments may be used to implement this process. -
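Returning to the variable ranking of FIG. 3: the Gini impurity of a set of labeled samples is 1 − Σ p_k², where p_k is the fraction of samples in class k, and a split's impurity decrease is the parent impurity minus the size-weighted impurity of its children. The Python sketch below ranks variables by the best single-split Gini decrease each one achieves and keeps the top X. This is a deliberately simplified stand-in for a random forest's mean decrease Gini, which accumulates such decreases over every split on the variable and averages them across all trees; the variable names and values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_decrease(values, labels):
    """Largest weighted Gini decrease achievable with one threshold split on a single variable."""
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


def top_x_variables(samples, labels, x):
    """Rank variables by best-split Gini decrease and keep the names of the top x."""
    names = list(samples[0])
    scores = {name: best_split_decrease([s[name] for s in samples], labels) for name in names}
    return sorted(names, key=lambda name: scores[name], reverse=True)[:x]


# Hypothetical per-interval counter readings labeled with a known workload type.
samples = [
    {"io_reads": 900, "cache_writes": 5},
    {"io_reads": 850, "cache_writes": 7},
    {"io_reads": 100, "cache_writes": 6},
    {"io_reads": 120, "cache_writes": 4},
]
labels = ["read_heavy", "read_heavy", "write_heavy", "write_heavy"]
print(top_x_variables(samples, labels, 1))  # ['io_reads']
```

In this toy data `io_reads` separates the two workload types perfectly (decrease 0.5), while `cache_writes` achieves at most about 0.17, so only `io_reads` survives when X is 1.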
- FIG. 4 is a flowchart of a process 400 that includes training a classifier, according to some embodiments. For example, the process 400 may be performed in a non-production environment to create the classifier 128 of FIG. 1.
- At 402, a hardware platform may be selected. At 404, a storage configuration may be selected. At 406, a workload may be selected. At 408, a profile may be selected.
- At 410, the workload may be executed for a particular period of time. At 412, data associated with the I/O stack may be gathered. At 414, the gathered data, the platform, the configuration, the workload, and the profile may be stored.
- At 416, a determination may be made whether there are more profiles. If there are more profiles, then the process may proceed to 408, where a next profile is selected. Thus, 408, 410, 412, 414, and 416 may be repeated until all profiles have been selected for a particular platform, a particular configuration, and a particular workload. If there are no more profiles to be selected, then the process proceeds to 418.
- At 418, a determination may be made whether there are more workloads. If there are more workloads, then the process may proceed to 406, where a next workload is selected. Thus, 406, 408, 410, 412, 414, 416, and 418 may be repeated until all workloads have been selected for a particular platform and a particular configuration. If there are no more workloads to be executed, then the process proceeds to 420.
- At 420, a determination may be made whether there are more configurations. If there are more configurations, then the process may proceed to 404, where a next configuration is selected. Thus, 404, 406, 408, 410, 412, 414, 416, 418, and 420 may be repeated until all configurations have been selected for a particular platform. If there are no more configurations, then the process proceeds to 422.
- At 422, a determination may be made whether there are more platforms. If there are more platforms, then the process may proceed to 402, where a next hardware platform is selected. Thus, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420, and 422 may be repeated until all platforms have been selected. If there are no more platforms, then the process proceeds to 424.
- At 424, the classifier may be trained, as described in FIG. 2, using the data gathered in 402 through 422. The classifier may be used to identify, for each workload, the particular profile among the multiple profiles that provides the fastest I/O execution.
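The nested loops of blocks 402 through 422 visit every combination of hardware platform, storage configuration, workload, and profile, with the profile loop innermost and the platform loop outermost. A minimal Python sketch, assuming a hypothetical callback `run_and_measure` that executes the workload for the fixed period and returns the gathered I/O-stack data:

```python
from itertools import product


def sweep(platforms, configs, workloads, profiles, run_and_measure):
    """Run every profile for every workload, storage configuration, and platform (blocks 402-422)."""
    results = []
    # product() varies the last iterable fastest, so profiles form the innermost loop and
    # platforms the outermost, matching the order of the flowchart.
    for platform, config, workload, profile in product(platforms, configs, workloads, profiles):
        io_data = run_and_measure(platform, config, workload, profile)  # blocks 410-412
        results.append({  # block 414: store the data with the combination that produced it
            "platform": platform,
            "config": config,
            "workload": workload,
            "profile": profile,
            "io_data": io_data,
        })
    return results  # block 424 would train the classifier on these records


def fake_run(platform, config, workload, profile):
    # Hypothetical stand-in for executing the workload and reading I/O counters.
    return {"io_reads": len(platform + config + workload + profile)}


rows = sweep(["ws-a", "ws-b"], ["sata", "nvme"], ["w1", "w2", "w3"], ["p1", "p2"], fake_run)
print(len(rows))  # 24
```

Using `itertools.product` instead of four explicit loops keeps the visiting order identical while making it obvious that the number of runs is the product of the four list sizes (here 2 × 2 × 3 × 2 = 24).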
- FIG. 5 is a flowchart of a process 500 that includes configuring parameters associated with an I/O system based on a profile, according to some embodiments. The process 500 may be performed by the performance improvement tool 106 of FIG. 1.
- At 502, the performance improvement tool (“tool”) may be installed on a computing device (e.g., a workstation, such as a Dell® Precision). The tool may include predefined workloads and corresponding profiles. For example, in FIG. 1, the performance improvement tool 106 (e.g., Dell® Precision Optimizer or similar) may be installed on the computing device 102 (e.g., a Dell® Precision workstation). The performance improvement tool 106 may include the workload types 130 and the profiles 132.
- At 504, the tool may display a UI. At 506, a selection of one or more apps (e.g., apps whose performance is to be improved, or “optimized”) may be received via the UI. For example, in FIG. 1, the performance improvement tool 106 may provide the app selection UI 108 to enable a user to select one or more apps, such as the apps 110(1) to 110(N) (N>0), from among the apps 104.
- At 508, the process may determine that one of the selected apps is executing (e.g., after having been selected via the UI). At 510, data associated with how the app uses the I/O stack may be gathered for a predetermined period of time. For example, in FIG. 1, the performance improvement tool 106 may monitor the input/output (I/O) requests 112 of a particular one of the selected apps 110 while the selected app is executing and gather the data 126 associated with how the selected app uses the I/O stack 114. The data 126 may be gathered across each of the layers of the I/O stack 114 for a predetermined period of time (e.g., 15, 30, 45, or 60 minutes, or the like).
- At 512, the data may be analyzed using a machine learning algorithm (e.g., a classifier). At 514, based on the analysis, the closest predefined workload may be determined. At 516, a profile corresponding to the closest predefined workload may be selected. At 518, the process may configure one or more parameters of the computing device based on the profile. For example, in FIG. 1, the classifier 128 may analyze the data 126(N) associated with the app 110(N), determine that the I/O requests 112 to the I/O stack 114 present a workload that is most similar (e.g., closest) to the predefined workload type 130(M), and select the profile 132(M). The performance improvement tool 106 may apply the settings in the profile 132(M) to the computing device 102 to improve the performance of the app 110(N), e.g., when accessing the I/O stack 114.
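Blocks 510 through 518 can be sketched as follows. The text leaves the classifier open (a random forest is one example); this Python sketch swaps in a nearest-centroid rule (Euclidean distance to a per-workload mean feature vector) purely for illustration. The feature layout, centroid values, and profile settings such as `read_ahead_kb` are hypothetical, and real monitored counters would need to be normalized before distances are compared.

```python
import math


def closest_workload(features, centroids):
    """Name of the predefined workload whose centroid is nearest (Euclidean) to the features."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


def choose_profile(samples, centroids, profiles):
    """Average the samples from the monitoring window, classify them, and look up the profile."""
    n = len(samples)
    features = [sum(column) / n for column in zip(*samples)]  # per-variable mean (block 512)
    workload = closest_workload(features, centroids)          # block 514
    return workload, profiles[workload]                       # block 516


# Hypothetical two-variable feature vectors: [I/O reads per s, cache writes per s].
centroids = {"read_heavy": [900.0, 10.0], "write_heavy": [50.0, 700.0]}
profiles = {"read_heavy": {"read_ahead_kb": 1024}, "write_heavy": {"write_cache": "on"}}
samples = [[850.0, 20.0], [910.0, 5.0], [880.0, 15.0]]
print(choose_profile(samples, centroids, profiles))  # ('read_heavy', {'read_ahead_kb': 1024})
```

Block 518 would then apply the returned settings to the computing device; that step is platform specific and is left out of the sketch.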
- FIG. 6 illustrates an example configuration of a computing device 600, such as the computing device 102, that can be used to implement the systems and techniques described herein.
- The computing device 600 may include one or more processors 602 (e.g., central processing unit (CPU), graphics processing unit (GPU), and the like), a memory 604, communication interfaces 606, at least one display device 608, other input/output (I/O) devices 610 (e.g., keyboard, trackball, and the like), and one or more mass storage devices 612 (e.g., disk drive, solid state disk drive, or the like), configured to communicate with each other, such as via one or more system buses 614 or other suitable connections. While a single system bus 614 is illustrated for ease of understanding, it should be understood that the system buses 614 may include multiple buses, such as a memory device bus, a storage device bus (e.g., serial ATA (SATA) and the like), data buses (e.g., universal serial bus (USB) and the like), video signal buses (e.g., Thunderbolt®, DVI, HDMI, and the like), power buses, etc.
- The processors 602 are one or more hardware devices that may include a single processing unit or a number of processing units, all of which may include single or multiple computing units or multiple cores. The processors 602 may include a graphics processing unit (GPU) that is either integrated into the CPU or implemented as a processor device separate from the CPU. The processors 602 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, graphics processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processors 602 may be configured to fetch and execute computer-readable instructions stored in the memory 604, the mass storage devices 612, or other computer-readable media.
- The memory 604 and the mass storage devices 612 are examples of computer storage media (e.g., memory storage devices) for storing instructions that can be executed by the processors 602 to perform the various functions described herein. For example, the memory 604 may include both volatile and non-volatile memory devices (e.g., random access memory (RAM), read only memory (ROM), or the like). Further, the mass storage devices 612 may include hard disk drives, solid-state drives, removable media (e.g., secure digital (SD) cards), including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CD, DVD), a storage array, a network attached storage, a storage area network, or the like. Both the memory 604 and the mass storage devices 612 may be collectively referred to as memory or computer storage media herein and may be any type of non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that can be executed by the processors 602 as a particular machine configured for carrying out the operations and functions described in the implementations herein.
- The computing device 600 may include one or more communication interfaces 606 for exchanging data via a network 618. The communication interfaces 606 can facilitate communications within a wide variety of networks and protocol types, including wired networks (e.g., Ethernet, DOCSIS, DSL, fiber, USB, etc.) and wireless networks (e.g., WLAN, GSM, CDMA, 802.11, Bluetooth, Wireless USB, ZigBee, cellular, satellite, etc.), the Internet, and the like. The communication interfaces 606 can also provide communication with external storage, such as a storage array, network attached storage, storage area network, cloud storage, or the like.
- The display device 608 may be used for displaying content (e.g., information and images) to users. The other I/O devices 610 may be devices that receive various inputs from a user and provide various outputs to the user, and may include a keyboard, a touchpad, a mouse, a printer, audio input/output devices, and so forth.
- The computer storage media, such as the memory 604 and the mass storage devices 612, may be used to store software and data. For example, the computer storage media may be used to store the apps 104, the performance improvement tool 106 (including the recommendations 172, the app selection UI 108, the classifier 128, the selected apps 110, the data 126, the workload types 130, and the profiles 132), the I/O stack 114, and the operating system 148, as well as other applications (e.g., device drivers) and other data.
- The performance improvement tool 106 may enable a user to select the apps 110 from among the apps 104, causing the tool 106 to identify the type of workload that each selected app presents to the I/O stack 114 and to select a corresponding one of the profiles 132. After the user has selected the apps 110, the performance improvement tool 106 may determine when one of the apps 110 is being executed by the computing device 102 and monitor the I/O requests from the selected app to the I/O stack 114 for a predetermined amount of time (e.g., 15, 30, 45, or 60 minutes, or the like). During this time, the performance improvement tool 106 may gather the data 126 to characterize the workload presented by the selected app and select one of the profiles 132. Typically, after one of the profiles 132 has been selected and associated with the selected app 110, the data 126 may not be gathered again. However, the user may instruct the performance improvement tool 106 to gather the data 126 associated with one of the apps 110 and automatically (e.g., without human interaction) select one of the profiles 132 by opening the app selection UI 108 and selecting that app. For example, if a provider of the performance improvement tool 106 improves the profiles 132 to create new profiles 620, the user may download the new profiles 620 from a server 616 over the network 618. After installing the new profiles 620, the user may instruct the performance improvement tool 106 to gather the data 126 associated with one of the apps 110 and select one of the profiles 132 by opening the app selection UI 108 and selecting that app.
- The performance improvement tool 106 may monitor the input/output (I/O) requests 112 of each of the selected apps while that app is executing (e.g., the first time it is executed) and gather the data 126 associated with how the selected app uses the I/O stack 114. The data 126 may be gathered across each of the layers of the I/O stack 114. The classifier 128 may analyze the data 126 and identify the one of the predefined workload types 130 that is closest (e.g., most similar) to the type of workload that the selected app presents to the I/O stack 114. The performance improvement tool 106 may apply the settings in the corresponding profile 132 to the computing device 102 to improve the performance of the selected app. For example, the profiles 132 may modify various parameters associated with the I/O stack 114, the operating system 148, the selected app 110, another set of parameters, or any combination thereof. Applying the selected profile 132 increases the speed at which the I/O requests 112 are executed, thereby reducing execution time and increasing throughput for the selected app.
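The bookkeeping described above (characterize an app once, reuse its profile on later runs, and re-characterize only when the user reselects the app, for example after installing the new profiles 620) can be sketched in Python as a small cache. The class and method names are hypothetical, and the `characterize` callback stands in for the gather-and-classify steps of the process 500.

```python
from typing import Callable, Dict


class ProfileAssignments:
    """Hypothetical record of which profile each selected app has been assigned."""

    def __init__(self, characterize: Callable[[str], str]):
        # characterize(app) stands in for gathering the data 126 and classifying it;
        # it returns the name of the selected profile.
        self._characterize = characterize
        self._assigned: Dict[str, str] = {}

    def on_app_start(self, app: str) -> str:
        """Called when a selected app starts; characterizes the app only on its first run."""
        if app not in self._assigned:
            self._assigned[app] = self._characterize(app)
        return self._assigned[app]

    def reselect(self, app: str) -> None:
        """The user reselected the app in the UI, so re-characterize it on its next run."""
        self._assigned.pop(app, None)


calls = []


def fake_characterize(app: str) -> str:
    # Hypothetical classifier result: a different profile is chosen the second time.
    calls.append(app)
    return "profile_v1" if len(calls) == 1 else "profile_v2"


assignments = ProfileAssignments(fake_characterize)
assignments.on_app_start("cad_app")  # first run: characterized
assignments.on_app_start("cad_app")  # later run: cached profile reused
assignments.reselect("cad_app")      # e.g., after installing new profiles
print(assignments.on_app_start("cad_app"), len(calls))  # profile_v2 2
```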
- FIG. 7 is a block diagram 700 illustrating classifying a workload of an app, according to some embodiments. After a user selects an app, such as the app 702 (e.g., one of the apps 110 of FIG. 1), the performance improvement tool 106 may gather data 704 associated with how the app 702 uses the I/O stack 114. For example, the data 704 may include a set of operations 706 (e.g., read, write, and the like) performed on the I/O stack 114. The performance improvement tool 106 may determine how frequently each operation 708 is performed. For example, the performance improvement tool 106 may determine that the app 702 performs operations 708(1) to 708(R) (R>0) with corresponding frequencies, e.g., that the operation 708(1) is performed with a frequency 710(1) and the operation 708(R) is performed with a frequency 710(R).
- The classifier 128 may identify a subset 712 of the set of operations 706 that includes the most frequently performed operations from the set of operations 706. For example, the operations 708(1) to 708(S), with corresponding frequencies 710(1) to 710(S), where S<R, may be selected for the subset 712. The subset 712 may be determined by selecting those operations 708 whose corresponding frequency 710 satisfies a particular threshold (e.g., performed at least V times per second), by selecting operations that each represent at least a particular percentage of the total number of operations performed in a particular time interval (e.g., at least W%, where W=5%, 10%, or the like), by selecting the top N (N>0, e.g., N=5, 10, or the like) most frequently performed operations from the set of operations 706, or by using another criterion to identify the most frequently performed operations from the set of operations 706.
- The classifier 128 may determine (e.g., classify), based on the subset 712, a type 714 (e.g., one of the types 129 of FIG. 1) of the workload that the app 702 presents to the I/O stack 114. For example, the classifier 128 may determine that the type 714 of the app 702 is most similar to a workload type 718 (e.g., one of the workload types 130 of FIG. 1). The workload type 718 may be associated with frequent operations 720, e.g., operations 722(1) to 722(T) (T>0) having corresponding frequencies 724(1) to 724(T), respectively. To illustrate, the subset 712 may be most similar to the frequent operations 720, e.g., the operations 708 may be similar (or identical) to the operations 722, and the frequencies 710 may be similar (or identical) to the frequencies 724. For example, if the subset 712 includes a particular type of read operation and a particular type of write operation, then the frequent operations 720 may include the same type of read operation and the same type of write operation. After determining (e.g., classifying) that the type 714 of the workload presented by the app 702 is most similar to the workload type 718, the classifier 128 may select the profile 716 (e.g., one of the profiles 132 of FIG. 1). The performance improvement tool 106 may configure various parameters associated with the computing device 102 of FIG. 1 based on the profile 716 to improve execution (e.g., reduce the execution time) of the subset 712. In this way, the data 704 gathered by monitoring the operations 708 performed by the app 702 on the I/O stack 114 can be classified by the classifier 128 as the type 714 of the workload associated with the app 702. The classifier 128 determines that the type 714 is most similar to the workload type 718 and selects the corresponding profile 716, and the performance improvement tool 106 configures the parameters of the computing device accordingly to improve the performance of the app 702.
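The subset selection and matching of FIG. 7 can be sketched in Python as follows. The three optional criteria mirror the threshold (V per second), percentage (W%, given here as a fraction), and top-N options described above. Using cosine similarity between operation-frequency vectors to find the closest workload type is an assumption, since the text does not name a similarity measure; the operation names and frequencies are hypothetical.

```python
import math


def frequent_subset(op_counts, interval_s, min_rate=None, min_share=None, top_n=None):
    """Select the most frequently performed operations; returns {operation: operations per second}.

    min_rate: keep operations performed at least V times per second.
    min_share: keep operations that make up at least W (a fraction) of all operations.
    top_n: keep only the N most frequent operations.
    """
    total = sum(op_counts.values())
    ranked = sorted(op_counts.items(), key=lambda item: item[1], reverse=True)
    if min_rate is not None:
        ranked = [(op, n) for op, n in ranked if n / interval_s >= min_rate]
    if min_share is not None:
        ranked = [(op, n) for op, n in ranked if n / total >= min_share]
    if top_n is not None:
        ranked = ranked[:top_n]
    return {op: n / interval_s for op, n in ranked}


def cosine(a, b):
    """Cosine similarity of two sparse {operation: frequency} vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_workload_type(subset, workload_types):
    """Name of the workload type whose frequent operations best match the subset."""
    return max(workload_types, key=lambda name: cosine(subset, workload_types[name]))


# Hypothetical operation counts gathered over a 60-second window.
op_counts = {"seq_read_64k": 12000, "rand_write_4k": 3000, "metadata_query": 60, "flush": 30}
subset = frequent_subset(op_counts, 60, top_n=2)
print(subset)  # {'seq_read_64k': 200.0, 'rand_write_4k': 50.0}

workload_types = {
    "streaming_read": {"seq_read_64k": 180.0, "rand_write_4k": 20.0},
    "oltp_like": {"rand_read_4k": 150.0, "rand_write_4k": 150.0},
}
print(closest_workload_type(subset, workload_types))  # streaming_read
```

Cosine similarity compares the mix of operations rather than absolute rates, so a lightly loaded and a heavily loaded run of the same app map to the same workload type; if absolute rates matter, a distance on the raw frequencies would be the alternative.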
- Thus, the performance of the app 702 can be improved even when the app 702 is a new application or a new version of an existing application, or when the user uses the app 702 differently from other users. The profile 716 is thus selected according to the way in which the app 702 is actually used. Further, if the user changes the way in which the app 702 is used, the user can re-run the performance improvement tool 106 to monitor the new usage, characterize the type of workload, and select a different profile based on the new usage.
- The example systems and computing devices described herein are merely examples suitable for some implementations and are not intended to suggest any limitation as to the scope of use or functionality of the environments, architectures, and frameworks that can implement the processes, components, and features described herein. Thus, implementations herein are operational with numerous environments or architectures, and may be implemented in general-purpose and special-purpose computing systems, or other devices having processing capability. Generally, any of the functions described with reference to the figures can be implemented using software, hardware (e.g., fixed logic circuitry), or a combination of these implementations. The term “module,” “mechanism,” or “component” as used herein generally represents software, hardware, or a combination of software and hardware that can be configured to implement prescribed functions. For instance, in the case of a software implementation, the term “module,” “mechanism,” or “component” can represent program code (and/or declarative-type instructions) that performs specified tasks or operations when executed on a processing device or devices (e.g., CPUs or processors). The program code can be stored in one or more computer-readable memory devices or other computer storage devices.
Thus, the processes, components and modules described herein may be implemented by a computer program product.
- Furthermore, this disclosure provides various example implementations, as described and as illustrated in the drawings. However, this disclosure is not limited to the implementations described and illustrated herein, but can extend to other implementations, as would be known or as would become known to those skilled in the art. Reference in the specification to “one implementation,” “this implementation,” “these implementations” or “some implementations” means that a particular feature, structure, or characteristic described is included in at least one implementation, and the appearances of these phrases in various places in the specification are not necessarily all referring to the same implementation.
- Although the present invention has been described in connection with several embodiments, the invention is not intended to be limited to the specific forms set forth herein. On the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the scope of the invention as defined by the appended claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/353,153 US10771580B1 (en) | 2019-03-14 | 2019-03-14 | Using machine learning to improve input/output performance of an application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/353,153 US10771580B1 (en) | 2019-03-14 | 2019-03-14 | Using machine learning to improve input/output performance of an application |
Publications (2)
Publication Number | Publication Date |
---|---|
US10771580B1 US10771580B1 (en) | 2020-09-08 |
US20200296182A1 true US20200296182A1 (en) | 2020-09-17 |
Family ID: 72289868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/353,153 Active US10771580B1 (en) | 2019-03-14 | 2019-03-14 | Using machine learning to improve input/output performance of an application |
Country Status (1)
Country | Link |
---|---|
US (1) | US10771580B1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20240028225A1 (en) * | 2022-07-20 | 2024-01-25 | Dell Products L.P. | Data storage system with self tuning based on cluster analysis of workload features |
US12124714B2 (en) * | 2022-07-20 | 2024-10-22 | Dell Products L.P. | Data storage system with self tuning based on cluster analysis of workload features |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11222282B2 (en) * | 2018-09-21 | 2022-01-11 | International Business Machines Corporation | Sourcing a new machine-learning project by reusing artifacts from reference machine learning projects |
US11977468B2 (en) * | 2021-12-01 | 2024-05-07 | Intel Corporation | Automatic profiling of application workloads in a performance monitoring unit using hardware telemetry |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7898442B1 (en) * | 1997-05-30 | 2011-03-01 | International Business Machines Corporation | On-line data compression analysis and regulation |
US6775745B1 (en) * | 2001-09-07 | 2004-08-10 | Roxio, Inc. | Method and apparatus for hybrid data caching mechanism |
US8754904B2 (en) * | 2011-04-03 | 2014-06-17 | Lucidlogix Software Solutions, Ltd. | Virtualization method of vertical-synchronization in graphics systems |
JP5322615B2 (en) * | 2008-12-15 | 2013-10-23 | キヤノン株式会社 | Image processing apparatus, workflow execution method, and program |
US8239584B1 (en) * | 2010-12-16 | 2012-08-07 | Emc Corporation | Techniques for automated storage management |
US8862917B2 (en) * | 2011-09-19 | 2014-10-14 | Qualcomm Incorporated | Dynamic sleep for multicore computing devices |
JP5989504B2 (en) * | 2012-10-25 | 2016-09-07 | 株式会社東芝 | Information processing apparatus and operation control method |
US9317204B2 (en) * | 2013-11-14 | 2016-04-19 | Sandisk Technologies Inc. | System and method for I/O optimization in a multi-queued environment |
US20150242133A1 (en) * | 2014-02-21 | 2015-08-27 | Lsi Corporation | Storage workload hinting |
US9720601B2 (en) * | 2015-02-11 | 2017-08-01 | Netapp, Inc. | Load balancing technique for a storage array |
US9747201B2 (en) * | 2015-03-26 | 2017-08-29 | Facebook, Inc. | Methods and systems for managing memory allocation |
US9959146B2 (en) * | 2015-10-20 | 2018-05-01 | Intel Corporation | Computing resources workload scheduling |
US10430206B1 (en) * | 2017-03-14 | 2019-10-01 | American Megatrends International, Llc | Multi-user hidden feature enablement in firmware |
US11182210B2 (en) * | 2017-07-31 | 2021-11-23 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method for resource allocation and terminal device |
US11003493B2 (en) * | 2018-07-25 | 2021-05-11 | International Business Machines Corporation | Application and storage based scheduling |
US10621123B2 (en) * | 2018-08-02 | 2020-04-14 | EMC IP Holding Company LLC | Managing storage system performance |
US11163452B2 (en) * | 2018-09-24 | 2021-11-02 | Elastic Flash Inc. | Workload based device access |
Also Published As
Publication number | Publication date |
---|---|
US10771580B1 (en) | 2020-09-08 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: DELL PRODUCTS L. P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KHOSROWPOUR, FARZAD;VICHARE, NIKHIL;SIGNING DATES FROM 20190222 TO 20190305;REEL/FRAME:048598/0252 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, NORTH CAROLINA Free format text: SECURITY AGREEMENT;ASSIGNORS:DELL PRODUCTS L.P.;EMC CORPORATION;EMC IP HOLDING COMPANY LLC;AND OTHERS;REEL/FRAME:050405/0534 Effective date: 20190917 |
AS | Assignment | Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT, TEXAS Free format text: PATENT SECURITY AGREEMENT (NOTES);ASSIGNORS:DELL PRODUCTS L.P.;EMC CORPORATION;EMC IP HOLDING COMPANY LLC;AND OTHERS;REEL/FRAME:050724/0466 Effective date: 20191010 |
AS | Assignment | Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., TEXAS Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:053546/0001 Effective date: 20200409 |
AS | Assignment | Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT, TEXAS Free format text: SECURITY INTEREST;ASSIGNORS:DELL PRODUCTS L.P.;EMC CORPORATION;EMC IP HOLDING COMPANY LLC;REEL/FRAME:053311/0169 Effective date: 20200603 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: WYSE TECHNOLOGY L.L.C., CALIFORNIA Free format text: RELEASE OF SECURITY INTEREST AT REEL 050405 FRAME 0534;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058001/0001 Effective date: 20211101 Owner name: EMC IP HOLDING COMPANY LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST AT REEL 050405 FRAME 0534;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058001/0001 Effective date: 20211101 Owner name: EMC CORPORATION, MASSACHUSETTS Free format text: RELEASE OF SECURITY INTEREST AT REEL 050405 FRAME 0534;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058001/0001 Effective date: 20211101 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST AT REEL 050405 FRAME 0534;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058001/0001 Effective date: 20211101 |
AS | Assignment | Owner name: DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO WYSE TECHNOLOGY L.L.C.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (050724/0466);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060753/0486 Effective date: 20220329 Owner name: EMC IP HOLDING COMPANY LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (050724/0466);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060753/0486 Effective date: 20220329 Owner name: EMC CORPORATION, MASSACHUSETTS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (050724/0466);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060753/0486 Effective date: 20220329 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (050724/0466);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060753/0486 Effective date: 20220329 Owner name: EMC IP HOLDING COMPANY LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053311/0169);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060438/0742 Effective date: 20220329 Owner name: EMC CORPORATION, MASSACHUSETTS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053311/0169);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060438/0742 Effective date: 20220329 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053311/0169);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:060438/0742 Effective date: 20220329 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |