WO2017171973A1 - Method and apparatus to optimize power settings for a workload - Google Patents

Method and apparatus to optimize power settings for a workload

Info

Publication number
WO2017171973A1
Authority
WO
WIPO (PCT)
Prior art keywords
permutation
permutations
platform
settings
performance
Prior art date
Application number
PCT/US2017/013440
Other languages
French (fr)
Inventor
Andy Hoffman
Devadatta Bodas
Muralidhar Rajappa
Neven ABOU GAZALA
Justin Song
Kaushik BALASUBRAMANIAN
Thomas Birrer
Benjamin GREFE
Marvin FORBES
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Publication of WO2017171973A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/329 Power saving characterised by the action undertaken by task scheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094 Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • Embodiments of the invention relate to power management. More specifically, embodiments of the invention relate to granular control of power performance tradeoffs in an execution environment.
  • HPC high performance computing
  • a node or server runs a specific job (task) and may run for hours on that task. Because each task may have a different power performance optimal function, the one-size-fits-all existing methodology generally results in all nodes exhibiting inefficient power usage.
  • Figure 1 is a block diagram of a system according to one embodiment of the invention.
  • FIG. 2 is a flow diagram of operation according to one embodiment of the invention.
  • Execution environment 106 includes a plurality of platforms 120-1, 120-2... 120-n (generically, platform 120).
  • the number of platforms 120 may be arbitrarily large, e.g., in the context of cloud services, the execution environment 106 may include hundreds of thousands of platforms 120.
  • Execution environment 106 may be connected over a network 104 to an administrative node 102.
  • Administrative node displays a user interface 110 through which a user can specify optimization criteria 112 that may be used to optimize the power-performance characteristics of the execution environment 106.
  • Execution environment 106 may include a load balancer 122 to distribute incoming service requests amongst the platforms 120.
  • Each platform 120 includes a processor 142 which executes tasks 144.
  • Each platform 120 also includes a number of settings 146 which may be thought of as knobs 148-1, 148-2, 148-3... 148-n (generically, knob 148).
  • knob refers to a virtual knob that can be set to a plurality of values. Examples of knobs 148 that typically exist include:
  • QPI QuickPath Interconnect
  • UPI UltraPath Interconnect
  • FSB Front-side Bus
  • DDR Double Data Rate
  • PCIe Peripheral Component Interconnect Express
  • I2C Inter-Integrated Circuit
  • PECI Platform Environmental Control Interface
  • SATA Serial Advanced Technology Attachment
  • SAS Serial-Attached SCSI
  • USB Universal Serial Bus
  • ETH Ethernet
  • Computer interconnects, e.g., Stormlake and Infiniband
  • data storage devices, non-volatile memory, memory, caches, pipelines, queues (e.g., DDR clock enable (CKE))
  • this myriad of knobs provides a vast number of possible permutations of knob settings that collectively provide tremendous granularity to the power-performance tradeoff decision. By providing greater control of the granular knob settings, improved power-performance can be achieved.
  • the platform 120 also includes an optimization agent 152 that causes the processor 142 to execute a sample workload such as task 144 based on a particular permutation of knob settings 146, and a performance metric for that permutation may be associated with the task type.
  • the optimization agent includes a candidate list creator that aggregates an indication of the permutation and the resulting metric as part of a candidate list 154.
  • a candidate list 154 may include an indication of the workload executed (Al) and indication of the criteria (CI) and indication of the metric (Mx) corresponding to the permutation (Px).
  • By repeatedly executing the workload 144 with different permutations of knobs 148, an optimal permutation can be identified based on the resulting metric. Given the arbitrarily large number of knobs that presently exist or may exist in the future, it may be impractical to use a brute-force approach to go through every possible permutation. However, the general effect of adjusting a knob is known: some knobs will improve energy usage characteristics but will reduce performance (e.g., reducing processor clock speed reduces performance and improves power usage). If the goal is to maximize performance, permutations in which such a knob is changed may therefore be reduced or eliminated. Some embodiments may choose a set of knobs known to have the greatest impact on, e.g., power consumption and cycle through all permutations of those knobs. As used herein, "set" is deemed to have one or more members and does not include the empty set. In the context of selecting a set of knobs, a suitable set is likely tens of knobs.
  • the sample workload may execute repeatedly on a single platform with a different set of permutations on each execution
  • other embodiments execute the workload over a plurality of platforms, with each platform having a different permutation of the knobs 148. This permits parallel identification of the optimal permutation.
  • the "optimal permutation" is a permutation most closely matching the user-supplied criteria.
  • a user may supply criteria to be used in permutation selection via the administrative node 102 over network 104.
  • the criteria may be a ratio of power savings to performance lost.
  • Some embodiments allow complex specification of the criteria, and may include establishment of a performance floor and/or a ceiling for power usage.
  • Other criteria may primarily specify, for example, that a tradeoff is acceptable when it achieves, e.g., a 2% power savings for each 1% performance decrease.
  • Other ratios and metrics are within the scope of embodiments of the invention.
  • the optimization agent 152 provides the user interface 110 to the administrative node 102 to accept the optimization criteria 112. It also populates the benefit selection list within the user interface 110 with entries from the candidate list 154 that satisfy the user-supplied criteria. Where multiple platforms are used to cycle knob permutations, candidates from the optimization agents 152 of all the platforms 120 can be assembled into the selection list 114 to be displayed to the user.
  • a selection list compiler 134 may exist in a central location such as load balancer 122 to assemble the selection list from the candidate lists 154 of each platform 120.
  • the selection list compiler may be part of the user interface 110.
  • the selection list compiler may be part of the optimization agent.
  • the selection list 114 both displays permutations within the threshold of the optimization criteria and, in some embodiments, permits user selection of one of the permutations from the benefit selection list.
  • the user interface 110 conveys the selection to the optimization agent 152, and knobs 148 are set consistent with that permutation for each platform 120.
  • Some embodiments of the invention allow the optimization agent 152 to select a permutation based on the user-defined criteria 112 automatically, without providing the benefit selection list to the user.
  • an association between a task type and an optimal permutation may be stored within the execution environment 106, such as in optimal setting storage 132, which maintains an association between task type and the optimal permutation in load balancer 122. In this manner, tasks having a particular type can trigger the automatic setting of the desired permutation when the load balancer 122 sends a task to a particular platform 120.
  • administration may be local to the platform.
  • embodiments of the invention may be employed in a mobile environment such as a laptop computer, where the optimization agent optimizes the laptop based on a locally provided metric, and the selection of the permutation of knob settings may be locally administered through the mobile platform.
  • an execution environment accepts user-defined criteria for optimal operation.
  • Typical execution environments include cloud services server facilities, HPC facilities and mobile computing facilities.
  • the criteria may be provided by an administrative node over a network or locally within the execution environment.
  • the optimization agent sets a permutation of knobs on a platform within the execution environment.
  • a sample workload is executed on the platform.
  • the sample workload may be taken from a real data set or a fictitious set, which may be artificially created to be representative of the task to be performed by the platform.
  • a metric is generated based on the execution of the workload. For example, a ratio of power to performance is one suitable metric.
  • the effectiveness of the existing permutation may be compared against one or more prior permutations at block 209.
  • the current permutation is rejected at block 210.
  • the result of the comparison may be used to predict a permutation with greater effectiveness. For example, if turning a knob in one direction has shown a negative effectiveness relative to a prior permutation, it may be inferred that the knob should be turned in the opposite direction to improve effectiveness. Use of a heuristic approach can more rapidly find an optimal permutation. If instead the metric satisfies the criteria, that metric and its corresponding permutation (or a representation thereof) is added to a candidate list at block 212.
  • a selection list may be optionally displayed to allow a user to select their desired permutation.
  • the selection list may be derived from the candidate list and may include information on the metric achieved by the corresponding permutation.
  • a best permutation is selected from the selection list at block 220.
  • the best permutation may be selected by a user through, for example, the administrative node where a selection list is provided to the user. Some embodiments may not display the selection list, and the best permutation may be selected automatically as the highest performance or lowest power option satisfying the user-defined criteria, depending on whether power or performance is desired for the particular task.
  • the selected permutation is then applied to all knobs for all platforms in the execution environment that will execute the task at block 218.
  • the permutation may be stored in association with the task type so that future executions of tasks of the same type may have the optimal permutation automatically applied.
  • Some embodiments pertain to a system with granular power/performance management.
  • the system includes a plurality of platforms each to execute tasks, each platform having a plurality of settings that affect a ratio of performance to power usage.
  • the platforms include an optimization agent to execute on each platform, the optimization agents to collectively cause the platforms to execute a workload based on a plurality of permutations of the settings.
  • a candidate list creator is used to aggregate a list of performance metrics associated with the plurality of permutations.
  • the system has an administrative node to accept a user-defined criterion for system optimization, to display the selection list, and to accept a selection of a permutation from the list.
  • the optimization agents are collectively to apply an optimal permutation of the settings to the plurality of platforms based on the metrics.
  • the system has a load balancer to distribute tasks between the plurality of platforms.
  • the load balancer has an optimal setting storage, the optimal setting storage to maintain an association between a task type and an optimal permutation of settings for the task type.
  • a platform has a processor to execute tasks.
  • the platform also has a plurality of knobs to adjust settings of the platform.
  • An optimization agent on the platform causes the processor to execute a workload under a plurality of permutations of the settings and tracks a performance metric for each permutation.
  • the optimization agent is to adjust the knobs to a permutation responsive to a comparison between the metric and a user-specified set of criteria.
  • the apparatus provides a user interface to accept the set of user-defined criteria for comparison with the metric.
  • the apparatus has a setting storage to store an association between a type of task and an optimal setting permutation for the type of task.
  • the optimization agent is to select the optimal setting permutation from the setting storage for a task to be processed.
  • the apparatus has a selection list compiler to aggregate a list of permutations and associated metrics.
  • the user interface accepts a user selection of one of the permutations in the selection list for use in the apparatus.
  • Some embodiments pertain to a method to granularly control power/performance in a system.
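  • the control is accomplished by applying a sample workload to an execution environment and evaluating performance of the execution environment executing the workload at a plurality of permutations of platform settings.
  • one of the permutations of settings is established for use with future workloads based at least in part on the evaluation.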
  • a benefit selection list is compiled for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations.
  • the selection list is displayed within a user interface and a selection of one of the permutations from the list is accepted.
  • a user-defined set of criteria that dictate an optimal platform setting permutation is accepted.
  • the platform settings are automatically set to the optimal permutation.
  • the user-defined set of criteria include a ratio of power savings to performance impact.
  • evaluating is accomplished by comparing an effect of a permutation against a user-defined criterion and rejecting a permutation not satisfying the criterion for the workload.
  • compiling includes eliminating permutations not satisfying a user-defined set of criteria.
  • Some embodiments pertain to a non-transitory computer-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform a set of operations to granularly control power/performance in a system.
  • the instructions cause the processor to apply a sample workload to an execution environment.
  • the instructions also cause the processor to evaluate performance of the execution environment executing the workload at a plurality of permutations of platform settings.
  • the instructions also cause the processor to establish one of the permutations of settings for use with future workloads based at least in part on the evaluation.
  • the instructions cause the processor to compile a benefit selection list for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations.
  • the instructions cause the processor to display the selection list within a user interface and accept a selection of one of the permutations from the list.
  • the instructions cause the processor to accept a user-defined set of criteria that dictate an optimal platform setting permutation and automatically set the platform settings to the optimal permutation.
  • the user-defined set of criteria include a ratio of power savings to performance impact.
  • the instructions cause the processor to compare an effect of a permutation against a user-defined criterion and reject a permutation not satisfying the criterion for the workload. In further embodiments, the instructions cause the processor to eliminate permutations not satisfying a user-defined set of criteria.
  • Some embodiments pertain to a system with granular power performance control.
  • the system has a plurality of platforms to execute tasks.
  • the system also has means for determining an optimal permutation of platform settings from a plurality of permutations of platform settings and means for applying the optimal permutation of platform settings to each platform in the plurality.
  • the means for determining has means for causing execution of a workload on one platform of the plurality under one permutation of the platform settings and means for comparing a metric generated responsive to the execution with a user-defined value of a desired metric.
  • the system has means for compiling a list of setting permutations associated with the metric generated responsive to executing a workload on the platforms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A system with granular power/performance management. The system includes a plurality of platforms each to execute tasks, each platform having a plurality of settings that affect a ratio of performance to power usage. Each platform executes an optimization agent to collectively cause the platforms to execute a workload based on a plurality of permutations of the settings. A candidate list creator exists as part of the optimization agent to aggregate a list of performance metrics associated with the plurality of permutations.

Description

METHOD AND APPARATUS TO OPTIMIZE POWER SETTINGS FOR A WORKLOAD
BACKGROUND
FIELD
Embodiments of the invention relate to power management. More specifically, embodiments of the invention relate to granular control of power performance tradeoffs in an execution environment.
BACKGROUND
Systems providing, for example, cloud services, often employ hundreds of thousands of servers to provide those services. Many servers are used for specific types of workloads or tasks. Depending on the tasks, power performance tradeoffs may exist. However, existing platforms generally support only three power modes: "performance," "balanced" and "power savings." These three modes are generally one-size-fits-all such that a single case or single type of workload may force the platform to, for example, always operate in the performance mode with the corresponding negative power tradeoff. Where scaled over hundreds of thousands of units, the unnecessary power usage becomes quite significant.
A similar problem exists in high performance computing (HPC). In HPC, a node or server runs a specific job (task) and may run for hours on that task. Because each task may have a different power performance optimal function, the one-size-fits-all existing methodology generally results in all nodes exhibiting inefficient power usage.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to "an" or "one" embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one.
Figure 1 is a block diagram of a system according to one embodiment of the invention.
Figure 2 is a flow diagram of operation according to one embodiment of the invention.
DETAILED DESCRIPTION
Figure 1 is a block diagram of a system according to one embodiment of the invention. Execution environment 106 includes a plurality of platforms 120-1, 120-2... 120-n (generically, platform 120). The number of platforms 120 may be arbitrarily large, e.g., in the context of cloud services, the execution environment 106 may include hundreds of thousands of platforms 120. Execution environment 106 may be connected over a network 104 to an administrative node 102. Administrative node displays a user interface 110 through which a user can specify optimization criteria 112 that may be used to optimize the power-performance characteristics of the execution environment 106.
Execution environment 106 may include a load balancer 122 to distribute incoming service requests amongst the platforms 120. Each platform 120 includes a processor 142 which executes tasks 144. Each platform 120 also includes a number of settings 146 which may be thought of as knobs 148-1, 148-2, 148-3... 148-n (generically, knob 148). As used herein, "knob" refers to a virtual knob that can be set to a plurality of values. Examples of knobs 148 that typically exist include:
- Knobs for selection, change, and coordination of frequencies, power states, and idle states for:
  - CPU cores, uncores, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), packages, sockets
  - Input/output controllers, buses, devices (e.g., QuickPath Interconnect (QPI), UltraPath Interconnect (UPI), Front-side Bus (FSB), Double Data Rate (computer memory bus) (DDR), Peripheral Component Interconnect Express (PCIe), Inter-Integrated Circuit (I2C), Platform Environmental Control Interface (PECI), rings, etc.)
  - Fabric/communication controllers, buses, devices (e.g., Serial Advanced Technology Attachment (SATA), Serial-Attached SCSI (SAS), Universal Serial Bus (USB), Video, Ethernet (ETH), computer interconnects, e.g., Stormlake and Infiniband, etc.)
- Knobs for write-back/through and prefetching of data and instructions for:
  - Data storage devices, non-volatile memory, memory, caches, pipelines, queues (e.g., DDR clock enable (CKE))
- Knobs for selection, change, and coordination of the width of buses and links (e.g., L0p, L0s (L0x in general, which is the nomenclature used in configuring link widths, a feature designed to save power while activity is low for PCIe, QPI, UPI, FSB, rings, etc.))
- Knobs for selection of values for various Performance-Bias registers.
- Knobs for configurations of future generations and technologies of buses, links, interconnect, processors, chipsets, controllers, memory devices, etc.
As can be appreciated, this myriad of knobs provides a vast number of possible permutations of knob settings that collectively provide tremendous granularity to the power-performance tradeoff decision. By providing greater control of the granular knob settings, improved power-performance can be achieved.
The platform 120 also includes an optimization agent 152 that causes the processor 142 to execute a sample workload such as task 144 based on a particular permutation of knob settings 146, and a performance metric for that permutation may be associated with the task type. The optimization agent includes a candidate list creator that aggregates an indication of the permutation and the resulting metric as part of a candidate list 154. Thus, for example, in some embodiments a candidate list 154 may include an indication of the workload executed (Al), an indication of the criteria (CI), and an indication of the metric (Mx) corresponding to the permutation (Px).
By repeatedly executing the workload 144 with different permutations of knobs 148, an optimal permutation can be identified based on the resulting metric. Given the arbitrarily large number of knobs that presently exist or may exist in the future, it may be impractical to use a brute-force approach to go through every possible permutation. However, the general effect of adjusting a knob is known: some knobs will improve energy usage characteristics but will reduce performance (e.g., reducing processor clock speed reduces performance and improves power usage). If the goal is to maximize performance, permutations in which such a knob is changed may therefore be reduced or eliminated. Some embodiments may choose a set of knobs known to have the greatest impact on, e.g., power consumption and cycle through all permutations of those knobs. As used herein, "set" is deemed to have one or more members and does not include the empty set. In the context of selecting a set of knobs, a suitable set is likely tens of knobs.
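As an illustration of cycling through permutations of a chosen set of knobs, and of eliminating permutations where a knob with a known effect would be changed, the following is a minimal sketch. The knob names, their values, and the `enumerate_permutations` helper are hypothetical and are not taken from the disclosure.

```python
from itertools import product

# Hypothetical knobs and allowed values (assumptions for illustration only).
KNOBS = {
    "cpu_freq_mhz": [1200, 2000, 2800],
    "prefetch": ["off", "on"],
    "link_width": ["full", "half"],
}


def enumerate_permutations(knobs, pinned=None):
    """Yield every combination of knob values, holding pinned knobs fixed."""
    pinned = pinned or {}
    names = list(knobs)
    # A pinned knob contributes a single value, which shrinks the search space.
    value_lists = [[pinned[n]] if n in pinned else knobs[n] for n in names]
    for values in product(*value_lists):
        yield dict(zip(names, values))


# Brute force over the chosen set of knobs.
all_perms = list(enumerate_permutations(KNOBS))

# If the goal is maximum performance, lowering the clock is known to hurt,
# so the clock knob is pinned and its other values are never tried.
pruned = list(enumerate_permutations(KNOBS, pinned={"cpu_freq_mhz": 2800}))

print(len(all_perms), len(pruned))
```

Pinning a single three-valued knob cuts the number of permutations to a third, which suggests why using known knob effects matters once the set grows to tens of knobs.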
Additionally, while some embodiments may execute the sample workload repeatedly on a single platform with a different set of permutations on each execution, other embodiments execute the workload over a plurality of platforms, with each platform having a different permutation of the knobs 148. This permits parallel identification of the optimal permutation. Generally, as used herein, the "optimal permutation" is a permutation most closely matching the user-supplied criteria.
As noted above, a user may supply criteria to be used in permutation selection via the administrative node 102 over network 104. In some embodiments, the criteria may be a ratio of power savings to performance lost. Some embodiments allow complex specification of the criteria, and may include establishment of a performance floor and/or a ceiling for power usage. Other criteria may primarily specify, for example, that a tradeoff is acceptable when it achieves, e.g., a 2% power savings for each 1% performance decrease. Other ratios and metrics are within the scope of embodiments of the invention.
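One way such criteria could be evaluated is sketched below. The `Criteria` class, the `satisfies` function, and the choice to measure savings and losses as percentages of a baseline run are assumptions made for illustration; the disclosure does not prescribe a representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criteria:
    """Hypothetical container for user-supplied optimization criteria."""
    min_savings_ratio: float                   # % power saved per 1% performance lost
    performance_floor: Optional[float] = None  # lowest acceptable performance
    power_ceiling: Optional[float] = None      # highest acceptable power draw


def satisfies(criteria, base_perf, base_power, perf, power):
    """Return True if a measured permutation meets the criteria.

    base_perf/base_power come from a reference permutation; perf/power
    come from the permutation under test.
    """
    if criteria.performance_floor is not None and perf < criteria.performance_floor:
        return False
    if criteria.power_ceiling is not None and power > criteria.power_ceiling:
        return False
    power_saved_pct = 100.0 * (base_power - power) / base_power
    perf_lost_pct = 100.0 * (base_perf - perf) / base_perf
    if perf_lost_pct <= 0:
        # Assumption: with no performance loss, any non-negative saving is fine.
        return power_saved_pct >= 0
    return power_saved_pct >= criteria.min_savings_ratio * perf_lost_pct


two_for_one = Criteria(min_savings_ratio=2.0)
# 5% performance lost for 15% power saved meets a 2:1 requirement.
print(satisfies(two_for_one, 100.0, 200.0, 95.0, 170.0))
# 5% performance lost for only 5% power saved does not.
print(satisfies(two_for_one, 100.0, 200.0, 95.0, 190.0))
```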
The optimization agent 152 provides the user interface 110 to the administrative node 102 to accept the optimization criteria 112. It also populates the benefit selection list within the user interface 110 with entries from the candidate list 154 that satisfy the user-supplied criteria. Where multiple platforms are used to cycle knob permutations, candidates from the optimization agents 152 of all the platforms 120 can be assembled into the selection list 114 to be displayed to the user. In some embodiments, a selection list compiler 134 may exist in a central location such as load balancer 122 to assemble the selection list from the candidate lists 154 of each platform 120. In other embodiments, the selection list compiler may be part of the user interface 110. In still other embodiments, the selection list compiler may be part of the optimization agent.
The selection list 114 both displays permutations within the threshold of the optimization criteria and, in some embodiments, permits user selection of one of the permutations from the benefit selection list. The user interface 110 conveys the selection to the optimization agent 152, and knobs 148 are set consistent with that permutation for each platform 120. Some embodiments of the invention allow the optimization agent 152 to select a permutation based on the user-defined criteria 112 automatically, without providing the benefit selection list to the user.
In some embodiments, an association between a task type and an optimal permutation may be stored within the execution environment 106, such as in optimal setting storage 132, which maintains an association between task type and the optimal permutation in load balancer 122. In this manner, tasks having a particular type can trigger the automatic setting of the desired permutation when the load balancer 122 sends a task to a particular platform 120.
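The task-type association described above can be pictured as a simple keyed store consulted at dispatch time. The sketch below is an assumption about one possible shape; `OptimalSettingStorage`, `dispatch`, and the task type and knob names are hypothetical.

```python
class OptimalSettingStorage:
    """Hypothetical store mapping a task type to its optimal knob permutation."""

    def __init__(self):
        self._by_task_type = {}

    def store(self, task_type, permutation):
        # Copy so later changes by the caller do not alter the stored entry.
        self._by_task_type[task_type] = dict(permutation)

    def lookup(self, task_type):
        """Return the stored permutation, or None if the type is unknown."""
        return self._by_task_type.get(task_type)


def dispatch(task_type, platform_settings, storage):
    """Apply the stored permutation to a platform before it runs the task.

    Returns True if a permutation was found and applied, False otherwise.
    """
    permutation = storage.lookup(task_type)
    if permutation is None:
        return False
    platform_settings.update(permutation)
    return True


storage = OptimalSettingStorage()
storage.store("video_transcode", {"cpu_freq_mhz": 2000, "prefetch": "on"})

platform_settings = {"cpu_freq_mhz": 2800, "prefetch": "off", "link_width": "full"}
applied = dispatch("video_transcode", platform_settings, storage)
print(applied, platform_settings)
```

Knobs that the stored permutation does not mention keep their current values in this sketch; an implementation could equally reset them to defaults.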
While administrative node 102 is shown remote from the platform 120, in some embodiments administration may be local to the platform. For example, embodiments of the invention may be employed in a mobile environment such as a laptop computer, where the optimization agent optimizes the laptop based on a locally provided metric, and the selection of the permutation of knob settings may be locally administered through the mobile platform.
Figure 2 is a flow diagram of operation according to one embodiment of the invention. At block 202, an execution environment accepts user-defined criteria for optimal operation. Typical execution environments include cloud services server facilities, HPC facilities and mobile computing facilities. The criteria may be provided by an administrative node over a network or locally within the execution environment. At block 204, the optimization agent sets a permutation of knobs on a platform within the execution environment. At block 206, a sample workload is executed on the platform. The sample workload may be taken from a real data set or a fictitious set, which may be artificially created to be representative of the task to be performed by the platform. A metric is generated based on the execution of the workload. For example, a ratio of power to performance is one suitable metric.
If the metric generated from execution does not satisfy user-defined criteria at block 208, the effectiveness of the existing permutation may be compared against one or more prior permutations at block 209. The current permutation is rejected at block 210. Then at block 211, the result of the comparison may be used to predict a permutation with greater effectiveness. For example, if turning a knob in one direction has shown a negative effectiveness relative to a prior permutation, it may be inferred that the knob should be turned in the opposite direction to improve effectiveness. Use of a heuristic approach can more rapidly find an optimal permutation. If instead the metric satisfies the criteria, that metric and its corresponding permutation (or a representation thereof) is added to a candidate list at block 212. A determination is then made whether there are more permutations in the set of possible permutations desired to be tested at block 214. If there are more permutations at block 214, the process repeats.
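The loop of blocks 204 through 214, including the direction-reversal heuristic, might be sketched for a single knob as follows. This is only one possible reading: the `tune_knob` function, the convention that a lower metric is better (as with a power-to-performance ratio), and the toy metric are all assumptions.

```python
def tune_knob(values, measure, satisfies, start_index):
    """Search one knob's ordered values, abandoning a direction that gets worse.

    values: ordered settings for the knob
    measure: callable returning a metric for a setting (lower is better)
    satisfies: callable returning True if a metric meets the user criteria
    Returns a candidate list of (setting, metric) pairs.
    """
    candidates = []
    baseline = measure(values[start_index])       # blocks 204-206
    if satisfies(baseline):                       # block 208
        candidates.append((values[start_index], baseline))  # block 212
    for step in (1, -1):                          # try one direction, then the other
        previous = baseline
        index = start_index + step
        while 0 <= index < len(values):           # block 214: more to test?
            metric = measure(values[index])
            if satisfies(metric):
                candidates.append((values[index], metric))
            elif metric > previous:
                # Rejected and worse than the prior permutation (blocks 209-211):
                # infer that this direction is unproductive and stop following it.
                break
            previous = metric
            index += step
    return candidates


# Toy metric with its best value at setting 3; criteria accept a metric of at most 1.
measured = []


def toy_measure(setting):
    measured.append(setting)
    return (setting - 3) ** 2


result = tune_knob(list(range(10)), toy_measure, lambda m: m <= 1, start_index=5)
print(result)
print(len(measured), "of 10 settings measured")
```

In this toy run the search stops after one worsening step upward and then walks downward, so several settings are never executed.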
If there are no more permutations, at block 216 a selection list may be optionally displayed to allow a user to select their desired permutation. The selection list may be derived from the candidate list and may include information on the metric achieved by the corresponding permutation. A best permutation is selected from the selection list at block 220. As noted, the best permutation may be selected by a user through, for example, the administrative node where a selection list is provided to the user. Some embodiments may not display the selection list, and the best permutation may be selected automatically as the highest performance or lowest power option satisfying the user-defined criteria, depending on whether power or performance is desired for the particular task.
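A minimal sketch of deriving a selection list from per-platform candidate lists and then selecting automatically is shown below. The entry layout (a dictionary with `permutation`, `performance`, and `power` keys) and both function names are hypothetical, and every entry is assumed to have already satisfied the user-defined criteria.

```python
def compile_selection_list(candidate_lists):
    """Merge candidate lists from several platforms, dropping repeated permutations."""
    seen = set()
    merged = []
    for candidates in candidate_lists:
        for entry in candidates:
            # Knob names are unique, so sorting the items gives a stable key.
            key = tuple(sorted(entry["permutation"].items()))
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged


def select_best(selection_list, prefer):
    """Pick the highest-performance or lowest-power entry, or None if empty."""
    if not selection_list:
        return None
    if prefer == "performance":
        return max(selection_list, key=lambda e: e["performance"])
    if prefer == "power":
        return min(selection_list, key=lambda e: e["power"])
    raise ValueError("prefer must be 'performance' or 'power'")


# Hypothetical candidate lists reported by two platforms.
platform_a = [
    {"permutation": {"freq": 2000, "prefetch": "on"}, "performance": 90.0, "power": 150.0},
]
platform_b = [
    {"permutation": {"freq": 2800, "prefetch": "on"}, "performance": 100.0, "power": 200.0},
    {"permutation": {"freq": 2000, "prefetch": "on"}, "performance": 90.0, "power": 150.0},
]

selection_list = compile_selection_list([platform_a, platform_b])
print(select_best(selection_list, "performance")["permutation"])
print(select_best(selection_list, "power")["permutation"])
```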
The selected permutation is then applied to all knobs for all platforms in the execution environment that will execute the task at block 218. At block 220, the permutation may be stored in association with the task type so that future executions of tasks of the same type may have the optimal permutation automatically applied.
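One way to picture the storage and reuse described above is the following Python sketch. The class name, the use of plain dictionaries for platforms and permutations, and the task type strings are assumptions made for illustration, not details from this specification.

```python
class OptimalSettingStorage:
    """Keeps an association between a task type and its optimal permutation."""

    def __init__(self):
        self._by_task_type = {}

    def store(self, task_type, permutation):
        # Copy so later changes by the caller do not alter the stored entry.
        self._by_task_type[task_type] = dict(permutation)

    def lookup(self, task_type):
        permutation = self._by_task_type.get(task_type)
        return dict(permutation) if permutation is not None else None

def apply_to_platforms(platforms, permutation):
    """Set every knob named in `permutation` on every platform (block 218)."""
    for platform in platforms:
        platform.update(permutation)
```

When a task of a known type arrives, `lookup` returns the stored permutation so it can be applied without repeating the evaluation; a return value of `None` signals that the task type has not yet been characterized.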
The following examples pertain to further embodiments. The various features of the different embodiments may be variously combined with some features included and others excluded to suit a variety of different applications. Some embodiments pertain to a system with granular power/performance management. The system includes a plurality of platforms each to execute tasks, each platform having a plurality of settings that affect a ratio of performance to power usage. The platforms include an optimization agent to execute on each platform, the optimization agents to collectively cause the platforms to execute a workload based on a plurality of permutations of the settings. A candidate list creator is used to aggregate a list of performance metrics associated with the plurality of permutations. In further embodiments, the system has an administrative node to accept a user-defined criterion for system optimization, to display the selection list, and to accept a selection of a permutation from the list.
In further embodiments, the optimization agents are collectively to apply an optimal permutation of the settings to the plurality of platforms based on the metrics.
In further embodiments, the system has a load balancer to distribute tasks between the plurality of platforms.
In further embodiments, the load balancer has an optimal setting storage, the optimal setting storage to maintain an association between a task type and an optimal permutation of settings for the task type.
Some embodiments pertain to an apparatus with granular power management. A platform has a processor to execute tasks. The platform also has a plurality of knobs to adjust settings of the platform. An optimization agent on the platform causes the processor to execute a workload under a plurality of permutations of the settings and tracks a performance metric for each permutation. The optimization agent is to adjust the knobs to a permutation responsive to a comparison between the metric and a user-specified set of criteria.
In further embodiments, the apparatus provides a user interface to accept the set of user-defined criteria for comparison with the metric.
In further embodiments, the apparatus has a setting storage to store an association between a type of task and an optimal setting permutation for the type of task.
In further embodiments, the optimization agent is to select the optimal setting permutation from the setting storage for a task to be processed.
In further embodiments, the apparatus has a selection list compiler to aggregate a list of permutations and associated metrics. The user interface accepts a user selection of one of the permutations in the selection list for use in the apparatus.
Some embodiments pertain to a method to granularly control power/performance in a system. The control is accomplished by applying a sample workload to an execution environment. Based on the application, performance of the execution environment executing the workload at a plurality of permutations of platform settings is evaluated. One of the permutations of settings is established for use with future workloads based at least in part on the evaluation.
In further embodiments, a benefit selection list is compiled for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations. In further embodiments, the selection list is displayed within a user interface and a selection of one of the permutations from the list is accepted.
In further embodiments, a user-defined set of criteria that dictate an optimal platform setting permutation is accepted. The platform settings are automatically set to the optimal permutation.
In further embodiments, the user-defined set of criteria include a ratio of power savings to performance impact.
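As a worked illustration of such a criterion, the Python sketch below computes a ratio of power savings to performance impact relative to a baseline permutation. The specific definition used here (fractional power saved divided by fractional performance lost), the function names, and the handling of the zero-loss case are assumptions; the specification does not give a formula.

```python
import math

def savings_to_impact_ratio(baseline_power, baseline_perf, power, perf):
    """Fractional power savings divided by fractional performance loss (assumed definition)."""
    power_savings = (baseline_power - power) / baseline_power
    perf_impact = (baseline_perf - perf) / baseline_perf
    if perf_impact <= 0:
        # No performance loss: any power savings comes at no cost.
        return math.inf if power_savings > 0 else 0.0
    return power_savings / perf_impact

def satisfies(ratio, minimum_ratio):
    """User-defined criterion: accept a permutation whose ratio meets the minimum."""
    return ratio >= minimum_ratio
```

For instance, cutting power from 100 W to 80 W (20% savings) at a cost of 5% performance gives a ratio of about 4 under this definition, which would satisfy a user-defined minimum of 2.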
In further embodiments, evaluating is accomplished by comparing an effect of a permutation against a user-defined criterion and rejecting a permutation not satisfying the criterion for the workload.
In further embodiments, compiling includes eliminating permutations not satisfying a user-defined set of criteria.
Some embodiments pertain to a non-transitory computer-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform a set of operations to granularly control power/performance in a system. The instructions cause the processor to apply a sample workload to an execution environment. The instructions also cause the processor to evaluate performance of the execution environment executing the workload at a plurality of permutations of platform settings. The instructions also cause the processor to establish one of the permutations of settings for use with future workloads based at least in part on the evaluation.
In further embodiments, the instructions cause the processor to compile a benefit selection list for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations.
In further embodiments, the instructions cause the processor to display the selection list within a user interface and accept a selection of one of the permutations from the list.
In further embodiments, the instructions cause the processor to accept a user-defined set of criteria that dictate an optimal platform setting permutation and automatically set the platform settings to the optimal permutation.
In further embodiments, the user-defined set of criteria include a ratio of power savings to performance impact.
In further embodiments, the instructions cause the processor to compare an effect of a permutation against a user-defined criterion and reject a permutation not satisfying the criterion for the workload. In further embodiments, the instructions cause the processor to eliminate permutations not satisfying a user-defined set of criteria.
Some embodiments pertain to a system with granular power performance control. The system has a plurality of platforms to execute tasks. The system also has means for determining an optimal permutation of platform settings from a plurality of permutations of platform settings and means for applying the optimal permutation of platform settings to each platform in the plurality.
In further embodiments, the means for determining has means for causing execution of a workload on one platform of the plurality under one permutation of the platform settings and means for comparing a metric generated responsive to the execution with a user-defined value of a desired metric.
In further embodiments, the system has means for compiling a list of setting permutations associated with the metric generated responsive to executing a workload on the platforms.
While embodiments of the invention are discussed above in the context of flow diagrams reflecting a particular linear order, this is for convenience only. In some cases, various operations may be performed in a different order than shown or various operations may occur in parallel. It should also be recognized that some operations described with respect to one embodiment may be advantageously incorporated into another embodiment. Such incorporation is expressly contemplated.
In the foregoing specification, the invention has been described with reference to the specific embodiments thereof. It will, however, be evident that various modifications and changes can be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims

What is claimed is:
1. A system with granular power/performance management, the system comprising:
a plurality of platforms each to execute tasks, each platform having a plurality of settings that affect a ratio of performance to power usage;
an optimization agent to execute on each platform, the optimization agents to collectively cause the platforms to execute a workload based on a plurality of permutations of the settings; and
a candidate list creator to aggregate a list of performance metrics associated with the plurality of permutations.
2. The system of claim 1, further comprising:
an administrative node to accept a user-defined criterion for system optimization, to display the selection list, and to accept a selection of a permutation from the list.
3. The system of claim 1, wherein the optimization agents are collectively to apply an optimal permutation of the settings to the plurality of platforms based on the metrics.
4. The system of claim 1, further comprising:
a load balancer to distribute tasks between the plurality of platforms.
5. The system of claim 4, wherein the load balancer comprises:
an optimal setting storage, the optimal setting storage to maintain an association between a task type and an optimal permutation of settings for the task type.
6. An apparatus with granular power management, comprising:
a platform having a processor to execute tasks;
a plurality of knobs to adjust settings of the platform;
an optimization agent to cause the processor to execute a workload under a plurality of permutations of the settings and to track a performance metric for each permutation; and
wherein the optimization agent is to adjust the knobs to a permutation responsive to a comparison between the metric and a user-specified set of criteria.
7. The apparatus of claim 6, further comprising:
a user interface to accept the set of user-defined criteria for comparison with the metric.
8. The apparatus of claim 6, further comprising:
a setting storage to store an association between a type of task and an optimal setting permutation for the type of task.
9. The apparatus of claim 6, wherein the optimization agent is to select the optimal setting permutation from the setting storage for a task to be processed.
10. The apparatus of claim 7, further comprising:
a selection list compiler to aggregate a list of permutations and associated metrics, and wherein the user interface accepts a user selection of one of the permutations in the selection list for use in the apparatus.
11. A method to granularly control power/performance in a system, comprising:
applying a sample workload to an execution environment;
evaluating performance of the execution environment executing the workload at a plurality of permutations of platform settings; and
establishing one of the permutations of settings for use with future workloads based at least in part on the evaluation.
12. The method of claim 11, further comprising:
compiling a benefit selection list for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations.
13. The method of claim 12, further comprising:
displaying the selection list within a user interface; and
accepting a selection of one of the permutations from the list.
14. The method of claim 11, further comprising:
accepting a user-defined set of criteria that dictate an optimal platform setting permutation;
automatically setting the platform settings to the optimal permutation.
15. The method of claim 14, wherein the user-defined set of criteria include a ratio of power savings to performance impact.
16. The method of claim 11, wherein evaluating comprises:
comparing an effect of a permutation against a user-defined criterion; and
rejecting a permutation not satisfying the criterion for the workload.
17. The method of claim 12, wherein compiling comprises:
eliminating permutations not satisfying a user-defined set of criteria.
18. A non-transitory computer-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform a set of operations to granularly control power/performance in a system comprising:
applying a sample workload to an execution environment;
evaluating performance of the execution environment executing the workload at a plurality of permutations of platform settings; and
establishing one of the permutations of settings for use with future workloads based at least in part on the evaluation.
19. The non-transitory computer-readable medium of claim 18, wherein the instructions cause the processor to perform a set of operations further comprising:
compiling a benefit selection list for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations.
20. The non-transitory computer-readable medium of claim 18, wherein the instructions cause the processor to perform a set of operations further comprising:
accepting a user-defined set of criteria that dictate an optimal platform setting permutation;
automatically setting the platform settings to the optimal permutation.
21. The non-transitory computer-readable medium of claim 20, wherein the user-defined set of criteria include a ratio of power savings to performance impact.
22. The non-transitory computer-readable medium of claim 18, wherein evaluating causes the processor to perform a set of operations comprising:
comparing an effect of a permutation against a user-defined criterion; and
rejecting a permutation not satisfying the criterion for the workload.
23. A system with granular power performance control, the system comprising:
a plurality of platforms to execute tasks;
means for determining an optimal permutation of platform settings from a plurality of permutations of platform settings; and
means for applying the optimal permutation of platform settings to each platform in the plurality.
24. The system of claim 23, wherein the means for determining comprises:
means for causing execution of a workload on one platform of the plurality under one permutation of the platform settings; and
means for comparing a metric generated responsive to the execution with a user-defined value of a desired metric.
25. The system of claim 23, further comprising:
means for compiling a list of setting permutations associated with the metric generated responsive to executing a workload on the platforms.
PCT/US2017/013440 2016-04-01 2017-01-13 Method and apparatus to optimize power settings for a workload WO2017171973A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201641011558 2016-04-01
IN201641011558 2016-04-01

Publications (1)

Publication Number Publication Date
WO2017171973A1 (en) 2017-10-05

Family

ID=59965067

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/013440 WO2017171973A1 (en) 2016-04-01 2017-01-13 Method and apparatus to optimize power settings for a workload

Country Status (1)

Country Link
WO (1) WO2017171973A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110282982A1 (en) * 2010-05-13 2011-11-17 Microsoft Corporation Dynamic application placement based on cost and availability of energy in datacenters
US20120151490A1 (en) * 2010-12-10 2012-06-14 Nec Laboratories America, Inc. System positioning services in data centers
US20130179706A1 (en) * 2011-12-15 2013-07-11 Krishnakanth V. Sistla User Level Control Of Power Management Policies
US20130261826A1 (en) * 2010-02-26 2013-10-03 International Business Machines Corporation Optimizing power consumption by dynamic workload adjustment
US9292060B1 (en) * 2012-06-28 2016-03-22 Amazon Technologies, Inc. Allowing clients to limited control on power consumed by the cloud while executing the client's tasks

Legal Events

Date Code Title Description
NENP Non-entry into the national phase (Ref country code: DE)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17776032; Country of ref document: EP; Kind code of ref document: A1)
122 Ep: pct application non-entry in european phase (Ref document number: 17776032; Country of ref document: EP; Kind code of ref document: A1)