WO2011011668A1 - Determining performance sensitivities of computational units - Google Patents

Determining performance sensitivities of computational units

Info

Publication number
WO2011011668A1
Authority
WO
WIPO (PCT)
Prior art keywords
performance
computational units
computer system
recited
computational
Prior art date
Application number
PCT/US2010/043029
Other languages
English (en)
French (fr)
Inventor
Sebastien Nussbaum
Alexander Branover
John Kalamatianos
Original Assignee
Advanced Micro Devices, Inc.
Priority date
Filing date
Publication date
Priority claimed from US12/508,935 external-priority patent/US8443209B2/en
Priority claimed from US12/508,902 external-priority patent/US20110022356A1/en
Priority claimed from US12/508,929 external-priority patent/US8447994B2/en
Application filed by Advanced Micro Devices, Inc. filed Critical Advanced Micro Devices, Inc.
Publication of WO2011011668A1 publication Critical patent/WO2011011668A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/324Power saving characterised by the action undertaken by lowering clock frequency
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/3296Power saving characterised by the action undertaken by lowering the supply or operating voltage
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This invention relates to power allocation in computer systems and more particularly to allocating power to improve performance.
  • Processors run at various performance levels in an effort to match power consumption to work load requirements.
  • The performance levels are typically determined by operating system software guided by utilization of the computational units, although in some embodiments they may be specified by hardware.
  • an embodiment enables analysis of workload executed on computational units to help identify those computational units that are more performance sensitive to a change in performance capability caused by, e.g., a frequency change.
  • In one embodiment, a method includes determining respective performance sensitivities to a change in performance capability of each of a plurality of computational units of a computer system based on measured performance metrics for each of the computational units.
  • the performance sensitivities may be determined by determining respective first performance metrics of the computational units associated with a first performance level and determining respective second performance metrics of the computational units associated with a second performance level.
  • the performance sensitivities of the respective computational units are determined based on the respective first and second performance metrics, e.g., as a difference between the first and second utilization metrics.
  • the performance sensitivity of each of the computational units is continually updated in response to, e.g., a process context change of a computational unit or a predetermined period of time elapsing since the last performance sensitivity was determined.
  • In another embodiment, an integrated circuit includes a plurality of computational units and control logic configured to determine respective performance sensitivities for each of the plurality of computational units.
  • the control logic applies clock signals at respective first frequencies to respective ones of the computational units during respective first evaluation periods and determines respective first utilization metrics.
  • the control logic applies the respective clock signals at respective second frequencies to the respective ones of the computational units during respective second evaluation periods and determines respective second utilization metrics.
  • the respective performance sensitivities for the plurality of computational units may be determined based on the respective first and second utilization metrics.
  • the integrated circuit determines boost sensitivity of a computational unit in response to a process context change associated with the computational unit.
  • the integrated circuit may determine performance sensitivity of a computational unit in response to passage of a predetermined period of time since a performance sensitivity was previously determined for the computational unit.
  • In still another embodiment, a computer system includes a plurality of computational units.
  • The computer system is configured to determine respective performance sensitivities to a change in performance capability for each of the computational units.
  • the computer system is configured to determine the performance sensitivity for at least one of the computational units in response to at least one of a context change of the one computational unit and an elapsed period of time since a previous performance sensitivity determination was made for the one computational unit.
  • the computer system further includes storage locations and is configured to store an identification of the respective computational units, respective process contexts associated with the respective computational units, and the respective performance sensitivities of the respective computational units.
  • the computational units may comprise one or more of processing cores, a graphical processing unit and a memory controller.
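  • As a concrete (and purely illustrative) picture of the storage described above, the sketch below models one table entry per computational unit, holding the unit identification, the process context observed during training, and the measured sensitivity; the names UnitKind, SensitivityEntry, and upsert are assumptions made for this sketch, not terms from the embodiments.

```python
# Minimal sketch (not the patent's implementation) of a per-unit sensitivity
# table: each entry keys a computational unit to the process context that was
# running during training and to the measured sensitivity.
from dataclasses import dataclass
from enum import Enum, auto


class UnitKind(Enum):
    CPU_CORE = auto()
    GPU = auto()
    MEMORY_CONTROLLER = auto()


@dataclass
class SensitivityEntry:
    unit_id: int        # e.g., core number
    kind: UnitKind
    context_id: int     # e.g., CR3 value or a hash of it
    sensitivity: float  # e.g., IPS(high freq) - IPS(low freq)


# The table itself can be a simple list kept sorted by sensitivity,
# highest first, so the most boost-worthy units are found quickly.
table: list[SensitivityEntry] = []


def upsert(entry: SensitivityEntry) -> None:
    """Replace any stale entry for the same unit, then keep the table sorted."""
    table[:] = [e for e in table
                if e.unit_id != entry.unit_id or e.kind != entry.kind]
    table.append(entry)
    table.sort(key=lambda e: e.sensitivity, reverse=True)
```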
  • the computer system is further configured to apply respective clock signals at respective first frequencies to respective ones of the computational units during respective first evaluation periods and determine the respective first performance metrics.
  • the computer system applies the respective clock signals at respective second frequencies to the respective ones of the computational units during respective second evaluation periods and determines the respective second performance metrics.
  • the computer system determines the respective performance sensitivities for the plurality of computational units based on the respective first and second performance metrics.
  • the computer system is configured to identify respective process contexts associated with respective ones of computational units during determination of the respective first and second performance metrics.
  • Fig. 1 shows a high-level block diagram of an exemplary System on a Chip (SOC) system according to an embodiment of the invention.
  • Fig. 2 illustrates a high-level flow diagram for profiling performance sensitivity to core frequency changes according to one embodiment of the invention.
  • Fig. 3 illustrates frequency training at a system block diagram level.
  • Fig. 4 illustrates additional aspects of frequency training.
  • Fig. 5 illustrates an exemplary flow diagram of power reallocation according to an embodiment of the invention.
  • Fig. 6 illustrates an exemplary flow diagram for throttling computational units according to frequency sensitivity.
  • a core in P0, the highest performance state set by the operating system (OS)
  • active cores or other computational units may be gaining little or no performance increase from a higher core frequency, while other cores or computational units may be running workloads with a higher sensitivity to an increase in core frequency.
  • Fig. 1 shows a high-level view of an exemplary System on a Chip (SOC) 100 incorporating an embodiment of the invention.
  • the SOC 100 includes multiple CPU processing cores 101, a GPU (Graphics Processing Unit) 103, an I/O Bridge 105 (named South-Bridge in some embodiments) and a North-Bridge 107 (which may be combined with the Memory Controller in some embodiments).
  • the power allocation controller 109 is the functional element that controls allocation of the Thermal Design Point (TDP) power headroom to the on-die or on-platform components.
  • The performance analysis control logic 111 analyzes performance sensitivity of the cores and other computational units as described further herein. Note that while the power allocation controller 109 and performance analysis control logic 111 are shown as being part of the North-Bridge 107, in other embodiments they may be located elsewhere in the SOC 100.
  • a TDP (Thermal Design Point) represents the power that can be consumed by the entire SOC and depends on such factors as the form-factor, available cooling solution, AC adapter/battery, and voltage regulator.
  • the SOC performance is optimized within the current TDP and in an embodiment, the power limit corresponding to the TDP is never exceeded.
  • the SOC power limit is the SOC TDP Limit.
  • SOC characterization is typically based on allocating maximum power for each of the on-die components while staying within the SOC TDP Limit. That occurs by setting the highest operational point (in frequency (F) and voltage (V)) so that even maximally anticipated activity executed at this operational point will not cause the power to exceed the allocated envelope. For example, assume that maximum power of a 4-Core SOC is limited by a 40w TDP envelope. Table 1 itemizes the power budget allocated for each of the on-die components:
  • The 8w power budget is a limit that defines the highest nominal operational point (F, V) of the core and the 5w power budget does the same for the GPU. That allocation, however, is conservative and only a nominal maximum since it assumes simultaneous utilization of all on-die components.
  • Most real-world applications are either CPU-bounded or GPU-bounded. Even if an application engages both computing engines (e.g., video playback off-loads some tasks to the processor cores), it does not utilize all 4 processor cores. Even CPU-bounded client applications mostly utilize 1-2 processor cores (1-2 thread workloads) and only a few of them have sufficient parallelism for utilizing all 4 cores for long periods of time.
  • An embodiment provides reallocation of power from idle or less active components to the busy components, so that the busy components are allocated more power. For example, in a workload sample where 2 out of 4 cores are idle and the GPU operates at half power, the power budget table reflecting this state is shown in Table 2:
  • Core0 and Core1 are allocated 16.75w each to improve overall CPU throughput.
  • The operational point (F, V) of both cores may be increased to fill the new power headroom (16.75w instead of 8w).
  • Alternatively, the power budget of only one core can be increased to 25.5w, while the other core is left with an 8w power budget.
  • the core with the increased power budget may be boosted to an even higher operational point (F,V), so that the new power headroom (25.5w) can be exploited.
  • The decision whether to boost two cores equally or to provide all available power headroom to one core depends on which option best improves the overall SOC performance. (Both options distribute the same 33.5w core budget: 2 x 16.75w = 33.5w = 25.5w + 8w.)
  • One way to determine how to allocate power between Core0 and Core1 to achieve an improved performance gain is to know which of the two cores, if any, can better exploit an increase in performance capability provided, e.g., by an increase in frequency.
  • Changes in performance capability may also be provided by, e.g., a change in the amount of cache available to a core, the number of pipelines operating in the core and/or the instruction fetch rate.
  • Performance sensitivity of each computational unit to a frequency change and/or other change in performance capability, also referred to herein as boost sensitivity, is determined and stored on a per-computational-unit basis.
  • A pre-defined low frequency clock signal is applied to the CPU core being analyzed for a predetermined or programmable interval, e.g., a 100us-10ms interval.
  • the hardware performance analysis control logic samples and averages core instructions per cycle (IPC) (as reported by the core).
  • the performance analysis control logic determines a first instructions per second (IPS) metric based on the IPC x Core frequency (the low frequency or first performance level) as the first performance metric.
  • the IPS metric may be stored in a temporary register "A".
  • the performance analysis control logic causes a pre-defined high frequency clock signal to be applied to the CPU core being analyzed for the same predetermined or programmable time interval.
  • the performance analysis control logic again samples and averages core IPC (as reported by the core) in 207.
  • the performance analysis control logic determines a second instructions per second (IPS) metric based on the IPC x Core frequency (the high frequency or second performance level) and stores the second IPS metric in a temporary register "B" as the second performance metric.
  • the performance analysis control logic determines the numerical difference between A and B in 209 and stores the result as the performance sensitivity in a performance or boost sensitivity table along with the core number being analyzed and the process context number running on the CPU core during the analysis. Note that other changes in performance capability may be utilized instead of, or in conjunction with, frequency changes to determine boost sensitivity.
  • the context number may be determined by the content of the CR3 register or a hash of the CR3 register to allow for a shorter number to be stored.
  • This numerical difference represents the boost sensitivity for the core. That is, it represents the sensitivity of the core, running that particular process context, to a change in frequency. The greater the sensitivity, the more performance increase is to be gained by increasing the frequency.
  • the same training shown in Fig. 2 is applied to each of the processor cores and to any other component that can be boosted (over-clocked) above its nominal maximum power value and the values are stored in the boost sensitivity table.
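  • As a rough illustration of the training flow of Fig. 2 described above, the sketch below measures average IPC at a low and at a high clock frequency, converts each to instructions per second, and records the difference as the boost sensitivity for the core and its current process context. The helpers apply_clock and sample_average_ipc stand in for the clock generator and the per-core IPC counters; they, and the exact evaluation interval, are assumptions, not real interfaces.

```python
# Hedged sketch of the frequency-sensitivity training of Fig. 2.
# apply_clock() and sample_average_ipc() stand in for the clock generator and
# the per-core IPC counters; they are not real APIs.
import time


def measure_ips(core: int, freq_hz: float, interval_s: float,
                apply_clock, sample_average_ipc) -> float:
    """Run the core at freq_hz for interval_s and return instructions per second."""
    apply_clock(core, freq_hz)
    time.sleep(interval_s)              # evaluation period (e.g., 100us-10ms)
    avg_ipc = sample_average_ipc(core)  # averaged IPC as reported by the core
    return avg_ipc * freq_hz            # IPS = IPC x core frequency


def train_boost_sensitivity(core: int, low_hz: float, high_hz: float,
                            interval_s: float, apply_clock,
                            sample_average_ipc, context_id: int) -> dict:
    ips_low = measure_ips(core, low_hz, interval_s,
                          apply_clock, sample_average_ipc)    # register "A"
    ips_high = measure_ips(core, high_hz, interval_s,
                           apply_clock, sample_average_ipc)   # register "B"
    return {
        "core": core,
        "context": context_id,              # e.g., CR3 value or a hash of it
        "sensitivity": ips_high - ips_low,  # boost sensitivity stored in the table
    }
```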
  • the values in the boost sensitivity table may be sorted in descending order starting with the core or other on-die component with the highest boost sensitivity.
  • frequency sensitivity training is applied to all computational units whose frequency can be changed to implement various performance states, regardless of whether they can be clocked (or overclocked) above a nominal power level. In that way, systems can still allocate power budget to cores (or other computational units) that are more sensitive to frequency change and away from cores that are less sensitive to a change in frequency. In that way, cores or other computational units may have their frequency reduced to save power without a significant performance decrease for the SOC.
  • Fig. 3 illustrates frequency training at a system block diagram level. Training of core 301 is representative of the frequency training for each core.
  • The clock generator 303, as controlled by the performance analysis control logic 111, supplies the high and low frequency clock signals to core 301 during the frequency training period.
  • the core 301 supplies the instructions per cycle value to the performance analysis control logic 111, which controls the process in accordance with Fig. 2.
  • Fig. 4 illustrates additional aspects of frequency training, including the instructions per cycle (IPC) values reported by the cores and the boost sensitivity table 407.
  • Boost sensitivity table 407 stores, for each core, an identification of the core, the associated process context, and the boost sensitivity expressed, e.g., as Instructions Per Second (IPS) computed via Average IPC x Core Frequency.
  • The boost sensitivity table may be stored in storage within the SOC 100 (Fig. 1) or elsewhere in the computer system.
  • the boost sensitivity for each core is tied to the current processor context, which can be approximated by the x86 register value of CR3, tracked by the North-Bridge. In one embodiment, when the context changes, the sensitivity is re-evaluated.
  • The boost sensitivity expires for each context based on a fixed or programmable timer (e.g., after 1-100ms). In still other embodiments, both a timer and a context switch are used, whichever occurs first, to initiate the boost sensitivity reevaluation.
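  • A minimal sketch of the re-evaluation policy just described: a stored sensitivity is treated as stale when the process context changes or when a fixed or programmable timer expires, whichever occurs first. The entry fields and the 50 ms default are illustrative assumptions.

```python
# Sketch of the staleness check for a stored boost sensitivity: re-evaluate on
# a context switch or after a timeout (e.g., 1-100 ms), whichever occurs first.
import time


def needs_reevaluation(entry: dict, current_context: int,
                       timeout_s: float = 0.05) -> bool:
    context_changed = entry["context"] != current_context
    expired = (time.monotonic() - entry["measured_at"]) > timeout_s
    return context_changed or expired
```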
  • The flow of Fig. 2 may be implemented in hardware (e.g., state machines in performance analysis control block 111), in firmware (in microcode or a microcontroller), or in software (e.g., a driver, BIOS routine or higher level software).
  • Software may be responsible for initiating the low and high frequency clock signals, receiving the IPC values, averaging the IPC values, and performing the other functions described in relation to Fig. 2.
  • The software may be stored in computer readable electronic, optical, magnetic, or other kinds of volatile or non-volatile memory in the computer system of Fig. 1 and executed by one or more of the cores.
  • The frequency sensitivity training illustrated in Fig. 2 may also be implemented partly in hardware and partly in software. For example, software may be responsible for maintaining the boost sensitivity table, reading the CR3 register to determine the process context, and maintaining software timers to re-determine boost sensitivity, while the hardware, when notified by the software, applies the clocks at the first and second frequencies for the appropriate time period and determines the average IPC.
  • The software may be responsible for determining the IPS values.
  • Power Budget Reallocation
  • the Boost Sensitivity Table (BST) is maintained as a result of a frequency sensitivity training session for the components to be potentially boosted.
  • a frequency sensitivity table is maintained as a result of the frequency sensitivity training for all components whose performance can be adjusted, typically through adjusting frequency (and voltage if necessary).
  • power budget reallocation uses the information in the BST to decide which on-die component(s) are the most sensitive to boosting and thus "deserve" to get a higher TDP power margin reallocated when a reallocation takes place.
  • a particular processor core may be in one of N performance states.
  • a performance state is characterized by a unique pair of core voltage and frequency values. The highest performance state is typically selected and characterized so that any anticipated activity will not cause the core power (dynamic + static) to exceed the power budget allocated for the core.
  • The core performance state is defined by the operating system software guided by current core utilization. In other embodiments, the core performance state may be specified by hardware, based on the context currently executed by the core. Table 3 shows performance states for an exemplary system having four performance states (P0, P1, P2, and P3) that the operating system (OS) (or any other high-level software) may utilize for each core, depending on the core utilization over a time-interval.
  • The time-interval in one exemplary operating system ranges from 1msec to 100msec.
  • Two idle states are used when the OS (or any other high-level SW) sets the core to a low C-state.
  • a C-state is a core power state.
  • the core may be placed either in an IDLE state (when it is expected to be idle for a short time) or in a deep C-state.
  • The highest operational point (P-boost) is the one at which core power (CoreBoostPwr) exceeds the nominal maximal power budget allocated for that specific core.
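  • To make the state structure above concrete, the following is a hedged sketch of a per-core performance-state table with a boost point above the nominal P0; the voltage and frequency numbers are invented placeholders, not values from Table 3.

```python
# Illustrative P-state table for one core: nominal states P0-P3 set by the OS
# plus a P-boost point above the nominal power budget. All numbers are
# invented placeholders, not values from the patent.
P_STATES = {
    "P-boost": {"freq_mhz": 3600, "voltage_v": 1.25},  # exceeds nominal core budget
    "P0":      {"freq_mhz": 3000, "voltage_v": 1.10},  # highest nominal state
    "P1":      {"freq_mhz": 2400, "voltage_v": 1.00},
    "P2":      {"freq_mhz": 1800, "voltage_v": 0.95},
    "P3":      {"freq_mhz": 1200, "voltage_v": 0.90},
}
```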
  • the GPU Power state is traditionally controlled by software (the graphics driver). In other embodiments, it may also be controlled by hardware tracking the GPU activity and receiving information from other graphic-related engines (Unified Video Decoder (UVD), Display, etc.). In one exemplary embodiment, the GPU may be in one of four power states, as shown in Table 4.
  • only two on-die components may be boosted to a higher performance point.
  • the I/O module and the memory controller may contribute to the boosting process of the cores or the GPU by reallocating their "unused" power budget to these components, but they cannot be boosted themselves.
  • the memory controller may be boosted as well by transitioning the Dynamic Random Access Memory (DRAM) and its own frequency to a higher operational point.
  • One embodiment that allocates power efficiently to computational units is predicated on continuously tracking the available power headroom, or TDP power margin.
  • SOC TDP Margin = SOC TDP Limit - (sum of the power currently consumed by the on-die components)
  • any change in the state of the on-die components triggers an update of the SOC TDP Margin value.
  • the change of state that triggers the update is a change in performance or power state or change in application/workload activity.
  • the change of state triggering the update may be a process context change, or either a process context change or a performance state change.
  • any event resulting in a change in power consumed by the component can function as the change of state triggering event.
  • the power of a particular computational unit is based on the frequency of the clock signal, the supply voltage, and the amount of activity in the computational unit.
  • the average workload activity can be calculated as an average number of signal toggles across the computational unit over the interval, or average IPC over the interval. Power calculations may utilize software methods as well in which the software (e.g., a driver) is aware of the application activity running in the computational unit and determines average power using a similar approach to that described above.
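  • One conventional way to turn those inputs (clock frequency, supply voltage, and activity) into a power estimate is the textbook dynamic-power relation P_dyn is approximately activity x C x V^2 x f, plus a static term; the sketch below uses that form together with the margin formula given earlier. The capacitance and static-power parameters are illustrative assumptions, not values from the embodiments.

```python
# Rough per-unit power estimate from clock frequency, supply voltage and
# activity, using the conventional P_dyn ~ activity * C * V^2 * f relation.
# effective_capacitance_f and static_power_w are illustrative placeholders.
def estimate_unit_power(freq_hz: float, voltage_v: float, activity: float,
                        effective_capacitance_f: float,
                        static_power_w: float) -> float:
    dynamic_w = activity * effective_capacitance_f * voltage_v ** 2 * freq_hz
    return dynamic_w + static_power_w


def soc_tdp_margin(tdp_limit_w: float, unit_powers_w: list[float]) -> float:
    # SOC TDP Margin = SOC TDP Limit - sum of power consumed by on-die components
    return tdp_limit_w - sum(unit_powers_w)
```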
  • Only a core residing in the P0-state and the GPU residing in the GPU P0-state can be reallocated power from the other on-die components and boosted to a higher performance point. That is based on the observation that a core being in the P0-state or a GPU being in the GPU P0-state is essentially a hint (provided by the OS or some high-level SW such as the graphics driver) that the currently executed task is computationally bounded.
  • the core and/or the GPU may be boosted when they reside in other non-idle states.
  • Fig. 5 illustrates an exemplary flow diagram of operation of an embodiment of the power allocation controller 109 (Fig. 1) to allocate power.
  • the power allocation controller waits for a state change for any of the on-die components, e.g., a performance state, application/activity change, or process context change.
  • When a state change occurs, the TDP SOC Margin is tracked in 503 and a determination is made in 505 whether the margin is greater than 0. If it is not, the flow goes to 501. If the margin is greater than zero, meaning that there is headroom to boost one or more cores, a check is made in 507 to see whether any CPU core is in the P0 state. In this particular embodiment, only cores in P0 can be boosted.
  • In 509, the New TDP SOC Margin is calculated as the predicted margin value assuming all cores in P0 are boosted: New TDP SOC Margin = TDP SOC Margin - sum over all P0 cores of (CoreBoostPwr - Core Pwr), where TDP SOC Margin is the current margin value, CoreBoostPwr is the core power when boosted, and Core Pwr is the current core power in the P0 state.
  • The power allocation controller checks in 511 if that new margin is greater than zero. If so, there is sufficient headroom to boost all P0 cores, and that is done in 515 and the TDP SOC Margin is updated. The flow then returns to 501 to await another state change.
  • If there is not sufficient headroom, the flow goes to 517 to find some margin if possible.
  • Those cores with the highest sensitivity are identified. That may be done, e.g., by accessing the boost sensitivity table provided by the boost sensitivity training discussed above.
  • The cores in the P0 state are ordered, e.g., in decreasing order of boost sensitivity. Thus, those at the bottom are least sensitive to a frequency increase.
  • the power allocation controller removes a core with the lowest boost sensitivity from the list and re-calculates the New TDP SOC Margin as in 509 for all cores still on the list.
  • all cores having a boost sensitivity below a predetermined or programmable threshold are removed from the list at the same time.
  • the rationale for that is to not waste power by boosting cores whose performance will not be increased.
  • Once the New TDP SOC Margin is > 0, those P0 cores still on the list are transitioned to P-boost and the TDP SOC Margin is updated.
  • The power allocation controller checks to see if the GPU is in the GPU P0 state. If not, the flow returns to 501 to await a state change.
  • the power allocation controller determines if there is sufficient headroom to boost the GPU in 525 by calculating a New TDP SOC Margin by subtracting the difference between boosted and current power for the GPU from the current TDP SOC Margin. In 527, the power allocation controller checks to see if the new margin is greater than zero, and if so, transitions the GPU to its boosted state and updates the TDP SOC Margin and returns to 503 to await another state change in any of the components. If there is not sufficient margin, the flow returns to 503.
  • the frequency boost is only provided, e.g., to those computational units with a sufficiently high boost sensitivity, e.g., above a predetermined or programmable threshold, to warrant the extra power. In that way, increased performance can be provided while still trying to maintain reduced power consumption where possible.
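  • The decision flow of Fig. 5 described above can be summarized in a short sketch: attempt to boost every P0 core, and if the predicted margin would go negative, drop the least sensitive cores (or any below a sensitivity threshold) until the boosted set fits, then consider the GPU. The data shapes and helper names are assumptions made for illustration, not the controller's actual interfaces.

```python
# Hedged sketch of the Fig. 5 reallocation policy. Each candidate core is a
# dict carrying its boost sensitivity and its current / boosted power; the
# real controller is hardware/firmware, so this only mirrors the decision logic.
def plan_boost(tdp_margin_w: float, p0_cores: list[dict],
               gpu: dict | None, sensitivity_threshold: float = 0.0):
    if tdp_margin_w <= 0:
        return [], False, tdp_margin_w

    # Keep only cores worth boosting, most sensitive first.
    candidates = sorted(
        (c for c in p0_cores if c["sensitivity"] > sensitivity_threshold),
        key=lambda c: c["sensitivity"], reverse=True)

    def margin_after(cores):
        extra = sum(c["boost_power_w"] - c["current_power_w"] for c in cores)
        return tdp_margin_w - extra

    # Drop the least sensitive core until the boosted set fits the margin.
    while candidates and margin_after(candidates) <= 0:
        candidates.pop()

    new_margin = margin_after(candidates)

    # If the GPU is in its GPU P0 state, boost it too when margin remains.
    boost_gpu = False
    if gpu is not None and gpu.get("in_p0", False):
        gpu_extra = gpu["boost_power_w"] - gpu["current_power_w"]
        if new_margin - gpu_extra > 0:
            boost_gpu = True
            new_margin -= gpu_extra

    return candidates, boost_gpu, new_margin
```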
  • the functionality in Fig. 5 may be implemented in hardware (e.g., state machines), in firmware (in microcode or a microcontroller), or in software (e.g., a driver, BIOS routine or higher level software), or any appropriate combination of hardware and software to allocate power based on boost sensitivity.
  • In software implementations of Fig. 5, the software may be notified of a change in state of any component and implement the approach described in relation to Fig. 5.
  • the software may be stored in computer readable electronic, optical, or magnetic volatile or nonvolatile memory in the computer system of Fig. 1 and executed by one or more of the cores.
  • the functionality of Fig. 5 is implemented partly in hardware and partly in software according to the needs and capabilities of the particular system.
  • boost sensitivity information can be utilized in various ways by the SOC.
  • Central processing unit (CPU) throttling is one example of such utilization. Assume a GPU-bounded application is being executed. That is, the application being executed on the GPU is limited by the performance of the GPU, because, e.g., a current performance state is lower than needed for the particular application. In that case, the CPU cores may be throttled so that the freed power headroom can be reallocated to the GPU.
  • A GPU-bounded or CPU-bounded application is identified based on data indicating how busy a particular core or the GPU is.
  • A CPU-bounded (or compute-bounded) application is an application limited by the performance of one or more processing cores.
  • a GPU-bounded application is an application limited by performance of the GPU.
  • Fig. 6 shows a high level flow diagram of performance throttling based on boost sensitivity information. In 601, CPU-bounded or GPU-bounded applications are identified.
  • The stored boost or performance sensitivity information is reviewed and, in 605, a subset of the computational units, e.g., processing cores, is identified for throttling on the basis that the subset is less sensitive, in terms of performance, to a reduction in performance capability, e.g., a reduction in frequency, voltage, the amount of cache available to a core, the number of pipelines operating in the core, and/or the instruction fetch rate.
  • the performance of the subset is limited and the power headroom made available through throttling is provided in 609 to the computational unit(s) executing the CPU-bounded and/or GPU-bounded application.
  • the functionality described in Fig. 6 may be implemented in the power allocation controller 109 or in high-level software or utilizing both hardware and software.
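  • Following the Fig. 6 steps above, the sketch below selects as throttling candidates the cores least sensitive to a reduction in performance capability and reports the headroom they free up for the bounded computational unit; the field names and power figures are illustrative assumptions.

```python
# Sketch of the Fig. 6 selection: when an application is GPU-bounded (or
# CPU-bounded on other cores), throttle the least-sensitive cores and hand
# the freed budget to the bounded computational unit.
def select_throttle_set(cores: list[dict],
                        max_to_throttle: int) -> tuple[list[dict], float]:
    # Least sensitive to a reduction in performance capability first.
    by_sensitivity = sorted(cores, key=lambda c: c["sensitivity"])
    victims = by_sensitivity[:max_to_throttle]
    freed_w = sum(c["current_power_w"] - c["throttled_power_w"] for c in victims)
    return victims, freed_w
```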
  • The GPU may be throttled by either forcing a GPU P-state limit lower than GPU_Pwr0 or by throttling its instruction/memory traffic stream. If the throttled GPU power is equivalent to GPU_Pwr2, then the extra power margin, GPU_Pwr0 - GPU_Pwr2, can be reallocated for boosting one or more of the CPU cores, depending on the boost sensitivity table values.
  • memory may also be throttled.
  • One way is to stall every other access to DRAM by a number of cycles, thus reducing the dynamic part of DRAM I/O and DRAM DIMM power by a factor close to 2.
  • Another approach may involve shutting down a number of the available memory channels, also releasing a given percentage of the DRAM I/O and DRAM DIMM power.
  • Reduced DRAM I/O power may be reallocated to either the GPU or CPU cores depending on the utilization of these components and the BST values (as far as the CPU cores are concerned), thus leading to higher overall SOC performance throughput.
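  • As a back-of-the-envelope illustration of the two memory throttling options above (stalling alternate DRAM accesses roughly halves the dynamic DRAM I/O and DIMM power; disabling channels removes a proportional share), the sketch below estimates the remaining DRAM power; the split between static and dynamic power is an assumption.

```python
# Rough estimate of DRAM I/O + DIMM power after the two throttling options
# described above. The static/dynamic split is an illustrative assumption.
def dram_power_after_throttle(dynamic_w: float, static_w: float,
                              stall_alternate_accesses: bool = False,
                              channels_total: int = 2,
                              channels_disabled: int = 0) -> float:
    dyn = dynamic_w
    if stall_alternate_accesses:
        dyn /= 2.0   # roughly halves the dynamic I/O and DIMM power
    if channels_disabled:
        dyn *= (channels_total - channels_disabled) / channels_total
    return dyn + static_w
```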
  • The DRAM DIMM may not be part of the SOC, in which case its power budget is not part of the SOC TDP. However, in circumstances where the reduced DRAM DIMM power margin can be reallocated back to the SOC TDP, the extra margin can be used to boost the GPU or some of the CPU cores.
  • While circuits and physical structures are generally presumed for some embodiments of the invention, it is well recognized that in modern semiconductor design and fabrication, physical structures and circuits may be embodied in computer-readable descriptive form suitable for use in subsequent design, test or fabrication stages. Structures and functionality presented as discrete components in the exemplary configurations may be implemented as a combined structure or component.
  • A computer-readable medium includes at least a disk, tape, or other magnetic, optical, semiconductor (e.g., flash memory cards, ROM), or electronic medium.
  • While in some embodiments the computational units may be part of a multi-core processor, in other embodiments the computational units are in separate integrated circuits that may be packaged together or separately.
  • a graphical processing unit (GPU) and processor may be separate integrated circuits packaged together or separately. Variations and modifications of the embodiments disclosed herein may be made based on the description set forth herein without departing from the scope of the invention as set forth in the following claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)
PCT/US2010/043029 2009-07-24 2010-07-23 Determining performance sensitivities of computational units WO2011011668A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US12/508,935 US8443209B2 (en) 2009-07-24 2009-07-24 Throttling computational units according to performance sensitivity
US12/508,929 2009-07-24
US12/508,902 2009-07-24
US12/508,902 US20110022356A1 (en) 2009-07-24 2009-07-24 Determining performance sensitivities of computational units
US12/508,929 US8447994B2 (en) 2009-07-24 2009-07-24 Altering performance of computational units heterogeneously according to performance sensitivity
US12/508,935 2009-07-24

Publications (1)

Publication Number Publication Date
WO2011011668A1 true WO2011011668A1 (en) 2011-01-27

Family

ID=42953822

Family Applications (3)

Application Number Title Priority Date Filing Date
PCT/US2010/043032 WO2011011670A1 (en) 2009-07-24 2010-07-23 Altering performance of computational units heterogeneously according to performance sensitivity
PCT/US2010/043035 WO2011011673A1 (en) 2009-07-24 2010-07-23 Throttling computational units according to performance sensitivity
PCT/US2010/043029 WO2011011668A1 (en) 2009-07-24 2010-07-23 Determining performance sensitivities of computational units

Family Applications Before (2)

Application Number Title Priority Date Filing Date
PCT/US2010/043032 WO2011011670A1 (en) 2009-07-24 2010-07-23 Altering performance of computational units heterogeneously according to performance sensitivity
PCT/US2010/043035 WO2011011673A1 (en) 2009-07-24 2010-07-23 Throttling computational units according to performance sensitivity

Country Status (6)

Country Link
EP (1) EP2457139A1 (ja)
JP (1) JP5564564B2 (ja)
KR (1) KR20120046232A (ja)
CN (1) CN102483646B (ja)
IN (1) IN2012DN00933A (ja)
WO (3) WO2011011670A1 (ja)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012158728A3 (en) * 2011-05-16 2013-03-14 Advanced Micro Devices, Inc. Adjusting the clock frequency of a processing unit in real-time based on a frequency sensitivity value
US9959919B2 (en) 2015-03-20 2018-05-01 Kabushiki Kaisha Toshiba Memory system including non-volatile memory of which access speed is electrically controlled
CN108139960A (zh) * 2016-04-18 2018-06-08 华为技术有限公司 中央处理器cpu的调频方法、调频装置和处理设备
WO2019245558A1 (en) * 2018-06-21 2019-12-26 Hewlett-Packard Development Company, L.P. Increasing cpu clock speed to improve system performance
WO2021021185A1 (en) * 2019-07-31 2021-02-04 Hewlett-Packard Development Company, L.P. Configuring power level of central processing units at boot time

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5601236B2 (ja) * 2011-02-10 2014-10-08 富士通株式会社 情報抽出プログラム、情報抽出方法、および情報抽出装置
JP5958395B2 (ja) * 2013-03-22 2016-08-02 日本電気株式会社 コンピュータシステム
US9703613B2 (en) * 2013-12-20 2017-07-11 Qualcomm Incorporated Multi-core dynamic workload management using native and dynamic parameters
US9348380B2 (en) * 2013-12-28 2016-05-24 Samsung Electronics Co., Ltd. Dynamic thermal budget allocation for memory array
JP5986138B2 (ja) * 2014-05-09 2016-09-06 レノボ・シンガポール・プライベート・リミテッド 複数のプロセッサに電力を供給する電源装置の出力を制御する方法、電源システムおよび情報処理装置
US20160077576A1 (en) * 2014-09-17 2016-03-17 Abhinav R. Karhu Technologies for collaborative hardware and software scenario-based power management
US9882383B2 (en) * 2014-12-23 2018-01-30 Intel Corporation Smart power delivery network
US9572104B2 (en) 2015-02-25 2017-02-14 Microsoft Technology Licensing, Llc Dynamic adjustment of user experience based on system capabilities
US10474221B2 (en) * 2018-01-30 2019-11-12 Hewlett Packard Enterprise Development Lp Power control in a storage subsystem
CN110442224A (zh) * 2019-09-17 2019-11-12 联想(北京)有限公司 电子设备的电源功率分配方法、电子设备和可读存储介质
KR102103842B1 (ko) * 2019-10-02 2020-05-29 한화시스템 주식회사 차세대 함정 전투체계의 트래픽 모델링 장치
CN114816033A (zh) * 2019-10-17 2022-07-29 华为技术有限公司 处理器的调频方法及装置、计算设备
KR102275529B1 (ko) * 2019-12-23 2021-07-09 주식회사 텔레칩스 멀티-마스터를 지원하는 그래픽 처리 장치를 공유하는 시스템 온 칩 및 그래픽 처리 장치의 동작 방법
CN113078933B (zh) * 2020-01-03 2023-01-24 内蒙古龙图电气有限公司 一种基于北斗卫星通信的组网式终端控制器
US11971774B2 (en) * 2020-10-13 2024-04-30 Nvidia Corporation Programmable power balancing in a datacenter
CN116521355A (zh) * 2022-01-30 2023-08-01 台达电子企业管理(上海)有限公司 提升处理器峰值算力的方法、用于提升处理器峰值算力的系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020087904A1 (en) * 2000-12-28 2002-07-04 Zhong-Ning (George) Cai Method and apparatus for thermal sensitivity based dynamic power control
WO2007056705A2 (en) * 2005-11-03 2007-05-18 Los Alamos National Security Adaptive real-time methodology for optimizing energy-efficient computing
US20080307240A1 (en) * 2007-06-08 2008-12-11 Texas Instruments Incorporated Power management electronic circuits, systems, and methods and processes of manufacture

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10268963A (ja) * 1997-03-28 1998-10-09 Mitsubishi Electric Corp 情報処理装置
US7386739B2 (en) * 2005-05-03 2008-06-10 International Business Machines Corporation Scheduling processor voltages and frequencies based on performance prediction and power constraints
US7490254B2 (en) * 2005-08-02 2009-02-10 Advanced Micro Devices, Inc. Increasing workload performance of one or more cores on multiple core processors
US7412353B2 (en) * 2005-09-28 2008-08-12 Intel Corporation Reliable computing with a many-core processor
CN101241392B (zh) * 2007-03-01 2012-07-04 威盛电子股份有限公司 根据工作温度的变化来动态改变功耗的微处理器及方法


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012158728A3 (en) * 2011-05-16 2013-03-14 Advanced Micro Devices, Inc. Adjusting the clock frequency of a processing unit in real-time based on a frequency sensitivity value
US9959919B2 (en) 2015-03-20 2018-05-01 Kabushiki Kaisha Toshiba Memory system including non-volatile memory of which access speed is electrically controlled
CN108139960A (zh) * 2016-04-18 2018-06-08 华为技术有限公司 中央处理器cpu的调频方法、调频装置和处理设备
CN108139960B (zh) * 2016-04-18 2020-07-07 华为技术有限公司 中央处理器cpu的调频方法、调频装置和处理设备
WO2019245558A1 (en) * 2018-06-21 2019-12-26 Hewlett-Packard Development Company, L.P. Increasing cpu clock speed to improve system performance
US11379337B2 (en) 2018-06-21 2022-07-05 Hewlett-Packard Development Company, L.P. Increasing CPU clock speed to improve system performance
TWI771593B (zh) * 2018-06-21 2022-07-21 美商惠普發展公司有限責任合夥企業 自動超頻系統及方法及與其相關之機器可讀媒體
WO2021021185A1 (en) * 2019-07-31 2021-02-04 Hewlett-Packard Development Company, L.P. Configuring power level of central processing units at boot time
US11630500B2 (en) 2019-07-31 2023-04-18 Hewlett-Packard Development Company, L.P. Configuring power level of central processing units at boot time

Also Published As

Publication number Publication date
EP2457139A1 (en) 2012-05-30
KR20120046232A (ko) 2012-05-09
WO2011011670A1 (en) 2011-01-27
JP2013500520A (ja) 2013-01-07
CN102483646B (zh) 2015-06-03
WO2011011673A1 (en) 2011-01-27
JP5564564B2 (ja) 2014-07-30
CN102483646A (zh) 2012-05-30
IN2012DN00933A (ja) 2015-04-03

Similar Documents

Publication Publication Date Title
US8443209B2 (en) Throttling computational units according to performance sensitivity
US8447994B2 (en) Altering performance of computational units heterogeneously according to performance sensitivity
WO2011011668A1 (en) Determining performance sensitivities of computational units
US20110022356A1 (en) Determining performance sensitivities of computational units
US11009938B1 (en) Power management for a graphics processing unit or other circuit
KR101429299B1 (ko) 컴퓨터에 이용가능한 전력량에 기초하여 전력을 보존하는 컴퓨터 구현 방법, 컴퓨터상의 애플리케이션 프로그램들과 대화할 수 있는 기간을 연장하기 위한 소프트웨어 시스템, 및 컴퓨터 판독가능 매체
US8904205B2 (en) Increasing power efficiency of turbo mode operation in a processor
US9354689B2 (en) Providing energy efficient turbo operation of a processor
KR101476568B1 (ko) 코어 마다의 전압 및 주파수 제어 제공
US8892916B2 (en) Dynamic core pool management
CN101379453B (zh) 使用动态工作负载特征来控制cpu频率和电压调节的方法和装置
US7490256B2 (en) Identifying a target processor idle state
US20090320031A1 (en) Power state-aware thread scheduling mechanism
EP3237998B1 (en) Systems and methods for dynamic temporal power steering
US11138037B2 (en) Switch policy for hybrid scheduling in multi-processor systems
EP2972826B1 (en) Multi-core binary translation task processing
US20140025967A1 (en) Techniques for reducing processor power consumption through dynamic processor resource allocation
Singh et al. Thermal aware power save policy for hot and cold jobs
Islam et al. Learning based power management for periodic real-time tasks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10737217

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10737217

Country of ref document: EP

Kind code of ref document: A1