
Power management method and apparatus

Info

Publication number
WO2021056418A1
Authority
WO
WIPO (PCT)
Prior art keywords
computing system
performance
processing components
group
performance level
Application number
PCT/CN2019/108559
Other languages
French (fr)
Inventor
Jun Song
Lin Cheng
Yijun Lu
Youquan FENG
Hao Zhu
Xuegang ZHANG
Lingfang HE
Guan Wang
Original Assignee
Alibaba Group Holding Limited
Application filed by Alibaba Group Holding Limited
Priority to CN201980099699.2A (published as CN114391128A)
Priority to PCT/CN2019/108559
Publication of WO2021056418A1


Classifications

    • G06F1/3243 Power saving in microcontroller unit
    • G06F1/3287 Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • G06F1/329 Power saving characterised by the action undertaken by task scheduling
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    The three G06F codes fall under G (Physics) > G06 (Computing; Calculating or Counting) > G06F (Electric digital data processing) > G06F1/26 (Power supply means, e.g. regulation thereof) > G06F1/32 (Means for saving power) > G06F1/3203 (Power management, i.e. event-based initiation of a power-saving mode) > G06F1/3234 (Power saving characterised by the action undertaken). Y02D10/00 falls under Y02D (Climate change mitigation technologies in information and communication technologies [ICT], i.e. ICTs aiming at the reduction of their own energy use).

Definitions

  • At block 408 of process 400 (see FIG. 4), the power management component calculates a first change value delta_APERF_i, the change between the current reading APERF_i and the previous reading APERF_prv_i, according to formula (1), and a second change value delta_MPERF_i, the change between the current reading MPERF_i and the previous reading MPERF_prv_i, according to formula (2):
  • delta_APERF_i = APERF_i - APERF_prv_i (1)
  • delta_MPERF_i = MPERF_i - MPERF_prv_i (2)
  • At block 410, the power management component calculates an average frequency F_avg_i of CPU_i according to formula (3). In some embodiments, the average frequency F_avg_i is an instantaneous value.
  • F_avg_i = F* × (delta_APERF_i / delta_MPERF_i) (3)
  • At block 412, the power management component increases i by 1.
  • At block 414, the power management component determines whether i is greater than the total number of CPUs in the computing system, that is, whether i is greater than N.
  • If the power management component determines that i is greater than N at block 414, the power management component calculates the average value F_avg of the frequencies of all CPUs in the PCZ at block 416. If i is determined not to be greater than N at block 414, the process 400 goes back to block 404. A code sketch of this calculation follows.
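The calculation in formulas (1) through (3) lends itself to a compact implementation. The following Python sketch mirrors process 400 under stated assumptions: read_aperf_mperf() is a hypothetical helper returning the current APERF/MPERF counts for a CPU (one possible MSR-based reader is sketched later in the detailed description), and the nominal frequency and PCZ membership are supplied by the caller.

```python
# Sketch of process 400: average frequency F_avg over the Performance
# Critical Zone (PCZ), per formulas (1)-(3). read_aperf_mperf() is a
# hypothetical helper; 'prev' holds the previous APERF/MPERF samples.

def average_pcz_frequency(f_nominal, pcz, prev, read_aperf_mperf):
    per_cpu = []
    for cpu in sorted(pcz):                      # blocks 404-414: visit CPUs
        aperf, mperf = read_aperf_mperf(cpu)     # block 406
        aperf_prv, mperf_prv = prev.get(cpu, (aperf, mperf))
        delta_aperf = aperf - aperf_prv          # formula (1)
        delta_mperf = mperf - mperf_prv          # formula (2)
        prev[cpu] = (aperf, mperf)
        if delta_mperf:
            # formula (3): F_avg_i = F* x (delta_APERF_i / delta_MPERF_i)
            per_cpu.append(f_nominal * delta_aperf / delta_mperf)
    # block 416: average over the PCZ; fall back to F* on the first sample
    return sum(per_cpu) / len(per_cpu) if per_cpu else f_nominal
```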
  • FIG. 5 illustrates an example block diagram of an apparatus 500 for implementing the processes and methods described above.
  • The apparatus 500 includes one or more processors 502 and memory 504 communicatively coupled to the processor(s) 502.
  • The processor(s) 502 execute one or more modules and/or processes to perform a variety of functions.
  • The processor(s) 502 may include a central processing unit (CPU), a graphics processing unit (GPU), both a CPU and a GPU, or other processing units or components known in the art. Additionally, each of the processor(s) 502 may possess its own local memory, which may also store program modules, program data, and/or one or more operating systems.
  • The memory 504 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, a miniature hard drive, a memory card, and the like), or some combination thereof.
  • The apparatus 500 may additionally include an input/output (I/O) interface 506 for receiving and outputting data.
  • The apparatus 500 may also include a communication module 508 allowing the apparatus 500 to communicate with other devices (not shown) over a network (not shown).
  • The network may include the Internet, wired media such as a wired network or direct-wired connections, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
  • The memory 504 may include one or more computer-executable modules that are executable by the processor(s) 502.
  • The memory 504 may include, but is not limited to, a monitoring module 510 and a suspending module 512.
  • The monitoring module 510 is configured to dynamically monitor a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components.
  • The computing system comprises one or more nodes in a distributed system. The distributed system may be a large-scale distributed computing system.
  • The monitoring module 510 is further configured to obtain a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system, and calculate the performance level based on the plurality of performance parameters.
  • A respective performance parameter in the plurality of performance parameters is associated with a respective processing component in the group of processing components.
  • The performance level comprises an average value of the plurality of performance parameters. In some embodiments, the average value is an instantaneous value.
  • The plurality of performance parameters of the group of processing components includes a plurality of frequencies of the group of processing components.
  • The suspending module 512 is configured to suspend/stop the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
  • The suspending module 512 is further configured to determine whether the average value is lower than a threshold frequency, and suspend/stop the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
  • The performance of the group of processing components is critical to or has a significant impact on the performance level of the computing system.
  • The group of processing components runs one or more instances that require a latency under a first threshold, for example, 10 μs, 5 μs, and so on.
  • The group of processing components runs one or more instances that require an instruction execution rate above a second threshold, for example, 2 billion instructions per second, 5 billion instructions per second, and so on.
  • The first and second thresholds may be set and/or adjusted dynamically based on actual needs.
  • The group of processing components is not fixed, and the processing components in the group can be changed dynamically.
  • The threshold represents the minimal/lowest performance level of the computing system at which the performance downgrade of the computing system during the power capping process is under control or acceptable to the customers.
  • The minimal/lowest performance level of the computing system and the performance downgrade acceptable to the customer can be determined based on the SLA between the service provider and the customer.
  • The threshold may be set and/or adjusted dynamically based on actual needs.
  • The performance level of the computing system is dynamically monitored during power capping.
  • The power capping is suspended/stopped such that further performance downgrade of the computing system can be prevented.
  • An acceptable performance level of the computing system is thus guaranteed, at which the performance downgrade of the computing system during power capping is under control or acceptable to the customer. Therefore, the power management of the computing system is improved.
  • Processes and systems discussed herein may be implemented in, but are not limited to, distributed computing environments, parallel computing environments, cluster computing environments, grid computing environments, cloud computing environments, electric vehicles, power facilities, etc.
  • Computer-readable instructions include routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like.
  • Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based programmable consumer electronics, combinations thereof, and the like.
  • The computer-readable storage media may include volatile memory (such as random access memory (RAM)) and/or non-volatile memory (such as read-only memory (ROM), flash memory, etc.).
  • The computer-readable storage media may also include additional removable storage and/or non-removable storage including, but not limited to, flash memory, magnetic storage, optical storage, and/or tape storage that may provide non-volatile storage of computer-readable instructions, data structures, program modules, and the like.
  • A non-transitory computer-readable storage medium is an example of computer-readable media.
  • Computer-readable media includes at least two types of computer-readable media, namely computer-readable storage media and communications media.
  • Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any process or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer-readable storage media includes, but is not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.
  • Communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer-readable storage media do not include communication media.
  • The computer-readable instructions stored on one or more non-transitory computer-readable storage media, when executed by one or more processors, may perform the operations described above with reference to FIGs. 1-5.
  • Generally, computer-readable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types.
  • The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
  • Clause 1 A method comprising: dynamically monitoring a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components; and suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
  • Clause 4 The method of clause 3, wherein the performance level comprises an average value of the plurality of performance parameters.
  • Clause 5 The method of clause 4, wherein suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than the threshold comprises: determining whether the average value is lower than a threshold frequency; and suspending the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
  • Clause 6 The method of clause 3, wherein a performance of the group of processing components has a significant impact on the performance level of the computing system.
  • Clause 7 The method of clause 6, wherein the group of processing components runs one or more instances that require a latency under a first threshold.
  • Clause 8 The method of clause 6, wherein the group of processing components runs one or more instances that require an instruction execution rate above a second threshold.
  • Clause 9 The method of clause 3, wherein obtaining a plurality of performance parameters of the group of processing components comprises: reading a plurality of registers of the group of processing components to obtain the plurality of performance parameters.
  • Clause 10 The method of clause 3, wherein the plurality of performance parameters of the group of processing components comprises a plurality of frequencies of the group of processing components.
  • Clause 11 The method of clause 1, wherein the threshold represents a lowest performance level of the computing system at which a performance downgrade of the computing system during the power capping process is acceptable.
  • Clause 12 The method of clause 1, wherein the computing system comprises one or more nodes in a distributed system.
  • Clause 13 A computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to perform operations comprising: dynamically monitoring a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components; and suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
  • Clause 14 The computer-readable storage medium of clause 13, wherein dynamically monitoring the performance level of the computing system comprises obtaining the performance level of the computing system periodically.
  • Clause 16 The computer-readable storage medium of clause 15, wherein the performance level comprises an average value of the plurality of performance parameters.
  • Clause 17 The computer-readable storage medium of clause 16, wherein suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than the threshold comprises: determining whether the average value is lower than a threshold frequency; and suspending the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
  • Clause 18 The computer-readable storage medium of clause 15, wherein a performance of the group of processing components has a significant impact on the performance level of the computing system.
  • Clause 19 The computer-readable storage medium of clause 18, wherein the group of processing components runs one or more instances that require a latency under a first threshold.
  • Clause 20 The computer-readable storage medium of clause 18, wherein the group of processing components runs one or more instances that require an instruction execution rate above a second threshold.
  • Clause 21 The computer-readable storage medium of clause 15, wherein obtaining a plurality of performance parameters of the group of processing components comprises: reading a plurality of registers of the group of processing components to obtain the plurality of performance parameters.
  • Clause 22 The computer-readable storage medium of clause 15, wherein the plurality of performance parameters of the group of processing components comprises a plurality of frequencies of the group of processing components.
  • Clause 23 The computer-readable storage medium of clause 13, wherein the threshold represents a lowest performance level of the computing system at which a performance downgrade of the computing system during the power capping process is acceptable.
  • Clause 24 The computer-readable storage medium of clause 13, wherein the computing system comprises one or more nodes in a distributed system.
  • Clause 25 An apparatus comprising: one or more processors; and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors, the computer-executable modules including: a monitoring module, configured to dynamically monitor a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components; and a suspending module, configured to suspend the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
  • Clause 26 The apparatus of clause 25, wherein the monitoring module is further configured to: obtain a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system, a respective performance parameter in the plurality of performance parameters being associated with a respective processing component in the group of processing components; and calculate the performance level based on the plurality of performance parameters.
  • Clause 27 The apparatus of clause 26, wherein the performance level comprises an average value of the plurality of performance parameters.
  • Clause 28 The apparatus of clause 27, wherein the suspending module is further configured to: determine whether the average value is lower than a threshold frequency; and suspend the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
  • Clause 29 The apparatus of clause 26, wherein a performance of the group of processing components has a significant impact on the performance level of the computing system.
  • Clause 30 The apparatus of clause 29, wherein the group of processing components runs one or more instances that require a latency under a first threshold.
  • Clause 31 The apparatus of clause 29, wherein the group of processing components runs one or more instances that require an instruction execution rate above a second threshold.
  • Clause 32 The apparatus of clause 26, wherein the plurality of performance parameters of the group of processing components comprises a plurality of frequencies of the group of processing components.
  • Clause 33 The apparatus of clause 25, wherein the threshold represents a lowest performance level of the computing system at which a performance downgrade of the computing system during the power capping process is acceptable.
  • Clause 34 The apparatus of clause 25, wherein the computing system comprises one or more nodes in a distributed system.

Abstract

Methods and apparatus are provided for improving power management. A power management component dynamically monitors a performance level of a computing system while performing a power capping process on the computing system. The power management component suspends the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold. An acceptable performance level of the computing system is ensured at which the performance downgrade of the computing system during the power capping process is under control or acceptable to the customers.

Description

POWER MANAGEMENT METHOD AND APPARATUS

TECHNICAL FIELD
The present disclosure relates to the field of power management and, more particularly, relates to methods and apparatuses for power management of a computing system.
BACKGROUND
Power capping is a widely used technology in modern data centers (DCs) to increase on-rack compute density and avoid power outages. Most power capping technologies employ a proportional–integral–derivative (PID) control method to quickly find a set of parameters for hardware control knobs, e.g., operating frequencies of processing components, so that the system can reach a target power level. However, most PID controllers cause frequency overshooting and/or frequency undershooting during power capping, and the frequency undershooting can be very dangerous to cloud applications. If the frequencies of processing components are too low, a huge number of requests may not be processed in time. Moreover, an ill-designed back-pressure mechanism might spread the problem to upstream servers, lowering the overall performance of the system or even causing request loss. Thus, performance downgrade occurs and may be disastrous to a cloud platform.
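To make the undershooting risk concrete, the following Python sketch shows a PID-style capping loop purely as an illustration (it is not the method of this disclosure); read_power(), set_frequency(), and the gain values are hypothetical placeholders. Note that nothing in the loop keeps the commanded frequency above a level that latency-sensitive services can tolerate.

```python
# Illustrative PID-style power-capping loop (hypothetical helpers and
# gains). Measured power is driven toward the cap by adjusting CPU
# frequency; nothing enforces a minimum safe frequency.
import time

def pid_power_capping(cap_watts, read_power, set_frequency,
                      f_ghz=2.5, kp=0.002, ki=0.0005, kd=0.001,
                      period_s=0.1):
    integral = prev_err = 0.0
    while True:
        err = read_power() - cap_watts          # > 0 when over the cap
        integral += err * period_s
        derivative = (err - prev_err) / period_s
        prev_err = err
        # Lower frequency when over budget; poorly tuned gains can
        # drive the frequency far below what applications tolerate.
        f_ghz -= kp * err + ki * integral + kd * derivative
        set_frequency(max(f_ghz, 0.4))          # hardware floor only
        time.sleep(period_s)
```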
Hardware vendors such as Original Design Manufacturers (ODMs) and Original Equipment Manufacturers (OEMs) support power capping capability in silicon or firmware. However, such capability does not come with minimum frequency protection. In other words, the performance level might be unacceptable during the power capping procedure.
Software companies using power capping may therefore suffer from performance problems. As investigated, some software companies try to avoid such problems by conducting conservative power capping, leaving a big margin below the power-tripping level and thereby optimizing resource usage poorly.
In view of the above, improving the power management of a computing system is necessary.
SUMMARY
This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in limiting the scope of the claimed subject matter.
The following describes example implementations of power management methods and apparatuses. In implementations, a power management component performs power capping on the computing system and dynamically monitors the performance level of the computing system. The power management component suspends the power capping process when monitoring that the performance level of the computing system is lower than a threshold. Thus, an acceptable performance level of the computing system can be ensured where the performance downgrade of the computing system during power capping is under control or acceptable to the customer. The acceptable performance level of the computing system and the performance downgrade of the computing system acceptable to the customer can be determined based on a Service Level Agreement (SLA) between the service provider and the customer. Therefore, the power management of the computing system is improved.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items or features.
FIGs. 1A and 1B illustrate example block diagrams of a scenario where performance downgrade occurs during power capping.
FIG. 2 illustrates an example flowchart of a process of the power management of a computing system.
FIG. 3 illustrates an example block diagram of a mechanism of the power management of a computing system.
FIG. 4 illustrates an example flowchart of a process for calculating the average value of frequencies of the processing components in the Performance Critical Zone in the computing system.
FIG. 5 illustrates an example block diagram of an apparatus for implementing the processes and methods described above.
DETAILED DESCRIPTION
Terminology used herein is defined as follows. Power capping refers to the practice of limiting how much power a computing system can consume. Power cap refers to the power limit that the power consumption of the computing system cannot exceed. Nominal frequency refers to the guaranteed highest frequency within a Thermal Design Power (TDP) envelope. PID controller refers to a proportional–integral–derivative controller (or three-term controller), which is a control loop feedback mechanism widely used in industrial control systems and a variety of other applications requiring continuously modulated control.
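For reference, the textbook PID control law combines the three terms named above; in the power-capping setting, e(t) would be the gap between the measured power and the target power level, and the output u(t) drives the hardware control knob (e.g., processor frequency). This is standard control-theory background rather than a formula from this disclosure:

$$u(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d\, \frac{d e(t)}{d t}$$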
FIGs. 1A and 1B illustrate example block diagrams of scenario 100 where performance downgrade occurs during power capping.
Referring to FIG. 1A, at block 102, there are 16 instances A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, and P, where the data generation and consumption are tightly connected among the instances. The arrows between the instances represent the dependencies between the instances. For example, after instance A is performed, instance B is performed, and so on. The output of instance A is provided to instance F, and the output of instance B is provided to instance E, and so on.
Referring to block 104, the 16 instances are mapped to 12 nodes on 4 racks. For example, instance J runs on rack_3 node_5 at 2.5 GHz+. Instance K runs on rack_4, node_7 at 2.5 GHz+. Instances A through I run on rack_11, node_6 through node_10 at 2.5 GHz. Instances L through P run on rack_15, node_11 through node_15 at 2.5 GHz.
Referring to block 106, when power capping is on, the total power drawn by the 12 nodes at the 4 racks is capped so as not to exceed a power limit/cap. The power limit/cap may be set and/or adjusted dynamically based on actual needs. The frequencies of the nodes are dynamically adjusted to satisfy the power limit/cap. For example, rack 3 runs in a failsafe mode (S5) due to the power capping. The failsafe mode is a design feature or practice whereby, in the event of failure, the rack/node runs in a minimally responsive way (almost suspended). Rack 4 runs at 1.2 GHz. Rack 11 runs at 2.0 GHz. Instances L through P at rack 15, node 11 through 15, run at 2.0 GHz. Thus, the frequencies of the 12 nodes at the 4 racks are lowered due to the power capping.
Referring to FIG. 1B, block 108 shows the consequences at a time T. For example, a buffer running on node_5 at rack_3 is blown (stops responding) because of the low frequency due to the power capping. As a result, instance J running on node_5 at rack_3 crashes (stops responding).
Block 110 shows the consequences at a time T+1. Because the buffer blowing problem is contagious, after one node fails to process requests in time, another node soon follows. For example, another instance K also stops responding.
Block 112 shows the consequences at a time T+2, where four other nodes are severely impacted. Instances A, C, F, and G respond slowly or even stop responding. As a result, the performance downgrade can be observed. In that case, an avalanche occurs: the service stops, and customers are caught by surprise.
FIG. 2 illustrates an example flowchart of a process 200 of the power management of a computing system.
At block 202, a power management component dynamically monitors a performance level of a computing system while a power capping process is performed on the computing system. The computing system includes a plurality of processing components. In implementations, the computing system includes one or more nodes in a distributed system. The distributed system may be a large-scale distributed computing system.
In implementations, the power management component is a component that monitors and manages the power consumption of the computing system. The power management component may be implemented with software, hardware, firmware, or any combination thereof. The power management component performs the power capping process on the computing system such that the power consumption of the computing system does not exceed a power limit/cap. The power limit/cap may be set and/or adjusted dynamically based on actual needs.
In implementations, the power management component obtains a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system. A respective performance parameter in the plurality of performance parameters is associated with a respective processing component in the group of processing components. The power management component calculates the performance level based on the plurality of performance parameters. The performance level comprises an average value of the plurality of performance parameters. In some embodiments, the average value is an instantaneous value. The plurality of performance parameters of the group of processing components includes a plurality of frequencies of the group of processing components.
At block 204, the power management component suspends/stops the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
In implementations, the power management component obtains a plurality of frequencies of a group of processing components in the computing system. The power management component calculates the performance level based on the plurality of frequencies of the group of processing components in the computing system.
In implementations, the performance of the group of processing components has a significant impact on the overall performance of the computing system. In some embodiments, the group of processing components runs one or more instances that require a latency under a first threshold, for example, 1 μs, 5 μs, and so on. In some embodiments, the group of processing components runs one or more instances that require an instruction execution rate above a second threshold, for example, 2 billion instructions per second, 5 billion instructions per second, and so on. The first and second thresholds may be set and/or adjusted dynamically based on actual needs. In some embodiments, the group of processing components is not fixed, and the processing components in the group can be changed dynamically.
In implementations, the threshold represents the minimal/lowest performance level of the computing system at which the downgrade of the performance level of the computing system during the power capping process is under control or acceptable to the customers. The minimal/lowest performance level of the computing system and the performance downgrade acceptable to the customer can be determined based on the SLA between the service provider and the customer. The threshold may be set and/or adjusted dynamically based on actual needs.
In implementations, the power management component automatically resumes the power capping process after suspending the power capping process for a period of time, for example, 50 ms, 2 s, 1 min, and so on. In some embodiments, the power management component resumes the power capping process when conditions are met, for example, when the power consumption of the computing system is over an upper limit. The upper limit may be set and/or adjusted dynamically based on actual needs.
With the above example process 200, the performance level of the computing system is dynamically monitored during power capping. When the performance level falls below the threshold, the power capping is suspended/stopped such that further performance downgrade of the computing system can be prevented. Thus, an acceptable performance level of the computing system is ensured at which the performance downgrade of the computing system during power capping is under control or acceptable to the customer. Therefore, the power management of the computing system is improved.
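A minimal sketch of this monitor-and-suspend loop follows; performance_level(), suspend_capping(), resume_capping(), and power_over_limit() are hypothetical stand-ins for platform-specific calls, and the period and hold-off values are arbitrary examples.

```python
# Sketch of process 200: suspend power capping when the performance
# level drops below the threshold; resume after a hold-off period or
# when power consumption exceeds an upper limit.
import time

def monitor_and_suspend(threshold, performance_level, suspend_capping,
                        resume_capping, power_over_limit,
                        period_s=0.1, holdoff_s=2.0):
    capping, suspended_at = True, 0.0
    while True:
        now = time.monotonic()
        if capping:
            if performance_level() < threshold:   # block 204
                suspend_capping()
                capping, suspended_at = False, now
        elif now - suspended_at >= holdoff_s or power_over_limit():
            resume_capping()                      # automatic resume
            capping = True
        time.sleep(period_s)                      # periodic monitoring
```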
FIG. 3 illustrates an example block diagram of a mechanism 300 of the power management of a computing system.
Referring to FIG. 3, a power management component 302 creates a power capping thread 304 and a performance monitoring thread 306. In the power capping thread 304, the power management component 302 performs power capping on the computing system (not shown) to ensure that the power consumption of the computing system does not exceed the power limit/cap. The power limit/cap may be set and/or adjusted dynamically based on actual needs.
In the performance monitoring thread 306, the power management component 302 monitors the performance of the computing system.
In implementations, the performance of one or more processing components may be critical to or have a significant impact on the overall performance of the computing system. Thus, a Performance Critical Zone (PCZ) is defined herein as a group of processing components whose performance is critical to or has a significant impact on the overall performance of the computing system. In some embodiments, the Performance Critical Zone is not fixed, and the processing components in the Performance Critical Zone can be changed dynamically.
In implementations, the computing system includes N processing components, for example, N CPUs, where N is a positive integer. Among the N CPUs, there are M CPUs in the Performance Critical Zone, where M is a positive integer. In other words, the performance of the M CPUs has a significant impact on the overall performance of the computing system. In some embodiments, the M CPUs in the Performance Critical Zone run latency-sensitive workloads/instances. The latency-sensitive workloads/instances require a latency under a first threshold, for example, 10 μs, 5 μs, and so on. In some embodiments, the M CPUs in the Performance Critical Zone run throughput-sensitive workloads/instances. The throughput-sensitive workloads/instances require an instruction execution rate above a second threshold, for example, 2 billion instructions per second, 5 billion instructions per second, and so on. The first and second thresholds may be set and/or adjusted dynamically.
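One way to realize such a dynamic Performance Critical Zone is to recompute membership from the service classes of the workloads currently placed on each CPU; the workload descriptors and threshold values in the sketch below are illustrative assumptions, not part of the disclosure.

```python
# Illustrative dynamic PCZ selection. A CPU joins the zone when its
# workload demands latency under the first threshold or an instruction
# rate above the second threshold; both bounds are example values.
FIRST_THRESHOLD_US = 10.0        # e.g., 10 microseconds
SECOND_THRESHOLD_IPS = 2e9       # e.g., 2 billion instructions/second

def performance_critical_zone(workloads):
    """workloads: dict mapping cpu id -> {'latency_us': float,
    'rate_ips': float} for the instance running on that CPU."""
    return {cpu for cpu, w in workloads.items()
            if w.get('latency_us', float('inf')) < FIRST_THRESHOLD_US
            or w.get('rate_ips', 0.0) > SECOND_THRESHOLD_IPS}
```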
At block 308, the power management component 302 obtains the performance level of the computing system dynamically. In implementations, the power management component 302 obtains the performance level of the computing system periodically, for example, every X milliseconds, where X may be set and/or adjusted based on actual needs. In some embodiments, X may be dozens to hundreds of milliseconds.
At block 310, the power management component 302 obtains the performance parameters of the processing components in the Performance Critical Zone and calculates the performance level.
In implementations, the power management component 302 may collect telemetry data using a performance monitoring unit (PMU) based mechanism to assist in sensing performance downgrade. Telemetry is the automatic recording and transmission of data from remote or inaccessible sources to an IT system in a different location for monitoring and analysis. For example, the performance parameters are frequencies of the processing components. The power management component 302 reads Model-Specific Registers (MSRs), including APERF and MPERF, through the /dev/cpu/X/msr interface in each of the M CPUs in the Performance Critical Zone. More specifically, the power management component 302 reads APERF_0 and MPERF_0 in CPU_0, APERF_1 and MPERF_1 in CPU_1, ..., and APERF_M-1 and MPERF_M-1 in CPU_M-1. The power management component 302 calculates an average value F_avg of the frequencies of the M CPUs in the Performance Critical Zone based on the reading results. In some embodiments, the average value F_avg is an instantaneous value. Details regarding the algorithm for calculating the average value F_avg are described hereinafter with reference to FIG. 4.
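On Linux, one concrete way to read these counters is through the msr driver (this requires the msr kernel module and root privileges); the sketch below assumes an Intel-style platform, where IA32_MPERF and IA32_APERF live at MSR addresses 0xE7 and 0xE8, respectively, per Intel's Software Developer's Manual.

```python
# Read IA32_APERF/IA32_MPERF for one CPU through /dev/cpu/<n>/msr.
# The MSR address is used as the file offset; each MSR is 8 bytes.
import os, struct

IA32_MPERF = 0xE7
IA32_APERF = 0xE8

def read_msr(cpu, reg):
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        os.lseek(fd, reg, os.SEEK_SET)
        return struct.unpack("<Q", os.read(fd, 8))[0]
    finally:
        os.close(fd)

def read_aperf_mperf(cpu):
    return read_msr(cpu, IA32_APERF), read_msr(cpu, IA32_MPERF)
```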
At block 312, the power management component 302 determines whether the performance level is greater than or equal to a threshold. In implementations, the power management component 302 determines whether the average value F_avg is greater than or equal to a threshold frequency F_min. The threshold frequency F_min represents a minimal/lowest frequency at which the performance level of the computing system is acceptable and the performance downgrade of the computing system during power capping is under control or acceptable to the customer. The threshold frequency F_min can be determined based on the customer's requirements, machine learning, empirical values, experimental data, etc. The threshold frequency F_min may be set and/or adjusted dynamically. The performance level and performance downgrade of the computing system that are acceptable to the customer can be determined based on the SLA between the service provider and the customer.
If the power management component 302 determines that the performance level is greater than or equal to the threshold at block 312, the power management component 302 waits until the next period at block 314. In implementations, if the power management component 302 determines that the average value F_avg is greater than or equal to the threshold frequency F_min, the power management component 302 waits until the next period.
If the power management component 302 determines that the performance level is lower than the threshold at block 312, the power management component 302 suspends/stops the power capping thread 304 at block 316. In implementations, if the power management component 302 determines that the average value F_avg is lower than the threshold frequency F_min, the power management component 302 suspends/stops the power capping thread 304 and returns to the performance monitoring thread 306 immediately at block 316.
Additionally or alternatively, if the power management component 302 determines that the performance level is lower than the threshold, the power management component 302 instructs a scheduler to migrate instances running on the computing system to other computing systems (not shown) as soon as possible. However, such migration may impose pressure on the scheduler, and some instances may not be migrated.
At block 318, in a next period, the power management component 302 obtains the performance level of the computing system again as described above with reference to blocks 308 to 310, and details are not repeated here.
Additionally or alternatively, the power management component 302 automatically resumes the power capping process after suspending the power capping process for a period of time, for example, 50ms, 2s, 1min, and so on. In some embodiments, the power management component 302 resumes the power capping process when a condition is met, for example, when the power consumption of the computing system is over an upper limit. The upper limit may be set and/or adjusted dynamically based on actual needs.
With the above example mechanism 300, the performance level of the computing system is dynamically monitored during power capping. When the performance level falls below the threshold, the power capping is suspended/stopped such that further performance downgrade of the computing system can be prevented. Thus, the computing system is kept at an acceptable performance level at which the performance downgrade during power capping is under control or acceptable to the customer. Therefore, the power management of the computing system is improved.
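Taken together, blocks 308 to 318 amount to a periodic polling loop. The following is a minimal Python sketch of such a loop, not the claimed implementation: average_pcz_frequency() stands for process 400 (a sketch follows the description of FIG. 4 below), suspend_power_capping() and resume_power_capping() are hypothetical hooks into the power capping thread 304, and the period and suspension time are illustrative configuration values.

    import time

    def performance_monitoring_thread(pcz_cpus, nominal_hz, f_min_hz,
                                      period_ms, suspend_ms):
        prev = {}  # last-period APERF/MPERF readings, one entry per CPU
        while True:
            # Blocks 308-310: obtain the performance level F_avg; the helper
            # returns None until a first baseline reading exists.
            f_avg = average_pcz_frequency(pcz_cpus, nominal_hz, prev)
            if f_avg is None or f_avg >= f_min_hz:
                time.sleep(period_ms / 1000.0)   # blocks 312-314: wait for next period
            else:
                suspend_power_capping()          # block 316: F_avg fell below F_min
                time.sleep(suspend_ms / 1000.0)  # suspend for a period of time
                resume_power_capping()           # auto-resume, then keep monitoring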
FIG. 4 illustrates an example flowchart of a process 400 for calculating the average value of frequencies of the processing components in the Performance Critical Zone in the computing system. In implementations, the computing system includes N CPUs, where N is a positive integer.
At block 402, the power management component reads a nominal frequency F* of CPU i, where i is a positive integer. The nominal frequency F* can be determined based on a specification, empirical value, experimental data, etc.
At block 404, the power management component determines whether CPU i is in the Performance Critical Zone (PCZ).
If the power management component determines that CPU i is in the PCZ at block 404, the power management component reads the Model-Specific Registers, including APERF and MPERF, of CPU i at block 406, where the reading results are referred to as APERF_i and MPERF_i. If the power management component determines that CPU i is not in the PCZ at block 404, the process 400 proceeds to block 412.
At block 408, the power management component calculates a first change value delta_APERF_i, which is the change between a current APERF_i and a previous APERF_prv_i, according to the following formula (1), and a second change value delta_MPERF_i, which is the change between a current MPERF_i and a previous MPERF_prv_i, according to the following formula (2):
delta_APERF_i = APERF_i - APERF_prv_i    (1)
delta_MPERF_i = MPERF_i - MPERF_prv_i    (2)
At block 410, the power management component calculates an average frequency F_avg_i of CPU i according to the following formula (3). In some embodiments, the average frequency F_avg_i is an instantaneous value.
F_avg_i = F* × (delta_APERF_i / delta_MPERF_i)    (3)
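For example, assuming illustrative values of F* = 2.5 GHz, delta_APERF_i = 8 × 10^9, and delta_MPERF_i = 10 × 10^9, CPU i ran at 80% of its nominal rate while active, and formula (3) gives F_avg_i = 2.5 GHz × 0.8 = 2.0 GHz.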
At block 412, the power management component increases i by 1.
At block 414, the power management component determines whether i is greater than the total number of CPUs in the computing system, that is, whether i is greater than N.
If the power management component determines that i is greater than N at block 414, the power management component calculates the average value F_avg of the frequencies of all CPUs in the PCZ at block 416. If i is determined not to be greater than N at block 414, the process 400 goes back to block 404.
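The following is a minimal Python sketch of process 400, assuming the read_aperf_mperf() helper from the earlier sketch is available; the per-CPU nominal frequencies and the previous-period counter values are supplied by the caller, and the function returns None until a baseline reading exists. The names and signature are illustrative, not part of the disclosure.

    def average_pcz_frequency(pcz_cpus, nominal_hz, prev):
        # pcz_cpus: indices of the CPUs in the Performance Critical Zone
        # nominal_hz: per-CPU nominal frequency F* in Hz (block 402)
        # prev: per-CPU (APERF_prv_i, MPERF_prv_i) readings from the last period
        freqs = []
        for cpu in pcz_cpus:                          # blocks 404 and 412-414
            aperf, mperf = read_aperf_mperf(cpu)      # block 406
            if cpu in prev:
                aperf_prv, mperf_prv = prev[cpu]
                delta_aperf = aperf - aperf_prv       # formula (1)
                delta_mperf = mperf - mperf_prv       # formula (2)
                if delta_mperf > 0:
                    # formula (3): F_avg_i = F* × (delta_APERF_i / delta_MPERF_i)
                    freqs.append(nominal_hz[cpu] * delta_aperf / delta_mperf)
            prev[cpu] = (aperf, mperf)                # keep readings for next period
        if not freqs:
            return None                               # no baseline reading yet
        return sum(freqs) / len(freqs)                # block 416: F_avg over the PCZ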
FIG. 5 illustrates an example block diagram of an apparatus 500 for implementing the processes and methods described above.
The apparatus 500 includes one or more processors 502 and memory 504 communicatively coupled to the processor(s) 502. The processor(s) 502 executes one or more modules and/or processes to cause the processor(s) 502 to perform a variety of functions. In implementations, the processor(s) 502 may include a central processing unit (CPU), a graphics processing unit (GPU), both CPU and GPU, or other processing units or components known in the art. Additionally, each of the processor(s) 502 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems. In implementations, the memory 504 may be volatile, such as RAM, non-volatile, such as ROM, flash memory, miniature hard drive, memory card, and the like, or some combination thereof.
The apparatus 500 may additionally include an input/output (I/O) interface 506 for receiving and outputting data. The apparatus 500 may also include a communication module 508 allowing the apparatus 500 to communicate with other devices (not shown) over a network (not shown). The network may include the Internet, wired media such as a wired network or direct-wired connections, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
The memory 504 may include one or more computer-executable modules (modules) that are executable by the processor(s) 502. In implementations, the memory 504 may include, but is not limited to, a monitoring module 510 and a suspending module 512.
The monitoring module 510 is configured to dynamically monitor a performance level of a computing system while a power capping process is performed on the computing system. The computing system includes a plurality of processing components. In implementations, the computing system comprises one or more nodes in a distributed system. The distributed system may be a large-scale distributed computing system.
The monitoring module 510 is further configured to obtain a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system, and calculate the  performance level based on the plurality of performance parameters. A respective performance parameter in the plurality of performance parameters is associated with a respective processing component in the group of processing components. The performance level comprises an average value of the plurality of performance parameters. In some embodiments, the average value is an instantaneous value. The plurality of performance parameters of the group of processing components includes a plurality of frequencies of the group of processing components.
The suspending module 512 is configured to suspend/stop the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
The suspending module 512 is further configured to determine whether the average value is lower than a threshold frequency, and suspend/stop the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
In implementations, the performance of the group of processing components has a significant impact on the performance level of the computing system. In some embodiments, the group of processing components runs one or more instances that require a latency under a first threshold, for example, 10μs, 5μs, and so on. In some embodiments, the group of processing components runs one or more instances that require an instruction execution rate above a second threshold, for example, 2 billion instructions per second, 5 billion instructions per second, and so on. The first and second thresholds may be set and/or adjusted dynamically based on actual needs. In some embodiments, the group of processing components is not fixed, and the processing components in the group can be changed dynamically.
In implementations, the threshold represents the minimal/lowest performance level of the computing system at which the performance downgrade of the computing system during the power capping process is under control or acceptable to the customer. The minimal/lowest performance level of the computing system and the performance downgrade acceptable to the customer can be determined based on the SLA between the service provider and the customer. The threshold may be set and/or adjusted dynamically based on actual needs.
With the above example apparatus 500, the performance level of the computing system is dynamically monitored during power capping. When the performance level falls below the threshold, the power capping is suspended/stopped such that further performance downgrade of the computing system can be prevented. Thus, the computing system is kept at an acceptable performance level at which the performance downgrade during power capping is under control or acceptable to the customer. Therefore, the power management of the computing system is improved.
Processes and systems discussed herein may be implemented in, but are not limited to, distributed computing environments, parallel computing environments, cluster computing environments, grid computing environments, cloud computing environments, electrical vehicles, power facilities, etc.
Some or all operations of the methods described above can be performed by execution of computer-readable instructions stored on a computer-readable storage medium, as defined below. The term “computer-readable instructions” as used in the description and claims includes routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, combinations thereof, and the like.
The computer-readable storage media may include volatile memory (such as random access memory (RAM)) and/or non-volatile memory (such as read-only memory (ROM), flash memory, etc.). The computer-readable storage media may also include additional removable storage and/or non-removable storage including, but not limited to, flash memory, magnetic storage, optical storage, and/or tape storage that may provide non-volatile storage of computer-readable instructions, data structures, program modules, and the like.
A non-transitory computer-readable storage medium is an example of computer-readable media. Computer-readable media includes at least two types of computer-readable media, namely computer-readable storage media and communication media. Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any process or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer-readable storage media do not include communication media.
The computer-readable instructions stored on one or more non-transitory computer-readable storage media, when executed by one or more processors, may perform operations described above with reference to FIGs. 1-5. Generally, computer-readable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
EXAMPLE CLAUSES
Clause 1. A method comprising: dynamically monitoring a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components; and suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
Clause 2. The method of clause 1, wherein dynamically monitoring the performance level of the computing system comprises obtaining the performance level of the computing system periodically.
Clause 3. The method of clause 1, wherein dynamically monitoring the performance level of the computing system comprises: obtaining a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system, a respective performance parameter in the plurality of performance parameters being associated with a respective processing component in the group of processing components; and calculating the performance level based on the plurality of performance parameters.
Clause 4. The method of clause 3, wherein the performance level comprises an average value of the plurality of performance parameters.
Clause 5. The method of clause 4, wherein suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than the threshold comprises: determining whether the average value is lower than a threshold frequency; and suspending the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
Clause 6. The method of clause 3, wherein a performance of the group of processing components has a significant impact on the performance level of the computing system.
Clause 7. The method of clause 6, wherein the group of processing components runs one or more instances that require a latency under a first threshold.
Clause 8. The method of clause 6, wherein the group of processing components runs one or more instances that require an instruction execution rate above a second threshold.
Clause 9. The method of clause 3, wherein obtaining a plurality of performance parameters of the group of processing components comprises: reading a plurality of registers of the group of processing components to obtain the plurality of performance parameters.
Clause 10. The method of clause 3, wherein the plurality of performance parameters of the group of processing components comprises a plurality of frequencies of the group of processing components.
Clause 11. The method of clause 1, wherein the threshold represents a lowest performance level of the computing system at which a performance downgrade of the computing system during the power capping process is acceptable.
Clause 12. The method of clause 1, wherein the computing system comprises one or more nodes in a distributed system.
Clause 13. A computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to perform operations comprising: dynamically monitoring a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components; and suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
Clause 14. The computer-readable storage medium of clause 13, wherein dynamically monitoring the performance level of the computing system comprises obtaining the performance level of the computing system periodically.
Clause 15. The computer-readable storage medium of clause 13, wherein dynamically monitoring the performance level of the computing system comprises: obtaining a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system, a respective performance parameter in the plurality of  performance parameters being associated with a respective processing component in the group of processing components; and calculating the performance level based on the plurality of performance parameters.
Clause 16. The computer-readable storage medium of clause 15, wherein the performance level comprises an average value of the plurality of performance parameters.
Clause 17. The computer-readable storage medium of clause 16, wherein suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than the threshold comprises: determining whether the average value is lower than a threshold frequency; and suspending the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
Clause 18. The computer-readable storage medium of clause 15, wherein a performance of the group of processing components has a significant impact on the performance level of the computing system.
Clause 19. The computer-readable storage medium of clause 18, wherein the group of processing components runs one or more instances that require a latency under a first threshold.
Clause 20. The computer-readable storage medium of clause 18, wherein the group of processing components runs one or more instances that require an instruction execution rate above a second threshold.
Clause 21. The computer-readable storage medium of clause 15, wherein obtaining a plurality of performance parameters of the group of processing components comprises: reading a plurality of registers of the group of processing components to obtain the plurality of performance parameters.
Clause 22. The computer-readable storage medium of clause 15, wherein the plurality of performance parameters of the group of processing components comprises a plurality of frequencies of the group of processing components.
Clause 23. The computer-readable storage medium of clause 13, wherein the threshold represents a lowest performance level of the computing system at which a performance downgrade of the computing system during the power capping process is acceptable.
Clause 24. The computer-readable storage medium of clause 13, wherein the computing system comprises one or more nodes in a distributed system.
Clause 25. An apparatus comprising: one or more processors; and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors, the computer-executable modules including: a monitoring module, configured to dynamically monitor a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components; and a suspending module, configured to suspend the power capping process of the  computing system when monitoring that the performance level of the computing system is lower than a threshold.
Clause 26. The apparatus of clause 25, wherein the monitoring module is further configured to: obtain a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system, a respective performance parameter in the plurality of performance parameters being associated with a respective processing component in the group of processing components; and calculate the performance level based on the plurality of performance parameters.
Clause 27. The apparatus of clause 26, wherein the performance level comprises an average value of the plurality of performance parameters.
Clause 28. The apparatus of clause 27, wherein the suspending module is further configured to: determine whether the average value is lower than a threshold frequency; and suspend the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
Clause 29. The apparatus of clause 26, wherein a performance of the group of processing components has a significant impact on the performance level of the computing system.
Clause 30. The apparatus of clause 29, wherein the group of processing components runs one or more instances that require a latency under a first threshold.
Clause 31. The apparatus of clause 29, wherein the group of processing components runs one or more instances that require an instruction execution rate above a second threshold.
Clause 32. The apparatus of clause 26, wherein the plurality of performance parameters of the group of processing components comprises a plurality of frequencies of the group of processing components.
Clause 33. The apparatus of clause 25, wherein the threshold represents a lowest performance level of the computing system at which a performance downgrade of the computing system during the power capping process is acceptable.
Clause 34. The apparatus of clause 25, wherein the computing system comprises one or more nodes in a distributed system.
CONCLUSION
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claims.

Claims (34)

  1. A method comprising:
    dynamically monitoring a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components; and
    suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
  2. The method of claim 1, wherein dynamically monitoring the performance level of the computing system comprises obtaining the performance level of the computing system periodically.
  3. The method of claim 1, wherein dynamically monitoring the performance level of the computing system comprises:
    obtaining a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system, a respective performance parameter in the plurality of performance parameters being associated with a respective processing component in the group of processing components; and
    calculating the performance level based on the plurality of performance parameters.
  4. The method of claim 3, wherein the performance level comprises an average value of the plurality of performance parameters.
  5. The method of claim 4, wherein suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than the threshold comprises:
    determining whether the average value is lower than a threshold frequency; and
    suspending the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
  6. The method of claim 3, wherein a performance of the group of processing components has a significant impact on the performance level of the computing system.
  7. The method of claim 6, wherein the group of processing components runs one or more instances that require a latency under a first threshold.
  8. The method of claim 6, wherein the group of processing components runs one or more instances that require an instruction execution rate above a second threshold.
  9. The method of claim 3, wherein obtaining a plurality of performance parameters of the group of processing components comprises:
    reading a plurality of registers of the group of processing components to obtain the plurality of performance parameters.
  10. The method of claim 3, wherein the plurality of performance parameters of the group of processing components comprises a plurality of frequencies of the group of processing components.
  11. The method of claim 1, wherein the threshold represents a lowest performance level of the computing system at which a performance downgrade of the computing system during the power capping process is acceptable.
  12. The method of claim 1, wherein the computing system comprises one or more nodes in a distributed system.
  13. A computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the  one or more processors, cause the one or more processors to perform operations comprising:
    dynamically monitoring a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components; and
    suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
  14. The computer-readable storage medium of claim 13, wherein dynamically monitoring the performance level of the computing system comprises obtaining the performance level of the computing system periodically.
  15. The computer-readable storage medium of claim 13, wherein dynamically monitoring the performance level of the computing system comprises:
    obtaining a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system, a respective performance parameter in the plurality of performance parameters being associated with a respective processing component in the group of processing components; and
    calculating the performance level based on the plurality of performance parameters.
  16. The computer-readable storage medium of claim 15, wherein the performance level comprises an average value of the plurality of performance parameters.
  17. The computer-readable storage medium of claim 16, wherein suspending the power capping process of the computing system when monitoring that the performance level of the computing system is lower than the threshold comprises:
    determining whether the average value is lower than a threshold frequency; and
    suspending the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
  18. The computer-readable storage medium of claim 15, wherein a performance of the group of processing components has a significant impact on the performance level of the computing system.
  19. The computer-readable storage medium of claim 18, wherein the group of processing components runs one or more instances that require a latency under a first threshold.
  20. The computer-readable storage medium of claim 18, wherein the group of processing components runs one or more instances that require an instruction execution rate above a second threshold.
  21. The computer-readable storage medium of claim 15, wherein obtaining a plurality of performance parameters of the group of processing components comprises:
    reading a plurality of registers of the group of processing components to obtain the plurality of performance parameters.
  22. The computer-readable storage medium of claim 15, wherein the plurality of performance parameters of the group of processing components comprises a plurality of frequencies of the group of processing components.
  23. The computer-readable storage medium of claim 13, wherein the threshold represents a lowest performance level of the computing system at which a performance downgrade of the computing system during the power capping process is acceptable.
  24. The computer-readable storage medium of claim 13, wherein the computing system comprises one or more nodes in a distributed system.
  25. An apparatus comprising:
    one or more processors; and
    memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors, the computer-executable modules including:
    a monitoring module, configured to dynamically monitor a performance level of a computing system while a power capping process is performed on the computing system, the computing system including a plurality of processing components; and
    a suspending module, configured to suspend the power capping process of the computing system when monitoring that the performance level of the computing system is lower than a threshold.
  26. The apparatus of claim 25, wherein the monitoring module is further configured to:
    obtain a plurality of performance parameters of a group of processing components among the plurality of processing components in the computing system, a respective performance parameter in the plurality of performance parameters being associated with a respective processing component in the group of processing components; and
    calculate the performance level based on the plurality of performance parameters.
  27. The apparatus of claim 26, wherein the performance level comprises an average value of the plurality of performance parameters.
  28. The apparatus of claim 27, wherein the suspending module is further configured to:
    determine whether the average value is lower than a threshold frequency; and
    suspend the power capping process of the computing system in response to determining that the average value is lower than the threshold frequency.
  29. The apparatus of claim 26, wherein a performance of the group of processing components has a significant impact on the performance level of the computing system.
  30. The apparatus of claim 29, wherein the group of processing components runs one or more instances that require a latency under a first threshold.
  31. The apparatus of claim 29, wherein the group of processing components runs one or more instances that require an instruction execution rate above a second threshold.
  32. The apparatus of claim 26, wherein the plurality of performance parameters of the group of processing components comprises a plurality of frequencies of the group of processing components.
  33. The apparatus of claim 25, wherein the threshold represents a lowest performance level of the computing system at which a performance downgrade of the computing system during the power capping process is acceptable.
  34. The apparatus of claim 25, wherein the computing system comprises one or more nodes in a distributed system.