US20160054779A1 - Managing power performance of distributed computing systems - Google Patents
- Publication number
- US20160054779A1 (U.S. application Ser. No. 14/582,743)
- Authority
- US
- United States
- Prior art keywords
- power
- hpc
- job
- nodes
- manager
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B15/00—Systems controlled by a computer
- G05B15/02—Systems controlled by a computer electric
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/30—Means for acting in the event of power-supply failure or interruption, e.g. power-supply fluctuations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
- G06F1/3209—Monitoring remote activity, e.g. over telephone lines or network connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
- G06F1/3228—Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/324—Power saving characterised by the action undertaken by lowering clock frequency
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/329—Power saving characterised by the action undertaken by task scheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3296—Power saving characterised by the action undertaken by lowering the supply or operating voltage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4893—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues taking into account power or heat criteria
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5094—Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
- H04L41/0833—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability for reduction of network energy consumption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/70—Admission control; Resource allocation
- H04L47/78—Architectures of resource allocation
- H04L47/783—Distributed allocation of resources, e.g. bandwidth brokers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/70—Admission control; Resource allocation
- H04L47/82—Miscellaneous aspects
- H04L47/821—Prioritising resource allocation or reservation requests
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/70—Admission control; Resource allocation
- H04L47/83—Admission control; Resource allocation based on usage prediction
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S40/00—Systems for electrical power generation, transmission, distribution or end-user application management characterised by the use of communication or information technologies, or communication or information technology specific aspects supporting them
Definitions
- Embodiments of the invention relate to the field of computer systems; and more specifically, to methods and systems for power management and monitoring of high performance computing systems.
- A High Performance Computing (HPC) system performs parallel computing by simultaneous use of multiple nodes to execute a computational assignment referred to as a job.
- Each node typically includes processors, memory, an operating system, and I/O components.
- The nodes communicate with each other through a high speed network fabric and may use shared file systems or storage.
- A job is divided into thousands of parallel tasks distributed over thousands of nodes, and these tasks synchronize with each other hundreds of times a second.
- An HPC system can consume megawatts of power.
- RAPL: Running Average Power Limit
- NM: Node Manager
- DCM: Datacenter Manager
- FIG. 1 illustrates an exemplary block diagram of an overall architecture of a power management and monitoring system in accordance with one embodiment.
- FIG. 2 illustrates an exemplary block diagram of overall interaction architecture of HPC Power-Performance Manager in accordance with one embodiment.
- FIG. 3 illustrates an exemplary block diagram showing an interaction between the HPC facility power manager and other components of the HPC facility.
- FIG. 4 illustrates an exemplary block diagram showing an interaction between the HPC System Power Manager with a Rack Manager and a Node Manager.
- FIG. 5 illustrates the HPPM response mechanism at the node level in the case of power delivery or cooling failures.
- FIG. 6 illustrates an exemplary block diagram of a HPC system receiving various policy instructions.
- FIG. 7 illustrates an exemplary block diagram showing the interaction between the HPC Resource Manager and other components of the HPC System.
- FIG. 8 illustrates an exemplary block diagram of the interaction of the Job Manager with Power Aware Job Launcher according to power performance policies.
- FIG. 9 illustrates one embodiment of a process for power management and monitoring of high performance computing systems.
- FIG. 10 illustrates another embodiment of a process for power management and monitoring of high performance computing systems.
- References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
- Embodiments described herein relate to power management and monitoring for high performance computing systems.
- A framework for workload-aware, hierarchical, and holistic management and monitoring of power and performance is disclosed.
- FIG. 1 illustrates an example of power management and monitoring system for HPC systems according to one embodiment.
- The system is referred to herein as an HPC Power-Performance Manager (HPPM).
- HPC System 400 includes multiple components, including Resource Manager 410, Job Manager 420, Datacenter Manager 310, Rack Manager 430, Node Manager 431, and Thermal Control 432.
- HPPM receives numerous power performance policy inputs at different stages of the management.
- Power performance policies include a facility policy, a utility provider policy, a facility administrative policy, and a user policy.
- HPC System Power Manager 300 communicates the capacity and requirements of HPC System 400 to HPC Facility Power Manager 200 .
- HPC Facility Power Manager 200 then communicates the power allocated by the utility provider back to HPC System Power Manager 300 .
- HPC System Power Manager 300 also receives administrative policies from HPC System Administrator 202 .
- HPC System Power Manager 300 receives the power and thermal capacity of HPC System 400 and maintains the average power consumption of HPC System 400 at or below the allocation.
- A soft limit is defined in part by the power available for allocation.
- The soft limit includes the power allocated to each HPC system within HPPM and the power allocated to each job.
- Job Manager 420 enforces the soft limit for each job based on the power consumption of each node.
- A hard limit is defined by the power and thermal capacity of the cooling and power delivery infrastructures.
- The hard limit defines the power and cooling capability available for the nodes, racks, systems, and datacenters within an HPC facility.
- The cooling and power infrastructures may or may not be shared by different elements of the HPC facility.
- The hard limit fluctuates in response to failures in the cooling and power delivery infrastructures, while the soft limit must remain at or below the hard limit at all times.
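The relationship between the soft and hard limits described above can be sketched as follows. This is an illustrative model only, not code from the patent; the class and method names are invented.

```python
class PowerDomain:
    """Tracks a hard limit (physical power/cooling capacity) and a soft
    limit (power allocation) for one component of the hierarchy."""

    def __init__(self, hard_limit_watts: float):
        self.hard_limit = hard_limit_watts
        self.soft_limit = hard_limit_watts

    def set_soft_limit(self, watts: float) -> None:
        # The soft limit must remain at or below the hard limit at all times.
        self.soft_limit = min(watts, self.hard_limit)

    def on_infrastructure_failure(self, new_hard_limit_watts: float) -> None:
        # A cooling or power delivery failure lowers the hard limit;
        # the soft limit is clamped down with it.
        self.hard_limit = new_hard_limit_watts
        self.soft_limit = min(self.soft_limit, self.hard_limit)


domain = PowerDomain(hard_limit_watts=100_000.0)
domain.set_soft_limit(120_000.0)            # clamped to the 100 kW hard limit
domain.on_infrastructure_failure(80_000.0)  # hard limit drops; soft limit follows
```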
- HPC System Power Manager 300 uses Out of Band mechanism 301 (e.g., Node Manager 431, Thermal Control 432, Rack Manager 430, and Datacenter Manager 310) to monitor and manage the hard limit for each component.
- Out of Band mechanism 301, unlike In Band mechanism 302, uses an independent embedded controller outside the system with independent networking capability to perform its operation.
- HPC System Power Manager 300 allocates power to the jobs. In one embodiment, the allocation of power to the jobs is based on the dynamic monitoring and power-aware management of Resource Manager 410 and Job Manager 420, further described below. In one embodiment, Resource Manager 410 and Job Manager 420 are operated by In Band mechanism 302. In one embodiment, In Band mechanism 302 uses the system network and software for monitoring, communication, and execution.
- An advantage of embodiments described herein is that power consumption is managed by allocating power to the jobs. As such, power is allocated in a way that significantly reduces performance variation among the nodes and thereby improves job completion time. In other words, the power allocated to a particular job is distributed among the nodes dedicated to running that job so as to achieve increased performance.
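One way to realize such a per-node distribution can be sketched as follows; this is a hypothetical illustration, not the patent's algorithm, and the calibration figures are invented. Nodes that need more power to sustain a common target frequency (e.g., due to manufacturing variation) receive proportionally more of the job's budget, which evens out performance across the nodes.

```python
def distribute_job_power(job_budget_watts, node_power_at_target_freq):
    """Scale each node's calibrated power need so the total fits the
    job budget while preserving the per-node proportions."""
    total_need = sum(node_power_at_target_freq.values())
    scale = min(1.0, job_budget_watts / total_need)
    return {node: need * scale for node, need in node_power_at_target_freq.items()}


# Invented calibration data: watts each node needs at the target frequency.
needs = {"node0": 300.0, "node1": 330.0, "node2": 270.0}
alloc = distribute_job_power(540.0, needs)  # budget below total need: scale down
```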
- FIG. 2 illustrates an example of interactions between different components of HPC Power-Performance Manager 100 . It is pointed out that those elements of FIG. 2 having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such.
- The lines connecting the blocks represent communication between different components of an HPPM.
- These communications include, for example, communicating the soft and hard limits for each component of HPPM 100, reporting the power and thermal status of the components, reporting failures of power and thermal infrastructures, and communicating the available power for the components.
- HPPM 100 includes multiple components divided among multiple datacenters within an HPC facility. HPPM 100 also includes power and cooling resources shared by the components.
- Each datacenter includes a plurality of server racks, and each server rack includes a plurality of nodes.
- HPPM 100 manages power and performance of the system by forming a dynamic hierarchical management and monitoring structure.
- The power and thermal status of each layer is regularly monitored by a managing component and reported to a higher layer.
- The managing component of the higher layer aggregates the power and thermal conditions of its lower components and reports them to its higher layer. Conversely, the higher managing component ensures that the allocation of power to its lower layers is based upon the current power and thermal capacity of their components.
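The layered monitoring described above can be sketched as a tree in which each managing component sums the readings of its lower layer; this is an illustrative model with invented names, not the patent's implementation.

```python
class ManagingComponent:
    def __init__(self, name, children=None, node_power_watts=0.0):
        self.name = name
        self.children = children or []            # lower-layer components
        self.node_power_watts = node_power_watts  # leaf reading (node manager)

    def aggregate_power(self) -> float:
        # A node manager reports its own reading; a higher layer
        # aggregates the readings of its lower components.
        if not self.children:
            return self.node_power_watts
        return sum(child.aggregate_power() for child in self.children)


nodes = [ManagingComponent(f"node{i}", node_power_watts=250.0) for i in range(4)]
rack = ManagingComponent("rack0", children=nodes)
datacenter = ManagingComponent("dc0", children=[rack])
total = datacenter.aggregate_power()  # 1000.0 W reported up the hierarchy
```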
- HPC Facility Power Manager 200 distributes power to multiple datacenters and resources shared within the HPC facility.
- HPC Facility Power Manager 200 receives the aggregated report of the power and thermal conditions of the HPC facility from Datacenter Manager 210 .
- Datacenter Manager 210 is the highest managing component of HPPM 100 .
- Datacenter Manager 210 is the higher managing component of a plurality of datacenters.
- Each datacenter is managed by a datacenter manager, such as for example, Datacenter Manager 310 .
- Datacenter Manager 310 is the higher managing component of a plurality of server racks. Each server rack includes a plurality of nodes.
- In some embodiments Datacenter Manager 310 is a managing component for the nodes of all or part of a server rack, while in other embodiments Datacenter Manager 310 is a managing component for nodes of multiple racks.
- Each node is managed by a node manager.
- Each of Nodes 500 is managed by Node Manager 431.
- Node Manager 431 monitors and manages power consumption and thermal status of its associated node.
- Datacenter Manager 310 is also a higher managing component for the power and cooling resources shared by a plurality of the nodes.
- Each shared power and cooling resource is managed by a rack manager, for example the Rack Manager 430 .
- A plurality of nodes may share multiple power and cooling resources, each managed by a rack manager.
- HPC Facility Power Manager 200 sends the capacity and requirements of the HPC facility to a utility provider.
- HPC Facility Power Manager 200 distributes the power budget to the HPC System Power Manager associated with each HPC system (e.g., HPC System Power Manager 300).
- HPC System Power Manager 300 determines how much power to allocate to each job.
- Job Manager 420 manages power performance of a job within the budget allocated by the HPC System Power Manager 300 .
- Job Manager 420 manages a job throughout its life cycle by controlling the power allocation and frequencies of Nodes 500 .
- If a power or thermal failure occurs on any lower layer of Datacenter Manager 310, Datacenter Manager 310 immediately warns HPC System Power Manager 300 of the change in power or thermal capacity. Subsequently, HPC System Power Manager 300 adjusts the power consumption of the HPC system by changing the power allocation to the jobs.
- FIG. 3 demonstrates the role of HPC Facility Power Manager 200 in more detail. It is pointed out that those elements of FIG. 3 having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such.
- HPC Facility 101 includes HPC Facility Power Manager 200 , Power Generator and Storage 210 , Power Convertor 220 , Cooling System 230 that may include storage of a cooling medium, and several HPC systems including the HPC System 400 .
- Each HPC system is managed by a HPC System Power Manager (e.g., HPC System Power Manager 300 manages HPC System 400 ).
- HPC Facility Power Manager 200 manages the power consumption of HPC Facility 101 .
- HPC Facility Power Manager 200 receives facility level policies from the Facility Administrator 102 .
- The facility-level policies relate to selecting a local source of power, environmental considerations, and the overall operation policy of the facility.
- HPC Facility Power Manager 200 also communicates with Utility Provider 103 .
- HPC Facility Power Manager 200 communicates the forecasted capacity and requirements of HPC Facility 101 to Utility Provider 103 in advance.
- HPC Facility 101 uses a Demand/Response interface to communicate with Utility Provider 103.
- The Demand/Response interface provides a non-proprietary interface that allows Utility Provider 103 to send signals about electricity price and grid reliability directly to customers, e.g., HPC Facility 101.
- Dynamic monitoring allows HPC Facility Power Manager 200 to more accurately estimate the required power and to communicate its capacity and requirements automatically to Utility Provider 103.
- This method allows cost to be optimized based on real-time prices and reduces the disparity between the power allocated by Utility Provider 103 and the power actually used by HPC Facility 101.
- HPPM determines a power budget at a given time based upon the available power from Utility Provider 103, the cost of that power, the available power in the local Power Generator and Storage 210, and the actual demand of the HPC systems. In one embodiment, HPPM substitutes energy from the local storage or electricity generators for energy from the utility provider. In one embodiment, HPPM receives the current price of electricity and makes the electricity produced by Power Generator and Storage 210 available for sale on the market.
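A minimal sketch of such a budget decision follows, assuming a simple price-threshold rule that is not specified in the patent; the function and parameter names are invented.

```python
def compute_power_budget(utility_available_w, utility_price_per_kwh,
                         local_available_w, demand_w, price_threshold):
    """Return (budget, from_local, from_utility) in watts: cap the budget
    at actual demand and prefer local generation/storage when utility
    power is expensive."""
    budget = min(demand_w, utility_available_w + local_available_w)
    if utility_price_per_kwh > price_threshold:
        from_local = min(local_available_w, budget)           # prefer local energy
    else:
        from_local = max(0.0, budget - utility_available_w)   # utility power first
    return budget, from_local, budget - from_local


# Expensive utility power: draw the full 500 W of local capacity first.
budget, from_local, from_utility = compute_power_budget(
    utility_available_w=1000.0, utility_price_per_kwh=0.20,
    local_available_w=500.0, demand_w=1200.0, price_threshold=0.10)
```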
- FIG. 4 illustrates how HPC System Power Manager 300 manages shared power supply among nodes using a combination of Rack Manager 430 and Node Manager 440 . It is pointed out that those elements of FIG. 4 having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such.
- Rack Manager 430 reports the status of the shared resources and receives power limits from Datacenter Manager 310 .
- Node Manager 440 reports node power consumption and receives node power limits from Datacenter Manager 310.
- Datacenter Manager 310 reports system power consumption to HPC System Power Manager 300 .
- The communication between HPC System Power Manager 300 and Datacenter Manager 310 facilitates monitoring of the cooling and power delivery infrastructure in order to maintain the power consumption within the hard limit.
- HPC System Power Manager 300 maintains the power consumption of the nodes or processors by adjusting the power allocated to them.
- The hard limit is reduced automatically by either or both of Rack Manager 430 and Node Manager 440 to a lower limit to avoid a complete failure of the power supply. The sudden reduction of available power is then reported to HPC System Power Manager 300 through Datacenter Manager 310 by either or both of Rack Manager 430 and Node Manager 440, so that HPC System Power Manager 300 can readjust the power allocation accordingly.
- FIG. 5 illustrates the HPPM response mechanism at the node level in the case of power delivery or cooling failures. It is pointed out that those elements of FIG. 5 having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such.
- A cooling or power delivery failure does not impact all nodes equally.
- When Node Manager 431 identifies the impacted nodes, for example Nodes 500, it adjusts the associated hard limit for Nodes 500.
- This hard limit is then communicated to Job Manager 420.
- Job Manager 420 adjusts the soft limit associated with Nodes 500 to maintain both the soft limit and the power consumption of Nodes 500 at or below the hard limit.
- The communication between Node Manager 431 and Job Manager 420 occurs at millisecond intervals.
- Node Manager 431 directly alerts Nodes 500.
- The alert imposes a restriction on Nodes 500 and causes an immediate reduction of power consumption by Nodes 500. In one embodiment, such a reduction could be more than necessary to avoid further power failures.
- Node Manager 431 communicates the new hard limit to Job Manager 420.
- Job Manager 420 adjusts the soft limits of Nodes 500 to maintain the power consumption of Nodes 500 at or below the hard limit.
- Job Manager 420 enforces the new hard limit and removes the alert asserted by Node Manager 431.
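The two-step response above (an immediate, possibly over-conservative restriction by the node manager, followed by a calibrated soft limit and alert removal by the job manager) might be sketched as follows; all names and the 150 W safety cap are illustrative assumptions, not values from the patent.

```python
class Node:
    def __init__(self, power_cap_watts=350.0):
        self.power_cap_watts = power_cap_watts
        self.alert_active = False


def node_manager_alert(node, new_hard_limit_watts, safety_cap_watts=150.0):
    """Step 1: assert an alert that immediately restricts the node,
    possibly more than necessary, and report the new hard limit."""
    node.alert_active = True
    node.power_cap_watts = min(safety_cap_watts, new_hard_limit_watts)
    return new_hard_limit_watts  # communicated to the job manager


def job_manager_enforce(node, hard_limit_watts):
    """Step 2: set a soft limit at or below the new hard limit and
    remove the alert asserted by the node manager."""
    node.power_cap_watts = hard_limit_watts
    node.alert_active = False
```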
- HPC system 400 includes one or more operating system (OS) nodes 501, one or more compute nodes 502, one or more input/output (I/O) nodes 503, and a storage system 504.
- The high-speed fabric 505 communicatively connects the OS nodes 501, compute nodes 502, I/O nodes 503, and storage system 504.
- The high-speed fabric may be a network topology of nodes interconnected via one or more switches.
- I/O nodes 503 are communicatively connected to storage 504.
- Storage 504 may be non-persistent storage such as volatile memory (e.g., any type of random access memory “RAM”) or persistent storage such as non-volatile memory (e.g., read-only memory “ROM”, power-backed RAM, flash memory, phase-change memory, etc.), a solid-state drive, a hard disk drive, an optical disc drive, or a portable memory device.
- the OS nodes 501 provide a gateway to accessing the compute nodes 502 .
- A user may be required to log in to HPC system 400, which may be done through OS nodes 501.
- OS nodes 501 accept jobs submitted by users and assist in the launching and managing of jobs being processed by compute nodes 502 .
- compute nodes 502 provide the bulk of the processing and computational power.
- I/O nodes 503 provide an interface between compute nodes 502 and external devices (e.g., separate computers) that provide input to HPC system 400 or receive output from HPC system 400.
- The limited power allocated to HPC system 400 is used to run one or more of Jobs 520.
- Jobs 520 comprise one or more jobs requested to be run on HPC system 400 by one or more users, for example User 201.
- Each job includes a power policy, which will be discussed in depth below. The power policy assists the HPC System Power Manager in allocating power for the job and aids in the management of the one or more Jobs 520 being run by HPC system 400.
- HPC System Administrator 202 provides administrative policies to guide the management of running jobs 520 by providing an over-arching policy that defines the operation of HPC system 400 .
- Examples of policies in the administrative policies include, but are not limited or restricted to, (1) a policy to increase utilization of all hardware and software resources (e.g., instead of running fewer jobs at high power and leaving resources unused, run as many jobs as possible to use as much of the resources as possible); (2) a policy that a job with no power limit is given the highest priority among all running jobs; and/or (3) a policy that suspended jobs are at higher priority for resumption.
- Such administrative policies govern the way the HPC System Power Manager schedules, launches, suspends and re-launches one or more jobs.
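An ordering consistent with example policies (2) and (3) above might look like the following sketch; the job fields and function name are invented for illustration.

```python
def schedule_order(jobs):
    """Order jobs for the scheduler: unlimited-power jobs first, then
    suspended jobs awaiting resumption, then everything else."""
    def priority(job):
        if job.get("no_power_limit"):
            return 0   # highest priority among all running jobs
        if job.get("suspended"):
            return 1   # higher priority for resumption
        return 2
    return sorted(jobs, key=priority)  # stable sort keeps submission order


jobs = [{"name": "a"},
        {"name": "b", "suspended": True},
        {"name": "c", "no_power_limit": True}]
ordered = schedule_order(jobs)  # c, then b, then a
```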
- User 201 policy can be specific to a particular job. User 201 can instruct HPC System 400 to run a particular job with no power limit or according to a customized policy. Additionally, User 201 can set the energy policy of a particular job, for example, maximum efficiency or highest performance.
- HPC System Administrator 202 and User 201 communicate their policies to the HPC System Power Manager 300 and Resource Manager 410 .
- Resource Manager 410 receives these policies and formulates them into "modes" under which Job Manager 420 instructs OS Nodes 501, Compute Nodes 502, and I/O Nodes 503 to operate.
- FIG. 7 shows the flow of information between Resource Manager 410 (including Power Aware Job scheduler 411 , and Power Aware Job launcher 412 ) and other elements of the HPPM (HPC System Power Manager 300 , Estimator 413 , Calibrator 414 , and Job Manager 420 ).
- the purpose of these communications is to allocate sufficient hardware resources (e.g., nodes, processors, memories, network bandwidth, etc.) and to schedule execution of appropriate jobs.
- power is allocated to the jobs in such a way as to maintain HPC System 400 power within the limits, increase energy efficiency, and control the rate of change of HPC system 400 power consumption.
- HPC System Power Manager 300 communicates with Resource Manager 410 .
- Power Aware Job scheduler 411 considers the policies and priorities of Facility Administrator 102, Utility Provider 103, User 201, and HPC System Administrator 202 and determines accordingly what hardware resources of HPC System 400 are needed to run a particular job. Additionally, Power Aware Job scheduler 411 receives power-performance characteristics of the job at different operating points from Estimator 413 and Calibrator 414. Resource Manager 410 forecasts how much power a particular job needs and takes corrective action when actual power differs from the estimate.
- Estimator 413 provides Resource Manager 410 with estimates of power consumption for each job enabling Resource Manager 410 to efficiently schedule and monitor each job requested by one or more job owners (e.g., users). Estimator 413 provides a power consumption estimate based on, for example, maximum and average power values stored in a calibration database, wherein the calibration database is populated by the processing of Calibrator 414 . In addition, the minimum power required for each job is considered.
- Estimator 413 bases its estimate on the job power policy limiting the power supplied to the job (e.g., a predetermined fixed frequency at which the job will run, a minimum power required for the job, or varying frequencies and/or power supplied as determined by Resource Manager 410), the startup power for the job, the frequency at which the job will run, the power available to HPC System 400, and/or the power allocated to HPC System 400.
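One rough sketch of such an estimate follows. The database layout, the names, and the `bound` parameter are assumptions made for illustration: the calibrated per-node values at the job's frequency are summed and the startup power is added on top.

```python
# Hypothetical calibration database populated by Calibrator 414:
# (node, frequency_hz) -> measured watts at that operating point.
CALIBRATION_DB = {
    ("node0", 2.0e9): {"min": 120.0, "avg": 180.0, "max": 220.0},
    ("node1", 2.0e9): {"min": 125.0, "avg": 185.0, "max": 230.0},
}

def estimate_job_power(nodes, frequency, startup_power=0.0, bound="max"):
    """Sum calibrated per-node power at the requested frequency.

    `bound` selects the calibrated value to use: "max" gives a safe
    upper bound for admission, "avg" an expected value for monitoring,
    and "min" the floor below which the job cannot run.
    """
    total = startup_power
    for node in nodes:
        total += CALIBRATION_DB[(node, frequency)][bound]
    return total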
- Calibrator 414 calibrates the power, thermal dissipation and performance of each node within HPC System 400 .
- Calibrator 414 provides a plurality of methods for calibrating the nodes within HPC system 400 .
- Calibrator 414 provides a first method of calibration in which every node within HPC system 400 runs sample workloads (e.g., a mini-application and/or a test script) so Calibrator 414 may sample various parameters (e.g., power consumed) at predetermined time intervals in order to determine, inter alia, (1) the average power, (2) the maximum power, and (3) the minimum power for each node.
- the sample workload is run on each node at every operating frequency of the node.
- Calibrator 414 provides a second method of calibration in which calibration of one or more nodes occurs during the run-time of a job.
- Calibrator 414 samples the one or more nodes on which a job is running (e.g., processing).
- Calibrator 414 obtains power measurements of each node during actual run-time.
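The first calibration method (sample workload at every operating frequency) can be sketched as below. `read_power` and `run_sample` stand in for platform-specific hooks and are assumptions, not a real API:

```python
import time

def calibrate_node(node_id, frequencies, read_power, run_sample,
                   samples=10, interval=0.0):
    """Run a sample workload at each operating frequency and record the
    minimum, average, and maximum observed power for the node."""
    results = {}
    for freq in frequencies:
        run_sample(node_id, freq)          # launch mini-application at freq
        readings = []
        for _ in range(samples):
            readings.append(read_power(node_id))
            time.sleep(interval)           # predetermined sampling interval
        results[freq] = {
            "min": min(readings),
            "avg": sum(readings) / len(readings),
            "max": max(readings),
        }
    return results
```

The second (run-time) method would use the same sampling loop against the nodes a live job occupies, skipping the `run_sample` step.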
- Power Aware Job Scheduler 411 is configured to receive a selection of a mode for a job, to determine an available power for the job based on the mode and to allocate a power for the job based on the available power. In one embodiment, Power Aware Job Scheduler 411 is configured to determine a uniform frequency for the job based on the available power. In one embodiment, the power aware job scheduler is configured to determine the available power for the job based on at least one of a monitored power, an estimated power, and a calibrated power.
- a user submits a program to be executed (“job”) to a queue.
- job queue refers to a data structure containing jobs to run.
- Power Aware Job Scheduler 411 examines the job queue at appropriate times (periodically or at certain events, e.g., termination of previously running jobs) and determines whether the resources, including the power needed to run the job, can be allocated. In some cases, such resources can be allocated only at a future time, and in such cases the job is scheduled to run at a designated time in the future.
- Power Aware Job Launcher 412 selects a job among the jobs in the queue, based on available resources and priority, and schedules it to be launched. In one embodiment, in case the available power is limited, Power Aware Job Launcher 412 will examine the operating points and select the one that results in the highest frequency while maintaining power consumption below the limit.
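That operating-point selection can be sketched as follows; the `(frequency_hz, estimated_watts)` pairs are illustrative values, not from the embodiment:

```python
def pick_operating_point(operating_points, power_limit):
    """Return the operating point with the highest frequency whose
    estimated power stays at or below the limit, or None if no point
    is feasible (the job must then be deferred)."""
    feasible = [(f, p) for f, p in operating_points if p <= power_limit]
    if not feasible:
        return None
    return max(feasible)   # tuples compare by frequency first

# Illustrative (frequency, estimated job power) pairs from Estimator 413.
points = [(1.2e9, 900.0), (1.8e9, 1400.0), (2.4e9, 2100.0)]
```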
- FIG. 8 illustrates the interaction of Job Manager 420 with Power Aware Job Launcher 412 according to Power Performance Policies 440 .
- Job Manager 420 manages power performance of the job throughout its life cycle.
- Job Manager 420 is responsible for operating the job within the constraints of one or more power policies and various power limits after the job has been launched.
- a user may designate “special” jobs that are not power limited.
- Power Aware Job scheduler 411 will need to estimate the maximum power the job could consume, and only start the job when that much power is available.
- HPC System Power Manager 300 redistributes power among the normal jobs in order to reduce stranded power and increase efficiency.
- a user may specify the frequency for a particular job.
- user selection may be based upon a table that indicates degradation in performance and reduction in power for each frequency.
- the frequency selection for the jobs can be automated based upon available power.
- Job Manager 420 will adjust the frequency periodically based upon power headroom.
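A minimal sketch of such a periodic adjustment, assuming a discrete frequency ladder and an illustrative anti-oscillation margin (both assumptions, not from the embodiment):

```python
def adjust_frequency(current_freq, frequencies, job_power, job_budget,
                     step_margin=0.05):
    """Step the job's frequency down when it exceeds its power budget,
    and up when there is ample headroom. The margin keeps the job from
    oscillating around the budget boundary."""
    ladder = sorted(frequencies)
    i = ladder.index(current_freq)
    headroom = job_budget - job_power
    if headroom < 0 and i > 0:
        return ladder[i - 1]                          # over budget: step down
    if headroom > step_margin * job_budget and i < len(ladder) - 1:
        return ladder[i + 1]                          # ample headroom: step up
    return current_freq
```

Job Manager 420 would invoke such a function each monitoring period with the latest measured job power.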
- An advantage of the embodiments described herein is that a job will be allowed to operate at all available frequencies. Job Manager 420 will determine the best mode in which to run the job based upon the policies and priorities communicated by Facility Administrator 102, Utility Provider 103, User 201, and HPC System Administrator 202.
- FIG. 9 is a flow diagram of one embodiment of a process for managing power and performance of HPC systems.
- the process is performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), firmware or a combination of the three.
- HPPM communicates capacity and requirements of the HPC system to a utility provider.
- the capacity of the HPC system is determined based on the cooling and power delivery capacity of the HPC system.
- the HPPM communicates its capacity and requirements to the utility provider through a demand/response interface.
- the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and input from the utility provider.
- the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism.
- HPPM determines a power budget for the HPC system.
- the power budget is determined based on the cooling and power delivery capacity of the HPC system.
- the power budget is determined based on the power performance policies.
- the power performance policies are based on at least one of a facility policy, a utility provider policy, a facility administrative policy, and a user policy.
- HPPM determines a power and cooling capacity of the HPC system.
- determining the power and cooling capacity of the HPC system includes monitoring and reporting failures of power delivery and cooling infrastructures. In one embodiment, in case of a failure the power consumption is adjusted accordingly. In one embodiment, determining the power and cooling capacity of the HPC system is performed by an out of band mechanism.
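One way to sketch this determination is shown below. The failure-report format and derate fractions are assumptions for illustration: each reported failure derates the affected infrastructure, and the usable capacity is the tighter of power delivery and cooling.

```python
def effective_capacity(nominal_power_w, nominal_cooling_w, failures):
    """Return usable system capacity (watts) after applying reported
    infrastructure failures, as an out-of-band monitor might compute it.
    `failures` is a list of (kind, lost_fraction) pairs, e.g.
    ("cooling", 0.25) for the loss of a quarter of cooling capacity."""
    power = nominal_power_w
    cooling = nominal_cooling_w
    for kind, fraction in failures:
        if kind == "power":
            power *= (1.0 - fraction)
        elif kind == "cooling":
            cooling *= (1.0 - fraction)
    # The system can only dissipate what it can both deliver and cool.
    return min(power, cooling)
```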
- HPPM allocates the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system.
- allocating the power budget to the job is based on power performance policies.
- allocating the power budget to the job is based on an estimate of power required to execute the job.
- the estimate of the required power to execute the job is based on at least one of a monitored power, an estimated power, and a calibrated power.
- HPPM executes the job on selected HPC nodes.
- the selected HPC nodes are selected based on power performance policies.
- the selected HPC nodes are selected based on power characteristics of the nodes.
- the power characteristics of the HPC nodes are determined based on running of a sample workload.
- the power characteristics of the HPC nodes are determined during runtime. In one embodiment, the job is executed on the selected HPC nodes based on power performance policies.
- FIG. 10 is a flow diagram of one embodiment of a process for managing power and performance of HPC systems.
- the process is performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), firmware or a combination of the three.
- HPPM defines a hard power limit based on a thermal and power delivery capacity of a HPC facility.
- the hard power limit is managed and monitored by an out of band mechanism.
- the hard power limit decreases in response to failures of the power and cooling infrastructures of the HPC facility.
- HPPM defines a soft power limit based on a power budget allocated to the HPC facility.
- the power budget for the HPC facility is provided by a utility provider through a demand/response interface.
- the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and input from the utility provider.
- the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism.
- HPPM allocates the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit.
- allocating the power budget to the job is based on power performance policies.
- allocating the power budget to the job is based on an estimate of power required to execute the job.
- the estimate of the required power to execute the job is based on at least one of a monitored power, an estimated power, and a calibrated power.
- HPPM executes the job on nodes while maintaining the soft power limit at or below the hard power limit.
- allocating the power budget to the job and executing the job on the nodes is according to power performance policies.
- the power performance policies are based on at least one of a HPC facility policy, a utility provider policy, a HPC administrative policy, and a user policy.
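The two limits of FIG. 10 interact as sketched below (the thresholds are illustrative): the soft limit is clamped so it never exceeds the hard limit, and a job is admitted only while total consumption would stay within the clamp.

```python
def allocate_budget(requested_w, running_total_w, soft_limit_w, hard_limit_w):
    """Admission check sketch. The soft limit (utility power budget) is
    clamped to the hard limit (facility thermal/delivery capacity, which
    may have decreased after infrastructure failures); a job is admitted
    only if total consumption stays at or below the clamped limit."""
    limit = min(soft_limit_w, hard_limit_w)   # soft limit never exceeds hard
    if running_total_w + requested_w <= limit:
        return True                            # admit the job
    return False                               # defer until power frees up
```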
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes are selected based on power performance policies.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes are selected based on power characteristics of the nodes.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes are selected based on power characteristics of the nodes determined based on running of a sample workload.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes are selected based on power characteristics of the nodes determined during runtime.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the job is executed on the selected HPC nodes based on power performance policies.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein allocating the power budget to the job is based on power performance policies.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the allocating the power budget to the job is based on an estimate of power required to execute the job.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the allocating the power budget to the job is based on an estimate of power required to execute the job determined based on at least one of a monitored power, an estimated power, and a calibrated power.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power budget for the HPC system is based on the power and cooling capacity of the HPC system.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power budget for the HPC system is performed by communicating to a utility provider through a demand/response interface.
- the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and inputs from the utility provider. In one embodiment, the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power and cooling capacity of the HPC system includes monitoring and reporting failures of power delivery and cooling infrastructures.
- the method further comprises adjusting the power consumption of the HPC system in response to the failure of the power and cooling infrastructures.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining a power and cooling capacity of the HPC system is performed by an out of band mechanism.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, defining a hard power limit based on a thermal and power delivery capacity of a HPC facility, wherein the HPC facility includes a plurality of HPC systems, and the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, defining a soft power limit based on a power budget allocated to the HPC facility, allocating the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit, executing the job on nodes while maintaining the soft power limit at or below the hard power limit, and allocating the power budget to the job and executing the job on the nodes according to power performance policies.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, defining a hard power limit based on a thermal and power delivery capacity of a HPC facility, wherein the HPC facility includes a plurality of HPC systems, and the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, defining a soft power limit based on a power budget allocated to the HPC facility, allocating the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit, executing the job on nodes while maintaining the soft power limit at or below the hard power limit, and allocating the power budget to the job and executing the job on the nodes according to power performance policies, wherein the hard power limit decreases in response to failures of the power and cooling infrastructures of the HPC facility.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, defining a hard power limit based on a thermal and power delivery capacity of a HPC facility, wherein the HPC facility includes a plurality of HPC systems, and the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, defining a soft power limit based on a power budget allocated to the HPC facility, allocating the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit, executing the job on nodes while maintaining the soft power limit at or below the hard power limit, and allocating the power budget to the job and executing the job on the nodes according to power performance policies, wherein allocating the power budget to the job is based on an estimate of a required power to execute the job.
- a method of managing power and performance of a High-performance computing (HPC) system comprising, defining a hard power limit based on a thermal and power delivery capacity of a HPC facility, wherein the HPC facility includes a plurality of HPC systems, and the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, defining a soft power limit based on a power budget allocated to the HPC facility, allocating the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit, executing the job on nodes while maintaining the soft power limit at or below the hard power limit, and allocating the power budget to the job and executing the job on the nodes according to power performance policies, wherein the hard power limit is managed by an out of band mechanism.
- the power performance policies are based on at least one of a HPC facility policy, a utility provider policy, a HPC administrative policy, and a user policy.
- a computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes.
- a computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes to execute the job are selected based in part on power performance policies.
- a computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes to execute the job are selected based in part on power characteristics of the nodes.
- the power characteristics of the HPC nodes are determined upon running a sample workload. In another embodiment, the power characteristics of the HPC nodes are determined during an actual runtime.
- a computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the job is executed on the selected HPC nodes based in part upon power performance policies.
- a computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein allocating the power budget to the job is based in part upon power performance policies.
- a computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein allocating the power budget to the job is based in part on an estimate of a required power to execute the job.
- the estimate of the required power to execute the job is in part based upon at least one of a monitored power, an estimated power, and a calibrated power.
- a computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power budget for the HPC system is in part based upon the power and cooling capacity of the HPC system.
- a computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power budget for the HPC system is performed in part by communicating to a utility provider through a demand/response interface.
- the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and inputs from the utility provider. In one embodiment, the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism.
- a computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power and cooling capacity of the HPC system includes monitoring and reporting failures of power delivery and cooling infrastructures.
- the method further comprises adjusting the power consumption of the HPC system in response to the failure of the power and cooling infrastructures.
- a computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining a power and cooling capacity of the HPC system is performed by an out of band system.
- a system for managing power and performance of a High-performance computing (HPC) system comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes.
- a system for managing power and performance of a High-performance computing (HPC) system comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the HPC facility manager, the HPC system manager, and the job manager are governed by power performance policies.
- the power performance policies are in part based upon at least one of a HPC facility policy, a utility provider policy, a HPC administrative policy, and a user policy.
- a system for managing power and performance of a High-performance computing (HPC) system comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the HPC system manager selects the selected HPC nodes to execute the job based in part on power characteristics of the nodes.
- a calibrator runs a sample workload on the HPC nodes and reports the power characteristics of the HPC nodes to the HPC system manager. In another embodiment, a calibrator determines the power characteristics of the HPC nodes during an actual runtime and reports it to the HPC system manager.
- a system for managing power and performance of a High-performance computing (HPC) system comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the HPC system manager allocates power to the job based in part on an estimated power required to run the job.
- an estimator calculates the estimated power required to run the job in part based upon at least one of a monitored power, an estimated power, and a calibrated power.
- a system for managing power and performance of a High-performance computing (HPC) system comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the out of band mechanism monitors and reports failures of power delivery and cooling infrastructures of a HPC facility to the HPC facility manager.
- a system for managing power and performance of a High-performance computing (HPC) system comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the out of band mechanism monitors and reports failures of power delivery and cooling infrastructures of the HPC system to the HPC system manager.
- a system for managing power and performance of a High-performance computing (HPC) system comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the HPC facility manager communicates capacity and requirements of the HPC system to a utility provider through a demand/response interface.
- the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and inputs from the utility provider. In one embodiment, the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Business, Economics & Management (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- Water Supply & Treatment (AREA)
- Tourism & Hospitality (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Public Health (AREA)
- General Business, Economics & Management (AREA)
- Automation & Control Theory (AREA)
- Environmental & Geological Engineering (AREA)
- Power Sources (AREA)
- Supply And Distribution Of Alternating Current (AREA)
- Debugging And Monitoring (AREA)
Abstract
A method of managing power and performance of a High-performance computing (HPC) system, including: determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes is shown.
Description
- The present application claims the benefit of prior U.S. Provisional Patent Application No. 62/040,576, entitled “SIMPLE POWER-AWARE SCHEDULER TO LIMIT POWER CONSUMPTION BY HPC SYSTEM WITHIN A BUDGET” filed on Aug. 22, 2014, which is hereby incorporated by reference in its entirety.
- The present application is related to the U.S. patent application Ser. No. ______ (Attorney Docket No. 42P73498) entitled ______ filed ______; the U.S. patent application Ser. No. ______ (Attorney Docket No. 42P74562) entitled ______ filed ______; the U.S. patent application Ser. No. ______ (Attorney Docket No. 42P74563) entitled ______ filed ______; the U.S. patent application Ser. No. ______ (Attorney Docket No. 42P74564) entitled ______ filed ______; the U.S. patent application Ser. No. ______ (Attorney Docket No. 42P74565) entitled ______ filed ______; the U.S. patent application Ser. No. ______ (Attorney Docket No. 42P74566) entitled ______ filed ______; the U.S. patent application Ser. No. ______ (Attorney Docket No. 42P74568) entitled ______ filed ______; and the U.S. patent application Ser. No. ______ (Attorney Docket No. 42P74569) entitled “A POWER AWARE JOB SCHEDULER AND MANAGER FOR A DATA PROCESSING SYSTEM”, filed ______.
- Embodiments of the invention relate to the field of computer systems; and more specifically, to the methods and systems of power management and monitoring of high performance computing systems.
- A High Performance Computing (HPC) system performs parallel computing by the simultaneous use of multiple nodes to execute a computational assignment referred to as a job. Each node typically includes processors, memory, an operating system, and I/O components. The nodes communicate with each other through a high speed network fabric and may use shared file systems or storage. The job is divided into thousands of parallel tasks distributed over thousands of nodes. These tasks synchronize with each other hundreds of times a second. A HPC system can typically consume megawatts of power.
- Growing usage of HPC systems in recent years has made power management a concern in the industry. Future systems are expected to deliver higher performance while operating in a power constrained environment. However, current methods used to manage power and cooling in traditional servers cause a degradation of performance.
- The most commonly used power management systems use an out of band mechanism to enforce both power allocation and system capacity limits. Commonly used approaches to limit the power usage of an HPC system, such as Running Average Power Limit (RAPL), Node Manager (NM), and Datacenter Manager (DCM), use a power capping methodology. These power management systems define and enforce a power cap for each layer of the HPC system (e.g., datacenters, processors, racks, nodes, etc.) based on the limits. However, the power allocation in this methodology is not tailored to increase performance. For example, Node Managers allocate equal power to the nodes within their power budget. However, if nodes under the same power conditions operate at different performance levels, the variation in node performance results in degradation of the overall performance of the HPC system.
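The uniform power-capping behavior critiqued above can be sketched as follows. This is a hypothetical illustration, not the actual RAPL, Node Manager, or DCM interface; the function name and numbers are invented for the example.

```python
def equal_power_caps(total_cap_watts: float, node_count: int) -> list:
    """Divide a layer-level power cap evenly across nodes, as the
    capping schemes described above do within each layer."""
    per_node = total_cap_watts / node_count
    return [per_node] * node_count

# A 3000 W budget over 10 nodes caps every node at 300 W, regardless
# of how efficiently each individual node converts power into work.
caps = equal_power_caps(3000.0, 10)
```

Because a tightly synchronized job runs at the pace of its slowest node, equal caps on nodes with different power characteristics leave performance on the table, which is the problem the embodiments below address.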
- Furthermore, today's HPC facilities communicate their demand for power to utility companies months in advance. Lacking a proper monitoring mechanism to forecast power consumption, such demands are usually made equal to or greater than the maximum power a worst-case workload in the facility could draw. However, the actual power consumption is usually lower, and so the unused power is wasted.
- The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:
- FIG. 1 illustrates an exemplary block diagram of an overall architecture of a power management and monitoring system in accordance with one embodiment.
- FIG. 2 illustrates an exemplary block diagram of the overall interaction architecture of the HPC Power-Performance Manager in accordance with one embodiment.
- FIG. 3 illustrates an exemplary block diagram showing an interaction between the HPC facility power manager and other components of the HPC facility.
- FIG. 4 illustrates an exemplary block diagram showing an interaction of the HPC System Power Manager with a Rack Manager and a Node Manager.
- FIG. 5 illustrates the HPPM response mechanism at a node level in case of power delivery or cooling failures.
- FIG. 6 illustrates an exemplary block diagram of a HPC system receiving various policy instructions.
- FIG. 7 illustrates an exemplary block diagram showing the interaction between the HPC Resource Manager and other components of the HPC System.
- FIG. 8 illustrates an exemplary block diagram of the interaction of the Job Manager with the Power Aware Job Launcher according to power performance policies.
- FIG. 9 illustrates one embodiment of a process for power management and monitoring of high performance computing systems.
- FIG. 10 illustrates another embodiment of a process for power management and monitoring of high performance computing systems.
- The following description describes methods and apparatuses for power management and monitoring of high performance computing systems. In the following description, numerous specific details, such as specific power policies and particular power management devices, are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details.
- References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
- As discussed above, embodiments described herein relate to power management and monitoring for high performance computing systems. According to various embodiments of the invention, a framework for workload-aware, hierarchical, and holistic management and monitoring of power and performance is disclosed.
- FIG. 1 illustrates an example of a power management and monitoring system for HPC systems according to one embodiment. The system is referred to herein as an HPC Power-Performance Manager (HPPM). In this example, HPC System 400 includes multiple components including Resource Manager 410, Job Manager 420, Datacenter Manager 310, Rack Manager 430, Node Manager 431, and Thermal Control 432. In one embodiment, the HPPM receives numerous power performance policy inputs at different stages of the management. In one embodiment, the power performance policies include a facility policy, a utility provider policy, a facility administrative policy, and a user policy.
- HPC System Power Manager 300 communicates the capacity and requirements of HPC System 400 to HPC Facility Power Manager 200. HPC Facility Power Manager 200 then communicates the power allocated by the utility provider back to HPC System Power Manager 300. In one embodiment, HPC System Power Manager 300 also receives administrative policies from HPC System Administrator 202.
- In order to properly allocate power to HPC System 400, in one embodiment HPC System Power Manager 300 receives the power and thermal capacity of HPC System 400 and maintains the average power consumption of HPC System 400 at or below the allocation. In one embodiment, a soft limit is defined in part by the power available for the allocation. In one embodiment, the soft limit includes the power allocated to each HPC system within the HPPM and the power allocated to each job. In one embodiment, Job Manager 420 enforces the soft limit on each job based on the power consumption of each node.
- Furthermore, the power consumption of HPC System 400 never exceeds the power and thermal capacity of the cooling and power delivery infrastructures. In one embodiment, a hard limit is defined by the power and thermal capacity of the cooling and power delivery infrastructures. In one embodiment, the hard limit defines the power and cooling capability available for the nodes, racks, systems, and datacenters within a HPC facility. The cooling and power infrastructures may or may not be shared by different elements of the HPC facility. In one embodiment, the hard limit fluctuates in response to failures in the cooling and power delivery infrastructures, while the soft limit remains at or below the hard limit at all times.
- HPC System Power Manager 300 uses Out of Band mechanism 301 (e.g., Node Manager 431, Thermal Control 432, Rack Manager 430, and Datacenter Manager 310) to monitor and manage the hard limit for each component. In one embodiment, Out of Band mechanism 301, unlike In Band mechanism 302, uses an independent embedded controller outside the system with an independent networking capability to perform its operation.
- To maintain the power consumption of HPC System 400 within the limits (both the hard limit and the soft limit) and to increase energy efficiency, HPC System Power Manager 300 allocates power to the jobs. In one embodiment, the allocation of power to the jobs is based on the dynamic monitoring and power-aware management of Resource Manager 410 and Job Manager 420, further described below. In one embodiment, Resource Manager 410 and Job Manager 420 are operated by In Band mechanism 302. In one embodiment, In Band mechanism 302 uses the system network and software for monitoring, communication, and execution.
- An advantage of embodiments described herein is that the power consumption is managed by allocating power to the jobs. As such, the power is allocated in a way that causes a significant reduction in the performance variations of the nodes and subsequently an improvement in job completion time. In other words, the power allocated to a particular job is distributed among the nodes dedicated to run the job in such a way as to achieve increased performance.
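The job-level allocation idea above can be sketched as a weighted split: each node of a job receives a share of the job's power budget proportional to an efficiency weight, so per-node performance is evened out. This is a hypothetical heuristic for illustration; the description does not specify this formula, and all names are invented.

```python
def allocate_job_power(job_budget_w: float, node_efficiency: dict) -> dict:
    """Split a job's power budget across its nodes in proportion to a
    per-node efficiency weight (higher weight -> more power), keeping
    the total at the job-level soft limit."""
    total = sum(node_efficiency.values())
    return {node: job_budget_w * w / total
            for node, w in node_efficiency.items()}

# A 900 W job budget over two nodes with efficiency weights 1.0 and 2.0.
shares = allocate_job_power(900.0, {"node-a": 1.0, "node-b": 2.0})
```

The design point is that the sum of per-node shares always equals the job's allocation, so enforcing the job budget and steering power toward slower or less efficient nodes are done in one step.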
- FIG. 2 illustrates an example of interactions between different components of HPC Power-Performance Manager 100. It is pointed out that those elements of FIG. 2 having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such. The lines connecting the blocks represent communication between different components of a HPPM.
- In one embodiment, these communications include communicating, for example, the soft and hard limits for each component of the HPPM 100, reporting the power and thermal status of the components, reporting failures of power and thermal infrastructures, and communicating the available power for the components. In one embodiment, HPPM 100 includes multiple components divided between multiple datacenters within a HPC facility. HPPM 100 also includes power and cooling resources shared by the components. In one embodiment, each datacenter includes a plurality of server racks, and each server rack includes a plurality of nodes.
- In one embodiment, HPPM 100 manages power and performance of the system by forming a dynamic hierarchical management and monitoring structure. The power and thermal status of each layer is regularly monitored by a managing component and reported to a higher layer. The managing component of the higher layer aggregates the power and thermal conditions of its lower components and reports them to its higher layer. Conversely, the higher managing component ensures that the allocation of power to its lower layers is based upon the current power and thermal capacity of their components.
- For example, in one embodiment, HPC Facility Power Manager 200 distributes power to multiple datacenters and resources shared within the HPC facility. HPC Facility Power Manager 200 receives the aggregated report of the power and thermal conditions of the HPC facility from Datacenter Manager 210. In one embodiment, Datacenter Manager 210 is the highest managing component of HPPM 100. Datacenter Manager 210 is the higher managing component of a plurality of datacenters. Each datacenter is managed by a datacenter manager, such as, for example, Datacenter Manager 310. Datacenter Manager 310 is the higher managing component of a plurality of server racks. Each server rack includes a plurality of nodes. In one embodiment, Datacenter Manager 310 is a managing component for the nodes of an entire or part of a server rack, while in other embodiments Datacenter Manager 310 is a managing component for nodes of multiple racks. Each node is managed by a node manager. For example, each of Nodes 500 is managed by Node Manager 431. Node Manager 431 monitors and manages the power consumption and thermal status of its associated node.
- Datacenter Manager 310 is also a higher managing component for the power and cooling resources shared by a plurality of the nodes. Each shared power and cooling resource is managed by a rack manager, for example Rack Manager 430. In one embodiment, a plurality of nodes share multiple power and cooling resources, each managed by a rack manager. In one embodiment, HPC Facility Power Manager 200 sends the capacity and requirements of the HPC facility to a utility provider. HPC Facility Power Manager 200 distributes the power budget to the HPC System Power Manager associated with each HPC system (e.g., HPC System Power Manager 300). HPC System Power Manager 300 determines how much power to allocate to each job. Job Manager 420 manages the power performance of a job within the budget allocated by HPC System Power Manager 300. Job Manager 420 manages a job throughout its life cycle by controlling the power allocation and frequencies of Nodes 500.
- In one embodiment, if a power or thermal failure occurs on any lower layer of Datacenter Manager 310, Datacenter Manager 310 immediately warns HPC System Power Manager 300 of the change in power or thermal capacity. Subsequently, HPC System Power Manager 300 adjusts the power consumption of the HPC system by changing the power allocation to the jobs.
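The hierarchical monitoring structure of FIG. 2 — node managers reporting upward through rack and datacenter managers, each layer aggregating its children — can be sketched as a simple tree walk. A minimal sketch, assuming each manager merely sums the power readings of its lower layers; the managers described above also carry thermal status and failure reports.

```python
class Manager:
    """One layer of the monitoring hierarchy (node, rack, or datacenter)."""
    def __init__(self, name, watts=0.0, children=None):
        self.name = name
        self.watts = watts            # power drawn at this layer itself
        self.children = children or []

    def aggregate_power(self):
        """Report this layer's power plus the aggregate of all lower layers,
        as each managing component does for its higher layer."""
        return self.watts + sum(c.aggregate_power() for c in self.children)

# Two nodes under one rack manager, one rack under a datacenter manager.
nodes = [Manager("node-1", watts=250.0), Manager("node-2", watts=310.0)]
rack = Manager("rack-1", children=nodes)
datacenter = Manager("datacenter", children=[rack])
total_w = datacenter.aggregate_power()
```

The same tree, traversed in the opposite direction, is where allocations flow: a higher layer can only hand down budgets consistent with the capacities its children last reported.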
FIG. 3 demonstrates the role of HPC Facility Power Manager 200 in more detail. It is pointed out that those elements of FIG. 3 having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such. In one embodiment, HPC Facility 101 includes HPC Facility Power Manager 200, Power Generator and Storage 210, Power Convertor 220, Cooling System 230, which may include storage of a cooling medium, and several HPC systems including HPC System 400. Each HPC system is managed by a HPC System Power Manager (e.g., HPC System Power Manager 300 manages HPC System 400).
- In one embodiment, HPC Facility Power Manager 200 manages the power consumption of HPC Facility 101. HPC Facility Power Manager 200 receives facility level policies from Facility Administrator 102. In one embodiment, the facility level policies relate to selecting a local source of power, environmental considerations, and the overall operation policy of the facility. HPC Facility Power Manager 200 also communicates with Utility Provider 103. In one embodiment, HPC Facility Power Manager 200 communicates the forecasted capacity and requirements of HPC Facility 101 in advance to Utility Provider 103. In one embodiment, HPC Facility 101 uses a Demand/Response interface to communicate with Utility Provider 103.
- In one embodiment, the Demand/Response interface provides a non-proprietary interface that allows Utility Provider 103 to send signals about electricity price and grid reliability directly to customers, e.g., HPC Facility 101. The dynamic monitoring allows HPC Facility Power Manager 200 to more accurately estimate the required power and communicate its capacity and requirements automatically to Utility Provider 103. This method allows for improving cost based on the price in real time and reduces the disparity between the power allocated by Utility Provider 103 and the power actually used by Facility 101.
- In one embodiment, the HPPM determines a power budget at a given time based upon the available power from Utility Provider 103, the cost of the power from Utility Provider 103, the available power in the local Power Generator and Storage 210, and the actual demand of the HPC systems. In one embodiment, the HPPM substitutes the energy from the utility provider with energy from the local storage or electricity generators. In one embodiment, the HPPM receives the current price of electricity and makes the electricity produced by Power Generator and Storage 210 available for sale in the market.
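The budget determination just described — combining grid power, its current price, local generation/storage, and actual demand — can be sketched as a simple decision rule. This is a hypothetical rule for illustration; the description gives no formula, and the price-ceiling parameter is an invented stand-in for the facility's cost policy.

```python
def facility_power_budget(grid_offer_w, grid_price, local_avail_w,
                          demand_w, price_ceiling):
    """Meet demand from the grid while its price is acceptable, top up
    from local generation/storage, and never budget more than demand."""
    grid_w = min(grid_offer_w, demand_w) if grid_price <= price_ceiling else 0.0
    local_w = min(local_avail_w, demand_w - grid_w)
    return grid_w + local_w

# Grid offers 1000 W at an acceptable price; local storage covers the
# remaining 200 W of a 1200 W demand.
budget_w = facility_power_budget(1000.0, 0.10, 500.0, 1200.0, 0.15)
```

When the real-time price signal exceeds the ceiling, the same rule falls back entirely to local generation, which mirrors the substitution behavior described above.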
FIG. 4 illustrates how HPC System Power Manager 300 manages a shared power supply among nodes using a combination of Rack Manager 430 and Node Manager 440. It is pointed out that those elements of FIG. 4 having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such.
- In one embodiment, Rack Manager 430 reports the status of the shared resources and receives power limits from Datacenter Manager 310. Node Manager 440 reports node power consumption and receives node power limits from Datacenter Manager 310. Similarly, Datacenter Manager 310 reports system power consumption to HPC System Power Manager 300. The communication between HPC System Power Manager 300 and Datacenter Manager 310 facilitates monitoring of the cooling and power delivery infrastructure in order to maintain the power consumption within the hard limit. In one embodiment, HPC System Power Manager 300 maintains the power consumption of the nodes or processors by adjusting the power allocated to them.
- In one embodiment, in case a failure of the power supply or cooling systems results in a sudden reduction of available power, the hard limit is reduced automatically by either or both of Rack Manager 430 and Node Manager 440 to a lower limit to avoid a complete failure of the power supply. Subsequently, the sudden reduction of available power is reported to HPC System Power Manager 300 through Datacenter Manager 310 by either or both of Rack Manager 430 and Node Manager 440, so that HPC System Power Manager 300 can readjust the power allocation accordingly.
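The invariant running through this failure handling — the soft limit must stay at or below the hard limit, and a failure lowers the hard limit first — can be sketched as a small state update. A hypothetical sketch; the field names are invented for illustration.

```python
def apply_hard_limit_drop(limits: dict, new_hard_w: float) -> dict:
    """On a power or cooling failure, drop the hard limit immediately
    and pull the soft limit down so it never exceeds the hard limit."""
    limits["hard_w"] = new_hard_w
    limits["soft_w"] = min(limits["soft_w"], new_hard_w)
    return limits

# A failure cuts a node's hard limit from 400 W to 300 W; the 350 W
# soft limit is clamped down, while a 200 W soft limit would be kept.
node_limits = {"hard_w": 400.0, "soft_w": 350.0}
apply_hard_limit_drop(node_limits, 300.0)
```

Keeping the clamp local to the node/rack manager is what allows the reaction to be immediate, with the system power manager readjusting allocations afterwards.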
FIG. 5 illustrates the HPPM response mechanism at a node level in case of power delivery or cooling failures. It is pointed out that those elements of FIG. 5 having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such.
- In one embodiment, a cooling and power delivery failure does not impact all nodes equally. Once Node Manager 431 identifies the impacted nodes, for example Nodes 500, it will adjust the associated hard limit for Nodes 500. This hard limit is then communicated to Job Manager 420. Job Manager 420 adjusts the soft limit associated with Nodes 500 to maintain both the soft limit and the power consumption of Nodes 500 at or below the hard limit. In one embodiment, the frequency of the communication between Node Manager 431 and Job Manager 420 is in milliseconds.
- In one embodiment, a faster response is required to avoid further power failure of the system. As such, Node Manager 431 directly alerts Nodes 500. The alert imposes a restriction on Nodes 500 and causes an immediate reduction of power consumption by Nodes 500. In one embodiment, such a reduction could be more than necessary to avoid further power failures. Subsequently, Node Manager 431 communicates the new hard limit to Job Manager 420. Job Manager 420 adjusts the soft limits of Nodes 500 to maintain the power consumption of Nodes 500 at or below the hard limit. Job Manager 420 enforces the new hard limit and removes the alert asserted by Node Manager 431.
FIG. 6 , an exemplary block diagram of a HPC system receiving various inputs is illustrated. It is pointed out that those elements ofFIG. 6 having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such. In one embodiment described herein,HPC system 400 includes one or more operating system (OS)nodes 501, one ormore compute nodes 502, one or more input/output (I/O)nodes 503 and astorage system 504. The high-speed fabric 505 communicatively connects theOS nodes 501, computenodes 502 and I/O nodes 503 andstorage system 504 The high-speed fabric may be a network topology of nodes interconnected via one or more switches. In one embodiment, as illustrated inFIG. 6 , I/O nodes 503 are communicatively connected tostorage 504. In one embodiment,storage 504 is a non-persistent storage such as volatile memory (e.g., any type of random access memory “RAM”); persistent storage such as non-volatile memory (e.g., read-only memory “ROM”, power-backed RAM, flash memory, phase-change memory, etc.), a solid-state drive, hard disk drive, an optical disc drive, or a portable memory device. - The
OS nodes 501 provide a gateway to accessing the compute nodes 502. For example, prior to submitting a job for processing on the compute nodes 502, a user may be required to log in to HPC system 400, which may be done through OS nodes 501. In embodiments described herein, OS nodes 501 accept jobs submitted by users and assist in the launching and managing of jobs being processed by compute nodes 502. - In one embodiment, compute
nodes 502 provide the bulk of the processing and computational power. I/O nodes 503 provide an interface between compute nodes 502 and external devices (e.g., separate computers) that provide input to HPC system 400 or receive output from HPC system 400. - The limited power allocated to
HPC system 400 is used by HPC system 400 to run one or more of jobs 520. Jobs 520 comprise one or more jobs requested to be run on HPC system 400 by one or more users, for example User 201. Each job includes a power policy, which will be discussed in depth below. The power policy will assist the HPC System Power Manager in allocating power for the job and aid in the management of the one or more jobs 520 being run by HPC system 400. - In addition,
HPC System Administrator 202 provides administrative policies to guide the management of running jobs 520 by providing an over-arching policy that defines the operation of HPC system 400. In one embodiment, examples of policies in the administrative policies include, but are not limited or restricted to: (1) a policy to increase utilization of all hardware and software resources (e.g., instead of running fewer jobs at high power and leaving resources unused, run as many jobs as possible to use as much of the resources as possible); (2) a job with no power limit is given the highest priority among all running jobs; and/or (3) suspended jobs are at higher priority for resumption. Such administrative policies govern the way the HPC System Power Manager schedules, launches, suspends and re-launches one or more jobs. -
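As an illustration of how such administrative policies might translate into scheduling order, the following sketch ranks jobs so that no-power-limit jobs come first and suspended jobs are resumed next. The dictionary keys and the three-level ranking are hypothetical assumptions, not the disclosed implementation.

```python
def job_priority(job):
    # Hypothetical priority key reflecting the administrative policies above:
    # jobs with no power limit rank highest, then suspended jobs awaiting
    # resumption, then everything else. `job` is a dict with assumed keys.
    if not job.get("power_limited", True):
        return 0          # highest priority: no-power-limit jobs
    if job.get("suspended", False):
        return 1          # next: suspended jobs waiting to resume
    return 2              # normal jobs

jobs = [
    {"name": "a", "power_limited": True},
    {"name": "b", "power_limited": False},
    {"name": "c", "power_limited": True, "suspended": True},
]
ordered = sorted(jobs, key=job_priority)
```

Sorting is stable, so jobs of equal priority retain their submission order.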
User 201 policy can be specific to a particular job. User 201 can instruct HPC System 400 to run a particular job with no power limit or according to a customized policy. Additionally, User 201 can set the energy policy of a particular job, for example for maximum efficiency or highest performance. - As shown in
FIG. 1, HPC System Administrator 202 and User 201 communicate their policies to the HPC System Power Manager 300 and Resource Manager 410. In one embodiment, Resource Manager 410 receives these policies and formulates them into "modes" under which Job Manager 420 instructs OS Nodes 501, compute nodes 502, and I/O nodes 503 to operate. -
FIG. 7 shows the flow of information between Resource Manager 410 (including Power Aware Job Scheduler 411 and Power Aware Job Launcher 412) and other elements of the HPPM (HPC System Power Manager 300, Estimator 413, Calibrator 414, and Job Manager 420). In one embodiment, the purpose of these communications is to allocate sufficient hardware resources (e.g., nodes, processors, memories, network bandwidth, etc.) and schedule execution of appropriate jobs. In one embodiment, power is allocated to the jobs in such a way as to maintain HPC System 400 power within the limits, increase energy efficiency, and control the rate of change of HPC system 400 power consumption. - Referring to
FIG. 6, to determine the amount of power to allocate to each job, HPC System Power Manager 300 communicates with Resource Manager 410. Power Aware Job Scheduler 411 considers the policies and priorities of Facility Administrator 102, Utility Provider 103, User 201, and HPC System Administrator 202 and determines accordingly what hardware resources of HPC System 400 are needed to run a particular job. Additionally, Power Aware Job Scheduler 411 receives power-performance characteristics of the job at different operating points from Estimator 413 and Calibrator 414. Resource Manager 410 forecasts how much power a particular job needs and takes corrective actions when the actual power differs from the estimate. -
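A minimal sketch of the forecast-versus-actual check that such corrective action implies might look like the following; the 10% tolerance and the action names are illustrative assumptions, not the disclosed mechanism.

```python
def corrective_action(estimated_w, actual_w, tolerance=0.10):
    # Compare measured power against the forecast; act only when the
    # deviation exceeds the (assumed) fractional tolerance.
    deviation = (actual_w - estimated_w) / estimated_w
    if deviation > tolerance:
        return "throttle"   # job draws more than forecast: reduce its power
    if deviation < -tolerance:
        return "reclaim"    # stranded power: reallocate it to other jobs
    return "ok"
```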
Estimator 413 provides Resource Manager 410 with estimates of power consumption for each job, enabling Resource Manager 410 to efficiently schedule and monitor each job requested by one or more job owners (e.g., users). Estimator 413 provides a power consumption estimate based on, for example, maximum and average power values stored in a calibration database, wherein the calibration database is populated by the processing of Calibrator 414. In addition, the minimum power required for each job is considered. Other factors that are used by Estimator 413 to create a power consumption estimate include, but are not limited or restricted to: whether the owner of the job permits the job to be subject to a power limit; the job power policy limiting the power supplied to the job (e.g., a predetermined fixed frequency at which the job will run, a minimum power required for the job, or varying frequencies and/or power supplied determined by Resource Manager 410); the startup power for the job; the frequency at which the job will run; the available power to HPC System 400; and/or the allocated power to HPC System 400. -
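A rough sketch of how Estimator 413 might combine per-node values from the calibration database follows; the dictionary layout and the average-versus-maximum rule are assumptions for illustration only.

```python
def estimate_job_power(calibration_db, node_ids, power_limited=True):
    # calibration_db maps a node id to its calibrated {"max", "avg", "min"}
    # power values in watts, as populated by the calibrator.
    key = "avg" if power_limited else "max"
    estimate = sum(calibration_db[n][key] for n in node_ids)
    # The estimate is never allowed below the minimum power the job needs.
    minimum = sum(calibration_db[n]["min"] for n in node_ids)
    return max(estimate, minimum)
```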
Calibrator 414 calibrates the power, thermal dissipation and performance of each node within HPC System 400. Calibrator 414 provides a plurality of methods for calibrating the nodes within HPC system 400. In one embodiment, Calibrator 414 provides a first method of calibration in which every node within HPC system 400 runs sample workloads (e.g., a mini-application and/or a test script) so Calibrator 414 may sample various parameters (e.g., power consumed) at predetermined time intervals in order to determine, inter alia, (1) the average power, (2) the maximum power, and (3) the minimum power for each node. In addition, the sample workload is run on each node at every operating frequency of the node. In another embodiment, Calibrator 414 provides a second method of calibration in which calibration of one or more nodes occurs during the run-time of a job. In such a situation, Calibrator 414 samples the one or more nodes on which a job is running (e.g., processing). In the second method, Calibrator 414 obtains power measurements of each node during actual run-time. - In one embodiment, Power
Aware Job Scheduler 411 is configured to receive a selection of a mode for a job, to determine an available power for the job based on the mode and to allocate a power for the job based on the available power. In one embodiment, PowerAware Job Scheduler 411 is configured to determine a uniform frequency for the job based on the available power. In one embodiment, the power aware job scheduler is configured to determine the available power for the job based on at least one of a monitored power, an estimated power, and a calibrated power. - Generally, a user submits a program to be executed (“job”) to a queue. The job queue refers to a data structure containing jobs to run. In one embodiment, Power
Aware Job Scheduler 411 examines the job queue at appropriate times (periodically or at certain events, e.g., termination of previously running jobs) and determines whether the resources, including the power needed to run the job, can be allocated. In some cases, such resources can be allocated only at a future time, and in such cases the job is scheduled to run at a designated time in the future. Power Aware Job Launcher 412 selects a job among the jobs in the queue, based on available resources and priority, and schedules it to be launched. In one embodiment, in case the available power is limited, Power Aware Job Launcher 412 examines the operating points and selects the one which results in the highest frequency while maintaining the power consumption below the limit. -
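The operating-point selection just described can be sketched as follows, assuming each operating point is a (frequency, power) pair; the tuple representation is an illustrative assumption.

```python
def select_operating_point(operating_points, power_limit_w):
    # Keep only the points whose power draw fits under the limit, then
    # pick the one with the highest frequency.
    feasible = [(f, p) for f, p in operating_points if p <= power_limit_w]
    if not feasible:
        return None       # no feasible point: the job must wait for more power
    return max(feasible)  # tuples compare by frequency first
```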
FIG. 8 illustrates the interaction of Job Manager 420 with Power Aware Job Launcher 412 according to Power Performance Policies 440. Once a job is launched, it is assigned a job manager, for example Job Manager 420. Job Manager 420 manages power performance of the job throughout its life cycle. In one embodiment, Job Manager 420 is responsible for operating the job within the constraints of one or more power policies and various power limits after the job has been launched. In one embodiment, for example, a user may designate "special" jobs that are not power limited. Power Aware Job Scheduler 411 will need to estimate the maximum power the job could consume, and only start the job when the power is available. HPC System Power Manager 300 redistributes power among the normal jobs in order to reduce stranded power and increase efficiency. But even if the allocated power for HPC System 400 falls, the workload manager ensures that these "special" jobs' power allocations remain intact. In another example, a user may specify the frequency for a particular job. In one embodiment, user selection may be based upon a table that indicates the degradation in performance and reduction in power for each frequency. - Alternatively, the frequency selection for the jobs can be automated based upon available power. In one embodiment, with dynamic power monitoring,
Job Manager 420 will adjust the frequency periodically based upon the power headroom. An advantage of the embodiments described herein is that a job will be allowed to operate at all available frequencies. Job Manager 420 will determine the best mode to run the job based upon the policies and priorities communicated by Facility Administrator 102, Utility Provider 103, User 201, and HPC System Administrator 202. -
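A minimal sketch of such periodic, headroom-driven frequency adjustment follows; the step size and the watts-per-step conversion are invented constants for illustration, not values from the disclosure.

```python
def adjust_frequency(current_ghz, headroom_w, step_ghz=0.1, watts_per_step=15.0):
    # Positive headroom raises the frequency; negative headroom (over
    # budget) lowers it. int() truncates toward zero, so small deviations
    # within one step leave the frequency unchanged.
    steps = int(headroom_w / watts_per_step)
    return current_ghz + steps * step_ghz
```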
FIG. 9 is a flow diagram of one embodiment of a process for managing power and performance of HPC systems. The process is performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), firmware or a combination of the three. - Referring to
FIG. 9, at block 901, HPPM communicates the capacity and requirements of the HPC system to a utility provider. In one embodiment, the capacity of the HPC system is determined based on the cooling and power delivery capacity of the HPC system. In one embodiment, the HPPM communicates its capacity and requirements to the utility provider through a demand/response interface. In one embodiment, the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and input from the utility provider. In one embodiment, the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism. - At
block 902, HPPM determines a power budget for the HPC system. In one embodiment, the power budget is determined based on the cooling and power delivery capacity of the HPC system. In one embodiment, the power budget is determined based on the power performance policies. In one embodiment, the power performance policies are based on at least one of a facility policy, a utility provider policy, a facility administrative policy, and a user policy. - At
block 903, HPPM determines a power and cooling capacity of the HPC system. In one embodiment, determining the power and cooling capacity of the HPC system includes monitoring and reporting failures of power delivery and cooling infrastructures. In one embodiment, in case of a failure, the power consumption is adjusted accordingly. In one embodiment, determining the power and cooling capacity of the HPC system is performed by an out of band mechanism. - At
block 904, HPPM allocates the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system. In one embodiment, allocating the power budget to the job is based on power performance policies. In one embodiment, allocating the power budget to the job is based on an estimate of power required to execute the job. In one embodiment, the estimate of the required power to execute the job is based on at least one of a monitored power, an estimated power, and a calibrated power. - At
block 905, HPPM executes the job on selected HPC nodes. In one embodiment, the selected HPC nodes are selected based on power performance policies. In one embodiment, the selected HPC nodes are selected based on power characteristics of the nodes. In one embodiment, the power characteristics of the HPC nodes are determined based on running a sample workload. In one embodiment, the power characteristics of the HPC nodes are determined during runtime. In one embodiment, the job is executed on the selected HPC nodes based on power performance policies. -
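Blocks 901 through 905 can be condensed into a short sketch: cap the policy budget by the delivered power and cooling capacity, then admit jobs until the budget is spent. The greedy admission loop is an illustrative assumption; the embodiments do not prescribe a particular allocation algorithm.

```python
def manage_power(delivery_w, cooling_w, policy_budget_w, jobs):
    # Blocks 902/903: the usable budget is bounded by both the policy
    # budget and the power/cooling capacity of the HPC system.
    budget = min(policy_budget_w, delivery_w, cooling_w)
    scheduled, remaining = [], budget
    # Blocks 904/905: allocate power to jobs and run the ones that fit;
    # each job is an assumed (name, estimated_power_w) pair.
    for name, estimated_w in jobs:
        if estimated_w <= remaining:
            scheduled.append(name)
            remaining -= estimated_w
    return scheduled, remaining
```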
FIG. 10 is a flow diagram of one embodiment of a process for managing power and performance of HPC systems. The process is performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), firmware or a combination of the three. Referring to FIG. 10, at block 1001, HPPM defines a hard power limit based on a thermal and power delivery capacity of a HPC facility. In one embodiment, the hard power limit is managed and monitored by an out of band mechanism. In one embodiment, the hard power limit decreases in response to failures of the power and cooling infrastructures of the HPC facility. - At
block 1002, HPPM defines a soft power limit based on a power budget allocated to the HPC facility. In one embodiment, the power budget for the HPC facility is provided by a utility provider through a demand/response interface. In one embodiment, the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and input from the utility provider. In one embodiment, the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism. - At
block 1003, HPPM allocates the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit. In one embodiment, allocating the power budget to the job is based on power performance policies. In one embodiment, allocating the power budget to the job is based on an estimate of power required to execute the job. In one embodiment, the estimate of the required power to execute the job is based on at least one of a monitored power, an estimated power, and a calibrated power. - At
block 1004, HPPM executes the job on nodes while maintaining the soft power limit at or below the hard power limit. In one embodiment, allocating the power budget to the job and executing the job on the nodes is according to power performance policies. In one embodiment, the power performance policies are based on at least one of a HPC facility policy, a utility provider policy, a HPC administrative policy, and a user policy. - Some portions of the preceding detailed descriptions have been presented in terms of algorithms and symbolic representations of transactions on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of transactions leading to a desired result. The transactions are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
- It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
- The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method transactions. The required structure for a variety of these systems will appear from the description above. In addition, embodiments of the present invention are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of embodiments of the invention as described herein.
- In the foregoing specification, embodiments of the invention have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the invention as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
- Throughout the description, embodiments of the present invention have been presented through flow diagrams. It will be appreciated that the order of transactions and transactions described in these flow diagrams are only intended for illustrative purposes and not intended as a limitation of the present invention. One having ordinary skill in the art would recognize that variations can be made to the flow diagrams without departing from the broader spirit and scope of the invention as set forth in the following claims.
- The following examples pertain to further embodiments:
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes are selected based on power performance policies.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes are selected based on power characteristics of the nodes.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes are selected based on power characteristics of the nodes determined based on running of a sample workload.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes are selected based on power characteristics of the nodes determined during runtime.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the job is executed on the selected HPC nodes based on power performance policies.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein allocating the power budget to the job is based on power performance policies.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the allocating the power budget to the job is based on an estimate of power required to execute the job.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the allocating the power budget to the job is based on an estimate of power required to execute the job determined based on at least one of a monitored power, an estimated power, and a calibrated power.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power budget for the HPC system is based on the power and cooling capacity of the HPC system.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power budget for the HPC system is performed by communicating to a utility provider through a demand/response interface. In one embodiment, the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and inputs from the utility provider. In one embodiment, the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power and cooling capacity of the HPC system includes monitoring and reporting failures of power delivery and cooling infrastructures. In one embodiment, the method further comprises adjusting the power consumption of the HPC system in response to the failure of the power and cooling infrastructures.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining a power and cooling capacity of the HPC system is performed by an out of band mechanism.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, defining a hard power limit based on a thermal and power delivery capacity of a HPC facility, wherein the HPC facility includes a plurality of HPC systems, and the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, defining a soft power limit based on a power budget allocated to the HPC facility, allocating the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit, executing the job on nodes while maintaining the soft power limit at or below the hard power limit, and allocating the power budget to the job and executing the job on the nodes according to power performance policies.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, defining a hard power limit based on a thermal and power delivery capacity of a HPC facility, wherein the HPC facility includes a plurality of HPC systems, and the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, defining a soft power limit based on a power budget allocated to the HPC facility, allocating the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit, executing the job on nodes while maintaining the soft power limit at or below the hard power limit, and allocating the power budget to the job and executing the job on the nodes according to power performance policies, wherein the hard power limit decreases in response to failures of the power and cooling infrastructures of the HPC facility.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, defining a hard power limit based on a thermal and power delivery capacity of a HPC facility, wherein the HPC facility includes a plurality of HPC systems, and the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, defining a soft power limit based on a power budget allocated to the HPC facility, allocating the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit, executing the job on nodes while maintaining the soft power limit at or below the hard power limit, and allocating the power budget to the job and executing the job on the nodes according to power performance policies, wherein allocating the power budget to the job is based on an estimate of a required power to execute the job.
- A method of managing power and performance of a High-performance computing (HPC) system, comprising, defining a hard power limit based on a thermal and power delivery capacity of a HPC facility, wherein the HPC facility includes a plurality of HPC systems, and the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, defining a soft power limit based on a power budget allocated to the HPC facility, allocating the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit, executing the job on nodes while maintaining the soft power limit at or below the hard power limit, and allocating the power budget to the job and executing the job on the nodes according to power performance policies, wherein the hard power limit is managed by an out of band mechanism. In one embodiment, the power performance policies are based on at least one of a HPC facility policy, a utility provider policy, a HPC administrative policy, and a user policy.
- A computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budge for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes.
- A computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budge for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes to execute the job are selected based in part by power performance policies.
- A computer readable medium having stored thereon sequences of instruction which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budge for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the selected HPC nodes to execute the job are selected based in part by a power characteristics of the nodes. In one embodiment, the power characteristics of the HPC nodes are determined upon running a sample workload. In another embodiment, the power characteristics of the HPC nodes are determined during an actual runtime.
- A computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein the job is executed on the selected HPC nodes based in part upon power performance policies.
- A computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein allocating the power budget to the job is based in part upon power performance policies.
- A computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein allocating the power budget to the job is based in part on an estimate of a required power to execute the job. In one embodiment, the estimate of the required power to execute the job is based in part upon at least one of a monitored power, an estimated power, and a calibrated power.
- A computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power budget for the HPC system is based in part upon the power and cooling capacity of the HPC system.
- A computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power budget for the HPC system is performed in part by communicating with a utility provider through a demand/response interface. In one embodiment, the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and inputs from the utility provider. In one embodiment, the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism.
- A computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power and cooling capacity of the HPC system includes monitoring and reporting failures of power delivery and cooling infrastructures. In one embodiment, the method further comprises adjusting the power consumption of the HPC system in response to a failure of the power and cooling infrastructures.
- A computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising, determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, determining a power and cooling capacity of the HPC system, allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system, and executing the job on selected HPC nodes, wherein determining the power and cooling capacity of the HPC system is performed by an out of band system.
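The budget-allocation method recited in the embodiments above can be illustrated with a short sketch. This is not the patented implementation; all names (`allocate_job_power`, `select_nodes`, `calibrated_w`) are hypothetical, and the node-selection heuristic (prefer power-efficient nodes from calibration data) is one plausible reading of "selected based in part on power characteristics of the nodes":

```python
# Illustrative sketch: grant a job power within both the system power
# budget and the facility's power/cooling capacity, then pick nodes
# whose calibrated per-node draw fits the grant. Names are hypothetical.

def allocate_job_power(job_estimate_w, system_budget_w, capacity_w, allocated_w):
    """Grant the smaller of the job's estimated power and remaining headroom."""
    headroom = min(system_budget_w, capacity_w) - allocated_w
    if headroom <= 0:
        return 0.0  # no headroom left: the job must wait
    return min(job_estimate_w, headroom)

def select_nodes(nodes, job_power_w, num_needed):
    """Prefer power-efficient nodes, using calibrated per-node power draw."""
    ranked = sorted(nodes, key=lambda n: n["calibrated_w"])
    chosen = []
    for node in ranked:
        if len(chosen) == num_needed:
            break
        if sum(n["calibrated_w"] for n in chosen) + node["calibrated_w"] <= job_power_w:
            chosen.append(node)
    return chosen if len(chosen) == num_needed else []
```

For example, with a 1000 W budget, 900 W capacity, and 200 W already allocated, a job estimated at 600 W receives the full 600 W, since headroom is 700 W.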
- A system for managing power and performance of a High-performance computing (HPC) system, comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes.
- A system for managing power and performance of a High-performance computing (HPC) system, comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the HPC facility manager, the HPC system manager, and the job manager are governed by power performance policies. In one embodiment, the power performance policies are based in part upon at least one of a HPC facility policy, a utility provider policy, a HPC administrative policy, and a user policy.
- A system for managing power and performance of a High-performance computing (HPC) system, comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the HPC system manager selects the HPC nodes to execute the job based in part on power characteristics of the nodes. In one embodiment, a calibrator runs a sample workload on the HPC nodes and reports the power characteristics of the HPC nodes to the HPC system manager. In another embodiment, a calibrator determines the power characteristics of the HPC nodes during an actual runtime and reports them to the HPC system manager.
- A system for managing power and performance of a High-performance computing (HPC) system, comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the HPC system manager allocates power to the job based in part on an estimated power required to run the job. In one embodiment, an estimator calculates the estimated power required to run the job based in part upon at least one of a monitored power, an estimated power, and a calibrated power.
- A system for managing power and performance of a High-performance computing (HPC) system, comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the out of band mechanism monitors and reports failures of power delivery and cooling infrastructures of a HPC facility to the HPC facility manager.
- A system for managing power and performance of a High-performance computing (HPC) system, comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the out of band mechanism monitors and reports failures of power delivery and cooling infrastructures of the HPC system to the HPC system manager.
- A system for managing power and performance of a High-performance computing (HPC) system, comprising, a HPC facility manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job, an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC system manager, the HPC system manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system, and a job manager to execute the job on selected nodes, wherein the HPC facility manager communicates capacity and requirements of the HPC system to a utility provider through a demand/response interface. In one embodiment, the demand/response interface reduces a cost for the power budget based on the capacity and requirements of the HPC system and inputs from the utility provider. In one embodiment, the demand/response interface communicates the capacity and requirements of the HPC system through an automated mechanism.
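The demand/response embodiment above can be sketched as an automated exchange: the facility manager reports its power requirements and the flexible portion it can shed, and the utility's reply adjusts the budget and its cost. This is an illustrative sketch only, not the patented protocol; every message field (`price_per_w`, `curtail_request_w`, `discount`) is a hypothetical name:

```python
# Hypothetical demand/response exchange: shed as much flexible load as
# the utility requests, and receive a discounted rate for doing so.

def demand_response(required_w, flexible_w, utility_offer):
    """Return (budget_w, cost) after an automated demand/response exchange.

    utility_offer: {"price_per_w": base price per watt,
                    "curtail_request_w": watts the provider asks to shed,
                    "discount": price multiplier when curtailment is honored}.
    """
    shed = min(flexible_w, utility_offer["curtail_request_w"])
    budget_w = required_w - shed
    price = utility_offer["price_per_w"]
    if shed > 0:  # curtailment honored: discounted rate applies
        price *= utility_offer["discount"]
    return budget_w, budget_w * price
```

For instance, a facility needing 1000 W with 200 W of flexible load, offered a 20% discount for shedding 150 W, ends up with an 850 W budget at the reduced rate.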
- In the foregoing specification, methods and apparatuses have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of embodiments as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
Claims (25)
1. A method of managing power and performance of a High-performance computing (HPC) system, comprising:
determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job;
determining a power and cooling capacity of the HPC system;
allocating the power budget to the job to maintain a power consumption of the HPC system within the power budget and the power and cooling capacity of the HPC system; and
executing the job on selected HPC nodes.
2. The method of claim 1 , wherein the selected HPC nodes are selected based on power characteristics of the nodes.
3. The method of claim 2 , wherein the power characteristics of the HPC nodes are determined based on running sample workloads.
4. The method of claim 1 , wherein allocating the power budget to the job is based on an estimate of power required to execute the job.
5. The method of claim 4 , wherein the estimate of the required power to execute the job is based on at least one of a monitored power, an estimated power, and a calibrated power.
6. The method of claim 1 , wherein determining the power budget for the HPC system is performed by communicating with a utility provider through a demand/response interface.
7. The method of claim 1 , wherein determining the power and cooling capacity of the HPC system includes monitoring and reporting failures of power delivery and cooling infrastructures.
8. The method of claim 7 further comprising adjusting the power consumption of the HPC system in response to a failure of the power and cooling infrastructures.
9. The method of claim 1 , wherein allocating the power budget to the job and executing the job on selected HPC nodes are governed by power performance policies.
10. A method of managing power and performance of a High-performance computing (HPC) system, comprising:
defining a hard power limit based on a thermal and power delivery capacity of a HPC facility, wherein the HPC facility includes a plurality of HPC systems, and each HPC system includes a plurality of interconnected HPC nodes operable to execute a job;
defining a soft power limit based on a power budget allocated to the HPC facility;
allocating the power budget to the job to maintain an average power consumption of the HPC facility below the soft power limit;
executing the job on nodes while maintaining the soft power limit at or below the hard power limit; and
allocating the power budget to the job and executing the job on the nodes according to power performance policies.
11. The method of claim 10 , wherein the hard power limit decreases in response to failures of the power and cooling infrastructures of the HPC facility.
12. The method of claim 10 , wherein allocating the power budget to the job is based on an estimate of a required power to execute the job.
13. The method of claim 10 , wherein the power performance policies are based on at least one of a HPC facility policy, a utility provider policy, a HPC administrative policy, and a user policy.
14. A computer readable medium having stored thereon sequences of instructions which are executable by a system, and which, when executed by the system, cause the system to perform a method, comprising:
determining a power budget for a HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job;
determining a power and cooling capacity of the HPC system;
allocating the power budget to the job such that a power consumption of the HPC system stays within the power budget and the power and cooling capacity of the HPC system; and
executing the job on selected HPC nodes.
15. The computer readable medium of claim 14 , wherein the selected HPC nodes to execute the job are selected based in part on power characteristics of the nodes.
16. The computer readable medium of claim 15 , wherein the power characteristics of the HPC nodes are determined upon running a sample workload.
17. The computer readable medium of claim 14 , wherein allocating the power budget to the job is based in part on an estimate of a required power to execute the job.
18. The computer readable medium of claim 17 , wherein the estimate of the required power to execute the job is based in part upon at least one of a monitored power, an estimated power, and a calibrated power.
19. The computer readable medium of claim 14 , wherein determining the power budget for the HPC system is performed in part by communicating with a utility provider through a demand/response interface.
20. The computer readable medium of claim 14 , wherein allocating the power budget to the job and executing the job on selected HPC nodes are governed by power performance policies.
21. A system for managing power and performance of a High-performance computing (HPC) system, comprising:
a HPC Facility Power Manager to determine a power budget for the HPC system, wherein the HPC system includes a plurality of interconnected HPC nodes operable to execute a job;
an out of band mechanism to monitor and report a cooling and power capacity of the HPC system to a HPC System Power Manager;
the HPC System Power Manager to allocate the power budget to the job within limitations of the cooling and power capacity of the HPC system; and
a job manager to execute the job on selected nodes.
22. The system of claim 21 , wherein the HPC System Power Manager selects the HPC nodes to execute the job based in part on power characteristics of the nodes.
23. The system of claim 21 , wherein the HPC System Power Manager allocates power to the job based in part on an estimated power required to run the job.
24. The system of claim 21 , wherein the out of band mechanism monitors and reports failures of power delivery and cooling infrastructures of the HPC system to the HPC System Power Manager.
25. The system of claim 21 , wherein the HPC Facility Power Manager communicates capacity and requirements of the HPC system to a utility provider through a demand/response interface.
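The hard/soft power-limit scheme recited in claims 10 through 13 can be sketched minimally: the soft limit (allocated budget) is kept at or below the hard limit (facility thermal and power delivery capacity), and an infrastructure failure lowers the hard limit. This sketch is illustrative only; the class and method names are hypothetical:

```python
# Minimal sketch of the hard/soft power-limit scheme of claims 10-13:
# the soft limit never exceeds the hard limit, the hard limit drops on
# power/cooling infrastructure failures, and average facility draw is
# admissible only while it stays within the soft limit.

class FacilityPowerLimits:
    def __init__(self, delivery_capacity_w, budget_w):
        self.hard_limit_w = delivery_capacity_w          # thermal/delivery capacity
        self.soft_limit_w = min(budget_w, self.hard_limit_w)

    def report_infrastructure_failure(self, lost_capacity_w):
        """A power/cooling failure lowers the hard limit (claim 11)."""
        self.hard_limit_w -= lost_capacity_w
        self.soft_limit_w = min(self.soft_limit_w, self.hard_limit_w)

    def admissible(self, avg_power_w):
        """Average facility draw must stay within the soft limit (claim 10)."""
        return avg_power_w <= self.soft_limit_w
```

For example, a facility with 1200 W of delivery capacity and a 1000 W budget that loses 300 W of capacity to a cooling failure ends up with both limits at 900 W, so a 950 W average draw is no longer admissible.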
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/582,743 US20160054779A1 (en) | 2014-08-22 | 2014-12-24 | Managing power performance of distributed computing systems |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201462040576P | 2014-08-22 | 2014-08-22 | |
US14/582,743 US20160054779A1 (en) | 2014-08-22 | 2014-12-24 | Managing power performance of distributed computing systems |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160054779A1 true US20160054779A1 (en) | 2016-02-25 |
Family
ID=55348281
Family Applications (6)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/582,772 Active US10289183B2 (en) | 2014-08-22 | 2014-12-24 | Methods and apparatus to manage jobs that can and cannot be suspended when there is a change in power allocation to a distributed computer system |
US14/582,743 Abandoned US20160054779A1 (en) | 2014-08-22 | 2014-12-24 | Managing power performance of distributed computing systems |
US14/582,764 Active 2035-03-08 US9921633B2 (en) | 2014-08-22 | 2014-12-24 | Power aware job scheduler and manager for a data processing system |
US14/582,756 Active US9927857B2 (en) | 2014-08-22 | 2014-12-24 | Profiling a job power and energy consumption for a data processing system |
US14/582,795 Active 2035-03-23 US9575536B2 (en) | 2014-08-22 | 2014-12-24 | Methods and apparatus to estimate power performance of a job that runs on multiple nodes of a distributed computer system |
US14/582,783 Active 2036-03-21 US10712796B2 (en) | 2014-08-22 | 2014-12-24 | Method and apparatus to generate and use power, thermal and performance characteristics of nodes to improve energy efficiency and reducing wait time for jobs in the queue |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/582,772 Active US10289183B2 (en) | 2014-08-22 | 2014-12-24 | Methods and apparatus to manage jobs that can and cannot be suspended when there is a change in power allocation to a distributed computer system |
Family Applications After (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/582,764 Active 2035-03-08 US9921633B2 (en) | 2014-08-22 | 2014-12-24 | Power aware job scheduler and manager for a data processing system |
US14/582,756 Active US9927857B2 (en) | 2014-08-22 | 2014-12-24 | Profiling a job power and energy consumption for a data processing system |
US14/582,795 Active 2035-03-23 US9575536B2 (en) | 2014-08-22 | 2014-12-24 | Methods and apparatus to estimate power performance of a job that runs on multiple nodes of a distributed computer system |
US14/582,783 Active 2036-03-21 US10712796B2 (en) | 2014-08-22 | 2014-12-24 | Method and apparatus to generate and use power, thermal and performance characteristics of nodes to improve energy efficiency and reducing wait time for jobs in the queue |
Country Status (6)
Country | Link |
---|---|
US (6) | US10289183B2 (en) |
EP (5) | EP3183654A4 (en) |
JP (2) | JP6701175B2 (en) |
KR (2) | KR102213555B1 (en) |
CN (4) | CN106537348B (en) |
WO (3) | WO2016028371A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160291656A1 (en) * | 2015-04-01 | 2016-10-06 | Dell Products, L.P. | Method and apparatus for collaborative power and thermal control of fan run time average power limiting |
US10042410B2 (en) * | 2015-06-11 | 2018-08-07 | International Business Machines Corporation | Managing data center power consumption |
US10503230B2 (en) * | 2015-11-25 | 2019-12-10 | Electronics And Telecommunications Research Institute | Method and apparatus for power scheduling |
US20240154415A1 (en) * | 2022-11-08 | 2024-05-09 | Oracle International Corporation | Techniques for orchestrated load shedding |
US12007734B2 (en) | 2022-09-23 | 2024-06-11 | Oracle International Corporation | Datacenter level power management with reactive power capping |
Families Citing this family (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8601473B1 (en) | 2011-08-10 | 2013-12-03 | Nutanix, Inc. | Architecture for managing I/O and storage for a virtualization environment |
WO2013081600A1 (en) * | 2011-11-30 | 2013-06-06 | Intel Corporation | Reducing power for 3d workloads |
US11243707B2 (en) | 2014-03-12 | 2022-02-08 | Nutanix, Inc. | Method and system for implementing virtual machine images |
US9900164B2 (en) * | 2015-06-10 | 2018-02-20 | Cisco Technology, Inc. | Dynamic power management |
US10073659B2 (en) | 2015-06-26 | 2018-09-11 | Intel Corporation | Power management circuit with per activity weighting and multiple throttle down thresholds |
US10437304B2 (en) * | 2016-03-15 | 2019-10-08 | Roku, Inc. | Brown out condition detection and device calibration |
US10613947B2 (en) | 2016-06-09 | 2020-04-07 | Nutanix, Inc. | Saving and restoring storage devices using application-consistent snapshots |
CN106200612B (en) * | 2016-07-07 | 2019-01-22 | 百度在线网络技术(北京)有限公司 | For testing the method and system of vehicle |
BR112016024645A2 (en) | 2016-08-11 | 2018-05-15 | Ericsson Telecomunicacoes Sa | ? method for managing a data processing task required by a client, cloud management node, computer program, and carrier? |
US10360077B2 (en) | 2016-10-10 | 2019-07-23 | International Business Machines Corporation | Measuring utilization of resources in datacenters |
US10545560B2 (en) | 2016-10-10 | 2020-01-28 | International Business Machines Corporation | Power management and utilization detection of computing components |
US10838482B2 (en) | 2016-10-10 | 2020-11-17 | International Business Machines Corporation | SLA-based power management in disaggregated computing systems |
US10401940B2 (en) * | 2016-10-10 | 2019-09-03 | International Business Machines Corporation | Power management in disaggregated computing systems |
US11169592B2 (en) | 2016-10-10 | 2021-11-09 | International Business Machines Corporation | SLA-based backup power management during utility power interruption in disaggregated datacenters |
US10819599B2 (en) | 2016-10-10 | 2020-10-27 | International Business Machines Corporation | Energy consumption as a measure of utilization and work characterization in a system |
CN106779295A (en) * | 2016-11-18 | 2017-05-31 | 南方电网科学研究院有限责任公司 | Power supply plan generation method and system |
US20180165772A1 (en) * | 2016-12-14 | 2018-06-14 | Palo Alto Research Center Incorporated | Tiered greening for large business operations with heavy power reliance |
IT201700034731A1 (en) * | 2017-03-29 | 2018-09-29 | St Microelectronics Srl | MODULE AND METHOD OF MANAGEMENT OF ACCESS TO A MEMORY |
WO2018190785A1 (en) * | 2017-04-10 | 2018-10-18 | Hewlett-Packard Development Company, L.P. | Delivering power to printing functions |
US10656700B2 (en) * | 2017-07-10 | 2020-05-19 | Oracle International Corporation | Power management in an integrated circuit |
US10782772B2 (en) * | 2017-07-12 | 2020-09-22 | Wiliot, LTD. | Energy-aware computing system |
US10831252B2 (en) | 2017-07-25 | 2020-11-10 | International Business Machines Corporation | Power efficiency-aware node component assembly |
JP6874594B2 (en) * | 2017-08-24 | 2021-05-19 | 富士通株式会社 | Power management device, node power management method and node power management program |
US10917496B2 (en) * | 2017-09-05 | 2021-02-09 | Amazon Technologies, Inc. | Networked storage architecture |
JP6996216B2 (en) * | 2017-10-16 | 2022-01-17 | コニカミノルタ株式会社 | Simulation device, information processing device, device setting method and device setting program |
KR102539044B1 (en) | 2017-10-30 | 2023-06-01 | 삼성전자주식회사 | Method of operating system on chip, system on chip performing the same and electronic system including the same |
US10824522B2 (en) * | 2017-11-27 | 2020-11-03 | Nutanix, Inc. | Method, apparatus, and computer program product for generating consistent snapshots without quiescing applications |
US10725834B2 (en) | 2017-11-30 | 2020-07-28 | International Business Machines Corporation | Job scheduling based on node and application characteristics |
CN108052394B (en) * | 2017-12-27 | 2021-11-30 | 福建星瑞格软件有限公司 | Resource allocation method based on SQL statement running time and computer equipment |
US10627885B2 (en) | 2018-01-09 | 2020-04-21 | Intel Corporation | Hybrid prioritized resource allocation in thermally- or power-constrained computing devices |
JP2019146298A (en) * | 2018-02-16 | 2019-08-29 | 富士ゼロックス株式会社 | Information processing apparatus and program |
KR102663815B1 (en) * | 2018-06-01 | 2024-05-07 | 삼성전자주식회사 | A computing device and operation method thereof |
US10936039B2 (en) * | 2018-06-19 | 2021-03-02 | Intel Corporation | Multi-tenant edge cloud system power management |
US11226667B2 (en) | 2018-07-12 | 2022-01-18 | Wiliot Ltd. | Microcontroller operable in a battery-less wireless device |
US11366753B2 (en) * | 2018-07-31 | 2022-06-21 | Marvell Asia Pte Ltd | Controlling performance of a solid state drive |
EP3608779B1 (en) | 2018-08-09 | 2024-04-03 | Bayerische Motoren Werke Aktiengesellschaft | Method for processing a predetermined computing task by means of a distributed, vehicle-based computing system as well as computing system, server device, and motor vehicle |
US11031787B2 (en) | 2018-09-14 | 2021-06-08 | Lancium Llc | System of critical datacenters and behind-the-meter flexible datacenters |
WO2020102930A1 (en) * | 2018-11-19 | 2020-05-28 | Alibaba Group Holding Limited | Power management method |
US11907029B2 (en) | 2019-05-15 | 2024-02-20 | Upstream Data Inc. | Portable blockchain mining system and methods of use |
US11073888B2 (en) * | 2019-05-31 | 2021-07-27 | Advanced Micro Devices, Inc. | Platform power manager for rack level power and thermal constraints |
US11314558B2 (en) * | 2019-07-23 | 2022-04-26 | Netapp, Inc. | Methods for dynamic throttling to satisfy minimum throughput service level objectives and devices thereof |
US11809252B2 (en) * | 2019-07-29 | 2023-11-07 | Intel Corporation | Priority-based battery allocation for resources during power outage |
US11397999B2 (en) | 2019-08-01 | 2022-07-26 | Lancium Llc | Modifying computing system operations based on cost and power conditions |
US11868106B2 (en) | 2019-08-01 | 2024-01-09 | Lancium Llc | Granular power ramping |
US10608433B1 (en) | 2019-10-28 | 2020-03-31 | Lancium Llc | Methods and systems for adjusting power consumption based on a fixed-duration power option agreement |
CN110958389B (en) * | 2019-12-05 | 2021-12-14 | 浙江大华技术股份有限公司 | Load starting method, equipment, device and storage medium of camera |
JP7367565B2 (en) | 2020-03-03 | 2023-10-24 | 富士通株式会社 | Power control device and power control program |
US11307627B2 (en) * | 2020-04-30 | 2022-04-19 | Hewlett Packard Enterprise Development Lp | Systems and methods for reducing stranded power capacity |
US20210397476A1 (en) * | 2020-06-18 | 2021-12-23 | International Business Machines Corporation | Power-performance based system management |
KR102176028B1 (en) * | 2020-08-24 | 2020-11-09 | (주)에오스와이텍 | System for Real-time integrated monitoring and method thereof |
KR102432007B1 (en) * | 2020-10-08 | 2022-08-12 | 인하대학교 산학협력단 | Reward-oriented task offloading under limited edge server power for mobile edge computing |
CN114816025A (en) * | 2021-01-19 | 2022-07-29 | 联想企业解决方案(新加坡)有限公司 | Power management method and system |
US20220342469A1 (en) * | 2021-04-23 | 2022-10-27 | Hewlett-Packard Development Company, L.P. | Power budget profiles of computing devices |
CN113434034B (en) * | 2021-07-08 | 2023-04-18 | 北京华恒盛世科技有限公司 | Large-scale cluster energy-saving method for adjusting CPU frequency of calculation task by utilizing deep learning |
EP4137913A1 (en) | 2021-08-17 | 2023-02-22 | Axis AB | Power management in processing circuitry which implements a neural network |
KR20230036589A (en) * | 2021-09-06 | 2023-03-15 | 삼성전자주식회사 | System-on-chip and operating method thereof |
KR102458919B1 (en) * | 2021-11-15 | 2022-10-26 | 삼성전자주식회사 | Memory System controlling an operation performance and Operating Method thereof |
US11720256B2 (en) * | 2021-12-15 | 2023-08-08 | Dell Products L.P. | Maximizing power savings using IO monitoring |
US11972267B2 (en) * | 2022-10-04 | 2024-04-30 | International Business Machines Corporation | Hibernation of computing device with faulty batteries |
US11714688B1 (en) * | 2022-11-17 | 2023-08-01 | Accenture Global Solutions Limited | Sustainability-based computing resource allocation |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100106985A1 (en) * | 2008-01-03 | 2010-04-29 | Broadcom Corporation | System and method for global power management in a power over ethernet chassis |
US20100205469A1 (en) * | 2009-02-06 | 2010-08-12 | Mccarthy Clifford A | Power budgeting for a group of computer systems |
US20100257531A1 (en) * | 2009-04-03 | 2010-10-07 | International Business Machines, Corporation | Scheduling jobs of a multi-node computer system based on environmental impact |
US20110022857A1 (en) * | 2009-07-24 | 2011-01-27 | Sebastien Nussbaum | Throttling computational units according to performance sensitivity |
US20120072745A1 (en) * | 2010-09-22 | 2012-03-22 | International Business Machines Corporation | Server power management with automatically-expiring server power allocations |
US20130339776A1 (en) * | 2012-06-13 | 2013-12-19 | Cisco Technology, Inc. | System and method for automated service profile placement in a network environment |
US20150177814A1 (en) * | 2013-12-23 | 2015-06-25 | Dell, Inc. | Predictive power capping and power allocation to computing nodes in a rack-based information handling system |
US20160011914A1 (en) * | 2013-06-20 | 2016-01-14 | Seagate Technology Llc | Distributed power delivery |
Family Cites Families (108)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2264794B (en) | 1992-03-06 | 1995-09-20 | Intel Corp | Method and apparatus for automatic power management in a high integration floppy disk controller |
US5483656A (en) | 1993-01-14 | 1996-01-09 | Apple Computer, Inc. | System for managing power consumption of devices coupled to a common bus |
US5598537A (en) | 1994-03-25 | 1997-01-28 | Advanced Micro Devices, Inc. | Apparatus and method for driving a bus to a docking safe state in a dockable computer system including a docking station and a portable computer |
US5752050A (en) | 1994-10-04 | 1998-05-12 | Intel Corporation | Method and apparatus for managing power consumption of external devices for personal computers using a power management coordinator |
US5784628A (en) | 1996-03-12 | 1998-07-21 | Microsoft Corporation | Method and system for controlling power consumption in a computer system |
KR100285949B1 (en) | 1996-12-12 | 2001-04-16 | 윤종용 | Battery charging circuit |
US6125450A (en) | 1996-12-19 | 2000-09-26 | Intel Corporation | Stop clock throttling in a computer processor through disabling bus masters |
US5905900A (en) | 1997-04-30 | 1999-05-18 | International Business Machines Corporation | Mobile client computer and power management architecture |
JP2001109729A (en) | 1999-10-12 | 2001-04-20 | Nec Corp | Device and method for controlling power consumption in multiprocessor system |
US20020194251A1 (en) * | 2000-03-03 | 2002-12-19 | Richter Roger K. | Systems and methods for resource usage accounting in information management environments |
US6760852B1 (en) | 2000-08-31 | 2004-07-06 | Advanced Micro Devices, Inc. | System and method for monitoring and controlling a power-manageable resource based upon activities of a plurality of devices |
US7143300B2 (en) | 2001-07-25 | 2006-11-28 | Hewlett-Packard Development Company, L.P. | Automated power management system for a network of computers |
AU2002317618A1 (en) * | 2001-08-06 | 2003-02-24 | Mercury Interactive Corporation | System and method for automated analysis of load testing results |
US6839854B2 (en) | 2001-08-27 | 2005-01-04 | Intel Corporation | Voltage regulation for computer system components that increases voltage level when a component enters a sleep state as indicated by a power state status signal |
US7111179B1 (en) | 2001-10-11 | 2006-09-19 | In-Hand Electronics, Inc. | Method and apparatus for optimizing performance and battery life of electronic devices based on system and application parameters |
DE60106467T2 (en) | 2001-12-14 | 2006-02-23 | Hewlett-Packard Development Co., L.P., Houston | Procedure for installing monitoring agent, system and computer program of objects in an IT network monitoring |
US20030163745A1 (en) | 2002-02-27 | 2003-08-28 | Kardach James P. | Method to reduce power in a computer system with bus master devices |
US7028200B2 (en) | 2002-05-15 | 2006-04-11 | Broadcom Corporation | Method and apparatus for adaptive power management of memory subsystem |
GB0211764D0 (en) | 2002-05-22 | 2002-07-03 | 3Com Corp | Automatic power saving facility for network devices |
US7093146B2 (en) | 2002-07-31 | 2006-08-15 | Hewlett-Packard Development Company, L.P. | Power management state distribution using an interconnect |
US7403511B2 (en) | 2002-08-02 | 2008-07-22 | Texas Instruments Incorporated | Low power packet detector for low power WLAN devices |
US6971033B2 (en) | 2003-01-10 | 2005-11-29 | Broadcom Corporation | Method and apparatus for improving bus master performance |
US7418517B2 (en) | 2003-01-30 | 2008-08-26 | Newisys, Inc. | Methods and apparatus for distributing system management signals |
EP1480378A1 (en) * | 2003-05-23 | 2004-11-24 | Alcatel | Method for setting up a generic protocol relationship between network elements in a telecom network |
US6965776B2 (en) | 2003-11-21 | 2005-11-15 | Motorola, Inc. | Portable communication device and network and methods therefore |
US20050136961A1 (en) | 2003-12-17 | 2005-06-23 | Telefonaktiebolaget Lm Ericsson (Publ), | Power control method |
US7363517B2 (en) | 2003-12-19 | 2008-04-22 | Intel Corporation | Methods and apparatus to manage system power and performance |
US7406691B2 (en) | 2004-01-13 | 2008-07-29 | International Business Machines Corporation | Minimizing complex decisions to allocate additional resources to a job submitted to a grid environment |
US7272741B2 (en) | 2004-06-02 | 2007-09-18 | Intel Corporation | Hardware coordination of power management activities |
US7418608B2 (en) | 2004-06-17 | 2008-08-26 | Intel Corporation | Method and an apparatus for managing power consumption of a server |
US7908313B2 (en) * | 2004-07-21 | 2011-03-15 | The Mathworks, Inc. | Instrument-based distributed computing systems |
US8271807B2 (en) * | 2008-04-21 | 2012-09-18 | Adaptive Computing Enterprises, Inc. | System and method for managing energy consumption in a compute environment |
US7386739B2 (en) * | 2005-05-03 | 2008-06-10 | International Business Machines Corporation | Scheduling processor voltages and frequencies based on performance prediction and power constraints |
US7444526B2 (en) | 2005-06-16 | 2008-10-28 | International Business Machines Corporation | Performance conserving method for reducing power consumption in a server system |
US7475262B2 (en) | 2005-06-29 | 2009-01-06 | Intel Corporation | Processor power management associated with workloads |
US7562234B2 (en) | 2005-08-25 | 2009-07-14 | Apple Inc. | Methods and apparatuses for dynamic power control |
US7861068B2 (en) | 2006-03-07 | 2010-12-28 | Intel Corporation | Method and apparatus for using dynamic workload characteristics to control CPU frequency and voltage scaling |
US20070220293A1 (en) | 2006-03-16 | 2007-09-20 | Toshiba America Electronic Components | Systems and methods for managing power consumption in data processors using execution mode selection |
US8190682B2 (en) | 2006-03-31 | 2012-05-29 | Amazon Technologies, Inc. | Managing execution of programs by multiple computing systems |
US7539881B2 (en) | 2006-04-15 | 2009-05-26 | Hewlett-Packard Development Company, L.P. | System and method for dynamically adjusting power caps for electronic components based on power consumption |
US7555666B2 (en) | 2006-05-04 | 2009-06-30 | Dell Products L.P. | Power profiling application for managing power allocation in an information handling system |
US7827738B2 (en) * | 2006-08-26 | 2010-11-09 | Alexander Abrams | System for modular building construction |
US7694160B2 (en) * | 2006-08-31 | 2010-04-06 | Ati Technologies Ulc | Method and apparatus for optimizing power consumption in a multiprocessor environment |
WO2008035276A2 (en) * | 2006-09-22 | 2008-03-27 | Koninklijke Philips Electronics N.V. | Methods for feature selection using classifier ensemble based genetic algorithms |
US8370929B1 (en) * | 2006-09-28 | 2013-02-05 | Whitehat Security, Inc. | Automatic response culling for web application security scan spidering process |
US8055343B2 (en) | 2006-10-20 | 2011-11-08 | Cardiac Pacemakers, Inc. | Dynamic battery management in an implantable device |
US7844838B2 (en) | 2006-10-30 | 2010-11-30 | Hewlett-Packard Development Company, L.P. | Inter-die power manager and power management method |
US7793126B2 (en) | 2007-01-19 | 2010-09-07 | Microsoft Corporation | Using priorities and power usage to allocate power budget |
JP4370336B2 (en) | 2007-03-09 | 2009-11-25 | 株式会社日立製作所 | Low power consumption job management method and computer system |
US7941681B2 (en) * | 2007-08-17 | 2011-05-10 | International Business Machines Corporation | Proactive power management in a parallel computer |
JP5029823B2 (en) * | 2007-09-06 | 2012-09-19 | コニカミノルタビジネステクノロジーズ株式会社 | Image forming apparatus, power consumption management system, power consumption management method, and program |
JP4935595B2 (en) | 2007-09-21 | 2012-05-23 | 富士通株式会社 | Job management method, job management apparatus, and job management program |
CN101419495B (en) * | 2007-10-22 | 2012-05-30 | 国际商业机器公司 | Method and device for reducing I /O power in computer system and the computer system |
US8046600B2 (en) | 2007-10-29 | 2011-10-25 | Microsoft Corporation | Collaborative power sharing between computing devices |
KR20100081341A (en) * | 2007-10-31 | 2010-07-14 | 인터내셔널 비지네스 머신즈 코포레이션 | Method, system and computer program for distributing a plurality of jobs to a plurality of computers |
US8041521B2 (en) | 2007-11-28 | 2011-10-18 | International Business Machines Corporation | Estimating power consumption of computing components configured in a computing system |
US7971084B2 (en) * | 2007-12-28 | 2011-06-28 | Intel Corporation | Power management in electronic systems |
US8793786B2 (en) * | 2008-02-08 | 2014-07-29 | Microsoft Corporation | User indicator signifying a secure mode |
US8001403B2 (en) | 2008-03-14 | 2011-08-16 | Microsoft Corporation | Data center power management utilizing a power policy and a load factor |
US8301742B2 (en) | 2008-04-07 | 2012-10-30 | International Business Machines Corporation | Systems and methods for coordinated management of power usage and runtime performance in performance-managed computing environments |
US9405348B2 (en) | 2008-04-21 | 2016-08-02 | Adaptive Computing Enterprises, Inc | System and method for managing energy consumption in a compute environment |
US7756652B2 (en) | 2008-04-24 | 2010-07-13 | Oracle America, Inc. | Estimating a power utilization of a computer system |
US8296773B2 (en) * | 2008-06-30 | 2012-10-23 | International Business Machines Corporation | Systems and methods for thread assignment and core turn-off for integrated circuit energy efficiency and high-performance |
US8365175B2 (en) | 2009-03-10 | 2013-01-29 | International Business Machines Corporation | Power management using dynamic application scheduling |
US8589931B2 (en) | 2009-03-18 | 2013-11-19 | International Business Machines Corporation | Environment based node selection for work scheduling in a parallel computing system |
EP2435914B1 (en) | 2009-05-26 | 2019-12-11 | Telefonaktiebolaget LM Ericsson (publ) | Method and scheduler in an operating system |
US8904394B2 (en) * | 2009-06-04 | 2014-12-02 | International Business Machines Corporation | System and method for controlling heat dissipation through service level agreement analysis by modifying scheduled processing jobs |
JP2011013764A (en) * | 2009-06-30 | 2011-01-20 | Hitachi Ltd | Method, system and program for managing power consumption |
US8276012B2 (en) | 2009-06-30 | 2012-09-25 | International Business Machines Corporation | Priority-based power capping in data processing systems |
US8589709B2 (en) | 2009-07-23 | 2013-11-19 | Carnegie Mellon University | Systems and methods for managing power consumption and performance of a processor |
CN101694631B (en) * | 2009-09-30 | 2016-10-05 | 曙光信息产业(北京)有限公司 | Real time job dispatching patcher and method |
US8224993B1 (en) | 2009-12-07 | 2012-07-17 | Amazon Technologies, Inc. | Managing power consumption in a data center |
US8336056B1 (en) | 2009-12-22 | 2012-12-18 | Gadir Omar M A | Multi-threaded system for data management |
US8341441B2 (en) * | 2009-12-24 | 2012-12-25 | International Business Machines Corporation | Reducing energy consumption in a cloud computing environment |
US8429433B2 (en) * | 2010-01-15 | 2013-04-23 | International Business Machines Corporation | Dynamically adjusting an operating state of a data processing system running under a power cap |
US8627123B2 (en) * | 2010-03-25 | 2014-01-07 | Microsoft Corporation | Managing power provisioning in distributed computing |
US9052895B2 (en) * | 2010-04-07 | 2015-06-09 | International Business Machines | Power budget allocation in multi-processor systems |
US8612984B2 (en) | 2010-04-28 | 2013-12-17 | International Business Machines Corporation | Energy-aware job scheduling for cluster environments |
US8412479B2 (en) * | 2010-06-29 | 2013-04-02 | Intel Corporation | Memory power estimation by means of calibrated weights and activity counters |
US8589932B2 (en) | 2010-07-02 | 2013-11-19 | International Business Machines Corporation | Data processing workload control |
US8464080B2 (en) | 2010-08-25 | 2013-06-11 | International Business Machines Corporation | Managing server power consumption in a data center |
US8627322B2 (en) * | 2010-10-29 | 2014-01-07 | Google Inc. | System and method of active risk management to reduce job de-scheduling probability in computer clusters |
US8868936B2 (en) * | 2010-11-29 | 2014-10-21 | Cisco Technology, Inc. | Dynamic power balancing among blade servers in a chassis |
KR20120072224A (en) | 2010-12-23 | 2012-07-03 | 한국전자통신연구원 | Apparatus for controlling power of sensor nodes based on estimation of power acquisition and method thereof |
US8645733B2 (en) * | 2011-05-13 | 2014-02-04 | Microsoft Corporation | Virtualized application power budgeting |
US8904209B2 (en) | 2011-11-14 | 2014-12-02 | Microsoft Corporation | Estimating and managing power consumption of computing devices using power models |
US9244721B2 (en) * | 2011-11-24 | 2016-01-26 | Hitachi, Ltd. | Computer system and divided job processing method and program |
CN103136055B (en) | 2011-11-25 | 2016-08-03 | 国际商业机器公司 | For controlling the method and apparatus to the use calculating resource in database service |
US8689220B2 (en) * | 2011-11-30 | 2014-04-01 | International Business Machines Corporation | Job scheduling to balance energy consumption and schedule performance |
US9218035B2 (en) | 2012-02-10 | 2015-12-22 | University Of Florida Research Foundation, Inc. | Renewable energy control systems and methods |
US9262232B2 (en) | 2012-02-29 | 2016-02-16 | Red Hat, Inc. | Priority build execution in a continuous integration system |
CN104246705B (en) | 2012-05-14 | 2019-10-11 | 英特尔公司 | A kind of method, system, medium and device operated for managing computing system |
US9857858B2 (en) | 2012-05-17 | 2018-01-02 | Intel Corporation | Managing power consumption and performance of computing systems |
CN102685808A (en) * | 2012-05-18 | 2012-09-19 | 电子科技大学 | Distribution type clustering method based on power control |
US9342376B2 (en) | 2012-06-27 | 2016-05-17 | Intel Corporation | Method, system, and device for dynamic energy efficient job scheduling in a cloud computing environment |
CN102819460B (en) * | 2012-08-07 | 2015-05-20 | 清华大学 | Budget power guidance-based high-energy-efficiency GPU (Graphics Processing Unit) cluster system scheduling algorithm |
JP5787365B2 (en) * | 2012-09-18 | 2015-09-30 | Necフィールディング株式会社 | Power control apparatus, power control system, power control method, and program |
US8939654B2 (en) | 2012-09-27 | 2015-01-27 | Adc Telecommunications, Inc. | Ruggedized multi-fiber fiber optic connector with sealed dust cap |
GB2506626B (en) | 2012-10-03 | 2018-02-07 | Imperial Innovations Ltd | Frequency estimation |
JP6072257B2 (en) | 2012-10-05 | 2017-02-01 | 株式会社日立製作所 | Job management system and job control method |
US20140114107A1 (en) | 2012-10-24 | 2014-04-24 | Lummus Technology Inc. | Use of hydrocarbon diluents to enhance conversion in a dehydrogenation process at low steam/oil ratios |
US9110972B2 (en) | 2012-11-07 | 2015-08-18 | Dell Products L.P. | Power over ethernet dynamic power allocation system |
US9250858B2 (en) * | 2013-02-20 | 2016-02-02 | International Business Machines Corporation | Dual-buffer serialization and consumption of variable-length data records produced by multiple parallel threads |
US9009677B2 (en) | 2013-03-18 | 2015-04-14 | Microsoft Technology Licensing, Llc | Application testing and analysis |
US9335751B1 (en) * | 2013-08-28 | 2016-05-10 | Google Inc. | Dynamic performance based cooling control for cluster processing devices |
JP6201530B2 (en) * | 2013-08-30 | 2017-09-27 | 富士通株式会社 | Information processing system, job management apparatus, control program for job management apparatus, and control method for information processing system |
US9189273B2 (en) | 2014-02-28 | 2015-11-17 | Lenovo Enterprise Solutions PTE. LTD. | Performance-aware job scheduling under power constraints |
US9336106B2 (en) * | 2014-04-17 | 2016-05-10 | Cisco Technology, Inc. | Dynamically limiting bios post for effective power management |
2014
- 2014-12-24 US US14/582,772 patent/US10289183B2/en active Active
- 2014-12-24 US US14/582,743 patent/US20160054779A1/en not_active Abandoned
- 2014-12-24 US US14/582,764 patent/US9921633B2/en active Active
- 2014-12-24 US US14/582,756 patent/US9927857B2/en active Active
- 2014-12-24 US US14/582,795 patent/US9575536B2/en active Active
- 2014-12-24 US US14/582,783 patent/US10712796B2/en active Active
2015
- 2015-06-17 CN CN201580040949.7A patent/CN106537348B/en active Active
- 2015-06-17 EP EP15833962.2A patent/EP3183654A4/en active Pending
- 2015-06-17 WO PCT/US2015/036294 patent/WO2016028371A1/en active Application Filing
- 2015-06-17 JP JP2017510501A patent/JP6701175B2/en active Active
- 2015-06-17 KR KR1020177002070A patent/KR102213555B1/en active IP Right Grant
- 2015-06-18 CN CN201580040030.8A patent/CN107003706B/en active Active
- 2015-06-18 EP EP19208324.4A patent/EP3627285A1/en not_active Ceased
- 2015-06-18 WO PCT/US2015/036435 patent/WO2016028375A1/en active Application Filing
- 2015-06-18 WO PCT/US2015/036403 patent/WO2016028374A1/en active Application Filing
- 2015-06-18 EP EP15833856.6A patent/EP3183628A4/en not_active Withdrawn
- 2015-06-18 JP JP2017510311A patent/JP6386165B2/en active Active
- 2015-06-18 EP EP15834071.1A patent/EP3183629B1/en active Active
- 2015-06-18 EP EP22154366.3A patent/EP4016248A1/en active Pending
- 2015-06-18 CN CN201580040005.XA patent/CN106537287B/en active Active
- 2015-06-18 CN CN201911128004.7A patent/CN111176419B/en active Active
- 2015-06-18 KR KR1020177002008A patent/KR102207050B1/en active IP Right Grant
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160291656A1 (en) * | 2015-04-01 | 2016-10-06 | Dell Products, L.P. | Method and apparatus for collaborative power and thermal control of fan run time average power limiting |
US9870037B2 (en) * | 2015-04-01 | 2018-01-16 | Dell Products, L.P. | Method and apparatus for collaborative power and thermal control of fan run time average power limiting |
US10042410B2 (en) * | 2015-06-11 | 2018-08-07 | International Business Machines Corporation | Managing data center power consumption |
US10503230B2 (en) * | 2015-11-25 | 2019-12-10 | Electronics And Telecommunications Research Institute | Method and apparatus for power scheduling |
US12007734B2 (en) | 2022-09-23 | 2024-06-11 | Oracle International Corporation | Datacenter level power management with reactive power capping |
US20240154415A1 (en) * | 2022-11-08 | 2024-05-09 | Oracle International Corporation | Techniques for orchestrated load shedding |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20160054779A1 (en) | Managing power performance of distributed computing systems | |
JP6605609B2 (en) | Power consumption control | |
US9471139B2 (en) | Non-intrusive power management | |
US8627123B2 (en) | Managing power provisioning in distributed computing | |
US9665294B2 (en) | Dynamic feedback-based throughput control for black-box storage systems | |
US8954765B2 (en) | Energy based resource allocation across virtualized machines and data centers | |
EP3238162B1 (en) | Forecast for demand of energy | |
Sanjeevi et al. | NUTS scheduling approach for cloud data centers to optimize energy consumption | |
US9746911B2 (en) | Same linking | |
US11640195B2 (en) | Service-level feedback-driven power management framework | |
Samrajesh et al. | Component based energy aware multi-tenant application in software as-a service | |
US20230350722A1 (en) | Apparatuses and methods for determining an interdependency between resources of a computing system | |
Bheda et al. | Qos and performance optimization with vm provisioning approach in cloud computing environment | |
Nassiffe et al. | A Model for Reconfiguration of Multi-Modal Real-Time Systems under Energy Constraints |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTEL CORPORATION, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BODAS, DEVADATTA;RAJAPPA, MURALIDHAR;SONG, JUSTIN;AND OTHERS;SIGNING DATES FROM 20150112 TO 20150126;REEL/FRAME:034820/0411
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |