US20150241887A1 - Thermal Management for Integrated Circuits - Google Patents
Thermal Management for Integrated Circuits
- Publication number
- US20150241887A1 (US 14/697,388)
- Authority
- US
- United States
- Prior art keywords
- thermal
- temperature sensors
- location
- host system
- temperature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D23/00—Control of temperature
- G05D23/19—Control of temperature characterised by the use of electric means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/16—Constructional details or arrangements
- G06F1/20—Cooling means
- G06F1/206—Cooling means comprising thermal management
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01K—MEASURING TEMPERATURE; MEASURING QUANTITY OF HEAT; THERMALLY-SENSITIVE ELEMENTS NOT OTHERWISE PROVIDED FOR
- G01K13/00—Thermometers specially adapted for specific purposes
- G01K13/10—Thermometers specially adapted for specific purposes for measuring temperature within piled or stacked materials
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B15/00—Systems controlled by a computer
- G05B15/02—Systems controlled by a computer electric
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3287—Power saving characterised by the action undertaken by switching off individual functional units in the computer system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/329—Power saving characterised by the action undertaken by task scheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3293—Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- FIG. 1 is a diagrammatic perspective view of a device board according to various aspects of the present disclosure.
- FIG. 2 is a diagrammatic top view of a device board according to various aspects of the present disclosure.
- FIG. 3 is a diagrammatic top view of a device board according to various aspects of the present disclosure.
- FIG. 4 is a diagrammatic top view of a device board according to various aspects of the present disclosure.
- FIG. 5 is a schematic diagram of a circuit device according to various aspects of the present disclosure.
- FIG. 6 is a flow chart of a method of thermal management for a device board according to an embodiment of the present disclosure.
- FIG. 7 is a flow chart of a method of thermal management for a device board according to an embodiment of the present disclosure.
- FIG. 8 is a flow chart of a method of thermal management for a device board according to an embodiment of the present disclosure.
- FIG. 9 is a diagrammatic perspective view of a device board according to various aspects of the present disclosure.
- FIG. 10 is a diagrammatic top view of a thermal dissipating layer according to various aspects of the present disclosure.
- the present disclosure relates generally to integrated circuit design, integration, and operation and more particularly to devices and methods for thermal management of integrated circuit devices.
- a first device connected to a second device may include embodiments in which the first device is directly connected to the second device and may further include embodiments where the first device and the second device are connected via an intermediary.
- references made to directions and locations, such as “above,” “below,” “to the left,” and “to the right,” are intended to simplify understanding of an illustrated embodiment.
- the disclosure applies equally to embodiments where the orientation is altered. For example, a device described as being above another may be located below when an actual embodiment is observed. This is understood, as any embodiment may be observed from any orientation.
- FIG. 1 is a diagrammatic perspective view of a device board 100 .
- the device board 100 may be an essential component of a host system, such as a motherboard or daughtercard, and may also be an expansion card to expand the capabilities of the host system.
- Device boards 100 are commonly used to add additional processing power, to add networking capability, to add signal processing capacity, to add graphics and audio functionality, to add input and output bandwidth, to handle device I/O, and to add any other suitable enhancement to the host system.
- the device board 100 includes a number of components including circuit devices 102 for providing functionality, a bus interface unit 104 for interfacing with other devices, and a power-regulating unit 106 for performing voltage control.
- the components are disposed on a printed circuit board (PCB) 108 .
- the circuit devices 102 provide the bulk of the functionality of the device board 100 .
- the circuit devices 102 include integrated circuit processing devices such as general-purpose processors (CPUs), graphics processing units (GPUs), multicore processors, digital signal processors (DSPs), and/or other suitable processors.
- the circuit devices 102 include field-programmable gate arrays (FPGAs), programmable logic controllers (PLCs), and/or microcontrollers.
- the circuit devices 102 include other integrated circuits such as interface devices (e.g., a bridge device), fabric controllers, analog-to-digital converters, watchdog monitors, and memory circuits (e.g., RAM, ROM, EEPROM, and/or Flash Memory).
- Circuit devices 102 may also include supporting devices such as capacitors, resistors, diodes, optical isolators, and other suitable supporting devices.
- the bus interface unit 104 transports data between the device board 100 and other device boards, peripheral components, a host system, and any other suitable system or device.
- the bus interface unit 104 provides any number of data connections and may further provide connections for supply voltage, clock signals, diagnostic and status signals, and other suitable signals. These data connections may take the form of a connector such as a blade connector, a pin array connector, a socket connector, a cable connector, or any other connector known to one of skill in the art. In an exemplary embodiment, these data connections are established wirelessly.
- the bus interface unit 104 may support any data transfer standard including Ethernet, IEEE 802.11, PCIe (Peripheral Component Interconnect Express), PCI (Peripheral Component Interconnect), RapidIO, AGP (Accelerated Graphics Port), ISA, SATA (Serial Advanced Technology Attachment), InfiniBand, USB (Universal Serial Bus), and other suitable bus standards.
- the power-regulating unit 106 receives a supply voltage, commonly a DC voltage from a computing system power supply or battery, and distributes an operational voltage to other components of the device board 100 such as the bus interface unit 104 and the circuit devices 102 .
- the operational voltage may be stepped down or stepped up from the supply voltage and may be a rectified DC function of the supply voltage, such as in embodiments with an AC supply voltage.
- the power-regulating unit 106 receives the supply voltage via the bus interface unit 104 .
- the power-regulating unit 106 includes power connectors for receiving the supply voltage directly.
- the power-regulating unit 106 commonly includes power-handling components such as transformers, diodes, capacitors, inductors, power MOSFETs, and fusible links.
- the power-regulating unit 106 is an adaptive power-regulating unit.
- the adaptive power-regulating unit is capable of varying the operational voltage delivered to the circuit devices 102 based on a voltage control signal.
- the adaptive power-regulating unit may be configured to supply a 3.5 V DC operational voltage nominally and may be able to vary the operational voltage by ±10% in response to the voltage control signal.
- the adaptive power-regulating unit is capable of delivering a first operational voltage to a first circuit device while delivering a second operational voltage to a second circuit device. This allows the adaptive power-regulating unit to increase or reduce power to a particular circuit device without affecting other circuit devices.
- An adaptive power-regulating unit is particularly useful for managing heat.
- if the first circuit device approaches a critical temperature, power to the device can be reduced.
- Reducing the operational voltage of the first circuit device may decrease heat output but may also decrease performance.
- the overall performance of the device board 100 is not as severely impacted.
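- A minimal sketch of per-device voltage control, assuming the 3.5 V nominal voltage and the 10% adjustment range from the example above. The function names, device labels, and request format are hypothetical and are not taken from the disclosure.

```python
NOMINAL_V = 3.5        # nominal operational voltage from the example above
MAX_DEVIATION = 0.10   # allowed adjustment range (10%) from the example above


def clamp_voltage(requested_v, nominal_v=NOMINAL_V, max_dev=MAX_DEVIATION):
    """Limit a requested voltage to the allowed band around nominal."""
    low = nominal_v * (1.0 - max_dev)
    high = nominal_v * (1.0 + max_dev)
    return min(max(requested_v, low), high)


def apply_voltage_requests(requests):
    """Clamp each device's request independently of the others.

    `requests` maps a hypothetical device label to a requested voltage.
    """
    return {device: clamp_voltage(v) for device, v in requests.items()}
```

- Because each device is clamped on its own, lowering the voltage of one circuit device leaves the voltage of the others unchanged.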
- the device board 100 further includes a clock control unit 114 .
- the clock control unit 114 generates and distributes a clock signal to the circuit devices 102 .
- the clock signal is commonly based on a reference clock.
- the reference clock is received via the bus interface unit 104 .
- the reference clock is generated by an oscillator such as an oscillator crystal, a ceramic resonator, or an oscillating circuit.
- the clock control unit 114 may distribute the reference clock without modification or it may perform synchronization, shaping, amplification, frequency division or multiplication, duty cycle adjustment, or other suitable modifications to the reference clock.
- the clock control unit 114 is an adaptive clock control unit.
- the adaptive clock control unit is configured to adjust the clock signal delivered to the circuit devices 102 in response to a frequency control signal.
- the adaptive clock control unit may generate a 1 GHz clock by default with the ability to adjust the clock frequency by up to +20% or down to −50% based on the frequency control signal.
- the adaptive clock control unit is capable of delivering a first clock signal to a first circuit device while delivering a second clock signal to a second circuit device and is capable of adjusting the different clock signals independently. This allows the clock control unit to respond to changing needs of the first circuit device while maintaining an optimal clock for the second circuit device.
- the clock control unit may reduce the clock frequency for the first circuit device in response to a thermal condition. Reducing the clock frequency commonly reduces power consumption and heat production. It may also reduce overall performance. If the first circuit device is approaching a temperature limit, an adaptive clock control unit can reduce the clock signal frequency of the first circuit device while maintaining the clock signal frequency of the second circuit device. Because the second circuit device can continue operating at a higher frequency, the impact on overall performance is reduced.
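- The independent clock adjustment described above could be sketched as follows, assuming the 1 GHz default and the +20% / 50% reduction limits from the earlier example. The dictionary of per-device clocks and the function names are hypothetical.

```python
DEFAULT_HZ = 1_000_000_000  # 1 GHz default clock from the example above
MAX_INCREASE = 0.20         # clock may be raised by up to 20%
MAX_DECREASE = 0.50         # clock may be lowered by up to 50%


def scaled_clock(fraction, default_hz=DEFAULT_HZ):
    """Return the clock for a requested fractional change, bounded to the range."""
    bounded = min(max(fraction, -MAX_DECREASE), MAX_INCREASE)
    return default_hz * (1.0 + bounded)


def throttle_device(clocks, device, fraction):
    """Return a new clock table with only `device` changed."""
    updated = dict(clocks)
    updated[device] = scaled_clock(fraction)
    return updated
```

- Only the entry for the throttled device changes, so a second device can keep running at its current frequency.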
- the circuit devices 102 , the bus interface unit 104 , the power-regulating unit 106 , and, in some embodiments, the clock control unit 114 are mounted on the printed circuit board (PCB) 108 .
- the PCB 108 physically supports the components and provides connections between them.
- the PCB 108 is made up of a number of layers. These include insulating layers 110 and trace layers 112 .
- the insulating layers 110 provide physical rigidity and durability. They typically contain dielectric material combined with an epoxy to create a laminate sheet.
- the insulating layers 110 may comprise an FR4-rated glass-reinforced epoxy laminate.
- the trace layers 112 contain conductive traces that connect the components disposed on the PCB 108 including the circuit devices 102 , the bus interface unit 104 , and the power-regulating unit 106 .
- the conductive traces may be formed from any conductive material, including copper, tin, silver, gold, and other metals and alloys, as well as non-metallic conductors such as graphite, conductive polymers, and organic conductors.
- the conductive traces may be formed on or bonded to the insulating layers 110 directly or may be formed on a backing material. Connecting traces on different trace layers 112 often requires creating openings in the insulating layers 110 . The openings are then filled with a conductor to create via structures between the traces of the different trace layers 112 .
- FIG. 2 is a diagrammatic top view of a device board 100 .
- the power-regulating unit is an adaptive power-regulating unit 200 .
- the adaptive power-regulating unit 200 is capable of varying the operational voltages delivered to the circuit devices 102 a and 102 b based on independent voltage control signals.
- the clock control unit is an adaptive clock control unit 220 .
- the adaptive clock control unit 220 is capable of delivering a first clock signal to a first circuit device 102 a while delivering a second clock signal to a second circuit device 102 b and is capable of adjusting the different clock signals independently. It is understood that the designations of circuit device 102 a and circuit device 102 b are arbitrary and do not imply that the operational voltage is in any manner linked to the clock signal for a given circuit device.
- the device board further includes a dispatch unit 240 .
- the dispatch unit 240 receives instructions and distributes the instructions for execution to the circuit devices including circuit devices 102 a and 102 b .
- the dispatch unit 240 may look to criteria including capabilities of each circuit device, current workload of each circuit device, data dependencies, available board resources such as bus availability, operating conditions of each circuit device, and other performance-related criteria.
- the dispatch unit 240 may also look to the thermal conditions and thermal profile of a circuit device 102 .
- a thermal factor may be included when weighing the desirability of a particular circuit device. For example, the dispatch unit 240 may forego sending some instructions to device 102 a in response to a thermal factor.
- a more severe thermal factor may cause the dispatch unit to forego sending any instructions to device 102 a .
- a critical thermal factor may cause the dispatch unit to cancel instructions sent to circuit device 102 a and reassign them to device 102 b .
- Many circuit devices consume less energy and produce less heat when idle. Thus, it is possible that the temperature of device 102 a will drop during the idle time and relieve the thermal condition.
- the overall performance penalty may be small.
- FIG. 3 is a diagrammatic top view of a device board 100 according to an embodiment of the present disclosure.
- the device board 100 further includes a thermal monitor unit 300 .
- the thermal monitor unit 300 receives temperature data from a number of temperature sensors 302 .
- the temperature sensors 302 produce temperature data, which may include producing an analog or digital temperature reading, producing a warning when a critical temperature is reached or surpassed, producing another type of temperature data, or a combination thereof.
- Temperature sensors 302 may be stand-alone devices, and, in some embodiments, temperature sensors are integrated into circuit devices 102 . Data from the temperature sensors 302 is used to determine conditions at thermal reference points. Thermal reference points are not limited to locations of temperature sensors 302 . In many embodiments, conditions at thermal reference points are interpolated from temperature sensor data.
- Thermal regions 304 (of which thermal regions 304 a and 304 b are examples) each cover a portion of the device board 100 and may be used individually or in combination.
- Thermal regions 304 are defined as needed and may be defined differently for each operating parameter.
- Thermal regions 304 may contain part of a circuit device 102 , an entire circuit device 102 , and/or multiple circuit devices 102 . It is not necessary for the entire device board 100 to have a corresponding thermal region 304 . Particularly, areas of low density may not have an associated thermal region 304 .
- these thermal regions 304 when evaluated in light of the architecture of the device board 100 , make up a thermal map. By aggregating temperature data from the various thermal reference points and processing them to create a thermal map of the region, one or more operating characteristics of the circuits on the device board 100 can be modified to respond to and manage the thermal characteristics of the device board 100 .
- the thermal map represents the current thermal conditions throughout the thermal regions. With reference to the circuit devices 102 and arrangement of circuit devices 102 shown in various embodiments, there is a known or ascertainable spatial relationship between the elements on the PCB 108 and the values represented on the thermal map. In one embodiment, the thermal map is correlated with the physical dimensions of the underlying circuit devices 102 , so that a change in value at a thermal reference point corresponds to a change in the conditions at a particular location in two-dimensional (X-Y) or three-dimensional (X-Y-Z) space.
- This correlation may be developed by using or inputting physical specifications from the layout and characteristics of the underlying circuit devices 102 and elements of the circuit devices 102 , or it may be built up probabilistically by observing correlations between various thermal reference points and creating the map that corresponds to the observed or inferred relationships.
- the temperature at thermal reference point X, $T_X$, between sensors A and B can be estimated as:
- $$T_X = T_A + (T_B - T_A)\,\frac{D_{AX}}{D_{AB}}$$
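- The linear estimate between sensors A and B can be written as a short function. This sketch assumes that D_AX is the distance from sensor A to point X and that D_AB is the distance between the two sensors; the disclosure does not define the symbols, and the function name is hypothetical.

```python
def interpolate_temperature(t_a, t_b, d_ax, d_ab):
    """Linearly interpolate the temperature at point X between sensors A and B.

    t_a, t_b: readings at sensors A and B
    d_ax: assumed distance from sensor A to point X
    d_ab: assumed distance from sensor A to sensor B
    """
    if d_ab <= 0:
        raise ValueError("sensor separation must be positive")
    if not 0 <= d_ax <= d_ab:
        raise ValueError("point X must lie between the two sensors")
    return t_a + (t_b - t_a) * d_ax / d_ab
```

- For example, with sensor A at 60 degrees, sensor B at 80 degrees, and X one quarter of the way from A to B, the estimate is 65 degrees.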
- the thermal monitor unit 300 calculates the temperature at a thermal reference point X based on the distance from nearby sensors, the existence and location of heat-generating circuit devices between the sensors, and the distance between point X and the heat-generating devices. In some embodiments, the thermal monitor unit 300 further compensates for the operating parameters of the heat-generating devices when determining the effect on the thermal reference point. By considering the physical structure of the thermal zones, the thermal monitor unit 300 constructs a more accurate thermal map. The thermal map may also include thermal factors that are not circuit devices 102 and thermal factors that are not near the thermal region 304 . For example, the thermal monitor unit 300 may consider a heat source that is not part of the device board 100 but is known to have an effect on the thermal reference point. Other relevant systemic factors include airflow, nearby cooling solutions, and other thermal aspects of the host system.
- the thermal monitor unit 300 observes the changes in thermal conditions over time at various thermal reference points and adapts the thermal map accordingly. This may include analyzing the correspondence between two or more thermal reference points. As an example, a first and a second temperature sensor 302 track closely when a given circuit device 102 a is active. The thermal monitor unit may infer that circuit device 102 a affects both sensors 302 and also affects nearby thermal reference points. As a further example, an unknown host system trait causes adverse thermal conditions at one or more temperature sensors 302 on a regular basis. The thermal monitor unit can then use this information when interpolating thermal conditions at other thermal reference points. This spatially and operationally aware mapping provides more accurate prediction and monitoring of thermal reference points where temperature sensors 302 are not available and facilitates more effective responses.
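- One way to test whether two temperature sensors "track closely" is a Pearson correlation over their recent readings. The disclosure does not name a statistic, so the choice of Pearson correlation, the 0.9 threshold, and the function names below are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length reading series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no tracking information
    return cov / sqrt(var_x * var_y)


def sensors_track_closely(xs, ys, threshold=0.9):
    """True when the two series rise and fall together (assumed threshold)."""
    return pearson(xs, ys) >= threshold
```

- A monitor could run this check on readings taken while a given circuit device is active and, if it holds, treat that device as a likely influence on both sensors.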
- the thermal monitor unit 300 may account for thermal factors, such as systemic factors, by using them to calculate thermal conditions at thermal reference points. Furthermore, in some embodiments, the thermal monitor unit 300 considers the systemic conditions including circuit devices 102 when determining a response to a thermal condition. It should be emphasized that it is not always necessary to determine the cause of a thermal condition to be able to formulate an effective response.
- One of the most basic responses is a trigger response where the voltage, clock rate, or workload of one or more circuit devices 102 is modified when the measurement point or area corresponding to that circuit device 102 passes some threshold.
- the thermal monitor unit 300 analyzes the temperature data to determine when a triggering event occurs.
- A wide variety of triggering events is contemplated. For example, a single report of excessive temperature by a single temperature sensor 302 may be a triggering event. An excessive temperature over a prolonged period may also be a triggering event. A sudden increase in temperature or a rate of increase may trigger a response.
- a trigger may be based on a number of temperature sensors 302 experiencing excessive temperatures.
- a number of temperature sensors 302 reporting high but not critical temperatures triggers a response. Commonly, it will not be possible to dispose a temperature sensor 302 next to a critical device. Therefore, in many embodiments, the thermal monitor unit 300 interpolates conditions at thermal reference points throughout a portion or combination of thermal regions 304 . Thus, the triggering event may be a temperature that is calculated, not recorded by a temperature sensor 302 .
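- The kinds of triggering events listed above can be checked in one pass over recent readings. This is a sketch only: the threshold values, the sample counts, the data layout, and the reason labels are hypothetical, and the readings could equally be interpolated values rather than sensor measurements.

```python
def check_triggers(history, critical=95.0, high=85.0,
                   sustained_samples=3, max_rise=5.0, high_count=2):
    """Return a list of (source, reason) pairs for triggering events.

    `history` maps a sensor label to its readings, oldest first.
    """
    reasons = []
    hot_sensors = 0
    for sensor, readings in history.items():
        if not readings:
            continue
        latest = readings[-1]
        # Single report of excessive temperature.
        if latest >= critical:
            reasons.append((sensor, "critical"))
        # High temperature over a prolonged period.
        recent = readings[-sustained_samples:]
        if len(recent) == sustained_samples and all(r >= high for r in recent):
            reasons.append((sensor, "sustained"))
        # Sudden increase between the last two samples.
        if len(readings) >= 2 and readings[-1] - readings[-2] >= max_rise:
            reasons.append((sensor, "rapid_rise"))
        if latest >= high:
            hot_sensors += 1
    # Several sensors reporting high but not necessarily critical temperatures.
    if hot_sensors >= high_count:
        reasons.append(("board", "multiple_high"))
    return reasons
```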
- the thermal monitor unit 300 may take one or more corrective actions. These responses include changing operating parameters such as voltage, clock frequency, or workload.
- the thermal monitor unit 300 interacts with an adaptive power-regulating unit 200 to control one or more operating voltages for the circuit devices 102. If the conditions at one or more thermal reference points indicate that a response is needed in a thermal region such as 304 a, or in a portion or combination of thermal regions, the thermal monitor unit 300 sends a voltage control signal to the adaptive power-regulating unit 200 to lower the operating voltage in thermal region 304 a, or in a portion or combination of thermal regions, such as at circuit device 102 a.
- the thermal monitor unit 300 sends a voltage control signal to the adaptive power-regulating unit 200 to maintain or increase the current operating voltage for a portion or combination of the area pertaining to thermal regions 304 b .
- the thermal monitor unit 300 and the adaptive power-regulating unit 200 work together to maintain peak performance while combating overheating.
- the thermal monitor unit 300 interfaces with an adaptive clock control unit 220 .
- the thermal monitor unit 300 transmits one or more clock control signals to the clock control unit 220 .
- the clock control unit 220 alters the clock signals sent to one or more thermal regions 304.
- the thermal monitor unit 300 may respond to excessive conditions in thermal region 304 a or a portion or combination of thermal regions by sending a clock control signal to the adaptive clock control unit 220 to reduce the frequency of the clock signal for a portion or combination of the circuits correlated with thermal region 304 a . If thermal region 304 b is not experiencing adverse conditions, the thermal monitor unit 300 may send a clock control signal to the adaptive clock control unit 220 to maintain or increase the current frequency.
- circuit devices can operate at their maximum levels of performance.
- the thermal monitor unit 300 may also interface with the dispatch unit 240 to direct traffic away from or towards a circuit device.
- the thermal monitor unit 300 assigns and transmits a thermal factor for each circuit device 102 a and 102 b to the dispatch unit 240 .
- the dispatch unit 240 weighs the thermal factor when assigning instructions to the circuit devices. Small thermal factor values for circuit device 102 a may drive a percentage of traffic away from circuit device 102 a and towards device 102 b . Moderate thermal factor values may drive all traffic towards device 102 b . Critical thermal factor values may suspend all tasks for circuit device 102 a and reassign them to circuit device 102 b.
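- The tiered response to small, moderate, and critical thermal factor values might look like the sketch below. It assumes a thermal factor normalized to the range 0 to 1; the tier boundaries, the diverted share, and the returned field names are hypothetical, since the disclosure gives no numeric values.

```python
def dispatch_policy(thermal_factor, small=0.25, moderate=0.5,
                    critical=0.75, diverted_share=0.5):
    """Translate a device's thermal factor into a dispatch decision.

    Returns a dict with:
      weight: multiplier applied to the device's normal share of new work
      reassign_in_flight: whether already-assigned tasks should be moved
    """
    if thermal_factor >= critical:
        # Suspend all tasks on the device and reassign them.
        return {"weight": 0.0, "reassign_in_flight": True}
    if thermal_factor >= moderate:
        # Send all new traffic to the other device.
        return {"weight": 0.0, "reassign_in_flight": False}
    if thermal_factor >= small:
        # Drive a percentage of traffic away from the device.
        return {"weight": 1.0 - diverted_share, "reassign_in_flight": False}
    return {"weight": 1.0, "reassign_in_flight": False}
```

- A dispatch unit could multiply this weight into its other criteria, such as workload and data dependencies, so that the thermal factor is one input among several.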
- the thermal monitor unit 300 may also issue power-saving commands directly to circuit devices 102 .
- the thermal monitor unit 300 responds to an event by issuing a Shutdown command to circuit device (or circuit devices) 102 .
- This action may greatly reduce heat output in the associated thermal regions.
- other low power mode commands such as Halt and Sleep (as per the Advanced Configuration and Power Interface standard) are supported as well.
- in a Halt state, or C1 state, no commands are executed, but circuit devices 102 remain powered.
- in a Sleep state, or C3 state, volatile caches are flushed, and parts of the circuit devices 102 may be powered down. Power-saving commands may not be recognized by all circuit devices 102 within the thermal region.
- the thermal monitor unit 300 may power down any one or more of the circuit devices 102 while leaving the remainder functioning.
- the thermal monitor unit 300 may issue a request to a dispatch unit 240 to reassign the instruction transparently to an alternate circuit device 102 .
- the thermal monitor unit 300 further observes individual circuit devices 102 to ensure that changes do not cause the device to fail. This may be done by monitoring a “heartbeat” signal.
- a heartbeat signal can be generated by an instruction executed by one of circuit devices 102 to pulse a heartbeat output at a regular interval. In the event of a fault such as a deadlock, livelock, or inadvertent reset, the heartbeat output would fail to pulse as expected.
- the thermal monitor 300 includes a heartbeat monitor.
- when the heartbeat output fails to pulse as expected, the thermal monitor 300 responds by reverting changed parameters such as operational voltage or clock signal frequency, or responds by rebooting one or more of circuit devices 102.
- the thermal monitor 300 may also hold the device in a Halt, Sleep, or Shutdown condition until further user input is received. This is useful if a circuit device repeatedly fails to work at a lower voltage or frequency.
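- A heartbeat monitor of the kind described above can be sketched as a table of last-seen pulse times. The class name, the tolerance factor, and the use of caller-supplied timestamps are assumptions made for illustration.

```python
class HeartbeatMonitor:
    """Track the last pulse from each device and report overdue ones."""

    def __init__(self, interval_s, tolerance=2.0):
        # A device is considered failed after `tolerance` missed intervals.
        self.timeout_s = interval_s * tolerance
        self.last_pulse = {}

    def pulse(self, device, now):
        """Record a heartbeat pulse from `device` at time `now` (seconds)."""
        self.last_pulse[device] = now

    def missed(self, now):
        """Return the devices whose heartbeat is overdue at time `now`."""
        return sorted(
            device
            for device, seen in self.last_pulse.items()
            if now - seen > self.timeout_s
        )
```

- A device returned by `missed` would be a candidate for reverting its most recent voltage or clock change, or for a reboot.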
- the thermal monitor unit 300 utilizes the heartbeat signal to pursue aggressive reductions in power. Instead of making large changes in operating parameters, the thermal monitor 300 instructs the adaptive power-regulating unit 200 or adaptive clock control unit 220 to make a smaller change. The thermal monitor unit 300 pauses to determine whether the circuit devices 102 within thermal regions 304 operate correctly at the new parameters. If the circuit devices 102 function correctly and the thermal condition does not abate, the thermal monitor unit 300 may make another small change to the operating parameters. This is continued until a minimum operating power is reached or until the thermal condition is resolved.
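- The small-step reduction described above can be expressed as a loop that stops at the first of three conditions: the thermal condition clears, the minimum is reached, or the device stops functioning at the candidate setting. The callbacks `is_healthy` and `is_too_hot` are hypothetical stand-ins for the heartbeat check and the thermal check, and a real monitor would pause between steps.

```python
def step_down(start, minimum, step, is_healthy, is_too_hot):
    """Lower an operating parameter in small steps.

    start, minimum, step: parameter values in any consistent unit
    is_healthy(value): True if the device functions correctly at `value`
    is_too_hot(value): True if the thermal condition persists at `value`
    Returns the last setting at which the device was known to work.
    """
    value = start
    while is_too_hot(value) and value - step >= minimum:
        candidate = value - step
        if not is_healthy(candidate):
            break  # keep the last known-good setting
        value = candidate
    return value
```

- Using integer units such as millivolts keeps the comparison against the minimum exact.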
- the thermal monitor unit 300 can recognize contributing factors that may be remedied in order to alleviate the thermal condition. For example, airflow issues may cause heat from circuit device 102 b to collect in region 304 a but not in region 304 b where circuit device 102 b is located. Modifying operating parameters of circuit device 102 a within region 304 a may not relieve this condition as effectively as modifying operating parameters of circuit device 102 b . From the thermal map, the thermal monitor unit 300 recognizes the contributing factors to the thermal conditions of region 304 a . In response, the thermal monitor unit 300 modifies the operation of region 304 b to relieve the condition of region 304 a.
- the thermal monitor unit 300 may also modify the operation of peripherals not located on the device board 100 .
- the thermal monitor unit 300 utilizes the bus interface unit 104 to send commands to connected devices.
- the thermal monitor unit 300 is capable of modifying operating parameters, such as voltage, clock frequency, and workload, of circuit devices on other device boards.
- the thermal monitor unit 300 can adjust airflow, coolant flow, and other regulating mechanisms on the device board 100 and elsewhere.
- the thermal monitor unit 300 may further include a user interface.
- the user interface is used to notify the host system of thermal events and may allow users to change system parameters and reconfigure the thermal monitor unit 300 .
- the thermal monitor unit 300 sends a status notification to the user via the user interface.
- This status notification may include a list of current operating parameters, a list of recent events, a list of trigger criteria, status for various thermal regions 304 , and other suitable status data.
- the status notification may be sent as a regularly occurring event, as a response to other data such as a critical temperature reading, as a response to a user request, or as a response to any other event.
- the thermal monitor unit 300 records and stores temperature data and changes to operating parameters.
- the thermal monitor unit 300 may also record and store a record of the state of the circuit devices 102 including details on the instructions being executed. This may be crucial when debugging software that leads to an adverse thermal condition.
- the user can also manually modify operating parameters, configure triggers and responses, and execute instructions such as to Resume or Shutdown circuit devices 102 via the user interface.
- the user interface is intended to help users analyze performance metrics, evaluate system reliability, and resolve heat management issues.
- software may be used to present the information in a form that is easy for the user to digest.
- software may receive thermal information at the thermal reference points via the user interface and produce a diagnostic display.
- in a diagnostic display, a graphical bitmap illustrating the device board 100 is generated.
- the user selects datasets to be displayed as overlays on the board illustration, such as a gradient map, measured thermal conditions, circuit device status including uptime and load, and other diagnostic information.
- the datasets may contain information received from the thermal monitor unit 300 , information received from circuit devices 102 of the device board 100 , information received from a host system, and information received from other sources.
- the datasets may further contain information interpolated from received information, particularly when producing overlays such as gradient maps.
- the graphical bitmap may include one or more diagnostic regions. These may be, but are not necessarily, coincident with the thermal regions for any particular operating parameter.
- Software may also be used to manage the thermal monitor unit 300 via the user interface. For example, in high-risk environments, a software program on a host system may regularly inspect the thermal monitor unit 300 and trigger a shutdown of a circuit device 102 if the thermal monitor unit 300 is unable to resolve a problem.
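- As an illustration of the host-side supervision described above, the following minimal Python sketch polls a monitor and requests a shutdown when a device stays critical across several consecutive inspections. The `read_status` and `shutdown_device` callables, the status dictionary layout, and the `max_unresolved_polls` threshold are hypothetical names chosen for this example; the disclosure does not define such an interface.

```python
from typing import Callable, Dict, List


def supervise(read_status: Callable[[], Dict[str, bool]],
              shutdown_device: Callable[[str], None],
              polls: int,
              max_unresolved_polls: int = 3) -> List[str]:
    """Poll a (hypothetical) thermal monitor and shut down devices it cannot cool.

    read_status returns {device_id: is_critical} for one inspection.
    A device is shut down after max_unresolved_polls consecutive critical reports.
    """
    streak: Dict[str, int] = {}
    shut_down: List[str] = []
    for _ in range(polls):
        for device, critical in read_status().items():
            if device in shut_down:
                continue
            # Count consecutive critical reports; any healthy report resets the count.
            streak[device] = streak.get(device, 0) + 1 if critical else 0
            if streak[device] >= max_unresolved_polls:
                shutdown_device(device)
                shut_down.append(device)
    return shut_down
```

A real host program would presumably wait between inspections; the sketch omits the delay so that it stays deterministic.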
- FIG. 4 is a diagrammatic top view of a device board 100 according to an embodiment of the present disclosure.
- circuit device 400 warrants multiple thermal regions 402 , 404 , and 406 and contains multiple temperature sensors 302 within the thermal regions.
- the thermal monitor unit 300 monitors the conditions of the temperature sensors 302 and of the thermal reference points. If necessary, the thermal monitor unit 300 is capable of altering the operating parameters of one or more thermal regions of the circuit device 400 independently. This configuration allows fine-grained control of heat generation. Regions of the circuit device 400 can be optimized in response to thermal conditions without affecting neighboring regions.
- the thermal monitor unit 300 interfaces with an adaptive power-regulating unit 200 to alter the operating parameters of the circuit device 400 .
- the circuit device 400 may receive a different operating voltage for each of thermal regions 402 , 404 , and 406 .
- the operating voltage for thermal region 402 can be altered without affecting the operating voltages of thermal regions 404 and 406 .
- the thermal monitor unit 300 interfaces with an adaptive clock control unit 220 to alter the clock-related parameters of the circuit device 400 .
- the circuit device 400 is capable of altering its operating parameters independent of an adaptive power-regulating unit 200 or an adaptive clock control unit 220 . This allows the thermal monitor unit 300 to coordinate with circuit device 400 directly to tune the operation of the circuit device 400 .
- the thermal monitor unit 300 may interface with a dispatch unit 240 to assign workload to subunits of the circuit device 400 .
- workload may be shifted from subunits in thermal region 402 and towards subunits in thermal region 406 .
- the device board 100 retains performance that may otherwise be lost.
- FIG. 5 is a schematic diagram of a circuit device according to an embodiment of the present disclosure.
- Circuit device 500 contains one or more circuit subunits 502 , a chip-level power-regulating unit 504 , a chip-level clock control unit 506 , a chip-level dispatch unit 508 , and a chip-level thermal monitor unit 510 .
- Possible circuit subunits 502 include fixed-point processing cores, floating-point processing cores, matrix math units, vector processing units, special function processors, controllers, branch prediction units, I/O interface units, intra-core interface units, wire busses, pervasive and test units, memory management units, and other suitable circuit subunits.
- select circuit subunits 502 are memory such as caches, register files, memory arrays, programmable read-only memory, and flash memory.
- the chip-level power-regulating unit 504 handles power distribution for the circuit device 500 .
- the chip-level power-regulating unit 504 receives a source voltage for the circuit device 500 , converts it to one or more operating voltages, and distributes the one or more operating voltages to the circuit subunits 502 , the chip-level clock control unit 506 , and the chip-level thermal monitor unit 510 .
- the chip-level power-regulating unit 504 is capable of varying the one or more operating voltages in response to a voltage control signal.
- the chip-level clock control unit 506 creates the appropriate clocks for the functional logic within the circuit device 500 .
- the chip-level clock control unit 506 receives a system clock for the circuit device 500 , creates one or more functional clocks, and distributes the one or more functional clocks to the functional logic including that found in the circuit subunits 502 and the chip-level thermal monitor unit 510 .
- the chip-level clock control unit is capable of varying the one or more functional clocks in response to a clock control signal.
- the chip-level dispatch unit 508 receives instructions and assigns them to subunits 502 for execution.
- the assignment may depend on the type of instruction, the capabilities of a particular subunit 502 , the data dependencies of the instruction, the system resources available to the subunit 502 , the current workloads of the subunit 502 and of other subunits, operating conditions of the subunit 502 , and other factors.
- the assignment further depends on a thermal factor assigned to a subunit 502 .
- the thermal factor may cause the chip-level dispatch unit 508 to assign a given subunit 502 fewer instructions or no instructions or may cause the chip-level dispatch unit 508 to suspend all tasks assigned to the given subunit 502 and reassign them to other subunits.
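- One way to picture a thermal factor influencing assignment is the Python sketch below. It assumes, purely for illustration, that the thermal factor is a number between 0.0 (assign nothing) and 1.0 (no restriction) and that each instruction goes to the eligible subunit with the lowest ratio of queued work to thermal factor. The function name `dispatch`, the scale, and the selection rule are assumptions and are not taken from the disclosure.

```python
from typing import Dict, List


def dispatch(instructions: List[str],
             thermal_factor: Dict[str, float]) -> Dict[str, List[str]]:
    """Assign instructions to subunits, weighting each by its thermal factor.

    thermal_factor uses an assumed scale: 1.0 = no thermal restriction,
    0.0 = assign nothing to this subunit. Each instruction goes to the
    eligible subunit with the lowest load-to-factor ratio.
    """
    eligible = {s: f for s, f in thermal_factor.items() if f > 0.0}
    if not eligible:
        raise RuntimeError("no subunit is thermally eligible")
    queues: Dict[str, List[str]] = {s: [] for s in thermal_factor}
    for instr in instructions:
        target = min(eligible, key=lambda s: len(queues[s]) / eligible[s])
        queues[target].append(instr)
    return queues
```

With factors of 1.0, 0.5, and 0.0, the first subunit receives roughly twice the work of the second and the third receives none, which mirrors the "fewer instructions or no instructions" behavior described above.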
- the chip-level thermal monitor unit 510 observes and maintains a suitable thermal environment on the circuit device 500 .
- the chip level thermal monitor unit 510 receives data from temperature sensors 302 and utilizes the data to determine conditions at thermal reference points.
- the thermal reference points are grouped by either circuit device 500 or thermal regions 512 .
- a thermal region 512 may include part of a circuit subunit 502 , an entire circuit subunit 502 , more than one circuit subunit 502 , or any combination thereof.
- thermal regions 512 are defined differently for each operating parameter. It is not necessary for the entire circuit device 500 to have a corresponding thermal region 512 . Particularly, areas of low density may not have an associated thermal region 512 .
- the chip-level thermal monitor unit 510 optimizes performance by monitoring conditions throughout the thermal regions 512 and taking corrective action such as varying operating parameters and issuing power-saving commands.
- conditions at a thermal reference point can trigger a response. Possible triggering events include temperature data exceeding a preset limit, multiple thermal reference points with temperature exceeding a preset limit, excessive rate of change in temperature, and excessive temperature over a prolonged period.
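- The triggering events listed above can be sketched in Python as follows. The data layout (`history[t][p]` holding the temperature of reference point `p` at sample `t`), the trigger names, and every threshold are illustrative assumptions rather than values from the disclosure.

```python
from typing import List, Sequence


def detect_triggers(history: Sequence[Sequence[float]],
                    limit: float,
                    max_rate: float,
                    min_hot_points: int = 2,
                    prolonged_samples: int = 3) -> List[str]:
    """Evaluate trigger criteria over temperature samples, oldest first."""
    latest = history[-1]
    triggers: List[str] = []
    hot = [p for p, temp in enumerate(latest) if temp > limit]
    if hot:
        triggers.append("over_limit")
    if len(hot) >= min_hot_points:
        triggers.append("multiple_over_limit")
    # Rate of change: compare the newest sample with the one before it.
    if len(history) >= 2 and any(
            now - before > max_rate for now, before in zip(latest, history[-2])):
        triggers.append("rapid_rise")
    # Prolonged excess: some point is over the limit in each recent sample.
    recent = history[-prolonged_samples:]
    if len(recent) == prolonged_samples and any(
            all(sample[p] > limit for sample in recent)
            for p in range(len(latest))):
        triggers.append("prolonged_over_limit")
    return triggers
```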
- When thermal reference points do not correspond with the location of a temperature sensor 302 , thermal conditions are interpolated. Interpolation may be based on a simple linear model, or may account for the existence, operation, and location of heat generating structures within the circuit device 500 . Interpolation may also be based on systemic conditions that affect the device board 100 .
- When the chip-level thermal monitor unit 510 detects unacceptable conditions in a thermal region 512 , or some portion of or combination thereof, it may take one or more corrective actions in response.
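- As a rough sketch of interpolating a condition at a reference point that has no sensor, the Python example below uses inverse-distance weighting. This is a swapped-in, commonly used scheme chosen for brevity; it is not the linear model or the structure-aware model mentioned above, and the `Sensor` tuple layout and `power` parameter are assumptions.

```python
import math
from typing import List, Tuple

Sensor = Tuple[float, float, float]  # (x, y, temperature); layout is assumed


def interpolate(point: Tuple[float, float], sensors: List[Sensor],
                power: float = 2.0) -> float:
    """Estimate the temperature at point by inverse-distance weighting."""
    if not sensors:
        raise ValueError("at least one sensor reading is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, temp in sensors:
        dist = math.hypot(point[0] - x, point[1] - y)
        if dist == 0.0:
            return temp  # the reference point coincides with a sensor
        weight = 1.0 / dist ** power
        weighted_sum += weight * temp
        weight_total += weight
    return weighted_sum / weight_total
```

A model that accounts for heat-generating structures would replace the purely geometric weights with terms reflecting those structures.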
- the thermal monitor unit 510 may issue a command to shut down integrated circuits correlated with thermal regions 512 or may issue a command to place the integrated circuits into a low-power mode.
- the chip-level thermal monitor unit 510 may also modify an operating parameter, such as voltage, frequency, or workload, for the region.
- the chip-level thermal monitor unit 510 interacts with the chip-level power-regulating unit 504 to reduce the operating voltage delivered to a thermal region 512 .
- the chip-level thermal monitor unit 510 interacts with the chip-level clock control unit 506 to reduce the frequency of the functional clock for thermal region 512 or some portion or combination of thermal regions 512 .
- the chip-level thermal monitor unit 510 interacts with the chip-level dispatch unit 508 to reduce the workload for the circuits correlated with a thermal region 512 or some portion or combination of thermal regions 512 .
- the chip-level thermal monitor unit 510 observes the circuits within their associated thermal regions to ensure that the circuits function properly at the new operating parameter.
- the chip-level thermal monitor unit 510 may include a heartbeat monitor to track the operating status of the circuit subunits.
- FIG. 6 is a flow chart of an exemplary method of thermal management for a device board. Additional steps can be provided before, during, and after the method 600 , and some of the steps described can be replaced or eliminated for other embodiments of the method.
- the method 600 begins at block 602 where temperature data is received from thermal measurement points corresponding to a thermal region or some portion or combination thereof.
- thermal conditions are interpolated for a set of thermal reference points where directly measured data is not available.
- a thermal map is determined from the thermal conditions in the thermal regions, as measured by the circuit devices within the thermal regions.
- a first trigger event is detected.
- This trigger event may be based on received temperature data, interpolated temperature data, rates of change of temperature data, and/or other trigger criteria.
- the current operating parameters are analyzed for circuit devices as measured from their associated thermal reference points.
- a first response is made to the trigger event.
- Exemplary responses include modifying an operating parameter within the thermal region and issuing a command to suspend or shut down a circuit device.
- a second trigger event is detected.
- the current operating parameters are analyzed for circuit devices as measured from their associated thermal reference points.
- a second response is made to the trigger event.
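- The flow of method 600 can be pictured with the simplified Python pass below, in which each region is evaluated and answered on its own so that a response for one region does not dictate the response for another. Reducing the thermal map to a per-region peak, the two thresholds, and the response labels are assumptions made for the sketch.

```python
from typing import Dict, List


def manage_regions(region_temps: Dict[str, List[float]],
                   limit: float,
                   shutdown_limit: float) -> Dict[str, str]:
    """One simplified pass: map temperatures per region, detect, respond.

    region_temps maps a region name to the measured or interpolated
    temperatures of its reference points.
    """
    responses: Dict[str, str] = {}
    for region, temps in region_temps.items():
        peak = max(temps)              # thermal map entry for this region
        if peak > shutdown_limit:      # trigger: severe condition
            responses[region] = "suspend"
        elif peak > limit:             # trigger: over the preset limit
            responses[region] = "reduce_operating_parameter"
        else:
            responses[region] = "none"
    return responses
```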
- FIG. 7 is a flow chart of an exemplary method of thermal management for a device board.
- the method 700 begins at block 702 where an operating parameter is modified.
- a circuit device affected by the modification to the operating parameter is monitored to determine whether the circuit device functions properly. If not, at block 706 , the operating parameter is reverted to a previous value. If the circuit device functions properly with the modified operating parameter, at block 708 , the operating parameter is maintained at the modified value.
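- A minimal Python sketch of the modify, verify, and revert pattern of method 700 follows. The `functions_properly` callback stands in for whatever check confirms that the device still operates correctly; it and the function name are hypothetical.

```python
from typing import Callable


def apply_with_verification(current: float, proposed: float,
                            functions_properly: Callable[[float], bool]) -> float:
    """Modify a parameter, check the affected device, then keep or revert."""
    previous = current
    current = proposed                   # block 702: modify the parameter
    if functions_properly(current):      # monitor the affected circuit device
        return current                   # block 708: maintain the modified value
    return previous                      # block 706: revert to the previous value
```

For example, lowering an operating voltage from 1.0 to 0.9 is kept if the device passes the check at 0.9 and undone otherwise.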
- FIG. 8 is a flow chart of an exemplary method of thermal management for a device board.
- the method 800 begins at block 802 where an operating parameter is modified.
- the temperature data is monitored to determine whether the modified operating parameter was successful at resolving the temperature event. If the temperature data indicates that the temperature is no longer critical, at block 806 , the operating parameter is reverted to a previous value. If the temperature remains within a critical window, at block 808 , the operating parameter is maintained in its current state. If the temperature exceeds the critical window, at block 810 , a command is issued to instruct a circuit device to suspend operation.
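- The three outcomes of method 800 can be summarized in a short Python sketch. Representing the critical window by a lower and an upper bound (`window_low`, `window_high`) and returning a label for the next step are assumptions made for illustration.

```python
def next_action(temperature: float, window_low: float, window_high: float) -> str:
    """Choose the follow-up step after an operating parameter was modified."""
    if temperature < window_low:
        return "revert"      # block 806: no longer critical, restore previous value
    if temperature <= window_high:
        return "maintain"    # block 808: still within the critical window
    return "suspend"         # block 810: window exceeded, suspend the device
```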
- FIG. 9 is a diagrammatic perspective view of a device board according to various aspects of the present disclosure.
- the PCB 108 of the device board 100 may include a thermal dissipating layer 900 between the insulating layers 110 .
- the thermal dissipating layer 900 is used to conduct heat away from active regions of the device board 100 and towards one or more radiating islands 902 disposed on the surface of the PCB 108 . Areas of the thermal dissipating layer, such as those in proximity to circuit devices, absorb heat. The heat is conducted along the thermal dissipating layer 900 to the thermal vias 904 .
- the thermal vias 904 transfer heat energy through the insulating layers 110 of the PCB 108 and to the radiating islands 902 .
- space around the device board 100 is limited. It may not be possible to add heat sinks directly above heat-generating components. By conducting heat through the PCB 108 , this structure disperses heat without adding excessive height to the device board 100 .
- the radiating islands 902 may comprise any suitable radiating material such as copper, tin, silver, aluminum, gold, non-metallic conductors, organic conductors, and/or other suitable heat-transferring materials. In some embodiments, the radiating islands 902 are configured to conduct heat to the perimeter of the device board 100 . In further embodiments, the radiating islands 902 conduct heat to a heat transfer system in the host system such as an airflow region, a heat pipe, or a liquid cooling waterblock. In some embodiments, the radiating islands 902 may be disposed on the opposite side of the PCB from the circuit devices.
- FIG. 10 is a diagrammatic top view of a thermal dissipating layer according to various aspects of the present disclosure.
- the thermal dissipating layer 900 includes thermally conductive regions 1000 of a thermally conductive material such as copper, aluminum, tin, silver, gold, non-metallic conductors, and organic conductors.
- the thermal dissipating layer 900 may include one or more electrically conductive circuit traces 1002 for routing signals between circuit devices.
- the regions 1000 may include holes 1004 to allow signal vias from trace layers 112 to pass through the thermal dissipating layer 900 without shorting.
- multiple thermal dissipating layers 900 are used.
- the additional thermal dissipating layers 900 are connected with thermal vias 904 . This configuration enhances the thermal conductive capacity of the overall structure.
- the technique of: 1) developing a multi-dimensional map representing a type of stress experienced by a system; and 2) utilizing the multi-dimensional map to adaptively respond can be applied to other types of stress.
- the same technique may be applied to vibration sensors and vibration stress, to bandwidth utilization and communication pressure, and to circuit monitoring and circuit faults.
- the “transformation” that occurs between thermal reference points, circuit devices, and thermal regions is useful for both analytic and reporting purposes.
- one embodiment allows an end user to diagnose issues within a server.
- a piece of hardware unrelated to the monitored circuits may cause a particular thermal region to be consistently hot, such that no amount of voltage or clock control serves to mitigate the stress.
- the transformation from a series of points into a spatially or circuit-correlated thermal map can help users diagnose this “hot spot” as a problem on the motherboard, in the power supply, or elsewhere in the server.
- inventions can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements.
- embodiments of the present disclosure can take the form of a computer program product accessible from a tangible computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a tangible computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device), or a propagation medium.
- the present invention provides a system and method for thermal management for integrated circuits.
- the circuit device board includes: a plurality of circuits; a plurality of temperature sensors; a thermal management unit; and a printed circuit board wherein the plurality of temperature sensors are communicatively coupled to the thermal management unit and at least one of the plurality of circuits is controllably coupled to the thermal management unit; and wherein the thermal management unit comprises a thermal monitor unit configured to receive thermal data from the plurality of temperature sensors and to determine a plurality of thermal reference points, the thermal reference points defining a plurality of thermal regions; wherein the thermal monitor unit is further configured to make a first corrective response to modify the conditions in one or more thermal regions; wherein the thermal monitor unit is further configured to make a second corrective response to modify the conditions in one or more thermal regions; and wherein the first corrective response and the second corrective response are independent.
- the circuit device includes: a power-regulating unit; a clock control unit; a plurality of circuit subunits; a plurality of temperature sensors; and a thermal management unit configured to receive thermal data from the plurality of temperature sensors and to determine a plurality of thermal reference points, the thermal reference points defining a plurality of thermal regions; wherein the thermal management unit is further configured to make a first corrective response to modify the conditions in one or more thermal regions; wherein the thermal management unit is further configured to make a second corrective response to modify the conditions in one or more thermal regions; and wherein the first corrective response and the second corrective response are independent.
- the method of thermal management includes: measuring thermal data at a plurality of points in a circuit device; determining a thermal map from the thermal data, the thermal map comprising a plurality of regions, and wherein the values in the thermal map are correlated with the operating characteristics of the circuit device; identifying a first trigger event for a first thermal region; analyzing current operating parameters of one or more thermal regions; responding to the first trigger event by making a first corrective response; identifying a second trigger event for a second thermal region; and analyzing current operating parameters of one or more thermal regions; responding to the second trigger event by making a second corrective response; wherein the first corrective response and the second corrective response are independent.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Human Computer Interaction (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Power Sources (AREA)
Abstract
A method and system for thermal management in integrated circuits and integrated circuit boards is described. In an embodiment, the circuit device board includes circuit devices, temperature sensors, and a thermal management unit. The thermal management unit receives thermal data from the temperature sensors and determines thermal reference points that define thermal regions. The thermal reference points are correlated with the operating characteristics of the circuit devices. When warranted, the thermal management unit makes independent corrective responses to each of the thermal regions. These corrective responses include modifying operating parameters, adjusting workload, and suspending operation of circuit devices within the thermal region. Thus, the disclosed method and system can preserve function in one thermal region while alleviating stress on another thermal region.
Description
- This application is a continuation of U.S. patent application Ser. No. 13/398,686 to Jeffrey H. Brower, filed Feb. 16, 2012, entitled “Thermal Management for Integrated Circuits”, which is a continuation-in-part of U.S. patent application Ser. No. 13/397,534 to Jeffrey H. Brower, filed Feb. 15, 2012, now U.S. Pat. No. 8,996,192, issued Mar. 31, 2015, entitled “Thermal Management for Integrated Circuits”, which claims the benefit of U.S. Provisional Application No. 61/443,430, filed Feb. 16, 2011 and U.S. Provisional Application No. 61/443,394, filed Feb. 16, 2011, the entire disclosures of each of which are herein incorporated by reference.
- Rapid advances in semiconductor technology have dramatically changed the landscape of integrated circuits (IC) and their applications. Feature sizes continue to fall despite the increasing time and effort required to deliver improvements. Developments in device density allow a greater number of circuit devices to be fit into a given area. At the same time, device performance continues to improve, thereby delivering greater speeds, increased efficiency, and reduced cost. These improvements both necessitate and facilitate advances in fields such as device design, manufacturing technology, system integration, and software engineering.
- For example, increasing circuit density frequently increases the amount of heat generated within a region. While efficiency gains may cut down on heat produced by a given device, frequently this thermal efficiency is more than offset by the increased number of circuits and increased leakage. As a result, advanced devices generate more heat in a smaller area. This heat must be controlled as it can lead to system instability, thermal shutdown, and even permanent damage. To meet customers' performance and efficiency goals and to allow future improvements in device density, increasingly complex methods of thermal management are required.
- The features and advantages of the present disclosure will be apparent from the following detailed description and the accompanying figures. It is understood that the figures that follow are merely illustrative and, in the interest of clarity, are not necessarily drawn to scale. Furthermore, features may be enlarged or omitted as necessary to best illustrate the invention.
- FIG. 1 is a diagrammatic perspective view of a device board according to various aspects of the present disclosure.
- FIG. 2 is a diagrammatic top view of a device board according to various aspects of the present disclosure.
- FIG. 3 is a diagrammatic top view of a device board according to various aspects of the present disclosure.
- FIG. 4 is a diagrammatic top view of a device board according to various aspects of the present disclosure.
- FIG. 5 is a schematic diagram of a circuit device according to various aspects of the present disclosure.
- FIG. 6 is a flow chart of a method of thermal management for a device board according to an embodiment of the present disclosure.
- FIG. 7 is a flow chart of a method of thermal management for a device board according to an embodiment of the present disclosure.
- FIG. 8 is a flow chart of a method of thermal management for a device board according to an embodiment of the present disclosure.
- FIG. 9 is a diagrammatic perspective view of a device board according to various aspects of the present disclosure.
- FIG. 10 is a diagrammatic top view of a thermal dissipating layer according to various aspects of the present disclosure.
- The present disclosure relates generally to integrated circuit design, integration, and operation and more particularly to devices and methods for thermal management of integrated circuit devices.
- The following disclosure describes features of multiple separate embodiments. The specific embodiments are selected to promote clarity and understanding. As examples, it is understood that the embodiments that follow are not intended to be limiting. Except as noted, features may be combined between embodiments. Other features may be omitted from some embodiments. Unless otherwise specified, the repetition of numbers between figures does not convey any relationship between the embodiments depicted therein.
- Moreover, a description of a first device connected to a second device may include embodiments in which the first device is directly connected to the second device and may further include embodiments where the first device and the second are connected via an intermediary. Similarly, references made to directions and locations, such as “above,” “below,” “to the left,” and “to the right,” are intended to simplify understanding of an illustrated embodiment. The disclosure applies equally to embodiments where the orientation is altered. For example, a device described as being above another may be located below when an actual embodiment is observed. This is understood, as any embodiment may be observed from any orientation.
- FIG. 1 is a diagrammatic perspective view of a device board 100. The device board 100 may be an essential component of a host system, such as a motherboard or daughtercard, and may also be an expansion card to expand the capabilities of the host system. Device boards 100 are commonly used to add additional processing power, to add networking capability, to add signal processing capacity, to add graphics and audio functionality, to add input and output bandwidth, to handle device I/O, and to add any other suitable enhancement to the host system. The device board 100 includes a number of components including circuit devices 102 for providing functionality, a bus interface unit 104 for interfacing with other devices, and a power-regulating unit 106 for performing voltage control. The components are disposed on a printed circuit board (PCB) 108.
- The circuit devices 102 provide the bulk of the functionality of the device board 100. In some embodiments, the circuit devices 102 include integrated circuit processing devices such as general-purpose processors (CPUs), graphics processing units (GPUs), multicore processors, digital signal processors (DSPs), and/or other suitable processors. In some embodiments, the circuit devices 102 include field-programmable gate arrays (FPGAs), programmable logic controllers (PLCs), and/or microcontrollers. Commonly, the circuit devices 102 include other integrated circuits such as interface devices (e.g., a bridge device), fabric controllers, analog-to-digital converters, watchdog monitors, and memory circuits (e.g., RAM, ROM, EEPROM, and/or Flash Memory). Circuit devices 102 may also include supporting devices such as capacitors, resistors, diodes, optical isolators, and other suitable supporting devices.
- The bus interface unit 104 transports data between the device board 100 and other device boards, peripheral components, a host system, and any other suitable system or device. The bus interface unit 104 provides any number of data connections and may further provide connections for supply voltage, clock signals, diagnostic and status signals, and other suitable signals. These data connections may take the form of a connector such as a blade connector, a pin array connector, a socket connector, a cable connector, or any other connector known to one of skill in the art. In an exemplary embodiment, these data connections are established wirelessly. The bus interface unit 104 may support any data transfer standard including Ethernet, IEEE 802.11, PCIe (Peripheral Component Interconnect Express), PCI, RapidIO, AGP (Accelerated Graphics Port), ISA, SATA, InfiniBand, USB, and other suitable bus standards.
- The power-regulating unit 106 receives a supply voltage, commonly a DC voltage from a computing system power supply or battery, and distributes an operational voltage to other components of the device board 100 such as the bus interface unit 104 and the circuit devices 102. The operational voltage may be stepped down or stepped up from the supply voltage and may be a rectified DC function of the supply voltage, such as in embodiments with an AC supply voltage. In some embodiments, the power-regulating unit 106 receives the supply voltage via the bus interface unit 104. In further embodiments, the power-regulating unit 106 includes power connectors for receiving the supply voltage directly. The power-regulating unit 106 commonly includes power-handling components such as transformers, diodes, capacitors, inductors, power MOSFETs, and fusible links.
- In some embodiments, the power-regulating unit 106 is an adaptive power-regulating unit. The adaptive power-regulating unit is capable of varying the operational voltage delivered to the circuit devices 102 based on a voltage control signal. For example, the adaptive power-regulating unit may be configured to supply a 3.5V DC operational voltage nominally and may be able to vary the operational voltage +/−10% in response to the voltage control signal. In further embodiments, the adaptive power-regulating unit is capable of delivering a first operational voltage to a first circuit device while delivering a second operational voltage to a second circuit device. This allows the adaptive power-regulating unit to increase or reduce power to a particular circuit device without affecting other circuit devices. An adaptive power-regulating unit is particularly useful for managing heat. For example, if the first circuit device approaches a critical temperature, power to the device can be reduced. Reducing the operational voltage of the first circuit device may decrease heat output but may also decrease performance. By maintaining a higher operational voltage for the second circuit device, which is not experiencing thermal issues, the overall performance of the device board 100 is not as severely impacted.
- In some embodiments, the device board 100 further includes a clock control unit 114. The clock control unit 114 generates and distributes a clock signal to the circuit devices 102. The clock signal is commonly based on a reference clock. In some embodiments, the reference clock is received via the bus interface unit 104. In further embodiments, the reference clock is generated by an oscillator such as an oscillator crystal, a ceramic resonator, or an oscillating circuit. To create the clock signal, the clock control unit 114 may distribute the reference clock without modification or it may perform synchronization, shaping, amplification, frequency division or multiplication, duty cycle adjustment, or other suitable modifications to the reference clock.
- In a further embodiment, the clock control unit 114 is an adaptive clock control unit. The adaptive clock control unit is configured to adjust the clock signal delivered to the circuit devices 102 in response to a frequency control signal. For example, the adaptive clock control unit may generate a 1 GHz clock by default with the ability to adjust the clock frequency by up to +20% or down to −50% based on the frequency control signal. In some embodiments, the adaptive clock control unit is capable of delivering a first clock signal to a first circuit device while delivering a second clock signal to a second circuit device and is capable of adjusting the different clock signals independently. This allows the clock control unit to respond to changing needs of the first circuit device while maintaining an optimal clock for the second circuit device. For example, the clock control unit may reduce the clock frequency for the first circuit device in response to a thermal condition. Reducing the clock frequency commonly reduces power consumption and heat production. It may also reduce overall performance. If the first circuit device is approaching a temperature limit, an adaptive clock control unit can reduce the clock signal frequency of the first circuit device while maintaining the clock signal frequency of the second circuit device. Because the second circuit device can continue operating at a higher frequency, the impact on overall performance is reduced.
- The circuit devices 102, the bus interface unit 104, the power-regulating unit 106, and, in some embodiments, the clock control unit 114 are mounted on the printed circuit board (PCB) 108. The PCB 108 physically supports the components and provides connections between them. In many embodiments, the PCB 108 is made up of a number of layers. These include insulating layers 110 and trace layers 112. The insulating layers 110 provide physical rigidity and durability. They typically contain dielectric material combined with an epoxy to create a laminate sheet. For example, the insulating layers 110 may comprise an FR4-rated glass-reinforced epoxy laminate. The trace layers 112 contain conductive traces that connect the components disposed on the PCB 108 including the circuit devices 102, the bus interface unit 104, and the power-regulating unit 106. Based on the application, the conductive traces may be formed from any conductive material including copper, tin, silver, and gold, other metals and alloys and including non-metallic conductors such as graphite, conductive polymers, and organic conductors. The conductive traces may be formed on or bonded to the insulating layers 110 directly or may be formed on a backing material. Connecting traces on different trace layers 112 often requires creating openings in the insulating layers 110. The openings are then filled with a conductor to create via structures between the traces of the different trace layers 112.
- FIG. 2 is a diagrammatic top view of a device board 100. FIG. 2 and all other figures herein are simplified for clarity. In the depicted embodiment, the power-regulating unit is an adaptive power-regulating unit 200. The adaptive power-regulating unit 200 is capable of varying the operational voltages delivered to the circuit devices clock control unit 220. The adaptive clock control unit 220 is capable of delivering a first clock signal to a first circuit device 102 a while delivering a second clock signal to a second circuit device 102 b and is capable of adjusting the different clock signals independently. It is understood that the designations of circuit device 102 a and circuit device 102 b are arbitrary and do not imply that the operational voltage is in any manner linked to the clock signal for a given circuit device.
- In an embodiment, the device board further includes a
dispatch unit 240. Thedispatch unit 240 receives instructions and distributes the instructions for execution to the circuit devices includingcircuit devices dispatch unit 240 may look to criteria including capabilities of each circuit device, current workload of each circuit device, data dependencies, available board resources such as bus availability, operating conditions of each circuit device, and other performance-related criteria. Thedispatch unit 240 may also look to the thermal conditions and thermal profile of acircuit device 102. A thermal factor may be included when weighing the desirability of a particular circuit device. For example, thedispatch unit 240 may forego sending some instructions todevice 102 a in response to a thermal factor. A more severe thermal factor may cause the dispatch unit to forego sending any instructions todevice 102 a. A critical thermal factor may cause the dispatch unit to cancel instructions sent tocircuit device 102 a and reassign them todevice 102 b. Many circuit devices consume less energy and produce less heat when idle. Thus, it is possible that the temperature ofdevice 102 a will drop during the idle time and relieve the thermal condition. Furthermore, ifdevice 102 b is not experiencing an adverse thermal condition and has bandwidth available to execute the instruction, the overall performance penalty may be small. -
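The graded dispatch behavior described above can be sketched in a few lines. This is a minimal illustration rather than the disclosed implementation: the `ThermalLevel` tiers, the `Device` fields, and the `mild_penalty` weighting are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class ThermalLevel(IntEnum):
    """Hypothetical severity tiers for the thermal factor."""
    NONE = 0
    MILD = 1      # forego sending some instructions
    SEVERE = 2    # forego sending any instructions
    CRITICAL = 3  # also cancel and reassign queued instructions


@dataclass
class Device:
    name: str
    thermal: ThermalLevel = ThermalLevel.NONE
    queue: list = field(default_factory=list)


def dispatch(instr, devices, mild_penalty=2.0):
    """Send instr to the eligible device with the lowest weighted load."""
    eligible = [d for d in devices if d.thermal < ThermalLevel.SEVERE]
    if not eligible:
        raise RuntimeError("no device can accept work")

    def cost(d):
        # Queue length plus one, scaled up for mildly hot devices
        # (the scaling factor is an assumed weighting).
        weight = mild_penalty if d.thermal == ThermalLevel.MILD else 1.0
        return (len(d.queue) + 1) * weight

    target = min(eligible, key=cost)
    target.queue.append(instr)
    return target


def evacuate(device, devices):
    """Cancel work queued on a critical device and reassign it elsewhere."""
    pending, device.queue = device.queue, []
    others = [d for d in devices if d is not device]
    for instr in pending:
        dispatch(instr, others)
```

With two devices, a mild thermal factor on one shifts part of the traffic to the other, a severe factor shifts all new traffic, and `evacuate` models the critical case in which already-assigned instructions are moved.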
FIG. 3 is a diagrammatic top view of adevice board 100 according to an embodiment of the present disclosure. Thedevice board 100 further includes athermal monitor unit 300. Thethermal monitor unit 300 receives temperature data from a number oftemperature sensors 302. Thetemperature sensors 302 produce temperature data, which may include producing an analog or digital temperature reading, producing a warning when a critical temperature is reached or surpassed, producing another type of temperature data, or a combination thereof.Temperature sensors 302 may be stand-alone devices, and, in some embodiments, temperature sensors are integrated intocircuit devices 102. Data from thetemperature sensors 302 is used to determine conditions at thermal reference points. Thermal reference points are not limited to locations oftemperature sensors 302. In many embodiments, conditions at thermal reference points are interpolated from temperature sensor data. Groups of thermal reference points form a portion or combination of thermal regions 304 (of whichthermal regions device board 100.Thermal regions 304 are defined as needed and may be defined differently for each operating parameter.Thermal regions 304 may contain part of acircuit device 102, anentire circuit device 102, and/ormultiple circuit devices 102. It is not necessary for theentire device board 100 to have a correspondingthermal region 304. Particularly, areas of low density may not have an associatedthermal region 304. Taken together, thesethermal regions 304, when evaluated in light of the architecture of thedevice board 100, make up a thermal map. By aggregating temperature data from the various thermal reference points and processing them to create a thermal map of the region, one or more operating characteristics of the circuits on thedevice board 100 can be modified to respond to and manage the thermal characteristics of thedevice board 100. 
- The thermal map represents the current thermal conditions throughout the thermal regions. With reference to the
circuit devices 102 and arrangement of circuit devices 102 shown in various embodiments, there is a known or ascertainable spatial relationship between the elements on the PCB 108 and the values represented on the thermal map. In one embodiment, the thermal map is correlated with the physical dimensions of the underlying circuit devices 102, so that a change in value at a thermal reference point corresponds to a change in the conditions at a particular location in two-dimensional (X-Y) or three-dimensional (X-Y-Z) space. This correlation may be developed by using or inputting physical specifications from the layout and characteristics of the underlying circuit devices 102 and elements of the circuit devices 102, or it may be built up probabilistically by observing correlations between various thermal reference points and creating the map that corresponds to the observed or inferred relationships. - To provide increased granularity, it is frequently necessary to interpolate conditions at thermal reference points where
temperature sensors 302 are not available. Interpolation may be purely linear, with temperature assumed to vary linearly between the temperature sensors 302. For example, the temperature at thermal reference point X, T_X, between sensors A and B can be estimated as: -
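The expression itself is not reproduced in this text. A form consistent with the purely linear interpolation described would be the following, where the symbols $T_A$, $T_B$, $d_{AX}$, and $d_{AB}$ are assumed notation and not necessarily that of the original:

```latex
T_X = T_A + \left(T_B - T_A\right)\frac{d_{AX}}{d_{AB}}
```

Here $T_A$ and $T_B$ are the temperatures reported by sensors A and B, $d_{AX}$ is the distance from sensor A to point X, and $d_{AB}$ is the distance between the two sensors. For instance, with $T_A = 60$ °C, $T_B = 80$ °C, and X one quarter of the way from A to B, the estimate is 65 °C.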
- However, this type of linear interpolation does not account for thermal factors such as a heat-generating element located between sensors A and B. To correct for this, in some embodiments, the
thermal monitor unit 300 calculates the temperature at a thermal reference point X based on the distance from nearby sensors, the existence and location of heat-generating circuit devices between the sensors, and the distance between point X and the heat-generating devices. In some embodiments, thethermal monitor unit 300 further compensates for the operating parameters of the heat-generating devices when determining the effect on the thermal reference point. By considering the physical structure of the thermal zones, thethermal monitor unit 300 constructs a more accurate thermal map. The thermal map may also include thermal factors that are notcircuit devices 102 and thermal factors that are not near thethermal region 304. For example, thethermal monitor unit 300 may consider a heat source that is not part of thedevice board 100 but is known to have an effect on the thermal reference point. Other relevant systemic factors include airflow, nearby cooling solutions, and other thermal aspects of the host system. - It may not always be possible, practical, or desirable to construct a thermal map with knowledge of all possible mechanisms that may drive thermal conditions. Thus in some embodiments, the
thermal monitor unit 300 observes the changes in thermal conditions over time at various thermal reference points and adapts the thermal map accordingly. This may include analyzing the correspondence between two or more thermal reference points. As an example, a first and asecond temperature sensor 302 track closely when a givencircuit device 102 a is active. The thermal monitor unit may infer thatcircuit device 102 a affects bothsensors 302 and also affects nearby thermal reference points. As a further example, an unknown host system trait causes adverse thermal conditions at one ormore temperature sensors 302 on a regular basis. The thermal monitor unit can then use this information when interpolating thermal conditions at other thermal reference points. This spatially and operationally aware mapping provides more accurate prediction and monitoring of thermal reference points wheretemperature sensors 302 are not available and facilitates more effective responses. - As explained, the
thermal monitor unit 300 may account for thermal factors, such as systemic factors, by using them to calculate thermal conditions at thermal reference points. Furthermore, in some embodiments, the thermal monitor unit 300 considers the systemic conditions including circuit devices 102 when determining a response to a thermal condition. It should be emphasized that it is not always necessary to determine the cause of a thermal condition to be able to formulate an effective response. - One of the most basic responses is a trigger response where the voltage, clock rate, or workload of one or
more circuit devices 102 is modified when the measurement point or area corresponding to thatcircuit device 102 passes some threshold. In an embodiment implementing a trigger response, thethermal monitor unit 300 analyzes the temperature data to determine when a triggering event occurs. A wide variety of triggering events is contemplated. For example, a single report of excessive temperature by asingle temperature sensor 302 may be a triggering event. An excessive temperature over a prolonged period may also be a triggering event. A sudden increase in temperature or a rate of increase may trigger a response. A trigger may be based on a number oftemperature sensors 302 experiencing excessive temperatures. In some embodiments, a number oftemperature sensors 302 reporting high but not critical temperatures triggers a response. Commonly, it will not be possible to dispose atemperature sensor 302 next to a critical device. Therefore, in many embodiments, thethermal monitor unit 300 interpolates conditions at thermal reference points throughout a portion or combination ofthermal regions 304. Thus, the triggering event may be a temperature that is calculated, not recorded by atemperature sensor 302. - When a response is warranted, the
thermal monitor unit 300 may take one or more corrective actions. These responses include changing operating parameters such as voltage, clock frequency, or workload. Referring still to FIG. 3, in one exemplary embodiment, the thermal monitor unit 300 interacts with an adaptive power-regulating unit 200 to control one or more operating voltages for the circuit devices 102. If the conditions at one or more thermal reference points indicate that a response is needed in a thermal region such as 304 a, or in a portion or combination of thermal regions, the thermal monitor unit 300 sends a voltage control signal to the adaptive power-regulating unit 200 to lower the operating voltage of that thermal region or portion, such as at circuit device 102 a. At the same time, if thermal conditions indicate that circuit device 102 b is within the acceptable temperature range, the thermal monitor unit 300 sends a voltage control signal to the adaptive power-regulating unit 200 to maintain or increase the current operating voltage for a portion or combination of the area pertaining to thermal region 304 b. Thus, circuit devices in thermal region 304 b, which are not experiencing heat issues, are able to operate at a higher voltage and level of performance. In this way, the thermal monitor unit 300 and the adaptive power-regulating unit 200 work together to maintain peak performance while combating overheating. - In a further embodiment, the
thermal monitor unit 300 interfaces with an adaptive clock control unit 220. The thermal monitor unit 300 transmits one or more clock control signals to the clock control unit 220. In response, the clock control unit 220 alters the clock signals sent to one or more thermal regions 304. For example, the thermal monitor unit 300 may respond to excessive conditions in thermal region 304 a or a portion or combination of thermal regions by sending a clock control signal to the adaptive clock control unit 220 to reduce the frequency of the clock signal for a portion or combination of the circuits correlated with thermal region 304 a. If thermal region 304 b is not experiencing adverse conditions, the thermal monitor unit 300 may send a clock control signal to the adaptive clock control unit 220 to maintain or increase the current frequency. By independently monitoring and responding to thermal events for thermal regions - The
thermal monitor unit 300 may also interface with the dispatch unit 240 to direct traffic away from or towards a circuit device. In an embodiment, the thermal monitor unit 300 assigns and transmits a thermal factor for each circuit device to the dispatch unit 240. The dispatch unit 240 weighs the thermal factor when assigning instructions to the circuit devices. Small thermal factor values for circuit device 102 a may drive a percentage of traffic away from circuit device 102 a and towards device 102 b. Moderate thermal factor values may drive all traffic towards device 102 b. Critical thermal factor values may suspend all tasks for circuit device 102 a and reassign them to circuit device 102 b. - The
thermal monitor unit 300 may also issue power-saving commands directly tocircuit devices 102. In some embodiments, thethermal monitor unit 300 responds to an event by issuing a Shutdown command to circuit device (or circuit devices) 102. This action may greatly reduce heat output in the associated thermal regions. However, as a Shutdown command may compromise data, other low power mode commands such as Halt and Sleep (as per the Advanced Configuration and Power Interface standard) are supported as well. In a Halt state or C1 state, no commands are executed, butcircuit devices 102 remain powered. In a Sleep state or C3 state, volatile caches are flushed, and parts of thecircuit devices 102 may be powered down. Power-saving commands may not be recognized by allcircuit devices 102 within the thermal region. In fact, in embodiments where more than onecircuit device 102 within athermal region 304 may be powered down, thethermal monitor unit 300 may power down any one or more of thecircuit devices 102 while leaving the remainder functioning. When a command to power down one ormore circuit devices 102 may result in a loss of data, thethermal monitor unit 300 may issue a request to adispatch unit 240 to reassign the instruction transparently to analternate circuit device 102. - Particularly in, but not limited to, embodiments where the
thermal monitor unit 300 adjusts operating parameters such as operating voltage or clock signal frequency, thethermal monitor unit 300 further observesindividual circuit devices 102 to ensure that changes do not cause the device to fail. This may be done by monitoring a “heartbeat” signal. For example, a heartbeat signal can be generated by an instruction executed by one ofcircuit devices 102 to pulse a heartbeat output at a regular interval. In the event of a fault such as a deadlock, livelock, or inadvertent reset, the heartbeat output would fail to pulse as expected. In some embodiments, thethermal monitor 300 includes a heartbeat monitor. If the heartbeat monitor does not receive a regular heartbeat signal from a monitored circuit device, thethermal monitor 300 responds by reverting changed parameters such as operational voltage or clock signal frequency, or responds by rebooting one or more ofcircuit devices 102. Thethermal monitor 300 may also hold the device in a Halt, Sleep, or Shutdown condition until further user input is received. This is useful if a circuit device repeatedly fails to work at a lower voltage or frequency. - In further embodiments, the
thermal monitor unit 300 utilizes the heartbeat signal to pursue aggressive reductions in power. Instead of making large changes in operating parameters, the thermal monitor unit 300 instructs the adaptive power-regulating unit 200 or adaptive clock control unit 220 to make a smaller change. The thermal monitor unit 300 pauses to determine whether the circuit devices 102 within thermal regions 304 operate correctly at the new parameters. If the circuit devices 102 function correctly and the thermal condition does not abate, the thermal monitor unit 300 may make another small change to the operating parameters. This is continued until a minimum operating power is reached or until the thermal condition is resolved. - It is neither necessary nor always optimal for the
thermal monitor unit 300 to take corrective action exclusively in the thermal region that is experiencing the thermal event. By utilizing the thermal map, the thermal monitor unit 300 can recognize contributing factors that may be remedied in order to alleviate the thermal condition. For example, airflow issues may cause heat from circuit device 102 b to collect in region 304 a but not in region 304 b where circuit device 102 b is located. Modifying operating parameters of circuit device 102 a within region 304 a may not relieve this condition as effectively as modifying operating parameters of circuit device 102 b. From the thermal map, the thermal monitor unit 300 recognizes the contributing factors to the thermal conditions of region 304 a. In response, the thermal monitor unit 300 modifies the operation of region 304 b to relieve the condition of region 304 a. - The
thermal monitor unit 300 may also modify the operation of peripherals not located on the device board 100. In an embodiment, the thermal monitor unit 300 utilizes the bus interface unit 104 to send commands to connected devices. The thermal monitor unit 300 is capable of modifying operating parameters, such as voltage, clock frequency, and workload, of circuit devices on other device boards. The thermal monitor unit 300 can adjust airflow, coolant flow, and other regulating mechanisms on the device board 100 and elsewhere. - The
thermal monitor unit 300 may further include a user interface. The user interface is used to notify the host system of thermal events and may allow users to change system parameters and reconfigure thethermal monitor unit 300. In some embodiments, thethermal monitor unit 300 sends a status notification to the user via the user interface. This status notification may include a list of current operating parameters, a list of recent events, a list of trigger criteria, status for variousthermal regions 304, and other suitable status data. The status notification may be sent as a regularly occurring event, as a response to other data such as a critical temperature reading, as a response to a user request, or as a response to any other event. In some embodiments, thethermal monitor unit 300 records and stores temperature data and changes to operating parameters. Thethermal monitor unit 300 may also record and store a record of the state of thecircuit devices 102 including details on the instructions being executed. This may be crucial when debugging software that leads to an adverse thermal condition. In some embodiments, the user can also manually modify operating parameters, configure triggers and responses, and execute instructions such as to Resume orShutdown circuit devices 102 via the user interface. - The user interface is intended to help users analyze performance metrics, evaluate system reliability, and resolve heat management issues. To facilitate this, software may be used to present the information in a form that is easy for the user to digest. For example, software may receive thermal information at the thermal reference points via the user interface and produce a diagnostic display. In an exemplary diagnostic display, a graphical bitmap illustrating the
device board 100 is generated. The user then selects datasets to be displayed as overlays on the board illustration, such as a gradient map, measured thermal conditions, circuit device status including uptime and load, and other diagnostic information. The datasets may contain information received from thethermal monitor unit 300, information received fromcircuit devices 102 of thedevice board 100, information received from a host system, and information received from other sources. It is understood that the datasets may further contain information interpolated from received information, particularly when producing overlays such as gradient maps. For further clarity, the graphical bitmap may include one or more diagnostic regions. These may be, but are not necessarily, coincident with the thermal regions for any particular operating parameter. Software may also be used to manage thethermal monitor unit 300 via the user interface. For example, in high-risk environments, a software program on a host system may regularly inspect thethermal monitor unit 300 and trigger a shutdown of acircuit device 102 if thethermal monitor unit 300 is unable to resolve a problem. -
FIG. 4 is a diagrammatic top view of adevice board 100 according to an embodiment of the present disclosure. In the depicted embodiment,circuit device 400 warrants multiplethermal regions multiple temperature sensors 302 within the thermal regions. Thethermal monitor unit 300 monitors the conditions of thetemperature sensors 302 and of the thermal reference points. If necessary, thethermal monitor unit 300 is capable of altering the operating parameters of one or more thermal regions of thecircuit device 400 independently. This configuration allows fine-grained control of heat generation. Regions of thecircuit device 400 can be optimized in response to thermal conditions without affecting neighboring regions. In some embodiments, thethermal monitor unit 300 interfaces with an adaptive power-regulatingunit 200 to alter the operating parameters of thecircuit device 400. For example, thecircuit device 400 may receive a different operating voltage for each ofthermal regions thermal region 402 can be altered without affecting the operating voltages ofthermal regions thermal monitor unit 300 interfaces with an adaptiveclock control unit 220 to alter the clock-related parameters of thecircuit device 400. In further embodiments, thecircuit device 400 is capable of altering its operating parameters independent of an adaptive power-regulatingunit 200 or an adaptiveclock control unit 220. This allows thethermal monitor unit 300 to coordinate withcircuit device 400 directly to tune the operation of thecircuit device 400. Furthermore, thethermal monitor unit 300 may interface with adispatch unit 240 to assign workload to subunits of thecircuit device 400. For example, workload may be shifted from subunits inthermal region 402 and towards subunits inthermal region 406. 
By modifying operating parameters of select thermal regions within a circuit device 400 while preserving the optimum performance of other thermal regions within the same device, the device board 100 retains performance that may otherwise be lost. -
FIG. 5 is a schematic diagram of a circuit device according to an embodiment of the present disclosure. Circuit device 500 contains one or more circuit subunits 502, a chip-level power-regulating unit 504, a chip-level clock control unit 506, a chip-level dispatch unit 508, and a chip-level thermal monitor unit 510. Possible circuit subunits 502 include fixed-point processing cores, floating-point processing cores, matrix math units, vector processing units, special function processors, controllers, branch prediction units, I/O interface units, intra-core interface units, wire busses, pervasive and test units, memory management units, and other suitable circuit subunits. In some embodiments, select circuit subunits 502 are memory such as caches, register files, memory arrays, programmable read-only memory, and flash memory. - The chip-level power-regulating
unit 504 handles power distribution for the circuit device 500. The chip-level power-regulating unit 504 receives a source voltage for the circuit device 500, converts it to one or more operating voltages, and distributes the one or more operating voltages to the circuit subunits 502, the chip-level clock control unit 506, and the chip-level thermal monitor unit 510. In some embodiments, the chip-level power-regulating unit 504 is capable of varying the one or more operating voltages in response to a voltage control signal. - The chip-level
clock control unit 506 creates the appropriate clocks for the functional logic within the circuit device 500. The chip-level clock control unit 506 receives a system clock for the circuit device 500, creates one or more functional clocks, and distributes the one or more functional clocks to the functional logic including that found in the circuit subunits 502 and the chip-level thermal monitor unit 510. In some embodiments, the chip-level clock control unit 506 is capable of varying the one or more functional clocks in response to a clock control signal. - The
chip-level dispatch unit 508 receives instructions and assigns them to subunits 502 for execution. The assignment may depend on the type of instruction, the capabilities of a particular subunit 502, the data dependencies of the instruction, the system resources available to the subunit 502, the current workloads of the subunit 502 and of other subunits, operating conditions of the subunit 502, and other factors. In some embodiments, the assignment further depends on a thermal factor assigned to a subunit 502. The thermal factor may cause the chip-level dispatch unit 508 to assign a given subunit 502 fewer instructions or no instructions or may cause the chip-level dispatch unit 508 to suspend all tasks assigned to the given subunit 502 and reassign them to other subunits. - The chip-level
thermal monitor unit 510 observes and maintains a suitable thermal environment on the circuit device 500. The chip-level thermal monitor unit 510 receives data from temperature sensors 302 and utilizes the data to determine conditions at thermal reference points. The thermal reference points are grouped by either circuit device 500 or thermal regions 512. A thermal region 512 may include part of a circuit subunit 502, an entire circuit subunit 502, more than one circuit subunit 502, or any combination thereof. In some embodiments, thermal regions 512 are defined differently for each operating parameter. It is not necessary for the entire circuit device 500 to have a corresponding thermal region 512. Particularly, areas of low density may not have an associated thermal region 512. - The chip-level
thermal monitor unit 510 optimizes performance by monitoring conditions throughout the thermal regions 512 and taking corrective action such as varying operating parameters and issuing power-saving commands. In an embodiment, conditions at a thermal reference point can trigger a response. Possible triggering events include temperature data exceeding a preset limit, multiple thermal reference points with temperature exceeding a preset limit, excessive rate of change in temperature, and excessive temperature over a prolonged period. When thermal reference points do not correspond with the location of a temperature sensor 302, thermal conditions are interpolated. Interpolation may be based on a simple linear model, or may account for the existence, operation, and location of heat-generating structures within the circuit device 500. Interpolation may also be based on systemic conditions that affect the device board 100. - When the chip-level
thermal monitor unit 510 detects unacceptable conditions in a thermal region 512, or some portion of or combination thereof, it may take one or more corrective actions in response. The thermal monitor unit 510 may issue a command to shut down integrated circuits correlated with thermal regions 512 or may issue a command to place the integrated circuits into a low-power mode. The chip-level thermal monitor unit 510 may also modify an operating parameter, such as voltage, frequency, or workload, for the region. In an embodiment, the chip-level thermal monitor unit 510 interacts with the chip-level power-regulating unit 504 to reduce the operating voltage delivered to a thermal region 512. In a further embodiment, the chip-level thermal monitor unit 510 interacts with the chip-level clock control unit 506 to reduce the frequency of the functional clock for thermal region 512 or some portion or combination of thermal regions 512. In an embodiment, the chip-level thermal monitor unit 510 interacts with the chip-level dispatch unit 508 to reduce the workload for the circuits correlated with a thermal region 512 or some portion or combination of thermal regions 512. In some embodiments, after modifying an operating parameter, the chip-level thermal monitor unit 510 observes the circuits within their associated thermal regions to ensure that the circuits function properly at the new operating parameter. In such embodiments, the chip-level thermal monitor unit 510 may include a heartbeat monitor to track the operating status of the circuit subunits. -
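The triggering events listed for the thermal monitor (a reading over a preset limit, several points running high, an excessive rate of rise, and a prolonged high temperature) can be illustrated with a short sketch. Every threshold, the `TriggerConfig` name, and the `history` layout below are assumptions made for the example; the disclosure does not specify values or data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerConfig:
    # All thresholds are illustrative assumptions.
    critical_c: float = 95.0        # single-reading limit
    high_c: float = 85.0            # "high but not critical" limit
    high_count: int = 3             # points above high_c needed to trigger
    max_rise_c_per_s: float = 5.0   # rate-of-change limit
    prolonged_samples: int = 4      # consecutive samples above high_c


def detect_triggers(history, period_s, cfg=TriggerConfig()):
    """Return the set of (kind, point) triggers found in the readings.

    history maps a thermal reference point name to its readings in
    degrees Celsius, oldest first; values may be measured or interpolated.
    """
    triggers = set()
    hot_points = 0
    for name, temps in history.items():
        latest = temps[-1]
        if latest >= cfg.critical_c:
            triggers.add(("critical", name))
        if latest >= cfg.high_c:
            hot_points += 1
        if len(temps) >= 2:
            rise = (latest - temps[-2]) / period_s
            if rise > cfg.max_rise_c_per_s:
                triggers.add(("rapid_rise", name))
        recent = temps[-cfg.prolonged_samples:]
        if len(recent) == cfg.prolonged_samples and min(recent) >= cfg.high_c:
            triggers.add(("prolonged", name))
    if hot_points >= cfg.high_count:
        triggers.add(("multiple_high", None))
    return triggers
```

Because the function only sees a name and a list of readings, it treats measured and interpolated reference points the same way, which matches the statement that a triggering temperature may be calculated rather than recorded.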
FIG. 6 is a flow chart of an exemplary method of thermal management for a device board. Additional steps can be provided before, during, and after the method 600, and some of the steps described can be replaced or eliminated for other embodiments of the method. The method 600 begins at block 602 where temperature data is received from thermal measurement points corresponding to a thermal region or some portion or combination thereof. At block 604, thermal conditions are interpolated for a set of thermal reference points where directly measured data is not available. At block 606, a thermal map is determined from the thermal conditions in the thermal regions, as measured by the circuit devices within the thermal regions. At block 608, a first trigger event is detected. This trigger event may be based on received temperature data, interpolated temperature data, rates of change of temperature data, and/or other trigger criteria. At block 610, the current operating parameters are analyzed for circuit devices as measured from their associated thermal reference points. At block 612, a first response is made to the trigger event. Exemplary responses include modifying an operating parameter within the thermal region and issuing a command to suspend or shut down a circuit device. At block 620, a second trigger event is detected. At block 622, the current operating parameters are analyzed for circuit devices as measured from their associated thermal reference points. At block 624, a second response is made to the trigger event. -
FIG. 7 is a flow chart of an exemplary method of thermal management for a device board. The method 700 begins at block 702 where an operating parameter is modified. At block 704, a circuit device affected by the modification to the operating parameter is monitored to determine whether the circuit device functions properly. If not, at block 706, the operating parameter is reverted to a previous value. If the circuit device functions properly with the modified operating parameter, at block 708, the operating parameter is maintained at the modified value. -
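The modify, monitor, and revert-or-keep sequence of method 700 can be sketched as follows, together with a loop that repeats it in small steps. This is an illustrative sketch only: the parameter dictionary and the `functions_properly` and `still_hot` callbacks are hypothetical stand-ins for the heartbeat monitor and the temperature check, not interfaces defined by the disclosure.

```python
def apply_and_verify(params, name, new_value, functions_properly):
    """Modify one operating parameter and keep it only if the device works.

    functions_properly is a caller-supplied check (for example, a heartbeat
    monitor); it is an assumed interface for this sketch.
    """
    previous = params[name]
    params[name] = new_value             # block 702: modify the parameter
    if functions_properly(params):       # block 704: monitor the device
        return True                      # block 708: maintain the new value
    params[name] = previous              # block 706: revert to previous value
    return False


def step_down(params, name, step, minimum, functions_properly, still_hot):
    """Lower a parameter in small steps while the thermal condition persists.

    Stops when the condition clears, the minimum is reached, or a step
    causes the device to stop functioning (that step is reverted).
    """
    while still_hot(params) and params[name] - step >= minimum:
        ok = apply_and_verify(params, name, params[name] - step,
                              functions_properly)
        if not ok:
            break
    return params[name]
```

For example, calling `step_down` on a hypothetical `clock_mhz` parameter with a step of 100 lowers the clock until either the temperature check passes or the device stops responding, at which point the last working value is restored.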
FIG. 8 is a flow chart of an exemplary method of thermal management for a device board. The method 800 begins at block 802 where an operating parameter is modified. At block 804, the temperature data is monitored to determine whether the modified operating parameter was successful at resolving the temperature event. If the temperature data indicates that the temperature is no longer critical, at block 806, the operating parameter is reverted to a previous value. If the temperature remains within a critical window, at block 808, the operating parameter is maintained in its current state. If the temperature exceeds the critical window, at block 810, a command is issued to instruct a circuit device to suspend operation.
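The three outcomes of method 800 amount to classifying the monitored temperature against a critical window. A minimal sketch, assuming the window is given as a lower and an upper bound in degrees Celsius (the function name, bounds, and action labels are hypothetical):

```python
def next_action(temp_c, critical_low_c, critical_high_c):
    """Choose the follow-up step after an operating parameter was modified.

    Below the window: the event is resolved, so revert (block 806).
    Inside the window: keep the modified parameter (block 808).
    Above the window: instruct the device to suspend (block 810).
    """
    if temp_c < critical_low_c:
        return "revert"
    if temp_c <= critical_high_c:
        return "maintain"
    return "suspend"


# Hypothetical critical window of 75 to 90 degrees Celsius.
actions = [next_action(t, 75.0, 90.0) for t in (70.0, 80.0, 95.0)]
```

Treating both window edges as inside the window is a design choice of this sketch; the disclosure does not say how boundary values are handled.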
FIG. 9 is a diagrammatic perspective view of a device board according to various aspects of the present disclosure. FIG. 9 and all other figures herein are simplified for clarity. In a further embodiment of the system and method for thermal management of a device board, the PCB 108 of the device board 100 may include a thermal dissipating layer 900 between the insulating layers 110. The thermal dissipating layer 900 is used to conduct heat away from active regions of the device board 100 and towards one or more radiating islands 902 disposed on the surface of the PCB 108. Areas of the thermal dissipating layer, such as those in proximity to circuit devices, absorb heat. The heat is conducted along the thermal dissipating layer 900 to the thermal vias 904. The thermal vias 904 transfer heat energy through the insulating layers 110 of the PCB 108 and to the radiating islands 902. In many applications, space around the device board 100 is limited. It may not be possible to add heat sinks directly above heat-generating components. By conducting heat through the PCB 108, this structure disperses heat without adding excessive height to the device board 100.
- The radiating islands 902 may comprise any suitable radiating material such as copper, tin, silver, aluminum, gold, non-metallic conductors, organic conductors, and/or other suitable heat-transferring materials. In some embodiments, the radiating islands 902 are configured to conduct heat to the perimeter of the device board 100. In further embodiments, the radiating islands 902 conduct heat to a heat transfer system in the host system such as an airflow region, a heat pipe, or a liquid cooling waterblock. In some embodiments, the radiating islands 902 may be disposed on the opposite side of the PCB from the circuit devices.
FIG. 10 is a diagrammatic top view of a thermal dissipating layer according to various aspects of the present disclosure. The thermal dissipating layer 900 includes thermally conductive regions 1000 of a thermally conductive material such as copper, aluminum, tin, silver, gold, non-metallic conductors, and organic conductors. In addition to thermally conductive regions 1000, the thermal dissipating layer 900 may include one or more electrically conductive circuit traces 1002 for routing signals between circuit devices. In embodiments where the thermally conductive regions 1000 are electrically conductive as well, the regions 1000 may include holes 1004 to allow signal vias from trace layers 112 to pass through the thermal dissipating layer 900 without shorting. In some embodiments suitable for device boards 100 that generate substantial heat, multiple thermal dissipating layers 900 are used. The additional thermal dissipating layers 900 are connected with thermal vias 904. This configuration enhances the thermal conductive capacity of the overall structure.
- Although the various embodiments are described herein with reference to thermal measurement and thermal stress, it is explicitly contemplated that the technique of: 1) developing a multi-dimensional map representing a type of stress experienced by a system; and 2) utilizing the multi-dimensional map to adaptively respond can be applied to other types of stress. In particular, the same technique may be applied to vibration sensors and vibration stress, bandwidth utilization and communication pressure, and circuit monitoring and circuit faults. In each of the embodiments described above, it is possible to substitute the thermal reference point and thermal measurement with a stress reference point and stress measurement and to use the corresponding system stress map to respond.
- In one embodiment, the “transformation” that occurs between thermal reference points, circuit devices, and thermal regions is useful for both analytic and reporting purposes. For example, one embodiment allows an end user to diagnose issues within a server. In some cases, a piece of hardware unrelated to the monitored circuits may cause a particular thermal region to be consistently hot, such that no amount of voltage or clock control mitigates the stress. The transformation from a series of points into a spatially or circuit-correlated thermal map can help users diagnose this “hot spot” as a problem on the motherboard, in the power supply, or elsewhere in the server.
- The present embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. Furthermore, embodiments of the present disclosure can take the form of a computer program product accessible from a tangible computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a tangible computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device), or a propagation medium.
- Thus, the present invention provides a system and method for thermal management for integrated circuits. In one embodiment, the circuit device board includes: a plurality of circuits; a plurality of temperature sensors; a thermal management unit; and a printed circuit board wherein the plurality of temperature sensors are communicatively coupled to the thermal management unit and at least one of the plurality of circuits is controllably coupled to the thermal management unit; and wherein the thermal management unit comprises a thermal monitor unit configured to receive thermal data from the plurality of temperature sensors and to determine a plurality of thermal reference points, the thermal reference points defining a plurality of thermal regions; wherein the thermal monitor unit is further configured to make a first corrective response to modify the conditions in one or more thermal regions; wherein the thermal monitor unit is further configured to make a second corrective response to modify the conditions in one or more thermal regions; and wherein the first corrective response and the second corrective response are independent.
- In a further embodiment, the circuit device includes: a power-regulating unit; a clock control unit; a plurality of circuit subunits; a plurality of temperature sensors; and a thermal management unit configured to receive thermal data from the plurality of temperature sensors and to determine a plurality of thermal reference points, the thermal reference points defining a plurality of thermal regions; wherein the thermal management unit is further configured to make a first corrective response to modify the conditions in one or more thermal regions; wherein the thermal management unit is further configured to make a second corrective response to modify the conditions in one or more thermal regions; and wherein the first corrective response and the second corrective response are independent.
- In yet another embodiment, the method of thermal management includes: measuring thermal data at a plurality of points in a circuit device; determining a thermal map from the thermal data, the thermal map comprising a plurality of regions, and wherein the values in the thermal map are correlated with the operating characteristics of the circuit device; identifying a first trigger event for a first thermal region; analyzing current operating parameters of one or more thermal regions; responding to the first trigger event by making a first corrective response; identifying a second trigger event for a second thermal region; analyzing current operating parameters of one or more thermal regions; and responding to the second trigger event by making a second corrective response; wherein the first corrective response and the second corrective response are independent.
- This disclosure presents several embodiments so that those skilled in the art may better understand the features and advantages of the present disclosure. Those skilled in the art will appreciate that the disclosure serves as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art will also appreciate that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that the various changes, substitutions, and alterations remain within the spirit and scope of the present disclosure.
Claims (20)
1. A host system for dynamic thermal adaption, the host system comprising:
a circuit device comprising a plurality of temperature sensors; and
a computer readable medium that stores a plurality of instructions for execution by at least one computer processor, wherein the instructions are for:
receiving data from the plurality of temperature sensors;
determining a temperature at a location separate from each sensor of the plurality of temperature sensors based on the received data; and
modifying an operational parameter of a processing device of the host system based on the determined temperature.
2. The host system of claim 1, wherein the circuit device includes at least one of: an expansion card, a motherboard, or a daughtercard upon which the plurality of temperature sensors are disposed.
3. The host system of claim 1, wherein the circuit device includes the processing device disposed thereupon.
4. The host system of claim 1, wherein the processing device is spaced away from the circuit device.
5. The host system of claim 1, wherein the location at which the temperature is determined corresponds to a location of the processing device.
6. The host system of claim 1, wherein the modified operational parameter includes at least one of: a voltage parameter, a clock parameter, or a workload parameter.
7. The host system of claim 1, wherein the temperature is a first temperature, the location is a first location, and the operational parameter is a first operational parameter, and
wherein the computer readable medium includes further instructions for:
determining a second temperature at a second location separate from each sensor of the plurality of temperature sensors based on the received data; and
modifying a second operational parameter based on the second temperature, wherein the first operational parameter and the second operational parameter are independent.
8. The host system of claim 1, wherein the plurality of temperature sensors includes at least some temperature sensors arrayed substantially in a line upon the circuit device.
9. The host system of claim 1, wherein the plurality of temperature sensors includes at least some temperature sensors arrayed substantially in a rectangular arrangement upon the circuit device.
10. The host system of claim 1, wherein the plurality of temperature sensors includes at least some temperature sensors spaced substantially equidistant apart.
11. An apparatus comprising:
a computer readable medium that stores a plurality of instructions for dynamic thermal adaption, wherein the instructions, when executed by one or more computer processors, perform:
obtaining thermal data from a plurality of temperature sensors disposed within a computing system;
determining a thermal map from the obtained thermal data, wherein the thermal map includes a thermal condition at a first location and a thermal condition at a second location different from the first location;
comparing the thermal condition at a first location to a threshold that depends on the thermal condition at the second location; and
making a corrective response to the thermal condition based on the comparison to the threshold.
12. The apparatus of claim 11, wherein the corrective response includes modifying at least one of: a voltage parameter, a clock parameter, or a workload parameter of the computing system.
13. The apparatus of claim 11, wherein the computing system includes a processing element directly physically coupled to the plurality of temperature sensors by a device board, and wherein at least one of: the first location or the second location corresponds to the processing element.
14. The apparatus of claim 13, wherein the device board includes at least one of: at least one expansion card, motherboard, or daughtercard upon which the plurality of temperature sensors are disposed.
15. The apparatus of claim 11, wherein the computing system includes a processing element spaced away from a device board upon which the plurality of temperature sensors are disposed.
16. The apparatus of claim 11, wherein the thermal map corresponds to locations in a three-dimensional (X-Y-Z) space.
17. A method of thermal adaption, the method comprising:
measuring thermal data conditions at a plurality of locations within a host system using a plurality of sensors disposed on a circuit board;
generating a thermal map of the host system based on the measured thermal data conditions, wherein the generating of the thermal map includes determining an interpolated thermal data condition at a location that is away from each of the plurality of sensors;
identifying a triggering event when the interpolated thermal data condition exceeds a threshold; and
modifying an operational condition of the host system based on the triggering event.
18. The method of claim 17, wherein the generating of the thermal map accounts for at least one heat source that is physically separated from the circuit board.
19. The method of claim 17, wherein the generating of the thermal map accounts for at least one heat source that is disposed on the circuit board.
20. The method of claim 17, wherein the threshold depends on a condition at another location that is different from the location of the interpolated thermal data condition.
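Claims 11 and 20 recite a threshold that depends on the thermal condition at a different location. The claims do not give a formula, so the following Python sketch is only one possible reading: the threshold at the first location is lowered linearly as a second location rises above its nominal temperature. The function names, the linear derating rule, and all numeric defaults are hypothetical.

```python
def dependent_threshold(base_c, neighbor_c, neighbor_nominal_c,
                        derate_per_c=0.5, floor_c=0.0):
    """Threshold for one location, lowered when another location runs hot.

    The linear derating rule is an assumption made for illustration only.
    """
    excess_c = max(0.0, neighbor_c - neighbor_nominal_c)
    return max(floor_c, base_c - derate_per_c * excess_c)


def needs_corrective_response(temp_c, neighbor_c,
                              base_c=85.0, neighbor_nominal_c=60.0):
    """Compare the first location against its location-dependent threshold."""
    return temp_c > dependent_threshold(base_c, neighbor_c, neighbor_nominal_c)


# With a hot neighbor (70 C) the threshold drops from 85 C to 80 C.
threshold_hot_neighbor = dependent_threshold(85.0, 70.0, 60.0)
respond_hot_neighbor = needs_corrective_response(82.0, neighbor_c=70.0)
respond_cool_neighbor = needs_corrective_response(82.0, neighbor_c=55.0)
```

The same 82 C reading therefore triggers a response when the neighboring location is hot but not when it is cool.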
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/697,388 US20150241887A1 (en) | 2011-02-16 | 2015-04-27 | Thermal Management for Integrated Circuits |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161443430P | 2011-02-16 | 2011-02-16 | |
US201161443394P | 2011-02-16 | 2011-02-16 | |
US13/397,534 US8996192B2 (en) | 2011-02-16 | 2012-02-15 | Thermal management for integrated circuits |
US13/398,686 US9020655B2 (en) | 2011-02-16 | 2012-02-16 | Thermal management for integrated circuits |
US14/697,388 US20150241887A1 (en) | 2011-02-16 | 2015-04-27 | Thermal Management for Integrated Circuits |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/398,686 Continuation US9020655B2 (en) | 2011-02-16 | 2012-02-16 | Thermal management for integrated circuits |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150241887A1 (en) | 2015-08-27 |
Family
ID=46637557
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/398,686 Expired - Fee Related US9020655B2 (en) | 2011-02-16 | 2012-02-16 | Thermal management for integrated circuits |
US14/697,388 Abandoned US20150241887A1 (en) | 2011-02-16 | 2015-04-27 | Thermal Management for Integrated Circuits |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/398,686 Expired - Fee Related US9020655B2 (en) | 2011-02-16 | 2012-02-16 | Thermal management for integrated circuits |
Country Status (1)
Country | Link |
---|---|
US (2) | US9020655B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180004260A1 (en) * | 2016-06-29 | 2018-01-04 | HGST Netherlands B.V. | Thermal aware workload scheduling |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8628236B2 (en) * | 2010-05-02 | 2014-01-14 | Mentor Graphics Corporation | Thermal analysis |
US8856567B2 (en) * | 2012-05-10 | 2014-10-07 | International Business Machines Corporation | Management of thermal condition in a data processing system by dynamic management of thermal loads |
US9304703B1 (en) | 2015-04-15 | 2016-04-05 | Symbolic Io Corporation | Method and apparatus for dense hyper IO digital retention |
US9817728B2 (en) | 2013-02-01 | 2017-11-14 | Symbolic Io Corporation | Fast system state cloning |
US10133636B2 (en) | 2013-03-12 | 2018-11-20 | Formulus Black Corporation | Data storage and retrieval mediation system and methods for using same |
US9720467B2 (en) * | 2013-08-09 | 2017-08-01 | Qualcomm Incorporated | Thermal mitigation adaptation for a mobile electronic device |
CN104700815B (en) * | 2013-12-06 | 2017-07-28 | 神讯电脑(昆山)有限公司 | Brightness controlling device and brightness control method |
US10416737B2 (en) | 2014-11-04 | 2019-09-17 | Qualcomm Incorporated | Thermal mitigation based on predicted temperatures |
US10061514B2 (en) | 2015-04-15 | 2018-08-28 | Formulus Black Corporation | Method and apparatus for dense hyper IO digital retention |
US20170083063A1 (en) * | 2015-09-21 | 2017-03-23 | Qualcomm Incorporated | Circuits and methods providing temperature mitigation for computing devices using in-package sensor |
US9733685B2 (en) | 2015-12-14 | 2017-08-15 | International Business Machines Corporation | Temperature-aware microprocessor voltage management |
US9603251B1 (en) * | 2016-03-09 | 2017-03-21 | Symbolic Io Corporation | Apparatus and method of midplane panel connections |
US10387607B2 (en) * | 2016-08-15 | 2019-08-20 | Cisco Technology, Inc. | Temperature-dependent printed circuit board trace analyzer |
CN106708698A (en) * | 2016-11-29 | 2017-05-24 | 东莞新能源科技有限公司 | Terminal control method and apparatus |
IT201700022534A1 (en) * | 2017-02-28 | 2018-08-28 | St Microelectronics Srl | CIRCUIT WITH THERMAL PROTECTION, EQUIPMENT AND CORRESPONDING PROCEDURE |
US11105689B2 (en) * | 2017-03-09 | 2021-08-31 | Keithley Instruments, Llc | Temperature and heat map system |
JP2019045777A (en) * | 2017-09-06 | 2019-03-22 | セイコーエプソン株式会社 | Electro-optical device, electronic apparatus, and projector |
US10580730B2 (en) | 2017-11-16 | 2020-03-03 | International Business Machines Corporation | Managed integrated circuit power supply distribution |
WO2019126072A1 (en) | 2017-12-18 | 2019-06-27 | Formulus Black Corporation | Random access memory (ram)-based computer systems, devices, and methods |
US10725853B2 (en) | 2019-01-02 | 2020-07-28 | Formulus Black Corporation | Systems and methods for memory failure prevention, management, and mitigation |
TWI760854B (en) * | 2020-09-22 | 2022-04-11 | 瑞昱半導體股份有限公司 | Chip, layout design system, and layout design method |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060047480A1 (en) * | 2004-08-31 | 2006-03-02 | Watlow Electric Manufacturing Company | Method of temperature sensing |
US20070011288A1 (en) * | 2005-05-31 | 2007-01-11 | International Business Machines Corporation | Apparatus and method for achieving thermal management through the allocation of redundant data processing devices |
US20070098137A1 (en) * | 2005-11-03 | 2007-05-03 | General Electric Company | Method of assembly and thermal management of ct detector electronics circuits |
US20080123238A1 (en) * | 2006-08-30 | 2008-05-29 | Freescale Semiconductor, Inc. | Multiple sensor thermal management for electronic devices |
US20080215283A1 (en) * | 2004-07-16 | 2008-09-04 | International Business Machines Corporation | Method and system for real-time estimation and prediction of the thermal state of a microprocessor unit |
US20090064164A1 (en) * | 2007-08-27 | 2009-03-05 | Pradip Bose | Method of virtualization and os-level thermal management and multithreaded processor with virtualization and os-level thermal management |
US20090144568A1 (en) * | 2000-09-27 | 2009-06-04 | Fung Henry T | Apparatus and method for modular dynamically power managed power supply and cooling system for computer systems, server applications, and other electronic devices |
US20090164852A1 (en) * | 2007-12-19 | 2009-06-25 | International Business Machines Corporation | Preemptive Thermal Management For A Computing System Based On Cache Performance |
US20090312887A1 (en) * | 2006-08-22 | 2009-12-17 | Barry Charles F | Apparatus and method for thermal stabilization of pcb-mounted electronic components within an enclosed housing |
US7638874B2 (en) * | 2006-06-23 | 2009-12-29 | Intel Corporation | Microelectronic package including temperature sensor connected to the package substrate and method of forming same |
US20100117579A1 (en) * | 2003-08-15 | 2010-05-13 | Michael Culbert | Methods and apparatuses for operating a data processing system |
US20100169585A1 (en) * | 2008-12-31 | 2010-07-01 | Robin Steinbrecher | Dynamic updating of thresholds in accordance with operating conditons |
US20100281884A1 (en) * | 2009-01-22 | 2010-11-11 | John Myron Rawski | Thermoelectric Management Unit |
US20110301909A1 (en) * | 2010-06-04 | 2011-12-08 | Tyco Electronics Corporation | Temperature measurement system for a light emitting diode (led) assembly |
US20120066439A1 (en) * | 2010-09-09 | 2012-03-15 | Fusion-Io, Inc. | Apparatus, system, and method for managing lifetime of a storage device |
US20120124590A1 (en) * | 2010-11-16 | 2012-05-17 | International Business Machines Corporation | Minimizing airflow using preferential memory allocation |
US8786449B1 (en) * | 2009-12-16 | 2014-07-22 | Applied Micro Circuits Corporation | System-on-chip with thermal management core |
US8996192B2 (en) * | 2011-02-16 | 2015-03-31 | Signalogic, Inc. | Thermal management for integrated circuits |
Patent Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090144568A1 (en) * | 2000-09-27 | 2009-06-04 | Fung Henry T | Apparatus and method for modular dynamically power managed power supply and cooling system for computer systems, server applications, and other electronic devices |
US20100117579A1 (en) * | 2003-08-15 | 2010-05-13 | Michael Culbert | Methods and apparatuses for operating a data processing system |
US20080215283A1 (en) * | 2004-07-16 | 2008-09-04 | International Business Machines Corporation | Method and system for real-time estimation and prediction of the thermal state of a microprocessor unit |
US20060047480A1 (en) * | 2004-08-31 | 2006-03-02 | Watlow Electric Manufacturing Company | Method of temperature sensing |
US20070011288A1 (en) * | 2005-05-31 | 2007-01-11 | International Business Machines Corporation | Apparatus and method for achieving thermal management through the allocation of redundant data processing devices |
US20070098137A1 (en) * | 2005-11-03 | 2007-05-03 | General Electric Company | Method of assembly and thermal management of ct detector electronics circuits |
US7638874B2 (en) * | 2006-06-23 | 2009-12-29 | Intel Corporation | Microelectronic package including temperature sensor connected to the package substrate and method of forming same |
US20090312887A1 (en) * | 2006-08-22 | 2009-12-17 | Barry Charles F | Apparatus and method for thermal stabilization of pcb-mounted electronic components within an enclosed housing |
US20080123238A1 (en) * | 2006-08-30 | 2008-05-29 | Freescale Semiconductor, Inc. | Multiple sensor thermal management for electronic devices |
US20110096809A1 (en) * | 2006-08-30 | 2011-04-28 | Freescale Semiconductor, Inc. | Multiple sensor thermal management for electronic devices |
US20090064164A1 (en) * | 2007-08-27 | 2009-03-05 | Pradip Bose | Method of virtualization and os-level thermal management and multithreaded processor with virtualization and os-level thermal management |
US20090164852A1 (en) * | 2007-12-19 | 2009-06-25 | International Business Machines Corporation | Preemptive Thermal Management For A Computing System Based On Cache Performance |
US20100169585A1 (en) * | 2008-12-31 | 2010-07-01 | Robin Steinbrecher | Dynamic updating of thresholds in accordance with operating conditons |
US20100281884A1 (en) * | 2009-01-22 | 2010-11-11 | John Myron Rawski | Thermoelectric Management Unit |
US8786449B1 (en) * | 2009-12-16 | 2014-07-22 | Applied Micro Circuits Corporation | System-on-chip with thermal management core |
US20110301909A1 (en) * | 2010-06-04 | 2011-12-08 | Tyco Electronics Corporation | Temperature measurement system for a light emitting diode (led) assembly |
US20120066439A1 (en) * | 2010-09-09 | 2012-03-15 | Fusion-Io, Inc. | Apparatus, system, and method for managing lifetime of a storage device |
US20120124590A1 (en) * | 2010-11-16 | 2012-05-17 | International Business Machines Corporation | Minimizing airflow using preferential memory allocation |
US8996192B2 (en) * | 2011-02-16 | 2015-03-31 | Signalogic, Inc. | Thermal management for integrated circuits |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180004260A1 (en) * | 2016-06-29 | 2018-01-04 | HGST Netherlands B.V. | Thermal aware workload scheduling |
US10528098B2 (en) * | 2016-06-29 | 2020-01-07 | Western Digital Technologies, Inc. | Thermal aware workload scheduling |
Also Published As
Publication number | Publication date |
---|---|
US20120209559A1 (en) | 2012-08-16 |
US9020655B2 (en) | 2015-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9020655B2 (en) | Thermal management for integrated circuits | |
US8996192B2 (en) | Thermal management for integrated circuits | |
US11209886B2 (en) | Clock frequency adjustment for workload changes in integrated circuit devices | |
Memik et al. | Optimizing thermal sensor allocation for microprocessors | |
US9958921B2 (en) | Power management to change power limits based on device skin temperature | |
US20160266629A1 (en) | Changing power limits based on device state | |
US8595525B2 (en) | On-chip thermal management techniques using inter-processor time dependent power density data for indentification of thermal aggressors | |
TW200923632A (en) | Method for equalizing performance of computing components | |
CN103189814A (en) | Method and apparatus for thermal control of processing nodes | |
Zapater et al. | Leakage-aware cooling management for improving server energy efficiency | |
US10560022B2 (en) | Setting operating points for circuits in an integrated circuit chip using an integrated voltage regulator power loss model | |
KR102640309B1 (en) | Voltage regulators for integrated circuit chips | |
TWI497266B (en) | Matrix thermal sensing circuit and heat-dissipation system | |
US9753516B2 (en) | Method, apparatus, and system for energy efficiency and energy conservation by mitigating performance variations between integrated circuit devices | |
Shin et al. | Revealing power, energy and thermal dynamics of a 200pf pre-exascale supercomputer | |
Haghbayan et al. | A power-aware approach for online test scheduling in many-core architectures | |
US20130211752A1 (en) | Software power analysis | |
TWI756358B (en) | Apparatus, method and system for monitoring current | |
Liu et al. | Distributed task migration for thermal hot spot reduction in many-core microprocessors | |
US20140082580A1 (en) | Current-aware floorplanning to overcome current delivery limitations in integrated circuits | |
Agarwal et al. | Redcooper: Hardware sensor enabled variability software testbed for lifetime energy constrained application | |
JP2014021786A (en) | Computer system | |
WO2022166679A1 (en) | Computing core, computing core temperature adjustment method and device, medium, chip, and system | |
Sarood | Optimizing performance under thermal and power constraints for HPC data centers | |
Zhang et al. | On demand cooling with real time thermal information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: SIGNALOGIC, INC., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BROWER, JEFFREY H.;REEL/FRAME:035505/0565 Effective date: 20120216 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |