US20150348226A1 - Selective gpu throttling - Google Patents
- Publication number
- US20150348226A1 (application US 14/503,311)
- Authority
- US
- United States
- Prior art keywords
- gpu
- thermal
- utilization
- gpu utilization
- priority process
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/16—Constructional details or arrangements
- G06F1/20—Cooling means
- G06F1/206—Cooling means comprising thermal management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4893—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues taking into account power or heat criteria
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G5/00—Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
- G09G5/36—Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
- G09G5/363—Graphics controllers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/48—Indexing scheme relating to G06F9/48
- G06F2209/483—Multiproc
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G2320/00—Control of display operating conditions
- G09G2320/04—Maintaining the quality of display appearance
- G09G2320/041—Temperature compensation
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G2330/00—Aspects of power supply; Aspects of display protection and defect management
- G09G2330/02—Details of power systems and of start or stop of display operation
- G09G2330/021—Power management, e.g. power saving
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This invention relates generally to device thermal management and more particularly to managing the thermal profile of a device by selectively throttling the graphics processing unit of the device.
- a device can typically include one or more graphics processing units (GPU) that are used to process graphics or general purpose operations for the device.
- Each of the GPUs is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display.
- the GPU can be used for video transcoding, rendering graphics for a user interface (UI), video encoding/decoding, OpenCL workloads, etc.
- Each of these GPU operations will cause the device to consume power, which leads to heat being generated by the device. This generated heat can add to the thermal load being applied to the device. An excessive thermal load can affect the device performance and, in extreme cases, can lead to a device shutdown.
- Existing devices can mitigate the thermal load by reducing the GPU operating frequency globally for all processes, regardless of whether the GPU operations are for a batch process or a process supporting a user interface operation.
- a method and apparatus of a device that manages a thermal profile of a device by selectively throttling graphics processing unit operations of the device is described.
- the device monitors the thermal profile of the device, where the device executes a plurality of processes that utilize a graphics processing unit of the device.
- the plurality of processes includes a high priority process and a low priority process. If the thermal profile of the device exceeds a thermal threshold, the device decreases a first GPU utilization for the low priority process and maintains a second GPU utilization for the high priority process. The device further executes the low priority process using the first GPU utilization with the GPU and executes the high priority process using the second GPU utilization with the GPU.
- FIG. 1 is a block diagram of one embodiment of a device that mitigates a thermal profile of a device by selectively throttling graphics processing unit operations of the device.
- FIGS. 2A-C are illustrations of graphics processing unit (GPU) throttling tables for different levels of GPU throttling.
- FIG. 3 is a flow diagram of one embodiment of a process to manage GPU throttling based on the thermal data of the device.
- FIG. 4 is a flow diagram of one embodiment of a process to manage GPU execution according to the quality of service of a process.
- FIGS. 5A-B are block diagrams of embodiments of GPU utilization for high and low priority processes.
- FIG. 6 is a block diagram of one embodiment of a thermal daemon that manages GPU throttling based on the thermal data of the device.
- FIG. 7 is a block diagram of one embodiment of a GPU that manages GPU execution according to the quality of service of a process.
- FIG. 8 illustrates one example of a typical computer system, which may be used in conjunction with the embodiments described herein.
- FIG. 9 shows an example of a data processing system, which may be used with one embodiment of the present invention.
- Coupled is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.
- Connected is used to indicate the establishment of communication between two or more elements that are coupled with each other.
- processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system or a dedicated machine), or a combination of both.
- The terms "server," "client," and "device" are intended to refer generally to data processing systems rather than specifically to a particular form factor for the server, client, and/or device.
- a method and apparatus of a device that manages a thermal profile of a device by selectively throttling graphics processing unit (GPU) operations of the device is described.
- the device selectively throttles the GPU operations by restricting GPU utilization for one, some, or all of the processes, so that the overall GPU utilization is reduced, but that the GPU operations for higher quality of service (QoS) priority process are not affected or are affected less than GPU operations for lower QoS priority processes.
- the device monitors the thermal data of the device. If the thermal data reaches or exceeds one or more thermal thresholds, the device selectively throttles the GPU operations for the different processes. In this embodiment, each process operation has a QoS priority (or “priority”).
- Each priority represents whether the process is an important process that should not be throttled (or throttled under more of a thermal load) or the process can be a less important process that can be throttled under a lesser thermal load.
- a process associated with user interface operation would have a higher priority (e.g., graphic rendering for a UI or graphics visual, encoding/decoding for a video call, encoding for framebuffer transmission to wireless display device(s), and/or other types of high priority processes), whereas a process associated with a batch process (e.g., video transcoding, batch decoding/encoding, background streaming compute workloads, infrequent or non-interactive graphical output, and/or other types of batch-type processes), would have a lower priority.
- the device can include multiple different process priorities. Each of the priorities has an associated GPU utilization.
- the GPU utilization is the allowable GPU resources that each process can use during a GPU execution time slot.
- the device selectively throttles overall device GPU usage by restricting the GPU utilization for the different priorities based on the current thermal load on the device. For example and in one embodiment, if the device thermal load increases, the device can restrict the lowest or lower priority GPU utilizations. This would decrease the GPU usage for these lower priorities, but leave the GPU usage for the higher priorities unchanged. As the thermal load on the device further increases, the device can either increase the GPU throttling of the lower priority processes and/or start to throttle the higher priority processes. In another embodiment, as the thermal load on the device lessens, the device can selectively relax the GPU throttling for the different priority processes.
- FIG. 1 is a block diagram of one embodiment of a device 100 that mitigates a thermal profile of a device by selectively throttling GPU operations of the device.
- the device 100 can be a personal computer, laptop, server, mobile device (e.g., smartphone, laptop, personal digital assistant, music playing device, gaming device, etc.), network element (e.g., router, switch, gateway, etc.), and/or any device capable of executing multiple applications.
- the device 100 can be a physical or virtual device.
- In FIG. 1, the device 100 includes one or more central processing units (CPUs) 102, graphics processing units 106, input/output (I/O) subsystem 108, driver(s) 114, an operating system 124, a system management controller (SMC) 118, sensors 120, and platform plug-ins 116.
- the CPU 102 is a general-purpose processing device such as a microprocessor or another type of processor and is coupled to the operating system 124 . More particularly, the CPU 102 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, processor implementing other instruction sets, or processors implementing a combination of instruction sets.
- the central processing unit (CPU) 102 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like.
- the CPU 102 can include one or more CPUs and each of the CPUs can include one or more processing cores.
- one or more of the CPUs 102 can include integrated graphics 104 , in which the integrated graphics 104 is a graphics processing unit that shares memory with memory for the CPUs 102 . While in one embodiment, the integrated graphics 104 is part of a corresponding CPU 102 , in alternate embodiments, the integrated graphics is on the same motherboard as the corresponding CPU 102 .
- the device can have one or more GPUs 106 , an integrated graphics 104 , and/or a combination thereof.
- a GPU 106 is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display and is coupled to the operating system.
- the GPU 106 can be used for other general purpose computing, such as general purpose computing on graphics processing unit.
- the general purpose computing on graphics processing unit is the utilization of a graphics processing unit to perform computations in applications traditionally handled by a central processing unit.
- a graphics processing unit can be used for stream processing that is performing the same or similar operation on a set of records (e.g., vector processing, texture processing, or another type of data structure).
- the GPU 106 can include one or more GPUs and each of the GPUs can include one or more graphic processing cores.
- the I/O subsystem 108 includes a storage controller and storage that is used to store data for the device.
- the operating system 124 is a set of software used to manage device hardware resources and provides common services for other running computer programs, such as application programs.
- the system management controller 118 is a subsystem that controls the device power flow and fan speed. In this embodiment, the system management controller 118 couples to the operating system 124 , the sensors 120 , and hardware 122 .
- the sensors 120 include sensor(s) that monitor and record data regarding the thermal profile of the device 100. In this embodiment, the thermal profile is data about the thermal characteristics of the device 100.
- the thermal profile can include the device 100 temperature over time, device 100 module temperature over time (e.g., storage temperature, CPU die temperature, bottom case temperature of the device enclosure), fan speed, and/or other data related to the thermal characteristics of the device 100.
- the sensors 120 are one or more sensors that measure the thermal characteristics of the device 100 .
- the sensors 120 can include a sensor for the device temperature, a sensor for the I/O subsystem 108, a fan speed sensor, and/or virtual sensors (e.g., values derived from other sensors being read and other thermal models).
- the driver(s) 114 are a software layer that translates high level commands coming from application or framework layers into commands that can be interpreted by the module it controls.
- media drivers control the media functions of the GPU.
- 3D drivers control the 3D functions of the GPU.
- the driver(s) 114 impact the behavior of the GPU, so GPU throttling can be performed in this layer.
- the driver(s) 114 include a module to perform GPU throttling as described in FIGS. 4 and 7 below.
- the platform plug-ins 116 are modules that contain platform/device specific tuning parameters.
- the operating system 124 adjusts the operation of the GPU 106 to mitigate the thermal profile of the device 100 .
- the operating system 124 includes a thermal daemon (thermald) 110 and kernel 112 .
- thermald 110 is a daemon that selectively throttles the GPU operations of one or more running processes in order to mitigate the thermal environment of the device 100.
- thermald 110 receives the thermal data of the thermal profile and determines if the thermal data has crossed one of one or more thermal thresholds.
- the device can be configured for several different thermal thresholds, with each thermal threshold having different GPU throttling levels. In this embodiment, crossing a thermal threshold can mean that thermald 110 adjusts a set of GPU utilization values for different priority level processes.
- the priority utilization values are used by the GPU to schedule a GPU execution for a time slice of a process according to the priority of that operation.
- Each process has an associated priority that indicates the relative importance of how much of the GPU is to be utilized when that process is executed by the GPU. Processes with a higher priority are more likely to receive a higher GPU utilization than a lower priority process.
- there can be a plurality of different priorities (e.g., two or more different priorities).
- each of the priorities will have a high GPU utilization (e.g. near or at 100%).
- thermald 110 adjusts the priority GPU utilization of one or more of the different process priorities.
- a thermal load on the device can increase because the power consumption of the device or one or more components of the device (e.g., the GPU 106 , CPU 102 , I/O 108 , etc.) increases.
- thermald 110 selectively decreases the GPU utilization for the lower priority processes before decreasing the higher priority processes, so that GPU executions for lower priority processes are throttled before the higher priority processes.
- the GPU execution for the higher priority processes is not throttled, but the overall GPU usage decreases, thus decreasing the power consumption of the GPU for the device 100, decreasing the heat generated by the device, and reducing the thermal load on the device 100.
- thermald 110 can either further throttle the lower priority processes or start to throttle the higher priority processes.
- thermald 110 throttles both the lower and higher priority processes.
- thermald 110 lessens or removes the throttling of the lower and/or higher levels by relaxing the constraints placed on the different priority processes.
- thermald 110 restores the priority of one, some, or all processes to normal (e.g., 100% GPU utilization).
- the GPU throttling can occur by throttling either one, some or all of the GPU(s) 106 , the integrated graphics 104 , or both. Managing the GPU executions for different priority processes is further described in FIG. 4 below.
- the kernel 112 is a basic component of the operating system 124 and provides a level of abstraction for the device resources (e.g., processor, input/output systems, network resources, etc.).
- FIGS. 2A-C are illustrations of graphics processing unit (GPU) throttling tables 200 A-C for different levels of GPU throttling.
- each of the GPU throttling tables 200 A-C represents GPU utilization percentages for low to high QoS priorities.
- FIG. 2A illustrates a GPU throttling table 200 A where there is no GPU throttling.
- the GPU throttling table 200 A illustrates low to high QoS priority processes on the y-axis and 0-100% GPU utilization on the x-axis. For no GPU throttling, the low and high QoS priority processes will each have 100% GPU utilization for each time the GPU executes for that process.
- full GPU utilization means that the GPU executes at the highest frequency the GPU can operate at. For example and in one embodiment, if a GPU can operate at frequencies up to 800 MHz, the GPU will execute the process at 800 MHz for all processes. In this embodiment, full GPU utilization also means that the GPU operates with a higher power and thermal load on the device.
- the device may throttle some of the lower priority processes for GPU execution.
- the GPU throttling table 200 B has reduced GPU utilization for lower priority processes.
- processes with a high QoS priority (e.g., processes with priority 0 or 1) will continue to have a GPU utilization of 100%.
- Processes with a lower QoS priority (e.g., processes with priority 2 or 3) will have a GPU utilization that is less than 100%.
- a process with a priority of 2 will have a GPU utilization of 75% and a process with a priority of 3 will have a GPU utilization of 50%.
- if a GPU can operate at frequencies up to 800 MHz, the GPU will execute at 800 MHz for processes with priority 0 or 1, 600 MHz for a process of priority 2, and 400 MHz for a process of priority 3.
- the overall GPU utilization of the device is decreased, thus decreasing the overall thermal load on the device.
- high priority processes continue to operate with full GPU utilization, so that important processes execute normally.
- the lower priorities will operate with less GPU utilization.
- the device may further throttle the GPU, so that the GPU throttling affects both lower and higher priority processes.
- the GPU throttling table 200 C has reduced GPU utilization for both the lower and higher priority processes. For example and in one embodiment, a process with a priority of 0 or 1 will have a GPU utilization of 75%. A process with a priority of 2 will have a GPU utilization of 50%. A process with the priority of three, the lowest priority process, has a GPU utilization of 25%, which is the lowest GPU utilization. In this embodiment, the higher priority processes are partially throttled for GPU utilization, while the lower priority processes are more severely throttled. For example and in one embodiment, if a GPU can operate at frequencies up to 800 MHz, the GPU will execute at 600 MHz for processes with priority 0 or 1, 400 MHz for a process of priority 2, and 200 MHz for a process of priority 3.
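The three throttling tables and the frequency arithmetic above can be captured in a small lookup structure. The following is only an illustrative sketch: the dictionary layout, the `THROTTLING_TABLES` and `MAX_FREQ_MHZ` names, and the `frequency_for` helper are assumptions made for this example and are not taken from the patent. The percentages and the 800 MHz maximum are the example values given in the text.

```python
# Hypothetical encoding of the throttling tables in FIGS. 2A-C.
# Keys are QoS priorities (0 = highest); values are GPU utilization caps in percent.
THROTTLING_TABLES = {
    "200A": {0: 100, 1: 100, 2: 100, 3: 100},  # no throttling
    "200B": {0: 100, 1: 100, 2: 75, 3: 50},    # only lower priorities throttled
    "200C": {0: 75, 1: 75, 2: 50, 3: 25},      # all priorities throttled
}

MAX_FREQ_MHZ = 800  # example maximum GPU frequency used in the text


def frequency_for(table: str, priority: int, max_freq_mhz: int = MAX_FREQ_MHZ) -> int:
    """Return the GPU frequency (MHz) implied by a table entry.

    Assumes, as in the text's example, that utilization is expressed as a
    percentage of the maximum operating frequency.
    """
    return max_freq_mhz * THROTTLING_TABLES[table][priority] // 100


for name in THROTTLING_TABLES:
    print(name, [frequency_for(name, p) for p in range(4)])
```

Under these assumptions, table 200 B yields 800, 800, 600, and 400 MHz for priorities 0 through 3, and table 200 C yields 600, 600, 400, and 200 MHz, matching the figures quoted above.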
- FIG. 3 is a flow diagram of one embodiment of a process 300 to manage GPU throttling based on the thermal data of the device.
- process 300 is performed by a thermal daemon to manage GPU throttling, such as thermald 110 as described above in FIG. 1.
- process 300 begins by receiving the thermal data at block 302 .
- the thermal data is the data related to the thermal profile or other thermal characteristics of the device.
- the thermal data can be time-dependent thermal data regarding the device temperature, temperature of a particular module of the device, data regarding fan use, and/or other data related to the thermal characteristics of the device.
- at block 304, process 300 determines if the thermal data is greater than a higher threshold.
- the higher threshold is a threshold that indicates that the device can have greater GPU throttling so as to mitigate the thermal load that is on the device.
- the thermal threshold can be related to the temperature of the device, a module of the device, fan speed, or some other thermal characteristic.
- a set of higher thermal thresholds could be if the device temperature exceeded 40° C., 45° C., 50° C., etc.
- the thermal data may exceed more than one threshold.
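Because a single reading may exceed more than one threshold, a daemon needs a way to tell how many have been crossed. A minimal sketch follows, assuming the thresholds are kept in a sorted list and that "exceeded" means strictly greater than; the `HIGHER_THRESHOLDS_C` list and the `thresholds_exceeded` function are hypothetical names, and only the 40, 45, and 50 degree values come from the text.

```python
from bisect import bisect_left

# Example thresholds from the text, in degrees Celsius, sorted ascending.
HIGHER_THRESHOLDS_C = [40.0, 45.0, 50.0]


def thresholds_exceeded(temperature_c: float, thresholds=HIGHER_THRESHOLDS_C) -> int:
    """Count how many thresholds the reading strictly exceeds.

    bisect_left returns the number of entries strictly less than the reading.
    Using bisect_right instead would implement "reaches or exceeds".
    """
    return bisect_left(thresholds, temperature_c)


for reading in (38.0, 40.0, 42.5, 47.0, 55.0):
    print(reading, thresholds_exceeded(reading))
```

The returned count could then be used to pick a throttling level, with a higher count selecting a more restrictive table.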
- the thermal threshold can be based on time of day or user activity.
- process 300 may choose to throttle low QoS tasks to leave “thermal headroom” for the tasks the user is likely to perform. This is especially important for devices that do not have fans to actively dissipate heat.
- process 300 adjusts the GPU throttling table to increase the GPU throttling at block 308 .
- process 300 adjusts the GPU throttling table by throttling one or more of the different GPU utilizations for one, some, or all of the process priorities.
- process 300 can start to throttle a priority or further throttle an already throttled priority level.
- process 300 throttles a priority by restricting the GPU utilization for GPU execution of that priority.
- process 300 can throttle a priority by restricting a 100% GPU execution down to 80% as described in FIG. 2 above.
- process 300 can throttle a priority that is restricted from a 50% GPU utilization down to 25%.
- process 300 selectively throttles the different priorities, thus allowing greater GPU utilization for higher priorities and lower GPU utilization for lower priorities. This allows for less power consumption of the GPU for the device, while having GPU utilization for higher priority processes at the expense of lower GPU utilization for lower priority processes.
- a lower power consumption of the GPU can help mitigate the thermal load on the device.
- process 300 determines if the thermal data is less than a lower threshold at block 306. In one embodiment, if the thermal data is less than a lower threshold, process 300 may relax the GPU throttling as the thermal load on the device may be lessening. If the thermal data is less than a lower threshold, at block 310, process 300 adjusts the GPU utilization to decrease the GPU throttling. In one embodiment, process 300 relaxes the restrictions placed on the GPU utilization for one or more of the different priorities. For example and in one embodiment, process 300 can relax a priority with an 80% GPU utilization back to an unrestricted 100% GPU utilization.
- process 300 can relax a restricted priority at a 25% GPU utilization to a less restricted 50% GPU utilization. If the thermal data is not less than the lower threshold, process 300 maintains the current GPU throttling at block 312 . Execution proceeds to block 312 above.
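The flow of process 300 amounts to a two-threshold (hysteresis) decision: tighten above the higher threshold, relax below the lower threshold, and otherwise hold. The sketch below illustrates one pass of that decision under stated assumptions: the `LEVELS` list, the `next_level` function, the one-step-at-a-time policy, and the 45/40 degree default thresholds are all hypothetical choices for illustration, not details from the patent.

```python
# Hypothetical throttle levels, ordered from no throttling to heaviest throttling.
# Each level maps QoS priority (0 = highest) to a GPU utilization cap in percent.
LEVELS = [
    {0: 100, 1: 100, 2: 100, 3: 100},
    {0: 100, 1: 100, 2: 75, 3: 50},
    {0: 75, 1: 75, 2: 50, 3: 25},
]


def next_level(level: int, temperature_c: float,
               higher_c: float = 45.0, lower_c: float = 40.0) -> int:
    """One pass of the FIG. 3 loop: raise, relax, or keep the throttle level."""
    if temperature_c > higher_c:                  # block 304 -> block 308
        return min(level + 1, len(LEVELS) - 1)
    if temperature_c < lower_c:                   # block 306 -> block 310
        return max(level - 1, 0)
    return level                                  # block 312: maintain


level = 0
for reading in [38.0, 46.0, 47.0, 43.0, 39.0]:    # block 302: receive thermal data
    level = next_level(level, reading)
    print(reading, level, LEVELS[level])
```

Keeping a gap between the two thresholds means a reading that hovers between them leaves the current throttling unchanged, which avoids flipping between levels on every sample.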
- FIG. 4 is a flow diagram of one embodiment of a process 400 to manage GPU execution according to the quality of service of a process.
- process 400 is performed by a GPU processing module 700 that is part of the driver, such as GPU 106 or integrated graphics 104 as described in FIG. 1 above.
- process 400 begins by selecting a process for GPU execution for a current time slot at block 402 .
- the GPU schedules a process for execution in different time slots, so that the GPU can execute multiple processes.
- a GPU will execute a task of the process and, possibly, switch to another process for another time slot.
- a scheduler keeps track of the GPU performance by using a ring buffer for each time slot.
- the scheduler allows for preempting tasks of one or more processes in flight because the tasks are put in multiple priority based queues.
- the highest priority process is selected for execution by the GPU.
- the scheduler looks at the GPU utilization and determines if the GPU utilization has expired.
- the process 400 schedules the lower priority processes if there is room for the GPU to perform the work.
- there is a starvation mechanism implemented via a software scheduler to prevent the lower priority processes from starving. With different job queues, corresponding to different priority processes, it is possible to hold off the scheduling of a lower priority queue and lower the amount of work submitted to the GPU.
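The combination of per-priority queues, highest-priority-first selection, and a starvation guard can be sketched as follows. The patent does not specify how its starvation mechanism works, so the waiting-slot counter used here is purely an assumed policy, and the `PriorityScheduler` class with its `submit` and `select` methods is a hypothetical illustration rather than the driver's actual design.

```python
from collections import deque


class PriorityScheduler:
    """Toy scheduler: one FIFO queue per QoS priority (0 = highest)."""

    def __init__(self, num_priorities: int = 4, max_wait_slots: int = 8):
        self.queues = [deque() for _ in range(num_priorities)]
        # Number of consecutive slots each non-empty queue has been passed over.
        self.waited = [0] * num_priorities
        # Assumed starvation limit; not a value from the patent.
        self.max_wait_slots = max_wait_slots

    def submit(self, priority: int, task: str) -> None:
        self.queues[priority].append(task)

    def select(self):
        """Pick the task for the next time slot, or None if all queues are empty."""
        ready = [p for p, q in enumerate(self.queues) if q]
        if not ready:
            return None
        # Starvation guard: serve a queue that has waited too long,
        # otherwise serve the highest priority non-empty queue.
        starved = [p for p in ready if self.waited[p] >= self.max_wait_slots]
        chosen = starved[0] if starved else ready[0]
        for p in ready:
            self.waited[p] = 0 if p == chosen else self.waited[p] + 1
        return self.queues[chosen].popleft()


sched = PriorityScheduler(max_wait_slots=2)
for name in ("ui-1", "ui-2", "ui-3"):
    sched.submit(0, name)
sched.submit(3, "transcode-1")
print([sched.select() for _ in range(4)])
```

With a limit of two slots, the low priority task is served after being passed over twice, ahead of the third high priority task. Raising the limit holds off the lower priority queue for longer, which reduces the amount of low priority work submitted to the GPU.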
- process 400 executes the process using the GPU and according to the GPU utilization for the QoS priority of that process.
- each process will have a QoS priority that is used to determine the GPU utilization.
- the GPU utilization is the utilization that the GPU operates at while executing the process for the time slot.
- the GPU utilization is a percentage of the GPU maximum or normal operating frequency.
- the GPU utilization is an average of GPU utilization.
- process 400 can execute the process at a high GPU utilization (e.g. at or near 100%) in one or more time slots and execute the process at a low GPU utilization in another time slot.
- process 400 could execute the process with a GPU utilization for three out of four time slots and have the GPU idle (e.g. 0% utilization) for one out of four slots.
- having the GPU idle for a time slot may create conditions that allow hardware components to be put into a low power state, thus allowing the device to cool down quicker.
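When utilization is treated as an average, a target can be met by mixing fully busy and idle slots rather than by running every slot at a reduced rate. The sketch below shows that arithmetic, assuming equal-length slots and a fixed window of four slots; the `average_utilization` and `duty_cycle_pattern` helpers are hypothetical names introduced for this example.

```python
def average_utilization(slot_utilizations):
    """Mean GPU utilization (percent) over a sequence of equal-length time slots."""
    return sum(slot_utilizations) / len(slot_utilizations)


def duty_cycle_pattern(target_percent: int, window: int = 4):
    """Build a window of slots that are either fully busy (100) or idle (0).

    The number of busy slots is rounded to the nearest whole slot, so the
    achieved average only approximates targets that do not divide the window.
    """
    busy = round(window * target_percent / 100)
    return [100] * busy + [0] * (window - busy)


pattern = duty_cycle_pattern(75)  # three busy slots, one idle slot
print(pattern, average_utilization(pattern))
```

Three busy slots and one idle slot average to 75%, the same figure as running all four slots at 75%, but the idle slot gives the hardware a chance to enter a low power state.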
- Process 400 advances to the next time slot at block 406 . Execution proceeds to block 402 above.
- FIGS. 5A-B are block diagrams of embodiments of GPU utilization 500 A-B for high and low priority processes.
- the GPU executes high and low priority processes with different GPU utilizations for different time slots.
- the high priority process is executed with a 100% GPU utilization (e.g., high priority process time slots 502 A-D) and the low priority process is executed with less than 100% GPU utilization (e.g., low priority process time slots 504 A-C).
- the low priority process is executed with the same low GPU utilization for each time slot.
- the low priority process is executed with different GPU utilizations for different time slots.
- time slot 554 A the GPU utilization is 100%, whereas the GPU is idle for time slots 554 B-C.
- the high priority process is executed with 100% GPU utilization for time slots 552 A-D.
- While FIGS. 5A-B illustrate more time slots being allocated for a high priority process, in alternate embodiments, lower priority processes may have the same or a greater number of time slots allocated than higher priority processes.
- FIG. 6 is a block diagram of one embodiment of a thermal daemon, thermald, 110 that manages GPU throttling based on the thermal data of the device.
- thermald 110 includes a GPU throttling module 600 that determines whether to selectively apply or relax a GPU throttle to a priority.
- the GPU throttling module 600 includes a receive thermal data module 602, compare higher thermal threshold module 604, compare lower thermal threshold module 606, increase GPU throttling module 608, and decrease GPU throttling module 610.
- the receive thermal data module 602 receives the thermal data as described in FIG. 3 , block 302 above.
- the compare higher thermal threshold module 604 compares the thermal data with a higher thermal threshold as described in FIG. 3 , block 304 above.
- the compare lower thermal threshold module 606 compares the thermal data with a lower thermal threshold as described in FIG. 3, block 306 above.
- the increase GPU throttling module 608 increases the GPU throttling for one or more processes as described in FIG. 3 , block 308 above.
- the decrease GPU throttling module 610 decreases the GPU throttling for one or more processes as described in FIG. 3 , block 310 above.
- FIG. 7 is a block diagram of one embodiment of a GPU 106 that manages GPU execution according to the quality of service of a process.
- the GPU 106 includes a GPU processing module 700 that applies the corresponding GPU utilization for a process.
- the GPU processing module 700 includes a select module 702 and an execute module 704 .
- the select module 702 selects a process to execute by the GPU and the execute module 704 executes that process with the GPU according to the GPU utilization as described above in FIG. 4 , blocks 402 and 404 , respectively.
- an integrated graphics 104 can also include a GPU processing module 700 .
- FIG. 8 shows one example of a data processing system 800 , which may be used with one embodiment of the present invention.
- the system 800 may be implemented including a device 100 as shown in FIG. 1 .
- While FIG. 8 illustrates various components of a computer system, it is not intended to represent any particular architecture or manner of interconnecting the components as such details are not germane to the present invention. It will also be appreciated that network computers and other data processing systems or other consumer electronic devices, which have fewer components or perhaps more components, may also be used with the present invention.
- the computer system 800 which is a form of a data processing system, includes a bus 803 which is coupled to a microprocessor(s) 805 and a ROM (Read Only Memory) 807 and volatile RAM 809 and a non-volatile memory 811 .
- the microprocessor 805 may retrieve the instructions from the memories 807 , 809 , 811 and execute the instructions to perform operations described above.
- the bus 803 interconnects these various components together and also interconnects these components 805 , 807 , 809 , and 811 to a display controller and display device 88 and to peripheral devices such as input/output (I/O) devices which may be mice, keyboards, modems, network interfaces, printers and other devices which are well known in the art.
- the input/output devices 815 are coupled to the system through input/output controllers 813 .
- the volatile RAM (Random Access Memory) 809 is typically implemented as dynamic RAM (DRAM), which requires power continually in order to refresh or maintain the data in the memory.
- the mass storage 811 is typically a magnetic hard drive or a magnetic optical drive or an optical drive or a DVD ROM or a flash memory or other types of memory systems, which maintain data (e.g. large amounts of data) even after power is removed from the system.
- Typically, the mass storage 811 will also be a random access memory although this is not required. While FIG. 8 shows that the mass storage 811 is a local device coupled directly to the rest of the components in the data processing system, it will be appreciated that the present invention may utilize a non-volatile memory which is remote from the system, such as a network storage device which is coupled to the data processing system through a network interface such as a modem, an Ethernet interface or a wireless network.
- the bus 803 may include one or more buses connected to each other through various bridges, controllers and/or adapters as is well known in the art.
- FIG. 9 shows an example of another data processing system 900 which may be used with one embodiment of the present invention.
- system 900 may be implemented as a device 100 as shown in FIG. 1 .
- the data processing system 900 shown in FIG. 9 includes a processing system 911 , which may be one or more microprocessors, or which may be a system on a chip integrated circuit, and the system also includes memory 901 for storing data and programs for execution by the processing system.
- the system 900 also includes an audio input/output subsystem 905 , which may include a microphone and a speaker for, for example, playing back music or providing telephone functionality through the speaker and microphone.
- a display controller and display device 909 provide a visual user interface for the user; this digital interface may include a graphical user interface which is similar to that shown on a Macintosh computer when running OS X operating system software, or Apple iPhone when running the iOS operating system, etc.
- the system 900 also includes one or more wireless transceivers 903 to communicate with another data processing system, such as the system 900 of FIG. 9 .
- a wireless transceiver may be a WLAN transceiver, an infrared transceiver, a Bluetooth transceiver, and/or a wireless cellular telephony transceiver. It will be appreciated that additional components, not shown, may also be part of the system 900 in certain embodiments, and in certain embodiments fewer components than shown in FIG. 9 may also be used in a data processing system.
- the system 900 further includes one or more communications ports 917 to communicate with another data processing system, such as the system 800 of FIG. 8 .
- the communications port may be a USB port, Firewire
- the data processing system 900 also includes one or more input devices 913 , which are provided to allow a user to provide input to the system. These input devices may be a keypad or a keyboard or a touch panel or a multi touch panel.
- the data processing system 900 also includes an optional input/output device 915 which may be a connector for a dock. It will be appreciated that one or more buses, not shown, may be used to interconnect the various components as is well known in the art.
- the data processing system 900 may be a network computer or an embedded processing device within another device, or other types of data processing systems, which have fewer components or perhaps more components than that shown in FIG. 9 .
- At least certain embodiments of the inventions may be part of a digital media player, such as a portable music and/or video media player, which may include a media processing system to present the media, a storage device to store the media and may further include a radio frequency (RF) transceiver (e.g., an RF transceiver for a cellular telephone) coupled with an antenna system and the media processing system.
- media stored on a remote storage device may be transmitted to the media player through the RF transceiver.
- the media may be, for example, one or more of music or other audio, still pictures, or motion pictures.
- the portable media player may include a media selection device, such as a click wheel input device on an iPod® or iPod Nano® media player from Apple, Inc. of Cupertino, Calif., a touch screen input device, pushbutton device, movable pointing input device or other input device.
- the media selection device may be used to select the media stored on the storage device and/or the remote storage device.
- the portable media player may, in at least certain embodiments, include a display device which is coupled to the media processing system to display titles or other indicators of media being selected through the input device and being presented, either through a speaker or earphone(s), or on the display device, or on both display device and a speaker or earphone(s). Examples of a portable media player are described in published U.S. Pat. No. 7,345,671 and U.S. published patent number 2004/0224638, both of which are incorporated herein by reference.
- Portions of what was described above may be implemented with logic circuitry such as a dedicated logic circuit or with a microcontroller or other form of processing core that executes program code instructions.
- Thus processes taught by the discussion above may be performed with program code such as machine-executable instructions that cause a machine that executes these instructions to perform certain functions.
- a “machine” may be a machine that converts intermediate form (or “abstract”) instructions into processor specific instructions (e.g., an abstract execution environment such as a “virtual machine” (e.g., a Java Virtual Machine), an interpreter, a Common Language Runtime, a high-level language virtual machine, etc.), and/or, electronic circuitry disposed on a semiconductor chip (e.g., “logic circuitry” implemented with transistors) designed to execute instructions such as a general-purpose processor and/or a special-purpose processor. Processes taught by the discussion above may also be performed by (in the alternative to a machine or in combination with a machine) electronic circuitry designed to perform the processes (or a portion thereof) without the execution of program code.
- the present invention also relates to an apparatus for performing the operations described herein.
- This apparatus may be specially constructed for the required purpose, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), RAMs, EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
- a machine readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer).
- a machine readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; etc.
- An article of manufacture may be used to store program code.
- An article of manufacture that stores program code may be embodied as, but is not limited to, one or more memories (e.g., one or more flash memories, random access memories (static, dynamic or other)), optical disks, CD-ROMs, DVD ROMs, EPROMs, EEPROMs, magnetic or optical cards or other type of machine-readable media suitable for storing electronic instructions.
- Program code may also be downloaded from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a propagation medium (e.g., via a communication link (e.g., a network connection)).
Abstract
Description
- Applicant claims the benefit of priority of prior, co-pending provisional application Ser. No. 62/006,009 filed May 30, 2014, the entirety of which is incorporated by reference.
- This invention relates generally to device thermal management and more particularly to managing device thermal management by selective throttling of the device graphics processing unit.
- A device can typically include one or more graphics processing units (GPU) that are used to process graphics or general purpose operations for the device. Each of the GPUs is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. For example, the GPU can be used to transcode video, render graphics for a user interface (UI), video encoding/decoding, OpenCL, etc.
- Each of these GPU operations will cause the device to consume power that leads to heat being generated by the device. This generated heat can add to a thermal load being applied to the device. An excessive thermal load can affect the device performance and, in extreme cases, can lead to a device shutdown. Existing devices can mitigate the thermal load by reducing the GPU operating frequency globally for all processes, regardless of whether the GPU operations are for a batch process or a process supporting a user interface operation.
- A method and apparatus of a device that manages a thermal profile of a device by selectively throttling graphics processing unit operations of the device is described. In an exemplary embodiment, the device monitors the thermal profile of the device, where the device executes a plurality of processes that utilizes a graphics processing unit of the device. In addition, the plurality of processes include a high priority process and a low priority process. If the thermal profile of the device exceeds a thermal threshold, the device decreases a first GPU utilization for the low priority process and maintains a second GPU utilization for the high priority process. The device further executes the low priority process using the first GPU utilization with the GPU and executes the high priority process using the second GPU utilization with the GPU.
- Other methods and apparatuses are also described.
- The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.
- FIG. 1 is a block diagram of one embodiment of a device that mitigates a thermal profile of a device by selectively throttling graphics processing unit operations of the device.
- FIGS. 2A-C are illustrations of graphics processing unit (GPU) throttling tables for different levels of GPU throttling.
- FIG. 3 is a flow diagram of one embodiment of a process to manage GPU throttling based on the thermal data of the device.
- FIG. 4 is a flow diagram of one embodiment of a process to manage GPU execution according to the quality of service of a process.
- FIGS. 5A-B are block diagrams of one embodiment of GPU utilization for high and low priority processes.
- FIG. 6 is a block diagram of one embodiment of a thermal daemon that manages GPU throttling based on the thermal data of the device.
- FIG. 7 is a block diagram of one embodiment of a GPU that manages GPU execution according to the quality of service of a process.
- FIG. 8 illustrates one example of a typical computer system, which may be used in conjunction with the embodiments described herein.
- FIG. 9 shows an example of a data processing system, which may be used with one embodiment of the present invention.
- A method and apparatus of a device that manages a thermal profile of a device by selectively throttling graphics processing unit operations of the device is described. In the following description, numerous specific details are set forth to provide thorough explanation of embodiments of the present invention. It will be apparent, however, to one skilled in the art, that embodiments of the present invention may be practiced without these specific details. In other instances, well-known components, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description.
- Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification do not necessarily all refer to the same embodiment.
- In the following description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. “Coupled” is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. “Connected” is used to indicate the establishment of communication between two or more elements that are coupled with each other.
- The processes depicted in the figures that follow, are performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system or a dedicated machine), or a combination of both. Although the processes are described below in terms of some sequential operations, it should be appreciated that some of the operations described may be performed in different order. Moreover, some operations may be performed in parallel rather than sequentially.
- The terms “server,” “client,” and “device” are intended to refer generally to data processing systems rather than specifically to a particular form factor for the server, client, and/or device.
- A method and apparatus of a device that manages a thermal profile of a device by selectively throttling graphics processing unit (GPU) operations of the device is described. In one embodiment, the device selectively throttles the GPU operations by restricting GPU utilization for one, some, or all of the processes, so that the overall GPU utilization is reduced, but that the GPU operations for higher quality of service (QoS) priority process are not affected or are affected less than GPU operations for lower QoS priority processes. In one embodiment, the device monitors the thermal data of the device. If the thermal data reaches or exceeds one or more thermal thresholds, the device selectively throttles the GPU operations for the different processes. In this embodiment, each process operation has a QoS priority (or “priority”). Each priority represents whether the process is an important process that should not be throttled (or throttled under more of a thermal load) or the process can be a less important process that can be throttled under a lesser thermal load. For example and in one embodiment, a process associated with user interface operation would have a higher priority (e.g., graphic rendering for a UI or graphics visual, encoding/decoding for a video call, encoding for framebuffer transmission to wireless display device(s), and/or other types of high priority processes), whereas a process associated with a batch process (e.g., video transcoding, batch decoding/encoding, background streaming compute workloads, infrequent or non-interactive graphical output, and/or other types of batch-type processes), would have a lower priority. The device can include multiple different process priorities. Each of the priorities has an associated GPU utilization. The GPU utilization is the allowable GPU resources that each process can use during a GPU execution time slot. 
The device selectively throttles overall device GPU usage by restricting the GPU utilization for the different priorities based on the current thermal load on the device. For example and in one embodiment, if the device thermal load increases, the device can restrict the lowest or lower priority GPU utilizations. This would decrease the GPU usage for these lower priorities, but leave the GPU usage for the higher priorities unchanged. As the thermal load on the device further increases, the device can either increase the GPU throttling of the lower priority processes and/or start to throttle the higher priority processes. In another embodiment, as the thermal load on the device lessens, the device can selectively relax the GPU throttling for the different priority processes.
- FIG. 1 is a block diagram of one embodiment of a device 100 that mitigates a thermal profile of a device by selectively throttling GPU operations of the device. In one embodiment, the device 100 can be a personal computer, laptop, server, mobile device (e.g., smartphone, laptop, personal digital assistant, music playing device, gaming device, etc.), network element (e.g., router, switch, gateway, etc.), and/or any device capable of executing multiple applications. In one embodiment, the device 100 can be a physical or virtual device. In FIG. 1, the device 100 includes one or more central processing units (CPUs) 102, graphics processing units 106, input/output (I/O) subsystem 108, driver(s) 114, an operating system 124, a system management controller (SMC) 118, sensors 120, and platform plug-ins 116. In one embodiment, the CPU 102 is a general-purpose processing device such as a microprocessor or another type of processor and is coupled to the operating system 124. More particularly, the CPU 102 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, processor implementing other instruction sets, or processors implementing a combination of instruction sets. The central processing unit (CPU) 102 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. In one embodiment, the CPU 102 can include one or more CPUs and each of the CPUs can include one or more processing cores. In one embodiment, one or more of the CPUs 102 can include integrated graphics 104, in which the integrated graphics 104 is a graphics processing unit that shares memory with the CPUs 102. While in one embodiment, the integrated graphics 104 is part of a corresponding CPU 102, in alternate embodiments, the integrated graphics is on the same motherboard as the corresponding CPU 102. In one embodiment, the device can have one or more GPUs 106, an integrated graphics 104, and/or a combination thereof.
- In one embodiment, a GPU 106 is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display and is coupled to the operating system. In another embodiment, the GPU 106 can be used for other general purpose computing, such as general purpose computing on graphics processing unit. In this embodiment, the general purpose computing on graphics processing unit is the utilization of a graphics processing unit to perform computations in applications traditionally handled by a central processing unit. For example and in one embodiment, a graphics processing unit can be used for stream processing, that is, performing the same or similar operation on a set of records (e.g., vector processing, texture processing, or another type of data structure). In one embodiment, the GPU 106 can include one or more GPUs and each of the GPUs can include one or more graphic processing cores.
- In one embodiment, the I/O subsystem 108 includes a storage controller and storage that is used to store data for the device. In one embodiment, the operating system 124 is a set of software used to manage device hardware resources and provides common services for other running computer programs, such as application programs. In one embodiment, the system management controller 118 is a subsystem that controls the device power flow and fan speed. In this embodiment, the system management controller 118 couples to the operating system 124, the sensors 120, and hardware 122. In one embodiment, the sensors 120 include sensor(s) that monitor and record data regarding the thermal profile of the device 100. In this embodiment, the thermal profile is data about the thermal characteristics of the device 100. For example and in one embodiment, the thermal profile can include the device 100 temperature over time, device 100 module temperature over time (e.g., storage temperature, CPU die temperature, bottom case temperature of the device enclosure), fan speed, and/or other data related to the thermal characteristics of the device 100. In one embodiment, the sensors 120 are one or more sensors that measure the thermal characteristics of the device 100. For example and in one embodiment, the sensors 120 can include a sensor for the device temperature, sensor for the I/O subsystem 108, fan speed sensor, virtual sensors (e.g., values derived from other sensors being read and other thermal models). In one embodiment, the driver(s) 114 are a software layer that translates high level commands coming from application or framework layers into commands that can be interpreted by the module it controls. For example and in one embodiment, media drivers control the media functions on the GPU and 3D drivers control the 3D functions of the GPU. In one embodiment, the driver(s) 114 impact the behavior of the GPU, so GPU throttling can be performed in this layer. In one embodiment, the driver(s) 114 include a module to perform GPU throttling as described in FIGS. 4 and 7 below. In one embodiment, the platform plug-ins 116 are modules that contain platform/device specific tuning parameters.
- In one embodiment, the operating system 124 adjusts the operation of the GPU 106 to mitigate the thermal profile of the device 100. In this embodiment, the operating system 124 includes a thermal daemon (thermald) 110 and kernel 112. In this embodiment, thermald 110 is a daemon that selectively throttles the GPU operations of one or more running processes in order to mitigate the thermal environment of the device 100. In one embodiment, thermald 110 receives the thermal data of the thermal profile and determines if the thermal data has crossed one of one or more thermal thresholds. In one embodiment, the device can be configured for several different thermal thresholds, with each thermal threshold having different GPU throttling levels. In this embodiment, crossing a thermal threshold can mean that thermald 110 adjusts a set of GPU utilization values for different priority level processes. In this embodiment, the priority utilization values are used by the GPU to schedule a GPU execution for a time slice of a process according to the priority of that operation. Each process has an associated priority that indicates the relative importance of how much of the GPU is to be utilized when that process is executed by the GPU. Processes with a higher priority are more likely to receive a higher GPU utilization than a lower priority process. In one embodiment, there can be a plurality of different priorities (e.g., two or more different priorities). In one embodiment, under conditions of a low thermal load of the device 100, each of the priorities will have a high GPU utilization (e.g., near or at 100%). As the thermal load on the device 100 increases, thermald 110 adjusts the priority GPU utilization of one or more of the different process priorities. In one embodiment, a thermal load on the device can increase because the power consumption of the device or one or more components of the device (e.g., the GPU 106, CPU 102, I/O 108, etc.) increases.
- In one embodiment, thermald 110 selectively decreases the GPU utilization for the lower priority processes before decreasing the higher priority processes, so that GPU executions for lower priority processes are throttled before the higher priority processes. By selectively throttling the lower priority process GPU executions, the GPU executions for the higher priority processes are not throttled, but the overall GPU usage decreases, thus decreasing the power consumption of the GPU for the device 100, decreasing the heat generated by the device, and reducing the thermal load on the device 100. If the thermal load on the device 100 continues to increase, thermald 110 can either further throttle the lower priority processes or start to throttle the higher priority processes. In one embodiment, thermald 110 throttles both the lower and higher priority processes. In another embodiment, as the thermal load of the device decreases, thermald 110 lessens or removes the throttling of the lower and/or higher levels by relaxing the constraints placed on the different priority processes. In this embodiment, if the thermal load of the device becomes low, thermald 110 restores the priority of one, some, or all processes to normal (e.g., 100% GPU utilization). The GPU throttling can occur by throttling either one, some or all of the GPU(s) 106, the integrated graphics 104, or both. Managing the GPU executions for different priority processes is further described in FIG. 4 below. In one embodiment, the kernel 112 is a basic component of the operating system 124 and provides a level of abstraction for the device resources (e.g., processor, input/output systems, network resources, etc.).
- FIGS. 2A-C are illustrations of graphics processing unit (GPU) throttling tables 200A-C for different levels of GPU throttling. In one embodiment, each of the GPU throttling tables 200A-C represents GPU utilization percentages for low to high QoS priorities. For example and in one embodiment, FIG. 2A illustrates a GPU throttling table 200A where there is no GPU throttling. The GPU throttling table 200A illustrates low to high QoS priority processes on the y-axis and 0-100% GPU utilization on the x-axis. For no GPU throttling, the low and high QoS priority processes will each have 100% GPU utilization for each time the GPU executes for that process. This means that when one of the processes is being executed by the GPU, the GPU executes at full utilization. In one embodiment, full GPU utilization means that the GPU executes at the highest frequency the GPU can operate at. For example and in one embodiment, if a GPU can operate at frequencies up to 800 MHz, the GPU will execute the process at 800 MHz for all processes. In this embodiment, full GPU utilization also means that the GPU operates with a higher power and thermal load on the device.
- As the thermal load on the device increases, the device may throttle some of the lower priority processes for GPU execution. As illustrated in FIG. 2B, the GPU throttling table 200B has reduced GPU utilization for the lower priority processes. For example and in one embodiment, processes with a high QoS priority (e.g., processes with priority 0 or 1) will continue to have a GPU utilization of 100%. Processes with a lower QoS priority (e.g., processes with priority 2 or 3) will have a GPU utilization that is less than 100%. As illustrated in FIG. 2B, a process with a priority of two will have a GPU utilization of 75% and a process with a priority of three will have a GPU utilization of 50%. For example and in one embodiment, if a GPU can operate at frequencies up to 800 MHz, the GPU will execute at 800 MHz for processes with priority 0 or 1, 600 MHz for processes with priority 2, and 400 MHz for a process of priority 3. In this example, the overall GPU utilization of the device is decreased, thus decreasing the overall thermal load on the device. Furthermore, high priority processes continue to operate with full GPU utilization, thus allowing important processes to execute normally. The lower priorities will operate with less GPU utilization.
- With increasing thermal load, the device may further throttle the GPU, so that the GPU throttling affects both lower and higher priority processes. As illustrated in FIG. 2C, the GPU throttling table 200C has reduced GPU utilization for both the lower and higher priority processes. For example and in one embodiment, a process with a priority of 0 or 1 will have a GPU utilization of 75%. A process with a priority of 2 will have a GPU utilization of 50%. A process with the priority of three, the lowest priority process, has a GPU utilization of 25%, which is the lowest GPU utilization. In this embodiment, the higher priority processes are partially throttled for GPU utilization, while the lower priority processes are more severely throttled. For example and in one embodiment, if a GPU can operate at frequencies up to 800 MHz, the GPU will execute at 600 MHz for processes with priority 0 or 1, 400 MHz for processes with priority 2, and 200 MHz for a process of priority 3.
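The throttling tables 200A-C can be written out as a small lookup structure. The following Python sketch is illustrative only: the percentages are the ones given for FIGS. 2A-C, while the dictionary layout, the names `THROTTLING_TABLES` and `frequency_for`, and the linear mapping from utilization percentage to operating frequency are assumptions chosen to match the 800 MHz example.

```python
MAX_GPU_FREQ_MHZ = 800  # example maximum frequency used in the description

# GPU utilization percentage per QoS priority (0 = highest, 3 = lowest).
THROTTLING_TABLES = {
    "200A": {0: 100, 1: 100, 2: 100, 3: 100},  # no throttling
    "200B": {0: 100, 1: 100, 2: 75, 3: 50},    # lower priorities throttled
    "200C": {0: 75, 1: 75, 2: 50, 3: 25},      # all priorities throttled
}


def frequency_for(table_name, priority, max_freq_mhz=MAX_GPU_FREQ_MHZ):
    """Return the GPU frequency cap in MHz for a priority under a table.

    Assumes the cap scales linearly with the utilization percentage.
    """
    percent = THROTTLING_TABLES[table_name][priority]
    return max_freq_mhz * percent // 100


if __name__ == "__main__":
    for name in ("200A", "200B", "200C"):
        caps = [frequency_for(name, p) for p in range(4)]
        print(name, caps)
```

Keeping each throttling level as a complete table makes it possible to move between levels by swapping one table for another rather than editing individual entries.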
FIG. 3 is a flow diagram of one embodiment of aprocess 300 to manage GPU throttling based on the thermal data of the device. In one embodiment,process 300 is performed by a thermal daemon to manage I/O throttling, such asthermald 112 as described above inFIG. 1 . InFIG. 4 ,process 300 begins by receiving the thermal data atblock 302. In one embodiment, the thermal data is the data related to the thermal profile or other thermal characteristics of the device. For example in one embodiment, the thermal data can be time-dependent thermal data regarding the device temperature, temperature of a particular module of the device, data regarding fan use, and/or other data related to the thermal characteristics of the device. Atblock 304,process 300 determines if the thermal data is greater than a higher threshold. In one embodiment, the higher threshold is a threshold that indicates that the device can have greater GPU throttling so as to mitigate the thermal load that is on the device. For example and in one embodiment, the thermal threshold can be related to the temperature of the device, a module of the device, fan speed, or some other thermal characteristic. As another example and embodiment, a set of higher thermal threshold could be if the device temperature exceeded 40° C., 45° C., 50° C., etc. In one embodiment, the thermal data may exceed more than one threshold. In another embodiment, the thermal threshold can be based on time of day or user activity. For example and in one embodiment, ifprocess 300 knows that a user is very likely to use the machine in near future (say back from lunch),process 300 may choose to throttle low QoS tasks to leave “thermal headroom” for the task users is likely to perform. This is especially important for devices that do not have fans to actively dissipate heat. If the thermal data is greater than a higher threshold,process 300 adjusts the GPU throttling table to increase the GPU throttling atblock 308. 
In one embodiment, process 300 adjusts the GPU throttling table by throttling one or more of the different GPU utilizations for one, some, or all of the process priorities. In this embodiment, process 300 can start to throttle a priority or further throttle an already throttled priority level. In one embodiment, process 300 throttles a priority by restricting the GPU utilization for GPU execution of that priority. For example in one embodiment, process 300 can throttle a priority by restricting a 100% GPU execution down to 80% as described in FIG. 2 above. Alternatively, process 300 can throttle a priority that is restricted from a 50% GPU utilization down to 25%. By throttling the GPU utilization for a priority, process 300 selectively throttles the different priorities, thus allowing greater GPU utilization for higher priorities and lower GPU utilization for lower priorities. This allows for less power consumption of the GPU for the device, while preserving GPU utilization for higher priority processes at the expense of lower GPU utilization for lower priority processes. A lower power consumption of the GPU can help mitigate the thermal load on the device. - If the thermal data is not greater than a higher threshold,
process 300 determines if the thermal data is less than a lower threshold at block 306. In one embodiment, if the thermal data is less than a lower threshold, process 300 may relax the GPU throttling as the thermal load on the device may be lessening. If the thermal data is less than a lower threshold, at block 310, process 300 adjusts the GPU utilization to decrease the GPU throttling. In one embodiment, process 300 relaxes the restrictions placed on the GPU utilization for one or more of the different priorities. For example and in one embodiment, process 300 can relax a priority with an 80% GPU utilization back to an unrestricted 100% GPU utilization. Alternatively, process 300 can relax a restricted priority at a 25% GPU utilization to a less restricted 50% GPU utilization. If the thermal data is not less than the lower threshold, process 300 maintains the current GPU throttling at block 312. Execution proceeds to block 312 above. - As described above, the device can selectively restrict and relax different priority GPU utilizations in response to the thermal data of the device. The device uses the different GPU utilizations to process the device GPU operations.
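As a non-limiting illustration, one pass of process 300 (blocks 302 through 312) might be sketched as follows. The step values, the 45° C. and 40° C. thresholds, the policy of tightening the lowest priority first and relaxing the highest priority first, and all identifiers (`STEPS`, `ThrottlingTable`, `process_300_step`) are assumptions made for this sketch; the description leaves the choice of which priorities to adjust open.

```python
from dataclasses import dataclass, field

# Hypothetical utilization steps; the description mentions 100%, 80%,
# 50% and 25% as example values.
STEPS = [1.00, 0.80, 0.50, 0.25]


@dataclass
class ThrottlingTable:
    # priority -> index into STEPS (0 means unrestricted); 0 is the highest priority
    level: dict = field(default_factory=lambda: {0: 0, 1: 0, 2: 0, 3: 0})

    def utilization(self, priority):
        return STEPS[self.level[priority]]


def increase_throttling(table):
    """Block 308 (assumed policy): tighten the lowest priority that can still be tightened."""
    for priority in sorted(table.level, reverse=True):
        if table.level[priority] < len(STEPS) - 1:
            table.level[priority] += 1
            return


def decrease_throttling(table):
    """Block 310 (assumed policy): relax the highest priority that is currently throttled."""
    for priority in sorted(table.level):
        if table.level[priority] > 0:
            table.level[priority] -= 1
            return


def process_300_step(table, temperature_c, high_c=45.0, low_c=40.0):
    """One pass through blocks 304-312 for a single temperature reading."""
    if temperature_c > high_c:      # block 304 -> block 308
        increase_throttling(table)
        return "increase"
    if temperature_c < low_c:       # block 306 -> block 310
        decrease_throttling(table)
        return "decrease"
    return "maintain"               # block 312
```

Under this assumed policy a higher priority never has a lower utilization than a lower priority, and readings between the two thresholds leave the table unchanged, which avoids oscillating between throttled and relaxed states.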
FIG. 4 is a flow diagram of one embodiment of a process 400 to manage GPU execution according to the quality of service of a process. In one embodiment, process 400 is performed by a GPU processing module 700 that is part of the driver, such as GPU 106 or integrated graphics 104 as described in FIG. 1 above. In FIG. 4, process 400 begins by selecting a process for GPU execution for a current time slot at block 402. In one embodiment, the GPU schedules a process for execution in different time slots, so that the GPU can execute multiple processes. During a particular time slot, a GPU will execute a task of the process and, possibly, switch to another process for another time slot. In one embodiment, a scheduler keeps track of the GPU performance by using a ring buffer for each time slot. In this embodiment, the scheduler allows for preempting tasks of one or more processes in flight because the tasks are put in multiple priority-based queues. When one unit of work is finished by the GPU, there is a decision of which process is to be executed next. If there is a process in an un-throttled queue, the highest priority process is selected for execution by the GPU. For lower priorities, the scheduler looks at the GPU utilization and determines if the GPU utilization has expired. In one embodiment, it is possible to either run the GPU at a reduced frequency or run the GPU at the same frequency but for a shorter period of time so as to allow a greater sleep period (e.g., GPU idle and in a low power state). In one embodiment, it may be advantageous to run the GPU at a higher frequency for a shorter period of time and allow the GPU to go into idle power states, instead of running it at a lower frequency but keeping it busy for a longer period of time. For example and in one embodiment, process 400 schedules the lower priority processes if there is room for the GPU to perform the work.
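The selection step described above (block 402) may be sketched, purely as an example, as a walk over priority-based queues in which a throttled priority is skipped once its utilization budget for the current accounting window has expired. The accounting window, the per-priority slot counters, and the name `select_next` are assumptions introduced for this sketch and do not appear in the description.

```python
from collections import deque


def select_next(queues, allowed, used, window_slots):
    """Pick the next process to run, or None if the GPU should idle.

    queues:       priority -> deque of pending process names (0 is highest)
    allowed:      priority -> utilization cap (1.0 means un-throttled)
    used:         priority -> slots already consumed in the current window
    window_slots: number of time slots in the (assumed) accounting window
    """
    for priority in sorted(queues):
        if not queues[priority]:
            continue
        budget = allowed[priority] * window_slots
        # An un-throttled queue is always eligible; a throttled queue is
        # eligible only while its utilization budget has not expired.
        if allowed[priority] >= 1.0 or used[priority] < budget:
            used[priority] += 1
            return priority, queues[priority].popleft()
    return None
```

For instance, with a four-slot window and a 25% cap on priority 3, one priority 3 task is admitted per window; further priority 3 work waits, leaving the slot idle unless higher priority work is pending.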
In one embodiment, there is a starvation mechanism, implemented via a software scheduler, to prevent the lower priority processes from starving. With different job queues, corresponding to different priority processes, it is possible to hold off the scheduling of a lower priority queue and lower the amount of work submitted on the GPU. - At
block 404, process 400 executes the process using the GPU and according to the GPU utilization for the QoS priority of that process. As described above, each process will have a QoS priority that is used to determine the GPU utilization. In one embodiment, the GPU utilization is the GPU utilization that the GPU operates at while executing the process for the time slot. In one embodiment, the GPU utilization is a percentage of the GPU maximum or normal operating frequency. In another embodiment, the GPU utilization is an average of GPU utilization. In this embodiment, process 400 can execute the process at a high GPU utilization (e.g. at or near 100%) in one or more time slots and execute the process at a low GPU utilization in another time slot. For example and in one embodiment, if the GPU utilization is 75%, process 400 could execute the process at 100% GPU utilization for three out of four time slots and have the GPU idle (e.g. 0% utilization) for one out of four slots. In one embodiment, having the GPU idle for a time slot may create conditions that allow for hardware components to be put into a low power state, thus allowing the device to cool down quicker. Process 400 advances to the next time slot at block 406. Execution proceeds to block 402 above. - As described above, the throttled GPU utilizations are applied by
process 400 during a time slot that the GPU is scheduling and executing a process. FIGS. 5A-B are block diagrams of embodiments of GPU utilization 500A-B for high and low priority processes. In FIG. 5A, the GPU executes high and low priority processes with different GPU utilizations for different time slots. In one embodiment, the high priority process is executed with a 100% GPU utilization (e.g., high priority process time slots 502A-D) and the low priority process is executed with less than 100% GPU utilization (e.g., low priority process time slots 504A-C). In this embodiment, the low priority process is executed with the same low GPU utilization for each time slot. In contrast, in FIG. 5B, the low priority process is executed with different GPU utilizations for different time slots. In one embodiment, in time slot 554A the GPU utilization is 100%, whereas the GPU is idle for time slots 554B-C. By utilizing different GPU utilizations for the low priority process, the overall GPU utilization for this process can be lowered. In addition, in FIG. 5B, the high priority process is executed with 100% GPU utilization for time slots 552A-D. While in one embodiment, FIGS. 5A-B illustrate there being more time slots allocated for a high priority process, in alternate embodiments, lower priority processes may have the same or greater number of time slots allocated than for higher priority processes. -
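The two ways of applying a throttled utilization across time slots, a uniform reduced utilization in every slot as in FIG. 5A or full utilization in some slots with idle slots in between as in FIG. 5B, can be illustrated with the following sketch. The function names and the rounding rule are assumptions made for illustration only.

```python
def uniform_slots(utilization, n_slots):
    """FIG. 5A style: the same reduced utilization in every time slot."""
    return [utilization] * n_slots


def duty_cycle_slots(utilization, n_slots):
    """FIG. 5B style: run at 100% in some slots and idle in the rest.

    The number of busy slots is rounded to a whole number, so the average
    matches the target exactly only when utilization * n_slots is whole.
    """
    busy = round(utilization * n_slots)
    return [1.0] * busy + [0.0] * (n_slots - busy)


def average(slots):
    return sum(slots) / len(slots)
```

For a 75% target over four slots, `duty_cycle_slots(0.75, 4)` yields three busy slots followed by one idle slot, for an average of 0.75. The idle slot is what may allow hardware components to enter a low power state.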
FIG. 6 is a block diagram of one embodiment of a thermal daemon, thermald 110, that manages GPU throttling based on the thermal data of the device. In one embodiment, thermald 110 includes a GPU throttling module 600 that determines whether to selectively apply or relax a GPU throttle to a priority. In one embodiment, the GPU throttling module 600 includes a receive thermal data module 602, compare higher thermal threshold module 604, compare lower thermal threshold module 606, increase GPU throttling module 608, and decrease GPU throttling module 610. In one embodiment, the receive thermal data module 602 receives the thermal data as described in FIG. 3, block 302 above. The compare higher thermal threshold module 604 compares the thermal data with a higher thermal threshold as described in FIG. 3, block 304 above. The compare lower thermal threshold module 606 compares the thermal data with a lower thermal threshold as described in FIG. 3, block 306 above. The increase GPU throttling module 608 increases the GPU throttling for one or more processes as described in FIG. 3, block 308 above. The decrease GPU throttling module 610 decreases the GPU throttling for one or more processes as described in FIG. 3, block 310 above. -
FIG. 7 is a block diagram of one embodiment of a GPU 106 that manages GPU execution according to the quality of service of a process. In one embodiment, the GPU 106 includes a GPU processing module 700 that applies the corresponding GPU utilization for a process. In one embodiment, the GPU processing module 700 includes a select module 702 and an execute module 704. In one embodiment, the select module 702 selects a process to execute by the GPU and the execute module 704 executes that process with the GPU according to the GPU utilization as described above in FIG. 4, blocks 402 and 404, respectively. In one embodiment, an integrated graphics 104 can also include a GPU processing module 700. -
FIG. 8 shows one example of a data processing system 800, which may be used with one embodiment of the present invention. For example, the system 800 may be implemented including a device 100 as shown in FIG. 1. Note that while FIG. 8 illustrates various components of a computer system, it is not intended to represent any particular architecture or manner of interconnecting the components as such details are not germane to the present invention. It will also be appreciated that network computers and other data processing systems or other consumer electronic devices, which have fewer components or perhaps more components, may also be used with the present invention. - As shown in
FIG. 8, the computer system 800, which is a form of a data processing system, includes a bus 803 which is coupled to a microprocessor(s) 805 and a ROM (Read Only Memory) 807 and volatile RAM 809 and a non-volatile memory 811. The microprocessor 805 may retrieve the instructions from the memories 807, 809, 811. The bus 803 interconnects these various components together and also interconnects these components to input/output devices. The input/output devices 815 are coupled to the system through input/output controllers 813. The volatile RAM (Random Access Memory) 809 is typically implemented as dynamic RAM (DRAM), which requires power continually in order to refresh or maintain the data in the memory. - The
mass storage 811 is typically a magnetic hard drive or a magnetic optical drive or an optical drive or a DVD ROM or a flash memory or other types of memory systems, which maintain data (e.g. large amounts of data) even after power is removed from the system. Typically, the mass storage 811 will also be a random access memory although this is not required. While FIG. 8 shows that the mass storage 811 is a local device coupled directly to the rest of the components in the data processing system, it will be appreciated that the present invention may utilize a non-volatile memory which is remote from the system, such as a network storage device which is coupled to the data processing system through a network interface such as a modem, an Ethernet interface or a wireless network. The bus 803 may include one or more buses connected to each other through various bridges, controllers and/or adapters as is well known in the art. -
FIG. 9 shows an example of another data processing system 900 which may be used with one embodiment of the present invention. For example, system 900 may be implemented as a device 100 as shown in FIG. 1. The data processing system 900 shown in FIG. 9 includes a processing system 911, which may be one or more microprocessors, or which may be a system on a chip integrated circuit, and the system also includes memory 901 for storing data and programs for execution by the processing system. The system 900 also includes an audio input/output subsystem 905, which may include a microphone and a speaker for, for example, playing back music or providing telephone functionality through the speaker and microphone. - A display controller and
display device 909 provide a visual user interface for the user; this digital interface may include a graphical user interface which is similar to that shown on a Macintosh computer when running OS X operating system software, or Apple iPhone when running the iOS operating system, etc. The system 900 also includes one or more wireless transceivers 903 to communicate with another data processing system, such as the system 900 of FIG. 9. A wireless transceiver may be a WLAN transceiver, an infrared transceiver, a Bluetooth transceiver, and/or a wireless cellular telephony transceiver. It will be appreciated that additional components, not shown, may also be part of the system 900 in certain embodiments, and in certain embodiments fewer components than shown in FIG. 9 may also be used in a data processing system. The system 900 further includes one or more communications ports 917 to communicate with another data processing system, such as the system 800 of FIG. 8. The communications port may be a USB port, Firewire port, Bluetooth interface, etc. - The
data processing system 900 also includes one or more input devices 913, which are provided to allow a user to provide input to the system. These input devices may be a keypad or a keyboard or a touch panel or a multi touch panel. The data processing system 900 also includes an optional input/output device 915 which may be a connector for a dock. It will be appreciated that one or more buses, not shown, may be used to interconnect the various components as is well known in the art. The data processing system shown in FIG. 9 may be a handheld computer or a personal digital assistant (PDA), or a cellular telephone with PDA like functionality, or a handheld computer which includes a cellular telephone, or a media player, such as an iPod, or devices which combine aspects or functions of these devices, such as a media player combined with a PDA and a cellular telephone in one device or an embedded device or other consumer electronic devices. In other embodiments, the data processing system 900 may be a network computer or an embedded processing device within another device, or other types of data processing systems, which have fewer components or perhaps more components than that shown in FIG. 9. - At least certain embodiments of the inventions may be part of a digital media player, such as a portable music and/or video media player, which may include a media processing system to present the media, a storage device to store the media and may further include a radio frequency (RF) transceiver (e.g., an RF transceiver for a cellular telephone) coupled with an antenna system and the media processing system. In certain embodiments, media stored on a remote storage device may be transmitted to the media player through the RF transceiver. The media may be, for example, one or more of music or other audio, still pictures, or motion pictures.
- The portable media player may include a media selection device, such as a click wheel input device on an iPod® or iPod Nano® media player from Apple, Inc. of Cupertino, Calif., a touch screen input device, pushbutton device, movable pointing input device or other input device. The media selection device may be used to select the media stored on the storage device and/or the remote storage device. The portable media player may, in at least certain embodiments, include a display device which is coupled to the media processing system to display titles or other indicators of media being selected through the input device and being presented, either through a speaker or earphone(s), or on the display device, or on both display device and a speaker or earphone(s). Examples of a portable media player are described in published U.S. Pat. No. 7,345,671 and U.S. published patent number 2004/0224638, both of which are incorporated herein by reference.
- Portions of what was described above may be implemented with logic circuitry such as a dedicated logic circuit or with a microcontroller or other form of processing core that executes program code instructions. Thus processes taught by the discussion above may be performed with program code such as machine-executable instructions that cause a machine that executes these instructions to perform certain functions. In this context, a “machine” may be a machine that converts intermediate form (or “abstract”) instructions into processor specific instructions (e.g., an abstract execution environment such as a “virtual machine” (e.g., a Java Virtual Machine), an interpreter, a Common Language Runtime, a high-level language virtual machine, etc.), and/or, electronic circuitry disposed on a semiconductor chip (e.g., “logic circuitry” implemented with transistors) designed to execute instructions such as a general-purpose processor and/or a special-purpose processor. Processes taught by the discussion above may also be performed by (in the alternative to a machine or in combination with a machine) electronic circuitry designed to perform the processes (or a portion thereof) without the execution of program code.
- The present invention also relates to an apparatus for performing the operations described herein. This apparatus may be specially constructed for the required purpose, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), RAMs, EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
- A machine readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; etc.
- An article of manufacture may be used to store program code. An article of manufacture that stores program code may be embodied as, but is not limited to, one or more memories (e.g., one or more flash memories, random access memories (static, dynamic or other)), optical disks, CD-ROMs, DVD ROMs, EPROMs, EEPROMs, magnetic or optical cards or other type of machine-readable media suitable for storing electronic instructions. Program code may also be downloaded from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a propagation medium (e.g., via a communication link (e.g., a network connection)).
- The preceding detailed descriptions are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the tools used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
- It should be kept in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “monitoring,” “decreasing,” “increasing,” “maintaining,” “executing,” “processing,” “computing,” “recording,” “restoring,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
- The processes and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the operations described. The required structure for a variety of these systems will be evident from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
- The foregoing discussion merely describes some exemplary embodiments of the present invention. One skilled in the art will readily recognize from such discussion, the accompanying drawings and the claims that various modifications can be made without departing from the spirit and scope of the invention.
Claims (20)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/503,311 US9530174B2 (en) | 2014-05-30 | 2014-09-30 | Selective GPU throttling |
PCT/US2015/030894 WO2015183586A1 (en) | 2014-05-30 | 2015-05-14 | Selective gpu throttling |
TW104116883A TWI547811B (en) | 2014-05-30 | 2015-05-26 | Selective gpu throttling |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201462006009P | 2014-05-30 | 2014-05-30 | |
US14/503,311 US9530174B2 (en) | 2014-05-30 | 2014-09-30 | Selective GPU throttling |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150348226A1 (en) | 2015-12-03 |
US9530174B2 (en) | 2016-12-27 |
Family
ID=53366256
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/503,311 Active US9530174B2 (en) | 2014-05-30 | 2014-09-30 | Selective GPU throttling |
Country Status (3)
Country | Link |
---|---|
US (1) | US9530174B2 (en) |
TW (1) | TWI547811B (en) |
WO (1) | WO2015183586A1 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150346800A1 (en) * | 2014-05-30 | 2015-12-03 | Apple Inc. | Thermally adaptive quality-of-service |
US20160371118A1 (en) * | 2015-06-17 | 2016-12-22 | Intel Corporation | Virtual machine management method and apparatus including idling and scheduling of virtual processors |
US20170188310A1 (en) * | 2015-12-26 | 2017-06-29 | Intel IP Corporation | Context-assisted thermal management scheme in a portable device |
US20180039317A1 (en) * | 2016-08-05 | 2018-02-08 | Ati Technologies Ulc | Fine-grain gpu power management and scheduling for virtual reality applications |
US20180300839A1 (en) * | 2017-04-17 | 2018-10-18 | Intel Corporation | Power-based and target-based graphics quality adjustment |
US10203746B2 (en) | 2014-05-30 | 2019-02-12 | Apple Inc. | Thermal mitigation using selective task modulation |
US10373283B2 (en) * | 2016-03-14 | 2019-08-06 | Dell Products, Lp | System and method for normalization of GPU workloads based on real-time GPU data |
WO2020005268A1 (en) * | 2018-06-29 | 2020-01-02 | Hewlett-Packard Development Company, L.P. | Thermal profile selections based on orientation |
US11042420B2 (en) * | 2016-08-16 | 2021-06-22 | International Business Machines Corporation | System, method and recording medium for temperature-aware task scheduling |
CN113687980A (en) * | 2020-05-19 | 2021-11-23 | 北京京东乾石科技有限公司 | Abnormal data self-recovery method, system, electronic equipment and readable storage medium |
US20220043504A1 (en) * | 2018-09-27 | 2022-02-10 | Intel Corporation | Throttling of components using priority ordering |
WO2022132435A1 (en) * | 2020-12-15 | 2022-06-23 | Advanced Micro Devices, Inc. | Throttling hull shaders based on tessellation factors in a graphics pipeline |
US11710207B2 (en) | 2021-03-30 | 2023-07-25 | Advanced Micro Devices, Inc. | Wave throttling based on a parameter buffer |
US11776085B2 (en) | 2020-12-16 | 2023-10-03 | Advanced Micro Devices, Inc. | Throttling shaders based on resource usage in a graphics pipeline |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9892024B2 (en) * | 2015-11-02 | 2018-02-13 | Sony Interactive Entertainment America Llc | Backward compatibility testing of software in a mode that disrupts timing |
US10175731B2 (en) * | 2016-06-17 | 2019-01-08 | Microsoft Technology Licensing, Llc | Shared cooling for thermally connected components in electronic devices |
EP3829157B1 (en) * | 2019-11-29 | 2024-04-17 | Canon Kabushiki Kaisha | Recording apparatus, method of controlling recording apparatus, and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050289362A1 (en) * | 2004-06-24 | 2005-12-29 | Merkin Aaron E | Maintaining server performance in a power constrained environment |
US20080028778A1 (en) * | 2006-08-04 | 2008-02-07 | Timothy John Millet | Method and apparatus for a thermal control system based on virtual temperature sensor |
US20100007646A1 (en) * | 2008-07-08 | 2010-01-14 | Dell Products L.P. | Systems, Methods and Media for Disabling Graphic Processing Units |
US20100117579A1 (en) * | 2003-08-15 | 2010-05-13 | Michael Culbert | Methods and apparatuses for operating a data processing system |
US20110072178A1 (en) * | 2009-09-15 | 2011-03-24 | Arm Limited | Data processing apparatus and a method for setting priority levels for transactions |
US20120271481A1 (en) * | 2011-04-22 | 2012-10-25 | Jon James Anderson | Method and system for thermal load management in a portable computing device |
US20130155081A1 (en) * | 2011-12-15 | 2013-06-20 | Ati Technologies Ulc | Power management in multiple processor system |
US20150006925A1 (en) * | 2013-07-01 | 2015-01-01 | Advanced Micro Devices, Inc. | Allocating power to compute units based on energy efficiency |
US20150067377A1 (en) * | 2013-08-28 | 2015-03-05 | Qualcomm Incorporated | Method, Devices and Systems for Dynamic Multimedia Data Flow Control for Thermal Power Budgeting |
US20150317762A1 (en) * | 2014-04-30 | 2015-11-05 | Qualcomm Incorporated | Cpu/gpu dcvs co-optimization for reducing power consumption in graphics frame processing |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7345671B2 (en) | 2001-10-22 | 2008-03-18 | Apple Inc. | Method and apparatus for use of rotational user inputs |
US7069189B2 (en) | 2002-12-31 | 2006-06-27 | Intel Corporation | Method and apparatus for controlling multiple resources using thermal related parameters |
US7627343B2 (en) | 2003-04-25 | 2009-12-01 | Apple Inc. | Media player system |
US8224639B2 (en) | 2004-03-29 | 2012-07-17 | Sony Computer Entertainment Inc. | Methods and apparatus for achieving thermal management using processing task scheduling |
US7466316B1 (en) | 2004-12-14 | 2008-12-16 | Nvidia Corporation | Apparatus, system, and method for distributing work to integrated heterogeneous processors |
US7454631B1 (en) | 2005-03-11 | 2008-11-18 | Sun Microsystems, Inc. | Method and apparatus for controlling power consumption in multiprocessor chip |
TWM363648U (en) * | 2006-05-12 | 2009-08-21 | Xgi Technology Inc | Plug-in graphics module architecture |
US8854381B2 (en) | 2009-09-03 | 2014-10-07 | Advanced Micro Devices, Inc. | Processing unit that enables asynchronous task dispatch |
EP2383648B1 (en) | 2010-04-28 | 2020-02-19 | Telefonaktiebolaget LM Ericsson (publ) | Technique for GPU command scheduling |
US8842122B2 (en) | 2011-12-15 | 2014-09-23 | Qualcomm Incorporated | Graphics processing unit with command processor |
US20130162661A1 (en) | 2011-12-21 | 2013-06-27 | Nvidia Corporation | System and method for long running compute using buffers as timeslices |
US10095526B2 (en) | 2012-10-12 | 2018-10-09 | Nvidia Corporation | Technique for improving performance in multi-threaded processing units |
- 2014
  - 2014-09-30 US US14/503,311 patent/US9530174B2/en active Active
- 2015
  - 2015-05-14 WO PCT/US2015/030894 patent/WO2015183586A1/en active Application Filing
  - 2015-05-26 TW TW104116883A patent/TWI547811B/en active
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10095286B2 (en) * | 2014-05-30 | 2018-10-09 | Apple Inc. | Thermally adaptive quality-of-service |
US11054873B2 (en) | 2014-05-30 | 2021-07-06 | Apple Inc. | Thermally adaptive quality-of-service |
US20150346800A1 (en) * | 2014-05-30 | 2015-12-03 | Apple Inc. | Thermally adaptive quality-of-service |
US10203746B2 (en) | 2014-05-30 | 2019-02-12 | Apple Inc. | Thermal mitigation using selective task modulation |
US20160371118A1 (en) * | 2015-06-17 | 2016-12-22 | Intel Corporation | Virtual machine management method and apparatus including idling and scheduling of virtual processors |
CN107015611A (en) * | 2015-12-26 | 2017-08-04 | 英特尔Ip公司 | Context supplemental heat administrative mechanism in portable equipment |
US10064139B2 (en) * | 2015-12-26 | 2018-08-28 | Intel IP Corporation | Context-assisted thermal management scheme in a portable device |
US20170188310A1 (en) * | 2015-12-26 | 2017-06-29 | Intel IP Corporation | Context-assisted thermal management scheme in a portable device |
US10373283B2 (en) * | 2016-03-14 | 2019-08-06 | Dell Products, Lp | System and method for normalization of GPU workloads based on real-time GPU data |
US20180039317A1 (en) * | 2016-08-05 | 2018-02-08 | Ati Technologies Ulc | Fine-grain gpu power management and scheduling for virtual reality applications |
US11474591B2 (en) * | 2016-08-05 | 2022-10-18 | Ati Technologies Ulc | Fine-grain GPU power management and scheduling for virtual reality applications |
US11740945B2 (en) | 2016-08-16 | 2023-08-29 | International Business Machines Corporation | System, method and recording medium for temperature-aware task scheduling |
US11042420B2 (en) * | 2016-08-16 | 2021-06-22 | International Business Machines Corporation | System, method and recording medium for temperature-aware task scheduling |
US10402932B2 (en) * | 2017-04-17 | 2019-09-03 | Intel Corporation | Power-based and target-based graphics quality adjustment |
US10909653B2 (en) | 2017-04-17 | 2021-02-02 | Intel Corporation | Power-based and target-based graphics quality adjustment |
US20180300839A1 (en) * | 2017-04-17 | 2018-10-18 | Intel Corporation | Power-based and target-based graphics quality adjustment |
WO2020005268A1 (en) * | 2018-06-29 | 2020-01-02 | Hewlett-Packard Development Company, L.P. | Thermal profile selections based on orientation |
US20220043504A1 (en) * | 2018-09-27 | 2022-02-10 | Intel Corporation | Throttling of components using priority ordering |
CN113687980A (en) * | 2020-05-19 | 2021-11-23 | 北京京东乾石科技有限公司 | Abnormal data self-recovery method, system, electronic equipment and readable storage medium |
WO2022132435A1 (en) * | 2020-12-15 | 2022-06-23 | Advanced Micro Devices, Inc. | Throttling hull shaders based on tessellation factors in a graphics pipeline |
US11508124B2 (en) | 2020-12-15 | 2022-11-22 | Advanced Micro Devices, Inc. | Throttling hull shaders based on tessellation factors in a graphics pipeline |
US11776085B2 (en) | 2020-12-16 | 2023-10-03 | Advanced Micro Devices, Inc. | Throttling shaders based on resource usage in a graphics pipeline |
US11710207B2 (en) | 2021-03-30 | 2023-07-25 | Advanced Micro Devices, Inc. | Wave throttling based on a parameter buffer |
Also Published As
Publication number | Publication date |
---|---|
TW201608391A (en) | 2016-03-01 |
US9530174B2 (en) | 2016-12-27 |
WO2015183586A1 (en) | 2015-12-03 |
TWI547811B (en) | 2016-09-01 |
Similar Documents
Publication | Title |
---|---|
US9530174B2 (en) | Selective GPU throttling |
US11054873B2 (en) | Thermally adaptive quality-of-service |
US10203746B2 (en) | Thermal mitigation using selective task modulation |
US10970085B2 (en) | Resource management with dynamic resource policies |
US9336070B1 (en) | Throttling of application access to resources |
US9436628B2 (en) | Thermal mitigation using selective I/O throttling |
US9690685B2 (en) | Performance management based on resource consumption |
EP3535641B1 (en) | Thread importance based processor core partitioning |
JP6072834B2 (en) | Method, program, apparatus, and system |
US9904575B2 (en) | System and method for selective timer rate limiting |
CN109906437B (en) | Processor core stall and frequency selection based on thread importance |
JP2017526996A (en) | System and method for managing processor device power consumption |
CN108604114B (en) | Forced idle of memory subsystem |
JP2018505476A (en) | System and method for providing dynamic cache expansion in a multi-cluster heterogeneous processor architecture |
US9542230B2 (en) | System and method for selective timer coalescing |
US20160170470A1 (en) | Dynamic control of processors to reduce thermal and power costs |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: APPLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VAISHAMPAYAN, UMESH SURESH;KUMAR, DEREK R.;FORET, CECILE MARIE;AND OTHERS;SIGNING DATES FROM 20140723 TO 20140929;REEL/FRAME:033914/0217 |
AS | Assignment | Owner name: APPLE INC., CALIFORNIA Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE INCORRECT APPL. NO. 14/503,111 PREVIOUSLY RECORDED AT REEL: 033914 FRAME: 0217. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNORS:VAISHAMPAYAN, UMESH SURESH;KUMAR, DEREK R.;FORET, CECILE MARIE;AND OTHERS;SIGNING DATES FROM 20140723 TO 20140929;REEL/FRAME:035760/0637 |
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: APPLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LIN, BIN;REEL/FRAME:037960/0724 Effective date: 20160310 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |