US20130205126A1 - Core-level dynamic voltage and frequency scaling in a chip multiprocessor - Google Patents
- Publication number: US20130205126A1 (application US13/811,280)
- Authority: US (United States)
- Legal status: Granted (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G06F9/4405: Initialisation of multiprocessor systems (G06F: electric digital data processing; under G06F9/00 arrangements for program control, G06F9/06 using stored programs, G06F9/44 arrangements for executing specific programs, G06F9/4401 bootstrapping)
- G06F1/324: Power saving characterised by the action undertaken by lowering clock frequency
- G06F1/3243: Power saving in microcontroller unit
- G06F1/3296: Power saving characterised by the action undertaken by lowering the supply or operating voltage
- The three G06F1 codes above fall under G06F1/3234 (power saving characterised by the action undertaken), G06F1/3203 (power management, i.e. event-based initiation of a power-saving mode), G06F1/32 (means for saving power), G06F1/26 (power supply means, e.g. regulation thereof), and G06F1/00 (details not covered by groups G06F3/00 - G06F13/00 and G06F21/00)
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management (Y02D: climate change mitigation technologies in information and communication technologies)
Definitions
- Example methods described herein may include receiving performance or reliability information associated with each of the multiple processor cores of a chip multiprocessor, where the received performance or reliability information can be determined prior to packaging of the chip multiprocessor. Some described methods may also include storing the received performance or reliability information so that it can be retrieved and used to adjust one or more operating parameters of one or more of the multiple processor cores of the chip multiprocessor.
- Also described is a method for managing the operating frequencies and voltages assigned to processor cores in a chip multiprocessor.
- Some example methods may include determining computational requirements for a task to be completed by the chip multiprocessor. Based on the determined computational requirements and on stored performance or reliability information associated with each of the processor cores, one or more operating parameters of at least one of the processor cores can be adjusted. In some examples, the stored performance or reliability information can be determined prior to packaging of the chip multiprocessor.
- Also described is a chip multiprocessor that can be formed on a single die.
- A first processor core, a second processor core, and an on-chip registry can be formed on the die.
- The on-chip registry can be configured to store performance or reliability information associated with the first processor core and performance or reliability information associated with the second processor core.
- The stored performance or reliability information can be retrieved from the on-chip registry and used to adjust various operating parameters of the first and/or the second processor cores.
- FIG. 1 shows a block diagram of an example embodiment of a chip multiprocessor
- FIG. 2 shows a block diagram of an example embodiment of a processor core
- FIG. 3 sets forth a flowchart summarizing an example method for manufacturing a chip multiprocessor having multiple processor cores
- FIG. 4 sets forth a flowchart summarizing an example method for managing frequency and voltage provided to processing cores in a chip multiprocessor
- FIG. 5 is a block diagram of an illustrative embodiment of a computer program product for implementing a method for manufacturing a chip multiprocessor having multiple processor cores;
- FIG. 6 is a block diagram illustrating an example computing device that is arranged for manufacturing a chip multiprocessor having multiple processor cores; all arranged in accordance with at least some embodiments of the present disclosure.
- Semiconductor chip manufacturing often includes a sequence of photographic and chemical processing steps during which electronic devices and circuits are gradually created on a wafer made of semiconducting material.
- The entire manufacturing process, from front-end-of-line processing of a wafer to packaging of chips formed from the wafer, can include hundreds of process steps, each of which may be subject to a certain level of random variation. Consequently, chips that have nominally undergone identical processing, even chips formed from adjacent locations on the same wafer, may have varied performance.
- Each chip is therefore typically subjected to lengthy and complex testing to establish the maximum clock speed (or operating frequency) that is considered reliable, so that chips can be binned and sold as different products based on the measured maximum reliable clock speed.
- CMPs: chip multiprocessors
- DVFS: dynamic voltage and frequency scaling
- For a CMP having a low number of processor cores, such an approach may be simple and effective.
- However, the assumption that all processor cores operate with essentially the same performance and/or reliability parameters can result in significant lost performance and/or unwanted power use by the CMP.
- Embodiments disclosed herein contemplate systems, methods and/or devices for providing core-level performance or reliability information of processor cores to a CMP containing the processor cores.
- The core-level performance or reliability information for each processor core may be obtained during testing of the CMP that occurs prior to packaging, such as wafer-level testing. Because such pre-packaging testing allows the collection of detailed performance and/or reliability information for each individual processor core and for the computing submodules of each processor core, such information may convey a more comprehensive characterization of the performance or reliability of the processor cores to a power management unit (PMU) of the CMP. Consequently, during operation the PMU may be configured to manage the operating frequency and voltage of each individual processor core according to its unique characteristics. Furthermore, in some embodiments, the PMU may be configured to adjust the usage, operating frequency, and/or operating voltage of the individual processor cores based on the make-up of a specific task assigned to the CMP.
- PMU: power management unit
- FIG. 1 shows a block diagram of an example embodiment of a chip multiprocessor (CMP) 100 , arranged in accordance with at least some embodiments of the present disclosure.
- CMP 100 is a multi-core processor formed from a single integrated circuit die that can be configured to carry out parallel processing tasks (e.g., process multiple threads) using multiple processor cores formed on the die.
- CMP 100 may include a power management unit (PMU) 110 and multiple processor cores 140 .
- CMP 100 may be coupled to a global queue 120 and a dispatcher 130 .
- PMU 110 may act as the global controller or multicore manager for CMP 100 , and may be configured to adjust the working voltage and/or frequency levels of each of the multiple processor cores 140 .
- PMU 110 may be substantially similar in organization and operation to existing PMUs that are configured for CMPs.
- PMU 110 may be configured to perform DVFS with respect to multiple processor cores 140 .
- PMU 110 may include multiple voltage control devices, each being configured to independently adjust operating voltages applied to each of processor cores 140 as desired.
- PMU 110 may include a DC/DC controller and multiple DC/DC converters.
- PMU 110 may also include a clock source unit or some other frequency control device that can be configured to dynamically adjust the clock signal provided to each processor core 140 as desired.
- In some embodiments, PMU 110 is disposed on-chip with CMP 100 and therefore may be a component of CMP 100. In other embodiments, PMU 110 may be disposed off-chip from CMP 100.
- Various power management approaches have been developed that can be used by PMU 110, covering a wide spectrum of system characteristics, including high-level operating-system-driven policies, responses to predicted usage, dynamic management of processor resources according to activity demands, dynamic scheduling of tasks to processors in a chip multiprocessor (CMP) environment, and hardware techniques for DVFS.
- Other approaches include adaptive body biasing (ABB) and adaptive supply voltage (ASV) implementations.
- The hardware actuators available for such power management include joint voltage and frequency scaling, frequency scaling, and microarchitectural switches, e.g., instruction fetch throttling.
- PMU 110 queries the performance, capabilities, and power of all components at regular time intervals and decides how to best control the available actuators of each component in order to comply with a given power management policy, e.g., a fixed power budget.
- Algorithms for implementing policies through hardware actuators exist for single-core processors and can be applied to CMP 100 because of the individual core-level characterization data, i.e., performance/reliability information 145.
- Two examples of such implementation algorithms suitable for use by PMU 110 for DVFS are MaxBIPS and LinOpt.
- The MaxBIPS algorithm assumes a set of discrete power modes (Vdd-frequency pairs), which PMU 110 can control for each of processor cores 140 individually.
- The goal of such an algorithm is to maximize the overall performance of CMP 100, as measured by the total number of instructions completed by all of processor cores 140 per time period, under a given power budget.
- The MaxBIPS algorithm relies on the fact that when a given core switches from power mode A (VddA, freqA) in observation window N to power mode B (VddB, freqB) in observation window N+1, its future performance and power are predictable using simple formulas. LinOpt instead uses linear programming to find the best voltage and frequency levels for each of the cores in the CMP.
- Previously, these algorithms have used chip-wide estimations of available performance windows. According to some embodiments, these algorithms may be modified to use data such as that provided by on-chip registries 143 (described below), collected at the core level before processor cores 140 on CMP 100 are packaged.
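The MaxBIPS selection step, adapted to per-core limits as described above, can be sketched in Python. This is a minimal illustration under stated assumptions, not the published algorithm verbatim: it assumes performance (BIPS) scales linearly with frequency and dynamic power scales with Vdd² × f, ignores leakage, and treats each core's list of reliable (Vdd, frequency) modes as if it had been read from performance/reliability information 145. The names `predict` and `max_bips` and all numbers are hypothetical.

```python
from itertools import product


def predict(bips_a, power_a, mode_a, mode_b):
    """Predict (BIPS, power) in mode_b from values observed in mode_a.

    Assumes BIPS scales linearly with frequency and dynamic power scales
    with Vdd**2 * frequency; leakage is ignored in this sketch.
    """
    (v_a, f_a), (v_b, f_b) = mode_a, mode_b
    bips_b = bips_a * (f_b / f_a)
    power_b = power_a * (v_b / v_a) ** 2 * (f_b / f_a)
    return bips_b, power_b


def max_bips(observations, allowed_modes, budget_w):
    """Pick one mode per core maximizing total predicted BIPS within budget.

    observations[i]  = (bips, power_w, current_mode) measured for core i
                       in observation window N.
    allowed_modes[i] = (vdd, freq_ghz) pairs core i can run reliably,
                       e.g. derived from its per-core registry data.
    Returns (modes, total_bips), or (None, -1.0) if nothing fits.
    """
    best_modes, best_bips = None, -1.0
    # Exhaustive search over every combination of per-core modes.
    for combo in product(*allowed_modes):
        total_bips = total_power = 0.0
        for (bips, power, mode_a), mode_b in zip(observations, combo):
            b, p = predict(bips, power, mode_a, mode_b)
            total_bips += b
            total_power += p
        if total_power <= budget_w and total_bips > best_bips:
            best_modes, best_bips = combo, total_bips
    return best_modes, best_bips


# Hypothetical example: core 1 cannot reliably reach 2.0 GHz.
LOW, MID, HIGH = (0.9, 1.0), (1.0, 1.5), (1.1, 2.0)
obs = [(2.0, 10.0, HIGH), (1.2, 6.0, MID)]
modes, bips = max_bips(obs, [[LOW, MID, HIGH], [LOW, MID]], budget_w=15.0)
print(modes, round(bips, 3))  # ((1.1, 2.0), (0.9, 1.0)) 2.8
```

Here the per-core data rule out 2.0 GHz for core 1, so the search keeps core 0 at full speed and slows core 1 to fit the 15 W budget. The exhaustive search grows as the product of per-core mode counts, so it suits small core counts or subdomains.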
- Policies for processor cores 140 may be related to the tasks they are running, and PMU 110 may have policies delivered by a higher-level manager in the architecture, such as a computer operating system (OS) 190 or a virtual machine manager, which may associate different policies with different processor cores 140 or tasks 101.
- OS: computer operating system
- In some embodiments, the observation window used by PMU 110 may be varied. In experiments, varying the PMU observation window (the interval at which the power management algorithm is run) between 100 µs and 500 µs had only a small effect, so the PMU observation time can be changed with the number of active cores. Additionally, in some embodiments a single PMU 110 can be used for a large CMP 100, because the single PMU can divide observation of multiple processor cores 140 into a series of samples taken at different times. PMU 110 may observe all of processor cores 140 at once, or observe only a subgroup of processor cores 140 at a time and observe different subgroups at different times.
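The subgroup sampling described above can be illustrated with a small round-robin schedule. This is a hypothetical sketch: `observation_schedule` and the fixed `group_size` are assumptions, and a real PMU might size subgroups or windows differently as the number of active cores changes.

```python
from itertools import cycle


def observation_schedule(n_cores, group_size):
    """Return an endless iterator over subgroups of core indices.

    One subgroup is observed per observation window, so every core is
    sampled once every ceil(n_cores / group_size) windows.
    """
    if n_cores < 1 or group_size < 1:
        raise ValueError("n_cores and group_size must be positive")
    groups = [list(range(start, min(start + group_size, n_cores)))
              for start in range(0, n_cores, group_size)]
    return cycle(groups)


schedule = observation_schedule(n_cores=10, group_size=4)
print([next(schedule) for _ in range(4)])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9], [0, 1, 2, 3]]
```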
- CMP 100 may include multiple PMUs 110 , each of which may be configured to manage a subdomain of CMP 100 .
- For example, four PMUs 110 may each manage roughly one quarter of CMP 100 as a whole.
- The multiple PMUs 110 may communicate with each other to cooperatively manage global policies, or, alternatively, each PMU 110 may receive a policy directed toward its subdomain from a software manager such as a virtual machine manager or OS 190.
- Some control systems may find it easier to optimize a reduced number of controls, in which case the subdomain system described above may be used with subdomains grouped by performance/reliability information 145 related to processor cores 140 , for example by grouping together processor cores 140 having similar performance/reliability information 145 .
- Such grouping may be defined at the time of pre-package testing and stored in the manner described here for performance/reliability information 145.
- Alternatively, such grouping may be performed later in order to simplify management.
- The grouping may also be used by dispatcher 130, for example to dispatch a set of tasks 101 to the processor cores 140 that performance/reliability information 145 identifies as running at lower power than the others.
- Global queue 120 may be configured to receive and store incoming tasks from OS 190 .
- Dispatcher 130 is a scheduler module that can be configured to periodically assign tasks 101 in global queue 120 to each of processor cores 140 .
- In some embodiments, the functions of dispatcher 130 may be distributed between OS 190 and CMP 100, but for clarity dispatcher 130 is illustrated as a single element in FIG. 1.
- In some embodiments, on-chip registries 143 may be located in each of processing cores 140 or otherwise associated respectively with processing cores 140.
- In other embodiments, on-chip registries may be on the same chip as processing cores 140 but, unlike the arrangement illustrated in FIG. 1, not physically part of processing cores 140.
- In still other embodiments, an off-chip registry 102 (also shown in FIG. 1) may be disposed external to CMP 100.
- FIG. 2 shows a block diagram of an example embodiment of one of processor cores 140 , arranged in accordance with at least some embodiments of the present disclosure.
- Processor core 140 may include a local queue 141 , processor circuitry 142 , and an on-chip registry 143 .
- Local queue 141 may be configured to receive and store tasks 101 that are assigned to core 140 by dispatcher 130 .
- Processor circuitry 142 may include various computing submodules 149 of processor core 140 that can be configured to perform the tasks stored in local queue 141 . Examples of such computing submodules 149 in processor circuitry 142 may include shifters, adders, cache, memory communications units, bus-processing units, network interfaces, floating point units, arithmetic units, specialty operations units, and the like.
- On-chip registry 143 is a registry associated with processor core 140 and may be formed as part of the integrated circuit making up processor core 140.
- On-chip registry 143 may be configured to store performance/reliability information 145 related to processor core 140, where the stored information 145 can be retrieved by or provided to PMU 110.
- On-chip registry 143 may be any technically feasible form of registry, including read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a fuse map, flash memory, and the like.
- Performance/reliability information 145 may include metrology data that can be measured on processor core 140 prior to packaging of CMP 100, e.g., during a wafer-level test process. Because performance/reliability information 145 may be collected at the wafer level or on diced chips prior to packaging of CMP 100, more test contacts are available than for a packaged die (in some examples, up to four times as many). In addition, testing of computer chips prior to packaging may be performed at a controlled temperature with sophisticated test signals and test equipment, and can establish the maximum clock frequency of an integrated circuit at a predetermined reliability, as well as other performance parameters.
- The integrated circuits available for such testing may include each individual processor core 140 and/or each of the computing submodules 149 of each processor core 140. Consequently, in some embodiments, performance/reliability information 145 can include highly detailed, core-level and/or submodule-level information that PMU 110 can use to effectively optimize the operating frequency and/or power use of CMP 100 during operation.
- In some embodiments, performance/reliability information 145 may include maximum operating voltage and/or clock frequency values associated with reliable operation of the associated processor core 140.
- Performance/reliability information 145 may also include an average power consumption value and/or a peak power consumption value for each of processor cores 140.
- Performance/reliability information 145 may further include leakage rate and/or other performance metrics measured across the operating range of the associated processor core 140.
- In some embodiments, performance/reliability information 145 may be in the form of slope and intercept values for generating a function representing a specific behavior of the processor core 140 of interest, e.g., frequency vs. power use.
- In other embodiments, performance/reliability information 145 may be in the form of multiple data points that can be used to construct a best-fit curve representing a specific behavior across the operating range of the processor core 140 of interest.
- The data points may correspond to performance characteristics measured during testing prior to packaging.
- In this way, the unique performance characteristics of each processor core 140 can be provided to PMU 110, thereby facilitating optimal power use and/or frequency of each processor core 140 on a per-core basis.
- In some embodiments, performance/reliability information 145 may include performance or reliability information for one or more of the computing submodules 149 and/or other sub-circuits in processor core 140.
- For example, performance/reliability information 145 may include power use, leakage current, etc., for each of the computing submodules and/or other sub-circuits in processor core 140, where such information may include individual values or functions defined across the operating range of the processor core. In this way, the unique operating characteristics of each processor core 140 in CMP 100 can be determined, and PMU 110 can tailor the use of each processor core 140 based on those operating characteristics.
- In some embodiments, to optimize the power usage required for a specific task, PMU 110 may be configured to provide input data to dispatcher 130 so that the task can be assigned from global queue 120 to a specific processor core 140.
- The assignment of the task may be based on the execution instructions contained in the task and on the unique operating characteristics of the processor core 140 being assigned the task.
- For example, performance/reliability information 145 may include a frequency slope value and an intercept value for power use for each computing submodule 149 in each processor core 140.
- A table of such power-use information can be generated (e.g., by PMU 110, OS 190, or dispatcher 130) so that the task, once compiled by OS 190, can be weighted by the categories of instructions performed by each computing submodule and subcircuit of a processor core.
- In this manner, an effective metric may be produced whereby the performance of each processor core 140 can be estimated and the compiled task can be assigned to the processor core 140 determined to be most suitable for completing the task.
- For example, for a task dominated by multiply operations, the processor core 140 having the lowest power usage for the multiply operation would be selected to execute the task.
- In some embodiments, the selection of the processor core may be performed by PMU 110.
- In other embodiments, the selection of the processor core may be performed by OS 190 or dispatcher 130.
- CMP 100 may be configured to receive tasks 101 from OS 190 via dispatcher 130 .
- PMU 110 may be configured to operate as the global controller for CMP 100 such that PMU 110 can effectively set the working voltage and frequency levels of each of the multiple processor cores 140 .
- PMU 110 may be configured to receive performance/reliability information 145 from on-chip registries 143 located in each of processing cores 140 .
- Alternatively, performance/reliability information 145 may be stored in a single off-chip registry 102.
- In other embodiments, performance/reliability information 145 may be stored in a remote database that can be accessed by OS 190 (e.g., during an initial boot-up of CMP 100).
- The remote database may be accessible via the Internet, so that CMP 100 or OS 190 can automatically retrieve performance/reliability information 145 from the remote database using a unique identifier code associated with CMP 100.
- In some embodiments, the unique identifier code may be deleted or blocked by OS 190 after performance/reliability information 145 is received by CMP 100.
- Alternatively, a non-unique code may be included in a chip ID for CMP 100, which the remote database can use to algorithmically reproduce performance/reliability information 145.
- Other means may also be used to provide performance/reliability information 145 to CMP 100 prior to normal operation.
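The storage alternatives above suggest a simple fallback order at boot time. The sketch below tries each source in turn and returns the first hit; the source functions, the chip ID string, and the record shape are hypothetical stand-ins for an on-chip registry reader, off-chip registry 102, and a remote database.

```python
from typing import Callable, Iterable, Optional

# A source maps a chip ID to per-core information, or None if it has none.
Source = Callable[[str], Optional[dict]]


def load_core_info(chip_id: str, sources: Iterable[Source]) -> dict:
    """Return information from the first source that has it."""
    for source in sources:
        info = source(chip_id)
        if info is not None:
            return info
    raise LookupError(f"no performance/reliability data for {chip_id!r}")


def on_chip_registry(chip_id: str) -> Optional[dict]:
    return None  # e.g. registry blank or unreadable in this example


remote_db = {"CMP-0001": {"core0": {"max_freq_mhz": 2400}}}
info = load_core_info("CMP-0001", [on_chip_registry, remote_db.get])
print(info["core0"]["max_freq_mhz"])  # 2400
```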
- PMU 110 may be configured to dynamically vary the working voltage and/or frequency levels of each of the multiple processor cores 140.
- Although DVFS of multiple cores in a chip multiprocessor is somewhat conventional, the effectiveness of DVFS can be significantly improved by various embodiments described in this disclosure, since per-core, pre-packaging test information (i.e., performance/reliability information 145) may be employed to provide a wider reliable operating range for the majority of processor cores 140 in CMP 100 than is typically available through conventional techniques.
- FIG. 3 sets forth a flowchart summarizing an example method 300 for manufacturing a CMP having multiple processor cores, in accordance with at least some embodiments of the present disclosure.
- Method 300 may include one or more operations, functions or actions as illustrated by one or more of blocks 301 and/or 302 . Although the blocks are illustrated in a sequential order, these blocks may also be performed in parallel, and/or in a different order than those described herein. Also, the various blocks may be combined into fewer blocks, divided into additional blocks, and/or eliminated based upon the desired implementation.
- method 300 is described in terms of a CMP substantially similar to CMP 100 .
- One of skill in the art will appreciate that method 300 may be performed by other configurations of CMPs and still fall within the scope of the present disclosure.
- Prior to the first operation of method 300, CMP 100 may be formed on a semiconductor wafer and undergo testing prior to packaging.
- The pre-packaging testing may be used to produce performance/reliability information 145 associated with each of the individual processor cores 140 included in CMP 100 and, in some embodiments, performance/reliability information 145 for computing submodules 149 and/or other subcircuits of each processor core 140.
- Processing for method 300 may begin in operation 301 , “receive core-level performance/reliability information.”
- Block 301 may be followed by block 302 , “store core-level performance/reliability information.”
- In operation 301, performance/reliability information 145 associated with each of the multiple processor cores 140 in CMP 100 may be received. Performance/reliability information 145 may be generated prior to packaging of the multi-core processor, during wafer-level or chip-level testing.
- In operation 302, performance/reliability information 145 may be stored so that an operating parameter of at least one of the processor cores 140 can be adjusted during operation of CMP 100.
- In some embodiments, storing performance/reliability information 145 may include recording the performance or reliability information either to a single off-chip registry 102 or to multiple on-chip registries 143.
- In other embodiments, a single on-chip registry may be used to store performance/reliability information 145.
- In still other embodiments, storing performance/reliability information 145 may include storing it in a database that is accessible by PMU 110, for example via OS 190 upon start-up of PMU 110.
- In some embodiments, adjusting an operating parameter of at least one of processor cores 140 may include programming PMU 110 to determine one of a power rating, a frequency, and an operating voltage of at least one of the processor cores 140 of CMP 100 based on performance/reliability information 145 previously stored in operation 302.
- PMU 110 can be configured to use performance/reliability information 145 to optimize a DVFS procedure for reducing power use or increasing processing performance. Specifically, when CMP 100 is determined to have a light processing load, PMU 110 can assign tasks to processor cores 140 known to have lower power consumption.
- FIG. 4 sets forth a flowchart summarizing an example method 400 for managing frequency and/or voltage provided to processing cores in a chip multiprocessor, in accordance with at least some embodiments of the disclosure.
- Method 400 may include one or more operations, functions or actions as illustrated by one or more of blocks 401 , 402 , and/or 403 . Although the blocks are illustrated in a sequential order, these blocks may also be performed in parallel, and/or in a different order than those described herein. Also, the various blocks may be combined into fewer blocks, divided into additional blocks, and/or eliminated based upon the desired implementation.
- method 400 is described in terms of a CMP substantially similar to CMP 100 . In light of the present disclosure it is appreciated that method 400 may be performed by other configurations of CMP, which fall within the scope of the present disclosure.
- Performance/reliability information 145 associated with each of the processor cores 140 may be collected during pre-packaging testing of CMP 100.
- Processing for method 400 may begin in operation 401 , “determine computational requirements for task.” Operation 401 may be followed by operation 402 , “adjust operating parameter of processor core.” Operation 402 may be followed by operation 403 , “select processor core.”
- In operation 401, the computational requirements for a task to be completed by CMP 100 may be determined.
- For example, by analyzing the execution instructions of the task, the task may be determined to be a low-demand or a high-demand task.
- In some embodiments, the performance of each processor core 140 in executing the task can be estimated based on the make-up of the execution instructions and on the unique operating characteristics of each processor core 140.
- In some embodiments, the computational requirements may be determined in operation 401 by PMU 110; in other embodiments, they may be determined by OS 190.
- In operation 402, an operating parameter of one or more of processor cores 140 may be adjusted based on the computational requirements determined in operation 401 and on the stored performance/reliability information 145 associated with each of the processor cores.
- Adjusting operating parameters in operation 402 may include optimizing power use and/or clock frequency of one or more of processor cores 140.
- In operation 403, one or more processor cores 140 may be selected based on performance/reliability information 145.
- The selected processor core or cores 140 may then be prevented from performing the task. For example, if it is determined in operation 401 that the execution instructions represent a low-demand task, processor cores having high power use, as indicated by performance/reliability information 145, may not be used to execute some or all of the tasks, thereby minimizing power usage of CMP 100.
- FIG. 5 is a block diagram of an illustrative embodiment of a computer program product 500 for implementing a method for manufacturing a CMP having multiple processor cores, arranged in accordance with at least some embodiments of the present disclosure.
- Computer program product 500 may include a signal bearing medium 504.
- Signal bearing medium 504 may include one or more sets of executable instructions 502 that, when executed by, for example, a processor of a computing device, may provide at least the functionality described above with respect to FIG. 3.
- Signal bearing medium 504 may encompass a non-transitory computer readable medium 508, such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, memory, etc.
- Signal bearing medium 504 may encompass a recordable medium 510, such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, etc.
- Signal bearing medium 504 may encompass a communications medium 506, such as, but not limited to, a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
- Computer program product 500 may be recorded on non-transitory computer readable medium 508 or another similar recordable medium 510.
- FIG. 6 is a block diagram illustrating an example computing device 600 that is arranged for manufacturing a chip multiprocessor having multiple processor cores, according to at least some embodiments of the present disclosure.
- In a very basic configuration 602, computing device 600 typically includes one or more processors 604 and a system memory 606.
- A memory bus 608 may be used for communicating between processor 604 and system memory 606.
- Processor 604 may be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof.
- Processor 604 may include one or more levels of caching, such as a level one cache 610 and a level two cache 612, a processor core 614, and registers 616.
- An example processor core 614 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof.
- An example memory controller 618 may also be used with processor 604, or in some implementations memory controller 618 may be an internal part of processor 604.
- System memory 606 may be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof.
- System memory 606 may include an operating system 620, one or more applications 622, and program data 624.
- Application 622 may include an algorithm 626 that is arranged to manage the operating frequencies and voltages assigned to processor cores in a chip multiprocessor, as described with respect to method 300 of FIG. 3 and/or method 400 of FIG. 4.
- Program data 624 may include performance/reliability data 628 that may be useful for operation with algorithm 626 as described herein.
- Application 622 may be arranged to operate with program data 624 on operating system 620. This described basic configuration 602 is illustrated in FIG. 6 by those components within the inner dashed line.
- Computing device 600 may have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 602 and any required devices and interfaces.
- A bus/interface controller 640 may be used to facilitate communications between basic configuration 602 and one or more data storage devices 650 via a storage interface bus 641.
- Data storage devices 650 may be removable storage devices 651, non-removable storage devices 652, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives, to name a few.
- Example computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
- Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which may be used to store the desired information and which may be accessed by computing device 600. Any such computer storage media may be part of computing device 600.
- Computing device 600 may also include an interface bus 642 for facilitating communication from various interface devices (e.g., output devices 660, peripheral interfaces 670, and communication devices 680) to basic configuration 602 via bus/interface controller 640.
- Example output devices 660 include a graphics processing unit 661 and an audio processing unit 662, which may be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 663.
- Example peripheral interfaces 670 include a serial interface controller 671 or a parallel interface controller 672, which may be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 673.
- An example communication device 680 includes a network controller 681, which may be arranged to facilitate communications with one or more other computing devices 690 over a network communication link, such as, without limitation, optical fiber, Long Term Evolution (LTE), 3G, WiMax, via one or more communication ports 682.
- The network communication link may be one example of a communication medium.
- Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.
- a “modulated data signal” may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media.
- the term computer readable media as used herein may include both storage media and communication media.
- Computing device 600 may be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions.
- Computing device 600 may also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
- Embodiments of the disclosure provide systems and methods for providing core-level performance or reliability information of processor cores to a CMP containing the processor cores. Incorporating core-level performance or reliability information into the multicore management process may allow the multicore manager of a CMP to use a wider operating range for each core. This may enhance overall processing power for high-demand tasks and lower overall power usage for low-demand tasks. Furthermore, these improvements may be made with no change to existing chip designs or processes.
- the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
- Examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
- a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities).
- a typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.
- any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable”, to each other to achieve the desired functionality.
- Examples of operably couplable components include, but are not limited to, physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
Abstract
Description
- Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
- In keeping with Moore's Law, the number of transistors that can be practicably incorporated into an integrated circuit has doubled approximately every two years. This trend has continued for more than half a century and is expected to continue until at least 2015 or 2020. However, simply adding more transistors to a single-threaded processor no longer produces a significantly faster processor. Instead, increased system performance has been attained by integrating multiple processor cores on a single chip to create a chip multiprocessor and sharing processes among the multiple processor cores of the chip multiprocessor. Furthermore, the multiple processor cores of a chip multiprocessor can share other common system components, which may facilitate the manufacture of a system that is lower in cost and smaller in size compared to multiple single-core processors that collectively may have the same processing performance.
- In accordance with at least some embodiments of the present disclosure, a method for manufacturing a chip multiprocessor having multiple processor cores is generally described. Example methods described herein may include receiving performance or reliability information associated with each of the multiple processor cores, wherein the received performance or reliability information can be determined prior to packaging of the chip multiprocessor. Some described methods may also include storing the received performance or reliability information such that it can be retrieved and used to adjust one or more operating parameters of one or more of the multiple processor cores of the chip multiprocessor.
- In accordance with at least some embodiments of the present disclosure, a method is described for managing the operating frequencies and voltages assigned to processor cores in a chip multiprocessor. Some example methods may include determining computational requirements for a task to be completed by the chip multiprocessor. Based on the determined computational requirements and on stored performance or reliability information associated with each of the processor cores, one or more operating parameters of at least one of the processor cores can be adjusted. In some examples, the stored performance or reliability information can be determined prior to packaging of the chip multiprocessor.
- In accordance with at least some embodiments of the present disclosure, a chip multiprocessor is described where the multiprocessor can be formed on a single die. For example, a first processor core, a second processor core, and an on-chip registry can be formed on the die. The on-chip registry can be configured to store performance or reliability information associated with the first processor core and performance or reliability information associated with the second processor core. During operation of the multiprocessor, the stored performance or reliability information can be retrieved from the on-chip registry and various operating parameters of the first and/or the second processor cores can be adjusted.
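The on-chip registry can be modeled as a per-core table that is written at test time and read during operation. The sketch below is hypothetical throughout: it assumes write-once semantics (as for a fuse map or PROM; an EEPROM or flash registry could be rewritten), the `RegistryEntry` record holds only a maximum reliable frequency and voltage for each core, and the PMU-side helper `clamp_operating_point` limits a requested operating point to those stored values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical per-core values measured before packaging."""
    max_freq_mhz: int
    max_voltage_mv: int


class OnChipRegistry:
    """Per-core store; write-once to mimic a fuse map or PROM (an assumption)."""

    def __init__(self) -> None:
        self._entries: dict[int, RegistryEntry] = {}

    def store(self, core_id: int, entry: RegistryEntry) -> None:
        if core_id in self._entries:
            raise ValueError(f"registry entry for core {core_id} already written")
        self._entries[core_id] = entry

    def read(self, core_id: int) -> RegistryEntry:
        return self._entries[core_id]


def clamp_operating_point(registry: OnChipRegistry, core_id: int,
                          freq_mhz: int, voltage_mv: int) -> tuple[int, int]:
    """Limit a requested (frequency, voltage) pair to the core's stored maxima."""
    entry = registry.read(core_id)
    return min(freq_mhz, entry.max_freq_mhz), min(voltage_mv, entry.max_voltage_mv)
```

With this arrangement the same request can yield different operating points on different cores, which is the per-core behavior the summary describes.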
- The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
- The foregoing and other features of the present disclosure will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. These drawings depict only several embodiments in accordance with the disclosure and are, therefore, not to be considered limiting of its scope. The disclosure will be described with additional specificity and detail through use of the accompanying drawings.
- FIG. 1 shows a block diagram of an example embodiment of a chip multiprocessor;
- FIG. 2 shows a block diagram of an example embodiment of a processor core;
- FIG. 3 sets forth a flowchart summarizing an example method for manufacturing a chip multiprocessor having multiple processor cores;
- FIG. 4 sets forth a flowchart summarizing an example method for managing frequency and voltage provided to processing cores in a chip multiprocessor;
- FIG. 5 is a block diagram of an illustrative embodiment of a computer program product for implementing a method for manufacturing a chip multiprocessor having multiple processor cores; and
- FIG. 6 is a block diagram illustrating an example computing device that is arranged for manufacturing a chip multiprocessor having multiple processor cores; all arranged in accordance with at least some embodiments of the present disclosure.
- In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. It will be readily understood that the aspects of the disclosure, as generally described herein, and illustrated in the Figures, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated and make part of this disclosure.
- Semiconductor chip manufacturing often includes a sequence of photographic and chemical processing steps during which electronic devices and circuits are gradually created on a wafer made of semiconducting material. The entire manufacturing process, from front-end-of-line processing of a wafer to packaging of chips formed from the wafer, can include hundreds of process steps, each of which may be subject to a certain level of random variation. Consequently, chips that have nominally undergone identical processing, even chips formed from adjacent locations on the same wafer, may have varied performance. The same design, process, and masks may be used for entire wafer runs, but at the end of the manufacturing process each chip is typically subjected to lengthy and complex testing to establish what maximum clock speed (or operating frequency) is considered reliable, so that chips can be binned and sold as different products based on the measured maximum reliable clock speed.
- Currently, manufacturers of chip multiprocessors (CMPs) typically assign a single maximum operating frequency for each processor core in a CMP, regardless of core count, and pre-set the CMP die to be clocked at that maximum operating frequency. Dynamic voltage and frequency scaling (DVFS), i.e., adjustment of the operating frequency and/or operating voltage of a microprocessor during operation, can then be used to manage power use of the CMP to conserve power, reduce heat generation, etc. For a CMP having a low number of processor cores, such an approach may be simple and effective. However, for a CMP with a higher core count, the assumption that all processor cores operate with essentially the same performance and/or reliability parameters can result in significant lost performance and/or unwanted power use by the CMP. These performance losses may occur because the large number of processor cores in a high core-count CMP generally has a wide distribution of maximum operating frequencies and other performance parameters. Therefore, setting the maximum operating frequency of a high core-count CMP to be compatible with the lowest performing processor core in the CMP may prevent the majority of processor cores from being used at their maximum performance levels. With higher core-count CMPs, the distribution of maximum operating frequencies may widen, and the lost potential performance of such CMPs may increase proportionally.
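The cost of clocking every core at the slowest core's limit can be estimated with simple arithmetic. The sketch below assumes, for illustration only, that throughput scales with clock frequency, and it uses made-up per-core maximum frequencies; it compares the aggregate clock rate of a chipwide setting (every core at the minimum) against letting each core run at its own measured maximum.

```python
def chipwide_loss(fmax_ghz: list[float]) -> tuple[float, float, float]:
    """Return (chipwide aggregate GHz, per-core aggregate GHz, fraction lost)."""
    chip_clock = min(fmax_ghz)            # every core clocked to the slowest core
    chipwide = chip_clock * len(fmax_ghz)
    per_core = sum(fmax_ghz)              # each core at its own maximum
    return chipwide, per_core, 1.0 - chipwide / per_core


# Hypothetical maximum reliable frequencies for an 8-core CMP.
fmax = [3.0, 3.2, 3.4, 3.1, 2.6, 3.3, 3.5, 3.2]
chipwide, per_core, lost = chipwide_loss(fmax)
print(f"chipwide {chipwide:.1f} GHz, per-core {per_core:.1f} GHz, lost {lost:.1%}")
```

With these invented figures a single slow core (2.6 GHz) leaves roughly 18% of the aggregate clock capacity unused; a wider spread across more cores would enlarge the gap, in line with the argument above.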
- Embodiments disclosed herein contemplate systems, methods and/or devices for providing core-level performance or reliability information of processor cores to a CMP containing the processor cores. The core-level performance or reliability information for each processor core may be obtained during testing of the CMP that occurs prior to packaging, such as wafer-level testing. Because such pre-packaging testing allows the collection of detailed performance and/or reliability information for each individual processor core and computing submodules of each processor core, such information may convey a more comprehensive characterization of the performance or reliability of the processor cores to a power management unit (PMU) of the CMP. Consequently, during operation the PMU may be configured to manage operating frequency and voltage of each individual processor core according to the unique characteristics thereof. Furthermore, in some embodiments, the PMU may be configured to adjust the usage, operating frequency, and/or operating voltage of the individual processor cores based on the make-up of a specific task assigned to the CMP.
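One way a PMU could use per-core data is a MaxBIPS-style search (MaxBIPS is one of the DVFS algorithms this description names for PMU 110): each core gets its own list of reliable (Vdd, frequency) modes, taken from its pre-packaging characterization, and the PMU picks the combination that maximizes predicted total throughput under a power budget. The prediction formulas below are an assumption (dynamic power scaling with Vdd^2 * f and throughput with f, as for a compute-bound workload), the function names are hypothetical, and the brute-force enumeration is only practical for a handful of cores.

```python
from itertools import product
from typing import Optional

Mode = tuple[float, float]  # (vdd_volts, freq_ghz)


def predict(power_w: float, bips: float, cur: Mode, new: Mode) -> tuple[float, float]:
    """Assumed scaling of power and BIPS from the current mode to a candidate mode."""
    (v_a, f_a), (v_b, f_b) = cur, new
    return power_w * (v_b / v_a) ** 2 * (f_b / f_a), bips * f_b / f_a


def max_bips(observations: list[tuple[Mode, float, float]],
             modes_per_core: list[list[Mode]],
             budget_w: float) -> tuple[Optional[tuple[Mode, ...]], float]:
    """observations[i] = (current mode, measured power W, measured BIPS) for core i.

    modes_per_core[i] lists only the modes core i can reliably run, so cores
    with different test results get different options. Returns (None, -inf)
    when no combination fits the budget.
    """
    best: Optional[tuple[Mode, ...]] = None
    best_bips = float("-inf")
    for choice in product(*modes_per_core):          # one mode per core
        total_power = total_bips = 0.0
        for (cur, power, bips), new in zip(observations, choice):
            p, b = predict(power, bips, cur, new)
            total_power += p
            total_bips += b
        if total_power <= budget_w and total_bips > best_bips:
            best, best_bips = choice, total_bips
    return best, best_bips
```

Restricting each core to its own measured modes, rather than a chipwide list, is where the core-level characterization data enters; the search itself is unchanged.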
- FIG. 1 shows a block diagram of an example embodiment of a chip multiprocessor (CMP) 100, arranged in accordance with at least some embodiments of the present disclosure. CMP 100 is a multi-core processor formed from a single integrated circuit die that can be configured to carry out parallel processing tasks (e.g., process multiple threads) using multiple processor cores formed on the die. CMP 100 may include a power management unit (PMU) 110 and multiple processor cores 140. In addition, CMP 100 may be coupled to a global queue 120 and a dispatcher 130.
- PMU 110 may act as the global controller or multicore manager for CMP 100, and may be configured to adjust the working voltage and/or frequency levels of each of the multiple processor cores 140. PMU 110 may be substantially similar in organization and operation to existing PMUs that are configured for CMPs. In some embodiments, PMU 110 may be configured to perform DVFS with respect to multiple processor cores 140. Accordingly, PMU 110 may include multiple voltage control devices, each configured to independently adjust the operating voltage applied to each of processor cores 140 as desired. In some examples, PMU 110 may include a DC/DC controller and multiple DC/DC converters. PMU 110 may also include a clock source unit or some other frequency control device that can be configured to dynamically adjust the clock signal provided to each processor core 140 as desired. In the embodiment illustrated in FIG. 1, PMU 110 is disposed on-chip with CMP 100 and therefore may be a component of CMP 100. In other embodiments, PMU 110 may be disposed off-chip from CMP 100.
- Several power management approaches have been developed that can be used by PMU 110, covering a wide spectrum of system characteristics, including: high-level operating-system-driven policies, response to predicted usage, dynamic management of processor resources according to activity demands, dynamic scheduling of tasks to processors in a chip multiprocessor (CMP) environment, and hardware techniques for DVFS. Other approaches include adaptive body biasing (ABB) and adaptive supply voltage (ASV) implementations.
- According to some embodiments, the hardware actuators available for such power management include: joint voltage and frequency scaling, frequency scaling, and microarchitectural switches, e.g., instruction fetch throttling.
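To see why joint voltage and frequency scaling is usually the stronger of these actuators, recall the standard approximation that dynamic CMOS power scales with Vdd^2 * f. The sketch below ignores leakage and assumes a compute-bound task whose run time scales with 1/f; it compares scaling frequency alone with scaling voltage and frequency together. Instruction fetch throttling is not modeled, and the function name is hypothetical.

```python
def scaled_power_and_energy(v_scale: float, f_scale: float) -> tuple[float, float]:
    """Relative dynamic power and energy per task versus the unscaled baseline.

    Assumes P_dyn ~ C * Vdd^2 * f and run time ~ 1/f (compute-bound, no leakage).
    """
    power = v_scale ** 2 * f_scale
    energy = power / f_scale          # power * time, with time scaled by 1/f
    return power, energy


print(scaled_power_and_energy(1.0, 0.8))  # frequency scaling only
print(scaled_power_and_energy(0.8, 0.8))  # joint voltage and frequency scaling
```

Under these assumptions, frequency-only scaling to 80% cuts power to 80% but leaves energy per task unchanged, whereas scaling both voltage and frequency by 0.8 cuts power to about 51% and energy per task to about 64%.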
PMU 110 queries the performance, capabilities, and power of all components at regular time intervals and decides how to best control the available actuators of each component in order to comply with a given power management policy, e.g., a fixed power budget. Algorithms for implementing policies through hardware actuators exist for a single-core and can be used here onCMP 100 thanks to the individual core-level characterization data, i.e., performance/reliability information 145. Two examples of such implementation algorithms suitable for use byPMU 110 for DVFS are MaxBIPS and LinOpt. - The MaxBIPS algorithm assumes a set of discrete power modes (Vdd-frequency pairs), which PMU 110 can control for each of
processor cores 140 individually. The goal of such an algorithm is to maximize the overall performance ofCMP 100, as measured by the total number of completed instructions by all ofprocessor cores 140 per time period, under a given power budget. The MaxBIPS algorithm relies on the fact that when a given core switches from power mode A (VddA, freqA) in observation window N to power mode B (VddB, freqB) in observation window N+1, the future performance and power is predictable using simple formulas. LinOpt uses linear programming to find the best voltage and frequency levels for each of the cores in the CMP. Previously these formulas have used chipwide estimations of available performance windows. According to some embodiments, these algorithms may be modified to use data such as that provided by the on-chip registries 143 (described below), collected at the core level beforeprocessor cores 140 onCMP 100 are packaged. - The performance of an application running on one of
processor cores 140, measured in instructions per cycle (IPC), depends on how much time the application spends doing computations versus time spent waiting for memory accesses. The frequency of aparticular processor core 140 directly affects computation speed but has little influence on the memory latency associated with theprocessor core 140. Therefore, the performance of computationally intensive applications is more sensitive to voltage and frequency scaling than that of memory-bound applications. Thus policies forprocessor cores 140 may be related to what tasks they are running andPMU 110 may have policies delivered by a higher level manager in the architecture such as a computer operating system (OS) 190 or virtual machine manager, which may associate different policies withdifferent processor cores 140 ortasks 101. - According to some embodiments, an observation window used by
PMU 110 may be varied. Varying the PMU observation window, when the PM algorithm is run, between 100 μs and 500 μs had only little effect during experiments, so the PMU observation time can be changed with the number of active cores. Additionally, in some embodiments asingle PMU 110 can be used for alarge CMP 100 because the single PMU can divide observation ofmultiple processor cores 140 into a series of samples taken at different periods in time.PMU 110 may observe all ofprocessor cores 140 at once, or only observe a subgroup of themultiple processor cores 140 at one time and observer different subgroups of processor cores at different times. According to some embodiments,CMP 100 may includemultiple PMUs 110, each of which may be configured to manage a subdomain ofCMP 100. For example, in one such embodiment, fourPMUs 110 may each manage roughly one quarter of theCMP 100 as a whole. In such embodiments,multiple PMUs 110 may communicate with each other to cooperatively manage global policies, or, alternatively, eachPMU 110 may receive policy directed toward its subdomain from a software manager such as a virtual machine manager orOS 190. - Some control systems may find it easier to optimize a reduced number of controls, in which case the subdomain system described above may be used with subdomains grouped by performance/
reliability information 145 related toprocessor cores 140, for example by grouping togetherprocessor cores 140 having similar performance/reliability information 145. In some embodiments, such grouping may be defined at the time of pre-package testing and stored in the manner described here for performance/reliability information 145. In other embodiments, such grouping may be performed later in order to simplify management. The grouping may also be used bydispatcher 130, for example by dispersing a set oftasks 101 toprocessor cores 140 identified by the performance/reliability information 145 to run at lower power than the others. -
Global queue 120 may be configured to receive and store incoming tasks fromOS 190.Dispatcher 130 is a scheduler module that can be configured to periodically assigntasks 101 inglobal queue 120 to each ofprocessor cores 140. The functions ofdispatcher 130 may be distributed betweenOS 190 andCMP 100, but for clarity is illustrated as a single element inFIG. 1 . In some embodiments, on-chip registries 143 may be located in each of processingcores 140 or otherwise associated respectively withprocessing cores 140. For example, in some such embodiments, on-chip registries may be on the same chip as processingcores 140, but not physically part of theprocessing cores 140 as illustrated inFIG. 1 . In other embodiments, an off-chip registry 102 (also shown inFIG. 1 ) may be disposed external toCMP 100. -
FIG. 2 shows a block diagram of an example embodiment of one ofprocessor cores 140, arranged in accordance with at least some embodiments of the present disclosure.Processor core 140 may include alocal queue 141,processor circuitry 142, and an on-chip registry 143.Local queue 141 may be configured to receive andstore tasks 101 that are assigned tocore 140 bydispatcher 130.Processor circuitry 142 may include various computingsubmodules 149 ofprocessor core 140 that can be configured to perform the tasks stored inlocal queue 141. Examples ofsuch computing submodules 149 inprocessor circuitry 142 may include shifters, adders, cache, memory communications units, bus-processing units, network interfaces, floating point units, arithmetic units, specialty operations units, and the like. On-chip registry 143 is a registry that can be associated withprocessor core 140 that may be formed as part of the integrated circuit making upprocessor core 140. In some embodiments, on-chip registry 143 may be configured to store performance/reliability information 145 related toprocessor core 140, where the storedinformation 145 can be retrieved by or provided toPMU 110. On-chip registry 143 may be any technically feasible manifestation of registry, including read-only memory (ROM), Programmable ROM (PROM), erasable PROM (EPROM, electrically erasable PROM (EEPROM), a fuse map, flash memory, and the like. - Performance/
reliability information 145 may include metrology data that can be measured on processor core 140 prior to packaging of CMP 100, e.g., during a wafer-level test process. Because performance/reliability information 145 may be collected at the wafer level or on diced chips prior to packaging of CMP 100, more test contacts are available than for a packaged die, in some examples up to four times as many. In addition, testing of computer chips prior to packaging may be performed at a controlled temperature and with sophisticated test signals and test equipment, and can establish the maximum clock frequency of an integrated circuit at a predetermined reliability, as well as other performance parameters. Because of the additional test contacts available before the die is packaged, the integrated circuits available for such testing may include each individual processor core 140 and/or each of the computing submodules 149 of each processor core 140. Consequently, in some embodiments, performance/reliability information 145 can include highly detailed core-level and/or submodule-level information that enables PMU 110 to optimize the operating frequency and/or power use of CMP 100 during operation. - In some embodiments, performance/
reliability information 145 may include maximum operating voltage and/or clock frequency values associated with reliable operation of the associated processor core 140. In some embodiments, performance/reliability information 145 may include an average power consumption value and/or a peak power consumption value for each of processor cores 140. In some embodiments, performance/reliability information 145 may further include leakage rate and/or other performance metrics measured across the operating range of the associated processor core 140. In some embodiments, performance/reliability information 145 may take the form of slope and intercept values for generating a function that represents a specific behavior of the processor core 140 of interest, e.g., frequency vs. power use. Alternatively, in some embodiments, performance/reliability information 145 may take the form of multiple data points from which a best-fit curve can be constructed that represents a specific behavior across the operating range of the processor core 140 of interest. In such embodiments, the data points may correspond to performance characteristics measured during testing prior to packaging. Thus, during operation of CMP 100, the unique performance characteristics of each processor core 140 can be provided to PMU 110, facilitating optimal power use and/or frequency on a per-core basis. - In some embodiments, performance/
reliability information 145 may include performance or reliability information for one or more of the computing submodules 149 and/or other sub-circuits in processor core 140. Specifically, performance/reliability information 145 may include power use, leakage current, etc., for each of the computing submodules and/or other sub-circuits in processor core 140, where such information may include individual values or functions defined across the operating range of the processor core. In this way, the unique operating characteristics of each processor core 140 in CMP 100 can be determined, and PMU 110 can tailor the use of each processor core 140 based on those operating characteristics. - For example, in some embodiments, in order to optimize power usage required for a specific task,
PMU 110 may be configured to provide input data to dispatcher 130 so that the task can be assigned from global queue 110 to a specific processor core 140. The assignment of the task may be based on the execution instructions contained in the task and on the unique operating characteristics of the processor core 140 being assigned the task. In such embodiments, performance/reliability information 145 may include a frequency slope value and an intercept value for power use for each computing submodule 149 in each processor core 140. A table of such power-use information can be generated (e.g., by PMU 110, OS 190, or dispatcher 130) so that the task, once compiled by OS 190, can be weighted by the categories of instructions performed by each computing submodule and subcircuit of a processor core. In this way, an effective metric may be produced whereby the performance of each processor core 140 on the task can be estimated, and the compiled task can be assigned to the processor core 140 determined to be most suitable for completing it. In a simple example, given a task with a large number of multiplies, the processor core 140 having the lowest power usage for the multiply operation would be selected to execute the task. In some embodiments, the selection of the processor core may be performed by PMU 110. In other embodiments, the selection may be performed by OS 190 or dispatcher 130. - In operation,
CMP 100 may be configured to receive tasks 101 from OS 190 via dispatcher 130. PMU 110 may be configured to operate as the global controller for CMP 100 such that PMU 110 can set the working voltage and frequency levels of each of the multiple processor cores 140. In some embodiments, PMU 110 may be configured to receive performance/reliability information 145 from on-chip registries 143 located in each of processor cores 140. In other embodiments, performance/reliability information 145 may be stored in a single off-chip registry 102. In such embodiments, performance/reliability information 145 may be stored in a remote database that can be accessed by OS 190 (e.g., during an initial boot-up of CMP 100). For example, the remote database may be accessible via the Internet such that CMP 100 or OS 190 can automatically retrieve performance/reliability information 145 from it using a unique identifier code associated with CMP 100. To address privacy concerns, the unique identifier code may be deleted or blocked by OS 190 after performance/reliability information 145 is received by CMP 100. Alternatively, a non-unique code may be included in a chip ID for CMP 100, which the remote database can use to algorithmically reproduce performance/reliability information 145. Other means may also be used to provide performance/reliability information 145 to CMP 100 prior to normal operation. - In embodiments in which
CMP 100 is configured for DVFS, PMU 110 may be configured to dynamically vary the working voltage and/or frequency level of each of the multiple processor cores 140. Although DVFS of multiple cores in a chip multiprocessor is known, the effectiveness of DVFS can be significantly improved by various embodiments described in this disclosure, since per-core, pre-packaging test information (i.e., performance/reliability information 145) may be used to provide a wider reliable operating range for the majority of processor cores 140 in CMP 100 than is typically available through conventional techniques. - It is noted that information equivalent to performance/
reliability information 145 cannot typically be measured after CMP 100 has been packaged. While some core-level performance characteristics can be measured indirectly or estimated using “torture tests” on packaged CMPs, or by testing CMPs with only one core activated at a time, such results are inherently less accurate than those described in the present disclosure. In addition, such indirect measurements do not resolve computing submodule or subcircuit performance, are time-consuming, and inherently conflate multiple performance parameters. Thus, such conventional measurements do not provide information comparable to the core-level information generated by testing CMPs prior to packaging. -
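The slope/intercept and best-fit forms of performance/reliability information 145 described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a core's power use is linear in clock frequency and that pre-packaging test yields a few (frequency, power) data points per core; the record layout and the names `CoreRecord` and `fit_line` are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of power_w = slope * freq_ghz + intercept.

    `points` are hypothetical (freq_ghz, power_w) pairs measured during
    pre-packaging test of one processor core.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_f = sum(f for f, _ in points) / n
    mean_p = sum(p for _, p in points) / n
    sxx = sum((f - mean_f) ** 2 for f, _ in points)
    if sxx == 0:
        raise ValueError("data points must span more than one frequency")
    sxy = sum((f - mean_f) * (p - mean_p) for f, p in points)
    slope = sxy / sxx
    return slope, mean_p - slope * mean_f


@dataclass
class CoreRecord:
    """Hypothetical per-core record in the spirit of information 145."""
    core_id: int
    max_freq_ghz: float  # highest frequency that passed reliability testing
    slope_w_per_ghz: float
    intercept_w: float

    def power_at(self, freq_ghz: float) -> float:
        """Estimate power (W) at freq_ghz from the stored linear model."""
        if freq_ghz > self.max_freq_ghz:
            raise ValueError("frequency exceeds this core's tested maximum")
        return self.slope_w_per_ghz * freq_ghz + self.intercept_w


# Usage: fit three hypothetical test points, then query the stored model.
slope, intercept = fit_line([(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])
core0 = CoreRecord(core_id=0, max_freq_ghz=3.2,
                   slope_w_per_ghz=slope, intercept_w=intercept)
print(slope, intercept)     # 2.0 1.0
print(core0.power_at(2.5))  # 6.0
```

Storing only the two fitted coefficients keeps each core's record small enough for an on-chip registry, while the raw data points could be kept instead where a straight line is too coarse.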
FIG. 3 sets forth a flowchart summarizing an example method 300 for manufacturing a CMP having multiple processor cores, in accordance with at least some embodiments of the present disclosure. Method 300 may include one or more operations, functions or actions as illustrated by one or more of blocks 301 and/or 302. Although the blocks are illustrated in a sequential order, these blocks may also be performed in parallel, and/or in a different order than those described herein. Also, the various blocks may be combined into fewer blocks, divided into additional blocks, and/or eliminated based upon the desired implementation. - For ease of description,
method 300 is described in terms of a CMP substantially similar to CMP 100. One of skill in the art will appreciate that method 300 may be performed by other CMP configurations that still fall within the scope of the present disclosure. Prior to the first operation of method 300, CMP 100 may be formed on a semiconductor wafer and undergo testing prior to packaging. - The pre-packaging testing may be utilized to produce performance/
reliability information 145 associated with each of the individual processor cores 140 included in CMP 100 and, in some embodiments, performance/reliability information 145 for computing submodules 149 and/or other subcircuits of each processor core 140. - Processing for
method 300 may begin in operation 301, “receive core-level performance/reliability information.” Block 301 may be followed by block 302, “store core-level performance/reliability information.” - In
operation 301, performance/reliability information 145 associated with each of the multiple processor cores 140 in CMP 100 may be received. It is noted that performance/reliability information 145 may be generated prior to packaging of the multi-core processor, during wafer-level or chip-level testing. - In
operation 302, performance/reliability information 145 may be stored so that an operating parameter of at least one of the processor cores 140 can be adjusted during operation of CMP 100. In some embodiments, storing performance/reliability information 145 may include recording the performance or reliability information either to a single off-chip registry 102 or to multiple on-chip registries 143. In other embodiments, a single on-chip registry may be used to store performance/reliability information 145. In yet other embodiments, storing performance/reliability information 145 may include storing it in a database accessible to PMU 110, e.g., via OS 190 upon start-up of PMU 110. - In some embodiments, adjusting an operating parameter of at least one of
processor cores 140 may include programming PMU 110 to determine at least one of a power rating, a frequency, and an operating voltage of at least one of the processor cores 140 of CMP 100 based on performance/reliability information 145 previously stored in operation 302. For example, PMU 110 can be configured to use performance/reliability information 145 to optimize a DVFS procedure for reducing power use or increasing processing performance. Specifically, when CMP 100 is determined to have a light processing load, PMU 110 can assign tasks to processor cores 140 known to have lower power consumption. Core-to-core differences in power consumption at the same computational performance have been reported in the literature to be about 20% or more, a power savings that can be realized when PMU 110 has the appropriate performance/reliability information 145 available. Similarly, when CMP 100 has a heavy processing load, PMU 110 can direct tasks to processor cores 140 known to have higher computational performance. In addition, because PMU 110 has detailed information regarding the performance or reliability of each of processor cores 140, PMU 110 can operate each processor core 140 at approximately its own peak frequency, rather than operating all processor cores 140 at a single, nominal peak frequency for CMP 100. As chip multiprocessors are designed with larger numbers of processor cores, operating each processor core 140 at an individually measured peak frequency, as described herein, can significantly improve the overall computational performance of CMP 100, e.g., by up to about 100%. -
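The contrast between a single nominal chip-wide frequency and per-core peak frequencies, together with the light-load/heavy-load assignment policy, can be sketched in Python. This is a simplified illustration under stated assumptions: the chip-wide nominal frequency is assumed to be limited by the slowest core's tested maximum, the sum of clock rates is used as a crude throughput proxy, and all core figures and the names `Core`, `aggregate_clock_ghz`, and `pick_cores` are hypothetical.

```python
from typing import List, NamedTuple


class Core(NamedTuple):
    core_id: int
    max_freq_ghz: float  # hypothetical tested peak frequency
    avg_power_w: float   # hypothetical average power from pre-packaging test


def aggregate_clock_ghz(cores: List[Core], per_core: bool) -> float:
    """Crude throughput proxy: sum of the clock rates of all cores.

    With per_core=False every core runs at one nominal frequency, assumed
    here to be the slowest core's tested maximum so that all cores are safe.
    """
    if per_core:
        return sum(c.max_freq_ghz for c in cores)
    nominal = min(c.max_freq_ghz for c in cores)
    return nominal * len(cores)


def pick_cores(cores: List[Core], count: int, light_load: bool) -> List[int]:
    """Light load: prefer low-power cores. Heavy load: prefer fast cores."""
    if light_load:
        ranked = sorted(cores, key=lambda c: c.avg_power_w)
    else:
        ranked = sorted(cores, key=lambda c: c.max_freq_ghz, reverse=True)
    return [c.core_id for c in ranked[:count]]


cores = [Core(0, 3.0, 10.0), Core(1, 3.5, 12.0),
         Core(2, 3.75, 9.0), Core(3, 4.0, 15.0)]
print(aggregate_clock_ghz(cores, per_core=False))  # 12.0
print(aggregate_clock_ghz(cores, per_core=True))   # 14.25
print(pick_cores(cores, 2, light_load=True))       # [2, 0]
print(pick_cores(cores, 2, light_load=False))      # [3, 2]
```

In this hypothetical four-core example, per-core frequencies give 14.25 GHz of aggregate clock versus 12.0 GHz at the common nominal frequency, roughly 19% more; any real gain would depend on the measured spread between cores.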
FIG. 4 sets forth a flowchart summarizing an example method 400 for managing the frequency and/or voltage provided to processor cores in a chip multiprocessor, in accordance with at least some embodiments of the disclosure. Method 400 may include one or more operations, functions or actions as illustrated by one or more of blocks 401, 402, and/or 403. - For ease of description,
method 400 is described in terms of a CMP substantially similar to CMP 100. In light of the present disclosure, it will be appreciated that method 400 may be performed by other CMP configurations, which also fall within the scope of the present disclosure. Prior to the first operation of method 400, performance/reliability information 145 associated with each of the processor cores 140 may be collected during pre-packaging testing of CMP 100. - Processing for
method 400 may begin in operation 401, “determine computational requirements for task.” Operation 401 may be followed by operation 402, “adjust operating parameter of processor core.” Operation 402 may be followed by operation 403, “select processor core.” - In
operation 401, the computational requirements for a task to be completed by CMP 100 may be determined. For example, the task may be classified as low-demand or high-demand by analyzing its execution instructions. In some embodiments, the performance of each processor core 140 in executing the task can be estimated based on the make-up of the execution instructions and on the unique operating characteristics of each processor core 140. In some embodiments, the computational requirements may be determined in operation 401 by PMU 110 and, in other embodiments, by OS 190. - In
operation 402, an operating parameter of one or more of processor cores 140 may be adjusted based on the computational requirements determined in operation 401 and on the stored performance/reliability information 145 associated with each of the processor cores. In some embodiments, adjusting operating parameters in operation 402 may include optimizing the power use and/or clock frequency of one or more of processor cores 140. - In
optional operation 403, one or more processor cores 140 may be selected based on performance/reliability information 145, and the selected processor core or cores 140 may then be prevented from performing the task. For example, if it is determined in operation 401 that the execution instructions represent a low-demand task, processor cores having high power use, as indicated by performance/reliability information 145, may not be used to execute some or all of the tasks, thereby reducing the power usage of CMP 100. -
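Operations 401 and 403, combined with the submodule-weighted core selection described earlier for dispatcher 130, might be sketched as follows. The sketch assumes, hypothetically, that a task is represented as a list of instruction categories, that per-core energy-per-instruction figures for each computing submodule are derivable from performance/reliability information 145, and that a task counts as low-demand when it has fewer instructions than a chosen threshold. The tables, threshold, power cap, and function names are illustrative only, and operation 402 (adjusting voltage or frequency) is omitted.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical energy per instruction (nJ) for each computing submodule of each core.
ENERGY_NJ: Dict[int, Dict[str, float]] = {
    0: {"alu": 1.0, "fpu": 4.0, "mul": 3.0},
    1: {"alu": 1.2, "fpu": 3.0, "mul": 2.0},
    2: {"alu": 0.9, "fpu": 5.0, "mul": 4.0},
}
# Hypothetical average power (W) per core from pre-packaging test.
AVG_POWER_W: Dict[int, float] = {0: 10.0, 1: 14.0, 2: 8.0}


def task_energy(core_table: Dict[str, float], mix: Counter) -> float:
    """Weight the task's instruction mix by one core's per-submodule energy."""
    return sum(core_table[cat] * n for cat, n in mix.items())


def select_core(task: List[str], low_demand_threshold: int = 1000,
                power_cap_w: Optional[float] = None) -> int:
    mix = Counter(task)  # operation 401: analyze the instruction make-up
    candidates = dict(ENERGY_NJ)
    low_demand = sum(mix.values()) < low_demand_threshold
    if low_demand and power_cap_w is not None:
        # Operation 403: keep high-power cores away from low-demand tasks.
        kept = {c: t for c, t in candidates.items()
                if AVG_POWER_W[c] <= power_cap_w}
        candidates = kept or candidates  # fall back if every core was excluded
    # Choose the remaining core with the lowest estimated energy for this mix.
    return min(candidates, key=lambda c: task_energy(candidates[c], mix))


task = ["mul"] * 6 + ["alu"] * 2
print(select_core(task))                    # 1
print(select_core(task, power_cap_w=12.0))  # 0
```

Without a power cap, core 1 is chosen because its multiplier uses the least energy for this multiply-heavy mix; with the cap, core 1's higher average power excludes it from the low-demand task and core 0 is chosen instead.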
FIG. 5 is a block diagram of an illustrative embodiment of a computer program product 500 for implementing a method for manufacturing a CMP having multiple processor cores, arranged in accordance with at least some embodiments of the present disclosure. Computer program product 500 may include a signal bearing medium 504. Signal bearing medium 504 may include one or more sets of executable instructions 502 that, when executed by, for example, a processor of a computing device, may provide at least the functionality described above with respect to FIG. 3. - In some implementations, signal bearing medium 504 may encompass a non-transitory computer
readable medium 508, such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, memory, etc. In some implementations, signal bearing medium 504 may encompass a recordable medium 510, such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, etc. In some implementations, signal bearing medium 504 may encompass a communications medium 506, such as, but not limited to, a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.). Computer program product 500 may be recorded on non-transitory computer readable medium 508 or another similar recordable medium 510. -
FIG. 6 is a block diagram illustrating an example computing device 600 that is arranged for manufacturing a chip multiprocessor having multiple processor cores, according to at least some embodiments of the present disclosure. In a very basic configuration 602, computing device 600 typically includes one or more processors 604 and a system memory 606. A memory bus 608 may be used for communicating between processor 604 and system memory 606. - Depending on the desired configuration,
processor 604 may be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 604 may include one or more levels of caching, such as a level one cache 610 and a level two cache 612, a processor core 614, and registers 616. An example processor core 614 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof. An example memory controller 618 may also be used with processor 604, or in some implementations memory controller 618 may be an internal part of processor 604. - Depending on the desired configuration, system memory 606 may be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof. System memory 606 may include an
operating system 620, one or more applications 622, and program data 624. Application 622 may include an algorithm 626 that is arranged to manage the operating frequencies and voltages assigned to processor cores in a chip multiprocessor, as described with respect to method 300 of FIG. 3 and/or method 400 of FIG. 4. Program data 624 may include performance/reliability data 628 that may be useful for operation with algorithm 626 as described herein. In some embodiments, application 622 may be arranged to operate with program data 624 on operating system 620. This described basic configuration 602 is illustrated in FIG. 6 by those components within the inner dashed line. -
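As one illustration of how an algorithm such as algorithm 626 might carry out the per-core adjustment of operation 402, the Python sketch below picks a voltage/frequency operating point for each core. It assumes, hypothetically, that performance/reliability data 628 holds a small table of (voltage, frequency) points that each core passed during pre-packaging test; the table contents, the `TESTED_POINTS` layout, and `choose_operating_point` are illustrative and not taken from the disclosure.

```python
from typing import Dict, List, Tuple

OperatingPoint = Tuple[float, float]  # (supply voltage in V, clock in GHz)

# Hypothetical tested operating points for two cores of the same CMP.
TESTED_POINTS: Dict[int, List[OperatingPoint]] = {
    0: [(0.8, 1.5), (0.9, 2.5), (1.0, 3.0), (1.1, 3.5)],
    1: [(0.8, 1.25), (0.9, 2.0), (1.0, 2.75), (1.1, 3.25)],
}


def choose_operating_point(points: List[OperatingPoint],
                           required_ghz: float) -> OperatingPoint:
    """Lowest-voltage tested point that meets the required frequency.

    If no tested point is fast enough, return the fastest tested point so
    the core never runs outside its own measured envelope.
    """
    if not points:
        raise ValueError("no tested operating points for this core")
    feasible = [p for p in points if p[1] >= required_ghz]
    if feasible:
        return min(feasible, key=lambda p: p[0])
    return max(points, key=lambda p: p[1])


for core_id, points in TESTED_POINTS.items():
    print(core_id, choose_operating_point(points, 2.2))
# 0 (0.9, 2.5)
# 1 (1.0, 2.75)
```

In this hypothetical table, the same 2.2 GHz demand is met at 0.9 V on core 0 but needs 1.0 V on core 1, the kind of per-core difference that a single chip-wide voltage/frequency setting cannot exploit.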
Computing device 600 may have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 602 and any required devices and interfaces. For example, a bus/interface controller 640 may be used to facilitate communications between basic configuration 602 and one or more data storage devices 650 via a storage interface bus 641. Data storage devices 650 may be removable storage devices 651, non-removable storage devices 652, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few. Example computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. - System memory 606,
removable storage devices 651 and non-removable storage devices 652 are examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which may be used to store the desired information and which may be accessed by computing device 600. Any such computer storage media may be part of computing device 600. -
Computing device 600 may also include an interface bus 642 for facilitating communication from various interface devices (e.g., output devices 660, peripheral interfaces 670, and communication devices 680) to basic configuration 602 via bus/interface controller 640. Example output devices 660 include a graphics processing unit 661 and an audio processing unit 662, which may be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 663. Example peripheral interfaces 670 include a serial interface controller 671 or a parallel interface controller 672, which may be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 673. An example communication device 680 includes a network controller 681, which may be arranged to facilitate communications with one or more other computing devices 690 over a network communication link, such as, without limitation, optical fiber, Long Term Evolution (LTE), 3G, WiMax, via one or more communication ports 682. - The network communication link may be one example of communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A “modulated data signal” may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media.
The term computer readable media as used herein may include both storage media and communication media.
-
Computing device 600 may be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Computing device 600 may also be implemented as a personal computer including both laptop computer and non-laptop computer configurations. - In sum, embodiments of the disclosure provide systems and methods for providing core-level performance or reliability information of processor cores to a CMP containing the processor cores. Incorporating core-level performance or reliability information into the multicore management process may allow the multicore manager of a CMP to use a wider operating range for each core. This may enhance overall processing power for high-demand tasks and lower overall power usage for low-demand tasks. Furthermore, it is noted that these improvements may be made with no change to existing chip designs or processes.
- There is little distinction left between hardware and software implementations of aspects of systems; the use of hardware or software is generally (but not always, in that in certain contexts the choice between hardware and software can become significant) a design choice representing cost vs. efficiency trade-offs. There are various vehicles by which processes and/or systems and/or other technologies described herein can be effected (e.g., hardware, software, and/or firmware), and the preferred vehicle will vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
- The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one embodiment, several portions of the subject matter described herein may be implemented via Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), digital signal processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one of skill in the art in light of this disclosure. In addition, those skilled in the art will appreciate that the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal bearing medium used to actually carry out the distribution.
Examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
- Those skilled in the art will recognize that it is common within the art to describe devices and/or processes in the fashion set forth herein, and thereafter use engineering practices to integrate such described devices and/or processes into data processing systems. That is, at least a portion of the devices and/or processes described herein can be integrated into a data processing system via a reasonable amount of experimentation. Those having skill in the art will recognize that a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities). A typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.
- The herein described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable” to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
- With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for the sake of clarity.
- It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to inventions containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should typically be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, typically means at least two recitations, or two or more recitations). 
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
- While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
Claims (20)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2012/023896 WO2013115829A2 (en) | 2012-02-04 | 2012-02-04 | Core-level dynamic voltage and frequency scaling in a chip multiprocessor |
Publications (2)
Publication Number | Publication Date |
---|---|
US20130205126A1 | 2013-08-08 |
US9619240B2 | 2017-04-11 |
Family ID: 48903967
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/811,280 Active 2033-05-23 US9619240B2 (en) | 2012-02-04 | 2012-02-04 | Core-level dynamic voltage and frequency scaling in a chip multiprocessor |
Country Status (5)
Country | Link |
---|---|
US (1) | US9619240B2 (en) |
KR (1) | KR101655137B1 (en) |
CN (1) | CN104205087B (en) |
TW (1) | TWI497410B (en) |
WO (1) | WO2013115829A2 (en) |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140176581A1 (en) * | 2012-12-21 | 2014-06-26 | Jeremy J. Shrall | Controlling configurable peak performance limits of a processor |
US20150046923A1 (en) * | 2013-07-09 | 2015-02-12 | Empire Technology Development, Llc | Differential voltage and frequency scaling (dvfs) switch reduction |
US20150134987A1 (en) * | 2012-05-21 | 2015-05-14 | International Business Machines Corporation | Power Shifting in Multicore Platforms by Varying SMT Levels |
WO2015152939A1 (en) * | 2014-04-04 | 2015-10-08 | Empire Technology Development Llc | Instruction optimization using voltage-based functional performance variation |
US20160026507A1 (en) * | 2014-07-24 | 2016-01-28 | Qualcomm Innovation Center, Inc. | Power aware task scheduling on multi-processor systems |
US20160098077A1 (en) * | 2014-10-06 | 2016-04-07 | Denso Corporation | Electronic control unit |
EP3021194A1 (en) * | 2014-11-17 | 2016-05-18 | MediaTek, Inc | An energy efficiency strategy for interrupt handling in a multi-cluster system |
US20160180013A1 (en) * | 2014-12-22 | 2016-06-23 | Hyundai Autron Co., Ltd. | Method for designing vehicle controller-only semiconductor based on die and vehicle controller-only semiconductor by the same |
WO2016114771A1 (en) * | 2015-01-14 | 2016-07-21 | Hewlett Packard Enterprise Development Lp | Reduced core count system configuration |
US9513688B2 (en) | 2013-03-16 | 2016-12-06 | Intel Corporation | Measurement of performance scalability in a microprocessor |
US9977699B2 (en) | 2014-11-17 | 2018-05-22 | Mediatek, Inc. | Energy efficient multi-cluster system and its operations |
US10281971B2 (en) * | 2016-04-13 | 2019-05-07 | Fujitsu Limited | Information processing device, and method of analyzing power consumption of processor |
US20210406092A1 (en) * | 2020-06-26 | 2021-12-30 | Advanced Micro Devices, Inc. | Core selection based on usage policy and core constraints |
US11422852B2 (en) | 2017-12-11 | 2022-08-23 | Samsung Electronics Co., Ltd. | Electronic device capable of increasing task management efficiency of digital signal processor |
US11425189B2 (en) * | 2019-02-06 | 2022-08-23 | Magic Leap, Inc. | Target intent-based clock speed determination and adjustment to limit total heat generated by multiple processors |
US11442774B2 (en) * | 2019-08-05 | 2022-09-13 | Samsung Electronics Co., Ltd. | Scheduling tasks based on calculated processor performance efficiencies |
US11445232B2 (en) | 2019-05-01 | 2022-09-13 | Magic Leap, Inc. | Content provisioning system and method |
US20220300062A1 (en) * | 2021-03-18 | 2022-09-22 | Dell Products L.P. | Power/workload management system |
US11510027B2 (en) | 2018-07-03 | 2022-11-22 | Magic Leap, Inc. | Systems and methods for virtual and augmented reality |
US11514673B2 (en) | 2019-07-26 | 2022-11-29 | Magic Leap, Inc. | Systems and methods for augmented reality |
US11521296B2 (en) | 2018-11-16 | 2022-12-06 | Magic Leap, Inc. | Image size triggered clarification to maintain image sharpness |
US11567324B2 (en) | 2017-07-26 | 2023-01-31 | Magic Leap, Inc. | Exit pupil expander |
US11579441B2 (en) | 2018-07-02 | 2023-02-14 | Magic Leap, Inc. | Pixel intensity modulation using modifying gain values |
US11598651B2 (en) | 2018-07-24 | 2023-03-07 | Magic Leap, Inc. | Temperature dependent calibration of movement detection devices |
US11609645B2 (en) | 2018-08-03 | 2023-03-21 | Magic Leap, Inc. | Unfused pose-based drift correction of a fused pose of a totem in a user interaction system |
US20230098742A1 (en) * | 2021-09-30 | 2023-03-30 | Advanced Micro Devices, Inc. | Processor Power Management Utilizing Dedicated DMA Engines |
US11624929B2 (en) | 2018-07-24 | 2023-04-11 | Magic Leap, Inc. | Viewing device with dust seal integration |
US11630507B2 (en) | 2018-08-02 | 2023-04-18 | Magic Leap, Inc. | Viewing system with interpupillary distance compensation based on head motion |
US11737832B2 (en) | 2019-11-15 | 2023-08-29 | Magic Leap, Inc. | Viewing system for use in a surgical environment |
US11756335B2 (en) | 2015-02-26 | 2023-09-12 | Magic Leap, Inc. | Apparatus for a near-eye display |
US11762222B2 (en) | 2017-12-20 | 2023-09-19 | Magic Leap, Inc. | Insert for augmented reality viewing device |
US11762623B2 (en) | 2019-03-12 | 2023-09-19 | Magic Leap, Inc. | Registration of local content between first and second augmented reality viewers |
US11776509B2 (en) | 2018-03-15 | 2023-10-03 | Magic Leap, Inc. | Image correction due to deformation of components of a viewing device |
US11790554B2 (en) | 2016-12-29 | 2023-10-17 | Magic Leap, Inc. | Systems and methods for augmented reality |
US11856479B2 (en) | 2018-07-03 | 2023-12-26 | Magic Leap, Inc. | Systems and methods for virtual and augmented reality along a route with markers |
US11874468B2 (en) | 2016-12-30 | 2024-01-16 | Magic Leap, Inc. | Polychromatic light out-coupling apparatus, near-eye displays comprising the same, and method of out-coupling polychromatic light |
US11885871B2 (en) | 2018-05-31 | 2024-01-30 | Magic Leap, Inc. | Radar head pose localization |
US11953653B2 (en) | 2017-12-10 | 2024-04-09 | Magic Leap, Inc. | Anti-reflective coatings on optical waveguides |
US12016719B2 (en) | 2019-08-22 | 2024-06-25 | Magic Leap, Inc. | Patient viewing system |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170023997A1 (en) * | 2015-07-20 | 2017-01-26 | Mediatek Inc. | Dynamic switching of voltage regulators in a multiprocessor system |
US9996138B2 (en) * | 2015-09-04 | 2018-06-12 | Mediatek Inc. | Electronic system and related clock managing method |
EP3539063A1 (en) * | 2016-12-15 | 2019-09-18 | Siemens Aktiengesellschaft | Configuration and parameterization of energy control system |
CN106843815B (en) * | 2017-01-18 | 2019-02-19 | University of Electronic Science and Technology of China | Optimization method for simultaneous multithreaded execution on an on-chip multiprocessor system |
US10649518B2 (en) * | 2017-01-26 | 2020-05-12 | Ati Technologies Ulc | Adaptive power control loop |
US10551901B2 (en) * | 2017-07-01 | 2020-02-04 | Microsoft Technology Licensing, Llc | Core frequency management using effective utilization for power-efficient performance |
KR102539044B1 (en) * | 2017-10-30 | 2023-06-01 | Samsung Electronics Co., Ltd. | Method of operating system on chip, system on chip performing the same and electronic system including the same |
TWI789064B (en) * | 2021-10-20 | 2023-01-01 | 鯨鏈科技股份有限公司 | Computer system based on wafer-on-wafer architecture |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040128567A1 (en) * | 2002-12-31 | 2004-07-01 | Tom Stewart | Adaptive power control based on post package characterization of integrated circuits |
US20080012583A1 (en) * | 2006-07-11 | 2008-01-17 | Jean Audet | Power grid structure to optimize performance of a multiple core processor |
US20090313623A1 (en) * | 2008-06-12 | 2009-12-17 | Sun Microsystems, Inc. | Managing the performance of a computer system |
US20100100357A1 (en) * | 2008-10-20 | 2010-04-22 | International Business Machines Corporation | Information Collection and Storage for Single Core Chips to 'N Core Chips |
US20100332909A1 (en) * | 2009-06-30 | 2010-12-30 | Texas Instruments Incorporated | Circuits, systems, apparatus and processes for monitoring activity in multi-processing systems |
US20110239017A1 (en) * | 2008-10-03 | 2011-09-29 | The University Of Sydney | Scheduling an application for performance on a heterogeneous computing system |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6684341B1 (en) * | 2000-03-09 | 2004-01-27 | International Business Machines Corporation | Method of altering the appearance of an icon of a program to provide an indication to a user that a power management is associated with the particular program |
US7941675B2 (en) * | 2002-12-31 | 2011-05-10 | Burr James B | Adaptive power control |
US7065641B2 (en) * | 2002-06-13 | 2006-06-20 | Intel Corporation | Weighted processor selection apparatus and method for use in multiprocessor systems |
JP4196333B2 (en) * | 2003-05-27 | 2008-12-17 | NEC Corporation | Parallel processing system and parallel processing program |
EP1555595A3 (en) | 2004-01-13 | 2011-11-23 | LG Electronics, Inc. | Apparatus for controlling power of processor having a plurality of cores and control method of the same |
KR101108397B1 (en) | 2005-06-10 | 2012-01-30 | LG Electronics Inc. | Apparatus and method for controlling power supply in a multi-core processor |
US20070226795A1 (en) | 2006-02-09 | 2007-09-27 | Texas Instruments Incorporated | Virtual cores and hardware-supported hypervisor integrated circuits, systems, methods and processes of manufacture |
- 2012
  - 2012-02-04 US US13/811,280 patent/US9619240B2/en active Active
  - 2012-02-04 CN CN201280071995.XA patent/CN104205087B/en not_active Expired - Fee Related
  - 2012-02-04 WO PCT/US2012/023896 patent/WO2013115829A2/en active Application Filing
  - 2012-02-04 KR KR1020147024520A patent/KR101655137B1/en active IP Right Grant
- 2013
  - 2013-02-01 TW TW102103918A patent/TWI497410B/en not_active IP Right Cessation
Cited By (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10444812B2 (en) | 2012-05-21 | 2019-10-15 | International Business Machines Corporation | Power shifting in multicore platforms by varying SMT levels |
US9710044B2 (en) * | 2012-05-21 | 2017-07-18 | International Business Machines Corporation | Power shifting in multicore platforms by varying SMT levels |
US20150134987A1 (en) * | 2012-05-21 | 2015-05-14 | International Business Machines Corporation | Power Shifting in Multicore Platforms by Varying SMT Levels |
US20140176581A1 (en) * | 2012-12-21 | 2014-06-26 | Jeremy J. Shrall | Controlling configurable peak performance limits of a processor |
US9086834B2 (en) * | 2012-12-21 | 2015-07-21 | Intel Corporation | Controlling configurable peak performance limits of a processor |
US9671854B2 (en) | 2012-12-21 | 2017-06-06 | Intel Corporation | Controlling configurable peak performance limits of a processor |
US9075556B2 (en) * | 2012-12-21 | 2015-07-07 | Intel Corporation | Controlling configurable peak performance limits of a processor |
US9513688B2 (en) | 2013-03-16 | 2016-12-06 | Intel Corporation | Measurement of performance scalability in a microprocessor |
US9195490B2 (en) * | 2013-07-09 | 2015-11-24 | Empire Technology Development Llc | Differential voltage and frequency scaling (DVFS) switch reduction |
US9792135B2 (en) | 2013-07-09 | 2017-10-17 | Empire Technology Development Llc | Differential voltage and frequency scaling (DVFS) switch reduction |
US20150046923A1 (en) * | 2013-07-09 | 2015-02-12 | Empire Technology Development, Llc | Differential voltage and frequency scaling (dvfs) switch reduction |
WO2015152939A1 (en) * | 2014-04-04 | 2015-10-08 | Empire Technology Development Llc | Instruction optimization using voltage-based functional performance variation |
US10409350B2 (en) * | 2014-04-04 | 2019-09-10 | Empire Technology Development Llc | Instruction optimization using voltage-based functional performance variation |
CN106164810A (en) * | 2014-04-04 | 2016-11-23 | Empire Technology Development LLC | Instruction optimization using voltage-based functional performance variation |
US20160026507A1 (en) * | 2014-07-24 | 2016-01-28 | Qualcomm Innovation Center, Inc. | Power aware task scheduling on multi-processor systems |
US9785481B2 (en) * | 2014-07-24 | 2017-10-10 | Qualcomm Innovation Center, Inc. | Power aware task scheduling on multi-processor systems |
US20160098077A1 (en) * | 2014-10-06 | 2016-04-07 | Denso Corporation | Electronic control unit |
US9588579B2 (en) * | 2014-10-06 | 2017-03-07 | Denso Corporation | Electronic control unit |
EP3021194A1 (en) * | 2014-11-17 | 2016-05-18 | MediaTek, Inc | An energy efficiency strategy for interrupt handling in a multi-cluster system |
TWI561970B (en) * | 2014-11-17 | 2016-12-11 | Mediatek Inc | Method for managing energy efficiency in computing system and system for managing energy efficiency |
US10031573B2 (en) | 2014-11-17 | 2018-07-24 | Mediatek, Inc. | Energy efficiency strategy for interrupt handling in a multi-cluster system |
US9977699B2 (en) | 2014-11-17 | 2018-05-22 | Mediatek, Inc. | Energy efficient multi-cluster system and its operations |
US20160180013A1 (en) * | 2014-12-22 | 2016-06-23 | Hyundai Autron Co., Ltd. | Method for designing vehicle controller-only semiconductor based on die and vehicle controller-only semiconductor by the same |
US10748887B2 (en) * | 2014-12-22 | 2020-08-18 | Hyundai Autron Co., Ltd. | Method for designing vehicle controller-only semiconductor based on die and vehicle controller-only semiconductor by the same |
CN105720014A (en) * | 2014-12-22 | 2016-06-29 | Hyundai Autron Co., Ltd. | Method for designing vehicle controller-only semiconductor and semiconductor manufactured by the same |
WO2016114771A1 (en) * | 2015-01-14 | 2016-07-21 | Hewlett Packard Enterprise Development Lp | Reduced core count system configuration |
US11756335B2 (en) | 2015-02-26 | 2023-09-12 | Magic Leap, Inc. | Apparatus for a near-eye display |
US10281971B2 (en) * | 2016-04-13 | 2019-05-07 | Fujitsu Limited | Information processing device, and method of analyzing power consumption of processor |
US11790554B2 (en) | 2016-12-29 | 2023-10-17 | Magic Leap, Inc. | Systems and methods for augmented reality |
US11874468B2 (en) | 2016-12-30 | 2024-01-16 | Magic Leap, Inc. | Polychromatic light out-coupling apparatus, near-eye displays comprising the same, and method of out-coupling polychromatic light |
US11927759B2 (en) | 2017-07-26 | 2024-03-12 | Magic Leap, Inc. | Exit pupil expander |
US11567324B2 (en) | 2017-07-26 | 2023-01-31 | Magic Leap, Inc. | Exit pupil expander |
US11953653B2 (en) | 2017-12-10 | 2024-04-09 | Magic Leap, Inc. | Anti-reflective coatings on optical waveguides |
US11422852B2 (en) | 2017-12-11 | 2022-08-23 | Samsung Electronics Co., Ltd. | Electronic device capable of increasing task management efficiency of digital signal processor |
US11762222B2 (en) | 2017-12-20 | 2023-09-19 | Magic Leap, Inc. | Insert for augmented reality viewing device |
US11776509B2 (en) | 2018-03-15 | 2023-10-03 | Magic Leap, Inc. | Image correction due to deformation of components of a viewing device |
US11908434B2 (en) | 2018-03-15 | 2024-02-20 | Magic Leap, Inc. | Image correction due to deformation of components of a viewing device |
US11885871B2 (en) | 2018-05-31 | 2024-01-30 | Magic Leap, Inc. | Radar head pose localization |
US12001013B2 (en) | 2018-07-02 | 2024-06-04 | Magic Leap, Inc. | Pixel intensity modulation using modifying gain values |
US11579441B2 (en) | 2018-07-02 | 2023-02-14 | Magic Leap, Inc. | Pixel intensity modulation using modifying gain values |
US11510027B2 (en) | 2018-07-03 | 2022-11-22 | Magic Leap, Inc. | Systems and methods for virtual and augmented reality |
US11856479B2 (en) | 2018-07-03 | 2023-12-26 | Magic Leap, Inc. | Systems and methods for virtual and augmented reality along a route with markers |
US11598651B2 (en) | 2018-07-24 | 2023-03-07 | Magic Leap, Inc. | Temperature dependent calibration of movement detection devices |
US11624929B2 (en) | 2018-07-24 | 2023-04-11 | Magic Leap, Inc. | Viewing device with dust seal integration |
US11630507B2 (en) | 2018-08-02 | 2023-04-18 | Magic Leap, Inc. | Viewing system with interpupillary distance compensation based on head motion |
US11960661B2 (en) | 2018-08-03 | 2024-04-16 | Magic Leap, Inc. | Unfused pose-based drift correction of a fused pose of a totem in a user interaction system |
US11609645B2 (en) | 2018-08-03 | 2023-03-21 | Magic Leap, Inc. | Unfused pose-based drift correction of a fused pose of a totem in a user interaction system |
US11521296B2 (en) | 2018-11-16 | 2022-12-06 | Magic Leap, Inc. | Image size triggered clarification to maintain image sharpness |
US11425189B2 (en) * | 2019-02-06 | 2022-08-23 | Magic Leap, Inc. | Target intent-based clock speed determination and adjustment to limit total heat generated by multiple processors |
US11762623B2 (en) | 2019-03-12 | 2023-09-19 | Magic Leap, Inc. | Registration of local content between first and second augmented reality viewers |
US11445232B2 (en) | 2019-05-01 | 2022-09-13 | Magic Leap, Inc. | Content provisioning system and method |
US11514673B2 (en) | 2019-07-26 | 2022-11-29 | Magic Leap, Inc. | Systems and methods for augmented reality |
US11442774B2 (en) * | 2019-08-05 | 2022-09-13 | Samsung Electronics Co., Ltd. | Scheduling tasks based on calculated processor performance efficiencies |
US12016719B2 (en) | 2019-08-22 | 2024-06-25 | Magic Leap, Inc. | Patient viewing system |
US11737832B2 (en) | 2019-11-15 | 2023-08-29 | Magic Leap, Inc. | Viewing system for use in a surgical environment |
US11886224B2 (en) * | 2020-06-26 | 2024-01-30 | Advanced Micro Devices, Inc. | Core selection based on usage policy and core constraints |
EP4172770A4 (en) * | 2020-06-26 | 2024-03-20 | Advanced Micro Devices Inc | Core selection based on usage policy and core constraints |
US20210406092A1 (en) * | 2020-06-26 | 2021-12-30 | Advanced Micro Devices, Inc. | Core selection based on usage policy and core constraints |
US20220300062A1 (en) * | 2021-03-18 | 2022-09-22 | Dell Products L.P. | Power/workload management system |
US11755100B2 (en) * | 2021-03-18 | 2023-09-12 | Dell Products L.P. | Power/workload management system |
US20230098742A1 (en) * | 2021-09-30 | 2023-03-30 | Advanced Micro Devices, Inc. | Processor Power Management Utilizing Dedicated DMA Engines |
Also Published As
Publication number | Publication date |
---|---|
WO2013115829A3 (en) | 2014-04-17 |
CN104205087B (en) | 2018-01-16 |
TW201403464A (en) | 2014-01-16 |
WO2013115829A2 (en) | 2013-08-08 |
TWI497410B (en) | 2015-08-21 |
KR101655137B1 (en) | 2016-09-07 |
US9619240B2 (en) | 2017-04-11 |
CN104205087A (en) | 2014-12-10 |
KR20140122745A (en) | 2014-10-20 |
Similar Documents
Publication | Title |
---|---|
US9619240B2 (en) | Core-level dynamic voltage and frequency scaling in a chip multiprocessor |
JP6005895B1 (en) | Intelligent multi-core control for optimal performance per watt | |
JP6019160B2 (en) | Apparatus and method for adaptive thread scheduling for asymmetric multiprocessors | |
US9557804B2 (en) | Dynamic power limit sharing in a platform | |
US8924758B2 (en) | Method for SOC performance and power optimization | |
TWI512447B (en) | Methods, apparatuses, and system for allocating power budgets to a processor and a computer-readable medium therefor | |
CN105830035B (en) | Multi-core dynamic workload management | |
US20140380025A1 (en) | Management of hardware accelerator configurations in a processor chip | |
TWI528180B (en) | Fuzzy logic control of thermoelectric cooling in a processor | |
US20130300386A1 (en) | Integrated circuit device, voltage regulation circuitry and method for regulating a voltage supply signal | |
TWI477955B (en) | Method for performance improvement of a graphics processor, non-transitory computer readable medium and graphics processor | |
KR20130061747A (en) | Providing per core voltage and frequency control | |
KR20150054152A (en) | System on-chip having a symmetric multi-processor, and method of determining a maximum operating clock frequency for the same | |
US20100057404A1 (en) | Optimal Performance and Power Management With Two Dependent Actuators | |
Sahin et al. | On the impacts of greedy thermal management in mobile devices | |
TW201830194A (en) | System and method for context-aware thermal management and workload scheduling in a portable computing device | |
US10242652B2 (en) | Reconfigurable graphics processor for performance improvement | |
Begum et al. | Algorithms for CPU and DRAM DVFS under inefficiency constraints | |
Steinfeld et al. | Low-power processors require effective memory partitioning | |
TWI536260B (en) | Heterogeneous multicore processor with graphene-based transistors | |
JP2014186522A (en) | Calculation system, and power management method therein | |
Asad et al. | Exploiting heterogeneity in cache hierarchy in dark-silicon 3d chip multi-processors | |
Rexha et al. | Energy Efficiency Platform Characterization for Heterogeneous Multicore Architectures. | |
Eratne et al. | A thermal-aware scheduling algorithm for core migration in multicore processors | |
Zhou et al. | Temperature-aware register reallocation for register file power-density minimization |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: ARDENT RESEARCH CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: KRUGLICK, EZEKIEL; REEL/FRAME: 027653/0796. Effective date: 20120123 |
AS | Assignment | Owner name: EMPIRE TECHNOLOGY DEVELOPMENT LLC, DELAWARE. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: ARDENT RESEARCH CORPORATION; REEL/FRAME: 027653/0800. Effective date: 20120123 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
CC | Certificate of correction | |
AS | Assignment | Owner name: CRESTLINE DIRECT FINANCE, L.P., TEXAS. Free format text: SECURITY INTEREST; ASSIGNOR: EMPIRE TECHNOLOGY DEVELOPMENT LLC; REEL/FRAME: 048373/0217. Effective date: 20181228 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
FEPP | Fee payment procedure | Free format text: SURCHARGE FOR LATE PAYMENT, LARGE ENTITY (ORIGINAL EVENT CODE: M1554); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |
AS | Assignment | Owner name: EMPIRE TECHNOLOGY DEVELOPMENT LLC, WASHINGTON. Free format text: RELEASE BY SECURED PARTY; ASSIGNOR: CRESTLINE DIRECT FINANCE, L.P.; REEL/FRAME: 065712/0585. Effective date: 20231004 |