US20140173312A1 - Dynamic re-configuration for low power in a data processor - Google Patents

Dynamic re-configuration for low power in a data processor

Info

Publication number
US20140173312A1
US20140173312A1 (application US13/714,011)
Authority
US
United States
Prior art keywords
mode
data processor
processor
decode
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/714,011
Other versions
US9164570B2
Inventor
David J. Shippy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Micro Devices Inc
Original Assignee
Advanced Micro Devices Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Micro Devices Inc filed Critical Advanced Micro Devices Inc
Priority to US13/714,011
Assigned to ADVANCED MICRO DEVICES, INC. Assignment of assignors interest (see document for details). Assignors: SHIPPY, DAVID J.
Publication of US20140173312A1
Application granted
Publication of US9164570B2
Legal status: Active (adjusted expiration)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00: Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26: Power supply means, e.g. regulation thereof
    • G06F1/32: Means for saving power
    • G06F1/3203: Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234: Power saving characterised by the action undertaken
    • G06F1/3243: Power saving in microcontroller unit
    • G06F1/3287: Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • G06F1/3293: Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094: Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

A data processor includes an execution unit having a multiple number of redundant resources, and a configuration circuit having first and second modes, wherein in the first mode, the configuration circuit enables the multiple number of redundant resources, and in the second mode, the configuration circuit disables the multiple number of redundant resources.

Description

    FIELD
  • This disclosure relates generally to data processors, and more specifically to configurable data processors.
  • BACKGROUND
  • Data processors are typically designed to meet specific product needs. For example, desktop microprocessors operate at very high speeds and have long execution pipelines and multiple parallel resources to achieve very high performance. Microprocessors for laptop computers offer reasonably high performance but have low power features to lengthen battery life. Embedded microprocessors operate at relatively slow speeds and have simple architectures in order to reduce product cost.
  • Certain products such as multimedia mobile devices can at various times require either high performance or low power consumption. For example, when running certain games, a multimedia mobile device requires the microprocessor to provide high performance. However, when running other tasks such as email, the multimedia mobile device requires much lower performance from the microprocessor. These varying processing environments make it difficult for the microprocessor designer to achieve the right balance between performance and preservation of battery life.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates in block diagram form a data processor known in the prior art.
  • FIG. 2 illustrates in block diagram form another data processor known in the prior art.
  • FIG. 3 illustrates in block diagram form a data processor according to some embodiments.
  • FIG. 4 illustrates in block diagram form a central processing unit (CPU) core that may be used in the data processor of FIG. 3 according to some embodiments.
  • FIG. 5 illustrates in block diagram form a pipeline of the CPU core of FIG. 4 in a first mode according to some embodiments.
  • FIG. 6 illustrates in block diagram form a pipeline of the CPU core of FIG. 4 in a second mode according to some embodiments.
  • FIG. 7 illustrates a flow diagram of a method for configuring a processor core according to some embodiments.
  • FIG. 8 illustrates a flow diagram of a method for configuring a multi-core data processor according to some embodiments.
  • In the following description, the use of the same reference numerals in different drawings indicates similar or identical items. Unless otherwise noted, the word “coupled” and its associated verb forms include both direct connection and indirect electrical connection by means known in the art, and unless otherwise noted any description of direct connection implies alternate embodiments using suitable forms of indirect electrical connection as well.
  • DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
  • FIG. 1 illustrates in block diagram form a data processor 100 known in the prior art. Data processor 100 generally includes a big core 110 and a little core 120.
  • In operation, big core 110 and little core 120 each have the capability to execute the same instruction set. However, the micro-architecture of big core 110 is constructed to support high intensity tasks and the micro-architecture of little core 120 is constructed to support low power and low intensity tasks.
  • For one particular example, big core 110 is a core known as the Cortex A15 core available from Advanced RISC Machines, Ltd. (ARM) of Cambridge, Great Britain, and has the capability to decode, dispatch, issue, and execute multiple instructions out of order. Big core 110 operates multiple pipelines having 15 to 24 stages. On the other hand, little core 120 is a core known as the Cortex A7 core, also available from ARM Ltd., that decodes, dispatches, issues, and executes instructions in order, and operates a smaller number of pipelines having 8 to 10 stages.
  • Depending on the intensity and target power consumption of a task, data processor 100 migrates instructions to big core 110 or to little core 120. However data processor 100 consumes additional silicon area for little core 120, and in general, data processor 100 requires overhead to migrate instructions between the two cores when the processing task changes.
  • FIG. 2 illustrates in block diagram form another data processor 200 known in the prior art. Data processor 200 generally includes a companion core 210 labeled “COMPANION CORE1”, a main core 220 labeled “CORE1”, a main core 230 labeled “CORE2”, a main core 240 labeled “CORE3”, a main core 250 labeled “CORE4”, and a clock generator 260.
  • Companion core 210 has a clock input. Main cores 220, 230, 240, and 250 each have a clock input. Clock generator 260 has a first output connected to the clock input of companion core 210 and a second output connected to each clock input of main cores 220, 230, 240, and 250.
  • In operation, main cores 220, 230, 240, and 250 (forming a quad core cluster) and companion core 210 each have the capability to execute the same instruction set. Although main cores 220, 230, 240, and 250 and companion core 210 execute instructions in a consistent way, data processor 200 enables and disables cores based on the workload. For example, data processor 200 could enable only companion core 210 to execute low intensity tasks such as audio, video, and email; only two main cores to execute higher intensity tasks such as Flash-enabled browsing and multitasking; and all four main cores to execute high intensity tasks such as console-class gaming and media processing.
  • Main cores 220, 230, 240, and 250 are each constructed to support high frequency, performance intensive tasks, whereas companion core 210 is constructed to support low frequency, low power, low intensity tasks. Clock generator 260 provides a high frequency clock to main cores 220, 230, 240, and 250, but provides a low frequency clock to companion core 210.
  • However, depending on the intensity and target power consumption of a task, CPU power management hardware and the operating system migrate instructions to selected ones of main cores 220, 230, 240, and 250 or to companion core 210. Like data processor 100, data processor 200 consumes additional silicon area to operate companion core 210, and in general, data processor 200 requires overhead to migrate instructions between any of cores 220-250 and companion core 210 when the processing task changes.
  • FIG. 3 illustrates in block diagram form a data processor 300 according to some embodiments. Data processor 300 generally includes a CPU cluster 310. CPU cluster 310 includes a CPU core 312 labeled “CPU0”, a CPU core 314 labeled “CPU1”, a CPU core 316 labeled “CPU2”, a CPU core 318 labeled “CPU3”, and a cache 320 which is a shared L2 cache.
  • In operation, CPU cores 312-318 each include a fetch unit for fetching a stream of instructions, an execution unit connected to the fetch unit that has a multiple number of redundant resources, and a configuration circuit that operates in a first mode and a second mode. In the first mode, the configuration circuit enables the multiple number of redundant resources, and in the second mode, the configuration circuit selectively disables the multiple number of redundant resources.
  • Each of CPU cores 312-318 has the capability to execute the same instruction set. Also, each CPU core has a substantially identical architecture and executes instructions in a consistent way. Unlike data processors 100 and 200, however, data processor 300 can configure the micro-architecture of each of CPU cores 312-318 to support either high intensity tasks or low intensity tasks, configuring the associated CPU for the desired power management and, in some applications, the longest potential battery life.
  • For example, data processor 300 could configure CPU cores 312 and 314 to decode, dispatch, issue, and execute multiple out-of-order instructions, and to operate multiple pipelines each having a multiple number of stages. On the other hand, data processor 300 could re-configure CPU cores 316 and 318 to decode, dispatch, issue, and execute instructions using a smaller number of pipelines.
  • For example, to reduce power consumption for low intensity tasks, data processor 300 functionally throttles one core, such as CPU core 312, and gates off CPU core 314, CPU core 316, and CPU core 318. Subsequently, data processor 300 executes instructions using only CPU core 312. By eliminating a dedicated little core, data processor 300 preserves silicon area and saves power.
  • On the other hand to increase performance, data processor 300 reconfigures CPU cores 312-318 to perform high intensity tasks, by modifying at least one pipeline for high intensity operation, by increasing a width of a decode pipeline, by enabling an execution pipeline, and/or by enabling or disabling portions of one or more caches, while managing the high frequency, high intensity operation of each core.
  • Like data processor 200, data processor 300 processes instructions using a quad core cluster. However, depending on the intensity of the task, the desired performance level, and the desired power consumption target, CPU cores 312, 314, 316, and/or 318 can be dynamically and selectively reconfigured. CPU power management hardware and the operating system can migrate instructions to any CPU core. As should be apparent, in some embodiments a data processor can have other than four cores that can be dynamically and selectively reconfigured.
  • FIG. 4 illustrates in block diagram form a central processing unit (CPU) core 400 that may be used in data processor 300 of FIG. 3 according to some embodiments. CPU core 400 generally includes a fetch unit 410, a level one instruction cache (“L1 ICACHE”) 415, an execution unit 420, a decode unit 430, and a configuration circuit 450.
  • Fetch unit 410 has an input/output port and an output for providing instructions fetched from cache 415 or main memory. Cache 415 has an input, and an input/output port connected to the input/output port of fetch unit 410. Execution unit 420 includes multiple execution pipelines 440 including exemplary execution pipelines 442 and 444, and a level one data cache (“L1 DCACHE”) 460. Decode unit 430 has a first input connected to the output of fetch unit 410, a second input, and an output. Execution pipeline 442 has a first input connected to the output of decode unit 430, a second input, and a data input/output port. Execution pipeline 444 has a first input connected to the output of decode unit 430, a second input, and a data input/output port. Cache 460 has an input and data input/output ports connected to the data input/output ports of one or more execution pipelines such as execution pipelines 442 and 444, depending on their respective function.
  • Configuration circuit 450 includes a register 452 and a functional throttling circuit 456. Register 452 is a model specific register (MSR) that includes a field 454 defining a mode of CPU core 400, and has an output for providing the contents of field 454. Functional throttling circuit 456 has an input connected to the output of register 452, an output connected to the input of cache 415, the second input of decode unit 430, the second input of execution pipelines 442 and 444, and the input of cache 460.
  • In operation, fetch unit 410 fetches a stream of instructions from cache 415 (or from main memory through cache 415 if the fetch misses in cache 415), and provides the instructions to decode unit 430. Decode unit 430 decodes the instructions and dispatches them to selected execution units for execution. Execution unit 420 includes redundant resources that are not strictly needed to execute the instruction set of CPU core 400. For example, execution unit 420 may have two identical pipelines that can be used to execute the same type of instruction. Also, each execution pipeline may queue a large number of operations to handle high workloads without stalling decode unit 430, but can operate properly with a smaller queue. Moreover, decode unit 430 can decode multiple operations in parallel to increase throughput. Each of these features is useful for meeting the performance requirements of high intensity tasks, but consumes unneeded power for low intensity tasks. In addition, each of caches 415 and 460 has a configurable size and can operate at full size for high intensity tasks, or at reduced size for low intensity tasks.
  • Configuration circuit 450 has at least a first mode and a second mode. In the first mode, configuration circuit 450 causes CPU core 400 to operate as a “big core” by enabling the redundant resources. In the second mode, configuration circuit 450 causes CPU core 400 to operate as a “little core” by disabling the redundant resources. Thus a single, generic core can easily be reconfigured for different processing tasks.
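  • To make the two modes concrete, the sketch below models the throttleable resources named above as a per-core configuration record. This is an illustrative software model only, not anything defined by the patent: the field names are invented here, and the slot counts and decode widths are the example values the description gives later for FIGS. 5 and 6.

        #include <stdbool.h>

        /* Hypothetical per-core throttle state; names are illustrative. */
        struct core_throttle_state {
            bool second_int_pipe_on;  /* redundant integer pipeline enabled?    */
            int  queue_slots;         /* issue-queue depth (24 big / 12 little) */
            int  decode_width;        /* instructions decoded per cycle (2 / 1) */
            bool caches_full_size;    /* L1 I/D caches at full vs. half size    */
        };

        /* First mode ("big core"): all redundant resources enabled. */
        static const struct core_throttle_state BIG_MODE    = { true,  24, 2, true  };
        /* Second mode ("little core"): redundant resources disabled. */
        static const struct core_throttle_state LITTLE_MODE = { false, 12, 1, false };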
  • Moreover, by using a model specific register that can only be accessed in privileged mode to establish the mode of operation, CPU core 400 provides a protected mechanism to dynamically reconfigure CPU core 312, CPU core 314, CPU core 316, and/or CPU core 318 by writing field 454 of register 452, as the sketch below illustrates.
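  • The following is a minimal sketch of how privileged (ring-0) software on an x86-style machine might program such a mode field with a read-modify-write over RDMSR/WRMSR. The MSR index and field layout are inventions for this example; the patent does not publish encodings for register 452 or field 454.

        #include <stdint.h>

        /* Assumed encodings: illustrative only, not defined by the patent. */
        #define MSR_CORE_MODE 0xC0011099u /* hypothetical vendor MSR index   */
        #define MODE_MASK     0x1u
        #define MODE_BIG      0x0u        /* first mode: resources enabled   */
        #define MODE_LITTLE   0x1u        /* second mode: resources disabled */

        /* WRMSR/RDMSR execute only in a privileged state; an unprivileged
         * attempt faults, matching the protection described above. */
        static inline void write_msr(uint32_t index, uint64_t value)
        {
            __asm__ volatile("wrmsr" : : "c"(index),
                             "a"((uint32_t)value), "d"((uint32_t)(value >> 32)));
        }

        static inline uint64_t read_msr(uint32_t index)
        {
            uint32_t lo, hi;
            __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(index));
            return ((uint64_t)hi << 32) | lo;
        }

        /* Read-modify-write the mode field (the analogue of field 454). */
        static void set_core_mode(uint64_t mode)
        {
            uint64_t v = read_msr(MSR_CORE_MODE);
            v = (v & ~(uint64_t)MODE_MASK) | (mode & MODE_MASK);
            write_msr(MSR_CORE_MODE, v);
        }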
  • FIG. 5 illustrates in block diagram form a pipeline 500 of CPU core 400 of FIG. 4 in a first mode according to some embodiments. Pipeline 500 generally includes a fetch stage 510, a decode/dispatch/rename stage 520, and an execute stage 530.
  • Fetch stage 510 is a four-deep stage that provides instructions in program order to decode/dispatch/rename stage 520. Decode/dispatch/rename stage 520 includes a seven-deep set of sub-stages 522 and a three-deep set of sub-stages 524 associated with floating point operations that can be recognized near the end of decode/dispatch/rename stage 520. Decode sub-stages 522 provide up to two decoded instructions in parallel to execute stage 530, whereas decode sub-stages 524 provide up to two decoded floating point instructions in parallel to execute stage 530.
  • Execute stage 530 includes a set of execution pipelines 540, each of which has its own corresponding pipeline segment organized into a queue sub-stage 532, an issue sub-stage 534, an execute sub-stage 536, and a writeback (WB) sub-stage 538. In pipeline 500, execution pipelines 540 include an integer pipeline 542, an integer pipeline 544, a multiply/divide (“Mult/Div”) pipeline 546, a load/store pipeline 548, a load/store pipeline 550, a floating point (“FP”) pipeline 552, and an FP pipeline 554. However, the number and composition of the pipelines will vary in other embodiments.
  • Note that FIG. 5 shows each queue sub-stage 532 as having three entries, but these are representative of an arbitrary number of entries. For example, queue sub-stage 532 in integer pipeline 542 has twenty-four queue slots. FIG. 5 shows the remaining sub-stages with their actual depth. Moreover, some execute pipelines, such as integer pipeline 544 and Mult/Div pipeline 546, share a common queue sub-stage as illustrated in FIG. 5.
  • In operation, pipeline 500 represents the pipeline of CPU core 400 in the first mode, in which configuration circuit 450 enables the multiple redundant resources to support high frequency, high intensity tasks.
  • FIG. 6 illustrates in block diagram form a pipeline 600 of CPU core 400 of FIG. 4 in a second mode according to some embodiments. Pipeline 600 generally includes a fetch stage 610, a decode/dispatch/rename stage 620, and an execute stage 630 corresponding to fetch stage 510, decode/dispatch/rename stage 520, and execute stage 530, respectively, of FIG. 5. However, unlike pipeline 500, pipeline 600 identifies redundant resources that have now been disabled. Pipeline 600 illustrates four types of redundant resources. First, since integer pipelines 642 and 644 both execute the same types of instructions, one of them is redundant, and CPU core 400 disables integer pipeline 642 in the second mode. Second, each slot of queue sub-stage 632 beyond the first is redundant, and CPU core 400 reduces the size of each queue sub-stage 632 by half. For example, the size of queue sub-stage 632 can be reduced from twenty-four slots to twelve slots. Third, since decode/dispatch/rename stage 620 decodes two instructions in parallel, its second half is redundant; CPU core 400 disables the redundant half of decode/dispatch/rename stage 620 such that it can issue only a single instruction per clock cycle. Fourth, the effective sizes of caches 415 and 460 can be reduced, such as by half. FIG. 6 shows the disabling of these redundant resources in the second sub-stage of fetch stage 610, and in the second sub-stage of execute sub-stage 636 of load/store pipeline 648, in response to receiving a signal from functional throttling circuit 456 labeled “DISABLE”. By reducing the cache size by half, the power consumed in performing associative lookups and in maintaining valid data is reduced.
  • In this way, pipeline 600 is able to fully execute the instruction set of CPU core 400 while consuming less power for low intensity tasks. Moreover, when CPU core 400 transitions from the first mode to the second mode, each pipeline can transition seamlessly. For example, when disabling a redundant half of decode/dispatch/rename stage 620, the hardware may simply disable sub-stages in the unneeded half as the last instruction flows down decode/dispatch/rename stage 620. Moreover, CPU core 400 can allow the size of each queue sub-stage to be reduced by stalling decode/dispatch/rename stage 620 until only half of the slots are used, and then disabling the unused half. CPU core 400 can also disable a redundant pipeline by stopping the input of new decoded instructions into the pipeline and waiting until the pipeline naturally drains. Moreover, CPU core 400 can reduce the sizes of the instruction and data caches. In these ways, CPU core 400 can transition from the first (big core) mode to the second (little core) mode seamlessly and without the need for slow instruction migration.
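  • The behavioral sketch below (plain C, not RTL) illustrates the drain-then-disable sequence just described: decode is stalled until a queue's occupancy falls to half, after which the upper slots are switched off, and a redundant pipeline is starved of new decoded instructions until it drains naturally. All structure and function names are invented for illustration.

        #include <stdbool.h>

        enum { Q_BIG = 24, Q_LITTLE = 12 };  /* slot counts from the example above */

        struct issue_queue { int occupancy; int active_slots; };
        struct exec_pipe   { int in_flight; bool accepts_ops; };

        /* Halve a queue sub-stage: decode is stalled (no new allocations),
         * the issue logic drains one op per cycle, then the upper half is
         * disabled. */
        static void shrink_queue(struct issue_queue *q)
        {
            while (q->occupancy > Q_LITTLE)
                q->occupancy--;           /* one op issues to execute per cycle */
            q->active_slots = Q_LITTLE;   /* upper twelve slots now gated off   */
        }

        /* Disable a redundant pipeline: stop feeding it and wait for it to
         * drain naturally before gating it. */
        static void drain_pipe(struct exec_pipe *p)
        {
            p->accepts_ops = false;       /* decode steers ops to the twin pipe */
            while (p->in_flight > 0)
                p->in_flight--;           /* in-flight ops complete and retire  */
        }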
  • FIG. 7 illustrates a flow diagram of a method 700 for configuring a processor core according to some embodiments. Action box 710 includes fetching and decoding a write MSR instruction (“WMSR”) in a processor core. The flow proceeds to decision box 720, which determines whether CPU core 400 is in a privileged state. If CPU core 400 is not in a privileged state, flow proceeds to action box 730, in which the method ends with some appropriate action, such as taking a privilege mode violation exception. If the processor core is in a privileged state, then flow proceeds to action box 740, which updates a power control field in the MSR.
  • Continuing on, method 700 proceeds to action box 750, which reconfigures the execution pipeline of CPU core 400 in response to a change in the power control field. Finally, flow proceeds to action box 760, in which CPU core 400 executes instructions using the reconfigured core.
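  • Read as software, the flow of method 700 amounts to the guarded update sketched below. The helper names and return codes are invented for this example; the privilege test, the field update, and the reconfiguration trigger follow boxes 720 through 760 above.

        #include <stdbool.h>
        #include <stdint.h>

        enum wmsr_result { WMSR_OK, WMSR_PRIVILEGE_FAULT };

        /* Stub for the hardware reconfiguration hook (box 750); a real core
         * would drain and resize its pipelines here, as described above. */
        static void reconfigure_pipeline(uint64_t power_ctl) { (void)power_ctl; }

        static enum wmsr_result handle_wmsr(bool privileged,
                                            uint64_t *power_ctl_field,
                                            uint64_t new_value)
        {
            if (!privileged)
                return WMSR_PRIVILEGE_FAULT; /* box 730: take the exception   */

            uint64_t old = *power_ctl_field;
            *power_ctl_field = new_value;    /* box 740: update the MSR field */

            if (old != new_value)
                reconfigure_pipeline(new_value); /* box 750: reconfigure core */

            return WMSR_OK;  /* box 760: execute on the reconfigured core */
        }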
  • FIG. 8 illustrates a flow diagram of a method 800 for configuring a multi-core data processor according to some embodiments. Action box 810 includes functionally throttling a processor core of a CPU cluster. Action box 820 includes gating off remaining processor cores of the CPU cluster. Action box 830 includes executing instructions using the processor core that was enabled.
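  • Method 800 can be summarized in the same style. In this sketch the cluster size matches CPU cluster 310, and throttle_core() and gate_off_core() stand in for the hardware controls; both helpers are hypothetical stubs, not interfaces defined by the patent.

        #define NUM_CORES 4  /* CPU cluster 310 has four cores */

        /* Stubs for the hypothetical hardware controls. */
        static void throttle_core(int core) { (void)core; } /* little mode     */
        static void gate_off_core(int core) { (void)core; } /* clock/power off */

        static void enter_cluster_low_power(int kept_core)
        {
            throttle_core(kept_core);  /* box 810: functionally throttle one core */
            for (int c = 0; c < NUM_CORES; c++)
                if (c != kept_core)
                    gate_off_core(c);  /* box 820: gate off the remaining cores   */
            /* box 830: instructions now execute only on kept_core */
        }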
  • The functions of FIGS. 3-6 may be implemented with various combinations of hardware and software, and the software component may be stored in a computer readable storage medium for execution by at least one processor. Moreover the methods illustrated in FIGS. 7 and 8 may also be governed by instructions that are stored in a computer readable storage medium and that are executed by at least one processor. Each of the operations shown in FIGS. 7 and 8 may correspond to instructions stored in a non-transitory computer memory or computer readable storage medium. In various embodiments, the non-transitory computer readable storage medium includes a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices. The computer readable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted and/or executable by one or more processors.
  • Moreover, the functions of FIGS. 3-6 may be described or represented by a computer accessible data structure in the form of a database or other data structure which can be read by a program and used, directly or indirectly, to fabricate the integrated circuits of FIGS. 3-6. For example, this data structure may be a behavioral-level description or register-transfer level (RTL) description of the hardware functionality in a hardware description language (HDL) such as Verilog or VHDL. The description may be read by a synthesis tool which may synthesize the description to produce a netlist comprising a list of gates from a synthesis library. The netlist comprises a set of gates which also represent the functionality of the hardware comprising the integrated circuits of FIGS. 3-6. The netlist may then be placed and routed to produce a data set describing geometric shapes to be applied to masks. The masks may then be used in various semiconductor fabrication steps to produce the integrated circuits of FIGS. 3-6. Alternatively, the database on the computer accessible storage medium may be the netlist (with or without the synthesis library) or the data set, as desired, or Graphic Data System (GDS) II data.
  • While particular embodiments have been described, various modifications to these embodiments will be apparent to those skilled in the art. For example, in the illustrated embodiment, CPU cluster 310 includes four CPU cores 312, 314, 316, and 318, and a cache 320, which is a shared L2 cache. In some embodiments, CPU cluster 310 could include a different number of cores, and different cache memory hierarchies, including shared and dedicated cache memories. CPU cores 312, 314, 316, and 318 could use a common circuit design and process technology, or different circuit designs and process technologies. A software write to register 452 could include selectively executing the write based on whether CPU core 400 is in a privileged state. Also, configuration circuit 450 could reconfigure different redundant functions of a CPU core of CPU cluster 310, including an arithmetic logic unit (ALU), a schedule queue cluster, a floating point (FP) unit, a multimedia extension (MMX) unit, a cache memory, a cache controller, a translation lookaside buffer (TLB), a branch prediction unit, and the like.
  • Accordingly, it is intended by the appended claims to cover all modifications of the disclosed embodiments that fall within the scope of the disclosed embodiments.

Claims (25)

What is claimed is:
1. A data processor comprising:
an execution unit having a plurality of redundant resources; and
a configuration circuit having first and second modes, wherein in said first mode, said configuration circuit enables said plurality of redundant resources, and in said second mode, said configuration circuit disables said plurality of redundant resources.
2. The data processor of claim 1, wherein said configuration circuit comprises:
a register having a field for indicating a mode of operation of the data processor; and
a functional throttling circuit coupled to said field and to said execution unit, to place the data processor in said first mode in response to a first state of said field, and in said second mode in response to a second state of said field.
3. The data processor of claim 2, wherein:
said register comprises a model specific register wherein the data processor can modify said register only while in a privileged operating state.
4. The data processor of claim 2 wherein said execution unit comprises:
a plurality of execution pipelines,
wherein in said first mode, said functional throttling circuit enables all of said plurality of execution pipelines, and in said second mode, said functional throttling circuit disables at least one of said plurality of execution pipelines.
5. The data processor of claim 1 further comprising:
a fetch unit to fetch a stream of instructions; and
a decode unit coupled between said fetch unit and said execution unit to decode said stream of instructions to provide decoded instructions to said execution unit,
wherein in said first mode, said configuration circuit enables said decode unit to decode a plurality of instructions in parallel, and in said second mode, said configuration circuit enables said decode unit to decode only one instruction.
6. The data processor of claim 1 wherein said execution unit comprises:
a plurality of execution pipelines having at least one of said plurality of redundant resources.
7. A data processor comprising:
a plurality of processor cores;
each processor core comprising:
an execution unit having a plurality of redundant resources; and
a configuration circuit having first and second modes, wherein in said first mode, said configuration circuit enables said plurality of redundant resources, and in said second mode, said configuration circuit disables said plurality of redundant resources.
8. The data processor of claim 7, wherein said configuration circuit comprises:
a register having a field for indicating a mode of operation of the data processor; and
a functional throttling circuit coupled to said field and to said execution unit, to place the data processor in said first mode in response to a first state of said field, and in said second mode in response to a second state of said field.
9. The data processor of claim 8, wherein:
said register comprises a model specific register and the data processor can access said register only while in a privileged operating state.
10. The data processor of claim 7 further comprising:
a fetch unit to fetch a stream of instructions; and
a decode unit coupled between said fetch unit and said execution unit to decode said stream of instructions to provide decoded instructions to said execution unit,
wherein in said first mode, said configuration circuit enables said decode unit to decode a plurality of instructions in parallel, and in said second mode, said configuration circuit enables said decode unit to decode only one instruction.
11. The data processor of claim 10, further comprising:
a shared cache coupled to each of said plurality of processor cores.
12. The data processor of claim 11, wherein each processor core further comprises:
at least one cache coupled to at least one of said fetch unit and said execution unit and to said configuration circuit, wherein in said first mode, each said at least one cache has a first size, and in said second mode, each said at least one cache has a second size smaller than said first size.
13. The data processor of claim 7, wherein said plurality of processor cores comprises four processor cores.
14. The data processor of claim 7, wherein said configuration circuit of each processor core further has a third mode for gating off said processor core.
15. A method comprising:
reconfiguring at least one pipeline of a processor core in response to a low power mode input signal; and
executing a plurality of instructions using said processor core so reconfigured.
16. The method of claim 15 wherein said reconfiguring comprises:
reducing a width of an execution pipeline.
17. The method of claim 15 wherein said reconfiguring comprises:
reducing a width of a decode pipeline.
18. The method of claim 15 wherein said reconfiguring comprises:
disabling an execution pipeline.
19. The method of claim 15, further comprising:
fetching and decoding a write to model specific register instruction;
executing said write to model specific register by updating a field in said model specific register; and
providing said low power mode input signal in response to said field having a predetermined state.
20. The method of claim 19, wherein said executing said write to model specific register instruction comprises:
selectively executing said write to model specific register based on whether the processor core is in a privileged state.
21. A method of reducing power consumption of a data processor having a plurality of processor cores comprising:
functionally throttling a first processor core of the plurality of processor cores;
gating off remaining processor cores of the plurality of processor cores; and
subsequently executing a plurality of instructions using said first processor core.
22. The method of claim 21 wherein said functionally throttling comprises:
reconfiguring at least one pipeline of said first processor core.
23. The method of claim 22 wherein said reconfiguring comprises:
reducing a width of an execution pipeline of said first processor core.
24. The method of claim 22 wherein said reconfiguring comprises:
reducing a width of a decode pipeline of said first processor core.
25. The method of claim 22 wherein said reconfiguring comprises:
disabling an execution pipeline of said first processor core.
Application US13/714,011, filed 2012-12-13 (priority date 2012-12-13): Dynamic re-configuration for low power in a data processor. Granted as US9164570B2; status Active, anticipated expiration (adjusted) 2033-10-07.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
US13/714,011 (granted as US9164570B2) | 2012-12-13 | 2012-12-13 | Dynamic re-configuration for low power in a data processor

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
US13/714,011 (granted as US9164570B2) | 2012-12-13 | 2012-12-13 | Dynamic re-configuration for low power in a data processor

Publications (2)

Publication Number | Publication Date
US20140173312A1 | 2014-06-19
US9164570B2 | 2015-10-20

Family

ID=50932424

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
US13/714,011 (US9164570B2; Active, adjusted expiration 2033-10-07) | Dynamic re-configuration for low power in a data processor | 2012-12-13 | 2012-12-13

Country Status (1)

Country Link
US (1): US9164570B2

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20130346778A1 * | 2012-06-20 | 2013-12-26 | Douglas D. Boom | Controlling An Asymmetrical Processor
US20150370306A1 * | 2014-06-23 | 2015-12-24 | Mediatek Inc. | Method and System Providing Power Management for Multimedia Processing
US20160357554A1 * | 2015-06-05 | 2016-12-08 | Arm Limited | Controlling execution of instructions for a processing pipeline having first and second execution circuitry
US10089155B2 * | 2015-09-22 | 2018-10-02 | Advanced Micro Devices, Inc. | Power aware work stealing
US20220035635A1 * | 2014-11-26 | 2022-02-03 | Texas Instruments Incorporated | Processor with multiple execution pipelines
US11429173B2 * | 2018-12-21 | 2022-08-30 | Intel Corporation | Apparatus and method for proactive power management to avoid unintentional processor shutdown

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170212581A1 (en) * 2016-01-25 2017-07-27 Qualcomm Incorporated Systems and methods for providing power efficiency via memory latency control
US10591966B1 (en) * 2019-02-20 2020-03-17 Blockchain Asics Llc Actively controlled series string power supply

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164812A1 (en) * 2007-12-19 2009-06-25 Capps Jr Louis B Dynamic processor reconfiguration for low power without reducing performance based on workload execution characteristics
US20100268968A1 (en) * 2009-04-16 2010-10-21 International Business Machines Corporation Managing processor power-performance states
US7865667B2 (en) * 2001-10-22 2011-01-04 Oracle America, Inc. Multi-core multi-thread processor
US20120198207A1 (en) * 2011-12-22 2012-08-02 Varghese George Asymmetric performance multicore architecture with same instruction set architecture
US8321362B2 (en) * 2009-12-22 2012-11-27 Intel Corporation Methods and apparatus to dynamically optimize platforms

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5392437A (en) 1992-11-06 1995-02-21 Intel Corporation Method and apparatus for independently stopping and restarting functional units
US8086825B2 (en) 2007-12-31 2011-12-27 Advanced Micro Devices, Inc. Processing pipeline having stage-specific thread selection and method thereof

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7865667B2 (en) * 2001-10-22 2011-01-04 Oracle America, Inc. Multi-core multi-thread processor
US20090164812A1 (en) * 2007-12-19 2009-06-25 Capps Jr Louis B Dynamic processor reconfiguration for low power without reducing performance based on workload execution characteristics
US7962770B2 (en) * 2007-12-19 2011-06-14 International Business Machines Corporation Dynamic processor reconfiguration for low power without reducing performance based on workload execution characteristics
US20100268968A1 (en) * 2009-04-16 2010-10-21 International Business Machines Corporation Managing processor power-performance states
US8171319B2 (en) * 2009-04-16 2012-05-01 International Business Machines Corporation Managing processor power-performance states
US8321362B2 (en) * 2009-12-22 2012-11-27 Intel Corporation Methods and apparatus to dynamically optimize platforms
US20120198207A1 (en) * 2011-12-22 2012-08-02 Varghese George Asymmetric performance multicore architecture with same instruction set architecture

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130346778A1 (en) * 2012-06-20 2013-12-26 Douglas D. Boom Controlling An Asymmetrical Processor
US9164573B2 (en) * 2012-06-20 2015-10-20 Intel Corporation Controlling an asymmetrical processor
US20150370306A1 (en) * 2014-06-23 2015-12-24 Mediatek Inc. Method and System Providing Power Management for Multimedia Processing
US9965021B2 (en) * 2014-06-23 2018-05-08 Mediatek, Inc. Method and system providing power management for multimedia processing
US20220035635A1 (en) * 2014-11-26 2022-02-03 Texas Instruments Incorporated Processor with multiple execution pipelines
US20160357554A1 (en) * 2015-06-05 2016-12-08 Arm Limited Controlling execution of instructions for a processing pipeline having first and second execution circuitry
US9952871B2 (en) * 2015-06-05 2018-04-24 Arm Limited Controlling execution of instructions for a processing pipeline having first out-of order execution circuitry and second execution circuitry
US10089155B2 (en) * 2015-09-22 2018-10-02 Advanced Micro Devices, Inc. Power aware work stealing
US11429173B2 (en) * 2018-12-21 2022-08-30 Intel Corporation Apparatus and method for proactive power management to avoid unintentional processor shutdown

Also Published As

Publication number Publication date
US9164570B2 (en) 2015-10-20

Similar Documents

Publication Publication Date Title
US9164570B2 (en) Dynamic re-configuration for low power in a data processor
JP6708335B2 (en) User-level branch and join processor, method, system, and instructions
CN105144082B (en) Optimal logical processor count and type selection for a given workload based on platform thermal and power budget constraints
US8190863B2 (en) Apparatus and method for heterogeneous chip multiprocessors via resource allocation and restriction
US10162687B2 (en) Selective migration of workloads between heterogeneous compute elements based on evaluation of migration performance benefit and available energy and thermal budgets
US8589665B2 (en) Instruction set architecture extensions for performing power versus performance tradeoffs
US9519324B2 (en) Local power gate (LPG) interfaces for power-aware operations
CN108885586B (en) Processor, method, system, and instruction for fetching data to an indicated cache level with guaranteed completion
US9329666B2 (en) Power throttling queue
Seki et al. A fine-grain dynamic sleep control scheme in MIPS R3000
Burd et al. Energy efficient microprocessor design
Roy et al. State-retentive power gating of register files in multicore processors featuring multithreaded in-order cores
US11886918B2 (en) Apparatus and method for dynamic control of microprocessor configuration
US20190095231A1 (en) Dynamic platform feature tuning based on virtual machine runtime requirements
US20130086357A1 (en) Staggered read operations for multiple operand instructions
US20210191725A1 (en) System, apparatus and method for dynamic pipeline stage control of data path dominant circuitry of an integrated circuit
Mangalwedhe et al. Low power implementation of 32-bit RISC processor with pipelining
CN113366458A (en) System, apparatus and method for adaptive interconnect routing
Gary Low-power microprocessor design
US20240086198A1 (en) Register reorganisation
US20240103868A1 (en) Virtual Idle Loops
Tseng Energy-efficient register file design
Francisco Lorenzon et al. Fundamental Concepts
Murti et al. Embedded Processor Architectures
Praveen et al. A survey on control implementation scheme

Legal Events

Date Code Title Description
AS Assignment

Owner name: ADVANCED MICRO DEVICES, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SHIPPY, DAVID J.;REEL/FRAME:029570/0648

Effective date: 20121226

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8