CN106020424A - Active power efficiency processor system structure - Google Patents
- Publication number: CN106020424A (application CN201610364515.9A)
- Authority: CN (China)
- Prior art keywords: core, processor, interruption, state, logic
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3293—Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5094—Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a power-efficient processor architecture. According to one embodiment, a method comprises: receiving an interrupt from an accelerator and, in response, sending a resume signal directly to a small core; providing the small core with a subset of the execution state of a large core; determining whether the small core can handle a request associated with the interrupt; if so, performing the operation corresponding to the request in the small core; and if not, providing the large core's execution state and the resume signal to the large core. Other embodiments are described and claimed.
Description
This application is a divisional application of Chinese patent application No. 201180073263.X, filed on September 6, 2011, and entitled "Power Efficient Processor Architecture".
Background
In general, processors use power-saving sleep modes whenever possible, for example in accordance with the Advanced Configuration and Power Interface (ACPI) standard (e.g., Rev. 3.0b, published October 10, 2006). In addition to voltage and frequency scaling (DVFS, or ACPI performance states (P-states)), these so-called C-state core low-power states (ACPI C-states) can save power when a core is idle or not fully utilized. However, even in a multicore processor context, a core is often woken from an effective sleep state to perform a relatively simple operation and is then returned to the sleep state. This can hurt power efficiency, because exiting and re-entering a low-power state carries latency and power costs. During a state transition, some types of processors consume power without accomplishing useful work, which further harms power efficiency.
Examples of operations to be handled upon exiting a low-power state include keyboard input, timer interrupts, network interrupts, and so forth. To handle these operations in a power-sensitive manner, current operating systems (OSs) change program behavior by processing larger amounts of data at a time, or move to a tickless OS (with no periodic timer interrupts, only sporadically programmed ones). Another strategy is timer coalescing, in which multiple interrupts are grouped together and handled at the same time. However, besides requiring changes to program behavior, all of these options add complexity and can still result in power-inefficient operation. Further, some types of software (e.g., media playback) may defeat hardware power-efficiency mechanisms by requesting frequent periodic wakeups regardless of how much work needs to be done. So, although tickless and timer-coalescing strategies can save some power by reducing unnecessary wakeups from deep C-states, they require invasive OS changes and can take a long time to propagate through the computing ecosystem, because such changes are not deployed until a new version of the operating system is distributed.
Brief Description of the Drawings
Fig. 1 is a block diagram of a processor in accordance with an embodiment of the present invention.
Fig. 2 is a block diagram of a processor in accordance with another embodiment of the present invention.
Fig. 3 is a flow diagram of resume flow options between cores in accordance with an embodiment of the present invention.
Fig. 4 is a flow diagram of a method in accordance with an embodiment of the present invention.
Fig. 5 is a flow diagram of a method for transferring execution state in accordance with an embodiment of the present invention.
Fig. 6 is a block diagram of a processor in accordance with yet another embodiment of the present invention.
Fig. 7 is a block diagram of a processor in accordance with a further embodiment of the present invention.
Fig. 8 is a block diagram of a processor in accordance with a still further embodiment of the present invention.
Fig. 9 is a timing diagram in accordance with an embodiment of the present invention.
Fig. 10 is a graphical illustration of power savings in accordance with an embodiment of the present invention.
Fig. 11 is a block diagram of a system in accordance with an embodiment of the present invention.
Detailed Description
In various embodiments, average power consumption can be reduced in a heterogeneous processor environment. For system and power-efficiency reasons, such an environment may include large, fast cores and smaller, more power-efficient cores. Further, embodiments may provide this power control in a manner that is transparent to the operating system (OS) executing on the processor. However, the scope of the present invention is not limited to heterogeneous environments; it can also be used in homogeneous environments (transparent to the OS, but not necessarily heterogeneous from a hardware perspective) to reduce average power, e.g., by keeping as many cores as possible asleep in a multiprocessor environment. Embodiments may be especially suitable for hardware-accelerated environments in which the cores are often asleep, such as tablet-computer-based and system-on-chip (SoC) architectures.
In general, embodiments provide power control by directing all wakeup signals to a smaller core rather than a larger core. In this way, average power can be reduced by more than a factor of two when the system is 95% idle. As described below, in many embodiments this smaller core can be sequestered from the OS. That is, the OS is unaware that the smaller core exists, so the core is invisible to the OS. As such, embodiments can provide power-efficient processor operation through processor hardware in a manner that is transparent to the OS and to applications executing on the processor.
Referring now to Fig. 1, shown is a block diagram of a processor in accordance with an embodiment of the present invention. As shown in Fig. 1, processor 100 may be a heterogeneous processor having a number of large cores, small cores, and accelerators. Although described herein in the context of a multicore processor, understand that embodiments are not so limited and in various implementations may be within a SoC or other semiconductor-based processing device. Note that the accelerators can perform work from a queue of input work regardless of whether the processor cores are powered on. In the embodiment of Fig. 1, processor 100 includes multiple large cores. In the specific embodiment shown, two such cores 110a and 110b (generically, large cores 110) are shown, although more than two such large cores may be provided. In various implementations, these large cores may be out-of-order processors having a relatively complex pipeline architecture and operating according to a complex instruction set computing (CISC) architecture.
In addition, processor 100 further includes multiple small cores 120a-120n (generically, small cores 120). Although 8 such cores are shown in the embodiment of Fig. 1, understand that the scope of the present invention is not limited in this respect. In various embodiments, small cores 120 may be power-efficient in-order processors, e.g., executing instructions according to a CISC or reduced instruction set computing (RISC) architecture. In some implementations, two or more of these cores may be coupled in series to perform related processing; e.g., if a number of large cores are in a power-saving mode, one or more smaller cores may be active to perform work that would otherwise wake the large cores. In many embodiments, small cores 120 can be transparent to the OS, although in other embodiments the small and large cores may be exposed to the OS, with configuration options available. In general, any mix of large and small cores can be used in different embodiments. For example, a single small core can be provided per large core, or in other embodiments a single small core may be associated with multiple large cores.
As used herein, the term "large core" refers to a processor core of relatively complex design that may consume a relatively large amount of chip area compared with a "small core", which may be of less complex design and consume correspondingly less chip area. In addition, the smaller cores are more power efficient than the larger cores, because they may have a smaller thermal design power (TDP). However, understand that the smaller cores are limited in their processing capabilities compared with the large cores. For example, the smaller cores may not handle all operations that are possible in the large cores. Smaller cores may also be less efficient at instruction processing; that is, instructions may execute more quickly in a large core than in a small core.
As further seen, both large cores 110 and small cores 120 may be coupled to an interconnect 130. Different implementations of this interconnect structure can be realized in different embodiments. For example, in some embodiments the interconnect structure can be according to a front side bus (FSB) architecture or the Intel QuickPath Interconnect (QPI) protocol. In other embodiments, the interconnect structure can be according to a given system fabric.
Still referring to Fig. 1, multiple accelerators 140a-140c are also coupled to interconnect 130. Although the scope of the present invention is not limited in this regard, the accelerators may include media processors such as audio and/or video processors, cryptographic processors, fixed-function units, and so forth. These accelerators may be designed by the same designers who designed the cores, or may be independent third-party intellectual property (IP) blocks incorporated into the processor. In general, dedicated processing tasks can be performed more efficiently in these accelerators than in either the large cores or the small cores, whether in terms of performance or power consumption. Although shown with this particular implementation in the embodiment of Fig. 1, understand that the scope of the present invention is not limited in this regard. For example, instead of only two kinds of cores (i.e., large cores and small cores), other embodiments may have a hierarchy of multiple cores, including at least a large core, a medium core, and a small core, where the medium core occupies more chip area than the small core but less than the large core, with power consumption correspondingly between those of the large and small cores. In still other embodiments, the small core can be embedded within a larger core, e.g., as a subset of the larger core's logic and structures.
Furthermore, although the embodiment of Fig. 1 includes multiple large cores and multiple small cores, certain implementations such as a mobile processor or SoC may provide only a single large core and a single small core. Specifically, referring now to Fig. 2, shown is a block diagram of a processor in accordance with another embodiment of the present invention, in which processor 100′ includes a single large core 110 and a single small core 120, along with interconnect 130 and accelerators 140a-c. As mentioned above, this implementation may be suitable for mobile applications.
As representative power figures, a typical large core may consume approximately 6000 milliwatts (mW), a medium core approximately 500 mW, and a very small core approximately 15 mW. An implementation that avoids waking the large core can therefore realize significant power benefits.
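A short calculation makes the benefit concrete. The sketch below uses only the representative figures quoted above and assumes, as a simplification, that a task's energy is active power multiplied by run time, ignoring wakeup-transition costs and idle power. The dictionary and function names are hypothetical.

```python
# Representative active-power figures quoted in the text, in milliwatts.
POWER_MW = {"large": 6000.0, "medium": 500.0, "small": 15.0}


def task_energy_mj(core: str, duration_ms: float) -> float:
    """Energy in millijoules for running a task on `core` for `duration_ms`.

    mW * ms = microjoules, so divide by 1000 to get millijoules.
    Simplification: transition costs and idle power are ignored.
    """
    return POWER_MW[core] * duration_ms / 1000.0


def break_even_slowdown(fast: str, slow: str) -> float:
    """How many times longer the `slow` core may take on the same task
    before it uses more energy than the `fast` core."""
    return POWER_MW[fast] / POWER_MW[slow]
```

With these figures, `break_even_slowdown("large", "small")` is 400.0: a small core that takes up to about 400 times longer than the large core on a task still uses less energy under this model, which is why a slow small core can win on trivial work such as data movement.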
Embodiments allow the larger, less power-efficient cores to remain in low-power sleep states longer than they otherwise could. By directing interrupts and other core wake events to the smaller cores instead of the larger cores, the smaller cores may run longer and wake more often, but this is still more power efficient than waking a large core to perform a trivial task such as data movement. Note that, as described below, for some operations the large core is powered on to perform the work, for example because a smaller core might not support vector operations (e.g., AVX operations), complex addressing modes, or floating-point (FP) operations. In such cases, the wakeup signal can be rerouted from the small core to the large core.
For example, when hardware-accelerated 1080p video playback is performed on a processor, there are more than 1000 transitions into and out of the core C6 state and nearly 1200 interrupts per second. If even a portion of these wake events are redirected to a smaller core using an embodiment of the present invention, significant power savings can be achieved.
Fig. 3 summarizes resume flow options between cores in accordance with an embodiment of the present invention. As shown in Fig. 3, there are a software domain 210 and a hardware domain 220. In general, software domain 210 corresponds to OS operations relating to power management, e.g., according to an ACPI implementation. In general, the OS, based on its knowledge of upcoming tasks according to its scheduling mechanism, can select one of multiple C-states to request that the processor enter a low-power mode. For example, the OS can issue an MWAIT call that includes the particular low-power state being requested.
In general, C0 corresponds to the normal operating state in which instructions are executed, while states C1-C3 are OS lower-power states, each with a different level of power savings and a correspondingly different latency to return to the C0 state. As seen, depending on the expected workload of the processor, the OS may select a busy state, e.g., OS C0, or one of multiple idle states, e.g., OS C-states C1-C3. Each of these idle states may be mapped to a corresponding hardware low-power state under the control of the processor hardware. Thus, the processor hardware can map a given OS C-state to a corresponding hardware C-state, which may provide greater power savings than that specified by the OS. In general, shallower C-states (e.g., C1) save less power than deeper C-states (e.g., C3) but have shorter resume times. In various embodiments, hardware domain 220 and the mapping of OS C-states to processor C-states can be handled by a power control unit (PCU) of the processor, although the scope of the present invention is not limited in this regard. This mapping may be based on the history of previous OS-based power management requests. In addition, the decision can be based on overall system state, configuration information, and so forth.
In addition, the PCU or other processor logic can be configured to direct all wake events to the smallest available core (which, in various embodiments, may be an OS-invisible core). As shown in Fig. 3, upon exit from a given hardware-based idle state, control returns directly to the smallest available core, so that state is transferred to this smallest core. By contrast, in a conventional hardware/software resume flow, control returns only to the large core. In general, the OS selects a C-state based on expected idle time and resume latency requirements, and the architecture maps this C-state to a hardware C-state. Thus, as shown in the embodiment of Fig. 3, all resume signals (such as interrupts) are routed to the smallest available core, which determines whether it can handle the resume operation or should instead send a wakeup signal to a larger core to continue. Note that embodiments do not interfere with existing P-states or with C-state auto-demotion, in which hardware automatically selects a hardware C-state with lower resume latency based on empirically measured efficiency. Also note that the PCU or another programmable entity may examine incoming wake events to determine which core (large or small) to route them to.
As described above, in some implementations the small core itself can be hidden from the OS and application software. For example, a small-large core pair can be abstracted away and hidden from application software. In a low-power state, all cores can be asleep while an accelerator (such as a video decode accelerator) performs a given task such as decoding. When the accelerator runs out of data, it directs a wakeup signal to request additional data, which may come from the small core; the small core wakes and determines that this simple data move operation can be accomplished without waking the large core, thus saving power. If a timer interrupt arrives and the small core wakes and detects a complex vector operation (such as a 256-bit AVX instruction) in the instruction stream, the large core may be woken to handle the complex instruction (and the other instructions in that stream) to reduce latency. In an alternative implementation, a global hardware observation mechanism can detect that the small core has encountered the AVX instruction and can generate an undefined instruction fault, which may cause the small core to shut down and the instruction stream to be redirected to the larger core after that core is woken. This observation mechanism may be located in the PCU, in another uncore location near the PCU, as a separate section of hardware logic on the global interconnect, or as an addition to the small core's internal control logic. Note that this behavior is not limited to instructions and can extend to configuration or features. For example, if the small core encounters a write to a configuration space that exists only on the large core, it can request that the large core be woken.
Referring now to Fig. 4, shown is a flow diagram of a method in accordance with an embodiment of the present invention. Note that, depending on the given implementation, the method of Fig. 4 may be performed by various agents. For example, in some embodiments, method 300 may be implemented partially by system agent circuitry in the processor, such as a power control unit (which may be in the system agent or uncore portion of the processor). In other embodiments, method 300 may be implemented partially by interconnect logic, such as power control logic within an interconnect structure, which can receive interrupts, e.g., from accelerators coupled to the interconnect structure, and forward them to a selected location.
As shown in Fig. 4, method 300 may begin by placing both the large and small cores in a sleep state (block 310). That is, assume that no active operations are being performed in the cores, so they can be placed in a selected low-power state to reduce power consumption. Although the cores may not be active, other agents in the processor or SoC, such as one or more accelerators, may be performing tasks. At block 320, an interrupt may be received from such an accelerator. The interrupt may be sent when the accelerator completes a task or encounters an error, or when the accelerator needs additional data or other processing is to be performed by another component (such as a given core). Control passes to block 330, where the logic can send a resume signal directly to the small core. That is, the logic can be programmed to always send the resume signal to the small core (or to a selected one of multiple such small cores, depending on the system implementation) when both the large and small cores are in a low-power state. By sending the interrupt directly and always to the small core, the larger power consumption of the large core can be avoided in the many cases where the small core can handle the requested operation. Note that some form of filtering or caching mechanism can be added to block 330 so that certain interrupt sources are always routed to one core or the other, as desired, to balance performance and power.
Still referring to Fig. 4, control next passes to diamond 340, where it can be determined whether the small core can handle the request associated with the interrupt. Although the scope of the present invention is not limited in this regard, in some embodiments this determination may be made by the small core itself after it is woken. Alternatively, the logic performing the method of Fig. 4 can make the determination (in which case the analysis can be performed before the resume signal is sent to the small core).
As an example, the small core can determine whether it can handle the requested operation based on its performance requirements and/or its instruction set architecture (ISA) capabilities. If the small core cannot handle a requested operation because it lacks ISA support, front-end logic of the small core can parse the received instruction stream and determine that at least one instruction in the stream is not supported by the small core. Accordingly, the small core may issue an undefined instruction fault. This undefined fault may be sent to the PCU (or another entity), which can analyze the fault and the state of the small core to determine whether the fault arose because the small core lacks hardware support for the instruction, or whether it is a true undefined fault. In the latter case, the undefined fault may be forwarded to the OS for further handling. If the fault arose because the small core lacks the appropriate hardware support for the instruction, the PCU can cause the execution state that was transferred to this small core to be transferred to the corresponding large core to handle the requested instruction.
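The fault-classification step can be sketched as below. The instruction names and ISA sets are hypothetical placeholders. The sketch assumes the PCU can distinguish a genuinely undefined instruction from one the large core supports simply by checking the large core's ISA, which is one possible reading of the analysis described above, not the patent's specified method.

```python
from enum import Enum, auto
from typing import Iterable, Optional

# Hypothetical instruction names; the sets are illustrative only.
LARGE_CORE_ISA = {"mov", "add", "load", "store", "branch", "fp_mul", "avx256_add"}
SMALL_CORE_ISA = {"mov", "add", "load", "store", "branch"}


class FaultAction(Enum):
    NONE = auto()              # small core can run the whole stream
    MIGRATE_TO_LARGE = auto()  # unsupported on small core, supported on large core
    FORWARD_TO_OS = auto()     # true undefined instruction


def first_unsupported(stream: Iterable[str], isa: set) -> Optional[str]:
    """Front-end check: return the first instruction the core cannot execute."""
    for insn in stream:
        if insn not in isa:
            return insn
    return None


def classify_undefined_fault(stream: Iterable[str]) -> FaultAction:
    """PCU-side decision for an undefined-instruction fault from the small core."""
    insn = first_unsupported(stream, SMALL_CORE_ISA)
    if insn is None:
        return FaultAction.NONE
    if insn in LARGE_CORE_ISA:
        # Missing hardware support only: move execution state to the large core.
        return FaultAction.MIGRATE_TO_LARGE
    # Not valid on either core: let the OS handle it as a real fault.
    return FaultAction.FORWARD_TO_OS
```

A stream containing a 256-bit AVX-style operation would be classified as `MIGRATE_TO_LARGE`, while an instruction neither core recognizes would be forwarded to the OS.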
In other embodiments, execution state may be transferred between the small core and the large core when it is determined that the small core has been executing for too long or at too low a performance level. That is, suppose the small core has been executing for thousands or millions of processor cycles to perform a requested task. Because more favorable execution is available in the large core, transferring the state to the large core so that it can finish the task more quickly may yield a greater power reduction.
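One possible migration policy for long-running or slow tasks is sketched below. The cycle budget, the instructions-per-cycle (IPC) floor, and the minimum sampling window are hypothetical knobs; the text only says the small core may have run for thousands or millions of cycles or at too low a performance level.

```python
from dataclasses import dataclass


@dataclass
class SmallCoreTask:
    cycles_executed: int
    instructions_retired: int


# Hypothetical policy knobs, not values from the text.
MAX_CYCLES_ON_SMALL = 1_000_000  # run-time budget on the small core
MIN_SAMPLE_CYCLES = 10_000       # do not judge performance before this point
MIN_IPC_ON_SMALL = 0.3           # below this, the task is "too slow"


def should_migrate_to_large(task: SmallCoreTask) -> bool:
    """Return True if the task has run too long or too slowly on the small core."""
    if task.cycles_executed >= MAX_CYCLES_ON_SMALL:
        return True  # executing for too long
    if task.cycles_executed < MIN_SAMPLE_CYCLES:
        return False  # too early to judge performance
    ipc = task.instructions_retired / task.cycles_executed
    return ipc < MIN_IPC_ON_SMALL  # performance level too low
```

The sampling window keeps a few early stall cycles from triggering a premature migration.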
Still referring to Fig. 4, if it is determined that the requested operation can be handled in the small core, control passes to block 350, where the operation is performed in the small core. For example, if the requested operation is a data movement operation, the small core can perform the requested processing and, if no other tasks are pending for the small core, it can be placed back into a low-power state.
If instead it is determined at diamond 340 that the small core cannot handle the requested operation, e.g., if the operation is a relatively complex operation that the small core is not configured to handle, control passes to block 360. There, a wakeup signal can be sent, e.g., directly from the small core to the large core, to power up the large core. Accordingly, control passes to block 370, where the requested operation is performed in the large core. Note that although described with this particular set of operations in the embodiment of Fig. 4, understand that the scope of the present invention is not limited in this regard.
Thus, in various embodiments, a mechanism is provided that allows hardware interrupts and other wakeup signals to be routed directly to the small core without waking the large core. Note that in different implementations, either the small core itself or a supervisory agent can determine whether the wakeup and its processing can be completed without waking the large core. In representative cases, the smaller core may be much more power efficient than the larger cores and, as a result, may support only a subset of the instructions supported by the large core. Many operations to be performed upon waking from a low-power state can be offloaded to the simpler, more power-efficient core, avoiding the need to wake a larger, more powerful core in a heterogeneous environment (where cores of many sizes are included in the system for performance or power-efficiency reasons).
Referring now to Fig. 5, shown is a flow diagram of a method for transferring execution state in accordance with an embodiment of the present invention. As shown in Fig. 5, in one embodiment, method 380 may be performed by logic of the PCU. This logic can be triggered in response to a request to place a large core into a low-power state. In response to such a request, method 380 may begin at block 382, where the execution state of the large core can be stored in a temporary storage area. Note that this temporary storage area may be a dedicated state save area associated with the core, or it may be within a shared cache such as a last level cache (LLC). Although the scope of the present invention is not limited in this regard, the execution state may include general-purpose registers, status and configuration registers, execution flags, and so forth. In addition, at this point, additional operations to place the large core into the low-power state can be performed. Such operations include flushing internal caches and other state, and signaling to shut down the given core.
Still referring to Fig. 5, it can be determined whether the small core has resumed (diamond 384). This resumption may occur as a result of a resume signal received in response to an interrupt from, e.g., an accelerator of the processor. As part of the small core resumption, control passes to block 386, where at least a portion of the large core state can be extracted from the temporary storage area. More specifically, the extracted portion may be the portion of the large core's execution state that will be used by the small core. As an example, this state portion may include main register contents, various flags such as certain execution flags, machine status registers, and so forth. However, some state may not be extracted, such as state associated with one or more execution units that are present in the large core but have no corresponding execution unit in the small core. The extracted portion of the state can then be sent to the small core (block 388), enabling the small core to perform whatever operations are appropriate in response to the given interrupt. Although shown with this particular implementation in the embodiment of Fig. 5, understand that the scope of the present invention is not limited in this regard.
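Blocks 382 to 388 can be sketched as a small temporary-storage model. The register names are hypothetical (the `ymm0` entry stands in for state tied to execution units that exist only on the large core). The `merge_back` method is an assumption that models the merge, described for the undefined handling logic of Fig. 6, of the small core's state with the rest of the large core's saved state.

```python
from typing import Dict, Set

# Hypothetical state the small core can use: main registers, flags, status MSR.
SMALL_CORE_STATE_KEYS: Set[str] = {"rax", "rbx", "rip", "rflags", "msr_status"}


class TemporaryStateStore:
    """Temporary storage for a sleeping large core's execution state
    (a dedicated save area or a region of a shared LLC in the text)."""

    def __init__(self) -> None:
        self._saved: Dict[str, int] = {}

    def save_large_core_state(self, state: Dict[str, int]) -> None:
        # Block 382: store the large core's execution state.
        self._saved = dict(state)

    def extract_for_small_core(self) -> Dict[str, int]:
        # Block 386: take only the portion the small core will use; state for
        # large-core-only execution units stays behind. The result is what
        # block 388 would send to the small core.
        return {k: v for k, v in self._saved.items() if k in SMALL_CORE_STATE_KEYS}

    def merge_back(self, small_core_state: Dict[str, int]) -> Dict[str, int]:
        # The small core's (possibly updated) portion overrides the saved copy;
        # large-core-only state is kept as saved. Returns the merged state.
        self._saved.update(small_core_state)
        return dict(self._saved)
```

In use, a saved state of `{"rax": 1, "rip": 0x400, "ymm0": 7}` yields `{"rax": 1, "rip": 0x400}` for the small core, and merging back the small core's updated registers restores a complete state that still contains `ymm0`.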
Referring now to Fig. 6, shown is a block diagram of a processor in accordance with an embodiment of the present invention. As shown in Fig. 6, processor 400 may be a multicore processor including a first plurality of cores 410i-410n that can be exposed to the OS and a second plurality of cores 410a-x that are transparent to the OS.
As seen, the various cores can be coupled via an interconnect 415 to a system agent or uncore 420 that includes various components. As seen, uncore 420 may include a shared cache 430, which may be a last level cache. In addition, the uncore may include an integrated memory controller 440, various interfaces 450a-n, a power control unit 455, and an advanced programmable interrupt controller (APIC) 465.
PCU 450 may include various logic to enable power-efficient operation in accordance with an embodiment of the present invention. As seen, PCU 450 can include wakeup logic 452 that performs wakeups as described above. Thus, logic 452 can be configured to always wake a small core first. However, this logic can be configured dynamically so that, in certain circumstances, it does not perform such direct small-core wakeups. For example, a system can be dynamically configured for power-saving operation, e.g., when it is a mobile system running on battery power. In that case, the logic can be configured to always wake the small core. Conversely, if the system is a server, desktop, or laptop system connected to a wall power source, embodiments may provide a user-based selection to favor latency and performance over power savings. In that case, wakeup logic 452 can be configured to wake the large core rather than the small core in response to an interrupt. Similar wakeups of the large core can be performed when it is determined that a large number of small-core wakeups result in redirection to the large core.
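The policy choices for wakeup logic 452 can be summarized in a sketch. The `PowerSource` enum, the `prefer_performance` flag standing in for the user-based selection, and the `redirect_limit` threshold standing in for "a large number" of redirected wakeups are all hypothetical.

```python
from enum import Enum


class PowerSource(Enum):
    BATTERY = "battery"
    WALL = "wall"


class WakeupLogic:
    """Sketch of a wakeup policy: pick which core receives an interrupt."""

    def __init__(self, power_source: PowerSource, prefer_performance: bool = False,
                 redirect_limit: int = 16) -> None:
        self.power_source = power_source
        self.prefer_performance = prefer_performance  # user selection on wall power
        self.redirect_limit = redirect_limit          # hypothetical threshold
        self.redirects = 0

    def note_redirect(self) -> None:
        """Record that a small-core wakeup ended up redirected to the large core."""
        self.redirects += 1

    def target_core(self) -> str:
        if self.power_source is PowerSource.BATTERY:
            return "small"  # battery: always wake the small core first
        if self.prefer_performance:
            return "large"  # user favors latency/performance over power savings
        if self.redirects >= self.redirect_limit:
            return "large"  # small-core wakeups keep bouncing to the large core
        return "small"
```

The battery check comes first because the text describes always waking the small core in that configuration, regardless of the other settings.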
To further enable power-efficient operation, PCU 450 can also include state transfer logic 454 that can transfer execution state between large and small cores. As discussed above, this logic can be used to obtain the large core's execution state stored in temporary storage during a low-power state, and to extract at least a portion of that state to provide to the small core when the small core wakes.
Further, PCU 450 may include interrupt history storage 456. Such storage may include a plurality of entries, each identifying an interrupt that occurred during system operation and whether that interrupt was successfully handled by the small core. Then, based on this history, when a given interrupt is received, the corresponding entry of this storage may be accessed to determine whether a previous interrupt of the same type was successfully handled by the small core. If so, the PCU may direct the new incoming interrupt to the same small core. If instead it is determined from this history that such an interrupt was not successfully handled by the small core (or was handled with unsatisfactorily low performance), the interrupt may be sent to the large core instead.
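A lookup table keyed by interrupt type captures the routing rule described above. In this sketch the key is assumed to be the interrupt vector, and the behavior for a type with no history yet (try the small core) is an assumption, not something the text states.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InterruptHistory:
    """Sketch of interrupt history storage 456.

    Keyed by interrupt type (here, hypothetically, the vector number); the
    value records whether the small core handled the last such interrupt
    successfully and with acceptable performance.
    """

    entries: Dict[int, bool] = field(default_factory=dict)

    def record(self, vector: int, small_core_ok: bool) -> None:
        self.entries[vector] = small_core_ok

    def route(self, vector: int) -> str:
        """Return "small" or "large" for a newly arriving interrupt."""
        if self.entries.get(vector) is False:
            return "large"  # small core failed (or was too slow) last time
        # Success on record, or no history yet (assumed: try the small core).
        return "small"


if __name__ == "__main__":
    hist = InterruptHistory()
    hist.record(0x30, True)
    hist.record(0x31, False)
    print(hist.route(0x30), hist.route(0x31), hist.route(0x32))
```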
Still referring to FIG. 6, PCU 450 may also include undefined handling logic 458. Such logic may receive undefined faults issued by the small core. Based on this logic, information in the small core may be accessed, and it may then be determined whether the undefined fault was caused by a lack of support for the instruction in the small core or by another reason. Responsive to this determination, the logic may either cause the small core's state to be merged with the remainder of the large core's execution state (stored in the scratchpad area) and then sent to the large core for handling of the interrupt, or send the undefined fault to the OS for further handling. When it is determined that the small core cannot handle the interrupt, the portion of the execution state that was provided to the small core is obtained back from the small core and saved back to the temporary storage location, after which the small core may be powered down. This merged state, i.e., the returned portion together with the remainder of the large core's execution state, may then be provided back to the large core so that the large core can handle the interrupt that the small core could not. Note also that, responsive to such mishandling by the small core, an entry may be written into interrupt history storage 456. Although shown with this particular logic in the embodiment of FIG. 6, understand that the scope of the present invention is not limited in this regard. For example, in other embodiments the various logic of PCU 450 may be implemented in a single logic block.
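The two outcomes of the undefined-fault check can be sketched as a single function. Everything here is illustrative: the enum values, the dictionary representation of execution state, and the use of a plain dictionary as the history table are assumptions. The merge lets the small core's values override the scratchpad copy because they are the more recent ones.

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class FaultCause(Enum):
    UNSUPPORTED_ON_SMALL_CORE = auto()  # instruction the small core lacks
    OTHER = auto()                      # e.g. a genuinely undefined opcode


def handle_undefined_fault(
    cause: FaultCause,
    small_core_state: Dict[str, int],
    scratchpad_remainder: Dict[str, int],
    history: Dict[int, bool],
    vector: int,
) -> Tuple[str, Optional[Dict[str, int]]]:
    """Sketch of undefined handling logic 458 (names are hypothetical).

    Returns the destination ("large_core" or "os") and, for the large core,
    the merged execution state it should resume from.
    """
    if cause is FaultCause.UNSUPPORTED_ON_SMALL_CORE:
        # Pull back the subset the small core was running with and merge it
        # over the remainder kept in the scratchpad; the small core's values
        # are newer, so they win. The small core can then be powered down.
        merged = {**scratchpad_remainder, **small_core_state}
        history[vector] = False  # steer this interrupt type to the large core
        return "large_core", merged
    # Not a small-core limitation: forward to the OS as a normal fault.
    return "os", None
```

Only the small-core limitation is recorded in the history; a fault the OS must handle anyway says nothing about which core should take that interrupt next time.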
APIC 465 may receive various interrupts (e.g., issued from accelerators) and direct the interrupts as appropriate to a given one or more cores. In some embodiments, to keep the small core hidden from the OS, APIC 465 may dynamically remap incoming interrupts (each of which may include an APIC identifier associated with it) from an APIC ID associated with the large core to an APIC ID associated with the small core.
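The dynamic remapping can be modeled as rewriting the destination APIC ID through a lookup table. This is a behavioral sketch, not a description of any real APIC register interface; the mapping table and the `enabled` switch are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Interrupt:
    vector: int
    apic_id: int  # destination APIC ID, as programmed by the OS


class ApicRemapper:
    """Sketch: redirect interrupts aimed at an OS-visible large core to a
    hidden small core by rewriting the destination APIC ID."""

    def __init__(self, large_to_small: Dict[int, int]) -> None:
        # Hypothetical table: large-core APIC ID -> small-core APIC ID.
        self.large_to_small = dict(large_to_small)
        self.enabled = True  # e.g. cleared when the large core must be woken

    def remap(self, irq: Interrupt) -> Interrupt:
        target = self.large_to_small.get(irq.apic_id)
        if self.enabled and target is not None:
            return Interrupt(irq.vector, target)
        return irq  # unmapped IDs (or remapping disabled) pass through


if __name__ == "__main__":
    apic = ApicRemapper({0: 16, 1: 17})
    print(apic.remap(Interrupt(vector=0x41, apic_id=0)))
```

Because the OS only ever programs large-core APIC IDs, the small core stays invisible to it; the rewrite happens below the OS.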
With further reference to FIG. 6, processor 400 may communicate with a system memory 460, e.g., via a memory bus. In addition, via interfaces 450, connection may be made to various off-chip components such as peripheral devices, mass storage and so forth. Although shown with this particular implementation in the embodiment of FIG. 6, the scope of the present invention is not limited in this regard.
Note that various architectures are possible to enable different couplings or integration of the large and small cores. As examples, the degree of coupling between these disparate cores can depend on a variety of design optimization parameters related to die area, power, performance and responsiveness.
Referring now to FIG. 7, shown is a block diagram of a processor in accordance with another embodiment of the present invention. As shown in FIG. 7, processor 500 may be a true heterogeneous processor including a large core 510 and a small core 520. As seen, each core may be associated with its own private cache memory hierarchy (i.e., cache memories 515 and 525, which may include both level 1 and level 2 cache memories). In turn, the cores may be coupled together via a ring interconnect 530. Multiple accelerators 540a and 540b and an LLC (i.e., an L3 cache 550, which may be a shared cache) are also coupled to the ring interconnect. In this implementation, execution state between the two cores may be communicated via ring interconnect 530. As described above, the execution state of large core 510 may be stored in cache 550 before the core enters a given low power state. Then, upon wakeup of small core 520, at least a subset of this execution state may be provided to the small core to ready it to execute the operation that triggered its wakeup. Thus in the embodiment of FIG. 7 the cores are loosely coupled via this ring interconnect. Although shown with a single large core and a single small core for ease of illustration, understand that the scope of the present invention is not limited in this regard. Using an implementation such as that of FIG. 7, any state to be exchanged or communicated can be handled via the ring architecture (which can also be a bus or fabric architecture). Alternatively, in other embodiments this communication may take place over a dedicated bus between the two cores (not shown in FIG. 7).
Referring now to FIG. 8, shown is a block diagram of a processor in accordance with yet another embodiment of the present invention. As shown in FIG. 8, processor 500'' may be a hybrid heterogeneous processor in which there is tight coupling or integration between the large and small cores. Specifically, as shown in FIG. 8, large core 510 and small core 520 may share a cache memory 518, which in various embodiments may include both level 1 and level 2 caches. As such, execution state can be transferred from one core to the other via this cache memory, avoiding the latency of communication over ring interconnect 530. Note that this arrangement achieves lower power due to reduced data-movement overhead and faster communication between the cores, but it may be less flexible.
Note that FIGS. 7 and 8 illustrate only two possible implementations (and show only a limited number of cores). Many more implementations are possible, including different arrangements of cores, a combination of the two schemes, more than two types of cores, and so forth. In a variant of FIG. 8, the two cores may share some components, such as execution units, an instruction pointer or a register file.
As discussed, embodiments can be completely transparent to, and invisible to, the operating system, so that no software modification is needed and recovery time from a C-state is extended only minimally. In other embodiments, the presence and availability of the small cores can be exposed to the OS, enabling the OS to decide whether to provide an interrupt to a small core or to a large core. Furthermore, embodiments may provide a mechanism in system software such as a basic input/output system (BIOS) to expose the large and small cores to the OS, or to configure whether the small cores are exposed. Embodiments may increase the apparent recovery time from a C-state; this is acceptable, however, because current platforms already vary in resume latency, and currently no useful work is performed while the core state is being restored. The degree of difference between the small and large cores can range from negligible to larger microarchitectural differences. According to various embodiments, the most significant differences between the heterogeneous cores may be die area and the power consumed by the cores.
In some implementations, a control mechanism may be provided so that, if it is detected that the large core is woken most of the time upon resume, waking the small core may be avoided and the large core may be woken directly for at least a predetermined length of time to preserve performance. Note that in some embodiments, the mechanism that generally redirects all interrupts and other wake signals to either the small core or the large core may be exposed to software (both system software and user-level software), depending on the application and on the power and performance requirements of the system. As one such example, a user-level instruction may be provided to direct a wake operation to a specified core. Such an instruction may be a variant of an MWAIT-like instruction.
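The control mechanism in the first sentence above behaves like a hysteresis window. The sketch below is one possible reading, with loudly hypothetical choices: "most of the time" is taken as a strict majority over a sliding window of recent resumes, and the "predetermined length of time" is a fixed hold period in microseconds.

```python
from collections import deque


class LargeCoreBypass:
    """Sketch of the control mechanism: when most recent resumes ended up on
    the large core, wake the large core directly for a fixed hold period.

    `window` and `hold_time_us` are hypothetical parameters standing in for
    "most of the time" and "a predetermined length of time".
    """

    def __init__(self, window: int = 8, hold_time_us: int = 1000) -> None:
        self.recent = deque(maxlen=window)  # True = resume ended on large core
        self.hold_time_us = hold_time_us
        self.bypass_until_us = -1

    def record_resume(self, now_us: int, ended_on_large: bool) -> None:
        self.recent.append(ended_on_large)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and 2 * sum(self.recent) > len(self.recent):
            self.bypass_until_us = now_us + self.hold_time_us

    def wake_target(self, now_us: int) -> str:
        return "large" if now_us < self.bypass_until_us else "small"
```

Requiring a full window before switching avoids reacting to one or two early resumes, and the fixed hold period lets the policy fall back to small-core wakeups once the burst of large-core work is over.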
In some embodiments, an accelerator may send a hint with its interrupt to the PCU or another management agent to indicate that the requested operation is relatively simple and thus can be handled efficiently in the small core. The PCU may use this accelerator-provided hint to automatically direct the incoming interrupt to the small core for handling.
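One way to picture the hint is as a flag bit carried in the interrupt message. The layout below (vector in bits 0 to 7, hint in bit 31) is purely hypothetical, as is the `default_target` fallback, which stands in for whatever other routing policy applies when no hint is present.

```python
from typing import Tuple

# Hypothetical message layout: the text only says the accelerator sends a
# hint with the interrupt; these bit positions are illustrative.
VECTOR_MASK = 0xFF
SIMPLE_OP_HINT = 1 << 31


def encode_interrupt(vector: int, simple_op: bool) -> int:
    """Build an interrupt message word as an accelerator might send it."""
    if not 0 <= vector <= VECTOR_MASK:
        raise ValueError("vector must fit in 8 bits")
    return vector | (SIMPLE_OP_HINT if simple_op else 0)


def route_interrupt(message: int, default_target: str = "large") -> Tuple[int, str]:
    """PCU side: decode the vector and steer hinted interrupts to the small core.

    `default_target` stands in for whatever other policy applies when no hint
    is present.
    """
    vector = message & VECTOR_MASK
    target = "small" if message & SIMPLE_OP_HINT else default_target
    return vector, target
```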
Referring now to FIG. 9, shown is a timing diagram of operations occurring in a large core 710 and a small core 720 in accordance with an embodiment of the present invention. As seen, longer sleep times for large core 710 can be achieved by allowing device interrupts to be provided directly to small core 720 and determining in the small core whether it can handle the interrupt. If it can, large core 710 may remain in a sleep state and the interrupt is handled in small core 720.
Referring now to FIG. 10, shown is a graphical illustration of power savings in accordance with an embodiment of the present invention. As shown in FIG. 10, in a conventional system with transitions from an active C0 state to a deep low power state (e.g., a C6 state), the core power consumption of the large core varies between a relatively high level (e.g., 500 mW during each entry into the C0 state) and a zero power consumption level in C6 (middle view). Instead, in an embodiment of the present invention (bottom view), wakeups into the C0 state can be directed away from the large core to the small core, so that rather than at the 500 mW level, the small core can handle the C0 state at a much lower power level, e.g., 10 mW in the embodiment of FIG. 10.
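The FIG. 10 figures translate into average power with a simple time-weighted sum. The 500 mW and 10 mW levels come from the text and C6 is treated as 0 mW; the 2% C0 residency is a hypothetical example, and the sketch assumes the small core spends the same time in C0 as the large core would, which may not hold if the small core is slower.

```python
def average_power_mw(active_mw: float, idle_mw: float, active_fraction: float) -> float:
    """Time-weighted average core power over a C0 <-> C6 duty cycle."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_mw + (1.0 - active_fraction) * idle_mw


# Power levels from FIG. 10; C6 is treated as 0 mW. The 2% C0 residency is a
# hypothetical example and assumes the small core needs the same C0 time.
C0_LARGE_MW, C0_SMALL_MW, C6_MW = 500.0, 10.0, 0.0
ACTIVE_FRACTION = 0.02

if __name__ == "__main__":
    large = average_power_mw(C0_LARGE_MW, C6_MW, ACTIVE_FRACTION)
    small = average_power_mw(C0_SMALL_MW, C6_MW, ACTIVE_FRACTION)
    print(f"large core: {large:.1f} mW, small core: {small:.1f} mW")
```

With C6 at roughly zero, the ratio of the two averages is 10/500 = 2% for any residency, so under the equal-residency assumption the core power spent handling wakeups drops by about 98%.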
Embodiments may be implemented in many different system types. Referring now to FIG. 11, shown is a block diagram of a system in accordance with an embodiment of the present invention. As shown in FIG. 11, multiprocessor system 600 is a point-to-point interconnect system and includes a first processor 670 and a second processor 680 coupled via a point-to-point interconnect 650. As shown in FIG. 11, each of processors 670 and 680 may be a multicore processor including first and second processor cores (i.e., processor cores 674a and 674b and processor cores 684a and 684b), although potentially many more cores may be present in the processors. More specifically, each of the processors can include a mix of large cores, small cores (and possibly medium cores), accelerators and so forth, as well as logic to direct wakeups to the smallest available core when at least the large core is in a low power state, as described herein.
Still referring to FIG. 11, first processor 670 further includes a memory controller hub (MCH) 672 and point-to-point (P-P) interfaces 676 and 678. Similarly, second processor 680 includes an MCH 682 and P-P interfaces 686 and 688. As shown in FIG. 11, MCHs 672 and 682 couple the processors to respective memories, namely a memory 632 and a memory 634, which may be portions of system memory (e.g., DRAM) locally attached to the respective processors. First processor 670 and second processor 680 may be coupled to a chipset 690 via P-P interconnects 652 and 654, respectively. As shown in FIG. 11, chipset 690 includes P-P interfaces 694 and 698.
Furthermore, chipset 690 includes an interface 692 that couples chipset 690 with a high performance graphics engine 638 via a P-P interconnect 639. In turn, chipset 690 may be coupled to a first bus 616 via an interface 696. As shown in FIG. 11, various input/output (I/O) devices 614 may be coupled to first bus 616, along with a bus bridge 618 that couples first bus 616 to a second bus 620. Various devices may be coupled to second bus 620 including, for example, a keyboard/mouse 622, communication devices 626 and a data storage unit 628, such as a disk drive or other mass storage device, which may include code 630. Further, an audio I/O 624 may be coupled to second bus 620. Embodiments can be incorporated into other types of systems, including mobile devices such as a smart cellular phone, tablet computer, netbook and so forth.
Embodiments may be implemented in code and may be stored on a non-transitory storage medium having stored thereon instructions that can be used to program a system to perform the instructions. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, solid state drives (SSDs), compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories and electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.
Claims (20)
1. A processor, comprising:
a cryptography accelerator;
a video accelerator;
a memory controller;
a first plurality of cores;
a second plurality of cores, the second plurality of cores being homogeneous with the first plurality of cores and having lower power consumption;
an interconnect to couple the first plurality of cores and the second plurality of cores;
a shared cache memory coupled to at least the first plurality of cores; and
logic to cause a core of the second plurality of cores to perform an operation, wherein, based at least in part on a performance level of the core of the second plurality of cores, the logic is to cause an execution state of the core of the second plurality of cores to be transferred to a core of the first plurality of cores to cause the core of the first plurality of cores to perform the operation.
2. The processor of claim 1, wherein the logic is to cause, when the core of the first plurality of cores and the core of the second plurality of cores are in a low power state, the core of the second plurality of cores rather than the core of the first plurality of cores to be woken in response to an interrupt.
3. The processor of claim 2, wherein the logic is to cause the core of the first plurality of cores rather than the core of the second plurality of cores to be woken in response to the interrupt when an entry of a table indicates that the core of the second plurality of cores generated an undefined fault in response to a previous interrupt of the same type as the interrupt.
4. The processor of claim 2, wherein the logic is to provide, in response to the interrupt, a subset of an execution state of the core of the first plurality of cores to the core of the second plurality of cores.
5. The processor of claim 4, wherein, in response to a determination that the core of the second plurality of cores cannot handle at least one requested operation, the logic is to obtain the subset of the execution state from the core of the second plurality of cores and to merge the execution state subset with a remainder of the execution state of the core of the first plurality of cores stored in a temporary storage area.
6. The processor of claim 2, wherein the video accelerator is to perform a task and to send the interrupt to the logic when the task is completed.
7. The processor of claim 2, wherein the logic is to analyze a plurality of interrupts and, if a majority of the plurality of interrupts are to be handled by the core of the first plurality of cores, the logic is to wake the core of the first plurality of cores rather than the core of the second plurality of cores in response to the interrupt.
8. The processor of claim 1, wherein the processor comprises a multicore processor and the logic comprises:
wakeup logic;
state transfer logic;
undefined handling logic; and
an interrupt history storage.
9. The processor of claim 1, further comprising an interrupt controller to receive a plurality of interrupts and to direct the plurality of interrupts to one or more cores of at least one of the first plurality of cores and the second plurality of cores.
10. A method comprising:
causing a core of a second plurality of cores of a processor to perform an operation, the processor including a cryptography accelerator, a video accelerator, a memory controller, a first plurality of cores, the second plurality of cores, an interconnect, and a shared cache memory coupled to at least the first plurality of cores, the second plurality of cores being homogeneous with the first plurality of cores and having lower power consumption, and the interconnect coupling the first plurality of cores and the second plurality of cores; and
based at least in part on a performance level of the core of the second plurality of cores, causing an execution state of the core of the second plurality of cores to be transferred to a core of the first plurality of cores to cause the core of the first plurality of cores to perform the operation.
11. The method of claim 10, further comprising, when the core of the first plurality of cores and the core of the second plurality of cores are in a low power state, causing the core of the second plurality of cores rather than the core of the first plurality of cores to be woken in response to an interrupt.
12. The method of claim 11, further comprising causing the core of the first plurality of cores rather than the core of the second plurality of cores to be woken in response to the interrupt when an entry of a table indicates that the core of the second plurality of cores generated an undefined fault in response to a previous interrupt of the same type as the interrupt.
13. The method of claim 11, further comprising providing, in response to the interrupt, a subset of an execution state of the core of the first plurality of cores to the core of the second plurality of cores.
14. The method of claim 13, further comprising, in response to a determination that the core of the second plurality of cores cannot handle at least one requested operation, obtaining the subset of the execution state from the core of the second plurality of cores and merging the execution state subset with a remainder of the execution state of the core of the first plurality of cores stored in a temporary storage area.
15. The method of claim 11, further comprising analyzing a plurality of interrupts and, if a majority of the plurality of interrupts are to be handled by the core of the first plurality of cores, waking the core of the first plurality of cores rather than the core of the second plurality of cores in response to the interrupt.
16. At least one computer-readable storage medium comprising instructions that, when executed, enable a system to:
cause a core of a second plurality of cores of a processor to perform an operation, the processor including a cryptography accelerator, a video accelerator, a memory controller, a first plurality of cores, the second plurality of cores, an interconnect, and a shared cache memory coupled to at least the first plurality of cores, the second plurality of cores being homogeneous with the first plurality of cores and having lower power consumption, and the interconnect coupling the first plurality of cores and the second plurality of cores; and
based at least in part on a performance level of the core of the second plurality of cores, cause an execution state of the core of the second plurality of cores to be transferred to a core of the first plurality of cores to cause the core of the first plurality of cores to perform the operation.
17. The at least one computer-readable storage medium of claim 16, further comprising instructions that, when executed, enable the system to cause, when the core of the first plurality of cores and the core of the second plurality of cores are in a low power state, the core of the second plurality of cores rather than the core of the first plurality of cores to be woken in response to an interrupt.
18. The at least one computer-readable storage medium of claim 17, further comprising instructions that, when executed, enable the system to cause the core of the first plurality of cores rather than the core of the second plurality of cores to be woken in response to the interrupt when an entry of a table indicates that the core of the second plurality of cores generated an undefined fault in response to a previous interrupt of the same type as the interrupt.
19. The at least one computer-readable storage medium of claim 17, further comprising instructions that, when executed, enable the system to provide, in response to the interrupt, a subset of an execution state of the core of the first plurality of cores to the core of the second plurality of cores.
20. The at least one computer-readable storage medium of claim 19, further comprising instructions that, when executed, enable the system to, in response to a determination that the core of the second plurality of cores cannot handle at least one requested operation, obtain the subset of the execution state from the core of the second plurality of cores and merge the execution state subset with a remainder of the execution state of the core of the first plurality of cores stored in a temporary storage area.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610369305.9A CN106095046A (en) | 2011-09-06 | 2011-09-06 | The processor architecture of power efficient |
CN201610364515.9A CN106020424B (en) | 2011-09-06 | 2011-09-06 | The processor architecture of power efficient |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610364515.9A CN106020424B (en) | 2011-09-06 | 2011-09-06 | The processor architecture of power efficient |
CN201180073263.XA CN103765409A (en) | 2011-09-06 | 2011-09-06 | Power efficient processor architecture |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201180073263.XA Division CN103765409A (en) | 2011-09-06 | 2011-09-06 | Power efficient processor architecture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106020424A true CN106020424A (en) | 2016-10-12 |
CN106020424B CN106020424B (en) | 2019-08-06 |
Family
ID=57128003
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610364515.9A Active CN106020424B (en) | 2011-09-06 | 2011-09-06 | The processor architecture of power efficient |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106020424B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070245164A1 (en) * | 2004-08-05 | 2007-10-18 | Shuichi Mitarai | Information Processing Device |
US20080126747A1 (en) * | 2006-11-28 | 2008-05-29 | Griffen Jeffrey L | Methods and apparatus to implement high-performance computing |
US20080263324A1 (en) * | 2006-08-10 | 2008-10-23 | Sehat Sutardja | Dynamic core switching |
US20090248934A1 (en) * | 2008-03-26 | 2009-10-01 | International Business Machines Corporation | Interrupt dispatching method in multi-core environment and multi-core processor |
US20100030927A1 (en) * | 2008-07-29 | 2010-02-04 | Telefonaktiebolaget Lm Ericsson (Publ) | General purpose hardware acceleration via deirect memory access |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114356445A (en) * | 2021-12-28 | 2022-04-15 | 山东华芯半导体有限公司 | Multi-core chip starting method based on large and small core architectures |
CN114356445B (en) * | 2021-12-28 | 2023-09-29 | 山东华芯半导体有限公司 | Multi-core chip starting method based on large and small core architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |