CN106020424A - Active power efficiency processor system structure - Google Patents


Info

Publication number
CN106020424A
Authority
CN
China
Prior art keywords
core
processor
interruption
state
logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610364515.9A
Other languages
Chinese (zh)
Other versions
CN106020424B (en
Inventor
A. J. Herdrich
R. G. Illikkal
R. Iyer
S. Srinivasan
J. Moses
S. Makineni
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to CN201610369305.9A (patent CN106095046A)
Priority to CN201610364515.9A (patent CN106020424B)
Priority claimed from CN201180073263.XA (external priority, patent CN103765409A)
Publication of CN106020424A
Application granted
Publication of CN106020424B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3293 Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094 Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a power-efficient processor architecture. In one embodiment, a method includes receiving an interrupt from an accelerator and, in response, sending a resume signal directly to a small core; providing a subset of the execution state of a large core to the small core; and determining whether the small core can handle a request associated with the interrupt. If it is determined that the small core can handle the request, an operation corresponding to the request is performed in the small core; otherwise, the execution state of the large core and the resume signal are provided to the large core. Other embodiments are described and claimed.

Description

Power-efficient processor architecture
This application is a divisional application of Chinese patent application No. 201180073263.X, filed on September 6, 2011 and entitled "Power-efficient processor architecture".
Background
Typically, a processor uses power-saving sleep modes when possible, for example in accordance with the Advanced Configuration and Power Interface (ACPI) standard (e.g., Rev. 3.0b, published October 10, 2006). In addition to voltage and frequency scaling (DVFS, or ACPI performance states (P-states)), these so-called C-state core low-power states (ACPI C-states) can save power when a core is idle or not fully utilized. However, even in the context of a multi-core processor, a core is often woken from an efficient sleep state to perform a relatively simple operation and then returns to the sleep state. This can adversely affect power efficiency, because exiting and re-entering a low-power state incurs both latency and power-consumption costs. During the state transition, some types of processors may consume power without accomplishing useful work, which is detrimental to power efficiency.
Examples of operations to be handled upon exiting a low-power state include keyboard input, timer interrupts, network interrupts, and so on. To handle these operations in a power-sensitive manner, current operating systems (OSs) change program behavior by processing larger amounts of data at a time or by moving to a tickless OS (no periodic timer interrupts, only sporadic programmed interrupts). Another strategy is timer coalescing, in which multiple interrupts are grouped and handled at the same time. However, in addition to changing program behavior, all of these options add complexity and can still lead to power-inefficient operation. Further, some types of software (e.g., media playback) may attempt to defeat hardware power-efficiency mechanisms by requesting frequent periodic wake-ups regardless of how much work needs to be done. Thus, tickless and timer-coalescing strategies can save some power by reducing unnecessary wake-ups from deep C-states, but they require invasive changes to the OS and may take a significant amount of time to propagate through the computing ecosystem, because such changes are not implemented until a new version of the operating system is distributed.
Brief description of the drawings
Fig. 1 is a block diagram of a processor in accordance with one embodiment of the present invention.
Fig. 2 is a block diagram of a processor in accordance with another embodiment of the present invention.
Fig. 3 is a flow diagram of resume flow options between cores in accordance with one embodiment of the present invention.
Fig. 4 is a flow diagram of a method in accordance with one embodiment of the present invention.
Fig. 5 is a flow diagram of a method for transferring execution state in accordance with one embodiment of the present invention.
Fig. 6 is a block diagram of a processor in accordance with yet another embodiment of the present invention.
Fig. 7 is a block diagram of a processor in accordance with a further embodiment of the present invention.
Fig. 8 is a block diagram of a processor in accordance with yet another embodiment of the present invention.
Fig. 9 is a timing diagram in accordance with one embodiment of the present invention.
Fig. 10 is a graphical illustration of power savings in accordance with one embodiment of the present invention.
Fig. 11 is a block diagram of a system in accordance with one embodiment of the present invention.
Detailed description of the invention
In various embodiments, average power consumption can be reduced in a heterogeneous processor environment. Such a heterogeneous environment may include large, fast cores and smaller, more power-efficient cores, combined for system and power-efficiency reasons. Further, embodiments can provide this power control in a manner that is transparent to an operating system (OS) executing on the processor. However, the scope of the present invention is not limited to heterogeneous environments; embodiments can also be used in homogeneous environments (homogeneous from the OS-transparent point of view, though not necessarily from the hardware point of view) to reduce average power, e.g., to keep as many cores as possible asleep in a multiprocessor environment. Embodiments may be especially suitable in hardware-accelerated environments, such as tablet-computer-based and system-on-chip (SoC) architectures, in which the cores are often asleep.
In general, embodiments provide power control by steering all wake-up signals to a smaller core rather than a larger core. In this way, average power can be reduced by more than a factor of two when the system is 95% idle. As described below, in many embodiments this smaller core can be sequestered from the OS. That is, the existence of the smaller core is unknown to the OS, and the core is therefore invisible to the OS. As such, embodiments can provide power-efficient processor operation through processor hardware in a manner that is transparent to the OS and to applications executing on the processor.
Referring now to Fig. 1, shown is a block diagram of a processor in accordance with one embodiment of the present invention. As shown in Fig. 1, processor 100 may be a heterogeneous processor having a number of large cores, small cores, and accelerators. Although described here in the context of a multi-core processor, it is to be understood that embodiments are not so limited, and implementations may be in an SoC or other semiconductor-based processing device. Note that the accelerators can perform work based on a queue of input work regardless of whether the processor cores are powered up. In the embodiment of Fig. 1, processor 100 includes multiple large cores. In the particular embodiment shown, two such cores 110a and 110b (generically, large cores 110) are shown, although it is to be understood that more than two such large cores may be provided. In various implementations, these large cores may be out-of-order processors that have a relatively complex pipeline architecture and operate in accordance with a complex instruction set computing (CISC) architecture.
In addition, processor 100 also includes multiple small cores 120a-120n (generically, small cores 120). Although 8 such cores are shown in the embodiment of Fig. 1, it is to be understood that the scope of the present invention is not limited in this respect. In various embodiments, small cores 120 may be power-efficient in-order processors, e.g., executing instructions in accordance with a CISC or reduced instruction set computing (RISC) architecture. In some implementations, two or more of these cores may be coupled in series to perform related processing; for example, if multiple large cores are in a power-saving state, one or more smaller cores may be active to perform work that would otherwise wake the large cores. In many embodiments, small cores 120 may be transparent to the OS, although in other embodiments the small and large cores may be exposed to the OS, with configuration options available. In general, any mix of large and small cores may be used in different embodiments. For example, a single small core may be provided per large core, or in other embodiments a single small core may be associated with multiple large cores.
As used herein, the term "large core" may refer to a processor core that has a relatively complex design and may consume a relatively large amount of chip area compared with a "small core", whereas a small core may have a less complex design and consume less chip area. In addition, the smaller cores are more power efficient than the larger cores, because they may have a smaller thermal design power (TDP) than the larger cores. However, it is to be understood that the smaller cores are limited in their processing capabilities compared with the large cores. For example, these smaller cores may not be able to handle all operations that are possible in the large cores. In addition, the smaller cores may be less efficient at instruction processing. That is, instructions execute more quickly in the large cores than in the small cores.
It can further be seen that large cores 110 and small cores 120 may be coupled to an interconnect 130. Different implementations of this interconnect structure can be realized in different embodiments. For example, in some embodiments the interconnect structure may be in accordance with a front side bus (FSB) architecture or a Quick Path Interconnect (QPI) protocol. In other embodiments, the interconnect structure may be in accordance with a given system fabric.
Still referring to Fig. 1, multiple accelerators 140a-140c are also coupled to interconnect 130. Although the scope of the present invention is not limited in this regard, the accelerators may include media processors such as audio and/or video processors, cryptographic processors, fixed-function units, and so forth. These accelerators may be designed by the same designers who designed the cores, or may be independent third-party intellectual property (IP) blocks incorporated into the processor. In general, dedicated processing tasks can be performed in these accelerators more efficiently than in either the large cores or the small cores, whether in terms of performance or power consumption. Although shown with this particular implementation in the embodiment of Fig. 1, it is to be understood that the scope of the present invention is not limited in this regard. For example, instead of having only two types of cores (large and small), other embodiments may have a hierarchy of cores including at least a large core, a medium core, and a small core, with the medium core having a larger chip area than the small core but a smaller chip area than the large core, and a corresponding power consumption between that of the large core and that of the small core. In still other embodiments, the small core may be embedded within the larger core, e.g., as a subset of the logic and structures of the larger core.
Furthermore, although shown in the embodiment of Fig. 1 as including multiple large cores and multiple small cores, some implementations, such as a mobile processor or SoC, may provide only a single large core and a single small core. Specifically, referring now to Fig. 2, shown is a block diagram of a processor in accordance with another embodiment of the present invention, in which processor 100' includes a single large core 110 and a single small core 120, along with interconnect 130 and accelerators 140a-c. As mentioned above, this implementation may be suited to mobile applications.
As example power figures, a typical large core may consume approximately 6000 milliwatts (mW), a medium core approximately 500 mW, and a very small core approximately 15 mW. In an implementation that avoids waking the large core, significant power benefits can be achieved.
Embodiments allow the larger, less power-efficient cores to remain in low-power sleep states longer than they otherwise could. By steering interrupts and other core wake events to the smaller cores rather than the larger cores, the smaller cores may run longer and wake more frequently, but this is still more power efficient than waking a large core to perform trivial tasks such as data movement. Note that, as described below, for some operations the large core is powered up to execute them, such as when the smaller core may not support vector operations (e.g., AVX operations), complex addressing modes, or floating-point (FP) operations. In such cases, the wake-up signal can be re-routed from the small core to the large core.
For example, while hardware-accelerated 1080p video playback is performed on a processor, more than 1000 transitions into and out of the core C6 state and nearly 1200 interrupts occur each second. If even a portion of these wake events is redirected to a smaller core using an embodiment of the present invention, significant power savings can be achieved.
Fig. 3 outlines resume flow options between cores in accordance with one embodiment of the present invention. As shown in Fig. 3, there is a software domain 210 and a hardware domain 220. In general, software domain 210 corresponds to OS operations with regard to power management, e.g., in accordance with an ACPI implementation. In general, the OS, based on its knowledge of upcoming tasks according to its scheduling mechanism, can select one of multiple C-states to request that the processor enter a low-power mode. For example, the OS can issue an MWAIT call that includes the particular low-power state being requested.
In general, C0 corresponds to the normal operating state in which instructions are executed, while states C1-C3 are OS lower-power states, each with a different level of power savings and a correspondingly different level of latency to return to the C0 state. As can be seen, depending on the expected workload of the processor, the OS may select a non-idle state, e.g., OS C0, or one of multiple idle states, e.g., OS C-states C1-C3. Each of these idle states may be mapped to a corresponding hardware low-power state under control of processor hardware. Thus, processor hardware can map a given OS C-state to a corresponding hardware C-state, which may provide greater power savings than that specified by the OS. In general, a shallower C-state (e.g., C1) saves less power than a deeper C-state (e.g., C3) but has a lower resume time. In various embodiments, the mapping between hardware domain 220 and OS C-states to processor C-states may be performed by a power control unit (PCU) of the processor, although the scope of the present invention is not limited in this regard. This mapping may be based on a history of prior OS-based power management requests. The decision may also be based on overall system state, configuration information, and so forth.
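As an illustration only, the mapping from an OS-requested C-state to a hardware C-state can be pictured as a table lookup. The sketch below is not taken from the described embodiments: the table contents, the choice of C6 as the deepest hardware state, and the names `CState`, `OS_TO_HW`, and `map_os_to_hw` are all assumptions, and a real PCU would also weigh request history and system state.

```python
from enum import IntEnum


class CState(IntEnum):
    """C-state depth; a larger value means a deeper, lower-power state."""
    C0 = 0  # normal operating state, instructions execute
    C1 = 1
    C2 = 2
    C3 = 3
    C6 = 6


# Hypothetical PCU policy table: OS-requested state -> hardware state.
# The hardware state may be deeper than what the OS asked for.
OS_TO_HW = {
    CState.C0: CState.C0,
    CState.C1: CState.C1,
    CState.C2: CState.C3,
    CState.C3: CState.C6,
}


def map_os_to_hw(os_state: CState) -> CState:
    """Return the hardware C-state chosen for an OS-requested C-state."""
    return OS_TO_HW[os_state]
```

For instance, under this assumed table an OS request for C3 would be serviced by hardware state C6, trading a longer resume time for greater power savings.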
In addition, the PCU or other processor logic may be configured to direct all wake events to the smallest available core (which, in various embodiments, may be an OS-invisible core). As shown in Fig. 3, upon exit from a given hardware-based idle state, control resumes directly to the smallest available core, so that state is transferred to this smallest core. By contrast, in a conventional hardware/software resume, control returns only to the large core. In general, the OS selects a C-state based on expected idle time and resume-latency requirements, and the architecture maps this C-state to a hardware C-state. Thus, as shown in the embodiment of Fig. 3, all resume signals (such as interrupts) are routed to the smallest available core, which determines whether it can handle the resume operation or instead sends a wake-up signal to a larger core to continue. Note that embodiments do not interfere with existing P-state or C-state auto-demotion, in which hardware automatically selects a hardware C-state with lower resume latency based on measured experimental efficiency. Note also that the PCU or another programmable entity may examine incoming wake events to determine which core (large or small) to route them to.
As described above, in some implementations the small core itself can be hidden from the OS and application software. For example, a small-large core pair can be sequestered and hidden from application software. In a low-power state, all cores may be asleep while an accelerator (such as a video decode accelerator) performs a given task such as a decoding task. When the accelerator runs out of data, it directs a wake-up signal to request additional data, which may come from the small core; the small core wakes up and determines that this simple data-movement operation can be accomplished without waking the large core, thus saving power. If a timer interrupt arrives and the small core wakes up and detects that a complex vector operation (such as a 256-bit AVX instruction) is present in the instruction stream, the large core may be woken to handle the complex instruction (and other instructions in the stream) to reduce latency. In an alternative implementation, a global hardware observation mechanism, which may be located in the PCU, in another uncore location near the PCU, as a separate section of hardware logic on the global interconnect, or as an addition to the internal control logic of the small core, can detect that the small core has encountered an AVX instruction and can generate an undefined instruction fault, which may cause the small core to shut down and the instruction stream to be redirected to the larger core after the larger core is woken. Note that this behavior is not limited to instructions and may extend to configuration or features. For example, if the small core encounters a write to a configuration space that exists only on the large core, it may request a wake-up of the large core.
Referring now to Fig. 4, shown is a flow diagram of a method in accordance with one embodiment of the present invention. Note that the method of Fig. 4 may be performed by various agents, depending on a given implementation. For example, in some embodiments method 300 may be implemented in part by system agent circuitry within the processor, such as a power control unit, which may be in a system agent or uncore portion of the processor. In other embodiments, method 300 may be implemented in part by interconnect logic, such as power control logic within the interconnect structure, which may, for example, receive interrupts from accelerators coupled to the interconnect structure and forward the interrupts to a selected location.
As shown in Fig. 4, method 300 may begin by placing the large and small cores in a sleep state (block 310). That is, it is assumed that there are no active operations to be performed in the cores. They can therefore be placed in a selected low-power state to reduce power consumption. Although the cores may not be active, other agents within the processor or SoC, such as one or more accelerators, may be performing tasks. At block 320, an interrupt may be received from such an accelerator. This interrupt may be sent when the accelerator has completed a task, has encountered an error, or needs additional data or other processing to be performed by another component such as a given core. Control passes to block 330, where the logic can send a resume signal directly to the small core. That is, the logic may be programmed to always send the resume signal to the small core (or to a selected one of multiple such small cores, depending on the system implementation) when both the large and small cores are in a low-power state. By sending interrupts directly and always to the small core, the greater power consumption of the large core can be avoided in the many cases where the small core can handle the requested operation. Note that certain types of filtering or caching mechanisms may be added to block 330, so that some interrupt sources are always routed to one core or the other, to balance performance and power.
Still referring to Fig. 4, control next passes to diamond 340, where it can be determined whether the small core can handle the request associated with the interrupt. Although the scope of the present invention is not limited in this regard, in some embodiments this determination may be made in the small core itself after it is woken. Alternatively, the logic that performs the method of Fig. 4 may make the determination (in which case this analysis may be performed before the resume signal is sent to the small core).
As an example, the small core may determine whether it can handle the requested operation based on its performance requirements and/or its instruction set architecture (ISA) capabilities. If the small core cannot handle the requested operation because it lacks ISA support, front-end logic of the small core can parse the received instruction stream and determine that at least one instruction in the stream is not supported by the small core. Accordingly, the small core can issue an undefined instruction fault. This undefined fault can be sent to the PCU (or another entity), which can analyze the fault and the state of the small core to determine whether the undefined fault is a result of the small core lacking hardware support for handling the instruction, or whether it is a true undefined fault. In the latter case, the undefined fault may be forwarded to the OS for further handling. If the fault is due to the small core lacking appropriate hardware support for handling the instruction, the PCU can cause the execution state that was transferred to this small core to be transferred to the corresponding large core to handle the requested instruction.
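The two outcomes of an undefined instruction fault described above can be sketched as a small decision function. This is a hypothetical illustration rather than the patented logic: the instruction mnemonics, the two ISA sets, and the name `resolve_undefined_fault` are assumptions made for the example.

```python
# Assumed instruction sets (illustrative mnemonics, not from the source).
SMALL_CORE_ISA = {"mov", "add", "load", "store"}
LARGE_CORE_ISA = SMALL_CORE_ISA | {"vaddps_ymm", "fmul"}


def resolve_undefined_fault(opcode: str) -> str:
    """Decide what the PCU does with an undefined fault raised by the small core."""
    if opcode in SMALL_CORE_ISA:
        raise ValueError(f"{opcode!r} is supported; no fault expected")
    if opcode in LARGE_CORE_ISA:
        # The small core merely lacks hardware support: hand the work over.
        return "transfer_state_to_large_core"
    # Neither core knows the instruction: a true undefined fault.
    return "forward_fault_to_os"
```

In this sketch a 256-bit vector add faults on the small core and is handed to the large core, whereas an opcode unknown to both cores is forwarded to the OS.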
In other embodiments, a transfer of execution state between the small core and the large core can occur when it is determined that the small core has been executing for too long or at too low a performance level. That is, assume that the small core has been executing for thousands or millions of processor cycles to perform the requested task. Because more favorable execution is available in the large core, a greater power reduction may be achieved by transferring the state to the large core so that the large core can finish the task more quickly.
Still referring to Fig. 4, if it is determined that the requested operation can be handled in the small core, control passes to block 350, where the operation is performed in the small core. For example, assuming that the requested operation is a data-movement operation, the small core can perform the requested processing and, if no other tasks are pending for it, can again be placed in a low-power state.
If instead it is determined at diamond 340 that the small core cannot handle the requested operation, e.g., because the operation is a relatively complex operation that the small core is not configured to handle, control passes to block 360. There, a wake-up signal can be sent, e.g., directly from the small core to the large core, to cause the large core to be powered up. Accordingly, control passes to block 370, where the requested operation can be performed in the large core. Note that although described with this particular set of operations in the embodiment of Fig. 4, it is to be understood that the scope of the present invention is not limited in this regard.
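The flow of method 300 can be walked through with a short software model. This is a minimal sketch under stated assumptions: the `Core` class, the `SMALL_CORE_OPS` capability set, and the operation names are hypothetical, and real hardware would make the diamond 340 decision from ISA or performance information rather than a fixed set.

```python
from dataclasses import dataclass, field

# Assumed set of operations the small core supports (illustrative only).
SMALL_CORE_OPS = {"data_move", "timer_tick"}


@dataclass
class Core:
    name: str
    asleep: bool = True              # block 310: cores start in a sleep state
    executed: list = field(default_factory=list)

    def execute(self, op: str) -> None:
        self.asleep = False
        self.executed.append(op)


def handle_interrupt(op: str, small: Core, big: Core) -> str:
    """Route one interrupt (block 320) and return the name of the core that ran it."""
    small.asleep = False             # block 330: resume signal goes to the small core
    if op in SMALL_CORE_OPS:         # diamond 340: can the small core handle it?
        small.execute(op)            # block 350
        small.asleep = True          # no other work pending, so sleep again
        return small.name
    big.execute(op)                  # blocks 360 and 370: wake the large core, run there
    return big.name
```

With this model, a `"data_move"` request is served entirely by the small core and the large core never leaves its sleep state, while an unsupported request wakes the large core.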
Thus, in various embodiments a mechanism may be provided to allow hardware interrupts and other wake-up signals to be routed directly to the small core without waking the large core. Note that in different implementations, the small core itself or a supervising agent can determine whether the wake-up signal and its processing can be completed without waking the large core. In representative cases, the small core may be much more power efficient than the larger cores and, as a result, may support only a subset of the instructions that the large core supports. Many operations to be performed upon waking from a low-power state can be offloaded to a simpler, more power-efficient core to avoid waking a larger, more powerful core in a heterogeneous environment (in which cores of many sizes are included in the system for performance or power-efficiency reasons).
Referring now to Fig. 5, shown is a flow diagram of a method for transferring execution state in accordance with one embodiment of the present invention. As shown in Fig. 5, in one embodiment method 380 may be performed by logic of the PCU. This logic may be triggered in response to a request to place the large core into a low-power state. In response to such a request, method 380 may begin at block 382, where the execution state of the large core can be stored in a temporary storage area. Note that this temporary storage area may be a dedicated state save area associated with the core, or it may be within a shared cache such as a last-level cache (LLC). Although the scope of the present invention is not limited in this regard, the execution state may include general-purpose registers, status and configuration registers, execution flags, and so forth. In addition, further operations to place the large core into the low-power state may be performed at this time. Such operations include flushing internal caches and other state, and signaling for shutdown of the given core.
Again referring to Fig. 5, it can be determined that whether small nut recovers (rhombus 384).This recovery can be as sound Should receive in interruption recovers the result of signal and occurs, and this interruption adds from such as processor Speed device.The part recovered as small nut, controls to enter frame 386, there, can store from interim District extracts at least some of of big nuclear state.More specifically, this part extracted can be macronucleus Execution state in the part that will be used by small nut.As example, this status sections can include Master register content, the various labellings of such as some execution flag etc, machine status register(MSR) etc.. But, some state may not be extracted, such as with macronucleus present in but in small nut the most right Answer the state that one or more performance elements of performance element are associated.Can be by this extraction of state Part is sent to small nut (frame 388), so, makes little nuclear energy perform any conjunction in response to given interruption Suitable operation.Although utilizing in the 5 embodiment of figure 5 shown in this specific implementation, however, it is possible to reason Solving, the scope of the present invention is unrestricted in this regard.
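The subset extraction of blocks 382 to 388 can be pictured as filtering a saved state record. The following is only a sketch: the field names (`gpr`, `flags`, `msr`, `vector_regs`), the sample values, and the function name are assumptions chosen to mirror the kinds of state the text lists.

```python
# Assumed field names; the text lists registers, flags and status registers.
SMALL_CORE_FIELDS = ("gpr", "flags", "msr")


def extract_small_core_state(saved_state: dict) -> dict:
    """Block 386: keep only the saved fields that the small core can use."""
    return {k: saved_state[k] for k in SMALL_CORE_FIELDS if k in saved_state}


# Block 382: large-core state parked in temporary storage (values are made up).
temp_storage = {
    "gpr": [0] * 16,
    "flags": 0x202,
    "msr": {"mode": 1},
    "vector_regs": [0] * 16,   # no matching execution unit on the small core
}
subset = extract_small_core_state(temp_storage)  # block 388 would send this on
```

The vector register contents stay behind in temporary storage, since in this sketch the small core has no unit that could use them.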
Referring now to Fig. 6, shown is a block diagram of a processor in accordance with one embodiment of the present invention. As shown in Fig. 6, processor 400 may be a multi-core processor including a first plurality of cores 410i-410n that can be exposed to the OS and a second plurality of cores 410a-x that are transparent to the OS.
As can be seen, the various cores may be coupled via an interconnect 415 to a system agent or uncore 420 that includes various components. As seen, uncore 420 may include a shared cache 430, which may be a last-level cache. In addition, the uncore may include an integrated memory controller 440, various interfaces 450a-n, a power control unit 455, and an advanced programmable interrupt controller (APIC) 465.
PCU 450 may include various logic to enable power-efficient operation in accordance with an embodiment of the present invention. As seen, PCU 450 may include wake-up logic 452 that can perform wake-ups as described above. Thus, logic 452 can be configured to always wake a small core first. However, this logic can be configured dynamically so that, in some cases, such direct small-core wake-ups are not performed. For example, a system can be dynamically configured for power-saving operation, e.g., when the system is a mobile system running on battery. In such cases, the logic can be configured to always wake the small core. Conversely, if the system is a server system, desktop, or laptop system connected to wall power, embodiments may provide a user-based selection to prefer latency and performance over power savings. In such instances, wake-up logic 452 can be configured to wake the large core rather than the small core in response to an interrupt. A similar wake-up of the large core can be performed when it has been determined that a large number of small-core wake-ups result in redirection to the large core.
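One way to picture the dynamically configurable wake-up policy is as a function of a few inputs. This sketch is an assumption-laden illustration, not the described logic: the parameter names, the 0.5 redirect threshold, and the precedence of the battery check over the other conditions are all choices made for the example.

```python
def choose_wake_target(on_battery: bool,
                       user_prefers_performance: bool,
                       redirect_ratio: float,
                       redirect_threshold: float = 0.5) -> str:
    """Pick which core an incoming interrupt should wake.

    redirect_ratio is the assumed fraction of recent small-core wake-ups
    that ended up being redirected to the large core.
    """
    if on_battery:
        return "small"               # power-saving configuration: always small first
    if user_prefers_performance:
        return "large"               # wall power, user chose latency and performance
    if redirect_ratio > redirect_threshold:
        return "large"               # small-core wake-ups mostly bounce anyway
    return "small"
```

A mobile system on battery would thus always wake the small core, while a wall-powered system could be steered to the large core either by user choice or by a high redirect rate.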
To further enable power efficient operation, PCU 455 may also include state transfer logic 454 that can perform transfers of execution state between the large and small cores. As discussed above, this logic may be used to obtain the large core's execution state that was stored in the temporary storage during a low power state, and to extract at least a portion of that state to provide to the small core when the small core is woken.
Still further, PCU 455 may include an interrupt history storage 456. Such storage may include a plurality of entries, each identifying an interrupt that occurred during system operation and whether the interrupt was successfully handled by the small core. Then, based on this history, when a given interrupt is received, the corresponding entry of this storage can be accessed to determine whether a previous interrupt of the same type was successfully handled by the small core. If so, the PCU may direct the newly incoming interrupt to the same small core. Conversely, if it is determined based on this history that this type of interrupt was not successfully handled by the small core (or was handled with unsatisfactorily low performance), the interrupt may instead be sent to the large core.
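The lookup described for interrupt history storage 456 can be illustrated with a small Python sketch. The class name, the string interrupt types, and the choice to default unseen interrupt types to the small core are assumptions of this sketch; the description does not specify the entry format.

```python
class InterruptHistory:
    """Hypothetical model of the interrupt history storage."""

    def __init__(self):
        # Maps an interrupt type to whether the small core last handled it well.
        self._handled_by_small = {}

    def record(self, irq_type, small_core_ok):
        """Write an entry after the small core handled (or failed) an interrupt."""
        self._handled_by_small[irq_type] = small_core_ok

    def target_for(self, irq_type):
        """Route to the large core only if the small core previously failed."""
        if self._handled_by_small.get(irq_type) is False:
            return "big"
        return "small"


history = InterruptHistory()
history.record("video_done", False)   # small core raised a fault last time
history.record("crypto_done", True)   # small core handled this type fine
```

With these entries, `history.target_for("video_done")` returns `"big"` and `history.target_for("crypto_done")` returns `"small"`.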
Still referring to Fig. 6, PCU 455 may further include undefined handling logic 458. Such logic may receive undefined faults issued by the small core. Based on this logic, information in the small core may be accessed. Then it can be determined whether the undefined fault is due to a lack of support for an instruction in the small core or to another reason. Responsive to this determination, the logic may either cause the state of the small core to be merged with the remaining portion of the large core execution state (stored in the temporary storage area) and thereafter sent to the large core for handling of the interrupt, or send the undefined fault to the OS for further handling. When it is determined that the small core cannot handle the interrupt, the portion of the execution state that was provided to the small core is taken from the small core and saved back to the temporary storage location, and the small core may accordingly be powered down. This merged state and the remaining execution state of the large core may then be provided back to the large core so that the large core can handle the interrupt that the small core could not. Note also that an entry may be written in interrupt history storage 456 in response to such mishandling by the small core. Although shown with this particular logic in the embodiment of Fig. 6, it is to be understood that the scope of the present invention is not limited in this regard. For example, in other embodiments the various logic of PCU 455 may be implemented in a single logic block.
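The decision made by undefined handling logic 458 could be sketched in Python as below. The function signature, the dictionary-based state, and the plain dictionary used for the history are assumptions for illustration; in particular, letting the small core's values take precedence in the merge is a design choice of this sketch, made because the small core may have updated them while running.

```python
def handle_undefined_fault(missing_instruction, small_core_state,
                           stored_remainder, history, irq_type):
    """Hypothetical sketch: decide where an undefined fault goes next.

    Returns ("os", None) when the fault is not caused by a missing instruction,
    otherwise ("big", merged_state) so the large core can handle the interrupt.
    """
    if not missing_instruction:
        # Some other reason: pass the fault to the OS for further handling.
        return ("os", None)
    # Merge the subset taken back from the small core with the remainder of
    # the large-core state kept in temporary storage.
    merged = {**stored_remainder, **small_core_state}
    # Note the mishandling so later interrupts of this type go to the large core.
    history[irq_type] = False
    return ("big", merged)


history = {}
target, state = handle_undefined_fault(
    True,
    {"flags": 0x246},                      # subset updated by the small core
    {"flags": 0x202, "vector_unit": 7},    # state saved before low power entry
    history,
    "video_done",
)
```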
APIC 465 may receive various interrupts, e.g., issued from accelerators, and direct the interrupts to a given one or more cores as appropriate. In some embodiments, to keep the small cores hidden from the OS, APIC 465 may dynamically remap incoming interrupts, each of which may include an APIC identifier associated with it, from an APIC ID associated with a large core to an APIC ID associated with a small core.
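A minimal Python sketch of such a remapping is shown here. The interrupt is modeled as a dictionary with an `apic_id` key and the mapping table is a plain dictionary; both are assumptions of the sketch rather than details given in the description.

```python
def remap_apic_id(interrupt, big_to_small):
    """Return a copy of the interrupt retargeted at a small core, if mapped.

    big_to_small is a hypothetical table from large-core APIC IDs to
    small-core APIC IDs; unmapped IDs are left unchanged.
    """
    target = big_to_small.get(interrupt["apic_id"], interrupt["apic_id"])
    return {**interrupt, "apic_id": target}


# Example: large core with APIC ID 0 is shadowed by a hidden small core with ID 8.
mapping = {0: 8}
incoming = {"vector": 0x41, "apic_id": 0}
remapped = remap_apic_id(incoming, mapping)
```

Because the remapping happens in the interrupt controller, the OS in this sketch continues to address the large core's ID and never sees the small core's ID.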
With further reference to Fig. 6, processor 400 may communicate with a system memory 460, e.g., via a memory bus. In addition, connection can be made via interfaces 450 to various off-chip components such as peripheral devices, mass storage and so forth. Although shown with this particular implementation in the embodiment of Fig. 6, the scope of the present invention is not limited in this regard.
Note that various architectures are possible that provide different couplings or integrations of the large and small cores. As examples, the degree of coupling between these disparate cores can depend on a variety of design optimization parameters relating to die area, power, performance and responsiveness.
Referring now to Fig. 7, shown is a block diagram of a processor in accordance with another embodiment of the present invention. As shown in Fig. 7, processor 500 may be a true heterogeneous processor including a large core 510 and a small core 520. As seen, each core may be associated with its own private cache memory hierarchy, namely cache memories 515 and 525, which may include both level 1 and level 2 caches. In turn, the cores may be coupled together via a ring interconnect 530. Multiple accelerators 540a and 540b and an LLC, namely an L3 cache 550, which may be a shared cache, are also coupled to the ring interconnect. In this implementation, execution state may be transferred between the two cores via ring interconnect 530. As described above, the execution state of large core 510 may be stored in cache 550 before entry into a given low power state. Then, upon wakeup of small core 520, at least a subset of this execution state may be provided to the small core to ready the core for execution of the operation that triggered its wakeup. Thus in the embodiment of Fig. 7, the cores are loosely coupled via this ring interconnect. Although shown with a single large core and a single small core for ease of illustration, it is to be understood that the scope of the present invention is not limited in this regard. Using an implementation such as that of Fig. 7, any state or communication to be exchanged may be handled via the ring architecture (which may also be a bus or fabric architecture). Alternatively, in other embodiments this communication may be via a dedicated bus between the two cores (not shown in Fig. 7).
Referring now to Fig. 8, shown is a block diagram of a processor in accordance with yet another embodiment of the present invention. As shown in Fig. 8, processor 500′ may be a hybrid heterogeneous processor in which there is tight coupling or integration between the large and small cores. Specifically, as seen in Fig. 8, large core 510 and small core 520 may share a cache memory 518, which in various embodiments may include both level 1 and level 2 caches. Thus execution state can be transferred from one core to the other via this cache memory, avoiding the latency of communication via ring interconnect 530. Note that this arrangement allows for lower power, owing to reduced data movement overhead, and for faster communication between the cores, but may be less flexible.
Note that Figs. 7 and 8 illustrate only two possible implementations (and show only a limited number of cores). More implementations are possible, including different arrangements of cores, a combination of the two schemes, more than two types of cores, and so forth. In a variant of Fig. 8, the two cores may share some components, such as execution units, an instruction pointer or a register file.
As discussed, embodiments may be fully transparent and invisible to the operating system, thus requiring no software changes and adding only a minimal increase in resume time from C-states. In other embodiments, the presence and availability of the small cores may be exposed to the OS, enabling the OS to decide whether to provide an interrupt to a small core or to a large core. In addition, embodiments may provide a mechanism in system software such as a basic input/output system (BIOS) to expose the large and small cores to the OS, or to configure whether the small cores are exposed. Embodiments may noticeably increase resume times from C-states, but this is acceptable because current platforms already vary in resume latency, and currently no useful work is done during the time a core's state is being restored. The degree to which the small and large cores differ may range from insignificant differences to major microarchitectural differences. According to various embodiments, the primary differentiators between the heterogeneous cores may be the die area and the power consumed by the cores.
In some implementations, a control mechanism may be provided such that if it is detected that the large core is woken most of the time upon resume, waking of the small core may be bypassed and the large core may be woken directly, for at least a predetermined length of time, to preserve performance. Note that in some embodiments, a mechanism to universally redirect all interrupts and other wake signals to either the small core or the large core may be exposed to software, both system and user-level software, depending on the power and performance requirements of the application and the system. As one such example, a user-level instruction may be provided to steer wakeup operations to a specified core. Such an instruction may be a variant of an MWAIT-like instruction.
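One way such a bypass control could behave is sketched below in Python. The sliding window, the 0.75 threshold, the hold time, and the explicit `now` argument (used instead of a real clock so the sketch stays deterministic) are all assumptions; the description says only that the large core is woken directly for at least a predetermined length of time.

```python
from collections import deque


class WakeBypass:
    """Hypothetical bypass control: skip the small core when it rarely suffices."""

    def __init__(self, window=8, threshold=0.75, hold_time=1.0):
        self.outcomes = deque(maxlen=window)  # True if the large core was woken
        self.threshold = threshold
        self.hold_time = hold_time
        self.bypass_until = 0.0

    def record(self, big_core_woken, now):
        """Record one resume outcome and arm the bypass if the rate is high."""
        self.outcomes.append(big_core_woken)
        if len(self.outcomes) == self.outcomes.maxlen:
            rate = sum(self.outcomes) / len(self.outcomes)
            if rate >= self.threshold:
                self.bypass_until = now + self.hold_time

    def target(self, now):
        """Core to wake at time `now`."""
        return "big" if now < self.bypass_until else "small"


bypass = WakeBypass(window=4, threshold=0.75, hold_time=2.0)
for t in range(4):
    bypass.record(True, now=float(t))  # four resumes in a row needed the large core
```

After the four recorded resumes, the sketch wakes the large core directly until the hold time expires and then reverts to waking the small core first.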
In some embodiments, an accelerator may send a hint with an interrupt to the PCU or other management agent to indicate that the requested operation is a relatively simple one, such that it can be handled effectively in the small core. This accelerator-provided hint may be used by the PCU to automatically direct the incoming interrupt to the small core for handling.
Referring now to Fig. 9, shown is a timing diagram illustrating operations occurring in a large core 710 and a small core 720 in accordance with an embodiment of the present invention. As seen, longer sleep durations for large core 710 can be realized by allowing a device interrupt to be provided directly to small core 720, and determining in the small core whether it can handle the interrupt. If so, large core 710 may remain in a sleep state while the interrupt is handled in small core 720.
Referring now to Fig. 10, shown is a graphical illustration of power savings in accordance with an embodiment of the present invention. As shown in Fig. 10, in a conventional system having transitions from an active C0 state to a deep low power state (e.g., a C6 state), the core power consumption of the large core varies between a relatively high level (e.g., 500 mW during each entry into the C0 state) and a zero power consumption level in the C6 state (middle view). In contrast, in an embodiment of the present invention (bottom view), wakeups into the C0 state can be directed away from the large core and to the small core, so that rather than the 500 mW power consumption level, the small core can handle the C0 states at a much lower power level, e.g., 10 mW in the embodiment of Fig. 10.
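To make the comparison concrete, the following Python sketch computes an average core power from the two example power levels of Fig. 10. The 10% awake time is an assumed figure, not one from the description, and the sketch further assumes zero power in C6, equal handling time on both cores, and no wakeups that fall back to the large core.

```python
def average_power_mw(active_mw, awake_percent):
    """Average power of a core that draws active_mw in C0 and nothing in C6."""
    return active_mw * awake_percent / 100


AWAKE_PERCENT = 10  # assumed share of time spent in C0 handling wakeups

# Conventional system: every wakeup brings the large core into C0 at 500 mW.
big_only = average_power_mw(500, AWAKE_PERCENT)

# Embodiment: the same wakeups are handled by the small core at 10 mW.
small_first = average_power_mw(10, AWAKE_PERCENT)

print(big_only, small_first)
```

Under these assumptions the ratio between the two averages is simply the ratio of the active power levels; a real workload in which some wakeups are redirected to the large core would save less.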
Embodiments can be implemented in many different system types. Referring now to Fig. 11, shown is a block diagram of a system in accordance with an embodiment of the present invention. As shown in Fig. 11, multiprocessor system 600 is a point-to-point interconnect system, and includes a first processor 670 and a second processor 680 coupled via a point-to-point interconnect 650. As shown in Fig. 11, each of processors 670 and 680 may be a multicore processor, including first and second processor cores (i.e., processor cores 674a and 674b and processor cores 684a and 684b), although potentially many more cores may be present in the processors. More specifically, each of the processors may include a mix of large cores, small cores (and possibly medium cores as well), accelerators and so forth, in addition to logic to direct wakeups to the smallest available core when at least the large cores are in a low power state, as described herein.
Still referring to Fig. 11, first processor 670 further includes a memory controller hub (MCH) 672 and point-to-point (P-P) interfaces 676 and 678. Similarly, second processor 680 includes an MCH 682 and P-P interfaces 686 and 688. As shown in Fig. 11, MCHs 672 and 682 couple the processors to respective memories, namely a memory 632 and a memory 634, which may be portions of system memory (e.g., DRAM) locally attached to the respective processors. First processor 670 and second processor 680 may be coupled to a chipset 690 via P-P interconnects 652 and 654, respectively. As shown in Fig. 11, chipset 690 includes P-P interfaces 694 and 698.
Furthermore, chipset 690 includes an interface 692 to couple chipset 690 with a high performance graphics engine 638 via a P-P interconnect 639. In turn, chipset 690 may be coupled to a first bus 616 via an interface 696. As shown in Fig. 11, various input/output (I/O) devices 614 may be coupled to first bus 616, along with a bus bridge 618 that couples first bus 616 to a second bus 620. Various devices may be coupled to second bus 620 including, for example, a keyboard/mouse 622, communication devices 626 and a data storage unit 628 such as a disk drive or other mass storage device, which may include code 630. Further, an audio I/O 624 may be coupled to second bus 620. Embodiments can be incorporated into other types of systems, including mobile devices such as a smart cellular telephone, tablet computer, netbook, and so forth.
Embodiments may be implemented in code and may be stored on a non-transitory storage medium having stored thereon instructions which can be used to program a system to perform the instructions. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, solid state drives (SSDs), compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, and electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.

Claims (20)

1. A processor, comprising:
a cryptographic accelerator;
a video accelerator;
a memory controller;
a first plurality of cores;
a second plurality of cores, the second plurality of cores being homogeneous with the first plurality of cores and having lower power consumption;
an interconnect to couple the first plurality of cores and the second plurality of cores;
a shared cache memory coupled to at least the first plurality of cores; and
logic to cause a core of the second plurality of cores to perform an operation, wherein, based at least in part on a performance level of the core of the second plurality of cores, the logic is to cause an execution state of the core of the second plurality of cores to be transferred to a core of the first plurality of cores to cause the core of the first plurality of cores to perform the operation.
2. The processor of claim 1, wherein the logic is to cause the core of the second plurality of cores, rather than the core of the first plurality of cores, to be woken in response to an interrupt when the core of the first plurality of cores and the core of the second plurality of cores are in a low power state.
3. The processor of claim 2, wherein the logic is to cause the core of the first plurality of cores, rather than the core of the second plurality of cores, to be woken in response to the interrupt when an entry of a table indicates that the core of the second plurality of cores generated an undefined fault in response to a previous interrupt of the same type as the interrupt.
4. The processor of claim 2, wherein the logic is to provide a subset of an execution state of the core of the first plurality of cores to the core of the second plurality of cores in response to the interrupt.
5. The processor of claim 4, wherein, in response to a determination that the core of the second plurality of cores cannot handle at least one requested operation, the logic is to obtain the subset of the execution state from the core of the second plurality of cores and to merge the execution state subset with a remaining portion of the execution state of the core of the first plurality of cores stored in a temporary storage area.
6. The processor of claim 2, wherein the video accelerator is to perform a task and to send the interrupt to the logic when the task is completed.
7. The processor of claim 2, wherein the logic is to analyze a plurality of interrupts and, if a majority of the plurality of interrupts are to be handled by the core of the first plurality of cores, the logic is not to wake the core of the second plurality of cores in response to the interrupt, but instead to wake the core of the first plurality of cores.
8. The processor of claim 1, wherein the processor comprises a multicore processor, and the logic includes:
wakeup logic;
state transfer logic;
undefined handling logic; and
an interrupt history storage.
9. The processor of claim 1, further comprising an interrupt controller to receive a plurality of interrupts and to direct the plurality of interrupts to one or more cores of at least one of the first plurality of cores and the second plurality of cores.
10. A method, comprising:
causing a core of a second plurality of cores of a processor to perform an operation, the processor including a cryptographic accelerator, a video accelerator, a memory controller, a first plurality of cores, the second plurality of cores, an interconnect, and a shared cache memory coupled to at least the first plurality of cores, the second plurality of cores being homogeneous with the first plurality of cores and having lower power consumption, the interconnect to couple the first plurality of cores and the second plurality of cores; and
based at least in part on a performance level of the core of the second plurality of cores, causing an execution state of the core of the second plurality of cores to be transferred to a core of the first plurality of cores to cause the core of the first plurality of cores to perform the operation.
11. The method of claim 10, further comprising causing the core of the second plurality of cores, rather than the core of the first plurality of cores, to be woken in response to an interrupt when the core of the first plurality of cores and the core of the second plurality of cores are in a low power state.
12. The method of claim 11, further comprising causing the core of the first plurality of cores, rather than the core of the second plurality of cores, to be woken in response to the interrupt when an entry of a table indicates that the core of the second plurality of cores generated an undefined fault in response to a previous interrupt of the same type as the interrupt.
13. The method of claim 11, further comprising providing a subset of an execution state of the core of the first plurality of cores to the core of the second plurality of cores in response to the interrupt.
14. The method of claim 13, further comprising, in response to a determination that the core of the second plurality of cores cannot handle at least one requested operation, obtaining the subset of the execution state from the core of the second plurality of cores and merging the execution state subset with a remaining portion of the execution state of the core of the first plurality of cores stored in a temporary storage area.
15. The method of claim 11, further comprising analyzing a plurality of interrupts and, if a majority of the plurality of interrupts are to be handled by the core of the first plurality of cores, not waking the core of the second plurality of cores in response to the interrupt, but instead waking the core of the first plurality of cores.
16. At least one computer-readable storage medium comprising instructions that, when executed, cause a system to:
cause a core of a second plurality of cores of a processor to perform an operation, the processor including a cryptographic accelerator, a video accelerator, a memory controller, a first plurality of cores, the second plurality of cores, an interconnect, and a shared cache memory coupled to at least the first plurality of cores, the second plurality of cores being homogeneous with the first plurality of cores and having lower power consumption, the interconnect to couple the first plurality of cores and the second plurality of cores; and
based at least in part on a performance level of the core of the second plurality of cores, cause an execution state of the core of the second plurality of cores to be transferred to a core of the first plurality of cores to cause the core of the first plurality of cores to perform the operation.
17. The at least one computer-readable storage medium of claim 16, further comprising instructions that, when executed, cause the system to cause the core of the second plurality of cores, rather than the core of the first plurality of cores, to be woken in response to an interrupt when the core of the first plurality of cores and the core of the second plurality of cores are in a low power state.
18. The at least one computer-readable storage medium of claim 17, further comprising instructions that, when executed, cause the system to cause the core of the first plurality of cores, rather than the core of the second plurality of cores, to be woken in response to the interrupt when an entry of a table indicates that the core of the second plurality of cores generated an undefined fault in response to a previous interrupt of the same type as the interrupt.
19. The at least one computer-readable storage medium of claim 17, further comprising instructions that, when executed, cause the system to provide a subset of an execution state of the core of the first plurality of cores to the core of the second plurality of cores in response to the interrupt.
20. The at least one computer-readable storage medium of claim 19, further comprising instructions that, when executed, cause the system, in response to a determination that the core of the second plurality of cores cannot handle at least one requested operation, to obtain the subset of the execution state from the core of the second plurality of cores and to merge the execution state subset with a remaining portion of the execution state of the core of the first plurality of cores stored in a temporary storage area.
CN201610364515.9A 2011-09-06 2011-09-06 Power efficient processor architecture Active CN106020424B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610369305.9A CN106095046A (en) 2011-09-06 2011-09-06 Power efficient processor architecture
CN201610364515.9A CN106020424B (en) 2011-09-06 2011-09-06 Power efficient processor architecture

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610364515.9A CN106020424B (en) 2011-09-06 2011-09-06 Power efficient processor architecture
CN201180073263.XA CN103765409A (en) 2011-09-06 2011-09-06 Power efficient processor architecture

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201180073263.XA Division CN103765409A (en) 2011-09-06 2011-09-06 Power efficient processor architecture

Publications (2)

Publication Number Publication Date
CN106020424A true CN106020424A (en) 2016-10-12
CN106020424B CN106020424B (en) 2019-08-06

Family

ID=57128003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610364515.9A Active CN106020424B (en) 2011-09-06 2011-09-06 Power efficient processor architecture

Country Status (1)

Country Link
CN (1) CN106020424B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070245164A1 (en) * 2004-08-05 2007-10-18 Shuichi Mitarai Information Processing Device
US20080126747A1 (en) * 2006-11-28 2008-05-29 Griffen Jeffrey L Methods and apparatus to implement high-performance computing
US20080263324A1 (en) * 2006-08-10 2008-10-23 Sehat Sutardja Dynamic core switching
US20090248934A1 (en) * 2008-03-26 2009-10-01 International Business Machines Corporation Interrupt dispatching method in multi-core environment and multi-core processor
US20100030927A1 (en) * 2008-07-29 2010-02-04 Telefonaktiebolaget Lm Ericsson (Publ) General purpose hardware acceleration via deirect memory access

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356445A (en) * 2021-12-28 2022-04-15 山东华芯半导体有限公司 Multi-core chip starting method based on large and small core architectures
CN114356445B (en) * 2021-12-28 2023-09-29 山东华芯半导体有限公司 Multi-core chip starting method based on large and small core architecture

Also Published As

Publication number Publication date
CN106020424B (en) 2019-08-06

Similar Documents

Publication Publication Date Title
CN106155265A (en) The processor architecture of power efficient
US9098274B2 (en) Methods and apparatuses to improve turbo performance for events handling
CN102495756B (en) The method and system that operating system switches between different central processing units
CN102566739A (en) Multicore processor system and dynamic power management method and control device thereof
CN112486312A (en) Low-power-consumption processor
CN106020424A (en) Active power efficiency processor system structure
CN106095046A (en) The processor architecture of power efficient
GB2537300A (en) Power efficient processor architecture
JP6409218B2 (en) Power efficient processor architecture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant