CN102103484A - Instruction for enabling a processor wait state
- Publication number
- CN102103484A CN2010106151670A CN201010615167A
- Authority
- CN
- China
- Prior art keywords
- processor
- low power
- instruction
- kernel
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/32—Means for saving power
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
- G06F1/3228—Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
- G06F1/3293—Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30083—Power or thermal control instructions
- G06F9/3009—Thread control instructions
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Power Sources (AREA)
- Executing Machine-Instructions (AREA)
- Debugging And Monitoring (AREA)
- Microcomputers (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
In one embodiment, the present invention includes a processor having a core with decode logic to decode an instruction prescribing an identification of a location to be monitored and a timer value, and a timer coupled to the decode logic to perform a count with respect to the timer value. The processor may further include a power management unit coupled to the core to determine a type of a low power state based at least in part on the timer value and cause the processor to enter the low power state responsive to the determination. Other embodiments are described and claimed.
Description
Technical field
The present invention relates to an instruction for enabling a processor wait state.
Background
As processor technology advances, processors are becoming available with greater numbers of cores. To execute software efficiently, these cores can be tasked with executing different threads of a single application. Such an arrangement is referred to as cooperative threaded software. In modern cooperative threaded software, it is common for one thread to wait for another thread to finish. Conventionally, the processor on which the waiting thread runs burns active power while waiting. In addition, the wait time may be indeterminate, so the processor may not know how long it should wait.
Another mechanism that enables a core to wait is to place the core into a wait state such as a low power state. To accomplish this, an operating system (OS) is invoked. The OS may execute a pair of instructions referred to as the MONITOR instruction and the MWAIT instruction. Note that these instructions are not available to application-level software. Instead, they are used only at the OS privilege level, to set up an address range to monitor and to enable the processor to enter a low power state until the monitored address range is updated. However, entering the OS to execute these instructions carries significant overhead. This overhead takes the form of high latency, and it can also add complexity, because OS scheduling issues may result in the waiting thread not being the next thread scheduled when the wait state is exited.
Summary of the invention
The present invention relates to a processor comprising:
a core comprising: decode logic to receive an instruction from a first application and decode the instruction, the instruction specifying an identification of a location to be monitored and a timer value; and a timer coupled to the decode logic to perform a count with respect to the timer value; and
a power management unit coupled to the core to determine a type of a low power state of the processor based at least in part on the timer value and, if the value of the monitored location does not equal a target value and the timer value has not elapsed, to cause the processor to enter the low power state responsive to the determination, without operating system (OS) involvement.
The present invention relates to a method comprising:
receiving an instruction from a first application in a processor and decoding the instruction, the instruction specifying an identification of a location to be monitored and a timer value;
responsive to the instruction, determining in the processor a type of a low power state of the processor based at least in part on the timer value; and
if the value of the monitored location does not equal a target value and the timer value has not elapsed, entering the low power state of the processor responsive to the determination.
The present invention relates to a system comprising:
a multicore processor including a first core and a second core, the first core including decode logic and a timer, the decode logic to decode a user-level instruction to cause a wait state to occur, the user-level instruction specifying a location to be monitored and a timer value, the timer coupled to the decode logic to perform a count with respect to the timer value, the multicore processor further including power management logic coupled to the first and second cores to select one of a plurality of low power states based at least in part on the timer value, without operating system (OS) involvement, and, if the value of the monitored location does not equal a target value, to cause the first core to enter the selected low power state responsive to the selection; and
a dynamic random access memory (DRAM) coupled to the multicore processor.
The present invention relates to an article comprising a machine-accessible storage medium including instructions that, when executed, cause a system to:
receive, in a first core of a multicore processor during execution of a first thread, a user-level processor wait instruction specifying a location to be monitored and a timer value;
determine in the first core whether a condition of the user-level processor wait instruction is met and, if not, enter a low power state selected by power management logic of the multicore processor;
update a value on a second core of the multicore processor during execution of a second thread;
responsive to the value update, exit the low power state of the first core and determine whether the condition is met; and
if so, continue execution of the first thread on the first core.
Brief description of the drawings
Fig. 1 is a flow diagram of a method in accordance with one embodiment of the present invention.
Fig. 2 is a flow diagram of a test for a target value performed in accordance with one embodiment of the present invention.
Fig. 3 is a block diagram of a processor core in accordance with one embodiment of the present invention.
Fig. 4 is a block diagram of a processor in accordance with one embodiment of the present invention.
Fig. 5 is a block diagram of a processor in accordance with another embodiment of the present invention.
Fig. 6 is a flow diagram of the interaction between cooperative threads in accordance with one embodiment of the present invention.
Fig. 7 is a block diagram of a system in accordance with one embodiment of the present invention.
Detailed description
In various embodiments, a user-level instruction (that is, an application-level instruction) can be provided and used to allow an application to wait for one or more conditions to occur. While the application is waiting, the processor executing it (e.g., a core of a multicore processor) can be placed in a low power state or can switch to executing another thread. The conditions on which the processor waits can include detection of a value, expiration of a timer, or receipt of an interrupt signal, e.g., from another processor, although the scope of the present invention is not limited in this regard.
In this way, an application can wait for one or more operations occurring, e.g., in another thread, without the need to yield to an operating system (OS) or other supervisor software. Further, based on instruction information provided with this instruction, the wait state can be time-bounded, so that the processor can select an appropriate low power state to enter. That is, control logic of the processor itself can determine the appropriate low power state to enter, based on the instruction information provided and various calculations performed in the processor. Accordingly, the overhead of incurring OS involvement to enter a low power state can be avoided. Note that the processor need not be waiting for another peer processor; it may instead be waiting for a co-processor such as a floating-point co-processor or other fixed-function device.
In various embodiments, the user-level instruction can have various information associated with it, including a location to be monitored, a value to look for, and a timeout value. For ease of discussion, this user-level instruction is referred to as a processor wait instruction, although the scope of the present invention is not limited in this regard. Different flavors of this user-level instruction can be provided, each of which may indicate, for example, waiting for a specific value, a set of values, or a range, or coupling the wait with an operation (e.g., incrementing a counter once the value becomes true).
In general, a processor can perform various actions in response to a processor wait instruction, which can include or be associated with the following instruction information: a source field, which indicates the location of the value to be tested; a timeout or deadline timer value, which indicates the point in time at which the wait state should end (if the value being tested for has not been attained); and a result field, which indicates the value to be attained. In other implementations, in addition to these fields, a destination or mask field can be present, for implementations that mask the source value and test the masked result against a predetermined value (e.g., whether the masked result is non-zero).
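The instruction information listed above can be pictured as a small record. The following Python sketch is illustrative only: the class name `WaitInstructionInfo`, its field names, and the `condition_met` helper are hypothetical and do not come from the patent, which does not define an encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WaitInstructionInfo:
    """Hypothetical container for the fields a processor wait instruction carries."""
    source_address: int                 # source field: location of the value to test
    deadline: int                       # timeout/deadline timer value
    target_value: Optional[int] = None  # result field: value to be attained
    mask: Optional[int] = None          # optional mask field for masked tests

    def condition_met(self, source_value: int) -> bool:
        # Masked flavor: the condition holds when the masked result is non-zero.
        if self.mask is not None:
            return (source_value & self.mask) != 0
        # Plain flavor: the condition holds when the value equals the target.
        return source_value == self.target_value
```

For example, `WaitInstructionInfo(source_address=0x1000, deadline=500, target_value=7).condition_met(7)` evaluates to true, while the masked flavor ignores `target_value` entirely.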
As described above, the processor can perform various operations in response to this instruction. In general, these operations can include: testing whether the value of the monitored location is the target value (e.g., performing a Boolean operation to test for a "true" condition); and testing whether the deadline timer value has been reached. If either of these conditions is met (e.g., is "true"), or if an interrupt is received from another entity, the instruction can complete. Otherwise, a mechanism can be started to monitor the location to see whether the value changes. At this point, the wait state can be entered. In this wait state, the processor can enter a low power state, or can cause execution of another hardware thread of the processor to begin. If a low power state is desired, the processor can select an appropriate low power state based at least in part on the amount of time remaining before the deadline timer is reached. The low power state can then be entered, and the processor can remain in this state until it is woken by one of the conditions discussed above. Although described with this general operation, it should be understood that in different implementations various features and operations may be performed differently.
Referring now to Fig. 1, a flow diagram of a method in accordance with one embodiment of the present invention is shown. As shown in Fig. 1, method 100 can be implemented by a processor executing a user-level instruction to handle a processor wait operation. As seen, method 100 can begin by decoding a received instruction (block 110). As an example, the instruction can be a user-level instruction provided by an application, such as an application implemented with multiple threads, each including instructions that can have some interdependency in execution of the cooperative threaded application. After decoding the instruction, the processor can load a memory value into a cache and a register (block 120). More specifically, a source operand of the instruction can identify a location, e.g., in memory, at which a value is to be obtained. This value can be loaded into a cache memory, such as a low-level cache associated with the core executing the instruction, e.g., a private cache. In addition, the value can be stored in a register of the core. As an example, this register can be a general-purpose register of the logical processor of the thread. Control then passes to block 130. At block 130, a deadline can be computed responsive to the instruction information. More specifically, this deadline can be the amount of time for which the wait state should last if the condition is not met (e.g., the expected value is not updated). In one embodiment, the instruction format can include information that provides a deadline timer value. To determine the appropriate time until this deadline is reached, in some implementations the received deadline timer value can be compared with a current time counter value present in the processor (e.g., a time stamp counter (TSC) value). This difference can be loaded into a deadline timer, which in some embodiments can be implemented using a counter or a register. In one embodiment, this deadline timer can be a countdown timer that begins counting down. In this implementation, the current TSC value is subtracted from the deadline, and the countdown timer ticks for that many cycles. When the TSC value exceeds the deadline, it triggers a resumption of the processor. That is, as discussed below, when the deadline timer has decremented to zero, the wait state can be terminated if it is still in progress at that time. In a register implementation, a comparator can compare the value of the TSC counter with the deadline every cycle.
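The two deadline-timer variants described above (a countdown loaded with the difference between the deadline and the current TSC value, and a per-cycle comparator) can be sketched as follows. This is a simplified software model, not hardware from the patent: the names `DeadlineTimer` and `deadline_reached` are hypothetical, and treating a TSC value equal to the deadline as "reached" is an assumption made so that both variants agree.

```python
class DeadlineTimer:
    """Countdown variant: loaded with (deadline - current TSC), then decremented."""

    def __init__(self, deadline_tsc: int, current_tsc: int) -> None:
        # A deadline already in the past yields zero remaining cycles.
        self.remaining = max(deadline_tsc - current_tsc, 0)

    def tick(self, cycles: int = 1) -> None:
        # Advance the countdown, never going below zero.
        self.remaining = max(self.remaining - cycles, 0)

    def expired(self) -> bool:
        # The wait state would be terminated once the count reaches zero.
        return self.remaining == 0


def deadline_reached(tsc: int, deadline_tsc: int) -> bool:
    """Register variant: a comparator checks the TSC against the deadline each cycle."""
    return tsc >= deadline_tsc
```

With a deadline of 1000 and a current TSC of 400, the countdown starts at 600 cycles and expires after exactly 600 ticks, which is the same cycle at which `deadline_reached` first returns true.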
The above operations thus set up the various structures that will be accessed and tested during the wait state. Accordingly, the wait state can be entered. This wait state is generally part of a loop 155, which can be iterated until one of several conditions occurs. As seen, it can be determined whether a target value from the instruction information matches the value stored in the register (diamond 140). In implementations in which the instruction information includes such a target value, the data obtained from memory and stored in the register can be tested to determine whether its value matches the target value. If so, the condition is met and control passes to block 195, where execution of the wait instruction can complete. Completion of the instruction can further cause various flags or other values to be set, so that following code can be informed of the reason for exiting the wait state. Once the instruction completes, operation of the thread that requested the wait state can continue.
If instead it is determined at diamond 140 that the condition is not met, control passes to diamond 150, where it can be determined whether the deadline has occurred. If so, the instruction can complete as described above. Otherwise, control passes to diamond 160, where it can be determined whether another hardware component is seeking to wake the processor. If so, the instruction completes as described above. Otherwise, control passes to block 170, where a low power state can be determined based at least in part on the deadline timer value. That is, based on the amount of time remaining before the deadline occurs, the processor itself can determine an appropriate low power state without OS involvement. To make this determination, in some embodiments logic of, e.g., the uncore of the processor can be used. This logic can include or be associated with a table that, as discussed below, associates various low power states with deadline timer values. Based on this determination at block 170, the processor can enter the low power state (block 180). In the low power state, various structures of the processor, namely the core executing the instruction and other components, can be placed in a low power state. The particular structures placed in a low power state, and the level of the low power state, can vary by implementation. Note that if this loop is traversed again because an updated value was not the target value, a new determination of the low power state can be made based on the updated deadline timer value, because entering certain low power states (e.g., deep sleep) may be inappropriate if only a limited amount of time remains.
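The table-based determination at block 170 can be illustrated with a minimal lookup, assuming that deeper states are only worthwhile when more time remains before the deadline. The state names and cycle thresholds below are invented for illustration; the patent does not give concrete table contents.

```python
# Hypothetical table: (minimum remaining cycles, state name), deepest state first.
LOW_POWER_TABLE = [
    (1_000_000, "deep_sleep"),
    (10_000, "sleep"),
    (0, "light_halt"),
]


def select_low_power_state(remaining_cycles: int) -> str:
    """Pick the deepest state whose threshold fits in the time left before the deadline."""
    for min_cycles, state in LOW_POWER_TABLE:
        if remaining_cycles >= min_cycles:
            return state
    raise ValueError("remaining_cycles must be non-negative")
```

Calling the function again on each pass through the loop, with the updated remaining time, mirrors the re-determination described above: a wait that started in a deep state may resume in a shallower one once little time is left.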
Various events can occur that cause the core to exit the low power state. Notably, the low power state can be exited if the cached data (i.e., the data corresponding to the monitored location) is updated (diamond 190). If so, control passes back to diamond 140. Similarly, if the deadline passes and/or a wake signal is received from another hardware component, control can pass from the low power state to one of diamonds 150 and 160. Although shown with this high-level implementation in the embodiment of Fig. 1, it should be understood that the scope of the present invention is not limited in this regard.
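The control flow of method 100 can be summarized as an event-driven simulation. The sketch below is a software model under stated assumptions, not the patented hardware: `processor_wait`, `ExitReason`, and the `(time, kind, value)` event format are all hypothetical, events are assumed to be sorted by time, and sleeping is modeled simply by jumping to the time of the next event.

```python
from enum import Enum


class ExitReason(Enum):
    TARGET_MATCHED = "target value matched"   # diamond 140
    DEADLINE_PASSED = "deadline passed"       # diamond 150
    EXTERNAL_WAKE = "external wake signal"    # diamond 160


def processor_wait(memory, address, target, deadline, events, start_time=0):
    """Simulate the wait loop of Fig. 1 and return (ExitReason, exit time).

    events: time-sorted (time, kind, value) tuples, kind being "write" or "wake".
    """
    now = start_time
    pending = iter(events)
    while True:
        if memory[address] == target:        # diamond 140: condition met
            return ExitReason.TARGET_MATCHED, now
        if now >= deadline:                  # diamond 150: deadline occurred
            return ExitReason.DEADLINE_PASSED, now
        # Blocks 170/180: the core sleeps until the next event or the deadline.
        event = next(pending, None)
        if event is None or event[0] >= deadline:
            now = deadline
            continue
        now, kind, value = event
        if kind == "wake":                   # diamond 160: another component wakes us
            return ExitReason.EXTERNAL_WAKE, now
        memory[address] = value              # diamond 190: monitored location updated
```

A write that does not produce the target value sends the model back around the loop, which is where a real implementation would re-select the low power state from the remaining time.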
In other implementations, the test for the target value can be performed based on a mask. That is, the user-level instruction can implicitly indicate the target value to be attained. As an example, this target value can be a non-zero result of a masking operation between a source value obtained from memory and a mask value present in a source/destination operand of the instruction. In one embodiment, the user-level instruction can be a load, mask, wait if zero (LDMWZ) instruction of a processor ISA. In one embodiment, this instruction can take the form LDMWZ r32/64, M32/64. In this format, the first operand (r32/64) can store the mask, and the second operand (M32/64) can identify the source value (i.e., the monitored location). A timeout value can be stored in a third register. For example, the deadline can be held in implicit registers. Specifically, the EDX:EAX registers can be used, which are the same set of registers written when the TSC counter is read. In general, the instruction can perform a non-busy poll of a semaphore value and enter a low power wait state if the semaphore is unavailable. In different implementations, both bit-wise semaphores and counting semaphores can be handled, where zero indicates that nothing is waiting. The timeout value can indicate the amount of time, measured in TSC cycles, that the processor should wait for a non-zero result before unconditionally resuming operation. In one embodiment, information as to which physical processors are in a low power state can be provided to software via memory-mapped registers (e.g., configuration and status registers (CSRs)).
In this embodiment, the LDMWZ instruction loads data from the source memory location, masks it with the source/destination value, and tests whether the resulting value is zero. If the masked value is non-zero, the value loaded from memory is placed, unmasked, in the source/destination register. Otherwise, the processor enters a low power wait state. Note that this low power state may or may not correspond to a currently defined low power state, such as the so-called C-states according to the Advanced Configuration and Power Interface (ACPI) specification, version 4 (June 16, 2009). The processor can remain in the low power state until the specified time interval elapses, an external exception is signaled (e.g., a general interrupt (INTR), non-maskable interrupt (NMI), or system management interrupt (SMI)), or a value that is non-zero when masked is written to the source memory location. As part of entering this wait state, the processor can clear a memory-mapped register (CSR) bit indicating that the processor is currently waiting.
When the wait state is exited because a value that produces a non-zero result when masked has been written to the monitored location, a zero indicator of a flag register can be cleared, and the unmasked value that was read can be placed in the destination register. If expiration of the timer causes the exit from the low power state, the zero indicator of the flag register can be set to allow software to detect this condition. If the exit occurs because of an external exception, the state of the processor and memory can be such that the instruction is considered not to have executed. Thus, once the normal execution flow is resumed, the same LDMWZ instruction will be re-executed.
Referring now to Fig. 2, a flow diagram of a test for a target value performed in accordance with another embodiment of the present invention is shown. As shown in Fig. 2, method 200 can begin by loading source data into a first register (block 210). This source data can be masked with a mask present in a second register (block 220). In various embodiments, the first and second registers can be specified by the instruction and can correspond, respectively, to the locations that store the source data and the destination data. It can then be determined whether the result of the masking operation is zero (diamond 230). If so, the desired condition is not met, and the processor can enter a low power state (block 240). Otherwise, the source data can be stored in the second register (block 250), and execution of the instruction can complete (block 260).
During the wait state, if the target location is updated, as determined at diamond 265, control passes back to block 220 to perform the masking operation again. If another condition is determined to have occurred during the wait state (as determined at diamond 270), control passes to block 260 to complete the instruction. Although shown with this particular implementation in the embodiment of Fig. 2, the scope of the present invention is not limited in this regard.
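The masked test of Fig. 2, together with the flag behavior described for LDMWZ, can be modeled in a few lines. This is a hedged sketch rather than a definition of the instruction: the names `ldmwz` and `LdmwzResult` are hypothetical, a timeout is modeled as running out of writes to the monitored location, external exceptions (after which the instruction is re-executed) are not modeled, and leaving the destination register unchanged on timeout is an assumption the text does not confirm.

```python
from dataclasses import dataclass


@dataclass
class LdmwzResult:
    dest: int        # contents of the source/destination register afterwards
    zero_flag: bool  # set on timeout, clear when a non-zero masked value was seen
    waited: bool     # whether the low power wait state was entered at all


def ldmwz(mask: int, initial_value: int, later_writes=()) -> LdmwzResult:
    """Model the load / mask / wait-if-zero flow of Fig. 2."""
    value = initial_value                # block 210: load the source data
    waited = False
    writes = iter(later_writes)
    while (value & mask) == 0:           # block 220 and diamond 230: masked test
        waited = True                    # block 240: enter the low power state
        updated = next(writes, None)     # diamond 265: monitored location written?
        if updated is None:              # diamond 270: timeout ends the wait
            return LdmwzResult(dest=mask, zero_flag=True, waited=True)
        value = updated
    # Blocks 250/260: the unmasked value lands in the destination register.
    return LdmwzResult(dest=value, zero_flag=False, waited=waited)
```

Software using such an instruction as a semaphore poll would inspect the zero flag afterwards: a clear flag means the destination holds a usable value, while a set flag means the timeout elapsed first.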
Referring now to Fig. 3, a block diagram of a processor core in accordance with one embodiment of the present invention is shown. As shown in Fig. 3, processor core 300 can be a multi-stage pipelined out-of-order processor. Processor core 300 is shown with a relatively simplified view in Fig. 3 to illustrate the various features used in connection with a processor wait state in accordance with an embodiment of the present invention.
As shown in Fig. 3, core 300 includes front end units 310, which can be used to fetch instructions to be executed and prepare them for later use in the processor. For example, front end units 310 can include a fetch unit 301, an instruction cache 303, and an instruction decoder 305. In some implementations, front end units 310 can further include a trace cache, along with microcode storage and micro-operation storage. Fetch unit 301 can fetch macro-instructions, e.g., from memory or instruction cache 303, and feed them to instruction decoder 305 to decode them into primitives, i.e., micro-operations for execution by the processor. One such instruction to be handled in front end units 310 can be a user-level processor wait instruction in accordance with an embodiment of the present invention. This instruction can enable the front end units to access various micro-operations so that the operations associated with the wait instruction, such as those described above, can be performed.
Coupled between front end units 310 and execution units 320 is an out-of-order (OOO) engine 315 that may be used to receive the micro-instructions and prepare them for execution. More specifically, OOO engine 315 may include various buffers to reorder the micro-instruction flow and allocate the various resources needed for execution, as well as to rename logical registers onto storage locations within various register files such as register file 330 and extended register file 335. Register file 330 may include separate register files for integer and floating-point operations. Extended register file 335 may provide storage for vector-sized units, e.g., 256 or 512 bits per register.
Various resources may be present in execution units 320, including, for example, various integer, floating-point, and single instruction multiple data (SIMD) logic units, among other specialized hardware. For example, such execution units may include one or more arithmetic logic units (ALUs) 322. In addition, wakeup logic 324 in accordance with an embodiment of the present invention may be present. Such wakeup logic may be used to perform certain operations involved in executing a processor wait mode responsive to a user-level instruction. As will be discussed further below, additional logic for handling such wait states may be present in another part of the processor, such as an uncore. A set of timers 326 is also shown in Fig. 3. The timers relevant to the analysis here include a TSC timer and a deadline timer that can be set with a value corresponding to a deadline; if no other condition is met before the deadline, the processor leaves the wait state at the deadline. When the deadline timer reaches a predetermined count value (in some embodiments, it may count down to zero), wakeup logic 324 may activate certain operations. Results may be provided to retirement logic, namely a reorder buffer (ROB) 340. More specifically, ROB 340 may include various arrays and logic to receive information associated with executed instructions. ROB 340 then examines this information to determine whether the instructions can be validly retired and the result data committed to the architectural state of the processor, or whether one or more exceptions occurred that prevent proper retirement of the instructions. Of course, ROB 340 may handle other operations associated with retirement. In the context of a processor wait instruction in accordance with an embodiment of the present invention, retirement may cause ROB 340 to set the state of one or more indicators of a flag register or other status register, which can indicate the reason the processor exited the wait state.
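The deadline behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `ExitReason`, `countdown_ticks`, and `simulate_wait` are hypothetical, time is modeled as integer TSC ticks, and the exit-reason labels stand in for whatever indicator the retirement logic would actually set.

```python
from enum import Enum
from typing import Optional


class ExitReason(Enum):
    # Hypothetical labels for the indicator set when the wait state is exited.
    CONDITION_MET = "condition_met"
    DEADLINE_EXPIRED = "deadline_expired"


def countdown_ticks(deadline_tsc: int, current_tsc: int) -> int:
    """Initial deadline-timer value: deadline minus current TSC, floored at zero."""
    return max(deadline_tsc - current_tsc, 0)


def simulate_wait(deadline_tsc: int, start_tsc: int,
                  update_tsc: Optional[int] = None) -> ExitReason:
    """Count the deadline timer down to zero unless an update arrives first.

    update_tsc is the (assumed) tick at which the monitored location is
    updated; None means no update ever arrives.
    """
    remaining = countdown_ticks(deadline_tsc, start_tsc)
    tsc = start_tsc
    while remaining > 0:
        if update_tsc is not None and tsc >= update_tsc:
            return ExitReason.CONDITION_MET
        tsc += 1
        remaining -= 1
    return ExitReason.DEADLINE_EXPIRED
```

In this model a deadline that already lies in the past yields a zero countdown, so the wait ends immediately with the deadline reason.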
As shown in Fig. 3, ROB 340 is coupled to a cache 350 which, in one embodiment, may be a low-level cache (e.g., an L1 cache), although the scope of the present invention is not limited in this regard. Execution units 320 can also be directly coupled to cache 350. As seen, cache 350 includes a monitor engine 352, which may be configured to monitor a particular cache line, namely the monitored location, and to provide feedback to wakeup logic 324 (and/or to uncore components) when the value is updated, when the cache coherency state of the line changes, and/or when the line is lost. Monitor engine 352 acquires a given line and holds it in the shared state. If the line is ever lost from the shared state, a wakeup of the processor is initiated. From cache 350, data communication may occur with higher-level caches, system memory, and so forth. While shown at this high level in the embodiment of Fig. 3, understand that the scope of the present invention is not limited in this regard.
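The monitor engine's role can be sketched as a small software model. The class and state names below (`MonitorEngine`, `LineState`, `observe`) are hypothetical, and the three coherency states are a simplifying assumption; the sketch only shows the idea that a monitored line held in the shared state triggers a wakeup when it leaves that state.

```python
from enum import Enum
from typing import Callable


class LineState(Enum):
    # Simplified coherency states, assumed for illustration only.
    SHARED = "S"
    MODIFIED = "M"
    INVALID = "I"


class MonitorEngine:
    """Watches one cache line and reports when it leaves the shared state."""

    def __init__(self, monitored_line: int,
                 wake: Callable[[int, LineState], None]) -> None:
        self.monitored_line = monitored_line
        self.state = LineState.SHARED  # the line is acquired and held shared
        self._wake = wake

    def observe(self, line: int, new_state: LineState) -> None:
        # Ignore traffic to other lines and transitions that keep the line shared.
        if line != self.monitored_line or new_state is LineState.SHARED:
            return
        self.state = new_state
        self._wake(line, new_state)  # feedback to the wakeup logic
```

A caller would pass a callback standing in for wakeup logic 324, for example one that appends each `(line, state)` pair to a list.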
Referring now to Fig. 4, a block diagram of a processor in accordance with an embodiment of the present invention is shown. As shown in Fig. 4, processor 400 may be a multicore processor including a plurality of cores 410a-410n. In one embodiment, each such core may be configured as core 300 described above with regard to Fig. 3. The various cores may be coupled via an interconnect 415 to an uncore 420 that includes various components. As seen, uncore 420 may include a shared cache 430, which may be a last-level cache. In addition, the uncore may include an integrated memory controller 440, various interfaces 450, and a power management unit 455. In various embodiments, at least some of the functionality associated with executing the processor wait instruction may be implemented in power management unit 455. For example, based on information received with the instruction, e.g., the deadline timer value, power management unit 455 may determine an appropriate low power state for the given core that is executing the wait instruction. In one embodiment, power management unit 455 may include a table that associates timer values with low power states. Unit 455 may look up this table based on the deadline value associated with the instruction and select the corresponding wait state. Power management unit 455 may then generate a plurality of control signals to cause various components, including the given core and other processor units, to enter the low power state. As seen, processor 400 may communicate with a system memory 460, e.g., via a memory bus. In addition, connection can be made via interfaces 450 to various off-chip components such as peripheral devices, mass storage, and so forth. While shown with this particular implementation in the embodiment of Fig. 4, the scope of the present invention is not limited in this regard.
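The table lookup performed by the power management unit might look like the following minimal sketch. The thresholds and the state names ("shallow", "medium", "deep") are illustrative assumptions, as is the rule that a longer permitted wait maps to a deeper state; the text only says that the table associates timer values with low power states.

```python
from bisect import bisect_right

# Hypothetical table: (minimum wait length in timer ticks, low power state).
# Both the thresholds and the state names are assumptions for illustration.
WAIT_STATE_TABLE = [
    (0, "shallow"),
    (1_000, "medium"),
    (100_000, "deep"),
]


def select_low_power_state(deadline_ticks: int) -> str:
    """Return the state of the last entry whose threshold does not exceed the wait."""
    if deadline_ticks < 0:
        raise ValueError("deadline_ticks must be non-negative")
    thresholds = [threshold for threshold, _ in WAIT_STATE_TABLE]
    index = bisect_right(thresholds, deadline_ticks) - 1
    return WAIT_STATE_TABLE[index][1]
```

Keeping the table sorted by threshold lets the lookup use a binary search, although with only a handful of entries a linear scan would serve equally well.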
In other embodiments, a processor architecture may include emulation features such that the processor can execute instructions of a first ISA, referred to as a source ISA, where the architecture is according to a second ISA, referred to as a target ISA. In general, software, including both the OS and application programs, conforms to the source ISA, while the hardware implements a target ISA designed specifically for a given hardware implementation with particular performance and/or efficiency features.
Referring now to Fig. 5, a block diagram of a processor in accordance with another embodiment of the present invention is shown. As seen in Fig. 5, system 500 includes a processor 510 and a memory 520. Memory 520 includes conventional memory 522, which holds both system and application software, and concealed memory 524, which holds software instrumented for the target ISA. As seen, processor 510 includes an emulation engine 530 that converts source code into target code. Emulation may be done with either interpretation or binary translation. Interpretation is often used for code when it is first encountered. Then, as frequently executed code regions (e.g., hotspots) are discovered through dynamic profiling, they are translated to the target ISA and stored in a code cache in concealed memory 524. Optimization is done as part of the translation process, and heavily used code may later be optimized even further. The translated blocks of code are held in code cache 524 so that they can be reused repeatedly.
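The interpret-first, translate-when-hot policy described above can be sketched in a few lines. Everything here is a toy under stated assumptions: the two-operation "source ISA" (`add`, `mul`), the `HOT_THRESHOLD` value, and the `EmulationEngine` class are hypothetical, and "translation" is modeled simply as building a reusable Python closure.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Op = Tuple[str, int]   # toy source instruction: ("add" | "mul", operand)
HOT_THRESHOLD = 3      # assumed execution count at which a block counts as hot


def _apply(ops: Sequence[Op], x: int) -> int:
    for name, operand in ops:
        x = x + operand if name == "add" else x * operand
    return x


class EmulationEngine:
    def __init__(self) -> None:
        self.exec_counts: Counter = Counter()                 # dynamic profile
        self.code_cache: Dict[int, Callable[[int], int]] = {}
        self.trace: List[str] = []                            # path taken per run

    def run(self, block_id: int, ops: Sequence[Op], x: int) -> int:
        translated = self.code_cache.get(block_id)
        if translated is not None:
            self.trace.append("cache")        # reuse the translated block
            return translated(x)
        self.exec_counts[block_id] += 1
        if self.exec_counts[block_id] >= HOT_THRESHOLD:
            frozen = tuple(ops)               # "translate" once and keep it
            self.code_cache[block_id] = lambda value: _apply(frozen, value)
        self.trace.append("interpret")
        return _apply(ops, x)
```

With the assumed threshold of 3, a block is interpreted on its first three executions and served from the code cache afterwards.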
Still referring to Fig. 5, processor 510, which may be one core of a multicore processor, includes a program counter 540 that provides instruction pointer addresses to an instruction cache (I-cache) 550. As seen, I-cache 550 may further receive target ISA instructions directly from concealed memory portion 524 upon a miss to a given instruction address. Accordingly, I-cache 550 may store target ISA instructions, which can be provided to a decoder 560, which may be a decoder of the target ISA, to receive incoming instructions at the macro-instruction level and convert them into micro-instructions for execution within a processor pipeline 570. Pipeline 570 may be an out-of-order pipeline including various stages to execute and retire instructions, although the scope of the present invention is not limited in this regard. Various execution units, timers, counters, storage locations, and monitors such as those described above may be present within pipeline 570 to execute a processor wait instruction in accordance with an embodiment of the present invention. That is, even in an implementation in which processor 510 has a microarchitecture different from the one for which the user-level processor wait instruction is provided, the instruction can still be executed on the underlying hardware.
Referring now to Fig. 6, a flow diagram of interaction between cooperating threads in accordance with one embodiment of the present invention is shown. As shown in Fig. 6, method 600 may be used, for example, to execute multiple threads in a multithreaded processor. In the context of Fig. 6, two threads, thread 1 and thread 2, belong to a single application and may be interdependent, such that data to be used by one thread must first be updated by the second thread. Thus, as seen, thread 1 may receive a processor wait instruction during its execution (block 610). During execution of this wait instruction, it may be determined whether a test condition has been met (diamond 620). If not, the thread may enter a low power state (block 630). Although not shown in Fig. 6, understand that this state may be exited upon the occurrence of one of various conditions. If instead it is determined that the test condition has been met, control passes to block 640, where code execution in the first thread may continue. Note that the test condition may relate to a monitored location, to indicate when the second thread has successfully completed an update. Accordingly, before the code shown for thread 2 executes, the test condition is not met and the processor enters the low power state.
Still referring to Fig. 6, thread 2 may execute code that is interdependent with the first thread (block 650). For example, the second thread may execute code to update one or more values that are to be used during execution of the first thread. To ensure that the first thread executes using the updated values, the application may be written such that the first thread enters the low power state until the second thread has updated the data. Thus, during execution of the second thread, it may be determined whether execution of the interdependent code has completed (diamond 660). If not, execution of the interdependent code continues. If instead this interdependent code section has completed, control passes to block 670, where a predetermined value may be written to the monitored location. For example, this predetermined value may correspond to a test value associated with the processor wait instruction. In other embodiments, the predetermined value may be a value such that, when it is masked with the value in the monitored location or when the value in the monitored location is used as the mask, the result is non-zero, indicating that the test condition has been met and that the first thread can continue execution. Still referring to thread 2, after this predetermined value is written, code execution of the second thread continues (block 680). While shown with this particular implementation in the embodiment of Fig. 6, understand that the scope of the present invention is not limited in this regard.
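The cooperation between the two threads can be emulated in software to make the control flow concrete. The sketch below is not the hardware instruction: it uses Python's `threading.Condition` as a stand-in for the monitored location and the low power state, and the names `MonitoredLocation`, `wait_until_masked_nonzero`, and `run_demo` are hypothetical. The test condition is assumed to be a non-zero result of masking the monitored value.

```python
import threading
import time
from typing import List, Tuple


class MonitoredLocation:
    """Software stand-in for a monitored memory location (hypothetical helper)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._cond = threading.Condition()

    def write(self, value: int) -> None:
        # Thread 2 side: update the location and wake any waiter.
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def wait_until_masked_nonzero(self, mask: int,
                                  timeout_s: float) -> Tuple[int, bool]:
        """Block until (value & mask) != 0 or the deadline passes.

        Returns (last value seen, whether the test condition was met).
        """
        deadline = time.monotonic() + timeout_s
        with self._cond:
            while (self._value & mask) == 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return self._value, False
                self._cond.wait(remaining)  # sleeps instead of spinning
            return self._value, True


def run_demo() -> List[str]:
    location = MonitoredLocation()
    shared = {"data": None}
    log: List[str] = []

    def thread1() -> None:
        _, met = location.wait_until_masked_nonzero(mask=0x1, timeout_s=5.0)
        log.append(f"thread1 resumed, met={met}, data={shared['data']}")

    def thread2() -> None:
        shared["data"] = 42   # interdependent code updates the data first
        location.write(0x1)   # then writes the predetermined value

    workers = [threading.Thread(target=thread1), threading.Thread(target=thread2)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return log
```

Because thread 2 updates the shared data before writing the predetermined value, thread 1 always observes the updated data once its wait returns with the condition met, regardless of which thread starts first.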
Thus, embodiments enable a lightweight stall mechanism that allows a processor to stall while waiting for one or more predetermined conditions to occur, without the need for OS involvement. In this way, an application need not poll a semaphore/value until it becomes true in a loop containing test, pause, and jump operations, which would cause the processor to waste power and, in a hyperthreaded machine, prevent other threads from using those cycles. The overhead of OS monitoring and its scheduling constraints (the waiting application may not be the next thread to be scheduled) can thus be avoided. Accordingly, lightweight communication can occur between cooperating threads, and the processor can flexibly select a sleep state based on a time parameter indicated by the user.
Embodiments may be implemented in many different system types. Referring now to Fig. 7, a block diagram of a system in accordance with an embodiment of the present invention is shown. As shown in Fig. 7, multiprocessor system 700 is a point-to-point interconnect system and includes a first processor 770 and a second processor 780 coupled via a point-to-point interconnect 750. As shown in Fig. 7, each of processors 770 and 780 may be a multicore processor including first and second processor cores (i.e., processor cores 774a and 774b and processor cores 784a and 784b), although potentially many more cores may be present in the processors. The processor cores may execute various instructions, including a user-level processor wait instruction.
Still referring to Fig. 7, first processor 770 further includes a memory controller hub (MCH) 772 and point-to-point (P-P) interfaces 776 and 778. Similarly, second processor 780 includes an MCH 782 and P-P interfaces 786 and 788. As shown in Fig. 7, MCHs 772 and 782 couple the processors to respective memories, namely memory 732 and memory 734, which may be portions of main memory (e.g., dynamic random access memory (DRAM)) locally attached to the respective processors. First processor 770 and second processor 780 may be coupled to a chipset 790 via P-P interconnects 752 and 754, respectively. As shown in Fig. 7, chipset 790 includes P-P interfaces 794 and 798.
Furthermore, chipset 790 includes an interface 792 to couple chipset 790 with a high performance graphics engine 738 via a P-P interconnect 739. In turn, chipset 790 may be coupled to a first bus 716 via an interface 796. As shown in Fig. 7, various input/output (I/O) devices 714 may be coupled to first bus 716, along with a bus bridge 718 that couples first bus 716 to a second bus 720. Various devices may be coupled to second bus 720 including, for example, a keyboard/mouse 722, communication devices 726, and a data storage unit 728 such as a disk drive or other mass storage device, which in one embodiment may include code 730. Further, an audio I/O 724 may be coupled to second bus 720.
Embodiments may be implemented in code and may be stored on a storage medium having stored thereon instructions that can be used to program a system to perform the instructions. The storage medium may include, but is not limited to: any type of disk, including floppy disks, optical disks, solid state drives (SSDs), compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, and electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.
Claims (24)
1. A processor comprising:
a core including: decode logic to receive an instruction from a first application and decode the instruction, the instruction specifying an identification of a location to be monitored and a timer value; and a timer coupled to the decode logic to perform a count with respect to the timer value; and
a power management unit coupled to the core to determine a type of low power state of the processor based at least in part on the timer value and, if a value of the monitored location does not equal a target value and the timer value has not passed, to cause the processor to enter the low power state responsive to the determination, without operating system (OS) involvement.
2. The processor of claim 1, further comprising a monitor engine coupled to a cache memory to determine whether a line of the cache memory that includes a copy of the monitored location has been updated.
3. The processor of claim 2, wherein the monitor engine is to communicate the updated copy and a wakeup signal to the core.
4. The processor of claim 3, wherein the core is to determine whether the updated copy corresponds to the target value and, if so, to exit the low power state, and otherwise to determine a new low power state and enter the new low power state.
5. The processor of claim 1, wherein the instruction is a user-level instruction that causes the processor to load a first value, perform a mask operation between the first value and data stored in a destination location, and enter the low power state if a result of the mask operation is a first result, and otherwise to load the first value into the destination location.
6. The processor of claim 5, wherein if the result equals zero, the processor is to set a zero indicator of a flag register.
7. The processor of claim 1, wherein the timer is set to a value corresponding to a difference between a time stamp counter value and the timer value.
8. The processor of claim 1, wherein the processor comprises a multicore processor including the core and a second core, wherein the instruction is of a first thread executing on the core, and a second thread is to update the monitored location.
9. The processor of claim 8, wherein the core is to exit the low power state responsive to the update to the monitored location.
10. The processor of claim 9, wherein the core is thereafter to execute at least one operation of the first thread using data updated by the second thread before the second thread updated the monitored location.
11. A method comprising:
receiving, in a processor, an instruction from a first application and decoding the instruction, the instruction specifying an identification of a location to be monitored and a timer value;
responsive to the instruction, determining, in the processor, a type of low power state of the processor based at least in part on the timer value; and
if a value of the monitored location does not equal a target value and the timer value has not passed, entering the low power state of the processor responsive to the determination.
12. The method of claim 11, wherein the instruction further specifies the target value for the monitored location.
13. The method of claim 11, further comprising exiting the low power state responsive to the timer value passing.
14. The method of claim 11, further comprising exiting the low power state when the value of the monitored location equals the target value, including receiving a wakeup signal from a monitor engine of a cache memory of the processor, the monitor engine sending the wakeup signal when a stored value of a cache line that includes a copy of the monitored location changes.
15. The method of claim 11, further comprising: selecting, using a power management unit (PMU) of the processor, the type of low power state from a plurality of low power states based on information in a table having a plurality of entries, each of the plurality of entries associating a low power state with a timer value; and sending at least one control signal from the PMU to a core of the processor to cause the core to enter the low power state.
16. The method of claim 11, further comprising: receiving a wakeup signal from a second processor coupled to the processor; and exiting the low power state responsive to the wakeup signal.
17. A system comprising:
a multicore processor including a first core and a second core, the first core including decode logic and a timer, the decode logic to decode a user-level instruction to cause a wait state to occur, the user-level instruction specifying a location to be monitored and a timer value, the timer coupled to the decode logic to perform a count with respect to the timer value, the multicore processor further including power management logic coupled to the first and second cores to select one of a plurality of low power states based at least in part on the timer value, without operating system (OS) involvement, and, if a value of the monitored location does not equal a target value, to cause the first core to enter the selected low power state responsive to the selection; and
a dynamic random access memory (DRAM) coupled to the multicore processor.
18. The system of claim 17, wherein the first core is to perform a mask operation between a first operand and a second operand responsive to the user-level instruction and, if a result of the mask operation is not the target value, to enter the selected low power state.
19. The system of claim 18, further comprising monitor logic coupled to the first core, the monitor logic to cause the first core to exit the low power state responsive to an update to the monitored location.
20. The system of claim 19, wherein the monitor logic is to send a wakeup signal to the first core when a cache line associated with the monitored location has been updated or a coherency state of the cache line has been updated.
21. An article comprising a machine-accessible storage medium including instructions that, when executed, cause a system to:
receive, in a first core of a multicore processor during execution of a first thread, a user-level processor wait instruction that specifies a location to be monitored and a timer value;
determine, in the first core, whether a condition of the user-level processor wait instruction is met and, if not, enter a low power state selected by power management logic of the multicore processor;
update a value on a second core of the multicore processor during execution of a second thread;
responsive to the value update, exit the low power state of the first core and determine whether the condition is met; and
if so, continue execution of the first thread on the first core.
22. The article of claim 21, further comprising instructions that enable the system to cause the first core to exit the low power state responsive to an update to the monitored location and to re-test the condition using the value.
23. The article of claim 22, further comprising instructions that enable the system to: determine the update to the monitored location when a cache line associated with the monitored location has been updated or a coherency state of the cache line has been updated; and cause the first core to exit the low power state responsive to the update to the monitored location.
24. The article of claim 21, further comprising instructions that enable the system to: select, using the power management logic, the low power state from a plurality of low power states based on information in a table having a plurality of entries, each of the plurality of entries associating a low power state with a timer value; and send at least one control signal to cause the first core to enter the low power state.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/641,534 US8464035B2 (en) | 2009-12-18 | 2009-12-18 | Instruction for enabling a processor wait state |
US12/641534 | 2009-12-18 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102103484A true CN102103484A (en) | 2011-06-22 |
CN102103484B CN102103484B (en) | 2015-08-19 |
Family
ID=44152840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201010615167.0A Expired - Fee Related CN102103484B (en) | 2009-12-18 | 2010-12-17 | Instruction for enabling a processor wait state |
Country Status (8)
Country | Link |
---|---|
US (3) | US8464035B2 (en) |
JP (2) | JP5571784B2 (en) |
KR (1) | KR101410634B1 (en) |
CN (1) | CN102103484B (en) |
DE (1) | DE102010052680A1 (en) |
GB (1) | GB2483012B (en) |
TW (1) | TWI512448B (en) |
WO (1) | WO2011075246A2 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104750225A (en) * | 2013-12-31 | 2015-07-01 | 联想(北京)有限公司 | Processor and processing method thereof |
CN105589336A (en) * | 2014-11-07 | 2016-05-18 | 三星电子株式会社 | Multi-Processor Device |
CN107430425A (en) * | 2015-04-16 | 2017-12-01 | 英特尔公司 | For adjusting the apparatus and method of processor power utilization rate based on network load |
CN108369495A (en) * | 2015-12-22 | 2018-08-03 | 英特尔公司 | Hardware for floating-point operation eliminates monitor |
CN104750225B (en) * | 2013-12-31 | 2018-08-31 | 联想(北京)有限公司 | The processing method and processor of processor |
CN109661656A (en) * | 2016-09-30 | 2019-04-19 | 英特尔公司 | Method and apparatus for the intelligent storage operation using the request of condition ownership |
CN110214299A (en) * | 2017-01-30 | 2019-09-06 | 国际商业机器公司 | Processor economize on electricity during waiting event |
CN110471699A (en) * | 2011-12-23 | 2019-11-19 | 英特尔公司 | The instruction execution of broadcast and mask is carried out to data value under different granular levels |
CN113867518A (en) * | 2021-09-15 | 2021-12-31 | 珠海亿智电子科技有限公司 | Processor low-power consumption blocking type time delay method, device and readable medium |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10621092B2 (en) | 2008-11-24 | 2020-04-14 | Intel Corporation | Merging level cache and data cache units having indicator bits related to speculative execution |
US9672019B2 (en) | 2008-11-24 | 2017-06-06 | Intel Corporation | Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads |
US8464035B2 (en) * | 2009-12-18 | 2013-06-11 | Intel Corporation | Instruction for enabling a processor wait state |
US8775153B2 (en) * | 2009-12-23 | 2014-07-08 | Intel Corporation | Transitioning from source instruction set architecture (ISA) code to translated code in a partial emulation environment |
US8977878B2 (en) * | 2011-05-19 | 2015-03-10 | Texas Instruments Incorporated | Reducing current leakage in L1 program memory |
US9207730B2 (en) * | 2011-06-02 | 2015-12-08 | Apple Inc. | Multi-level thermal management in an electronic device |
WO2013048468A1 (en) | 2011-09-30 | 2013-04-04 | Intel Corporation | Instruction and logic to perform dynamic binary translation |
US9063760B2 (en) * | 2011-10-13 | 2015-06-23 | International Business Machines Corporation | Employing native routines instead of emulated routines in an application being emulated |
US9829951B2 (en) | 2011-12-13 | 2017-11-28 | Intel Corporation | Enhanced system sleep state support in servers using non-volatile random access memory |
WO2013101165A1 (en) * | 2011-12-30 | 2013-07-04 | Intel Corporation | Register error protection through binary translation |
WO2013145282A1 (en) * | 2012-03-30 | 2013-10-03 | 富士通株式会社 | Data processing device |
US20140075163A1 (en) * | 2012-09-07 | 2014-03-13 | Paul N. Loewenstein | Load-monitor mwait |
JP5715107B2 (en) * | 2012-10-29 | 2015-05-07 | 富士通テン株式会社 | Control system |
DE112012007058T5 (en) * | 2012-12-19 | 2015-08-06 | Intel Corporation | Vector mask-driven clock gating for power efficiency of a processor |
US9164565B2 (en) | 2012-12-28 | 2015-10-20 | Intel Corporation | Apparatus and method to manage energy usage of a processor |
US9081577B2 (en) | 2012-12-28 | 2015-07-14 | Intel Corporation | Independent control of processor core retention states |
US9405551B2 (en) | 2013-03-12 | 2016-08-02 | Intel Corporation | Creating an isolated execution environment in a co-designed processor |
JP6175980B2 (en) * | 2013-08-23 | 2017-08-09 | 富士通株式会社 | CPU control method, control program, and information processing apparatus |
US9507404B2 (en) * | 2013-08-28 | 2016-11-29 | Via Technologies, Inc. | Single core wakeup multi-core synchronization mechanism |
US9891936B2 (en) | 2013-09-27 | 2018-02-13 | Intel Corporation | Method and apparatus for page-level monitoring |
US9513904B2 (en) | 2013-10-15 | 2016-12-06 | Mill Computing, Inc. | Computer processor employing cache memory with per-byte valid bits |
CN105094747B (en) * | 2014-05-07 | 2018-12-04 | 阿里巴巴集团控股有限公司 | The device of central processing unit based on SMT and the data dependence for detection instruction |
US10467011B2 (en) * | 2014-07-21 | 2019-11-05 | Intel Corporation | Thread pause processors, methods, systems, and instructions |
KR102476357B1 (en) | 2015-08-06 | 2022-12-09 | 삼성전자주식회사 | Clock management unit, integrated circuit and system on chip adopting the same, and clock managing method |
US11023233B2 (en) | 2016-02-09 | 2021-06-01 | Intel Corporation | Methods, apparatus, and instructions for user level thread suspension |
US10185564B2 (en) | 2016-04-28 | 2019-01-22 | Oracle International Corporation | Method for managing software threads dependent on condition variables |
US11061730B2 (en) * | 2016-11-18 | 2021-07-13 | Red Hat Israel, Ltd. | Efficient scheduling for hyper-threaded CPUs using memory monitoring |
US10394678B2 (en) | 2016-12-29 | 2019-08-27 | Intel Corporation | Wait and poll instructions for monitoring a plurality of addresses |
US11086672B2 (en) * | 2019-05-07 | 2021-08-10 | International Business Machines Corporation | Low latency management of processor core wait state |
CN113986663A (en) * | 2021-10-22 | 2022-01-28 | 上海兆芯集成电路有限公司 | Electronic device and power consumption control method thereof |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1955931A (en) * | 2005-09-30 | 2007-05-02 | 科威尔公司 | Scheduling in a multicore architecture |
CN101203831A (en) * | 2005-06-23 | 2008-06-18 | 英特尔公司 | Primitives to enhance thread-level speculation |
CN101458558A (en) * | 2007-12-10 | 2009-06-17 | 英特尔公司 | Transitioning a processor package to a low power state |
US20090235105A1 (en) * | 2008-03-11 | 2009-09-17 | Alexander Branover | Hardware Monitoring and Decision Making for Transitioning In and Out of Low-Power State |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001318742A (en) * | 2000-05-08 | 2001-11-16 | Mitsubishi Electric Corp | Computer system and computer readable recording medium |
US7127561B2 (en) * | 2001-12-31 | 2006-10-24 | Intel Corporation | Coherency techniques for suspending execution of a thread until a specified memory access occurs |
US7363474B2 (en) | 2001-12-31 | 2008-04-22 | Intel Corporation | Method and apparatus for suspending execution of a thread until a specified memory access occurs |
US7213093B2 (en) | 2003-06-27 | 2007-05-01 | Intel Corporation | Queued locks using monitor-memory wait |
JP4376692B2 (en) | 2004-04-30 | 2009-12-02 | 富士通株式会社 | Information processing device, processor, processor control method, information processing device control method, cache memory |
GB2414573B (en) | 2004-05-26 | 2007-08-08 | Advanced Risc Mach Ltd | Control of access to a shared resource in a data processing apparatus |
US8607241B2 (en) * | 2004-06-30 | 2013-12-10 | Intel Corporation | Compare and exchange operation using sleep-wakeup mechanism |
US7810083B2 (en) * | 2004-12-30 | 2010-10-05 | Intel Corporation | Mechanism to emulate user-level multithreading on an OS-sequestered sequencer |
US8607235B2 (en) * | 2004-12-30 | 2013-12-10 | Intel Corporation | Mechanism to schedule threads on OS-sequestered sequencers without operating system intervention |
US8719819B2 (en) * | 2005-06-30 | 2014-05-06 | Intel Corporation | Mechanism for instruction set based thread execution on a plurality of instruction sequencers |
US8516483B2 (en) * | 2005-05-13 | 2013-08-20 | Intel Corporation | Transparent support for operating system services for a sequestered sequencer |
US8010969B2 (en) * | 2005-06-13 | 2011-08-30 | Intel Corporation | Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers |
US8028295B2 (en) * | 2005-09-30 | 2011-09-27 | Intel Corporation | Apparatus, system, and method for persistent user-level thread |
US7941681B2 (en) * | 2007-08-17 | 2011-05-10 | International Business Machines Corporation | Proactive power management in a parallel computer |
US9081687B2 (en) * | 2007-12-28 | 2015-07-14 | Intel Corporation | Method and apparatus for MONITOR and MWAIT in a distributed cache architecture |
DE102009001142A1 (en) * | 2009-02-25 | 2010-08-26 | Robert Bosch Gmbh | Electromechanical brake booster |
US8156275B2 (en) * | 2009-05-13 | 2012-04-10 | Apple Inc. | Power managed lock optimization |
US8464035B2 (en) * | 2009-12-18 | 2013-06-11 | Intel Corporation | Instruction for enabling a processor wait state |

Filing date | Country | Application number | Publication | Status |
---|---|---|---|---|
2009-12-18 | US | US12/641,534 | US8464035B2 | Active |
2010-10-26 | TW | TW099136477A | TWI512448B | IP Right Cessation |
2010-11-11 | WO | PCT/US2010/056320 | WO2011075246A2 | Application Filing |
2010-11-11 | JP | JP2012517935A | JP5571784B2 | Active |
2010-11-11 | KR | KR1020127018822A | KR101410634B1 | IP Right Grant |
2010-11-11 | GB | GB1119728.2A | GB2483012B | Expired - Fee Related |
2010-11-26 | DE | DE102010052680A | DE102010052680A1 | Withdrawn |
2010-12-17 | CN | CN201010615167.0A | CN102103484B | Expired - Fee Related |
2013-03-06 | US | US13/786,939 | US9032232B2 | Active |
2013-05-10 | US | US13/891,747 | US8990597B2 | Active |
2014-06-26 | JP | JP2014131157A | JP5795820B2 | Active |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110471699A (en) * | 2011-12-23 | 2019-11-19 | 英特尔公司 | Instruction execution that broadcasts and masks data values at different levels of granularity |
CN110471699B (en) * | 2011-12-23 | 2023-07-28 | 英特尔公司 | Processor core, method and system for instruction processing |
US11709961B2 (en) | 2011-12-23 | 2023-07-25 | Intel Corporation | Instruction execution that broadcasts and masks data values at different levels of granularity |
CN104750225A (en) * | 2013-12-31 | 2015-07-01 | 联想(北京)有限公司 | Processor and processing method thereof |
CN104750225B (en) * | 2013-12-31 | 2018-08-31 | 联想(北京)有限公司 | Processor and processing method thereof |
CN105589336B (en) * | 2014-11-07 | 2021-01-01 | 三星电子株式会社 | Multi-processor device |
CN105589336A (en) * | 2014-11-07 | 2016-05-18 | 三星电子株式会社 | Multi-Processor Device |
CN107430425B (en) * | 2015-04-16 | 2022-09-23 | 英特尔公司 | Apparatus and method for adjusting processor power usage based on network load |
CN107430425A (en) * | 2015-04-16 | 2017-12-01 | 英特尔公司 | Apparatus and method for adjusting processor power usage based on network load |
CN108369495A (en) * | 2015-12-22 | 2018-08-03 | 英特尔公司 | Hardware cancellation monitor for floating-point operations |
CN109661656A (en) * | 2016-09-30 | 2019-04-19 | 英特尔公司 | Method and apparatus for smart store operations with conditional ownership requests |
US11550721B2 (en) | 2016-09-30 | 2023-01-10 | Intel Corporation | Method and apparatus for smart store operations with conditional ownership requests |
CN109661656B (en) * | 2016-09-30 | 2023-10-03 | 英特尔公司 | Method and apparatus for intelligent storage operation with conditional ownership request |
CN110214299A (en) * | 2017-01-30 | 2019-09-06 | 国际商业机器公司 | Processor power saving during a wait event |
CN110214299B (en) * | 2017-01-30 | 2023-07-14 | 国际商业机器公司 | Processor power saving during a wait event |
CN113867518A (en) * | 2021-09-15 | 2021-12-31 | 珠海亿智电子科技有限公司 | Low-power blocking delay method and device for a processor, and readable medium |
Also Published As
Publication number | Publication date |
---|---|
KR20120110120A (en) | 2012-10-09 |
WO2011075246A3 (en) | 2011-08-18 |
JP5571784B2 (en) | 2014-08-13 |
JP5795820B2 (en) | 2015-10-14 |
GB2483012B (en) | 2017-10-18 |
US8990597B2 (en) | 2015-03-24 |
US20130185580A1 (en) | 2013-07-18 |
US9032232B2 (en) | 2015-05-12 |
TW201131349A (en) | 2011-09-16 |
DE102010052680A1 (en) | 2011-07-07 |
JP2014222520A (en) | 2014-11-27 |
TWI512448B (en) | 2015-12-11 |
US20130246824A1 (en) | 2013-09-19 |
CN102103484B (en) | 2015-08-19 |
WO2011075246A2 (en) | 2011-06-23 |
GB201119728D0 (en) | 2011-12-28 |
GB2483012A (en) | 2012-02-22 |
KR101410634B1 (en) | 2014-06-20 |
US20110154079A1 (en) | 2011-06-23 |
US8464035B2 (en) | 2013-06-11 |
JP2012531681A (en) | 2012-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102103484B (en) | Instruction for enabling a processor wait state | |
US20210357214A1 (en) | Methods, apparatus, and instructions for user-level thread suspension | |
CN101105711B (en) | System and method for distributing processing function between main processor and assistant processor | |
CN101454753A (en) | Handling address translations and exceptions for heterogeneous resources | |
EP3588288B1 (en) | A multithreaded processor core with hardware-assisted task scheduling | |
TW201220183A (en) | Apparatus, method, and system for providing a decision mechanism for conditional commits in an atomic region | |
CN103562870A (en) | Automatic load balancing for heterogeneous cores | |
CN102103525A (en) | Controlling time stamp counter (TSC) offsets for multiple cores and threads | |
CN106293894B (en) | Hardware device and method for performing transactional power management | |
EP3716046B1 (en) | Technology for providing memory atomicity with low overhead | |
US11048516B2 (en) | Systems, methods, and apparatuses for last branch record support compatible with binary translation and speculative execution using an architectural bit array and a write bit array | |
US20110173420A1 (en) | Processor resume unit | |
US20110173422A1 (en) | Pause processor hardware thread until pin | |
CN116302868A (en) | System, method and apparatus for high-level microarchitectural event performance monitoring using fixed counters | |
Radaideh et al. | Exploiting zero data to reduce register file and execution unit dynamic power consumption in GPGPUs | |
US11880231B2 (en) | Accurate timestamp or derived counter value generation on a complex CPU | |
CN101615115B (en) | Device, method and system for instruction retire | |
US20240103914A1 (en) | Dynamically adjusting thread affinitization using hardware-based core availability notifications | |
CN103235716B (en) | Device for detecting pipeline data dependence | |
US20130159740A1 (en) | Electronic device and method for energy efficient status determination |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20150819; Termination date: 20191217 |