CN102103484A - Instruction for enabling a processor wait state - Google Patents


Info

Publication number
CN102103484A
Authority
CN
China
Prior art keywords
processor
low power
instruction
kernel
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010106151670A
Other languages
Chinese (zh)
Other versions
CN102103484B (en
Inventor
M. G. Dixon
S. D. Rodgers
T. Bahrami
S. H. Gunther
P. Sethi
P. Hammarlund
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Publication of CN102103484A
Application granted
Publication of CN102103484B
Expired - Fee Related
Anticipated expiration

Classifications

    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3228 Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3293 Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30083 Power or thermal control instructions
    • G06F9/3009 Thread control instructions
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Power Sources (AREA)
  • Executing Machine-Instructions (AREA)
  • Debugging And Monitoring (AREA)
  • Microcomputers (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

In one embodiment, the present invention includes a processor having a core with decode logic to decode an instruction prescribing an identification of a location to be monitored and a timer value, and a timer coupled to the decode logic to perform a count with respect to the timer value. The processor may further include a power management unit coupled to the core to determine a type of a low power state based at least in part on the timer value and cause the processor to enter the low power state responsive to the determination. Other embodiments are described and claimed.

Description

Instruction for enabling a processor wait state
Technical field
The present invention relates to an instruction for enabling a processor wait state.
Background
As processor technology advances, processors are becoming available with larger numbers of cores. To execute software efficiently, tasks may be assigned to these cores so that they execute different threads of a single application. Such an arrangement is referred to as cooperative threaded software. In modern cooperative threaded software, it is common for one thread to wait for another thread to finish. Conventionally, the processor running the waiting thread burns active power while it waits. In addition, the waiting time may be indeterminate, so the processor may not know how long it should wait.
Another mechanism that lets a core wait is to place the core in a wait state such as a low power state. To do this, the operating system (OS) is invoked. The OS may execute a pair of instructions called the MONITOR instruction and the MWAIT instruction. Note that these instructions are not available to application-level software. Instead, they are used only at the OS privilege level to set up an address range to be monitored and to enable the processor to enter a low power state until the monitored address range is updated. However, entering the OS to execute these instructions carries significant overhead. This overhead takes the form of high latency, and it can also increase complexity, because OS scheduling issues may result in the waiting thread not being the next thread scheduled when it exits the wait state.
Summary of the invention
The present invention relates to a processor, comprising:
a core comprising: decode logic to receive an instruction from a first application and to decode the instruction, the instruction specifying an identification of a location to be monitored and a timer value; and a timer coupled to the decode logic to perform a count with respect to the timer value; and
a power management unit coupled to the core to determine a type of a low power state for the processor based at least in part on the timer value and, if a value of the monitored location does not equal a target value and the timer value has not passed, to cause the processor to enter the low power state responsive to the determination, without operating system (OS) involvement.
The present invention relates to a method, comprising:
receiving an instruction from a first application in a processor and decoding the instruction, the instruction specifying an identification of a location to be monitored and a timer value;
responsive to the instruction, determining in the processor a type of a low power state for the processor based at least in part on the timer value; and
if a value of the monitored location does not equal a target value and the timer value has not passed, entering the low power state of the processor responsive to the determination.
The present invention relates to a system, comprising:
a multi-core processor including a first core and a second core, the first core including decode logic and a timer, the decode logic to decode a user-level instruction to cause a wait state to occur, the user-level instruction specifying a location to be monitored and a timer value, the timer coupled to the decode logic to perform a count with respect to the timer value, the multi-core processor further including power management logic coupled to the first and second cores to select one of a plurality of low power states based at least in part on the timer value, without operating system (OS) involvement, and, if a value of the monitored location does not equal a target value, to cause the first core to enter the selected low power state responsive to the selection; and
a dynamic random access memory (DRAM) coupled to the multi-core processor.
The present invention relates to an article comprising a machine-accessible storage medium including instructions that, when executed, cause a system to:
receive, in a first core of a multi-core processor during execution of a first thread, a user-level processor wait instruction specifying a location to be monitored and a timer value;
determine, in the first core, whether a condition of the user-level processor wait instruction is met and, if not, enter a low power state selected by power management logic of the multi-core processor;
update the value on a second core of the multi-core processor during execution of a second thread;
responsive to the value update, exit the low power state of the first core and determine whether the condition is met; and
if so, continue execution of the first thread on the first core.
Description of drawings
Fig. 1 is a flow diagram of a method according to one embodiment of the present invention.
Fig. 2 is a flow diagram of a target value test performed according to one embodiment of the present invention.
Fig. 3 is a block diagram of a processor core according to one embodiment of the present invention.
Fig. 4 is a block diagram of a processor according to one embodiment of the present invention.
Fig. 5 is a block diagram of a processor according to another embodiment of the present invention.
Fig. 6 is a flow diagram of the interaction between cooperating threads according to one embodiment of the present invention.
Fig. 7 is a block diagram of a system according to one embodiment of the present invention.
Detailed description
In various embodiments, a user-level instruction (that is, an application-level instruction) may be provided and used to allow an application to wait for one or more conditions to occur. While the application waits, the processor executing it (e.g., a core of a multi-core processor) may be placed in a low power state or may switch to executing another thread. The conditions on which the processor waits may include detection of a certain value, expiration of a timer, or receipt of an interrupt signal from, e.g., another processor, although the scope of the present invention is not limited in this regard.
In this way, an application can wait for one or more operations being performed in, e.g., another thread, without yielding to an operating system (OS) or other supervisor software. In addition, based on instruction information provided with the instruction, the wait can be time-bounded, so that the processor can select an appropriate low power state to enter. That is, control logic of the processor itself can determine the appropriate low power state to enter based on the instruction information provided and on various calculations performed in the processor. The overhead of involving the OS to enter a low power state can therefore be avoided. Note that the processor need not be waiting on another peer processor; it may instead be waiting on a coprocessor, such as a floating-point coprocessor or other fixed-function device.
In various embodiments, the user-level instruction may have various information associated with it, including a location to be monitored, a value to look for, and a timeout value. For ease of discussion, this user-level instruction is referred to as a processor wait instruction, although the scope of the present invention is not limited in this regard. Different flavors of the user-level instruction may be provided, each of which may indicate, for example, waiting for a specific value, a set of values, or a range, or coupling the wait with an operation (e.g., incrementing a counter once the value becomes true).
In general, a processor may perform various actions responsive to a processor wait instruction, which may include or be associated with the following instruction information: a source field, which indicates the location of the value to be tested; a timeout or deadline timer value, which indicates the point in time at which the wait state should end if the value being tested for has not been attained; and a result field, which indicates the value to be attained. In other implementations, in addition to these fields, a destination or mask field may be present where the source value is masked and tested against a predetermined value (e.g., whether the result of the mask is non-zero).
As mentioned above, the processor may perform various operations responsive to this instruction. In general, these operations may include testing whether the value of the monitored location is the target value (e.g., performing a Boolean operation to test for a "true" condition) and testing whether the deadline timer value has been reached. If either condition is met (e.g., is "true"), or if an interrupt is received from another entity, the instruction may complete. Otherwise, a mechanism may be started to monitor the location to see whether the value changes, and at this point the wait state may be entered. In the wait state, the processor may enter a low power state or may cause execution of another hardware thread to begin. If a low power state is desired, the processor may select an appropriate low power state based at least in part on the amount of time remaining until the deadline timer is reached. The low power state may then be entered, and the processor may remain in it until awakened by one of the conditions discussed above. Although described with this general operation, it is to be understood that in different implementations the various features and operations may be performed differently.
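The general operation described above can be sketched as a small software model. This is only an illustrative simulation, not the disclosed hardware: the function `processor_wait`, the `ExitReason` names, and the callables passed in are all hypothetical, and the choice to treat the deadline as reached when the current time is greater than or equal to it is an assumption.

```python
from enum import Enum


class ExitReason(Enum):
    VALUE_MATCHED = "value matched"
    DEADLINE_REACHED = "deadline reached"
    INTERRUPTED = "interrupt received"


def processor_wait(read_location, target, deadline, now,
                   interrupt_pending, enter_low_power):
    """Software model of the wait loop (all parameters are hypothetical).

    read_location:     callable returning the monitored value
    target:            value that ends the wait
    deadline:          absolute time at which the wait must end
    now:               callable returning the current time
    interrupt_pending: callable returning True if another entity interrupts
    enter_low_power:   callable taking the remaining time; returns on wake-up
    """
    while True:
        if read_location() == target:
            return ExitReason.VALUE_MATCHED
        if now() >= deadline:  # assumption: ">=" counts as reaching the deadline
            return ExitReason.DEADLINE_REACHED
        if interrupt_pending():
            return ExitReason.INTERRUPTED
        # None of the exit conditions hold: wait in a low power state
        # chosen from the time remaining, then re-test after waking.
        enter_low_power(deadline - now())


# Demo: a second "thread" is modeled by updating memory during the sleep.
clock = {"t": 0}
memory = {"flag": 0}


def demo_sleep(remaining):
    clock["t"] += 10
    if clock["t"] == 30:
        memory["flag"] = 1  # the awaited update arrives


demo_reason = processor_wait(lambda: memory["flag"], 1, 100,
                             lambda: clock["t"], lambda: False, demo_sleep)
```

In the demo the wait ends because the monitored value is updated before the deadline; passing a smaller deadline would end it on timeout instead.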
Referring now to Fig. 1, shown is a flow diagram of a method according to one embodiment of the present invention. As shown in Fig. 1, method 100 may be implemented by a processor executing a user-level instruction to handle a processor wait operation. As seen, method 100 may begin by decoding a received instruction (block 110). As one example, the instruction may be a user-level instruction provided by an application, such as an application implemented with multiple threads, each of which includes instructions that may have some interdependency when the cooperative threaded application executes. After decoding the instruction, the processor may load a memory value into a cache and a register (block 120). More specifically, a source operand of the instruction may identify a location, e.g., in memory, from which a value is to be obtained. This value may be loaded into a cache memory, such as a low-level cache associated with the core executing the instruction, e.g., a private cache. In addition, the value may be stored in a register of the core. As one example, this register may be a general-purpose register of the logical processor of the thread. Control then passes to block 130. At block 130, a deadline may be calculated responsive to the instruction information. More specifically, the deadline is the amount of time for which the wait state should last if the condition is not met (e.g., the expected value is not updated). In one embodiment, the instruction format may include information providing a deadline timer value. To determine the appropriate time remaining until this deadline, in some implementations the received deadline timer value may be compared with a current time counter value present in the processor (e.g., a time stamp counter (TSC) value). The difference may be loaded into a deadline timer, which in some embodiments may be implemented with a counter or a register. In one embodiment, the deadline timer may be a countdown timer. In this implementation, the current TSC value is subtracted from the deadline, and the countdown timer ticks for that many cycles. When the TSC value exceeds the deadline, it triggers a resumption of the processor. That is, as discussed below, when the deadline timer decrements to zero, the wait state may be terminated if it is still in progress. In a register implementation, a comparator may compare the value of the TSC counter with the deadline each cycle.
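The two deadline timer variants described above can be illustrated with a short model. This is a sketch under stated assumptions rather than the disclosed circuit: the names `CountdownDeadlineTimer` and `comparator_expired` are hypothetical, a deadline already in the past is clamped to zero remaining cycles, and the comparator treats a TSC value equal to the deadline as expired.

```python
def cycles_until_deadline(deadline_tsc, current_tsc):
    """Remaining cycles = deadline - current TSC (clamped at zero, an assumption)."""
    return max(deadline_tsc - current_tsc, 0)


class CountdownDeadlineTimer:
    """Countdown variant: loaded with the difference, decremented each cycle."""

    def __init__(self, deadline_tsc, current_tsc):
        self.remaining = cycles_until_deadline(deadline_tsc, current_tsc)

    def tick(self):
        """Advance one cycle; return True once the timer has reached zero."""
        if self.remaining > 0:
            self.remaining -= 1
        return self.remaining == 0


def comparator_expired(deadline_tsc, current_tsc):
    """Register variant: compare the TSC with the stored deadline each cycle."""
    return current_tsc >= deadline_tsc  # assumption: equality counts as expired


# Example: deadline five cycles after the current TSC value.
timer = CountdownDeadlineTimer(deadline_tsc=1005, current_tsc=1000)
expired_per_cycle = [timer.tick() for _ in range(5)]
```

Both variants wake the processor at the same cycle; the countdown form stores a relative count, while the comparator form stores the absolute deadline.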
The above operations thus set up the various structures that will be accessed and tested during the wait state, and the wait state may accordingly be entered. The wait state is generally part of a loop 155, which may execute iteratively until one of several conditions occurs. As seen, it may be determined whether a target value from the instruction information matches the value stored in the register (diamond 140). In implementations in which the instruction information includes this target value, the data obtained from memory and stored in the register may be tested to determine whether its value matches the target value. If so, the condition is met and control passes to block 195, where execution of the wait instruction may complete. Completion of the instruction may additionally cause various flags or other values to be set so that subsequent code can determine the reason for exiting the wait state. Once the instruction completes, operation of the thread that requested the wait state may continue.
If instead it is determined at diamond 140 that the condition is not met, control passes to diamond 150, where it may be determined whether the deadline has occurred. If so, the instruction may complete as discussed above. Otherwise, control passes to diamond 160, where it may be determined whether another hardware component is seeking to wake the processor. If so, the instruction completes as discussed above. Otherwise, control passes to block 170, where a low power state may be determined based at least in part on the deadline timer value. That is, based on the amount of time remaining before the deadline occurs, the processor itself can determine an appropriate low power state without OS involvement. To make this determination, in some embodiments logic of, e.g., an uncore of the processor may be used. This logic may include or be associated with a table that, as discussed below, associates various low power states with deadline timer values. Based on the determination at block 170, the processor may enter the low power state (block 180). In the low power state, various structures of the processor, namely the core executing the instruction and other components, may be placed in a low power state. The particular structures placed in a low power state, and the level of that state, may vary by implementation. Note that if the loop is traversed again because an updated value was not the target value, a new determination of the low power state may be made based on the updated deadline timer value, because entering a certain low power state (e.g., a deep sleep state) may not be appropriate if only a limited amount of time remains.
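The table lookup performed at block 170 can be pictured with a brief sketch. The table contents below are invented for illustration only (the document gives no thresholds or state names), and `select_low_power_state` is a hypothetical helper; the one property taken from the text is that less remaining time should rule out deeper states.

```python
# Hypothetical table mapping a minimum remaining time (in TSC cycles) to a
# low power state, ordered from deepest to shallowest. Values are illustrative.
LOW_POWER_TABLE = [
    (1_000_000, "deep sleep"),
    (10_000, "sleep"),
    (0, "light idle"),
]


def select_low_power_state(remaining_cycles, table=LOW_POWER_TABLE):
    """Return the deepest state whose minimum residency fits the remaining time."""
    for min_cycles, state in table:
        if remaining_cycles >= min_cycles:
            return state
    # Fall back to the shallowest state if nothing fits (e.g., negative input).
    return table[-1][1]


# Re-running the selection after a wake-up with less time left yields a
# shallower state, mirroring the note about not entering deep sleep late.
first_choice = select_low_power_state(5_000_000)
later_choice = select_low_power_state(50_000)
```

A deeper state typically costs more to enter and exit, which is why the selection is repeated each time the loop is traversed with an updated deadline timer value.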
Various events may occur that cause the core to exit the low power state. Notably, if the cached data (i.e., the data corresponding to the monitored location) is updated (diamond 190), the low power state may be exited. If so, control passes back to diamond 140. Similarly, if the deadline passes and/or a wake-up signal is received from another hardware component, control may pass from the low power state to one of diamonds 150 and 160. Although shown with this high-level implementation in the embodiment of Fig. 1, it is to be understood that the scope of the present invention is not limited in this regard.
In other implementations, the test for the target value may be based on a mask. That is, the user-level instruction may implicitly indicate the target value to be attained. As one example, this target value may be a non-zero result of a mask operation between the source value obtained from memory and a mask value present in the source/destination operand of the instruction. In one embodiment, the user-level instruction may be a load, mask, wait if zero (LDMWZ) instruction of the processor ISA. In one embodiment, the instruction may take the form LDMWZ r32/64, M32/64. In this form, the first operand (r32/64) may store the mask, and the second operand (M32/64) may identify the source value (i.e., the monitored location). A timeout value may be stored in a third register. For example, the deadline may be held in implicit registers. Specifically, the EDX:EAX registers may be used, which are the same set of registers written when the TSC counter is read. In general, the instruction may perform a non-busy poll of a semaphore value and enter a low-power wait state if the semaphore is not available. In different implementations, both bitwise semaphores and counting semaphores may be handled, where zero indicates that nothing is waiting. The timeout value may indicate the amount of time, measured in TSC cycles, that the processor should wait for a non-zero result before unconditionally resuming operation. In one embodiment, information as to which physical processors are in a low power state may be provided to software via memory-mapped registers (e.g., configuration and status registers (CSRs)).
In this embodiment, the LDMWZ instruction loads data from the source memory location, masks it with the source/destination value, and tests whether the resulting value is zero. If the masked value is non-zero, the value loaded from memory is placed, unmasked, in the source/destination register. Otherwise, the processor enters the low-power wait state. Note that this low power state may or may not correspond to a currently defined low power state, such as a so-called C-state according to the Advanced Configuration and Power Interface (ACPI) specification, version 4 (June 16, 2009). The processor may remain in the low power state until the specified time interval elapses, an external exception is signaled (e.g., a general interrupt (INTR), a non-maskable interrupt (NMI), or a system management interrupt (SMI)), or a value that is non-zero when masked is written to the source memory location. As part of entering the wait state, the processor may clear a memory-mapped register (CSR) bit indicating that the processor is currently waiting.
When the wait state is exited because a value that produces a non-zero result when masked has been written to the monitored location, the zero indicator of a flags register may be cleared, and the unmasked value that was read may be placed in the destination register. If expiration of the timer causes the exit from the low power state, the zero indicator of the flags register may be set so that software can detect this condition. If the exit occurs because of an external exception, the state of the processor and memory may be such that the instruction is regarded as not having executed. Accordingly, upon return to the normal execution flow, the same LDMWZ instruction will be re-executed.
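The LDMWZ behavior described above can be modeled in software to make the register and flag outcomes concrete. This is a hedged sketch, not the instruction's definition: `ldmwz`, `LdmwzResult`, and the callables are hypothetical names; leaving the mask in the destination on timeout is an assumption (the text does not say what the register holds in that case); and the external-exception exit, after which the instruction is simply re-executed, is not modeled.

```python
from dataclasses import dataclass


@dataclass
class LdmwzResult:
    dest: int        # contents of the source/destination register afterwards
    zero_flag: bool  # True if the wait ended because the timer expired


def ldmwz(mask, load, deadline, now, wait_for_event):
    """Model of load, mask, wait if zero (hypothetical software rendering).

    mask:           value initially held in the source/destination register
    load:           callable returning the value at the monitored location
    deadline:       absolute time (e.g., a TSC value) that ends the wait
    now:            callable returning the current time
    wait_for_event: callable modeling the low-power wait until a wake-up
    """
    while True:
        value = load()
        if (value & mask) != 0:
            # Non-zero masked result: return the unmasked value, flag cleared.
            return LdmwzResult(dest=value, zero_flag=False)
        if now() >= deadline:
            # Timeout: flag set so software can detect it. Leaving the
            # register unchanged here is an assumption of this sketch.
            return LdmwzResult(dest=mask, zero_flag=True)
        wait_for_event()


# Demo: another agent sets a semaphore bit covered by the mask at time 2.
sem = {"v": 0b0100}
tsc = {"t": 0}


def demo_wait():
    tsc["t"] += 1
    if tsc["t"] == 2:
        sem["v"] = 0b0110


demo_result = ldmwz(0b0011, lambda: sem["v"], 10, lambda: tsc["t"], demo_wait)
```

Software using such an instruction would check the zero indicator after it completes: cleared means the semaphore value in the register is usable, set means the wait timed out.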
Referring now to Fig. 2, shown is a flow diagram of a target value test performed according to another embodiment of the present invention. As shown in Fig. 2, method 200 may begin by loading source data into a first register (block 210). The source data may be masked with a mask present in a second register (block 220). In various embodiments, the first and second registers may be specified by the instruction and may correspond, respectively, to the locations storing the source data and the destination data. It may then be determined whether the result of the mask operation is zero (diamond 230). If so, the desired condition has not been met and the processor may enter the low power state (block 240). Otherwise, the source data may be stored in the second register (block 250), and execution of the instruction may complete (block 260).
During the wait state, if the target location is updated, as determined at diamond 265, control passes back to block 220 to perform the mask operation. If it is determined that another condition has occurred during the wait state (as determined at diamond 270), control passes to block 260 to complete the instruction. Although shown with this particular implementation in the embodiment of Fig. 2, the scope of the present invention is not limited in this regard.
Referring now to Fig. 3, shown is a block diagram of a processor core according to one embodiment of the present invention. As shown in Fig. 3, processor core 300 may be a multi-stage pipelined out-of-order processor. Processor core 300 is shown with a relatively simplified view in Fig. 3 to illustrate various features used in connection with a processor wait state according to one embodiment of the present invention.
As shown in Fig. 3, core 300 includes a front end unit 310, which may be used to fetch instructions to be executed and prepare them for later use in the processor. For example, front end unit 310 may include a fetch unit 301, an instruction cache 303, and an instruction decoder 305. In some implementations, front end unit 310 may further include a trace cache, along with microcode storage and micro-operation storage. Fetch unit 301 may fetch macro-instructions, e.g., from memory or from instruction cache 303, and feed them to instruction decoder 305 to be decoded into primitives, i.e., micro-operations for execution by the processor. One such instruction to be handled in front end unit 310 may be a user-level processor wait instruction according to one embodiment of the present invention. This instruction may enable the front end unit to access various micro-operations so that the operations associated with the wait instruction, such as those described above, can be performed.
Be coupling between front end unit 310 and the performance element 320 is to can be used for receiving micro-order and they are ready for unordered (OOO) engine 3 15 of execution.More particularly, OOO engine 3 15 can comprise and is used for micro instruction flow rearrangement and distributes carrying out required various resources and being used for logic register is renamed such as the various impact dampers on the memory location in the various register files of register file 330 and extended pattern register file 335.Register file 330 can comprise the independent register file that is used for integer and floating-point operation.Extended pattern register file 335 can provide the storage of unit to vector magnitude (for example, each register 256 or 512).
Various resources may be present in execution units 320, including, for example, various integer, floating point, and single instruction multiple data (SIMD) logic units, among other specialized hardware. For example, such execution units may include one or more arithmetic logic units (ALUs) 322. In addition, wakeup logic 324 in accordance with an embodiment of the present invention may be present. Such wakeup logic may be used to perform certain of the operations involved in entering a processor wait mode responsive to a user-level instruction. As will be discussed further below, additional logic for handling such wait states may be present in another part of the processor, such as an uncore. Also shown in Fig. 3 is a set of timers 326. The timers relevant here include a TSC timer and a deadline timer that can be set with a value corresponding to a deadline: if no other exit condition has been met by the time the deadline passes, the processor leaves the wait state. Wakeup logic 324 may activate certain operations when the deadline timer reaches a predetermined count value (which, in some embodiments, may be a countdown to zero).
Results may be provided to retirement logic, namely a reorder buffer (ROB) 340. More specifically, ROB 340 may include various arrays and logic to receive information associated with instructions that are executed. ROB 340 then examines this information to determine whether the instructions can be validly retired and their result data committed to the architectural state of the processor, or whether one or more exceptions occurred that prevent proper retirement of the instructions. Of course, ROB 340 may handle other operations associated with retirement. In the context of a processor wait instruction in accordance with an embodiment of the present invention, retirement may cause ROB 340 to set the state of one or more indicators of a flags register or other status register, which may indicate the reason the processor exited the wait state.
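The interplay between the deadline timer and the exit-reason indicators can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the hardware described here: the names `WakeReason`, `DeadlineTimer`, and `run_wait` are hypothetical, the timer is loaded with the difference between the deadline and the current TSC value (as one of the claims recites), and one loop iteration stands in for one timer tick.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class WakeReason(Enum):
    """Hypothetical encoding of the exit-reason indicator set at retirement."""
    CONDITION_MET = 0
    DEADLINE_EXPIRED = 1


@dataclass
class DeadlineTimer:
    """Counts down toward 0; expiry forces an exit from the wait state."""
    remaining: int

    @classmethod
    def from_deadline(cls, deadline_tsc: int, current_tsc: int) -> "DeadlineTimer":
        # Load the timer with (deadline - TSC); a deadline already in the
        # past gives a timer that is expired from the start.
        return cls(max(deadline_tsc - current_tsc, 0))

    def tick(self) -> None:
        self.remaining = max(self.remaining - 1, 0)

    @property
    def expired(self) -> bool:
        return self.remaining == 0


def run_wait(timer: DeadlineTimer,
             condition_met: Callable[[int], bool]) -> Tuple[WakeReason, int]:
    """Simulate the wait; return the exit reason and the ticks spent waiting."""
    elapsed = 0
    while True:
        if condition_met(elapsed):
            return WakeReason.CONDITION_MET, elapsed
        if timer.expired:
            return WakeReason.DEADLINE_EXPIRED, elapsed
        timer.tick()
        elapsed += 1
```

With a deadline 100 ticks ahead and a condition that never becomes true, `run_wait` returns `(WakeReason.DEADLINE_EXPIRED, 100)`; if the condition becomes true after 30 ticks, it returns `(WakeReason.CONDITION_MET, 30)`.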
As shown in Fig. 3, ROB 340 is coupled to a cache 350 which, in one embodiment, may be a low-level cache (e.g., an L1 cache), although the scope of the present invention is not limited in this regard. Execution units 320 may also be directly coupled to cache 350. As seen, cache 350 includes a monitor engine 352, which may be configured to monitor a particular cache line, namely the monitored location, and to provide feedback to wakeup logic 324 (and/or to uncore components) when the value is updated, when the cache coherency state of the line changes, and/or when the line is lost. Monitor engine 352 acquires the given line and holds it in the shared state. If the line is ever lost from the shared state, a wakeup of the processor is initiated. From cache 350, data communication may occur with higher-level caches, system memory, and so forth. While shown at this high level in the embodiment of Fig. 3, it should be understood that the scope of the present invention is not limited in this regard.
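The monitor engine's behavior (hold the monitored line in the shared state, and signal a wakeup when the line leaves that state) can be sketched as follows. The class and method names, the 64-byte line size used in the usage note, the three-state coherency model, and the decision to disarm after the first wakeup are all assumptions made for illustration; they are not taken from the embodiments.

```python
from enum import Enum
from typing import Callable, Optional


class LineState(Enum):
    """Simplified, hypothetical coherency states."""
    SHARED = "S"
    MODIFIED = "M"
    INVALID = "I"


class MonitorEngine:
    """Watches one cache line and calls `wake` when it leaves the shared state."""

    def __init__(self, line_size: int, wake: Callable[[LineState], None]) -> None:
        self.line_size = line_size
        self.wake = wake
        self.monitored_line: Optional[int] = None

    def arm(self, address: int) -> None:
        # Acquire the line containing `address`; it is modeled as held shared.
        self.monitored_line = address // self.line_size

    def observe(self, address: int, new_state: LineState) -> None:
        """Report a coherency event for the line containing `address`."""
        if self.monitored_line is None:
            return  # not armed
        if address // self.line_size != self.monitored_line:
            return  # event is for some other line
        if new_state is not LineState.SHARED:
            # Line was written elsewhere or evicted: wake once, then disarm.
            self.monitored_line = None
            self.wake(new_state)
```

For example, after `engine.arm(0x1008)` with `line_size=64`, an event at address `0x1030` falls in the same line and triggers the wakeup callback, while an event at `0x2000` is ignored.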
Referring now to Fig. 4, shown is a block diagram of a processor in accordance with an embodiment of the present invention. As shown in Fig. 4, processor 400 may be a multicore processor including a plurality of cores 410a-410n. In one embodiment, each such core may be configured as core 300 described above with regard to Fig. 3. The cores may be coupled via an interconnect 415 to an uncore 420 that includes various components. As seen, uncore 420 may include a shared cache 430, which may be a last-level cache. In addition, the uncore may include an integrated memory controller 440, various interfaces 450, and a power management unit 455. In various embodiments, at least some of the functionality associated with executing the processor wait instruction may be implemented in power management unit 455. For example, based on information received with the instruction, such as the deadline timer value, power management unit 455 may determine an appropriate low power state in which to place the core that is executing the wait instruction. In one embodiment, power management unit 455 may include a table that associates timer values with low power states. Unit 455 may look up this table based on the deadline value associated with the instruction and select the corresponding wait state. Power management unit 455 may then generate a plurality of control signals to cause various components, both of the given core and of other processor units, to enter a low power state. As seen, processor 400 may communicate with a system memory 460, e.g., via a memory bus. In addition, connection can be made through interfaces 450 to various off-chip components such as peripheral devices, mass storage, and so forth. While shown with this particular implementation in the embodiment of Fig. 4, the scope of the present invention is not limited in this regard.
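One way to picture the table lookup performed by the power management unit is the sketch below. The state names (`C1`, `C3`, `C6`) and the tick thresholds are invented purely for illustration; the text only says that a table associates timer values with low power states. The sketch assumes a longer permitted wait justifies a deeper state.

```python
from bisect import bisect_right

# Hypothetical table: (minimum wait in timer ticks, low power state),
# sorted by threshold. Deeper states are assumed to need longer waits.
STATE_TABLE = [
    (0, "C1"),
    (1_000, "C3"),
    (100_000, "C6"),
]


def select_low_power_state(timer_value: int) -> str:
    """Return the deepest state whose threshold does not exceed the wait."""
    thresholds = [threshold for threshold, _ in STATE_TABLE]
    # Index of the last entry with threshold <= timer_value.
    index = bisect_right(thresholds, timer_value) - 1
    # Values below the first threshold fall back to the shallowest state.
    return STATE_TABLE[max(index, 0)][1]
```

Under these assumed thresholds, `select_low_power_state(999)` yields `"C1"` and `select_low_power_state(1_000)` yields `"C3"`.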
In other embodiments, a processor architecture may include emulation features such that the processor can execute instructions of a first ISA, referred to as a source ISA, while the architecture itself is according to a second ISA, referred to as a target ISA. In general, software, including both the OS and application programs, conforms to the source ISA, while the hardware implements a target ISA designed specifically for a given hardware implementation with special performance and/or energy efficiency features.
Referring now to Fig. 5, shown is a block diagram of a processor in accordance with another embodiment of the present invention. As seen in Fig. 5, system 500 includes a processor 510 and a memory 520. Memory 520 includes conventional memory 522, which holds system and application software, and concealed memory 524, which holds software instrumented for the target ISA. As seen, processor 510 includes an emulation engine 530 that converts source code into target code. Emulation may be performed by either interpretation or binary translation. Interpretation is typically used for code when it is first encountered. Then, as frequently executed code regions (e.g., hotspots) are discovered through dynamic profiling, they are translated to the target ISA and stored in a code cache in concealed memory 524. Optimization is performed as part of the translation process, and code that is very heavily used may later be optimized even further. The translated blocks of code are held in code cache 524 so that they can be re-used repeatedly.
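The interpret-first, translate-when-hot policy described above can be sketched as follows. The hot threshold of three executions, the class name `EmulationEngine`, and the use of a plain dictionary as the code cache are assumptions for illustration only; the translation step itself is left to a caller-supplied function.

```python
from typing import Callable, Dict

HOT_THRESHOLD = 3  # hypothetical: executions before a block counts as hot


class EmulationEngine:
    """Interprets blocks at first, then translates hot blocks into a code cache."""

    def __init__(self, translate: Callable[[int], str]) -> None:
        self.translate = translate
        self.exec_counts: Dict[int, int] = {}   # dynamic profile per block
        self.code_cache: Dict[int, str] = {}    # translated target-ISA blocks

    def execute(self, block_address: int) -> str:
        """Run one block and report which path was taken."""
        if block_address in self.code_cache:
            return "translated"  # re-use the cached translation
        count = self.exec_counts.get(block_address, 0) + 1
        self.exec_counts[block_address] = count
        if count >= HOT_THRESHOLD:
            # Block is now hot: translate once and keep the result.
            self.code_cache[block_address] = self.translate(block_address)
        return "interpreted"
```

Executing the same block five times under this threshold takes the interpreted path three times and the translated path twice, and the translation function is invoked only once.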
Still referring to Fig. 5, processor 510, which may be one core of a multicore processor, includes a program counter 540 that provides instruction pointer addresses to an instruction cache (I-cache) 550. As seen, I-cache 550 may further receive target ISA instructions directly from concealed memory portion 524 on a miss to a given instruction address. Accordingly, I-cache 550 may store target ISA instructions, which can be provided to a decoder 560. Decoder 560 may be a decoder of the target ISA that receives incoming instructions, which may be at the macro-instruction level, and converts them to micro-instructions for execution within a processor pipeline 570. Pipeline 570 may be an out-of-order pipeline including various stages to execute and retire instructions, although the scope of the present invention is not limited in this regard. Various execution units, timers, counters, storage locations, and monitors such as those described above may be present within pipeline 570 to execute a processor wait instruction in accordance with an embodiment of the present invention. That is, even in an implementation in which processor 510 has a different micro-architecture than the micro-architecture for which the user-level processor wait instruction is provided, the instruction can still be executed on the underlying hardware.
Referring now to Fig. 6, shown is a flow diagram of the interaction between cooperative threads in accordance with one embodiment of the present invention. As shown in Fig. 6, method 600 may be used to execute multiple threads, e.g., in a multi-threaded processor. In the context of Fig. 6, two threads, Thread 1 and Thread 2, belong to a single application and may be interdependent, such that data to be used by one thread must first be updated by the second thread. Thus, as seen, Thread 1 may receive a processor wait instruction during its execution (block 610). During execution of this wait instruction, it may be determined whether a test condition has been met (diamond 620). If not, the thread may enter a low power state (block 630). Although not shown in Fig. 6, it should be understood that this state may be exited on occurrence of one of various conditions. If instead it is determined that the test condition has been met, control passes to block 640, where code execution in the first thread may continue. Note that the test condition may relate to a monitored location and indicate when an update has been successfully completed by the second thread. Accordingly, before the code shown with respect to Thread 2 has executed, the test condition is not met and the processor enters the low power state.
Still referring to Fig. 6, Thread 2 may execute code that is interdependent with the first thread (block 650). For example, the second thread may execute code to update one or more values that may be used during execution of the first thread. To ensure that the first thread executes using the updated values, the application may be written such that the first thread enters the low power state until the data has been updated by the second thread. Thus, during execution of the second thread, it may be determined whether execution of the interdependent code has completed (diamond 660). If not, execution of the interdependent code continues. If instead this interdependent code section has completed, control passes to block 670, where a predetermined value may be written to the monitored location. For example, this predetermined value may correspond to a test value associated with the processor wait instruction. In other embodiments, the predetermined value may be a value such that a masking operation involving the value in the monitored location (whether that value is the operand being masked or the mask itself) yields a non-zero result, indicating that the test condition has been met and that the first thread can continue execution. Still referring to Thread 2, after this predetermined value is written, code execution of the second thread may continue (block 680). While shown with this particular implementation in the embodiment of Fig. 6, it should be understood that the scope of the present invention is not limited in this regard.
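The Thread 1 / Thread 2 interaction of Fig. 6 can be mimicked in software as a rough analogy. The sketch below uses Python's `threading.Condition` to stand in for the monitored location and the wait state; it relies on the OS scheduler, so it does not reproduce the OS-free hardware mechanism, only the control flow. The class `MonitoredLocation`, the mask `0x1`, the payload `42`, and the 5-second timeout are hypothetical, and the test condition is assumed to be "masked value is non-zero".

```python
import threading
from typing import List, Tuple


class MonitoredLocation:
    """Software stand-in for a monitored memory location."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._cond = threading.Condition()

    def write(self, value: int) -> None:
        # Thread 2 side (block 670): store the value and wake any waiter.
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def wait(self, mask: int, timeout: float) -> bool:
        # Thread 1 side (blocks 610-630): block until (value & mask) is
        # non-zero or the timeout passes; report whether the test was met.
        with self._cond:
            return self._cond.wait_for(
                lambda: (self._value & mask) != 0, timeout)


def run_demo() -> List[Tuple[str, object]]:
    flag = MonitoredLocation()
    shared = {}
    log: List[Tuple[str, object]] = []

    def thread1() -> None:
        if flag.wait(mask=0x1, timeout=5.0):
            log.append(("thread1", shared["data"]))  # block 640
        else:
            log.append(("thread1", "timeout"))

    def thread2() -> None:
        shared["data"] = 42   # interdependent update (block 650)
        flag.write(0x1)       # publish via the monitored location

    t1 = threading.Thread(target=thread1)
    t2 = threading.Thread(target=thread2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return log
```

Because the data is written before the flag, Thread 1 only ever observes the updated data once its wait returns successfully, which is the ordering the flow diagram is meant to guarantee.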
Thus, embodiments enable a light-weight stalling mechanism that allows a processor to stall while waiting for one or more predetermined conditions to occur, without the need for OS involvement. In this way, an application need not poll a semaphore or value in a loop of test, pause, and jump operations until it becomes true, which causes the processor to waste power and, in a hyperthreaded machine, prevents other threads from using those cycles. The overhead of OS monitoring and the associated scheduling constraints (the waiting application may not be the next thread to be scheduled) can thus be avoided. Accordingly, light-weight communication can occur between cooperative threads, and the processor can flexibly select a sleep state based on user-indicated timing parameters.
Embodiments may be implemented in many different system types. Referring now to Fig. 7, shown is a block diagram of a system in accordance with an embodiment of the present invention. As shown in Fig. 7, multiprocessor system 700 is a point-to-point interconnect system and includes a first processor 770 and a second processor 780 coupled via a point-to-point interconnect 750. As shown in Fig. 7, each of processors 770 and 780 may be a multicore processor including first and second processor cores (i.e., processor cores 774a and 774b and processor cores 784a and 784b), although potentially many more cores may be present in the processors. The processor cores may execute various instructions, including a user-level processor wait instruction.
Still referring to Fig. 7, first processor 770 further includes a memory controller hub (MCH) 772 and point-to-point (P-P) interfaces 776 and 778. Similarly, second processor 780 includes an MCH 782 and P-P interfaces 786 and 788. As shown in Fig. 7, MCHs 772 and 782 couple the processors to respective memories, namely a memory 732 and a memory 734, which may be portions of main memory (e.g., dynamic random access memory (DRAM)) locally attached to the respective processors. First processor 770 and second processor 780 may be coupled to a chipset 790 via P-P interconnects 752 and 754, respectively. As shown in Fig. 7, chipset 790 includes P-P interfaces 794 and 798.
Furthermore, chipset 790 includes an interface 792 to couple chipset 790 with a high performance graphics engine 738 by a P-P interconnect 739. In turn, chipset 790 may be coupled to a first bus 716 via an interface 796. As shown in Fig. 7, various input/output (I/O) devices 714 may be coupled to first bus 716, along with a bus bridge 718 that couples first bus 716 to a second bus 720. Various devices may be coupled to second bus 720 including, for example, a keyboard/mouse 722, communication devices 726, and a data storage unit 728 such as a disk drive or other mass storage device, which may include code 730 in one embodiment. Further, an audio I/O 724 may be coupled to second bus 720.
Embodiments may be implemented in code and may be stored on a storage medium having stored thereon instructions that can be used to program a system to perform the instructions. The storage medium may include, but is not limited to: any type of disk, including floppy disks, optical disks, solid state drives (SSDs), compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) (e.g., dynamic random access memories (DRAMs) and static random access memories (SRAMs)), erasable programmable read-only memories (EPROMs), flash memories, and electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.

Claims (24)

1. A processor comprising:
a core including: decode logic to receive an instruction from a first application and to decode the instruction, the instruction specifying an identification of a location to be monitored and a timer value; and a timer coupled to the decode logic to perform a count with respect to the timer value; and
a power management unit coupled to the core to determine a type of a low power state for the processor based at least in part on the timer value and, responsive to the determination, to cause the processor to enter the low power state, without operating system (OS) involvement, if a value of the monitored location does not equal a target value and the timer value has not passed.
2. The processor of claim 1, further comprising a monitor engine coupled to a cache memory to determine whether a line of the cache memory that includes a copy of the monitored location has been updated.
3. The processor of claim 2, wherein the monitor engine communicates the updated copy and a wakeup signal to the core.
4. The processor of claim 3, wherein the core determines whether the updated copy corresponds to the target value and, if so, exits the low power state, and otherwise determines a new low power state and enters the new low power state.
5. The processor of claim 1, wherein the instruction is a user-level instruction that causes the processor to load a first value, to perform a mask operation between the first value and data stored in a destination location, and to enter the low power state if a result of the mask operation is a first result, and otherwise to load the first value into the destination location.
6. The processor of claim 5, wherein the processor sets a zero indicator of a flags register if the result equals zero.
7. The processor of claim 1, wherein the timer is set to a value corresponding to a difference between a time stamp counter value and the timer value.
8. The processor of claim 1, wherein the processor comprises a multicore processor including the core and a second core, wherein the instruction belongs to a first thread executing on the core, and a second thread updates the monitored location.
9. The processor of claim 8, wherein the core exits the low power state responsive to the update to the monitored location.
10. The processor of claim 9, wherein the core thereafter performs at least one operation of the first thread using data that the second thread updated before the second thread updated the monitored location.
11. A method comprising:
receiving, in a processor, an instruction from a first application and decoding the instruction, the instruction specifying an identification of a location to be monitored and a timer value;
responsive to the instruction, determining, in the processor, a type of a low power state for the processor based at least in part on the timer value; and
responsive to the determination, entering the low power state of the processor if a value of the monitored location does not equal a target value and the timer value has not passed.
12. The method of claim 11, wherein the instruction further specifies the target value for the monitored location.
13. The method of claim 11, further comprising exiting the low power state responsive to passing of the timer value.
14. The method of claim 11, further comprising exiting the low power state when the value of the monitored location equals the target value, including receiving a wakeup signal from a monitor engine of a cache memory of the processor, the monitor engine sending the wakeup signal when a stored value of a cache line that includes a copy of the monitored location changes.
15. The method of claim 11, further comprising: selecting, using a power management unit (PMU) of the processor, the type of the low power state from a plurality of low power states based on information in a table having a plurality of entries, each of the plurality of entries associating a low power state with a timer value; and sending at least one control signal from the PMU to a core of the processor to cause the core to enter the low power state.
16. The method of claim 11, further comprising: receiving a wakeup signal from a second processor coupled to the processor; and exiting the low power state responsive to the wakeup signal.
17. A system comprising:
a multicore processor including a first core and a second core, the first core including decode logic and a timer, the decode logic to decode a user-level instruction to cause a wait state to occur, the user-level instruction specifying a location to be monitored and a timer value, the timer coupled to the decode logic to perform a count with respect to the timer value, the multicore processor further including power management logic coupled to the first and second cores to select one of a plurality of low power states based at least in part on the timer value, without operating system (OS) involvement, and, responsive to the selection, to cause the first core to enter the selected low power state if a value of the monitored location does not equal a target value; and
a dynamic random access memory (DRAM) coupled to the multicore processor.
18. The system of claim 17, wherein the first core, responsive to the user-level instruction, performs a mask operation between a first operand and a second operand, and enters the selected low power state if a result of the mask operation is not the target value.
19. The system of claim 18, further comprising monitor logic coupled to the first core, the monitor logic to cause the first core to exit the low power state responsive to an update to the monitored location.
20. The system of claim 19, wherein the monitor logic sends a wakeup signal to the first core when a cache line associated with the monitored location has been updated or when a coherency state of the cache line has been updated.
21. An article comprising a machine-accessible storage medium including instructions that, when executed, cause a system to:
receive, in a first core of a multicore processor during execution of a first thread, a user-level processor wait instruction specifying a location to be monitored and a timer value;
determine, in the first core, whether a condition of the user-level processor wait instruction has been met and, if not, enter a low power state selected by power management logic of the multicore processor;
update a value on a second core of the multicore processor during execution of a second thread;
responsive to the value update, exit the low power state of the first core and determine whether the condition has been met; and
if so, continue execution of the first thread on the first core.
22. The article of claim 21, further comprising instructions that enable the system to cause the first core to exit the low power state responsive to an update to the monitored location and to test the condition using the updated value.
23. The article of claim 22, further comprising instructions that enable the system to: determine the update to the monitored location when a cache line associated with the monitored location has been updated or when a coherency state of the cache line has been updated; and cause the first core to exit the low power state responsive to the update to the monitored location.
24. The article of claim 21, further comprising instructions that enable the system to: select, using the power management logic, the low power state from a plurality of low power states based on information in a table having a plurality of entries, each of the plurality of entries associating a low power state with a timer value; and send at least one control signal to cause the first core to enter the low power state.
CN201010615167.0A 2009-12-18 2010-12-17 Instruction for enabling a processor wait state Expired - Fee Related CN102103484B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/641,534 US8464035B2 (en) 2009-12-18 2009-12-18 Instruction for enabling a processor wait state
US12/641534 2009-12-18

Publications (2)

Publication Number Publication Date
CN102103484A true CN102103484A (en) 2011-06-22
CN102103484B CN102103484B (en) 2015-08-19

Family

ID=44152840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010615167.0A Expired - Fee Related CN102103484B (en) 2009-12-18 2010-12-17 For enabling the instruction of processor waiting status

Country Status (8)

Country Link
US (3) US8464035B2 (en)
JP (2) JP5571784B2 (en)
KR (1) KR101410634B1 (en)
CN (1) CN102103484B (en)
DE (1) DE102010052680A1 (en)
GB (1) GB2483012B (en)
TW (1) TWI512448B (en)
WO (1) WO2011075246A2 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750225A (en) * 2013-12-31 2015-07-01 联想(北京)有限公司 Processor and processing method thereof
CN105589336A (en) * 2014-11-07 2016-05-18 三星电子株式会社 Multi-Processor Device
CN107430425A (en) * 2015-04-16 2017-12-01 英特尔公司 For adjusting the apparatus and method of processor power utilization rate based on network load
CN108369495A (en) * 2015-12-22 2018-08-03 英特尔公司 Hardware for floating-point operation eliminates monitor
CN104750225B (en) * 2013-12-31 2018-08-31 联想(北京)有限公司 The processing method and processor of processor
CN109661656A (en) * 2016-09-30 2019-04-19 英特尔公司 Method and apparatus for the intelligent storage operation using the request of condition ownership
CN110214299A (en) * 2017-01-30 2019-09-06 国际商业机器公司 Processor economize on electricity during waiting event
CN110471699A (en) * 2011-12-23 2019-11-19 英特尔公司 The instruction execution of broadcast and mask is carried out to data value under different granular levels
CN113867518A (en) * 2021-09-15 2021-12-31 珠海亿智电子科技有限公司 Processor low-power consumption blocking type time delay method, device and readable medium

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10621092B2 (en) 2008-11-24 2020-04-14 Intel Corporation Merging level cache and data cache units having indicator bits related to speculative execution
US9672019B2 (en) 2008-11-24 2017-06-06 Intel Corporation Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads
US8464035B2 (en) * 2009-12-18 2013-06-11 Intel Corporation Instruction for enabling a processor wait state
US8775153B2 (en) * 2009-12-23 2014-07-08 Intel Corporation Transitioning from source instruction set architecture (ISA) code to translated code in a partial emulation environment
US8977878B2 (en) * 2011-05-19 2015-03-10 Texas Instruments Incorporated Reducing current leakage in L1 program memory
US9207730B2 (en) * 2011-06-02 2015-12-08 Apple Inc. Multi-level thermal management in an electronic device
WO2013048468A1 (en) 2011-09-30 2013-04-04 Intel Corporation Instruction and logic to perform dynamic binary translation
US9063760B2 (en) * 2011-10-13 2015-06-23 International Business Machines Corporation Employing native routines instead of emulated routines in an application being emulated
US9829951B2 (en) 2011-12-13 2017-11-28 Intel Corporation Enhanced system sleep state support in servers using non-volatile random access memory
WO2013101165A1 (en) * 2011-12-30 2013-07-04 Intel Corporation Register error protection through binary translation
WO2013145282A1 (en) * 2012-03-30 2013-10-03 富士通株式会社 Data processing device
US20140075163A1 (en) * 2012-09-07 2014-03-13 Paul N. Loewenstein Load-monitor mwait
JP5715107B2 (en) * 2012-10-29 2015-05-07 富士通テン株式会社 Control system
DE112012007058T5 (en) * 2012-12-19 2015-08-06 Intel Corporation Vector mask-driven clock gating for power efficiency of a processor
US9164565B2 (en) 2012-12-28 2015-10-20 Intel Corporation Apparatus and method to manage energy usage of a processor
US9081577B2 (en) 2012-12-28 2015-07-14 Intel Corporation Independent control of processor core retention states
US9405551B2 (en) 2013-03-12 2016-08-02 Intel Corporation Creating an isolated execution environment in a co-designed processor
JP6175980B2 (en) * 2013-08-23 2017-08-09 富士通株式会社 CPU control method, control program, and information processing apparatus
US9507404B2 (en) * 2013-08-28 2016-11-29 Via Technologies, Inc. Single core wakeup multi-core synchronization mechanism
US9891936B2 (en) 2013-09-27 2018-02-13 Intel Corporation Method and apparatus for page-level monitoring
US9513904B2 (en) 2013-10-15 2016-12-06 Mill Computing, Inc. Computer processor employing cache memory with per-byte valid bits
CN105094747B (en) * 2014-05-07 2018-12-04 阿里巴巴集团控股有限公司 The device of central processing unit based on SMT and the data dependence for detection instruction
US10467011B2 (en) * 2014-07-21 2019-11-05 Intel Corporation Thread pause processors, methods, systems, and instructions
KR102476357B1 (en) 2015-08-06 2022-12-09 삼성전자주식회사 Clock management unit, integrated circuit and system on chip adopting the same, and clock managing method
US11023233B2 (en) 2016-02-09 2021-06-01 Intel Corporation Methods, apparatus, and instructions for user level thread suspension
US10185564B2 (en) 2016-04-28 2019-01-22 Oracle International Corporation Method for managing software threads dependent on condition variables
US11061730B2 (en) * 2016-11-18 2021-07-13 Red Hat Israel, Ltd. Efficient scheduling for hyper-threaded CPUs using memory monitoring
US10394678B2 (en) 2016-12-29 2019-08-27 Intel Corporation Wait and poll instructions for monitoring a plurality of addresses
US11086672B2 (en) * 2019-05-07 2021-08-10 International Business Machines Corporation Low latency management of processor core wait state
CN113986663A (en) * 2021-10-22 2022-01-28 上海兆芯集成电路有限公司 Electronic device and power consumption control method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1955931A (en) * 2005-09-30 2007-05-02 科威尔公司 Scheduling in a multicore architecture
CN101203831A (en) * 2005-06-23 2008-06-18 英特尔公司 Primitives to enhance line-level speculation
CN101458558A (en) * 2007-12-10 2009-06-17 英特尔公司 Transitioning a processor package to a low power state
US20090235105A1 (en) * 2008-03-11 2009-09-17 Alexander Branover Hardware Monitoring and Decision Making for Transitioning In and Out of Low-Power State

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001318742A (en) * 2000-05-08 2001-11-16 Mitsubishi Electric Corp Computer system and computer readable recording medium
US7127561B2 (en) * 2001-12-31 2006-10-24 Intel Corporation Coherency techniques for suspending execution of a thread until a specified memory access occurs
US7363474B2 (en) 2001-12-31 2008-04-22 Intel Corporation Method and apparatus for suspending execution of a thread until a specified memory access occurs
US7213093B2 (en) 2003-06-27 2007-05-01 Intel Corporation Queued locks using monitor-memory wait
JP4376692B2 (en) 2004-04-30 2009-12-02 富士通株式会社 Information processing device, processor, processor control method, information processing device control method, cache memory
GB2414573B (en) 2004-05-26 2007-08-08 Advanced Risc Mach Ltd Control of access to a shared resource in a data processing apparatus
US8607241B2 (en) * 2004-06-30 2013-12-10 Intel Corporation Compare and exchange operation using sleep-wakeup mechanism
US7810083B2 (en) * 2004-12-30 2010-10-05 Intel Corporation Mechanism to emulate user-level multithreading on an OS-sequestered sequencer
US8607235B2 (en) * 2004-12-30 2013-12-10 Intel Corporation Mechanism to schedule threads on OS-sequestered sequencers without operating system intervention
US8719819B2 (en) * 2005-06-30 2014-05-06 Intel Corporation Mechanism for instruction set based thread execution on a plurality of instruction sequencers
US8516483B2 (en) * 2005-05-13 2013-08-20 Intel Corporation Transparent support for operating system services for a sequestered sequencer
US8010969B2 (en) * 2005-06-13 2011-08-30 Intel Corporation Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers
US8028295B2 (en) * 2005-09-30 2011-09-27 Intel Corporation Apparatus, system, and method for persistent user-level thread
US7941681B2 (en) * 2007-08-17 2011-05-10 International Business Machines Corporation Proactive power management in a parallel computer
US9081687B2 (en) * 2007-12-28 2015-07-14 Intel Corporation Method and apparatus for MONITOR and MWAIT in a distributed cache architecture
DE102009001142A1 (en) * 2009-02-25 2010-08-26 Robert Bosch Gmbh Electromechanical brake booster
US8156275B2 (en) * 2009-05-13 2012-04-10 Apple Inc. Power managed lock optimization
US8464035B2 (en) * 2009-12-18 2013-06-11 Intel Corporation Instruction for enabling a processor wait state

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101203831A (en) * 2005-06-23 2008-06-18 英特尔公司 Primitives to enhance thread-level speculation
CN1955931A (en) * 2005-09-30 2007-05-02 科威尔公司 Scheduling in a multicore architecture
CN101458558A (en) * 2007-12-10 2009-06-17 英特尔公司 Transitioning a processor package to a low power state
US20090235105A1 (en) * 2008-03-11 2009-09-17 Alexander Branover Hardware Monitoring and Decision Making for Transitioning In and Out of Low-Power State

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110471699A (en) * 2011-12-23 2019-11-19 英特尔公司 Instruction execution that broadcasts and masks data values at different levels of granularity
CN110471699B (en) * 2011-12-23 2023-07-28 英特尔公司 Processor core, method and system for instruction processing
US11709961B2 (en) 2011-12-23 2023-07-25 Intel Corporation Instruction execution that broadcasts and masks data values at different levels of granularity
CN104750225A (en) * 2013-12-31 2015-07-01 联想(北京)有限公司 Processor and processing method thereof
CN104750225B (en) * 2013-12-31 2018-08-31 联想(北京)有限公司 Processing method for a processor, and processor
CN105589336B (en) * 2014-11-07 2021-01-01 三星电子株式会社 Multi-processor device
CN105589336A (en) * 2014-11-07 2016-05-18 三星电子株式会社 Multi-Processor Device
CN107430425B (en) * 2015-04-16 2022-09-23 英特尔公司 Apparatus and method for adjusting processor power usage based on network load
CN107430425A (en) * 2015-04-16 2017-12-01 英特尔公司 Apparatus and method for adjusting processor power usage based on network load
CN108369495A (en) * 2015-12-22 2018-08-03 英特尔公司 Hardware cancellation monitor for floating-point operations
CN109661656A (en) * 2016-09-30 2019-04-19 英特尔公司 Method and apparatus for smart store operations with conditional ownership requests
US11550721B2 (en) 2016-09-30 2023-01-10 Intel Corporation Method and apparatus for smart store operations with conditional ownership requests
CN109661656B (en) * 2016-09-30 2023-10-03 英特尔公司 Method and apparatus for intelligent storage operation with conditional ownership request
CN110214299A (en) * 2017-01-30 2019-09-06 国际商业机器公司 Processor power saving during a wait event
CN110214299B (en) * 2017-01-30 2023-07-14 国际商业机器公司 Processor power saving during a wait event
CN113867518A (en) * 2021-09-15 2021-12-31 珠海亿智电子科技有限公司 Processor low-power consumption blocking type time delay method, device and readable medium

Also Published As

Publication number Publication date
KR20120110120A (en) 2012-10-09
WO2011075246A3 (en) 2011-08-18
JP5571784B2 (en) 2014-08-13
JP5795820B2 (en) 2015-10-14
GB2483012B (en) 2017-10-18
US8990597B2 (en) 2015-03-24
US20130185580A1 (en) 2013-07-18
US9032232B2 (en) 2015-05-12
TW201131349A (en) 2011-09-16
DE102010052680A1 (en) 2011-07-07
JP2014222520A (en) 2014-11-27
TWI512448B (en) 2015-12-11
US20130246824A1 (en) 2013-09-19
CN102103484B (en) 2015-08-19
WO2011075246A2 (en) 2011-06-23
GB201119728D0 (en) 2011-12-28
GB2483012A (en) 2012-02-22
KR101410634B1 (en) 2014-06-20
US20110154079A1 (en) 2011-06-23
US8464035B2 (en) 2013-06-11
JP2012531681A (en) 2012-12-10

Similar Documents

Publication Publication Date Title
CN102103484B (en) Instruction for enabling a processor wait state
US20210357214A1 (en) Methods, apparatus, and instructions for user-level thread suspension
CN101105711B (en) System and method for distributing processing function between main processor and assistant processor
CN101454753A (en) Handling address translations and exceptions for heterogeneous resources
EP3588288B1 (en) A multithreaded processor core with hardware-assisted task scheduling
TW201220183A (en) Apparatus, method, and system for providing a decision mechanism for conditional commits in an atomic region
CN103562870A (en) Automatic load balancing for heterogeneous cores
CN102103525A (en) Controlling time stamp counter (TSC) offsets for multiple cores and threads
CN106293894B (en) Hardware device and method for performing transactional power management
EP3716046B1 (en) Technology for providing memory atomicity with low overhead
US11048516B2 (en) Systems, methods, and apparatuses for last branch record support compatible with binary translation and speculative execution using an architectural bit array and a write bit array
US20110173420A1 (en) Processor resume unit
US20110173422A1 (en) Pause processor hardware thread until pin
CN116302868A (en) System, method and apparatus for high-level microarchitectural event performance monitoring using fixed counters
Radaideh et al. Exploiting zero data to reduce register file and execution unit dynamic power consumption in GPGPUs
US11880231B2 (en) Accurate timestamp or derived counter value generation on a complex CPU
CN101615115B (en) Device, method and system for instruction retire
US20240103914A1 (en) Dynamically adjusting thread affinitization using hardware-based core availability notifications
CN103235716B (en) Device for detecting pipeline data dependencies
US20130159740A1 (en) Electronic device and method for energy efficient status determination

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150819

Termination date: 20191217