CN1973261A - Method and apparatus for speculative execution of uncontended lock instructions

Info

Publication number: CN1973261A
Application number: CNA200580021048XA
Authority: CN
Inventors: B. Saha; M. C. Merten; P. Hammarlund
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 2004-06-30
Filing date: 2005-06-17
Publication date: 2007-05-30
Anticipated expiration: 2025-06-17
Also published as: JP2008504603A; WO2006012103A2; US7529914B2; CN101847091A; JP2011175669A; CN101847091B; JP6113705B2; WO2006012103A3; JP2015072717A; CN100593154C; US20060004998A1; DE112005001515T5

Abstract

A method and apparatus for executing lock instructions speculatively in an out-of-order processor are disclosed. In one embodiment, a prediction is made whether a given lock instruction will actually be contended. If not, then the lock instruction may be treated as having a normal load micro-operation which may be speculatively executed. Monitor logic may look for indications that the lock instruction is actually contended. If no such indications are found, the speculative load micro-operation and other micro-operations corresponding to the lock instruction may retire. However, if such indications are in fact found, the lock instruction may be restarted, and the prediction mechanism may be updated.

Description

Method and apparatus for speculative execution of uncontended lock instructions

Technical field

The present invention relates generally to microprocessors that employ memory lock instructions (atomic read-modify-write operations on memory), and more specifically to microprocessors that employ memory lock instructions which may be executed in an out-of-order execution architecture.

Background

Modern microprocessors may support out-of-order execution in their architectures. Individual instructions may each be decoded into a set of corresponding micro-operations, which are then stored in a re-order buffer prior to execution. A scheduler may determine which micro-operations are ready to execute and may issue them other than in strict program order, that is, "out of order". When micro-operations are ready for retirement, they may be retired in program order, and therefore appear to have been executed in program order.

One family of instructions that has posed a problem in out-of-order processors is the lock instruction family. A lock instruction generally asserts a signal or employs a procedure that performs an atomic memory transaction; that is, it locks a particular location in memory to prevent other processors, or other threads on the same processor, from accessing the memory location (or equivalent cache line) used during its constituent load and store micro-operations. In differing embodiments, the signal may include a bus signal or a cache-coherency protocol lock. Particular implementations of these lock instructions have required that all prior instructions (in program order) retire before the lock instruction begins to execute. The load and store micro-operations of the lock instruction are generally delayed so that they execute and retire as close together as possible, which limits the time during which the processor must protect the memory address or cache line used by the lock instruction. However, this prevents the load micro-operation and any intervening micro-operations from executing speculatively, and therefore adds their latency to the critical path of the program. Particular implementations may also prevent subsequent load operations, or other subsequent operations, from executing speculatively, which increases their latency as well. In practice, this may mean that the re-order buffer used to support out-of-order processing fills up and stalls the pipeline, degrading application performance further.

Description of drawings

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements and in which:

Fig. 1 is a schematic diagram of a processor and its execution pipeline, showing a lock contention predictor operating near the end of the pipeline, according to one embodiment of the invention;

Fig. 2 is a schematic diagram of a processor and its execution pipeline, showing a lock contention predictor operating near the beginning of the pipeline, according to one embodiment of the invention;

Fig. 3 is a schematic diagram of a processor and its execution pipeline, showing a lock contention predictor operating near the end of the pipeline, according to one embodiment of the invention;

Fig. 4 is a state diagram of the execution of a lock instruction, according to one embodiment of the invention; and

Fig. 5A and Fig. 5B are schematic diagrams of systems, according to two embodiments of the invention, that include processors supporting a lock contention predictor for speculative execution of lock instructions.

Detailed description

The following description describes techniques for permitting out-of-order execution of lock instructions, which is beneficial when those lock instructions are uncontended. A lock instruction may be considered contended when more than one processor, or more than one thread in the same processor, tries to lock the same memory location at essentially the same time. A lock instruction may also be treated as contended when another processor, or another thread in the same processor, tries to access a memory location that is locked by another processor or thread. This is because it may not be possible to determine whether the other processor's (or other thread's) memory access is a lock attempt or an ordinary memory access. In the following description, numerous specific details, such as logic implementations, software module allocation, bus and other interface signaling techniques, and details of operation, are set forth to provide a more thorough understanding of the present invention. One skilled in the art will appreciate, however, that the invention may be practiced without such specific details. In other instances, control structures, gate-level circuits, and full software instruction sequences have not been shown in detail so as not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation. In certain embodiments, the invention is disclosed in the form of predicting qualifying predicate values for a Pentium®-compatible processor, such as those produced by Intel® Corporation. However, the invention may be practiced in other kinds of processors, such as an Itanium® Processor Family compatible processor or an X-Scale® family compatible processor.

Referring now to Fig. 1, a schematic diagram of a processor 100 and its execution pipeline is shown, with a lock contention predictor operating near the end of the pipeline, according to one embodiment of the invention. The embodiment of Fig. 1 shows a front-end stage 102, a decode stage 104, a trace cache 106, a re-order buffer (ROB) 108, an execution stage 112, and a retirement stage 114. In other embodiments, other stages may be used in the pipeline, and the order of the stages may vary.

Macro-instructions may be fetched from the level one (L1) cache 124 by front-end stage 102 and decoded into a corresponding set of micro-operations by decode stage 104. These sets of micro-operations may be stored in the form of traces in trace cache 106. In other embodiments, the traces may be stored in another form of buffer. In yet further embodiments, the sets of micro-operations may be stored in other forms of buffers and not in the form of traces. When a set of micro-operations is prepared for execution, it may be loaded into ROB 108. ROB 108 may include a series of storage locations 150 through 166, each of which may contain an identification of a micro-operation, its source and destination registers, and its execution results when available. In other embodiments, a different number of storage locations may be provided, and the exact format of their contents may differ.

A scheduler 110 may be used to determine which of the micro-operations in storage locations 150-166 have their source operand values available, and are thus permitted to execute. In one embodiment, scheduler 110 may examine the status of the source registers of each micro-operation in storage locations 150-166. Scheduler 110 may then issue to execution stage 112 those micro-operations whose source registers contain valid data, regardless of their order in the written software (that is, potentially "out of order"). Any results of executing these micro-operations may then be held temporarily as execution results in the corresponding storage location.

Each of storage locations 150 through 166 may have an associated "complete" bit 130 through 146, which may indicate that the corresponding micro-operation has finished executing and that its result is held temporarily as an execution result in the corresponding storage location 150-166. In one embodiment, the complete bits 130-146 may indicate that the corresponding micro-operation is ready for retirement once the micro-operations corresponding to prior instructions (in program order) have retired. (Micro-operations generated from macro-instructions must still retire in original program order.) Micro-operations ready for retirement may be sent to retirement stage 114. Micro-operations that invoke memory references may also be placed in a memory order buffer (MOB) 122. MOB 122 may store several pending memory reference operations.
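The interplay of out-of-order issue and in-order retirement described above can be illustrated with a small software model. This is only a sketch under simplifying assumptions: the names `MicroOp`, `issue_ready`, and `retire_in_order` are hypothetical, execution is treated as instantaneous, and the register names are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    name: str
    sources: tuple          # registers this micro-operation reads
    dest: str               # register this micro-operation writes
    complete: bool = False  # models the "complete" bit of a ROB entry


def issue_ready(rob, ready_regs):
    """Issue every incomplete micro-op whose source registers all hold valid data."""
    issued = []
    for uop in rob:
        if not uop.complete and all(s in ready_regs for s in uop.sources):
            uop.complete = True
            issued.append(uop.name)
    # Results become visible to dependents only after this issue pass.
    for uop in rob:
        if uop.complete:
            ready_regs.add(uop.dest)
    return issued


def retire_in_order(rob):
    """Retire completed micro-ops from the head of the ROB only (program order)."""
    retired = []
    while rob and rob[0].complete:
        retired.append(rob.pop(0).name)
    return retired
```

In this model a younger micro-operation may complete before an older one, but it cannot leave the ROB until every older entry has retired.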

The processor of Fig. 1 may be capable of executing lock instructions. One form of lock instruction may prevent other processors, or other threads in a multi-threaded processor, from accessing a given memory location or cache line while the processor performs an operation on the locked memory location. In effect, while the instruction executes, that particular memory location or cache line is locked against access by other processors or threads. Viewed another way, this form of locking permits the instruction to atomically modify the particular memory location or cache line (such instructions are often called atomic read-modify-write instructions). In contrast, these lock instructions may also be used as software semaphores to semantically lock other memory locations over extended numbers of instructions; such extended runs of instructions are often referred to in the art as a critical section. In one embodiment, the lock instruction may be implemented as a lock prefix appended to an ordinary instruction. In the Pentium®-compatible architecture, the lock prefix may be prepended to the following kinds of instructions when the destination operand is a memory operand: ADD (add), ADC (add with carry), AND (logical and), BTC (bit test and complement), BTR (bit test and reset), BTS (bit test and set), CMPXCHG (compare and exchange), CMPXCHG8B (compare and exchange 8 bytes), DEC (decrement), INC (increment), NEG (two's complement negation), NOT (one's complement negation), OR (logical or), SBB (integer subtraction with borrow), SUB (subtract), XOR (exclusive or), XADD (exchange and add), and XCHG (exchange memory with register). When it is imperative that no other processor or thread change the value of the destination memory location between the read, modify, and write functional parts specified by these instructions, the lock prefix may be used to make these parts atomic (appearing as a single part).
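The atomic read-modify-write behavior that the lock prefix provides can be modeled in software. The following is an illustrative sketch only, not a description of the hardware: `LockedMemory`, `locked_xadd`, and `run_counters` are hypothetical names, and a `threading.Lock` stands in for the bus signal or cache-coherency lock.

```python
import threading


class LockedMemory:
    """Toy memory whose locked_xadd models a LOCK-prefixed XADD."""

    def __init__(self):
        self._cells = {}
        self._bus_lock = threading.Lock()  # stand-in for the bus or cache lock

    def locked_xadd(self, addr, value):
        # The read, the modify, and the write all happen while the lock is
        # held, so no other thread can change the location in between.
        with self._bus_lock:
            old = self._cells.get(addr, 0)
            self._cells[addr] = old + value
            return old

    def read(self, addr):
        with self._bus_lock:
            return self._cells.get(addr, 0)


def run_counters(threads=4, increments=1000):
    """Several threads increment one shared location; no update is lost."""
    mem = LockedMemory()

    def worker():
        for _ in range(increments):
            mem.locked_xadd(0x40, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return mem.read(0x40)
```

Because every increment is applied as a single indivisible step, the final count equals the number of threads multiplied by the number of increments per thread.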

In one embodiment, a lock instruction may be decoded into several micro-operations, including a "load_with_store_intent_lock" micro-operation and a "store_unlock" micro-operation. Other micro-operations may be present for the various instructions mentioned in the preceding paragraph. For ease of discussion, the "load_with_store_intent_lock" micro-operation is referred to here as a "load_with_lock" micro-operation and written load_lock. The load_lock micro-operation would initiate the lock condition when it entered execution unit 112. The store_unlock micro-operation would remove the lock condition when it issued from MOB 122.

Previous implementations would not issue the load_lock micro-operation until two conditions were satisfied. The first condition was that all prior instructions in original program order must have executed and retired; in other words, the load_lock micro-operation should be the oldest micro-operation in ROB 108. The second condition was that any previously pending store micro-operations in MOB 122 must have completed and that the store buffers associated with MOB 122 must have drained (in other words, all the store operations must have written their data to the memory system). These two conditions are not compatible with out-of-order execution.
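The two issue conditions of the previous implementations can be written as a simple predicate. This is a minimal sketch; the function name and its list-based arguments are hypothetical simplifications of the ROB and MOB state.

```python
def may_issue_load_lock_traditionally(rob, pending_stores, store_buffer_entries):
    """Return True only when both conditions of the prior implementation hold.

    rob: micro-op names in program order (index 0 is the oldest).
    pending_stores: store micro-ops still waiting in the MOB.
    store_buffer_entries: stores not yet written to the memory system.
    """
    # Condition 1: load_lock is the oldest micro-operation in the ROB.
    is_oldest = bool(rob) and rob[0] == "load_lock"
    # Condition 2: all earlier stores completed and the store buffers drained.
    stores_drained = not pending_stores and not store_buffer_entries
    return is_oldest and stores_drained
```

Both conditions force the load to wait for everything older than it, which is why the text calls them incompatible with out-of-order execution.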

It has been observed that lock instructions are frequently not strictly necessary. In many cases, the memory location or cache line remains uncontended for the duration of the lock: no other processor or thread attempts to access the particular memory location or cache line under lock, and no other processor event threatens the integrity of that memory location. Therefore, processor 100 further includes a lock contention predictor 118 and monitor logic 116. Lock contention predictor 118 may predict whether a particular lock instruction will be contended during the lock period. If the prediction is that the particular lock instruction will in fact be contended, the previous execution method for lock instructions may be followed.

However, if the prediction is that the particular lock instruction will not in fact be contended, a normal load micro-operation may be issued speculatively, and monitor logic 116 may monitor the memory location of concern to determine whether any contention indications arise. Thus, while the read-modify-write parts of the instruction execute, the memory location is not actually locked to enforce atomicity; instead, the parts execute separately while the monitor logic watches for any condition indicating that another processor or thread may have broken atomicity. Contention indications may include a snoop to the cache line that contains the target address of the load instruction, an interrupt, or a subsequent store_unlock micro-operation that misses in a cache. In some embodiments, monitor logic 116 may monitor several logic signals already present within the processor. If no contention indication arises during the period corresponding to the equivalent locked condition, the speculatively issued normal load micro-operation may retire normally. This permits out-of-order execution of the lock instruction and improves processor performance. If a contention indication does arise, however, the pipeline may have to be flushed and the lock instruction re-executed. During this re-execution, the lock instruction may be executed non-speculatively, as in a conventional implementation, to aid forward progress. In another embodiment, the processor may try to execute the lock instruction speculatively several times, detecting contention each time, before executing it non-speculatively. When the processor executes the instruction non-speculatively (as in a conventional implementation), it may assert a signal or employ some procedure that prevents any other thread (or processor) from accessing the memory location of concern. This ensures that the processor can complete the execution and retire the lock instruction without any subsequent restart. If the processor did not revert to non-speculative execution after a fixed number of speculative attempts, it could encounter a contention indication on every execution and be forced to restart the lock instruction repeatedly, which would inhibit forward progress.
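The bounded-retry policy described above (speculate a fixed number of times, then fall back to the conventional locked path) can be sketched as follows. The function name, the callback, and the default of two attempts are assumptions made for illustration; the text does not specify a particular count.

```python
def execute_lock_instruction(contended_on_attempt, max_speculative_attempts=2):
    """Model how one lock instruction finally retires.

    contended_on_attempt: callable(attempt_index) -> bool, a stand-in for the
    monitor logic reporting a contention indication during that attempt.
    Returns (mode, flushes): how the instruction retired and how many
    pipeline flushes were needed along the way.
    """
    flushes = 0
    for attempt in range(max_speculative_attempts):
        if not contended_on_attempt(attempt):
            return ("speculative", flushes)  # load retired normally
        flushes += 1                          # flush pipeline and restart
    # Fall back to the conventional path: the lock is actually asserted,
    # so no further restart can occur and forward progress is guaranteed.
    return ("non_speculative", flushes)
```

The fallback bounds the number of flushes, which is the forward-progress guarantee the paragraph describes.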

Lock contention predictor 118 may use the circuitry and theory of operation of any of the numerous well-known branch predictors, including local predictors and global predictors. In one embodiment, lock contention predictor 118 may be a table that stores the linear instruction pointers of those lock instructions that have been found to be contended in the past. In one embodiment, the table may be empty upon processor initialization, so that all lock instructions are presumed to be uncontended. When a prediction for a given lock instruction is found to be false, the linear instruction pointer of that lock instruction may be written into the table for future use.
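The table-based embodiment of the predictor can be sketched in a few lines. This is an illustrative model only: the class and method names are hypothetical, and an unbounded Python set stands in for what would be a finite hardware table.

```python
class LockContentionPredictor:
    """Table of linear instruction pointers (LIPs) of lock instructions
    that were previously found to be contended."""

    def __init__(self):
        # Empty at initialization: every lock instruction is presumed
        # to be uncontended until proven otherwise.
        self._contended_lips = set()

    def predicts_contended(self, lip):
        return lip in self._contended_lips

    def record_misprediction(self, lip):
        # Called when an "uncontended" prediction turned out to be false.
        self._contended_lips.add(lip)
```

A lookup by instruction pointer then decides whether the lock instruction is issued speculatively or handled in the conventional way.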

When lock contention predictor 118 predicts that a given lock instruction will not be contended, scheduler 110 may issue the corresponding load_lock micro-operation speculatively from ROB 108. In one embodiment, the corresponding load_lock micro-operation may be issued from ROB 108 as a speculative load_without_lock micro-operation. Either kind of load micro-operation may then generate a request for ownership of the corresponding cache line, which in some embodiments may cause the cache line to transition to the exclusive "E" state (in caches that use the modified/exclusive/shared/invalid "MESI" cache coherency protocol). If the load micro-operation misses in the lowest-level cache, a fill buffer may be allocated and the load micro-operation may "sleep" as a pending operation in MOB 122.

If the load_lock micro-operation hits in the cache, or when a sleeping load_lock micro-operation is woken in MOB 122 by the corresponding cache line fill, the following may occur. In some embodiments, it may be necessary to prevent the cache line that contains the lock variable from being replaced between the execution of the load_lock and the retirement of the store_unlock. In one embodiment, a bit may be set in the tag of the cache line to prevent replacement while still permitting the snoops required by the memory ordering protocol. However, a set of these speculative load_lock micro-operations could execute before an older load and consume all the ways in a cache set. This would leave no way in the set into which the older load could fill its data from the next-level cache, so the older load could not complete and could not retire. The speculative load_lock micro-operations could not retire either, because they are not the oldest operations, and the result would be deadlock. To prevent this situation, a speculative load_lock may issue only if enough unlocked ways remain in the set, so that at least some ways are left available for older instructions. If there are not enough unlocked ways, the load_lock issues only after all prior instructions have retired, as in the conventional implementation. In one embodiment, at least two ways must be available before a speculative load_lock may issue.
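The deadlock-avoidance rule for cache ways can be modeled with a counter per cache set. This sketch assumes a hypothetical `CacheSet` class; the threshold of two free ways follows the embodiment mentioned above, while the total number of ways is an arbitrary illustrative parameter.

```python
class CacheSet:
    """Tracks how many ways of one cache set are pinned by speculative load_locks."""

    def __init__(self, total_ways=8, min_free_ways=2):
        self.total_ways = total_ways
        self.min_free_ways = min_free_ways
        self.locked_ways = 0

    def try_issue_speculative_load_lock(self):
        # Issue only if enough unlocked ways exist, so that an older load
        # can still fill its line from the next-level cache.
        if self.total_ways - self.locked_ways >= self.min_free_ways:
            self.locked_ways += 1   # the replacement-lock bit is set in the tag
            return True
        return False                # wait until all prior instructions retire

    def retire_store_unlock(self):
        # Retiring the store_unlock releases the pinned way.
        if self.locked_ways > 0:
            self.locked_ways -= 1
```

With four ways and a threshold of two, three speculative load_lock micro-operations may pin ways, and the fourth is held back until one store_unlock retires.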

Whether the load_lock micro-operation or the load_without_lock micro-operation is issued from ROB 108, the corresponding store_unlock micro-operation and any intervening micro-operations may issue from ROB 108 either before or after the load micro-operation. However, the store_unlock micro-operation may remain pending in MOB 122 until the load micro-operation and any intervening micro-operations are at the point of retirement, at which point MOB 122 may issue the store_unlock micro-operation.

If monitor logic 116 determines that a contention indication has occurred, the load micro-operation and the corresponding store_unlock micro-operation are not permitted to retire. This means that the prediction that the lock instruction would not be contended was false. The period during which contention indications are considered may vary in differing embodiments. In one embodiment, the period may end when the memory store (corresponding to the store_unlock) becomes globally visible. Here, "globally visible" means that all agents in the cache coherency domain will see the latest value of the memory location. In another embodiment, the period may end when the store_unlock becomes the oldest store in MOB 122. In this second embodiment, an actual lock condition may be needed for the brief interval between the time the store_unlock becomes the oldest store in MOB 122 and the time the store_unlock becomes globally visible.

In previous implementations, the store_unlock micro-operation would be the oldest non-retired micro-operation in ROB 108 when the memory store became globally visible. In one embodiment, however, the store_unlock micro-operation would not be the oldest non-retired micro-operation in ROB 108 when the memory store becomes globally visible, because the load micro-operation (with or without lock) would not retire until the memory store becomes globally visible. The load would therefore be the oldest non-retired micro-operation in the machine.

In another embodiment, lock contention predictor 118 may be omitted. Instead, it may be presumed that lock instructions will be uncontended in all cases, and the corresponding load micro-operations may in each case initially execute speculatively. Where a lock instruction turns out to be contended, monitor logic 116 may detect the contention indication and restart the execution pipeline. Only those lock instructions that have caused a contention indication are then re-executed non-speculatively.

In another embodiment, monitor logic 116 may be omitted. In this embodiment, the cache system may include logic to reject snoops directed to the address targeted by the lock instruction. This may preserve the integrity of the contents at the address of concern without invoking a formal lock. The agent that generated the snoop may treat the rejection as an indication to retry the snoop a short time later.

Referring now to Fig. 2, a schematic diagram of a processor and its execution pipeline is shown, with a lock contention predictor operating near the beginning of the pipeline, according to one embodiment. Many of the circuits shown in Fig. 2 are similar to those of Fig. 1, but lock contention predictor 218 may instead be used to modify the operation of decode stage 204. Rather than always decoding a lock instruction into a load_lock micro-operation and a store_unlock micro-operation, decode stage 204 may decode the lock instruction into micro-operations that include a regular load micro-operation and a store_unlock micro-operation when lock contention predictor 218 determines that the lock instruction will not be contended. In some embodiments, the regular load micro-operation may appear as a load_lock micro-operation with an appended hint or other status bit. These micro-operations may then be used to build a trace in trace cache 206. In other embodiments, the micro-operations may be held temporarily in another form of buffer.

Monitor logic 216 may perform a function similar to that of monitor logic 116 of Fig. 1. Again, if monitor logic 216 determines that a contention indication has occurred, the load micro-operation and the corresponding store_unlock micro-operation are not permitted to retire. This means that the prediction that the lock instruction would not be contended was false. The period during which contention indications are considered may vary in differing embodiments. In one embodiment, the period may end when the memory store (corresponding to the store_unlock) becomes globally visible. In another embodiment, the period may end when the store_unlock becomes the oldest store in MOB 122.

If a contention indication is determined, the recovery process may differ from that described above in connection with Fig. 1. On restart, the lock instruction cannot be re-issued from trace cache 206, because trace cache 206 may contain a trace with a load_without_lock micro-operation. The lock instruction must be decoded again in decode stage 204, this time into micro-operations that include a load_lock micro-operation and a corresponding store_unlock micro-operation. These micro-operations may require a new trace to be built in trace cache 206.
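The decode-time selection of Fig. 2 and its recovery path can be sketched as follows. This is a simplified model under stated assumptions: the function names, the dictionary used as a trace cache, the set used as the predictor, and the placeholder `alu_op` micro-operation are all hypothetical, and the predictor update on recovery is assumed from the general description of the prediction mechanism.

```python
def decode_lock_instruction(lip, contended_lips, middle_uops=("alu_op",)):
    """Return the micro-op sequence the decode stage would emit for a lock
    instruction at linear instruction pointer `lip`."""
    load = "load_lock" if lip in contended_lips else "load_without_lock"
    return [load, *middle_uops, "store_unlock"]


def redecode_after_contention(lip, contended_lips, trace_cache):
    """On a contention indication, the stale trace cannot be re-issued:
    drop it, record the misprediction, and decode the instruction again."""
    trace_cache.pop(lip, None)          # trace held load_without_lock
    contended_lips.add(lip)             # assumed predictor update
    trace_cache[lip] = decode_lock_instruction(lip, contended_lips)
    return trace_cache[lip]
```

After recovery, the rebuilt trace begins with load_lock, so the re-executed instruction takes the conventional locked path.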

Referring now to Fig. 3, a schematic diagram of a processor and its execution pipeline is shown, with a lock contention predictor operating near the end of the pipeline, according to one embodiment. The embodiment of Fig. 3 includes a modified MOB, designated replay queue 322, to support replay operations in a Pentium® 4 compatible processor. Replay operations may repair incorrect data speculation by re-executing speculatively issued micro-operations until the data speculation becomes correct. In one embodiment, the load_lock and store_unlock micro-operations may be replayed if monitor logic 316 indicates contention, without flushing the pipeline or restarting the lock instruction.

In another embodiment, a checkpoint repair may be performed using checkpoint recovery logic 370. In one embodiment, checkpoint recovery logic 370 may store a snapshot of the processor state when all micro-operations prior to the load_lock micro-operation have retired. After the checkpoint is taken, all constituent micro-operations of the speculatively executed lock instruction and, in some embodiments, any subsequent instructions in the program may retire in order as they complete. If monitor logic 316 indicates contention before the memory store (corresponding to the store_unlock) becomes globally visible, indicating that the processor pipeline must be flushed, the processor state at the retirement of the micro-operation immediately preceding the load_lock is restored from checkpoint recovery logic 370. The load_lock, the store_unlock, and any other constituent micro-operations of the lock instruction may then be re-executed. During this re-execution, the constituent micro-operations may be treated as in a conventional implementation and executed non-speculatively. In other embodiments, checkpoint recovery logic 370 may be used in other processors, such as those shown in Fig. 1 and Fig. 2.
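The checkpoint repair described above can be illustrated with a toy model of a speculative exchange-and-add. Everything here is an assumption made for illustration: the function name, the dictionary-based register file and memory, the register name `eax`, and the `intruder_value` parameter that simulates another agent writing the location between the speculative load and the store.

```python
import copy


def xadd_with_checkpoint(regs, mem, addr, delta, intruder_value=None):
    """Model a speculative XADD guarded by a register-state checkpoint.

    intruder_value: if not None, another agent writes this value to `addr`
    between the speculative load and the store (a contention indication).
    Returns (regs, mode).
    """
    checkpoint = copy.deepcopy(regs)      # snapshot once older micro-ops retire
    regs["eax"] = mem[addr]               # speculative load without lock
    if intruder_value is not None:
        mem[addr] = intruder_value        # atomicity would be broken here
        regs = copy.deepcopy(checkpoint)  # flush and restore processor state
        regs["eax"] = mem[addr]           # re-execute non-speculatively
        mem[addr] = regs["eax"] + delta
        return regs, "non_speculative"
    mem[addr] = regs["eax"] + delta       # store becomes globally visible
    return regs, "speculative"
```

In the contended case, the stale value read by the speculative load is discarded along with the rest of the speculative state, and the re-executed instruction operates on the intruder's value.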

In one embodiment, when lock contention predictor 318 determines that a lock instruction will not be contended, a load_lock micro-operation or a load_without_lock micro-operation may be issued speculatively from ROB 308. Whichever of the two is issued from ROB 308, the corresponding store_unlock micro-operation and any intervening micro-operations may issue from ROB 308 either before or after the load micro-operation. When incorrect data speculation creates a bad address, one or more data checker logic 368 may send an invalid address signal 372 to replay queue 322. Invalid address signal 372 may be used together with monitor logic 316 to determine the handling of the speculative load micro-operation and the corresponding store_unlock micro-operation.

When invalid address signal 372 is false and monitor logic 316 does not detect a contention indication, the load micro-operation and the store_unlock micro-operation may retire normally. When invalid address signal 372 is false and monitor logic 316 does detect a contention indication, the pipeline may be flushed and the lock instruction restarted. In another embodiment, the load micro-operation may instead be replayed if monitor logic 316 detects a contention indication. When invalid address signal 372 is true, however, the state of monitor logic 316 is irrelevant, because any contention indication may be associated with the incorrect address. Thus, when invalid address signal 372 is true, a replay occurs, and any contention indication from monitor logic 316 does not update lock contention predictor 318.
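The combinations of invalid address signal and contention indication described above amount to a small decision table, sketched below. The function name and return values are hypothetical. Recording a misprediction whenever contention is seen with a valid address is an assumption: the text states only that the predictor is not updated when the address is invalid.

```python
def resolve_speculative_lock(invalid_address, contention_indicated,
                             replay_on_contention=False):
    """Return (action, record_misprediction) for a speculative lock instruction.

    replay_on_contention selects the alternative embodiment in which a
    contended load is replayed instead of flushing the pipeline.
    """
    if invalid_address:
        # Any contention indication may belong to the wrong address, so it
        # is ignored: replay, and leave the predictor untouched.
        return ("replay", False)
    if contention_indicated:
        action = "replay" if replay_on_contention else "flush_and_restart"
        return (action, True)   # assumed: the misprediction is recorded
    return ("retire", False)
```

Checking the invalid address signal first captures the rule that the monitor logic state is irrelevant whenever the address itself was mis-speculated.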

Referring now to Fig. 4, a state diagram of the execution of a lock instruction is shown, according to one embodiment of the invention. In block 410, a prediction is made as to whether the lock instruction will be contended. If the prediction is that it will be contended, the process exits along contended path 414 and enters block 460. In block 460, the load_lock micro-operation is issued to the memory system only at retirement, and only after all pending store buffers have drained. Then, in block 470, the other micro-operations of the lock instruction execute normally, in a non-speculative manner.

If the prediction made in block 410 is that the lock instruction will not be contended, the process exits along uncontended path 412, and the load_lock micro-operation may be issued speculatively for execution. In some embodiments, this is a load_lock micro-operation carrying a hint that the lock operation is uncontended; in other embodiments, the load_lock micro-operation is transformed into a new micro-operation such as a load_with_uncontended_lock micro-operation or a load_without_lock micro-operation. Then, in block 430, when the load_lock micro-operation is at retirement, the store_unlock micro-operation may be issued to memory. The store_unlock micro-operation then prepares to retire. In one embodiment, the store_unlock micro-operation is ready to retire when the memory store becomes globally visible, which also permits the load_lock micro-operation to retire. In another embodiment, the store_unlock micro-operation is ready to retire when the memory store becomes the oldest pending store micro-operation in the memory order buffer, which likewise permits the load_lock micro-operation to retire.

If the store_unlock micro-operation is ready to retire (in one embodiment, when it becomes globally visible) and no contention indication has arisen, the process exits along path 432; in block 440 the load_lock micro-operation retires and the prediction logic is updated with a true prediction result. If, however, a contention indication arises before the store_unlock micro-operation is ready to retire, the process exits along path 434; in block 450 the lock instruction is restarted and the prediction logic is updated with a false prediction result. During this re-execution, the lock instruction may be executed non-speculatively, as in a conventional implementation, to aid forward progress.
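The flow through the Fig. 4 state diagram can be traced with a short function that returns the blocks visited. This is a sketch based only on the block numbers named in the text; the function name and the string labels for the predictor update are hypothetical, and block 420 is taken to be the speculative issue of the load.

```python
def run_fig4(predicted_contended, contention_before_store_ready):
    """Return (blocks_visited, predictor_update) for one lock instruction."""
    path = [410]                         # make the contention prediction
    if predicted_contended:
        path += [460, 470]               # conventional, non-speculative path
        return path, None                # no predictor update is described
    path.append(420)                     # speculative issue of the load
    path.append(430)                     # store_unlock issued to memory
    if contention_before_store_ready:
        path.append(450)                 # restart the lock instruction
        return path, "prediction_false"
    path.append(440)                     # load_lock retires
    return path, "prediction_true"
```

For example, an instruction predicted uncontended that sees no contention indication passes through blocks 410, 420, 430, and 440.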

In another embodiment, blocks 410, 460, and 470 may be omitted. Instead, it may be assumed in every case that the lock instruction will not be contended, and the corresponding load micro-operation is initially executed speculatively in every case (block 420). When the lock instruction turns out to be contended, monitor logic 160 may detect the contention indication, flush the execution pipeline, and restart the lock instruction (block 450). Only those lock instructions that have caused a contention indication are re-executed non-speculatively.
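The predictor-free embodiment can be sketched in the same spirit. The function names and step labels below are hypothetical, and the Boolean argument stands in for whatever monitor logic 160 observes; the sketch shows only that every lock instruction starts speculatively and that only contended ones pay for a flush and a non-speculative re-execution.

```python
def execute_lock_always_speculative(contention_seen):
    """Model one lock instruction when no predictor is present."""
    steps = ["speculative load"]                    # block 420, taken in every case
    if contention_seen:
        steps += ["flush pipeline",
                  "restart lock instruction",       # block 450
                  "non-speculative re-execution"]
    else:
        steps.append("retire")
    return steps


def count_restarts(contention_events):
    """Count how many lock instructions in a sequence had to be re-executed."""
    return sum(
        "restart lock instruction" in execute_lock_always_speculative(seen)
        for seen in contention_events
    )
```

For a sequence in which contention is rare, count_restarts stays small, which is the case this embodiment is aimed at.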

Referring now to Figures 5A and 5B, schematic diagrams are shown of systems that include processors supporting a lock contention predictor and monitor logic, according to two embodiments of the present invention. The system of Figure 5A generally shows a system in which processors, memory, and input/output devices are interconnected by a system bus, whereas the system of Figure 5B generally shows a system in which processors, memory, and input/output devices are interconnected by a number of point-to-point interfaces.

The system of Figure 5A may include several processors, of which only two, processors 40 and 60, are shown for clarity. Processors 40, 60 may include level-one caches 42, 62. The system of Figure 5A may have several functions connected with system bus 6 via bus interfaces 44, 64, 12, 8. In one embodiment, system bus 6 may be the front side bus (FSB) used with Pentium® microprocessors manufactured by Intel® Corporation. In other embodiments, other buses may be used. In some embodiments, memory controller 34 and bus bridge 32 may collectively be referred to as a chipset. In some embodiments, the functions of the chipset may be divided among physical chips differently from the arrangement shown in the embodiment of Figure 5A.

Memory controller 34 may permit processors 40, 60 to read from and write to system memory 10 and a basic input/output system (BIOS) erasable programmable read-only memory (EPROM) 36. In some embodiments, BIOS EPROM 36 may utilize flash memory. Memory controller 34 may include a bus interface 8 to permit memory read and write data to be carried to and from bus agents on system bus 6. Memory controller 34 may also connect with a high-performance graphics circuit 38 across a high-performance graphics interface 39. In some embodiments, the high-performance graphics interface 39 may be an Advanced Graphics Port (AGP) type interface. Memory controller 34 may direct data from system memory 10 to the high-performance graphics circuit 38 across high-performance graphics interface 39.

The system of Figure 5B may also include several processors, of which only two, processors 70 and 80, are shown for clarity. Processors 70, 80 may each include a local memory controller hub (MCH) 72, 82 to connect with memory 2, 4. Processors 70, 80 may exchange data via point-to-point interface 50 using point-to-point interface circuits 78, 88. Processors 70, 80 may each exchange data with chipset 90 via individual point-to-point interfaces 52, 54 using point-to-point interface circuits 76, 94, 86, 98. Chipset 90 may also exchange data with high-performance graphics circuit 38 via high-performance graphics interface 92.

In the system of Figure 5A, bus bridge 32 may permit data exchanges between system bus 6 and bus 16, which in some embodiments may be an Industry Standard Architecture (ISA) bus or a Peripheral Component Interconnect (PCI) bus. In the system of Figure 5B, chipset 90 may exchange data with bus 16 via bus interface 96. In either system, there may be various input/output (I/O) devices 14 on bus 16, including in some embodiments low-performance graphics controllers, video controllers, and network controllers. In some embodiments, another bus bridge 18 may be used to permit data exchanges between bus 16 and bus 20. In some embodiments, bus 20 may be a Small Computer System Interface (SCSI) bus, an Integrated Drive Electronics (IDE) bus, or a Universal Serial Bus (USB) bus. Additional I/O devices may be connected with bus 20. These may include keyboard and cursor control devices 22 (including mice), audio I/O 24, communication devices 26 (including modems and network interfaces), and data storage devices 28. Software code 30 may be stored on data storage device 28. In some embodiments, data storage device 28 may be a fixed magnetic disk, a floppy disk drive, an optical disk drive, a magneto-optical disk drive, a magnetic tape, or non-volatile memory including flash memory.

In the foregoing specification, the invention has been described with reference to specific embodiments. It will, however, be evident that various modifications and changes may be made to the invention without departing from its broader spirit and scope as defined by the appended claims. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive.

Claims

1. A processor, comprising:

a predictor to make a prediction whether a lock instruction will be contended; and

a scheduler to issue a set of micro-operations corresponding to said lock instruction speculatively when said prediction is that said lock instruction will not be contended.

2. The processor of claim 1, wherein said scheduler issues a load_with_lock micro-operation as a load_without_lock micro-operation.

3. The processor of claim 1, further comprising monitor logic to determine whether a contention indication has occurred.

4. The processor of claim 3, wherein said processor restarts processing of said lock instruction when said monitor logic determines that a contention indication has occurred.

5. The processor of claim 4, wherein said contention indication is a snoop to a cache line that includes the target address of said lock instruction.

6. The processor of claim 4, wherein said contention indication is said store unlock micro-operation missing in a cache.

7. The processor of claim 4, wherein said contention indication is an interrupt.

8. The processor of claim 3, wherein said monitor logic determines said contention indication before said store unlock micro-operation becomes the oldest non-retired store micro-operation.

9. The processor of claim 3, wherein said monitor logic determines said contention indication before the result of said store unlock micro-operation becomes globally visible.

10. The processor of claim 1, further comprising snoop reject logic to reject snoops to the target address of said lock instruction.

11. A processor, comprising:

a predictor to make a prediction whether a lock instruction will be contended;

a decoder to decode said lock instruction into a load_without_lock micro-operation and a store micro-operation; and

monitor logic to determine whether a contention indication has occurred.

12. The processor of claim 11, wherein said processor restarts processing of said lock instruction when said monitor logic determines that a contention indication has occurred.

13. The processor of claim 12, wherein said contention indication is a snoop to a cache line that includes the target address of said lock instruction.

14. The processor of claim 12, wherein said contention indication is said store micro-operation missing in a cache.

15. The processor of claim 12, wherein said contention indication is an interrupt.

16. The processor of claim 11, wherein said monitor logic determines said contention indication before said store micro-operation becomes the oldest non-retired store micro-operation.

17. The processor of claim 11, wherein said monitor logic determines said contention indication before the result of said store micro-operation becomes globally visible.

18. A method, comprising:

predicting whether a lock instruction will be contended;

issuing a load_without_lock micro-operation corresponding to said lock instruction when said predicting is that said lock instruction will not be contended; and

monitoring for a contention indication.

19. The method of claim 18, further comprising restarting execution of said lock instruction when said monitoring detects a contention indication.

20. The method of claim 18, wherein said contention indication is a snoop to a cache line that includes the target address of said lock instruction.

21. The method of claim 18, wherein said contention indication is said store unlock micro-operation missing in a cache.

22. The method of claim 18, wherein said contention indication is an interrupt.

23. The method of claim 18, wherein said issuing includes issuing said load_without_lock micro-operation from a buffer.

24. The method of claim 23, wherein said load_without_lock micro-operation is stored in said buffer as a load_with_lock micro-operation.

25. The method of claim 18, wherein said issuing includes decoding said load_without_lock micro-operation from said lock instruction.

26. An apparatus, comprising:

means for predicting whether a lock instruction will be contended;

means for issuing a load_without_lock micro-operation corresponding to said lock instruction when said predicting is that said lock instruction will not be contended; and

means for monitoring for a contention indication.

27. The apparatus of claim 26, further comprising means for restarting execution of said lock instruction when said monitoring detects a contention indication.

28. The apparatus of claim 26, wherein said contention indication is a snoop to a cache line that includes the target address of said lock instruction.

29. The apparatus of claim 26, wherein said contention indication is said store unlock micro-operation missing in a cache.

30. The apparatus of claim 26, wherein said contention indication is an interrupt.

31. The apparatus of claim 26, wherein said means for issuing includes means for issuing said load_without_lock micro-operation from a buffer.

32. The apparatus of claim 31, wherein said load_without_lock micro-operation is stored in said buffer as a load_with_lock micro-operation.

33. The apparatus of claim 26, wherein said means for issuing includes means for decoding said load_without_lock micro-operation from said lock instruction.

34. A system, comprising:

a first processor including a predictor and a scheduler, said predictor to predict whether a lock instruction will be contended, and said scheduler to issue a set of micro-operations corresponding to said lock instruction speculatively when said prediction is that said lock instruction will not be contended;

a first interface to a second processor;

a second interface to input/output devices; and

an audio input/output device coupled to said second interface.

35. The system of claim 34, wherein said scheduler issues a load_with_lock micro-operation as a load_without_lock micro-operation.

36. The system of claim 34, wherein said processor further includes monitor logic to determine whether a contention indication has occurred before a store_with_unlock micro-operation retires.

37. The system of claim 36, wherein said processor restarts processing of said lock instruction when said monitor logic determines that a contention indication has occurred.

38. The system of claim 36, wherein said monitor logic determines said contention indication before said store_with_unlock micro-operation becomes the oldest non-retired store micro-operation.

39. The system of claim 36, wherein said monitor logic determines said contention indication before the result of said store_with_unlock micro-operation becomes globally visible.

40. A system, comprising:

a first processor including a predictor to predict whether a lock instruction will be contended, a decoder to decode said lock instruction into a load_without_lock micro-operation and a store micro-operation, and monitor logic to determine whether a contention indication has occurred before said store micro-operation retires;

a first interface to a second processor;

a second interface to input/output devices; and

41. The system of claim 40, wherein said processor restarts processing of said lock instruction when said monitor logic determines that a contention indication has occurred.

42. The system of claim 40, wherein said monitor logic determines said contention indication before said store micro-operation becomes the oldest non-retired store micro-operation.

43. The processor of claim 40, wherein said monitor logic determines said contention indication before the result of said store micro-operation becomes globally visible.

44. A processor, comprising:

logic to initially indicate that a lock instruction is not contended; and

a scheduler to issue a set of micro-operations corresponding to said lock instruction speculatively.

45. The processor of claim 44, wherein said scheduler issues a load_with_lock micro-operation as a load_without_lock micro-operation.

46. The processor of claim 44, further comprising monitor logic to determine whether a contention indication has occurred.

47. The processor of claim 46, wherein said processor restarts processing of said lock instruction when said monitor logic determines that a contention indication has occurred.

48. The processor of claim 46, wherein said monitor logic determines said contention indication before said store unlock micro-operation becomes the oldest non-retired store micro-operation.

49. The processor of claim 46, wherein said monitor logic determines said contention indication before the result of said store unlock micro-operation becomes globally visible.

50. The processor of claim 44, further comprising snoop reject logic to reject snoops to the target address of said lock instruction.

51. A processor, comprising:

logic to initially indicate that a lock instruction is not contended;

52. The processor of claim 51, wherein said processor restarts processing of said lock instruction when said monitor logic determines that a contention indication has occurred.

53. The processor of claim 51, wherein said monitor logic determines said contention indication before said store micro-operation becomes the oldest non-retired store micro-operation.

54. The processor of claim 51, wherein said monitor logic determines said contention indication before the result of said store micro-operation becomes globally visible.

55. A method, comprising:

initially presuming that a lock instruction will not be contended;

issuing a load_without_lock micro-operation corresponding to said lock instruction; and

monitoring for a contention indication.

56. The method of claim 55, further comprising restarting execution of said lock instruction when said monitoring detects a contention indication.

57. The method of claim 55, wherein said issuing includes issuing said load_without_lock micro-operation from a buffer.

58. The method of claim 57, wherein said load_without_lock micro-operation is stored in said buffer as a load_with_lock micro-operation.

59. The method of claim 55, wherein said issuing includes decoding said load_without_lock micro-operation from said lock instruction.

60. An apparatus, comprising:

means for initially presuming that a lock instruction will not be contended;

means for issuing a load_without_lock micro-operation corresponding to said lock instruction; and

means for monitoring for a contention indication.

61. The apparatus of claim 60, further comprising means for restarting execution of said lock instruction when said monitoring detects a contention indication.

62. The apparatus of claim 60, wherein said means for issuing includes means for issuing said load_without_lock micro-operation from a buffer.

63. The apparatus of claim 62, wherein said load_without_lock micro-operation is stored in said buffer as a load_with_lock micro-operation.

64. The apparatus of claim 60, wherein said means for issuing includes means for decoding said load_without_lock micro-operation from said lock instruction.