CN101901132B - Microprocessor and correlation storage method - Google Patents
Microprocessor and correlation storage method
- Publication number
- CN101901132B (CN 201010247338; CN201010247338A)
- Authority
- CN
- China
- Prior art keywords
- mentioned
- written
- storage
- instruction
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Advance Control (AREA)
Abstract
The present invention provides a microprocessor. The microprocessor includes a queue having a plurality of entries, each of which holds store information of a store instruction. The store information specifies the sources of the operands used to calculate a store address. The store instruction specifies store data to be stored to the memory location identified by the store address. The microprocessor also includes a control logic unit, coupled to the queue, that receives a load instruction. The load instruction includes load information that specifies the sources of the operands used to calculate a load address. The control logic unit detects that the load information matches the store information held in a valid entry of the queue and, in response, predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
Description
Technical field
The present invention relates to the field of microprocessors, and in particular to the store-to-load forwarding mechanism of a microprocessor.
Background technology
Programs frequently use store and load instructions. A store instruction moves data from a register of the microprocessor to memory, and a load instruction moves data from memory to a register of the microprocessor. A microprocessor frequently executes an instruction stream in which one or more store instructions precede a load instruction, and the data loaded by the load instruction is at the same memory location as the data stored by one or more of the preceding store instructions. In these cases, to execute the program correctly, the microprocessor must guarantee that the load instruction receives the store data produced by the newest preceding store instruction. One way to execute the program correctly is to stall the load instruction until the store instruction has written its data to memory (that is, system memory or cache memory), and then let the load instruction read the data from memory. However, this approach does not perform well. Modern microprocessors therefore transfer the store data from the functional unit where the store instruction resides (for example, a store queue) to the functional unit where the load instruction resides (for example, a load unit). This is commonly called a store forwarding operation, store forwarding, or store-to-load forwarding.
To detect whether store data needs to be forwarded to a load instruction, the microprocessor compares the load memory address with the store memory addresses of older store instructions to check whether they match. To be strictly correct, the microprocessor needs to compare the load physical address with the store physical addresses. However, translating the load virtual address into the load physical address takes considerable time. Therefore, to avoid delaying the address comparison, modern microprocessors compare the load virtual address with the older store virtual addresses while the load virtual address is being translated into the load physical address, and forward store data based on the virtual address comparison. The microprocessor then performs the physical address comparison to verify that the forwarding performed on the basis of the virtual address comparison was correct, or determines that the forwarding was incorrect and replays the load to correct the error.
In addition, because comparing full virtual addresses is time consuming (and costly in power and die area) and may limit the maximum clock frequency at which the microprocessor can operate, modern microprocessors tend to compare only a portion of the virtual address rather than the whole virtual address. This increases the number of false store collision detections and incorrect forwards. Therefore, a more accurate way of detecting store collisions for store forwarding is needed.
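By way of illustration only, the effect of comparing a portion of the virtual address can be sketched as follows. The 12-bit compare width, the function names, and the example addresses are assumptions chosen for the sketch; the text does not state how many bits any particular design compares.

```python
def partial_va_match(load_va: int, store_va: int, compared_bits: int = 12) -> bool:
    """Compare only the low `compared_bits` bits of two virtual addresses.

    The default width of 12 bits is an illustrative assumption.
    """
    mask = (1 << compared_bits) - 1
    return (load_va & mask) == (store_va & mask)


def full_va_match(load_va: int, store_va: int) -> bool:
    """Compare the complete virtual addresses."""
    return load_va == store_va


# Hypothetical addresses: identical low 12 bits, different upper bits.
LOAD_VA = 0x7F001234
STORE_VA = 0x7F105234

# The partial comparison reports a collision that the full comparison rejects.
false_collision = (partial_va_match(LOAD_VA, STORE_VA)
                   and not full_va_match(LOAD_VA, STORE_VA))
```

In this sketch the partial comparison would trigger a forward that a later physical address check would have to reject, which corresponds to the incorrect forwards described above.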
In addition, in a scheme based on virtual address comparison, the time required to perform store forwarding may be hidden by the virtual-to-physical address translation time (that is, the TLB lookup time) and the cache tag and data array lookup time. However, if that assumption no longer holds, an alternative way of detecting store collisions for store forwarding is needed.
Finally, a store collision detection scheme based on virtual address comparison requires a relatively large number of address comparators, which consume a relatively large amount of microprocessor die area and power. Therefore, a way of detecting store collisions for store forwarding that reduces die area and power consumption is needed.
Summary of the invention
An embodiment of the invention provides a microprocessor. The microprocessor includes a queue having a plurality of entries, each of which holds store information of a store instruction. The store information specifies the sources of the operands used to calculate a store address. The store instruction specifies store data to be stored to the memory location identified by the store address. The microprocessor also includes a control logic unit, coupled to the queue, that receives a load instruction. The load instruction includes load information that specifies the sources of the operands used to calculate a load address. The control logic unit detects that the load information matches the store information held in a valid one of the queue entries and, in response, predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
Another embodiment of the invention provides a method for forwarding store data in a microprocessor. The method includes receiving a stream of instructions in program order. For each store instruction in the received stream, one of a plurality of entries of a queue is allocated and populated with store information. The store information specifies the sources of the operands used to calculate a store address. The store instruction specifies store data to be stored to the memory location identified by the store address. The method also includes receiving a load instruction in the stream. The load instruction includes load information that specifies the sources of the operands used to calculate a load address. The method also includes detecting that the load information matches the store information held in a valid one of the queue entries and, in response, predicting that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
To make the above and other objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of drawings
Fig. 1 is a block diagram of a microprocessor according to an embodiment of the invention.
Fig. 2 is a detailed block diagram of the load unit and store queue pipelines of the microprocessor of Fig. 1.
Fig. 3 is a detailed block diagram of the load unit and store queue pipelines of a conventional microprocessor.
Fig. 4 is a block diagram of an entry of the forwarding address source queue (FASQ) of Fig. 1 according to an embodiment of the invention.
Fig. 5 is a flowchart of the operation of the RAT of Fig. 1.
Fig. 6 is a flowchart of the operation of the microprocessor of Fig. 1.
Fig. 7 is a flowchart of the operation of the microprocessor of Fig. 1 to forward data from a store instruction to a load instruction based on a comparison of address sources.
Fig. 8 is a block diagram of an entry of the forwarding replay history queue (FRHQ) of Fig. 1.
Fig. 9 is a flowchart of the operation of the microprocessor of Fig. 1 to allocate and populate an entry of the FRHQ of Fig. 8.
Fig. 10 is a flowchart of the operation of the microprocessor of Fig. 1 to use the entries of the FRHQ.
Fig. 11 is a flowchart of the operation of the microprocessor of Fig. 1.
Description of reference numerals
100~microprocessor; 106~instruction cache; 108~instruction decoder;
134~register alias table (RAT); 136~reservation station; 138~execution unit;
158~dependency information;
162~architectural register; 164~result; 166, 168~status signal;
172~reorder buffer (ROB); 176~forwarding path;
182~memory subsystem; 183~store unit; 184~store queue; 185~load unit; 186~data cache; 188~dependency generator;
192~forwarding address source queue (FASQ); 194~forwarding replay history queue (FRHQ); 195~load instruction address operands; 196~store forwarding predictor; 197~load instruction; 198, 199~ROB index of the matching store (RIOMS);
202~store ROB index; 204~ROB index comparator; 206~matching ROB index entry indicator;
222~address generator; 224~load virtual address; 226~store data; 228~multiplexer;
246~translation lookaside buffer (TLB); 248~load physical address;
262~cache data array; 263~cache tag array; 264~cache data; 265~forwarded data; 266~multiplexer; 267~store physical address; 268~physical address comparator; 269~physical address match indicator;
286~control logic unit;
302~store virtual address; 304~virtual address comparator; 306~virtual address match indicator;
402~entry; 404~valid bit; 406~srcA field; 408~srcB field; 412~displacement field; 414~displacement valid bit; 416~index field;
504, 506, 508, 512, 514, 516, 518, 522~steps;
602, 604, 606~steps;
702, 704, 706, 708, 712, 714, 716, 718, 722, 724, 726, 728, 732, 734, 736, 738~steps;
802~entry; 804~valid bit; 806~instruction pointer; 808~ROB index delta field;
902, 904, 906, 908~steps;
1002, 1004, 1006~steps;
1102, 1104, 1106~steps.
Embodiment
The embodiments described below provide two basic solutions, each of which addresses one or more of the prior-art problems described above.
The first solution compares the information that specifies the sources used to calculate the load and store addresses, rather than comparing the addresses themselves. The advantage of this approach is that it removes the virtual address calculation from the critical path of the store forwarding decision and may use fewer and/or smaller comparators, which saves die area (die real estate) and power.
The second solution maintains a history of recent load instruction replays and, based on that replay history, predicts the store instruction whose data should be forwarded to a load instruction. The advantage of this approach is that (in at least one embodiment) it can reduce the store forwarding time by removing the virtual address calculation time from the store forwarding decision path and by comparing fewer bits than a virtual address comparison scheme. This approach may also use fewer and/or smaller comparators, which saves die area and power. Finally, this solution may detect store collisions for store forwarding more accurately than a virtual address comparison scheme.
The two solutions can also be used in combination. Both are described below.
Generally speaking, the microprocessor 100 (see Fig. 1) executes load instructions speculatively. That is, the microprocessor 100 issues a load instruction on the assumption that it will hit in the cache, and allows the load instruction to issue without a dependency on an older store instruction that may hold the load data; if the load instruction subsequently misses, the microprocessor 100 replays it. When the store address of an older store instruction is not available for comparison with the load instruction, the load unit completes the load instruction to the reorder buffer (ROB) 172 (see Fig. 1); however, when the older store instruction is ready to retire, it checks the load queue and detects a newer load instruction that needed its address but did not obtain it, and the ROB 172 therefore replays the load instruction. In other words, instead of the load unit detecting the miss immediately and reporting it to the ROB 172, the load instruction is replayed indirectly rather than directly. The load instruction may miss because the load data is not in the microprocessor 100, in which case the load data must be fetched from memory. The load instruction may also miss because the load data is on the machine (in the store queue) but was not forwarded from the older store. This can happen in the following situations: (1) when the load instruction is issued to the load unit pipeline, the microprocessor 100 does not have the store address to compare with the load instruction, so it cannot perform the address comparison that detects the need to forward; (2) the microprocessor 100 detects an address collision but does not yet have the store data to forward; (3) the microprocessor 100 forwards the wrong data (it falsely detects a collision or fails to detect a valid collision).
The first two causes arise because load instructions are allowed to issue out of order; that is, a load instruction may be issued before a store instruction has been issued and has produced its address and data. The microprocessor 100 issues the load instruction out of order because the load address is not calculated until the load instruction reaches the load unit, so the register alias table (RAT) 134 (see Fig. 1) does not know the load address and cannot generate a dependency on it. That is, the RAT 134 generates dependencies based on register operands, not memory operands.
The load instruction issue scheduling inventions do not cover the case in which the memory subsystem detects a completed load instruction that received incorrect data because the address comparison was imprecise (that is, virtual rather than physical addresses were compared and/or less than the full virtual address was used); rather, they cover the cases in which the store address or store data is not yet valid. This is because creating an enhanced dependency would not help in the case of an imprecise address comparison. For store forwarding purposes, however, the embodiments that include a replay history are helpful in covering this case. As described below, whenever a load instruction must be replayed for any forwarding-related reason, the forwarding replay history queue (FRHQ) 194 (in Fig. 1) is activated. Note that an imprecise address comparison can produce both (1) false collision detections (the virtual index/hash matches but the physical address does not) and (2) missed collisions (the virtual index/hash does not match but the physical address does).
Referring to Fig. 1, a block diagram of a microprocessor 100 according to an embodiment of the invention is shown.
In one embodiment, the macroarchitecture of the microprocessor 100 is an x86 macroarchitecture. A microprocessor has an x86 macroarchitecture if it can correctly execute most application programs designed for x86 processors. An application program is executed correctly if its expected results are obtained. In particular, the microprocessor 100 executes instructions of the x86 instruction set and includes the x86 user-visible register set. However, the store forwarding mechanisms described herein may also be applied to microprocessors of any other existing or future architecture.
The reservation station 136 includes queues that hold the instructions together with the dependency information 158 and the RIOMS 198 (the ROB index of the matching store) received from the RAT 134. The reservation station 136 also includes issue logic that issues instructions from the queues to the execution units 138 when they are ready to be executed. The execution units 138 may receive the results 164 of executed instructions through the architectural registers 162, through temporary registers (not shown) in the ROB 172 to which the architectural registers 162 are renamed, or directly from the execution units 138 through the forwarding path 176. The execution units 138 also provide their results 164 to the ROB 172 for writing to the temporary registers.
As described above, in some situations the memory subsystem 182 must request a replay of a load instruction, which it indicates through the status signal 166 provided to the ROB 172. The status signal 166 specifies the ROB index of the instruction (for example, a load instruction) that must be replayed, so that the ROB 172 can update the entry at that index with an indication of the status of the instruction, including whether it needs to be replayed. In one embodiment, the status signal 166 also specifies the ROB index of the store instruction whose data should have been forwarded to the load instruction. These ROB indexes of the status signal 166 are also provided to the store forwarding predictor 196, which enables the store forwarding predictor 196 to calculate a delta between the two ROB indexes, as described in detail below. When the instruction whose ROB entry is marked as needing replay becomes the next instruction to be retired, that is, the oldest unretired instruction, the ROB 172 replays the instruction. That is, the ROB 172 re-dispatches the instruction and its associated dependency information 158 from the ROB 172 to the reservation station 136, where it waits to be re-issued to the execution units 138 and re-executed by them. In one embodiment, the ROB 172 replays not only this instruction but also all instructions that depend on its result. When the ROB 172 replays a load instruction, the ROB 172 also notifies the RAT 134 of the event through the status signal 168. The status signal 168 specifies the ROB index of the load instruction being replayed.
Referring to Fig. 2, a detailed block diagram of the pipelines of the load unit 185 and store queue 184 of the microprocessor 100 of Fig. 1 is shown. In the embodiment of Fig. 2, each pipeline has six stages, labeled A through F. In stage A, the load unit 185 receives the load instruction address operands 195 and the RIOMS 199 (that is, the RIOMS 198 of Fig. 1).
In stage B, an address generator 222 of the load unit 185 generates the load virtual address 224 from the load instruction address operands 195. Each entry of the store queue 184 holds the store ROB index 202 of the store instruction to which the entry is allocated. A plurality of ROB index comparators 204 compare the RIOMS 199 of the load instruction with the store ROB indexes 202 to generate a matching ROB index entry indicator 206, which indicates whether any of the store ROB indexes 202 matches the RIOMS 199 and, if so, which entry of the store queue 184 matches.
In stage C, a translation lookaside buffer (TLB) 246 of the load unit 185 looks up the load virtual address 224 and outputs the translated load physical address 248. Each entry of the store queue 184 also holds the store data 226 of the store instruction to which it is allocated. A multiplexer 228 of the store queue 184 pipeline receives the store data 226 from each store queue 184 entry and selects the store data 226 indicated by the matching ROB index entry indicator 206 to forward to the load unit 185 as the forwarded data 265.
In stage D, the load physical address 248 is provided to the cache tag array 263 and the cache data array 262 of the data cache 186 to obtain the cache data 264. A multiplexer 266 in the load unit 185 receives the cache data 264 and receives the forwarded data 265 from the store queue 184, and selects one of its inputs as the result 164 of Fig. 1. If the matching ROB index entry indicator 206 indicates a match, the multiplexer 266 selects the forwarded data 265; otherwise, it selects the cache data 264. Each entry of the store queue 184 also holds the store physical address 267 of the store instruction to which it is allocated. A plurality of physical address comparators 268 compare the load physical address 248 with each store physical address 267 to generate a physical address match indicator 269, which indicates whether any of the store physical addresses 267 matches the load physical address 248 and, if so, which entry of the store queue 184 matches.
In stage E, the control logic unit 286 in the store queue 184 pipeline receives the matching ROB index entry indicator 206 and the physical address match indicator 269 and, based on them, generates the status signal 166 of Fig. 1 for the load instruction. The status signal 166 indicates whether the load instruction completed successfully, missed, or must be replayed.
In stage F, the result 164 and the status signal 166 are provided to the ROB 172 and other units of the microprocessor 100.
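By way of illustration only, stages B through E can be modeled in software as follows. This is a minimal sketch, not the circuit itself: the names `StoreQueueEntry` and `load_pipeline` are hypothetical, and the stage E policy that maps the two match indicators to a status is an assumption, since the text above states only which inputs the control logic unit 286 receives. The sketch also reports only "ok" or "replay" and does not model the separate "missed" status.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreQueueEntry:
    rob_index: int   # store ROB index 202
    data: int        # store data 226
    phys_addr: int   # store physical address 267


def load_pipeline(rioms: Optional[int], load_pa: int, cache_data: int,
                  store_queue: List[StoreQueueEntry]) -> Tuple[int, str]:
    """Return (result 164, status 166) for one load instruction."""
    # Stage B: ROB index comparators 204 look for an entry matching the RIOMS.
    match = None
    if rioms is not None:
        match = next((e for e in store_queue if e.rob_index == rioms), None)

    # Stages C/D: multiplexer 228 picks the matching store data, and
    # multiplexer 266 picks forwarded data 265 on a match, else cache data 264.
    result = match.data if match is not None else cache_data

    # Stage D: physical address comparators 268.
    pa_hit = any(e.phys_addr == load_pa for e in store_queue)

    # Stage E (assumed policy): replay when the forwarded store's physical
    # address disagrees, or when nothing was forwarded but a store matched.
    if match is not None:
        status = "ok" if match.phys_addr == load_pa else "replay"
    else:
        status = "replay" if pa_hit else "ok"
    return result, status
```

Note that the forwarding choice in this sketch depends only on the ROB index comparison, so it does not wait for the load virtual address; the physical addresses are used afterwards to check the prediction.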
Referring to Fig. 3, a detailed block diagram of the pipelines of the load unit 185 and store queue 184 of a conventional microprocessor is shown. The pipelines 185/184 of Fig. 3 are similar to those of Fig. 2, with the following differences. In Fig. 3, the store queue 184 pipeline includes virtual address comparators 304 rather than the ROB index comparators 204 of Fig. 2. The virtual address comparators 304 compare the load virtual address 224 with the store virtual address 302 (or a portion thereof) of each store queue 184 entry to generate a virtual address match indicator 306, rather than the matching ROB index entry indicator 206 of Fig. 2. A comparison of Fig. 2 with Fig. 3 shows that the embodiment of Fig. 2 compares ROB indexes to determine which store data 226 (if any) to forward to the load instruction; compared with the conventional design of Fig. 3, its advantage is that the decision does not depend on the generation of the load virtual address 224.
Referring to Fig. 4, a block diagram of an entry 402 of the forwarding address source queue (FASQ) 192 of Fig. 1 according to an embodiment of the invention is shown. A FASQ entry 402 holds information about a store instruction encountered by the RAT 134. As described below with respect to Fig. 5 and Fig. 6, the RAT 134 allocates, populates, and uses the FASQ entries 402. The FASQ entry 402 includes a valid bit 404 that indicates whether the FASQ entry 402 is valid. In response to a reset, the microprocessor 100 initializes all entries 402 of the FASQ 192 to invalid, that is, it clears the valid bit 404 of each FASQ entry 402. The FASQ entry 402 also includes a srcA field 406 and a srcB field 408, which respectively specify the sources of the first operand and the second operand that the memory subsystem 182 uses to calculate the store address of the store instruction. The srcA field 406 and srcB field 408 specify the architectural register 162 that holds the operand, or a constant that is used as the operand. The FASQ entry 402 also includes a displacement field 412 that holds the displacement specified by the store instruction, which the memory subsystem 182 uses to calculate the store address. The FASQ entry 402 also includes a displacement valid bit 414 that indicates whether the value of the displacement field 412 is valid. The FASQ entry 402 also includes an index field 416 that holds the ROB index of the store instruction.
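By way of illustration only, the fields of a FASQ entry 402 can be written as a small record. The Python names, the use of register-name strings for the source fields, and the queue depth passed to `reset_fasq` are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FasqEntry:
    valid: bool = False           # valid bit 404
    src_a: Optional[str] = None   # srcA field 406: source of the first operand
    src_b: Optional[str] = None   # srcB field 408: source of the second operand
    displacement: int = 0         # displacement field 412
    disp_valid: bool = False      # displacement valid bit 414
    rob_index: int = 0            # index field 416: ROB index of the store


def reset_fasq(num_entries: int) -> List[FasqEntry]:
    """Model the reset behavior: every entry starts with its valid bit clear.

    The number of entries is a free parameter here; the text gives no depth.
    """
    return [FasqEntry() for _ in range(num_entries)]
```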
Referring to Fig. 5, a flowchart of the operation of the RAT 134 of Fig. 1 is shown. Flow begins at step 504.
In step 504, the RAT 134 decodes an instruction and generates its dependency information 158 as shown in Fig. 1. Flow proceeds to decision step 506.
In decision step 506, the RAT 134 determines whether the decoded instruction is a store instruction. If so, flow proceeds to step 508; otherwise, flow proceeds to decision step 512.
In step 508, the RAT 134 allocates an entry 402 in the FASQ 192. That is, the RAT 134 logically pushes an entry 402 onto the tail of the FASQ 192, which logically pushes out the entry 402 at the head of the FASQ 192. The RAT 134 then populates the srcA field 406, srcB field 408, and displacement field 412 of the allocated entry 402 with the appropriate information from the store instruction. If the store instruction specifies a displacement, the RAT 134 sets the displacement valid bit 414; otherwise, the RAT 134 clears the displacement valid bit 414. The RAT 134 also populates the index field 416 with the ROB index of the store instruction. Finally, the RAT 134 sets the valid bit 404. In one embodiment, the store instruction is actually two separate microinstructions: a store address (STA) microinstruction and a store data (STD) microinstruction. The STA microinstruction is issued to a store address unit of the memory subsystem 182, which calculates the store address. The STD microinstruction is issued to a store data unit of the memory subsystem 182, which obtains the store data from the source register and posts it to a store queue 184 entry for subsequent writing to memory. In this embodiment, when the RAT 134 sees the STA microinstruction, it allocates an entry 402 in the FASQ 192 and populates the srcA field 406, srcB field 408, and displacement field 412; when the RAT 134 sees the STD microinstruction, it populates the index field 416 with the ROB index of the STD microinstruction and sets the valid bit 404. Flow returns to step 504.
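By way of illustration only, the allocation of step 508 for the single-instruction case can be sketched with a bounded double-ended queue, whose `maxlen` behavior drops the oldest element when a new one is appended to a full queue. The names `FasqEntry` and `allocate_store`, the register-name strings, and the queue depth of 2 used in the usage lines are assumptions; the split STA/STD embodiment is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class FasqEntry:
    valid: bool
    src_a: Optional[str]
    src_b: Optional[str]
    displacement: int
    disp_valid: bool
    rob_index: int


def allocate_store(fasq: deque, src_a: Optional[str], src_b: Optional[str],
                   displacement: Optional[int], rob_index: int) -> FasqEntry:
    """Push a populated entry onto the tail of the FASQ (step 508)."""
    entry = FasqEntry(
        valid=True,                              # valid bit 404 set last in the text
        src_a=src_a,                             # srcA field 406
        src_b=src_b,                             # srcB field 408
        displacement=displacement if displacement is not None else 0,
        disp_valid=displacement is not None,     # displacement valid bit 414
        rob_index=rob_index,                     # index field 416
    )
    fasq.append(entry)  # a full deque(maxlen=n) drops its head entry here
    return entry


# Usage with a hypothetical two-entry queue.
fasq = deque(maxlen=2)
allocate_store(fasq, "rax", "rbx", 8, rob_index=3)
allocate_store(fasq, "rcx", None, None, rob_index=4)
allocate_store(fasq, "rdx", "rsi", 0, rob_index=5)  # pushes out ROB index 3
```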
In decision step 512, the RAT 134 determines whether the decoded instruction is a load instruction. If so, flow proceeds to decision step 514; otherwise, flow proceeds to decision step 518.
In decision step 514, the RAT 134 compares the address sources specified by the load instruction with the store address sources specified by the FASQ 192 entries 402 to determine whether they match any of the entries 402. That is, the RAT 134 compares the first source operand field of the load instruction with the srcA field 406 of each entry 402, the second source operand field of the load instruction with the srcB field 408 of each entry 402, and the displacement field of the load instruction with the displacement field 412 of each entry 402. In one embodiment, the RAT 134 also allows a match when the load instruction specifies the same source registers in swapped order. If all three fields match those of any entry 402 of the FASQ 192, and either the load instruction specifies a displacement and the displacement valid bit 414 is set, or the load instruction does not specify a displacement and the displacement valid bit 414 is clear, flow proceeds to step 516; otherwise, flow returns to step 504.
In step 516, the RAT 134 predicts that the load instruction should be forwarded data from the older store instruction associated with the matching FASQ 192 entry 402, and accordingly outputs the RIOMS 198 of Fig. 1. That is, the RAT 134 outputs the value of the index field 416 of the matching FASQ entry 402 determined in step 514. Flow returns to step 504. In addition, flow continues to step 702 of Fig. 7, described below, to execute the load instruction.
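By way of illustration only, the comparison of steps 514 and 516 can be sketched as follows. The names `FasqEntry` and `predict_forward` are hypothetical. Two points are assumptions rather than statements from the text: the sketch searches from the newest entry to the oldest when more than one entry could match, and it skips entries whose valid bit is clear.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FasqEntry:
    valid: bool
    src_a: Optional[str]
    src_b: Optional[str]
    displacement: int
    disp_valid: bool
    rob_index: int


def predict_forward(fasq: Sequence[FasqEntry], load_src_a: Optional[str],
                    load_src_b: Optional[str], load_disp: Optional[int],
                    allow_swapped: bool = True) -> Optional[int]:
    """Return the RIOMS (index field 416 of the matching entry) or None."""
    for entry in reversed(fasq):  # newest first: an assumed priority
        if not entry.valid:
            continue
        same_order = (entry.src_a, entry.src_b) == (load_src_a, load_src_b)
        swapped = allow_swapped and (
            (entry.src_a, entry.src_b) == (load_src_b, load_src_a))
        if load_disp is None:
            # The load has no displacement: bit 414 must be clear.
            disp_ok = not entry.disp_valid
        else:
            # The load has a displacement: bit 414 must be set and values equal.
            disp_ok = entry.disp_valid and entry.displacement == load_disp
        if (same_order or swapped) and disp_ok:
            return entry.rob_index
    return None
```

The comparison touches only register specifiers and displacements, which are known at decode time, so the prediction is available before any address is calculated.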
In decision step 518, the RAT 134 determines whether the decoded instruction is an instruction that modifies a source specified by the srcA 406 or srcB 408 field of any entry 402 of the FASQ 192. If so, flow proceeds to step 522; otherwise, flow returns to step 504.
In step 522, the RAT 134 clears the valid bit 404 of each FASQ entry 402 whose srcA 406 or srcB 408 field specifies a register that is modified by the instruction identified in decision step 518. The RAT 134 clears the valid bit 404 because the load address and the store address are then unlikely to overlap; it is therefore unlikely that the store data associated with the store instruction specified by the FASQ entry 402 should be forwarded to the load instruction. Flow returns to step 504.
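By way of illustration only, steps 518 and 522 can be sketched as a scan that clears the valid bit of every entry naming the modified register. The names `FasqEntry` and `invalidate_on_write` and the register-name strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FasqEntry:
    valid: bool
    src_a: Optional[str]   # srcA field 406
    src_b: Optional[str]   # srcB field 408


def invalidate_on_write(fasq: Iterable[FasqEntry], dest_reg: str) -> int:
    """Clear valid bit 404 of each entry that uses `dest_reg` as a source.

    Returns the number of entries invalidated.
    """
    cleared = 0
    for entry in fasq:
        if entry.valid and dest_reg in (entry.src_a, entry.src_b):
            entry.valid = False
            cleared += 1
    return cleared
```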
Referring to Fig. 6, a flowchart of the operation of microprocessor 100 of Fig. 1 is shown. Flow begins at step 602.
In step 602, ROB 172 retires an instruction. Flow proceeds to decision step 604.
In decision step 604, ROB 172 scans FASQ 192 to determine whether the ROB index field 416 of any entry 402 matches the index of the instruction being retired by ROB 172. If so, flow proceeds to step 606; otherwise, flow returns to step 602.
In step 606, ROB 172 clears the valid bit 404 of the matching FASQ entry 402. This prevents RAT 134 from generating a RIOMS 198 for a subsequent load instruction based on a store instruction that has already retired. Flow returns to step 602.
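The retirement handling of Fig. 6 amounts to a scan-and-clear, sketched below. The function name `clear_on_retire` and the dictionary layout are hypothetical and chosen only for illustration. Because the ROB is a circular queue, an index is presumably reused after retirement, which is one reason a stale entry would be harmful.

```python
def clear_on_retire(queue, retired_rob_index):
    """Steps 604 and 606: invalidate entries that belong to the retiring instruction.

    `queue` is a list of dicts with hypothetical keys "valid" and "rob_index".
    Returns the number of entries cleared.
    """
    cleared = 0
    for entry in queue:
        if entry["valid"] and entry["rob_index"] == retired_rob_index:
            entry["valid"] = False  # clear valid bit 404
            cleared += 1
    return cleared
```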
Referring to Fig. 7, a flowchart is shown of the operation of microprocessor 100 of Fig. 1 to forward data from a store instruction to a load instruction based on the address source comparison. Flow begins at step 702.
In step 702, reservation station 136 issues a load instruction 197 and its associated RIOMS 198 to load unit 185. Flow proceeds from step 702 to both step 704 and step 712.
In step 704, load unit 185 receives the address operands 195 of the load instruction. Flow proceeds to step 706.
In step 706, address generator 222 of the load unit calculates load virtual address 224. Flow proceeds to step 708.
In step 708, TLB 246 receives load virtual address 224 and produces load physical address 248 of Fig. 2. Flow proceeds from step 708 to steps 724 and 736.
In step 712, load unit 185 sends RIOMS 199 to store queue 184. Flow proceeds to step 714.
In step 714, ROB index comparator 204 of store queue 184 compares RIOMS 199 with the store ROB indexes 202 to generate the ROB index match indicator 206. Flow proceeds to decision step 716.
In decision step 716, store queue 184 examines the ROB index match indicator 206 generated in step 714 to determine whether any of the store ROB indexes 202 matches RIOMS 199. If at least one matches, flow proceeds to step 718; otherwise, flow proceeds to step 734.
In step 718, multiplexer 228 selects the store data 226 of the newest store instruction, older than the load instruction, that is indicated by the ROB index match indicator 206, and provides it as forwarded data 265 to multiplexer 266. Flow proceeds to step 722.
In step 722, load unit 185 executes load instruction 197 using the forwarded data 265 forwarded in step 718. That is, multiplexer 266 selects forwarded data 265. Flow proceeds to step 724.
In step 724, physical address comparator 268 compares load physical address 248 with store physical address 267 to generate the physical address match indicator 269. Flow proceeds to decision step 726.
In decision step 726, control logic unit 286 examines the physical address match indicator 269 generated in step 724 to determine whether load physical address 248 matches the store physical address 267 of the store instruction whose store data 226 was forwarded to load instruction 197 in step 718, and whether that store instruction is the newest store instruction whose store physical address 267 matches load physical address 248. If so, the correct forwarded data 265 was forwarded to and used by load instruction 197, and flow proceeds to step 728; otherwise, incorrect data was forwarded to and used by load instruction 197, and flow proceeds to step 732.
In step 728, load unit 185 completes load instruction 197 by providing result 164 to ROB 172 and other units of microprocessor 100 and by indicating a successful completion on status signal 166. Eventually, when load instruction 197 becomes the oldest instruction in microprocessor 100, ROB 172 retires load instruction 197. Flow ends at step 728.
In step 732, control logic unit 286 generates a status signal 166 indicating that load instruction 197 must be replayed (re-executed), and load unit 185 internally replays load instruction 197, because load instruction 197 used incorrect data. In addition, ROB 172 replays all instructions that depend on the load instruction, because those instructions may have received incorrect data from the previous result of the load instruction. Flow ends at step 732.
In step 734, load unit 185 executes load instruction 197 using cache data 264, that is, without forwarded store data, because the ROB index comparison of decision step 716 indicated no match. Flow proceeds to step 736.
In step 736, physical address comparator 268 compares load physical address 248 with store physical address 267 to generate the physical address match indicator 269. Flow proceeds to decision step 738.
In decision step 738, control logic unit 286 examines the physical address match indicator 269 generated in step 736 to determine whether load physical address 248 matches any store physical address 267. If so, a missed store forward has occurred. That is, load instruction 197 used stale data from data cache 186 rather than the store data 226 that should have been forwarded from a store instruction in store queue 184, and flow proceeds to step 732. If no missed store forward occurred, flow proceeds to step 728.
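The flow of Fig. 7 can be summarized by the following simplified model. It is a sketch under several assumptions rather than a description of the circuit: `StoreQueueEntry` and `execute_load` are hypothetical names, the list holds only stores older than the load in oldest-to-newest order, all physical addresses and data are already valid, and a "match" is exact address equality (access sizes and partial overlap are ignored).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreQueueEntry:
    rob_index: int   # store ROB index 202
    phys_addr: int   # store physical address 267
    data: int        # store data 226


def execute_load(store_queue: List[StoreQueueEntry], rioms: Optional[int],
                 load_phys_addr: int, cache_data: int) -> Tuple[int, bool]:
    """Return (data used by the load, whether the load must be replayed)."""
    # Steps 714 to 718: find the store predicted by the ROB index comparison.
    predicted = None
    if rioms is not None:
        for i, s in enumerate(store_queue):
            if s.rob_index == rioms:
                predicted = i  # keep the newest match

    # Steps 724 and 736: physical address comparison identifies the store that
    # should have forwarded, namely the newest older store at the same address.
    correct = None
    for i, s in enumerate(store_queue):
        if s.phys_addr == load_phys_addr:
            correct = i

    if predicted is not None:
        data = store_queue[predicted].data   # step 722: use forwarded data
    else:
        data = cache_data                    # step 734: use cache data

    # Steps 726 and 738: replay on a wrong forward or a missed forward.
    needs_replay = predicted != correct
    return data, needs_replay
```

In this model the prediction selects the data early, and the physical address comparison serves only to verify it afterwards, which mirrors the split between steps 712 to 722 and steps 724 to 738.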
Referring to Fig. 8, a block diagram is shown of an entry 802 of the forwarding replay history queue (FRHQ 194) of Fig. 1. An FRHQ entry 802 holds information about a load instruction that was replayed for a store-forwarding-related reason. As described below with reference to Fig. 9 and Figure 10, and above with reference to Fig. 7, RAT 134 allocates, populates, and uses FRHQ entries 802. Each entry includes a valid bit 804 that indicates whether the entry 802 is valid. In response to a reset, microprocessor 100 initializes all entries 802 of FRHQ 194 to invalid, that is, it clears the valid bit 804 of each FRHQ entry 802. In addition, in one embodiment, the valid bit 804 of each FRHQ entry 802 is cleared each time the code segment limit value in the x86 code segment descriptor is loaded. FRHQ entry 802 also includes an instruction pointer (IP) field 806, which stores the memory address at which the load instruction resides. In one embodiment, IP 806 is the memory address of the next instruction after the load instruction rather than the address of the load instruction itself. FRHQ entry 802 also includes a ROB index delta field 808, which stores the difference between the ROB index of the load instruction and the ROB index of the store instruction whose store data should have been forwarded to the load instruction, as discussed below.
Referring to Fig. 9, a flowchart is shown of the operation of microprocessor 100 of Fig. 1 to allocate and populate an entry 802 of the FRHQ 194 shown in Fig. 8. Flow begins at step 902.
In step 902, memory subsystem 182 detects that a load instruction was replayed for a store-forwarding-related reason. Examples of store-forwarding-related reasons include, but are not limited to, the following. First, the store physical address of an older store instruction in store queue 184 was not yet valid when load unit 185 processed the load instruction. That is, RIOMS 198 matched an older store, but the physical address match indicator 269 was invalid because store queue 184 detected that the matching store physical address 267 was not yet valid. In this case, when the store instruction is ready to retire, it is determined that the store physical address 267 of the store instruction matches load physical address 248 and that its store data 226 therefore should have been forwarded to the load instruction. ROB 172 therefore causes the load instruction and any instructions that depend on it to be replayed, and notifies RAT 134 so that RAT 134 can update FRHQ 194. Second, the store data of an older store instruction was not yet valid when the load unit processed the load instruction. That is, RIOMS 198 matched an older store, but the data of the matching store was not yet valid. Third, RIOMS 198 matched a store in the store queue; however, the physical address match indicator 269 did not indicate a match between the load and the store identified by RIOMS 198, which means incorrect forwarded data 265 was forwarded. Fourth, RIOMS 198 matched a store in the store queue, and load physical address 248 also matched store physical address 267; however, the physical address match indicator 269 indicates that the store identified by RIOMS 198 was not the correct store from which to forward (for example, another older store, other than the matched one, also matches physically), which means incorrect forwarded data 265 was forwarded. Fifth, RIOMS 198 did not match any store ROB index 202 in store queue 184; however, the physical address match indicator 269 indicated a matching store, which means the data fetched from data cache 124 was incorrect. Sixth, RIOMS 198 matched an older store and its physical address was also confirmed to match; however, the memory trait of the relevant memory address does not permit store forwarding (for example, a non-cacheable region). Flow proceeds to step 904.
In step 904, memory subsystem 182 outputs on status signal 166 the ROB index of the replayed load instruction and the ROB index of the store instruction whose store data should have been forwarded to the load instruction. ROB 172 uses status signal 166 to update the status of the load instruction's ROB 172 entry to indicate that it needs to be replayed, in the case where the replay is performed by ROB 172 as opposed to an internal replay performed by load unit 185. Flow proceeds to step 906.
In step 906, RAT 134 snoops the status signal 166 generated by memory subsystem 182 in step 904 and correspondingly calculates the difference, or delta, between the load instruction ROB index and the store instruction ROB index. In calculating the delta, RAT 134 takes into account the wrap-around effect that results from the circular queue nature of ROB 172. Flow proceeds to step 908.
In step 908, in response to the status signal 166 generated in step 904, RAT 134 allocates an entry 802 in FRHQ 194. That is, RAT 134 logically pushes an entry 802 onto the tail of FRHQ 194, which logically pushes out the entry 802 at the head of FRHQ 194. RAT 134 then populates IP field 806 with the value of the instruction pointer of the load instruction. RAT 134 also populates ROB index delta field 808 with the difference calculated in step 906. Finally, RAT 134 sets the valid bit 804. Flow ends at step 908.
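The delta calculation of step 906 and the allocation of step 908 can be illustrated as follows. The sketch assumes a 128-entry ROB (consistent with a 7-bit ROB index) and a 4-entry FRHQ; both sizes, as well as the names `rob_delta` and `record_replay` and the dictionary keys, are assumptions made for illustration. The wrap-around of the circular ROB is handled with modular subtraction.

```python
from collections import deque

ROB_SIZE = 128    # assumption: 7-bit ROB index
FRHQ_DEPTH = 4    # assumption: queue depth is not specified here


def rob_delta(load_idx, store_idx, rob_size=ROB_SIZE):
    """Step 906: distance from the store back to the load, modulo the ROB size."""
    return (load_idx - store_idx) % rob_size


def record_replay(frhq, load_ip, load_idx, store_idx):
    """Step 908: push a new entry at the tail; a full deque drops its head entry."""
    frhq.append({
        "valid": True,                              # valid bit 804
        "ip": load_ip,                              # IP field 806
        "delta": rob_delta(load_idx, store_idx),    # ROB index delta field 808
    })


frhq = deque(maxlen=FRHQ_DEPTH)
record_replay(frhq, 0x401234, 2, 126)   # the load wrapped past the end of the ROB
```

In this example the load at index 2 follows the store at index 126 by four entries, so the recorded delta is 4 even though the raw subtraction is negative.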
Referring to Figure 10, a flowchart is shown of the operation of microprocessor 100 of Fig. 1 to use the entries 802 of FRHQ 194. Flow begins at step 1002.
In step 1002, RAT 134 receives a load instruction and generates its normal dependency information for the load instruction. In addition, RAT 134 compares the value of the instruction pointer of the load instruction with the IP field 806 of each valid entry 802 of FRHQ 194. Flow proceeds to decision step 1004.
In decision step 1004, RAT 134 determines whether the comparison performed in step 1002 produced a match with any FRHQ entry 802. If not, flow ends; otherwise, flow proceeds to step 1006. Note that the instance of the load instruction received by RAT 134 in steps 1002/1004/1006 is a different instance from the one whose instruction pointer was saved in step 908. Consequently, when a load instruction is replayed for a store-forwarding-related reason, RAT 134 does not populate FRHQ entry 802 with the actual ROB index of the store instruction. Rather, advantageously, when a load instruction is replayed, RAT 134 (in step 908 of Fig. 9) populates FRHQ entry 802 with the difference between the ROB indexes of the load and store instructions on the first instance, so that on the second and subsequent instances of the load instruction, RAT 134 can predict the need to forward store data from a store instruction, which it predicts to be the store instruction located at the previously determined ROB index delta 808 from the current load instruction instance, as described in step 1006 below. The inventors have determined that there is a high likelihood that the ROB index delta between a load instruction and the store instruction from which store data should be forwarded will be the same on the instances that follow the replayed instance.
In step 1006, RAT 134 predicts that store data should be forwarded to the load instruction from an older store instruction whose ROB index can be calculated from the value of the ROB index delta field 808 of the matching FRHQ entry 802. RAT 134 correspondingly calculates the ROB index of the load instruction minus the value of the ROB index delta field 808 of the matching FRHQ entry 802 determined in step 1004, and outputs the result as RIOMS 198. Advantageously, RIOMS 198 enables memory subsystem 182 to perform store forwarding without waiting for load virtual address 224 to be generated, and to compare fewer bits (for example, a 7-bit ROB index) than a virtual address comparison would require. Flow proceeds from step 1006 to step 702 of Fig. 7 to execute the load instruction.
According to one embodiment of the present invention, the IP field 806 of FRHQ 194 stores fewer than all of the instruction pointer address bits; therefore, a match found in step 1004 does not guarantee that the load instruction is the same load instruction whose replay was detected in step 902. Note also that there is no guarantee that a store instruction resides in ROB 172 at the calculated index or, even if one does, that its store data should be forwarded to the load instruction. In other words, RAT 134 is making a prediction.
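The lookup of steps 1002 through 1006, including the effect of storing only part of the instruction pointer, can be sketched as follows. The 12-bit tag width, the 128-entry ROB, the function names `ip_tag` and `predict_from_history`, and the choice to return the first matching entry are all assumptions made for illustration.

```python
ROB_SIZE = 128      # assumption: 7-bit ROB index
IP_TAG_BITS = 12    # assumption: only the low IP bits are kept in the entry


def ip_tag(ip, bits=IP_TAG_BITS):
    """Truncate an instruction pointer to the bits held in IP field 806."""
    return ip & ((1 << bits) - 1)


def predict_from_history(frhq, load_ip, load_rob_index, rob_size=ROB_SIZE):
    """Steps 1002 to 1006: on an IP match, RIOMS = load ROB index minus delta."""
    for entry in frhq:
        if entry["valid"] and entry["ip_tag"] == ip_tag(load_ip):
            return (load_rob_index - entry["delta"]) % rob_size
    return None


history = [{"valid": True, "ip_tag": ip_tag(0x401234), "delta": 3}]
same_load = predict_from_history(history, 0x401234, 10)
aliased_load = predict_from_history(history, 0x771234, 10)
```

Here `aliased_load` receives the same prediction as `same_load` even though it is a different instruction, because both instruction pointers share the same low 12 bits. This illustrates why the result is only a prediction that must be verified later in the pipeline.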
Referring to Figure 11, a flowchart of the operation of microprocessor 100 of Fig. 1 is shown. Flow begins at step 1102.
In step 1102, ROB 172 retires an instruction. Flow proceeds to decision step 1104.
In decision step 1104, ROB 172 scans FRHQ 194 to determine whether the IP field 806 of any of its entries 802 matches the IP of the instruction being retired by ROB 172. If so, flow proceeds to step 1106; otherwise, flow returns to step 1102.
In step 1106, ROB 172 clears the valid bit 804 of the matching FRHQ entry 802. This prevents RAT 134 from generating a RIOMS 198 for a subsequent load instruction based on a store instruction that has already retired. Flow returns to step 1102.
The embodiments of Fig. 1, Fig. 2, and Figs. 4-7 described above are ones in which microprocessor 100 uses an address-source-comparison-based scheme to predict store forwarding. The embodiments of Fig. 1, Fig. 2, and Figs. 7-11 described above are ones in which microprocessor 100 uses a replay-history-based scheme to predict store forwarding. It should be noted that these two schemes can be used separately, in combination with each other, or together with other store forwarding schemes. For example, each scheme can be used by itself. Alternatively, the two schemes can be used together. In such an embodiment, when both produce a match, various embodiments are contemplated for selecting which of the two predictors' RIOMS 198 to use. In one embodiment, the address-source-comparison-based predictor is preferred. Another embodiment contemplates a selector that chooses one of the predictors according to one or more factors, such as prediction accuracy history or other non-history-based factors (for example, load/store characteristics or load/store queue depth). Furthermore, rather than completely replacing a virtual-address-comparison-based scheme, the replay-history-based predictor can be used together with a virtual-address-comparison-based scheme to potentially increase its accuracy. This may be especially beneficial where the microprocessor clock cycle time is demanding. For example, if the virtual-address-based comparison produces no match, or produces a match with a store different from the one indicated by the replay-history-based comparison, the replay-history-based predictor is preferred.
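When both predictors are present, the selection between their outputs might look like the sketch below. It models only the simplest policy mentioned above, a fixed preference that defaults to the address-source-comparison-based predictor; the function name `select_rioms` and the `prefer` parameter are hypothetical, and a history-driven selector would replace the fixed preference.

```python
def select_rioms(addr_source_rioms, replay_history_rioms, prefer="address_source"):
    """Pick one predicted store ROB index, or None when neither predictor matched."""
    if addr_source_rioms is None:
        return replay_history_rioms
    if replay_history_rioms is None:
        return addr_source_rioms
    # Both predictors matched: apply the configured preference.
    if prefer == "address_source":
        return addr_source_rioms
    return replay_history_rioms
```

The explicit `is None` checks matter because 0 is a legitimate ROB index and must not be treated as "no prediction".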
Although embodiments have been described in which the RAT keeps the address source/replay history information for pending stores in the FASQ/FRHQ, makes the store forwarding prediction, and provides the ROB index of the newest matching store to be sent down the pipeline to the load unit together with the load instruction, other embodiments are contemplated in which the store queue keeps the address source/replay history information for pending stores in the FASQ/FRHQ, and the load unit provides the address source information/IP to the FASQ/FRHQ resident in the store queue. Such an approach may be preferable in processors that do not include load issue scheduling.
As described above, an advantage of the address-source-comparison-based and replay-history-based store forwarding schemes is that they may remove the load virtual address calculation from the critical path of the store forwarding determination and may use fewer and/or smaller comparators, which may allow some designs to meet tighter timing constraints and may save die area and power consumption. In addition, the schemes of the present invention may detect store collisions for forwarding purposes more accurately than virtual-address-comparison-based schemes.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by its claims. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods of the invention. This can be accomplished using general programming languages (for example, C or C++) or hardware description languages (HDL), including Verilog HDL, VHDL, and so on. Such software can be stored as program code in any known computer-readable medium, such as magnetic tape, semiconductor memory, magnetic disk, floppy disk, hard disk, or optical disc (for example, CD-ROM or DVD-ROM), or carried over a network, wired, wireless, or other communication medium. When the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The apparatus and methods of the invention may be included in a semiconductor intellectual property core, such as a microcontroller core (embodied in HDL), and transformed into a hardware product as an integrated circuit. In addition, the apparatus and methods described in the embodiments of the invention may be embodied as a combination of hardware and software. The scope of protection of the invention is therefore defined by its claims. In particular, the invention may be implemented in a microprocessor device and used in a general-purpose processor. Finally, those skilled in the art may, based on the disclosed concepts and specific embodiments, make changes and refinements to achieve the same purposes of the invention without departing from its spirit and scope.
Claims (23)
1. A microprocessor, comprising:
A queue comprising a plurality of entries, each entry configured to hold store information of a store instruction, wherein the store information specifies sources of a plurality of operands used to calculate a store address, and wherein the store instruction specifies store data to be stored to a memory location defined by the store address; and
A control logic unit, coupled to the queue, configured to receive a load instruction, the load instruction comprising load information that specifies sources of a plurality of operands used to calculate a load address, wherein the control logic unit is configured to detect that the load information matches the store information held in a valid one of the queue entries, and to correspondingly predict that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
2. The microprocessor as claimed in claim 1, wherein the control logic unit is configured to predict that the microprocessor should forward the store data to the load instruction before the microprocessor calculates the load address.
3. The microprocessor as claimed in claim 1, wherein the queue is configured to hold the store information for each of a plurality of the store instructions, wherein if the control logic unit detects that the load information matches the store information held in a valid one of the queue entries, the control logic unit predicts that the microprocessor should forward to the load instruction the store data specified by the newest store instruction whose store information matches the load information.
4. The microprocessor as claimed in claim 1, wherein each entry of the queue is configured to hold a reorder buffer index of the store instruction, wherein the control logic unit is configured to predict that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information by outputting the reorder buffer index of that store instruction.
5. The microprocessor as claimed in claim 4, further comprising:
A load unit configured to execute the load instruction; and
A store queue, coupled to the load unit, configured to hold, for each of a plurality of store instructions, store data waiting to be written to memory, wherein the store queue is configured to determine whether the reorder buffer index of the store instruction predicted by the control logic unit matches the reorder buffer index of any of the store instructions in the store queue, and to forward to the load unit the store data of the newest one of the store instructions whose reorder buffer index matches the predicted reorder buffer index.
6. The microprocessor as claimed in claim 5, wherein the store queue determines whether the reorder buffer index of the store instruction predicted by the control logic unit matches the reorder buffer index of any of the store instructions in the store queue generally concurrently with the load unit calculating the load address using the operands from the sources specified by the load information.
7. The microprocessor as claimed in claim 1, wherein the store information and the load information comprise at least one identifier of a register of the microprocessor, the register holding a source operand used to calculate the store address.
8. The microprocessor as claimed in claim 7, wherein the store information and the load information further comprise a displacement used to calculate the store address.
9. The microprocessor as claimed in claim 1, wherein the control logic unit is configured to receive instructions in program order, wherein for each store instruction received by the control logic unit, the control logic unit allocates one of the entries of the queue for the store instruction and populates the allocated entry with the store information.
10. The microprocessor as claimed in claim 9, wherein the control logic unit is configured to mark the allocated entry valid after populating it with the store information.
11. The microprocessor as claimed in claim 10, wherein, in response to receiving an instruction that modifies one or more of the sources of the operands specified by one or more of the queue entries, the control logic unit is configured to mark each of the one or more queue entries invalid.
12. The microprocessor as claimed in claim 1, wherein, in response to retirement of one of the plurality of store instructions in the queue, the control logic unit is configured to mark the queue entry of the retired store instruction invalid.
13. A storage method for forwarding store data in a microprocessor, the method comprising the following steps:
Receiving a stream of instructions in program order and, for each store instruction received in the stream, allocating one of a plurality of entries of a queue and populating the allocated entry with store information, wherein the store information specifies sources of a plurality of operands used to calculate a store address, and wherein the store instruction specifies store data to be stored to a memory location defined by the store address;
Receiving a load instruction in the stream, the load instruction comprising load information that specifies sources of a plurality of operands used to calculate a load address, and detecting that the load information matches the store information held in a valid one of the queue entries; and
In response to the detecting step, predicting that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
14. The storage method as claimed in claim 13, further comprising:
Calculating the load address using the operands from the sources specified by the load information;
Wherein the predicting that the microprocessor should forward the store data to the load instruction is performed before the calculating of the load address.
15. The storage method as claimed in claim 13, further comprising:
Holding the store information for each of the store instructions;
Wherein the detecting step comprises detecting that the load information matches the store information held in a valid one of the queue entries;
Wherein the predicting step comprises predicting that the microprocessor should forward to the load instruction the store data specified by the newest store instruction whose store information matches the load information.
16. The storage method as claimed in claim 13, wherein the step of populating the allocated entry with the store information comprises populating the allocated entry with a reorder buffer index of the store instruction, and wherein the predicting step comprises outputting the reorder buffer index of the store instruction whose store information matches the load information.
17. The storage method as claimed in claim 16, further comprising:
Holding, for each of a plurality of store instructions, store data waiting to be written to memory;
Determining whether the predicted reorder buffer index of the store instruction matches the reorder buffer index of any of the store instructions; and
Forwarding to the load instruction the store data of the newest one of the store instructions whose reorder buffer index matches the predicted reorder buffer index.
18. The storage method as claimed in claim 17, further comprising:
Calculating the load address using the operands from the sources specified by the load information, generally concurrently with the determining of whether the predicted reorder buffer index of the store instruction matches the reorder buffer index of any of the store instructions in the store queue.
19. The storage method as claimed in claim 13, wherein the store information and the load information comprise an identifier of at least one register of the microprocessor, the register holding a source operand used to calculate the store address.
20. The storage method as claimed in claim 19, wherein the store information and the load information further comprise a displacement used to calculate the store address.
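Claims 19 and 20 describe the compared information as register identifiers plus a displacement. A minimal sketch, assuming a single base register and hypothetical names (`AddressSources`, `effective_address`, the register names and values), shows why matching sources imply matching addresses, and why the comparison yields only a prediction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressSources:
    base_reg: str       # identifier of the register holding a source operand
    displacement: int   # displacement added to the register value


def effective_address(src: AddressSources, regs: dict) -> int:
    # Address calculated from the operands named by the source information.
    return regs[src.base_reg] + src.displacement


regs = {"rbp": 0x7000, "rsp": 0x6FF0}   # assumed register contents

store_src = AddressSources("rbp", -8)
load_src = AddressSources("rbp", -8)    # same sources as the store
other_src = AddressSources("rsp", 8)    # different sources, same address

# Identical sources give identical addresses while rbp is unchanged.
same_sources = store_src == load_src
same_address = effective_address(store_src, regs) == effective_address(load_src, regs)

# A load through other_src reaches the same location, but a comparison
# of source information alone does not detect the overlap.
sources_differ = store_src != other_src
aliased = effective_address(store_src, regs) == effective_address(other_src, regs)
```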
21. The storage method as claimed in claim 20, further comprising:
After populating the allocated entry with the store information, marking the allocated entry as valid.
22. The storage method as claimed in claim 21, further comprising:
Receiving, in the instruction stream, an instruction that modifies one or more of the sources of operands specified by one or more of the queue entries; and
In response to receiving the instruction in the stream, marking each of the one or more queue entries as invalid.
23. The storage method as claimed in claim 13, further comprising:
Retiring one of the plurality of store instructions of the queue; and
In response to retiring the one of the plurality of store instructions, marking the queue entry of the retired store instruction as invalid.
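Taken together, claims 21 to 23 describe when a queue entry becomes valid and when it is invalidated. The following Python sketch models that life cycle. The class `Entry` and the functions `populate`, `on_register_write` and `on_store_retire` are hypothetical names, and the source information is reduced to a set of register identifiers for brevity.

```python
class Entry:
    def __init__(self, src_regs, rob_index):
        self.src_regs = frozenset(src_regs)  # registers used for the store address
        self.rob_index = rob_index           # reorder buffer index of the store
        self.valid = False


def populate(queue, src_regs, rob_index):
    entry = Entry(src_regs, rob_index)
    entry.valid = True          # marked valid once populated (claim 21)
    queue.append(entry)
    return entry


def on_register_write(queue, dest_reg):
    # An instruction modifies a register that some entries use as a source:
    # those entries no longer describe the store address (claim 22).
    for entry in queue:
        if entry.valid and dest_reg in entry.src_regs:
            entry.valid = False


def on_store_retire(queue, rob_index):
    # The store instruction has retired, so its entry is invalidated (claim 23).
    for entry in queue:
        if entry.valid and entry.rob_index == rob_index:
            entry.valid = False


queue = []
a = populate(queue, {"r1"}, rob_index=3)
b = populate(queue, {"r2", "r4"}, rob_index=5)
on_register_write(queue, "r4")   # invalidates b
on_store_retire(queue, 3)        # invalidates a
c = populate(queue, {"r1"}, rob_index=8)
valid_flags = [entry.valid for entry in queue]
```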
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US23325909P | 2009-08-12 | 2009-08-12 | |
US61/233,259 | 2009-08-12 | | |
US12/781,274 | 2010-05-17 | | |
US12/781,274 US8533438B2 (en) | 2009-08-12 | 2010-05-17 | Store-to-load forwarding based on load/store address computation source information comparisons |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101901132A | 2010-12-01 |
CN101901132B | 2013-08-21 |
Family
ID=43226689
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201010247338 Active CN101901132B (en) | 2009-08-12 | 2010-08-05 | Microprocessor and correlation storage method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101901132B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101996351B1 (en) | 2012-06-15 | 2019-07-05 | Intel Corporation | A virtual load store queue having a dynamic dispatch window with a unified structure |
EP2862068B1 (en) | 2012-06-15 | 2022-07-06 | Intel Corporation | Reordered speculative instruction sequences with a disambiguation-free out of order load store queue |
CN104583943B (en) | 2012-06-15 | 2018-06-08 | Intel Corporation | A virtual load store queue having a dynamic dispatch window with a distributed structure |
KR101826399B1 (en) | 2012-06-15 | 2018-02-06 | Intel Corporation | An instruction definition to implement load store reordering and optimization |
KR101996462B1 (en) | 2012-06-15 | 2019-07-04 | Intel Corporation | A disambiguation-free out of order load store queue |
KR101667167B1 (en) | 2012-06-15 | 2016-10-17 | Soft Machines, Inc. | A method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization |
CN104375895B (en) * | 2013-08-13 | 2018-02-06 | Huawei Technologies Co., Ltd. | Data storage scheduling method and apparatus among multiple memories |
US20160077836A1 (en) * | 2014-09-12 | 2016-03-17 | Qualcomm Incorporated | Predicting literal load values using a literal load prediction table, and related circuits, methods, and computer-readable media |
CN107204940B (en) * | 2016-03-18 | 2020-12-08 | Huawei Technologies Co., Ltd. | Chip and transmission scheduling method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5809275A (en) * | 1996-03-01 | 1998-09-15 | Hewlett-Packard Company | Store-to-load hazard resolution system and method for a processor that executes instructions out of order |
US7321964B2 (en) * | 2003-07-08 | 2008-01-22 | Advanced Micro Devices, Inc. | Store-to-load forwarding buffer using indexed lookup |
US7219185B2 (en) * | 2004-04-22 | 2007-05-15 | International Business Machines Corporation | Apparatus and method for selecting instructions for execution based on bank prediction of a multi-bank cache |
US7752393B2 (en) * | 2006-11-16 | 2010-07-06 | International Business Machines Corporation | Design structure for forwarding store data to loads in a pipelined processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |