CN101901132A - Microprocessor and correlation storage method - Google Patents


Info

Publication number
CN101901132A
Authority
CN
China
Prior art keywords
mentioned
written
instruction
microprocessor
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010102473389A
Other languages
Chinese (zh)
Other versions
CN101901132B (en)
Inventor
罗德尼·E·虎克
柯林·艾迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/781,274 external-priority patent/US8533438B2/en
Application filed by Via Technologies Inc filed Critical Via Technologies Inc
Publication of CN101901132A publication Critical patent/CN101901132A/en
Application granted granted Critical
Publication of CN101901132B publication Critical patent/CN101901132B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Advance Control (AREA)

Abstract

The invention provides a microprocessor. The microprocessor includes a queue having a plurality of entries, each of which holds store information for a store instruction. The store information specifies the sources of the operands used to calculate a store address. The store instruction specifies store data to be stored to the memory location identified by the store address. The microprocessor also includes control logic, coupled to the queue, that receives a load instruction. The load instruction includes load information that specifies the sources of the operands used to calculate a load address. The control logic detects that the load information matches the store information held in a valid one of the queue entries and, in response, predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.

Description

Microprocessor and correlation storage method
Technical field
The present invention relates to the field of microprocessors, and in particular to store-to-load forwarding mechanisms in a microprocessor.
Background
Programs frequently use store and load instructions. A store instruction moves data from a register of the microprocessor to memory, and a load instruction moves data from memory to a register of the microprocessor. Microprocessors frequently execute instruction streams in which one or more store instructions precede a load instruction, and the data for the load instruction is at the same memory location as the data of one or more of the preceding store instructions. In such cases, to execute the program correctly, the microprocessor must ensure that the load instruction receives the store data produced by the newest preceding store instruction. One way to achieve correct execution is to stall the load instruction until the store instruction has written its data to memory (that is, system memory or cache memory), and then have the load instruction read the data from memory. However, this is not a high-performance solution. Therefore, modern microprocessors transfer the store data from the functional unit in which the store instruction resides (for example, a store queue) to the functional unit in which the load instruction resides (for example, a load unit). This is commonly referred to as a store forwarding operation, store forwarding, or store-to-load forwarding.
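The forwarding behaviour described above can be illustrated with a minimal Python sketch. It is a toy model, not the patented mechanism: the names `StoreEntry` and `load`, the oldest-to-newest queue ordering, and the default value of 0 for unwritten memory are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class StoreEntry:
    address: int   # memory address the store will write
    data: int      # value still waiting to be written to memory

def load(address, store_queue, memory):
    """Return the value a load should observe.

    store_queue is assumed to be ordered oldest to newest and to hold only
    stores that are older than the load and not yet written to memory.
    """
    # Search newest-first so the load receives the most recent store data.
    for entry in reversed(store_queue):
        if entry.address == address:
            return entry.data          # forwarded from the store queue
    return memory.get(address, 0)      # no collision: read memory instead

memory = {0x100: 1}
queue = [StoreEntry(0x100, 7), StoreEntry(0x200, 8), StoreEntry(0x100, 9)]
print(load(0x100, queue, memory))  # taken from the newest matching store
print(load(0x300, queue, memory))  # no matching store, so memory is read
```

Searching newest-first is what guarantees that the load sees the data of the newest preceding store when several older stores target the same location.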
To detect whether it needs to forward store data to a load instruction, the microprocessor compares the load memory address with the store memory addresses of older store instructions to see whether they match. To be fully correct, the microprocessor must compare the physical address of the load with the physical address of the store. However, translating the load virtual address into the load physical address takes considerable time. Therefore, to avoid delaying the address comparison, modern microprocessors compare the load virtual address with the older store virtual addresses in parallel with the translation of the load virtual address into the load physical address, and forward store data based on the virtual address comparison. The microprocessor then performs the physical address comparison either to verify that the store forwarding performed on the basis of the virtual address comparison was correct, or to determine that the forwarding was incorrect and correct the mistake by replaying the load.
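As a rough illustration of forwarding on a virtual address comparison and verifying afterwards with physical addresses, the following Python sketch uses a toy page table. The page size, the dictionary-based page table, and the function names are hypothetical and are not taken from the patent.

```python
PAGE_SIZE = 4096  # assumed page size for the toy model

def translate(va, page_table):
    """Map a virtual address to a physical address via a toy page table."""
    vpn, offset = divmod(va, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset

def execute_load(load_va, store_va, page_table):
    # Early decision, made from the virtual addresses alone.
    forwarded = (load_va == store_va)
    # Later verification, made once both physical addresses are known.
    should_forward = (translate(load_va, page_table)
                      == translate(store_va, page_table))
    return "ok" if forwarded == should_forward else "replay"

# Virtual pages 1 and 2 both map to physical frame 5 (an alias).
page_table = {1: 5, 2: 5, 3: 6}
print(execute_load(1 * PAGE_SIZE + 8, 1 * PAGE_SIZE + 8, page_table))
print(execute_load(1 * PAGE_SIZE + 8, 2 * PAGE_SIZE + 8, page_table))
```

In the second call the virtual addresses differ, so nothing is forwarded, yet the physical addresses are identical; the later check detects the mistake and the load has to be replayed.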
Furthermore, because comparing full virtual addresses is time-consuming (and costly in power and die real estate) and may limit the maximum clock frequency at which the microprocessor can operate, modern microprocessors tend to compare only a portion of the virtual address rather than the whole virtual address. This increases the number of false store collision detections and incorrect forwards. Therefore, what is needed is a more accurate way to detect store collisions for the purpose of store forwarding.
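The effect of comparing only part of the virtual address can be seen in a short Python sketch. The choice of 12 low-order bits and the example addresses are assumptions chosen purely to produce a false match.

```python
PARTIAL_BITS = 12  # hypothetical number of low-order bits that are compared

def partial_match(load_va, store_va, bits=PARTIAL_BITS):
    """Compare only the low-order bits of two virtual addresses."""
    mask = (1 << bits) - 1
    return (load_va & mask) == (store_va & mask)

def full_match(load_va, store_va):
    """Compare the complete virtual addresses."""
    return load_va == store_va

load_va = 0x0001_2340
store_alias = 0x0007_2340   # differs from load_va only above bit 12

# The partial comparison reports a collision that the full one rejects.
print(partial_match(load_va, store_alias), full_match(load_va, store_alias))
```

Any two addresses that agree in the compared bits but differ elsewhere produce such a false collision, which is the accuracy cost the paragraph above refers to.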
Additionally, the time required to perform store forwarding with schemes based on virtual address comparison may be hidden by the virtual-to-physical address translation time (that is, the TLB lookup time) and by the cache tag and data array lookup time. However, to the extent that this ceases to be true, an alternative scheme is needed to detect store collisions for the purpose of store forwarding.
Finally, store collision detection schemes based on virtual address comparison require a relatively large number of address comparators, which consume a relatively large amount of microprocessor die area and power. Therefore, what is needed is a way to detect store collisions for the purpose of store forwarding that uses less die real estate and power.
Summary of the invention
An embodiment of the invention provides a microprocessor. The microprocessor includes a queue having a plurality of entries, each of which holds store information for a store instruction. The store information specifies the sources of the operands used to calculate a store address. The store instruction specifies store data to be stored to the memory location identified by the store address. The microprocessor also includes control logic, coupled to the queue, that receives a load instruction. The load instruction includes load information that specifies the sources of the operands used to calculate a load address. The control logic detects that the load information matches the store information held in a valid one of the queue entries and, in response, predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
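The comparison of address sources summarized above can be sketched in Python as follows. The field names follow the entry fields listed in the reference numerals for Fig. 4 (valid bit, srcA, srcB, displacement, displacement valid bit, index), but representing sources as register names, modelling a cleared displacement valid bit as `None`, and preferring the newest matching entry are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AddressSources:
    src_a: Optional[str]         # register supplying the first address operand
    src_b: Optional[str]         # register supplying the second address operand
    displacement: Optional[int]  # None models a cleared displacement valid bit

@dataclass
class QueueEntry:
    valid: bool
    sources: AddressSources
    rob_index: int               # identifies the store instruction

def predict_forwarding(load_sources, queue):
    """Return the ROB index of a valid store whose sources match, else None."""
    # Assumption: the queue is ordered oldest to newest; prefer the newest.
    for entry in reversed(queue):
        if entry.valid and entry.sources == load_sources:
            return entry.rob_index
    return None

queue = [
    QueueEntry(True, AddressSources("ebp", None, -8), rob_index=3),
    QueueEntry(False, AddressSources("esi", "edi", None), rob_index=4),
    QueueEntry(True, AddressSources("ebx", None, 16), rob_index=5),
]
print(predict_forwarding(AddressSources("ebp", None, -8), queue))
print(predict_forwarding(AddressSources("esi", "edi", None), queue))
```

No address is ever computed here: two instructions that name the same registers and displacement are predicted to reference the same location. A real design would also have to account for a source register being overwritten between the store and the load, which this sketch does not model.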
Another embodiment of the invention provides a method for forwarding store data in a microprocessor. The method includes receiving a stream of instructions in program order. For each store instruction in the received stream, the method allocates one of a plurality of entries of a queue and populates the allocated entry with store information. The store information specifies the sources of the operands used to calculate a store address. The store instruction specifies store data to be stored to the memory location identified by the store address. The method also includes receiving a load instruction in the stream. The load instruction includes load information that specifies the sources of the operands used to calculate a load address. The method also includes detecting that the load information matches the store information held in a valid one of the queue entries and, in response, predicting that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
To make the above and other objects, features, and advantages of the invention easier to understand, preferred embodiments are set out below and described in detail with reference to the accompanying drawings.
Description of drawings
Fig. 1 is a block diagram of a microprocessor according to an embodiment of the invention.
Fig. 2 is a more detailed block diagram of the load unit and store queue pipelines of the microprocessor of Fig. 1.
Fig. 3 is a detailed block diagram of the load unit and store queue pipelines of a conventional microprocessor.
Fig. 4 is a block diagram of an entry of the forwarding address source queue (FASQ) of Fig. 1 according to an embodiment of the invention.
Fig. 5 is a flowchart of the operation of the RAT of Fig. 1.
Fig. 6 is a flowchart of the operation of the microprocessor of Fig. 1.
Fig. 7 is a flowchart of the operation of the microprocessor of Fig. 1 to forward data from a store instruction to a load instruction based on an address source comparison.
Fig. 8 is a block diagram of an entry of the forwarding replay history queue (FRHQ) of Fig. 1.
Fig. 9 is a flowchart of the operation of the microprocessor of Fig. 1 to allocate and populate entries of the FRHQ of Fig. 8.
Fig. 10 is a flowchart of the operation of the microprocessor of Fig. 1 to use the entries in the FRHQ.
Fig. 11 is a flowchart of the operation of the microprocessor of Fig. 1.
Description of reference numerals
100~microprocessor; 106~instruction cache; 108~instruction decoder;
134~register alias table; 136~reservation stations; 138~execution units;
158~dependency information;
162~architectural registers; 164~result; 166, 168~status signals;
172~reorder buffer; 176~forwarding paths;
182~memory subsystem; 183~store unit; 184~store queue; 185~load unit; 186~data cache; 188~dependency generator;
192~forwarding address source queue (FASQ); 194~forwarding replay history queue (FRHQ); 195~load instruction address operands; 196~store forwarding predictor; 197~load instruction; 198~ROB index of matching store (RIOMS); 199~ROB index of matching store (RIOMS);
202~store ROB index; 204~ROB index comparators; 206~matching ROB index entry indicator;
226~store data; 228~multiplexer; 222~address generator; 224~load virtual address;
246~translation lookaside buffer (TLB); 248~load physical address;
262~cache data array; 263~cache tag array; 264~cache data; 265~forwarded data; 266~multiplexer; 267~store physical address; 268~physical address comparators; 269~physical address match indicator;
286~control logic;
302~store virtual address; 304~virtual address comparators; 306~virtual address match indicator;
402~entry; 404~valid bit; 406~srcA field; 408~srcB field; 412~displacement field; 414~displacement valid bit; 416~index field;
504, 506, 508, 512, 514, 516, 518, 522~steps;
602, 604, 606~steps;
702, 704, 706, 708, 712, 714, 716, 718, 722, 724, 726, 728, 732, 734, 736, 738~steps;
802~entry; 804~valid bit; 806~instruction pointer; 808~ROB index delta field;
902, 904, 906, 908~steps;
1002, 1004, 1006~steps;
1102, 1104, 1106~steps.
Detailed description of the embodiments
The embodiments described below provide two basic solutions, each of which solves one or more of the prior-art problems described above.
The first solution is to compare information that identifies the sources used to calculate the load and store addresses, rather than comparing the addresses themselves. The advantages of this approach are that it removes the virtual address calculation from the critical path of the store forwarding determination and may use fewer and/or smaller comparators, which can save die real estate and power.
The second solution is to maintain a recent history of replayed load instructions and, based on that replay history, to predict the store instruction whose data should be forwarded to a load instruction. The advantages of this approach are that, in at least one embodiment, it can reduce the store forwarding time by removing the virtual address calculation time from the store forwarding determination path and by comparing fewer bits than schemes based on virtual address comparison. This approach may also use fewer and/or smaller comparators, which can save die real estate and power. Finally, this solution can detect store collisions for the purpose of store forwarding more accurately than schemes based on virtual address comparison.
The two solutions may also be used in combination. Both are described below.
Generally speaking, the microprocessor 100 (see Fig. 1) executes load instructions speculatively. That is, the microprocessor 100 assumes that a load instruction will hit in the cache and allows it to issue without regard to whether it depends on an older store instruction that may hold the load data; if the load instruction subsequently misses, the microprocessor 100 replays it. When the store address of an older store instruction is not available for comparison with the load instruction, the load unit completes the load instruction to the reorder buffer (ROB) 172 (see Fig. 1); however, when the older store instruction is about to retire, it checks the load queue and detects newer load instructions that needed its address but did not get it, and the ROB 172 therefore replays the load instruction. That is, the load instruction is replayed indirectly, rather than directly as in the case where the load unit detects the error and immediately signals a miss to the ROB 172. The load instruction may miss because the load data is not in the microprocessor 100, in which case the load data must be obtained from memory. The load instruction may also miss because the load data is on the machine (in the store queue) but was not forwarded from the older store. This may happen for several reasons: (1) when the load instruction was issued to the load unit pipeline, the microprocessor 100 did not have the store address to compare with the load instruction, so it could not compare addresses to detect the need to forward; (2) the microprocessor 100 detected an address collision but did not yet have the store data to forward; (3) the microprocessor 100 forwarded the wrong data (it falsely detected a collision or failed to detect a valid collision).
The first two causes arise because load instructions are allowed to issue out of order; that is, the load instruction is issued before the store instruction has been issued and has generated its address and data. The microprocessor 100 issues the load instruction out of order because the load address is not calculated until the load instruction reaches the load unit, and therefore the register alias table (RAT) 134 (see Fig. 1) does not know the load address and cannot generate a dependency from it. That is, the RAT 134 generates dependencies based on register operands, not memory operands.
The microprocessor 100 attempts to improve on this by modifying the RAT 134 to create an enhanced dependency for the load instruction, making it depend on the store instruction (or on some instruction on which the store instruction depends), so that the load instruction is not issued until the data can be properly forwarded to it. However, this does not address the third cause. That is, even if the RAT 134 delays the issue of the load instruction so that the memory subsystem at least has a chance to forward correctly to it, the memory subsystem must still correctly detect collisions and forward the correct data.
The microprocessor 100 employs the two store collision detection/store forwarding prediction schemes mentioned above. They are similar to the two schemes the RAT 134 uses for issue scheduling, but here they serve the purpose of store forwarding rather than scheduling the issue of load instructions. The scheme based on address source comparison predicts the need for store forwarding by comparing the sources of the load address calculation with the sources of the store address calculation, rather than comparing the addresses themselves, as described in more detail below. The scheme based on replay history keeps a history of the instruction pointers (IP) 806 (see Fig. 8) of load instructions that were replayed for forwarding-related reasons, together with information that identifies the store data that should have been forwarded; when the microprocessor 100 sees the IP of the load instruction again, it forwards from the matching store, as described below.
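The replay-history scheme can be sketched in Python as a small table keyed by load IP. The entry contents follow the Fig. 8 reference numerals (instruction pointer and ROB index delta), while the class name, the dictionary storage, the ROB size of 48, and the convention that the store sits `delta` entries before the load are assumptions of this sketch.

```python
ROB_SIZE = 48  # hypothetical number of reorder buffer entries

class ReplayHistory:
    """Toy replay history: load instruction pointer -> ROB index delta."""

    def __init__(self):
        self._delta_by_ip = {}

    def record_replay(self, load_ip, delta):
        # Called when a load was replayed for a forwarding-related reason.
        self._delta_by_ip[load_ip] = delta

    def predict_store(self, load_ip, load_rob_index):
        """Return the predicted ROB index of the matching store, or None."""
        delta = self._delta_by_ip.get(load_ip)
        if delta is None:
            return None
        # Assumed convention: the store is `delta` entries before the load.
        return (load_rob_index - delta) % ROB_SIZE

history = ReplayHistory()
history.record_replay(load_ip=0x401000, delta=3)
print(history.predict_store(0x401000, load_rob_index=10))
print(history.predict_store(0x401000, load_rob_index=2))
print(history.predict_store(0x402000, load_rob_index=10))
```

Storing a delta rather than an absolute ROB index lets the prediction stay meaningful when the same load is encountered again at a different position in the reorder buffer.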
The load issue scheduling invention does not cover the situation in which the memory subsystem detects that a load instruction completed with incorrect data because of an inaccurate address comparison (that is, virtual rather than physical addresses, and/or less than the full virtual address, were used); rather, it covers the situations in which the store address or store data was not available. This is because creating an enhanced dependency would not help in the case of an inaccurate address comparison. However, for the purpose of store forwarding, it is helpful to include an embodiment in which the replay history covers this case. As described below, the forwarding replay history queue (FRHQ) 194 (see Fig. 1) is populated whenever a load instruction has to be replayed for any forwarding-related reason. Note that an inaccurate address comparison can produce both (1) false collision detections (the virtual index/hash matched but the physical address did not) and (2) missed collisions (the virtual index/hash did not match but the physical address did).
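The two kinds of inaccuracy named above amount to the two ways a quick comparison can disagree with the physical address comparison. A minimal Python sketch, with a hypothetical function name and return labels:

```python
def classify_comparison(virtual_hash_match: bool, physical_match: bool) -> str:
    """Label the outcome of an early comparison against the physical check."""
    if virtual_hash_match and not physical_match:
        return "false collision"    # forwarded although it should not have
    if not virtual_hash_match and physical_match:
        return "missed collision"   # did not forward although it should have
    return "accurate"

for early, physical in [(True, True), (True, False), (False, True), (False, False)]:
    print(early, physical, classify_comparison(early, physical))
```

Both disagreeing outcomes leave the load with wrong data, so either one is a forwarding-related reason for a replay.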
Referring to Fig. 1, a block diagram of a microprocessor 100 according to an embodiment of the invention is shown.
In one embodiment, the macroarchitecture of the microprocessor 100 is an x86 macroarchitecture. A microprocessor has an x86 macroarchitecture if it can correctly execute a majority of the application programs designed to run on an x86 processor. An application program is correctly executed if its expected results are obtained. In particular, the microprocessor 100 executes instructions of the x86 instruction set and includes the x86 user-visible register set. However, the store forwarding mechanisms described herein may also be employed in microprocessors of other architectures, both existing and future.
The microprocessor 100 includes an instruction cache 106 that caches program instructions from a system memory (not shown). The microprocessor 100 also includes an instruction decoder 108 that receives instructions from the instruction cache 106 and decodes them. In one embodiment, the instruction decoder 108 includes an instruction translator that translates macroinstructions of a macroinstruction set of the microprocessor 100 (for example, the x86 instruction set architecture) into microinstructions of a microinstruction set of the microprocessor 100. In particular, the instruction decoder 108 translates memory access instructions, such as the x86 MOV, PUSH, POP, CALL, and RET instructions, into a sequence of microinstructions that includes one or more load or store microinstructions, referred to herein simply as a load instruction or a store instruction. In other embodiments, the load and store instructions are part of the native instruction set of the microprocessor 100.
The microprocessor 100 also includes a register alias table (RAT) 134 coupled to the instruction decoder 108; a plurality of reservation stations 136 coupled to the RAT 134; a reorder buffer (ROB) 172 coupled to the RAT 134 and the reservation stations 136; execution units 138 coupled to the reservation stations 136 and the ROB 172; and architectural registers 162 coupled to the ROB 172 and the execution units 138.
The execution units 138 include a memory subsystem 182, which includes a load unit 185 that executes load instructions, a store unit 183 that executes store instructions, and a store queue 184 that holds executed store instructions waiting to be written to memory (for example, to the data cache 186 coupled to the memory subsystem 182). In addition, the memory subsystem 182 communicates with a bus interface unit (not shown) to read data from and write data to a system memory. Although the memory subsystem 182 may receive load and store instructions to execute out of program order, it correctly resolves store collisions. That is, the memory subsystem 182 ensures that each load instruction receives the correct data, in particular, in the case of a store collision, from the correct store instruction (or store instructions, when multiple store instructions supply the data specified by a single load instruction). More specifically, the embodiments described herein attempt to improve the accuracy of forwarding store data from the store queue 184 to the load unit 185. When necessary, the memory subsystem 182 generates a replay indication to the ROB 172 on a status signal 166 to request that the ROB 172 replay a load instruction, to ensure that it receives the correct data. The load unit 185 may also replay load instructions internally when needed. The execution units 138 also include other execution units (not shown) that execute non-memory-access instructions, such as integer execution units, floating-point units, and multimedia units.
The RAT 134 receives decoded instructions from the instruction decoder 108 in program order and determines the dependencies of each instruction on other unretired instructions in the microprocessor 100. The RAT 134 stores register renaming information associated with each unretired instruction in the microprocessor 100. The register renaming information incorporates the program order of the instructions. In addition, the RAT 134 includes a complex state machine that controls various actions of the microprocessor 100 in response to the register renaming information and its other inputs.
The RAT 134 includes a dependency generator 188 that generates dependency information 158 for each instruction based on its program order, the operand sources it specifies, and the register renaming information. The dependency information 158 includes an identifier for each input operand of the instruction, namely an identifier of the instruction on which that input operand depends, if any. In one embodiment, the identifier is an index into the ROB 172 that identifies the ROB 172 entry holding the instruction depended upon and its related status information, as described below.
The RAT 134 includes a store forwarding predictor 196 that predicts when a load instruction collides with an older store instruction, such that store data must be forwarded to it from the older store. In particular, the RAT 134 generates the ROB index of the predicted older store instruction, referred to herein as the ROB index of matching store (RIOMS) 198. The RAT 134 provides the RIOMS 198 to the reservation stations 136 along with the load instruction and the dependency information 158.
The RAT 134 includes a plurality of queues that it uses to make store forwarding predictions. These queues include a forwarding address source queue (FASQ) 192 and a forwarding replay history queue (FRHQ) 194, whose entries are described in detail below with respect to Fig. 4 and Fig. 8, respectively.
The RAT 134 dispatches the decoded instructions, together with their associated dependency information 158 and RIOMS 198, to the reservation stations 136. Before dispatching an instruction, the RAT 134 allocates an entry for it in the ROB 172. Thus the instructions are allocated in program order into the ROB 172, which is configured as a circular queue. This enables the ROB 172 to guarantee that the instructions are retired in program order. The RAT 134 also provides the dependency information 158 to the ROB 172 for storage in the instruction's entry. When the ROB 172 replays an instruction, such as a load instruction, it provides the dependency information 158 stored in the ROB entry to the reservation stations 136 during the replay of the instruction.
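The circular-queue behaviour of the ROB described above can be sketched in Python as follows. The class name, the head/tail bookkeeping, and the error handling are assumptions made for this illustration; only the in-order allocation, the in-order retirement, and the stored dependency information come from the text.

```python
class ReorderBuffer:
    """Toy circular queue: allocate in program order, retire in program order."""

    def __init__(self, size):
        self.entries = [None] * size
        self.head = 0    # index of the oldest unretired instruction
        self.tail = 0    # index of the next free entry
        self.count = 0

    def allocate(self, instruction, dependency_info):
        if self.count == len(self.entries):
            raise RuntimeError("ROB full")
        index = self.tail
        self.entries[index] = (instruction, dependency_info)
        self.tail = (self.tail + 1) % len(self.entries)   # wrap around
        self.count += 1
        return index     # the ROB index that identifies the instruction

    def retire(self):
        if self.count == 0:
            raise RuntimeError("ROB empty")
        instruction, _ = self.entries[self.head]
        self.entries[self.head] = None
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return instruction

rob = ReorderBuffer(3)
indexes = [rob.allocate(name, {}) for name in ("store", "add", "load")]
first = rob.retire()                   # the oldest instruction leaves first
reused = rob.allocate("sub", {})       # the freed entry is reused after wrap
print(indexes, first, reused)
```

Because entries are handed out in program order and only the head entry may retire, retirement order follows program order regardless of the order in which instructions execute.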
The reservation stations 136 include queues that hold the instructions, dependency information 158, and RIOMS 198 received from the RAT 134. The reservation stations 136 also include issue logic that issues the instructions from the queues to the execution units 138 when they are ready to be executed. The execution units 138 may receive the results 164 of executed instructions through the architectural registers 162, through temporary registers (not shown) in the ROB 172 to which the architectural registers 162 are renamed, or directly from the execution units 138 themselves through forwarding paths 176. The execution units 138 also provide their results 164 to the ROB 172 for writing into the temporary registers.
The memory subsystem 182 resolves (that is, calculates) the load address for load instructions and the store address for store instructions using the source operands specified by the load and store instructions. The sources of the operands may be the architectural registers 162, constants, and/or displacements specified by the instruction. The memory subsystem 182 also reads load data from the data cache 186 at the calculated load address and writes store data to the data cache 186 at the calculated store address.
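A minimal Python sketch of resolving an address from its operand sources follows. The text does not give the exact address formula; summing up to two register operands and a displacement, and wrapping to 32 bits, are assumptions of this sketch, as are the register names and values.

```python
ADDRESS_BITS = 32  # assumed address width

def compute_address(registers, src_a=None, src_b=None, displacement=0):
    """Sum the register operands and the displacement, wrapping to the width."""
    total = displacement
    for source in (src_a, src_b):
        if source is not None:
            total += registers[source]
    return total % (1 << ADDRESS_BITS)

registers = {"ebp": 0x1000, "esi": 0x20}
store_address = compute_address(registers, "ebp", "esi", -8)
load_address = compute_address(registers, "ebp", "esi", -8)
print(hex(store_address), store_address == load_address)
```

A store and a load that name the same sources, with no intervening change to those registers, necessarily compute the same address, which is the observation behind comparing sources instead of addresses.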
As described above, in some circumstances the memory subsystem 182 must request a replay of a load instruction, which it indicates through the status signal 166 provided to the ROB 172. The status signal 166 specifies the ROB index of the instruction that must be replayed (for example, a load instruction), so that the ROB 172 can update the entry at that index with an indication of the status of the instruction, including whether a replay is needed. In one embodiment, the status signal 166 also specifies the ROB index of the store instruction whose data should be forwarded to the load instruction. These ROB indexes of the status signal 166 are also provided to the store forwarding predictor 196, which enables the store forwarding predictor 196 to calculate the difference (delta) between the two ROB indexes, as described in detail below. When the instruction whose ROB entry is marked as needing replay becomes the next instruction to be retired, that is, the oldest unretired instruction, the ROB 172 replays it. That is, the ROB 172 re-dispatches the instruction and its associated dependency information 158 from the ROB 172 to the reservation stations 136, where it awaits re-issue to the execution units 138 and re-execution by them. In one embodiment, the ROB 172 replays not only the instruction but also all instructions that depend on its result. When the ROB 172 replays a load instruction, it also notifies the RAT 134 of this event through the status signal 168. The status signal 168 specifies the ROB index of the load instruction being replayed.
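Because the ROB is a circular queue, the delta between the two ROB indexes has to tolerate wrap-around. The following Python sketch shows one way to compute it; the function name, the direction of the subtraction (load index minus store index), and the ROB size of 48 are assumptions, since this passage does not specify them.

```python
def rob_index_delta(load_rob_index, store_rob_index, rob_size):
    """Distance from the store's ROB entry forward to the load's, modulo size."""
    return (load_rob_index - store_rob_index) % rob_size

# Store at entry 7, load at entry 10: the load is three entries younger.
print(rob_index_delta(10, 7, rob_size=48))
# Store at entry 47, load at entry 2: the same distance across the wrap point.
print(rob_index_delta(2, 47, rob_size=48))
```

The modulo keeps the delta identical for the same pair of instructions wherever they happen to land in the circular buffer.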
Referring to Fig. 2, a detailed block diagram of the pipelines of the load unit 185 and store queue 184 of the microprocessor 100 of Fig. 1 is shown. In the embodiment of Fig. 2, each pipeline comprises six stages, labeled A through F. In stage A, the load unit 185 receives the load instruction address operands 195 and the RIOMS 199 (the RIOMS 198 of Fig. 1).
In stage B, an address generator 222 of the load unit 185 generates the load virtual address 224 from the load instruction address operands 195. Each entry of the store queue 184 holds the store ROB index 202 of the store instruction allocated to that entry. A plurality of ROB index comparators 204 compare the RIOMS 199 of the load instruction with the store ROB indexes 202 to generate a matching ROB index entry indicator 206, which indicates whether any of the store ROB indexes 202 matches the RIOMS 199 and, if so, which entry of the store queue 184 matches.
In stage C, a translation lookaside buffer (TLB) 246 in the load unit 185 looks up the load virtual address 224 and outputs the translated load physical address 248. Each entry of the store queue 184 also holds the store data 226 of the store instruction allocated to it. A multiplexer 228 in the store queue 184 pipeline receives the store data 226 from each store queue 184 entry and selects the store data 226 indicated by the matching ROB index entry indicator 206 to send to the load unit 185 as forwarded data 265.
In stage D, the load physical address 248 is provided to the cache tag array 263 and cache data array 262 of the data cache 186 to obtain cache data 264. A multiplexer 266 in the load unit 185 receives the cache data 264 and receives the forwarded data 265 from the store queue 184, and selects one of them to output as the result 164 of Fig. 1. If the matching ROB index entry indicator 206 indicates a match, the multiplexer 266 selects the forwarded data 265; otherwise, it selects the cache data 264. Each entry of the store queue 184 also holds the store physical address 267 of the store instruction allocated to it. A plurality of physical address comparators 268 compare the load physical address 248 with each store physical address 267 to generate a physical address match indicator 269, which indicates whether any store physical address 267 matches the load physical address 248 and, if so, which entry of the store queue 184 matches.
In stage E, control logic 286 in the store queue 184 pipeline receives the matching ROB index entry indicator 206 and the physical address match indicator 269 and, based on them, generates the status signal 166 of Fig. 1 for the load instruction. The status signal 166 indicates whether the load instruction completed successfully, missed, or must be replayed.
In stage F, the result 164 and the status signal 166 are provided to the ROB 172 and to other units of the microprocessor 100.
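As an informal illustration of stages B through D, the following Python sketch models the ROB index comparison and the final multiplexer choice in software. It is a behavioral sketch only, not the hardware of Fig. 2; the names `StoreQueueEntry` and `select_load_result` are hypothetical and were chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreQueueEntry:
    """Hypothetical software model of one store queue 184 entry."""
    rob_index: int      # store ROB index 202
    data: int           # store data 226
    valid: bool = True


def select_load_result(rioms: Optional[int],
                       store_queue: List[StoreQueueEntry],
                       cache_data: int) -> Tuple[int, bool]:
    """Return (result, forwarded).

    Models comparators 204 and multiplexers 228/266: if the predicted
    RIOMS equals the ROB index of a valid store queue entry, that
    entry's store data is forwarded; otherwise the cache data is used.
    """
    if rioms is not None:
        for entry in store_queue:
            if entry.valid and entry.rob_index == rioms:
                return entry.data, True
    return cache_data, False
```

Note that the selection depends only on ROB indexes, so in this model it never has to wait for a load virtual address.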
Referring to Fig. 3, a detailed block diagram of the pipelines of the load unit 185 and store queue 184 of a conventional microprocessor is shown. The pipelines 185/184 of Fig. 3 are similar to those of Fig. 2, with the following differences. In Fig. 3, the store queue 184 pipeline includes virtual address comparators 304 rather than the ROB index comparators 204 of Fig. 2. The virtual address comparators 304 compare the load virtual address 224 with the store virtual address 302 (or a portion thereof) of each store queue 184 entry to generate a virtual address match indicator 306, rather than the matching ROB index entry indicator 206 of Fig. 2. A comparison of Fig. 2 with Fig. 3 shows that the embodiment of Fig. 2 compares ROB indexes to determine which store data 226, if any, to forward to the load instruction; its advantage over the conventional design of Fig. 3 is that the forwarding decision does not depend on generation of the load virtual address 224.
Referring to Fig. 4, a block diagram of an entry 402 of the forwarding address source queue (FASQ) 192 of Fig. 1 according to an embodiment of the present invention is shown. A FASQ entry 402 holds information about a store instruction encountered by the RAT 134. As described below with respect to Figs. 5 and 6, the RAT 134 allocates, populates, and uses the FASQ entries 402. The FASQ entry 402 includes a valid bit 404 that indicates whether the FASQ entry 402 is valid. In response to a reset, the microprocessor 100 initializes all entries 402 of the FASQ 192 to invalid, that is, it clears the valid bit 404 of each FASQ entry 402. The FASQ entry 402 also includes a srcA field 406 and a srcB field 408, which respectively specify the sources of the first operand and the second operand that the memory subsystem 182 uses to compute the store address of the store instruction. The srcA field 406 and srcB field 408 specify either the architectural register 162 that holds the operand or a constant used as the operand. The FASQ entry 402 also includes a displacement field 412, which holds the displacement specified by the store instruction that the memory subsystem 182 uses to compute its store address. The FASQ entry 402 also includes a displacement valid bit 414, which indicates whether the value of the displacement field 412 is valid. The FASQ entry 402 also includes an index field 416, which holds the ROB index of the store instruction.
Referring to Fig. 5, a flowchart illustrating operation of the RAT 134 of Fig. 1 according to the present invention is shown. Flow begins at step 504.
In step 504, the RAT 134 decodes an instruction and generates the corresponding dependency information 158 of Fig. 1. Flow proceeds to decision step 506.
In decision step 506, the RAT 134 determines whether the decoded instruction is a store instruction. If so, flow proceeds to step 508; otherwise, flow proceeds to decision step 512.
In step 508, the RAT 134 allocates an entry 402 in the FASQ 192. That is, the RAT 134 logically pushes an entry 402 onto the tail of the FASQ 192, which logically pushes out the entry 402 at the head of the FASQ 192. The RAT 134 then populates the srcA field 406, srcB field 408, and displacement field 412 of the allocated entry 402 with the appropriate information from the store instruction. If the store instruction specifies a displacement, the RAT 134 sets the displacement valid bit 414; otherwise, the RAT 134 clears the displacement valid bit 414. The RAT 134 also populates the index field 416 with the ROB index of the store instruction. Finally, the RAT 134 sets the valid bit 404. In one embodiment, a store instruction is actually two separate microinstructions: a store address (STA) microinstruction and a store data (STD) microinstruction. The STA microinstruction is issued to a store address unit of the memory subsystem 182, which computes the store address. The STD microinstruction is issued to a store data unit of the memory subsystem 182, which obtains the store data from the source register and posts it to a store queue 184 entry for subsequent writing to memory. In this embodiment, when the RAT 134 sees the STA microinstruction, it allocates an entry 402 in the FASQ 192 and populates the srcA field 406, srcB field 408, and displacement field 412; when the RAT 134 sees the STD microinstruction, it populates the index field 416 with the ROB index of the STD microinstruction and sets the valid bit 404. Flow returns to step 504.
In decision step 512, the RAT 134 determines whether the decoded instruction is a load instruction. If so, flow proceeds to decision step 514; otherwise, flow proceeds to decision step 518.
In decision step 514, the RAT 134 compares the address sources specified by the load instruction with the store address sources specified by the FASQ 192 entries 402 to determine whether they match any of the entries 402. That is, the RAT 134 compares the first source operand field of the load instruction with the srcA field 406 of each entry 402, the second source operand field of the load instruction with the srcB field 408 of each entry 402, and the displacement field of the load instruction with the displacement field 412 of each entry 402. In one embodiment, the RAT 134 also treats the sources as matching when the load instruction and the store instruction specify the same source registers in swapped order. If all three fields match those of a valid entry 402 in the FASQ 192, and either the load instruction specifies a displacement and the displacement valid bit 414 is set, or the load instruction specifies no displacement and the displacement valid bit 414 is clear, flow proceeds to step 516; otherwise, flow returns to step 504.
In step 516, the RAT 134 predicts that the load instruction should be forwarded data from the older store instruction associated with the matching FASQ 192 entry 402, and responsively outputs the RIOMS 198 of Fig. 1. That is, the RAT 134 outputs the value of the index field 416 of the matching FASQ entry 402 determined in step 514. Flow returns to step 504. In addition, flow proceeds to step 702 of Fig. 7, described below, to execute the load instruction.
In decision step 518, the RAT 134 determines whether the decoded instruction is one that modifies a source specified by the srcA field 406 or srcB field 408 of any entry 402 of the FASQ 192. If so, flow proceeds to step 522; otherwise, flow returns to step 504.
In step 522, the RAT 134 clears the valid bit 404 of each FASQ entry 402 whose srcA field 406 or srcB field 408 specifies a register modified by the instruction identified in decision step 518. The RAT 134 clears the valid bit 404 because, once the register has been modified, the load address and the store address are unlikely to overlap; it is therefore unlikely that the store data of the store instruction specified by the FASQ entry 402 should be forwarded to the load instruction. Flow returns to step 504.
Referring to Fig. 6, a flowchart illustrating operation of the microprocessor 100 of Fig. 1 is shown. Flow begins at step 602.
In step 602, the ROB 172 retires an instruction. Flow proceeds to decision step 604.
In decision step 604, the ROB 172 scans the FASQ 192 to determine whether the index field 416 of any entry 402 matches the index of the instruction being retired by the ROB 172. If so, flow proceeds to step 606; otherwise, flow returns to step 602.
In step 606, the ROB 172 clears the valid bit 404 of the matching FASQ entry 402. This prevents the RAT 134 from generating, for a subsequent load instruction, a RIOMS 198 that refers to a store instruction that has already retired. Flow returns to step 602.
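The FASQ behavior described for Figs. 4 through 6 can be sketched in software as follows. This is a minimal behavioral model under stated assumptions, not the patented circuit: the class and method names (`Fasq`, `on_store`, `predict`, `on_register_write`, `on_retire`) and the default depth of four entries are hypothetical, a displacement of `None` stands in for a clear displacement valid bit 414, and the choice of the newest matching entry follows claim 3.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class FasqEntry:
    src_a: Optional[str]   # srcA field 406 (register name or None)
    src_b: Optional[str]   # srcB field 408
    disp: Optional[int]    # displacement 412; None models valid bit 414 clear
    rob_index: int         # index field 416
    valid: bool = True     # valid bit 404


class Fasq:
    def __init__(self, depth: int = 4):    # depth is an assumed value
        # A bounded deque drops the oldest (head) entry when a new
        # entry is pushed onto a full queue, as in step 508.
        self.entries = deque(maxlen=depth)

    def on_store(self, src_a, src_b, disp, rob_index):
        """Step 508: allocate and populate an entry for a store."""
        self.entries.append(FasqEntry(src_a, src_b, disp, rob_index))

    def predict(self, src_a, src_b, disp) -> Optional[int]:
        """Steps 514/516: return the RIOMS of the newest match, or None."""
        for e in reversed(self.entries):
            if not e.valid:
                continue
            # Same registers in either order count as a match.
            same_regs = (e.src_a, e.src_b) in ((src_a, src_b), (src_b, src_a))
            if same_regs and e.disp == disp:
                return e.rob_index
        return None

    def on_register_write(self, reg: str):
        """Step 522: invalidate entries whose address sources are modified."""
        for e in self.entries:
            if reg in (e.src_a, e.src_b):
                e.valid = False

    def on_retire(self, rob_index: int):
        """Step 606: invalidate the entry of a store that retires."""
        for e in self.entries:
            if e.rob_index == rob_index:
                e.valid = False
```

For example, after `on_store("rbx", "rsi", 8, rob_index=10)`, a load that computes its address from `rbx`, `rsi`, and displacement 8 yields a predicted RIOMS of 10 until either register is written or the store retires.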
Referring to Fig. 7, a flowchart illustrating operation of the microprocessor 100 of Fig. 1 to forward data from a store instruction to a load instruction based on address source comparison is shown. Flow begins at step 702.
In step 702, the reservation station 136 issues a load instruction 197 and its associated RIOMS 198 to the load unit 185. Flow proceeds from step 702 to steps 704 and 712.
In step 704, the load unit 185 receives the load instruction address operands 195. Flow proceeds to step 706.
In step 706, the address generator 222 of the load unit 185 computes the load virtual address 224. Flow proceeds to step 708.
In step 708, the TLB 246 receives the load virtual address 224 and generates the load physical address 248 of Fig. 2. Flow proceeds from step 708 to steps 724 and 736.
In step 712, the load unit 185 sends the RIOMS 199 to the store queue 184. Flow proceeds to step 714.
In step 714, the ROB index comparators 204 of the store queue 184 compare the RIOMS 199 with the store ROB indexes 202 to generate the matching ROB index entry indicator 206. Flow proceeds to decision step 716.
In decision step 716, the store queue 184 examines the matching ROB index entry indicator 206 generated in step 714 to determine whether any of the store ROB indexes 202 matches the RIOMS 199. If at least one matches, flow proceeds to step 718; otherwise, flow proceeds to step 734.
In step 718, the multiplexer 228 selects the store data 226 of the newest store instruction that is older than the load instruction, as indicated by the matching ROB index entry indicator 206, and provides it to the multiplexer 266 as the forwarded data 265. Flow proceeds to step 722.
In step 722, the load unit 185 executes the load instruction 197 using the forwarded data 265 forwarded in step 718. That is, the multiplexer 266 selects the forwarded data 265. Flow proceeds to step 724.
In step 724, the physical address comparators 268 compare the load physical address 248 with the store physical addresses 267 to generate the physical address match indicator 269. Flow proceeds to decision step 726.
In decision step 726, the control logic 286 examines the physical address match indicator 269 generated in step 724 to determine whether the load physical address 248 matches the store physical address 267 of the store instruction whose store data 226 was forwarded to the load instruction 197 in step 718, and whether that store instruction is the newest store instruction whose store physical address 267 matches the load physical address 248. If so, the correct data 265 was forwarded to and used by the load instruction 197, and flow proceeds to step 728; otherwise, incorrect data was forwarded to and used by the load instruction 197, and flow proceeds to step 732.
In step 728, the load unit 185 completes execution of the load instruction 197 by providing the result 164 to the ROB 172 and other units of the microprocessor 100 and indicating a successful completion on the status signal 166. Eventually, when the load instruction 197 becomes the oldest instruction in the microprocessor 100, the ROB 172 retires the load instruction 197. Flow ends at step 728.
In step 732, the control logic 286 generates a status signal 166 indicating that the load instruction 197 must be replayed, and the load unit 185 internally replays the load instruction 197, because the load instruction 197 used incorrect data. In addition, the ROB 172 replays all instructions that depend on the load instruction, because they may have received incorrect data from the earlier result of the load instruction. Flow ends at step 732.
In step 734, the load unit 185 executes the load instruction 197 using the cache data 264, that is, without forwarded store data, because the ROB index comparison of decision step 716 indicated no match. Flow proceeds to step 736.
In step 736, the physical address comparators 268 compare the load physical address 248 with the store physical addresses 267 to generate the physical address match indicator 269. Flow proceeds to decision step 738.
In decision step 738, the control logic 286 examines the physical address match indicator 269 generated in step 736 to determine whether the load physical address 248 matches any store physical address 267. If so, a missed store forward has occurred. That is, the load instruction 197 used stale data from the data cache 186 rather than the store data 226 that should have been forwarded from a store instruction in the store queue 184, and flow proceeds to step 732. If no missed store forward has occurred, flow proceeds to step 728.
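The verification performed in decision steps 726 and 738 can be summarized by one rule: the load completes only if the store it actually received data from (or none) is the newest older store whose physical address matches. The Python sketch below expresses that rule. It is an illustrative model only; the function names and the representation of the store queue as a list of `(rob_index, physical_address)` pairs, oldest first and containing only stores older than the load, are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def correct_forwarding_store(load_pa: int,
                             older_stores: List[Tuple[int, int]]) -> Optional[int]:
    """Return the ROB index of the newest older store whose physical
    address matches the load, or None if no store matches."""
    for rob_index, store_pa in reversed(older_stores):
        if store_pa == load_pa:
            return rob_index
    return None


def load_status(forwarded_from: Optional[int],
                load_pa: int,
                older_stores: List[Tuple[int, int]]) -> str:
    """Model of decision steps 726 and 738.

    forwarded_from is the ROB index of the store whose data was
    forwarded (step 718), or None if cache data was used (step 734).
    """
    expected = correct_forwarding_store(load_pa, older_stores)
    return "complete" if forwarded_from == expected else "replay"
```

In this model a wrong forward, a forward from a store that is not the newest match, and a missed forward all produce the same `"replay"` outcome, corresponding to step 732.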
Referring to Fig. 8, a block diagram of an entry 802 of the forwarding replay history queue (FRHQ) 194 of Fig. 1 is shown. A FRHQ entry 802 holds information about a load instruction that was replayed for a store-forwarding-related reason. As described with respect to Figs. 9 and 10 below and Fig. 7 above, the RAT 134 allocates, populates, and uses the FRHQ entries 802. The FRHQ entry 802 includes a valid bit 804 that indicates whether the entry 802 is valid. In response to a reset, the microprocessor 100 initializes all entries 802 of the FRHQ 194 to invalid, that is, it clears the valid bit 804 of each FRHQ entry 802. In addition, in one embodiment, the valid bit 804 of each FRHQ entry 802 is cleared each time the code segment limit value in the x86 code segment descriptor is loaded. The FRHQ entry 802 also includes an instruction pointer (IP) field 806, which stores the memory address at which the load instruction resides. In one embodiment, the IP field 806 holds the memory address of the next instruction after the load instruction rather than the address of the load instruction itself. The FRHQ entry 802 also includes a ROB index delta field 808, which stores the difference between the ROB index of the load instruction and the ROB index of the store instruction whose store data should have been forwarded to the load instruction, as discussed below.
Referring to Fig. 9, a flowchart illustrating operation of the microprocessor 100 of Fig. 1 to allocate and populate an entry 802 of the FRHQ 194 of Fig. 8 is shown. Flow begins at step 902.
In step 902, the memory subsystem 182 detects that a load instruction must be replayed for a store-forwarding-related reason. Examples of store-forwarding-related reasons include, but are not limited to, the following. First, the store physical address of an older store instruction in the store queue 184 is not yet valid when the load unit 185 processes the load instruction. That is, the RIOMS 198 matches an older store, but the physical address match indicator 269 is invalid because the store queue 184 detects that the matching store physical address 267 is not yet valid. In this case, when the store instruction is ready to retire, it may be determined that the store physical address 267 of the store instruction matches the load physical address 248 and that its store data 226 should therefore have been forwarded to the load instruction. Consequently, the ROB 172 causes the load instruction and any instructions that depend on it to be replayed and notifies the RAT 134 so that the RAT 134 can update the FRHQ 194. Second, the store data of an older store instruction is not yet valid when the load unit processes the load instruction. That is, the RIOMS 198 matches an older store, but the data of the matching store is not yet valid. Third, the RIOMS 198 matches a store in the store queue; however, the physical address match indicator 269 indicates no match between the load and the store identified by the RIOMS 198, which means the wrong data 265 was forwarded. Fourth, the RIOMS 198 matches a store in the store queue and the load physical address 248 also matches the store physical address 267; however, the physical address match indicator 269 indicates that the store identified by the RIOMS 198 is not the correct store to forward from (for example, the matching store is older than another store whose physical address also matches), which means the wrong data 265 was forwarded. Fifth, the RIOMS 198 does not match any store ROB index 202 in the store queue 184; however, the physical address match indicator 269 indicates a matching store, which means the data fetched from the data cache 186 was the wrong data. Sixth, the RIOMS 198 matches an older store and its physical address is also confirmed to match; however, the memory trait of the relevant memory address does not permit store forwarding (for example, an uncacheable region). Flow proceeds to step 904.
In step 904, the memory subsystem 182 outputs on the status signal 166 the ROB index of the replayed load instruction and the ROB index of the store instruction whose store data should have been forwarded to the load instruction. The ROB 172 uses the status signal 166 to update the status of the load instruction's ROB 172 entry to indicate that it needs to be replayed, in the cases described above in which the replay is performed by the ROB 172 as opposed to an internal replay performed by the load unit 185. Flow proceeds to step 906.
In step 906, the RAT 134 snoops the status signal 166 generated by the memory subsystem 182 in step 904 and responsively computes the difference, or delta, between the load instruction ROB index and the store instruction ROB index. In computing the delta, the RAT 134 takes into account the wrap-around effect caused by the circular queue nature of the ROB 172. Flow proceeds to step 908.
In step 908, in response to the status signal 166 snooped in step 906, the RAT 134 allocates an entry 802 in the FRHQ 194. That is, the RAT 134 logically pushes an entry 802 onto the tail of the FRHQ 194, which logically pushes out the entry 802 at the head of the FRHQ 194. The RAT 134 then populates the IP field 806 with the instruction pointer value of the load instruction. The RAT 134 also populates the ROB index delta field 808 with the difference computed in step 906. Finally, the RAT 134 sets the valid bit 804. Flow ends at step 908.
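The wrap-around adjustment mentioned in step 906 amounts to modular subtraction over the size of the circular ROB. The short Python sketch below illustrates this; the ROB size of 128 entries is an assumption for the example (chosen to be consistent with a 7-bit ROB index), and the function name `rob_index_delta` is hypothetical.

```python
ROB_SIZE = 128  # assumed size; a 7-bit index addresses 128 entries


def rob_index_delta(load_index: int, store_index: int,
                    rob_size: int = ROB_SIZE) -> int:
    """Distance from the older store back to the load in a circular ROB.

    Python's % always returns a non-negative result for a positive
    modulus, so the wrap-around case needs no special handling.
    """
    return (load_index - store_index) % rob_size
```

For instance, a load at index 3 and a store at index 125 are six entries apart, because the ROB index wrapped from 127 back to 0 between them.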
Referring to Fig. 10, a flowchart illustrating operation of the microprocessor 100 of Fig. 1 to use the entries 802 of the FRHQ 194 is shown. Flow begins at step 1002.
In step 1002, the RAT 134 receives a load instruction and generates its normal dependency information for the load instruction. In addition, the RAT 134 compares the instruction pointer value of the load instruction with the IP field 806 of each valid entry 802 of the FRHQ 194. Flow proceeds to decision step 1004.
In decision step 1004, the RAT 134 determines whether the comparison performed in step 1002 matched any FRHQ entry 802. If not, flow ends; otherwise, flow proceeds to step 1006. Note that the instance of the load instruction received by the RAT 134 in steps 1002, 1004, and 1006 is a different instance from the one whose instruction pointer was stored in step 908. For this reason, when a load instruction is replayed for a store-forwarding-related reason, the RAT 134 does not populate the FRHQ entry 802 with the actual ROB index of the store instruction. Instead, and advantageously, when a load instruction is replayed, the RAT 134 (in step 908 of Fig. 9) populates the FRHQ entry 802 with the difference between the ROB indexes of the load instruction and the store instruction in the first instance, so that on the second and subsequent instances of the load instruction the RAT 134 can predict the need to forward store data to the current load instruction from the instruction (predicted to be a store instruction) located at the ROB index delta 808 determined for the earlier instance, as described in step 1006 below. The inventors have determined that the ROB index delta between a load instruction and the store instruction whose store data should be forwarded to it is highly likely to be the same on instances that follow the replayed instance.
In step 1006, the RAT 134 predicts that store data should be forwarded to the load instruction from an older store instruction whose ROB index can be computed from the value of the ROB index delta field 808 of the matching FRHQ entry 802. The RAT 134 responsively subtracts the value of the ROB index delta field 808 of the matching FRHQ entry 802 determined in decision step 1004 from the load instruction's ROB index and outputs the difference as the RIOMS 198. Advantageously, the RIOMS 198 enables the memory subsystem 182 to perform store forwarding without waiting for generation of the load virtual address 224, and to compare fewer bits (for example, 7-bit ROB indexes) than a virtual address comparison requires. Flow proceeds from step 1006 to step 702 of Fig. 7 to execute the load instruction.
According to one embodiment of the present invention, the IP field 806 of the FRHQ 194 stores fewer than all of the instruction pointer address bits; therefore, a match found in decision step 1004 does not guarantee that the load instruction is the same load instruction whose replay was detected in step 902. Note also that there is no guarantee that a store instruction resides at the computed index in the ROB 172 or, even if one does, that its store data should be forwarded to the load instruction. In other words, the RAT 134 is making a prediction.
Referring to Fig. 11, a flowchart illustrating operation of the microprocessor 100 of Fig. 1 is shown. Flow begins at step 1102.
In step 1102, the ROB 172 retires an instruction. Flow proceeds to decision step 1104.
In decision step 1104, the ROB 172 scans the FRHQ 194 to determine whether the IP field 806 of any of its entries 802 matches the IP of the instruction being retired by the ROB 172. If so, flow proceeds to step 1106; otherwise, flow returns to step 1102.
In step 1106, the ROB 172 clears the valid bit 804 of the matching FRHQ entry 802. This prevents the RAT 134 from generating, for a subsequent load instruction, a RIOMS 198 that refers to a store instruction that has already retired. Flow returns to step 1102.
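The replay-history prediction of Figs. 10 and 11, including the effect of keeping only part of the instruction pointer, can be sketched as follows. This is a simplified software model under several assumptions that do not come from the text: the names `Frhq`, `record_replay`, `predict_rioms`, `invalidate`, and `ip_tag` are hypothetical, the 12-bit tag width and the 128-entry ROB are illustrative values, and a dictionary keyed by tag stands in for the hardware queue (so allocation order and capacity are not modeled).

```python
from typing import Dict, Optional

ROB_SIZE = 128      # assumed ROB size (7-bit index)
IP_TAG_BITS = 12    # assumed number of IP bits kept in IP field 806


def ip_tag(ip: int) -> int:
    """Keep only the low-order IP bits, as a partial-IP field would."""
    return ip & ((1 << IP_TAG_BITS) - 1)


class Frhq:
    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}   # IP tag -> ROB index delta 808

    def record_replay(self, load_ip: int, delta: int) -> None:
        """Step 908: remember the delta seen when the load was replayed."""
        self.entries[ip_tag(load_ip)] = delta

    def predict_rioms(self, load_ip: int, load_rob_index: int) -> Optional[int]:
        """Steps 1002-1006: on an IP match, RIOMS = load index - delta,
        wrapped around the circular ROB. Returns None on no match."""
        delta = self.entries.get(ip_tag(load_ip))
        if delta is None:
            return None
        return (load_rob_index - delta) % ROB_SIZE

    def invalidate(self, ip: int) -> None:
        """Step 1106: clear the entry whose IP matches a retiring instruction."""
        self.entries.pop(ip_tag(ip), None)
```

Because only the tag is compared, two different loads whose addresses share the same low-order bits hit the same entry in this model, which is one reason the output is a prediction that must still be verified by the physical address comparison of Fig. 7.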
The embodiments of Figs. 1, 2, and 4 through 7 described above are those in which the microprocessor 100 uses an address-source-comparison-based scheme to predict store forwarding. The embodiments of Figs. 1, 2, and 7 through 11 described above are those in which the microprocessor 100 uses a replay-history-based scheme to predict store forwarding. It should be noted that the two schemes can be used alone, in combination with each other, or together with other store forwarding schemes. For instance, each scheme can be used by itself. Alternatively, the two schemes can be used together. In one such embodiment, when both predictors produce a match, various embodiments are contemplated for selecting which predictor's RIOMS 198 to use. In one embodiment, the address-source-comparison-based predictor is preferred. Another embodiment contemplates a selector that chooses one of the predictors based on one or more factors, such as prediction accuracy history or non-history-based factors (for example, load/store characteristics, load/store queue depth, and so on). Furthermore, instead of completely replacing a virtual-address-comparison-based scheme, the replay-history-based predictor can be used together with a virtual-address-comparison-based scheme, potentially increasing its accuracy. This may be particularly beneficial where the clock cycle timing of the microprocessor permits. For instance, if the virtual-address-based comparison produces no match, or produces a match with a store different from the one indicated by the replay-history-based comparison, the replay-history-based predictor is preferred.
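The simplest of the combination policies mentioned above, in which the address-source-comparison-based predictor wins whenever both predictors produce a match, can be written in a few lines. The sketch below is illustrative only; the function name `choose_rioms` is hypothetical, and `None` is used here to represent "no prediction".

```python
from typing import Optional


def choose_rioms(address_source_rioms: Optional[int],
                 replay_history_rioms: Optional[int]) -> Optional[int]:
    """Pick one RIOMS when two predictors are used together.

    Prefers the address-source-comparison-based prediction when both
    predictors match; falls back to the replay-history-based one.
    """
    if address_source_rioms is not None:
        return address_source_rioms
    return replay_history_rioms
```

A selector driven by accuracy history, as also contemplated above, would replace the fixed preference with a per-predictor confidence measure; that variant is not shown here.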
Although in the foregoing embodiments the RAT holds the address source and replay history information for pending stores in the FASQ and FRHQ, performs the store forwarding prediction, and provides the ROB index of the newest matching store, which travels down the pipeline to the load unit along with the load instruction, other embodiments are contemplated in which the store queue holds the address source and replay history information for pending stores in the FASQ and FRHQ, and the load unit provides the address source information or IP to the store queue for comparison against the FASQ or FRHQ. Such an approach may be preferable in a processor that does not include load issue scheduling.
As described above, an advantage of the address-source-comparison-based and replay-history-based store forwarding schemes is that they may remove the load virtual address calculation from the critical path of the store forwarding determination and may use fewer and/or smaller comparators, which may allow some designs to meet tighter timing constraints and may save die area and power consumption. In addition, the schemes described herein may detect store collisions for store forwarding purposes more accurately than a virtual-address-comparison-based scheme.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make changes and refinements without departing from the spirit and scope of the invention, and the scope of protection is therefore defined by the claims. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (for example, C or C++) or hardware description languages (HDL), including Verilog HDL, VHDL, and so on. Such software can be disposed as program code in any known computer-readable medium, such as magnetic tape, semiconductor memory, a magnetic disk, a floppy disk, a hard disk, or an optical disc (for example, CD-ROM or DVD-ROM), or in a network, wired, wireless, or other communication medium. When the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The apparatus and methods of the invention may be included in a semiconductor intellectual property core, such as a microcontroller core (for example, embodied in HDL), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods described in the embodiments may be embodied as a combination of hardware and software. In particular, the invention may be implemented in a microprocessor device and used in a general-purpose processor. Finally, those skilled in the art may, based on the disclosed concepts and specific embodiments, make changes and refinements to achieve the same purposes of the invention without departing from its spirit and scope.

Claims (23)

1. A microprocessor, comprising:
a queue comprising a plurality of entries, each entry configured to hold store information for a store instruction, wherein the store information specifies the sources of a plurality of operands used to calculate a store address, and wherein the store instruction specifies store data to be stored to the memory location defined by the store address; and
a control logic unit, coupled to the queue, configured to receive a load instruction, wherein the load instruction includes load information that specifies the sources of a plurality of operands used to calculate a load address, wherein the control logic unit is configured to detect that the load information matches the store information held in a valid one of the queue entries and, in response, to predict that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
2. The microprocessor as claimed in claim 1, wherein the control logic unit is configured to predict that the microprocessor should forward the store data to the load instruction before the microprocessor calculates the load address.
3. The microprocessor as claimed in claim 1, wherein the queue is configured to hold each of a plurality of the store instructions, and wherein, if the control logic unit detects that the load information matches the store information held in more than one valid entry of the queue, the control logic unit predicts that the microprocessor should forward to the load instruction the store data specified by the newest store instruction whose store information matches the load information.
4. The microprocessor as claimed in claim 1, wherein each entry of the queue is configured to hold a reorder buffer index of the store instruction, and wherein the control logic unit is configured to predict that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information by outputting the reorder buffer index of that store instruction.
5. The microprocessor as claimed in claim 4, further comprising:
a load unit, configured to execute the load instruction; and
a store queue, coupled to the load unit, configured to hold, for each of a plurality of store instructions, store data waiting to be written to memory, wherein the store queue is configured to determine whether the reorder buffer index of the store instruction predicted by the control logic unit matches the reorder buffer index of any of the store instructions in the store queue, and to forward to the load unit the store data of the newest one of the store instructions whose reorder buffer index matches the predicted reorder buffer index.
6. The microprocessor as claimed in claim 5, wherein the store queue determines whether the reorder buffer index of the store instruction predicted by the control logic unit matches the reorder buffer index of any of the store instructions in the store queue substantially concurrently with the load unit calculating the load address using the operands from the sources specified by the load information.
7. The microprocessor as claimed in claim 1, wherein the store information and the load information comprise at least an identifier of a register of the microprocessor, the register holding a source operand used to calculate the store address.
8. The microprocessor as claimed in claim 7, wherein the store information and the load information further comprise a displacement used to calculate the store address.
9. The microprocessor as claimed in claim 1, wherein the control logic unit is configured to receive instructions in program order, and wherein, for each store instruction the control logic unit receives, the control logic unit allocates one of the entries in the queue for the store instruction and populates the allocated entry with the store information.
10. The microprocessor as claimed in claim 9, wherein the control logic unit is configured to mark the allocated entry valid after populating it with the store information.
11. The microprocessor as claimed in claim 10, wherein, in response to receiving an instruction that modifies one or more of the operand sources specified by one or more of the queue entries, the control logic unit is configured to mark each of the one or more queue entries invalid.
12. The microprocessor as claimed in claim 1, wherein, in response to retirement of one of the plurality of store instructions in the queue, the control logic unit is configured to mark the queue entry of the retired store instruction invalid.
13. A storage method for forwarding store data in a microprocessor, the method comprising the following steps:
receiving a stream of instructions in program order and, for each store instruction received in the stream, allocating one of a plurality of entries of a queue and populating the allocated entry with store information, wherein the store information specifies the sources of a plurality of operands used to calculate a store address, and wherein the store instruction specifies store data to be stored to the memory location defined by the store address;
receiving a load instruction in the stream, wherein the load instruction includes load information that specifies the sources of a plurality of operands used to calculate a load address, and detecting that the load information matches the store information held in a valid one of the queue entries; and
in response to the detecting step, predicting that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
14. The storage method as claimed in claim 13, further comprising:
calculating the load address using the operands from the sources specified by the load information;
wherein the predicting that the microprocessor should forward the store data to the load instruction is performed before the calculating of the load address.
15. The storage method as claimed in claim 13, further comprising:
holding each of a plurality of the store instructions;
wherein the detecting step comprises detecting that the load information matches the store information held in more than one valid entry of the queue;
wherein the predicting step comprises predicting that the microprocessor should forward to the load instruction the store data specified by the newest store instruction whose store information matches the load information.
16. The storage method as claimed in claim 13, wherein the step of populating the allocated entry with the store information comprises populating the allocated entry with a reorder buffer index of the store instruction, and wherein the predicting step comprises outputting the reorder buffer index of the store instruction whose store information matches the load information.
17. The storage method as claimed in claim 16, further comprising:
holding, for each of a plurality of store instructions, store data waiting to be written to memory;
determining whether the reorder buffer index of the predicted store instruction matches the reorder buffer index of any of the waiting store instructions; and
forwarding to the load instruction the store data of the newest one of the store instructions whose reorder buffer index matches the predicted reorder buffer index.
18. The storage method as claimed in claim 17, further comprising:
calculating the load address using the operands from the sources specified by the load information, substantially concurrently with the determining of whether the reorder buffer index of the predicted store instruction matches the reorder buffer index of any of the store instructions in the store queue.
19. The storage method as claimed in claim 13, wherein the store information and the load information comprise at least an identifier of a register of the microprocessor, the register holding a source operand used to calculate the store address.
20. The storage method as claimed in claim 19, wherein the store information and the load information further comprise a displacement used to calculate the store address.
21. The storage method as claimed in claim 20, further comprising:
after populating the allocated entry with the store information, marking the allocated entry valid.
22. The storage method as claimed in claim 21, further comprising:
receiving, in the stream, an instruction that modifies one or more of the operand sources specified by one or more of the queue entries; and
in response to receiving the instruction in the stream, marking each of the one or more queue entries invalid.
23. The storage method as claimed in claim 13, further comprising:
retiring one of the plurality of store instructions in the queue; and
in response to the retiring of the one of the plurality of store instructions in the queue, marking the queue entry of the retired store instruction invalid.
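The store-queue lookup recited in the claims on reorder buffer indexes can be illustrated with a short Python sketch. It is a minimal model under stated assumptions, not the claimed hardware: the names `PendingStore` and `forward_from_store_queue` are hypothetical, and the store queue is assumed to be a list ordered from oldest to newest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingStore:
    rob_index: int  # reorder buffer index of the store instruction
    data: int       # store data waiting to be written to memory


def forward_from_store_queue(store_queue: List[PendingStore],
                             predicted_rob_index: Optional[int]) -> Optional[int]:
    """Return the data of the newest pending store whose ROB index matches.

    store_queue is assumed to be ordered oldest to newest. None is returned
    when no forwarding was predicted or when no pending store matches.
    """
    if predicted_rob_index is None:
        return None
    for store in reversed(store_queue):  # newest pending store first
        if store.rob_index == predicted_rob_index:
            return store.data
    return None
```

Because the lookup compares only small index values, it does not depend on the load address, which is consistent with the claims stating that the address calculation may proceed substantially concurrently.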
CN 201010247338 2009-08-12 2010-08-05 Microprocessor and correlation storage method Active CN101901132B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US23325909P 2009-08-12 2009-08-12
US61/233,259 2009-08-12
US12/781,274 2010-05-17
US12/781,274 US8533438B2 (en) 2009-08-12 2010-05-17 Store-to-load forwarding based on load/store address computation source information comparisons

Publications (2)

Publication Number Publication Date
CN101901132A true CN101901132A (en) 2010-12-01
CN101901132B CN101901132B (en) 2013-08-21

Family

ID=43226689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010247338 Active CN101901132B (en) 2009-08-12 2010-08-05 Microprocessor and correlation storage method

Country Status (1)

Country Link
CN (1) CN101901132B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5809275A (en) * 1996-03-01 1998-09-15 Hewlett-Packard Company Store-to-load hazard resolution system and method for a processor that executes instructions out of order
CN1836207A (en) * 2003-07-08 2006-09-20 Advanced Micro Devices, Inc. Store-to-load forwarding buffer using indexed lookup
CN1690952A (en) * 2004-04-22 2005-11-02 International Business Machines Corporation Apparatus and method for selecting instructions for execution based on bank prediction of a multi-bank cache
US20080288752A1 (en) * 2006-11-16 2008-11-20 Cox Jason A Design structure for forwarding store data to loads in a pipelined processor

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104583957B (en) * 2012-06-15 2018-08-10 Intel Corporation Reordered speculative instruction sequences with a disambiguation-free out of order load store queue
CN104583957A (en) * 2012-06-15 2015-04-29 Soft Machines, Inc. Reordered speculative instruction sequences with a disambiguation-free out of order load store queue
US9904552B2 (en) 2012-06-15 2018-02-27 Intel Corporation Virtual load store queue having a dynamic dispatch window with a distributed structure
US9928121B2 (en) 2012-06-15 2018-03-27 Intel Corporation Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization
US9965277B2 (en) 2012-06-15 2018-05-08 Intel Corporation Virtual load store queue having a dynamic dispatch window with a unified structure
US9990198B2 (en) 2012-06-15 2018-06-05 Intel Corporation Instruction definition to implement load store reordering and optimization
US10019263B2 (en) 2012-06-15 2018-07-10 Intel Corporation Reordered speculative instruction sequences with a disambiguation-free out of order load store queue
US10048964B2 (en) 2012-06-15 2018-08-14 Intel Corporation Disambiguation-free out of order load store queue
US10592300B2 (en) 2012-06-15 2020-03-17 Intel Corporation Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization
WO2015021919A1 (en) * 2013-08-13 2015-02-19 Huawei Technologies Co., Ltd. Method and device for data storage scheduling among multiple memories
CN106605207A (en) * 2014-09-12 2017-04-26 Qualcomm Incorporated Predicting literal load values using a literal load prediction table, and related circuits, methods, and computer-readable media
CN107204940A (en) * 2016-03-18 2017-09-26 Huawei Technologies Co., Ltd. Chip and transmission scheduling method
CN112383490A (en) * 2016-03-18 2021-02-19 Huawei Technologies Co., Ltd. Chip and transmission scheduling method

Also Published As

Publication number Publication date
CN101901132B (en) 2013-08-21

Similar Documents

Publication Publication Date Title
CN101901132B (en) Microprocessor and correlation storage method
US8533438B2 (en) Store-to-load forwarding based on load/store address computation source information comparisons
CN102087591B (en) Non sequential execution microprocessor and an operating method thereof
CN101853150B (en) Out-of-order execution microprocessor and operating method therefor
US11379234B2 (en) Store-to-load forwarding
CN101694613B (en) Unaligned memory access prediction
EP2674858B1 (en) Loop buffer learning
TWI552069B (en) Load-store dependency predictor, processor and method for processing operations in load-store dependency predictor
US8255670B2 (en) Replay reduction for power saving
KR101496009B1 (en) Loop buffer packing
CN101847094A (en) Out-of-order execution microprocessor and operating method thereof
KR20180036490A (en) Pipelined processor with multi-issue microcode unit having local branch decoder
US9354886B2 (en) Maintaining the integrity of an execution return address stack
CN101449237A (en) A fast and inexpensive store-load conflict scheduling and forwarding mechanism
EP3321811B1 (en) Processor with instruction cache that performs zero clock retires
US9740557B2 (en) Pipelined ECC-protected memory access
EP4202661A1 (en) Device, method, and system to facilitate improved bandwidth of a branch prediction unit
KR20230093442A (en) Prediction of load-based control independent (CI) register data independent (DI) (CIRDI) instructions as control independent (CI) memory data dependent (DD) (CIMDD) instructions for replay upon recovery from speculative prediction failures in the processor
CN102163139A (en) Microprocessor fusing loading arithmetic/logic operation and skip macroinstructions
CN101840330A (en) Microprocessor and information storing method thereof
EP3321810B1 (en) Processor with instruction cache that performs zero clock retires
US11481331B2 (en) Promoting prefetched data from a cache memory to registers in a processor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant