CN103577159B - Method and apparatus for multi-stage register renaming using dependency removal - Google Patents

Method and apparatus for multi-stage register renaming using dependency removal

Info

Publication number
CN103577159B
Authority
CN
China
Prior art keywords
renaming
instruction
register
group
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310333130.2A
Other languages
Chinese (zh)
Other versions
CN103577159A (en)
Inventor
H. Jackson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Imagination Technologies Ltd
Original Assignee
Imagination Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB1213994.5A (external priority, patent GB2496934B)
Application filed by Imagination Technologies Ltd
Publication of CN103577159A
Application granted
Publication of CN103577159B
Expired - Fee Related (current legal status)
Anticipated expiration


Abstract

Multi-stage register renaming using dependency removal is described. In an embodiment, register renaming is performed in two stages. The first stage involves removing all the dependencies within a group of instructions that are being renamed together. The final stage then renames all the registers in parallel using a rename map. In various embodiments, the dependencies are removed in the first stage by renaming the destination register in each instruction using a fixed mapping, and in some embodiments the fixed mapping is based on the position of the destination register within the group of instructions. Dependent registers, i.e. registers that are read in an instruction but written by an earlier instruction in the group, are also renamed in the first stage. In the final stage, in addition to performing the renaming, the rename map is updated.

Description

Method and apparatus for multi-stage register renaming using dependency removal
Background
Out-of-order processors can provide improved computational performance by executing instructions in an order that differs from program order, so that each instruction is executed as soon as its input data is available rather than waiting for the preceding instructions in the program to execute. To allow instructions to run out of order, it is very useful to be able to rename the registers used by the instructions. This enables write-after-read (WAR) dependencies to be removed from the instructions, since these are not true dependencies. By using register renaming to remove these dependencies, more instructions can be executed out of program order, further improving performance. Register renaming is performed by maintaining a map of which registers named in the instructions (called architectural registers) are mapped onto the physical registers of the processor. This map may be referred to as a "rename map", "register map", "register renaming map", "register alias table" (RAT) or a similar term.
Renaming is typically performed on multiple instructions every cycle, but data dependencies within a group of instructions renamed in one cycle mean that the operations cannot be performed fully in parallel. As soon as a destination register is renamed (i.e. an architectural register is replaced with a currently available physical register), the rename map (i.e. the data in the rename map) is updated. Subsequent reads (within the group) must then use the updated mapping rather than the mapping that existed at the start of the cycle. To solve this problem, forwarding paths can be used from the result of each destination register rename operation to each subsequent source register read. However, this quickly becomes very complex and does not scale well (e.g. as the number of instructions processed in a group increases).
A two-phase renaming method using two pipelined rename blocks has been proposed. This method operates over two cycles and uses a more asynchronous approach, with latches at an intermediate point rather than at the clock edge. Writes are performed in the first cycle and reads in the second, which increases complexity: in addition to the dependencies within a group, there are now further dependencies between the current group of instructions and the next group in time, because both groups update or read the rename map within a single cycle.
The embodiments described below are not limited to implementations that solve any or all of the disadvantages of known methods and apparatus for register renaming.
Summary of the invention
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Multi-stage register renaming using dependency removal is described. In an embodiment, register renaming is performed in two stages. The first stage involves removing all the dependencies within a group of instructions that are being renamed together. The final stage then renames all the registers in parallel using a rename map. In various embodiments, the dependencies are removed in the first stage by renaming the destination register in each instruction using a fixed mapping, and in some embodiments the fixed mapping is based on the position of the destination register within the group of instructions. Dependent registers, i.e. registers that are read in an instruction but written by an earlier instruction in the group, are also renamed in the first stage. In the final stage, in addition to performing the renaming, the rename map is updated.
A first aspect provides a method of register renaming in an out-of-order processor, comprising: in a first stage, removing dependencies within a group of instructions using a fixed mapping defined in hardware logic; and in a final stage, renaming all the registers in the group of instructions in parallel using a rename map.
Removing dependencies within a group of instructions using a fixed mapping defined in hardware logic may comprise: renaming all destination registers and any dependent registers in the group of instructions to a set of extra registers using the fixed mapping; and passing details of which extra register was used to rename each destination register to the final stage.
The fixed mapping between destination registers and extra registers may be based on the physical position of each destination register within the group of instructions.
The final stage may further comprise updating the rename map.
The rename map may comprise an entry associated with each extra register.
Updating the rename map may comprise: updating the entry in the rename map associated with each destination register based on the details passed from the first stage; and updating the entry in the rename map associated with each extra register so as to map each extra register to an unallocated physical register.
The method may further comprise accessing a list of unallocated physical registers.
The fixed mapping may be independent of any previous state.
The method may further comprise performing an optimization operation between the first stage and the final stage.
The group of instructions may comprise N instructions, and the set of extra registers may comprise N extra registers, where N is an integer.
Each instruction in the group of instructions may comprise up to Y destination registers, and each instruction may have an associated set of Y valid bits, each valid bit indicating whether one of the Y destination registers is used in that instruction. The group of instructions may comprise N instructions, and the set of extra registers may comprise N×Y extra registers, where N and Y are integers.
Each instruction in the group of instructions may comprise up to X source registers, and each instruction may have an associated set of X valid bits, each valid bit indicating whether one of the X source registers is used in that instruction.
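The position-based fixed mapping in these aspects can be sketched in Python as follows. This is a hypothetical illustration, not the claimed hardware. It assumes that the extra registers are numbered contiguously after the architectural registers (as in the later example, where N0-N7 stand for A0-A7 and the extra registers start at N8), and that destination slot j of instruction i is assigned extra register i×Y + j within that range. The function name `fixed_dest_mapping` is invented for this example.

```python
def fixed_dest_mapping(valid_bits, num_arch_regs):
    """Assign extra registers to destination slots by position only.

    valid_bits[i][j] is True if instruction i uses destination slot j
    (j < Y). Returns {(i, j): extra register number} for every used slot.
    The layout num_arch_regs + i * Y + j is an assumption of this sketch.
    """
    y = len(valid_bits[0]) if valid_bits else 0
    mapping = {}
    for i, slots in enumerate(valid_bits):
        for j, used in enumerate(slots):
            if used:
                # Depends only on the position (i, j): no lookup, no prior state.
                mapping[(i, j)] = num_arch_regs + i * y + j
    return mapping


# N = 3 instructions, Y = 2 destination slots each, 8 architectural registers:
# N x Y = 6 extra registers (N8-N13) are reserved whether or not they are used.
valid = [[True, False], [True, True], [False, False]]
print(fixed_dest_mapping(valid, num_arch_regs=8))
# {(0, 0): 8, (1, 0): 10, (1, 1): 11}
```

Reserving all N×Y slots wastes a few extra registers when valid bits are clear, but it keeps each mapping a pure function of position, so no logic has to inspect earlier instructions to pick a register.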
A second aspect provides an out-of-order processor comprising: a rename map; hardware logic defining a fixed mapping between registers; dependency removal logic arranged to remove dependencies within a group of instructions using the fixed mapping; renaming logic arranged to rename all the registers in the group of instructions in parallel using the rename map; and a plurality of physical registers.
The dependency removal logic may comprise a plurality of dependency removal logic instances, each instance being arranged to remove dependencies within a separate, non-overlapping subset of the group of instructions.
The dependency removal logic may be arranged to remove dependencies within the group of instructions by: renaming all destination registers and any dependent registers in the group of instructions to a set of extra registers using the fixed mapping; and passing details of which extra register was used to rename each destination register to the renaming logic.
The rename map may comprise an entry associated with each extra register.
The plurality of physical registers may comprise a plurality of unallocated physical registers.
The renaming logic may be further arranged to update the rename map.
The out-of-order processor may further comprise a loop buffer between the dependency removal logic and the renaming logic, wherein the loop buffer is arranged to store instructions within a loop after the dependency removal logic has removed their dependencies and, once all the instructions in the loop have been stored, to release the instructions to the renaming logic.
The out-of-order processor may further comprise optimization logic between the dependency removal logic and the renaming logic.
A third aspect provides an out-of-order processor substantially as described with reference to any of Fig. 1, Fig. 5 and Fig. 6 of the drawings.
A fourth aspect provides a method of register renaming in an out-of-order processor substantially as described with reference to any of Figs. 2-5 of the drawings.
The methods described herein may be performed by software in machine-readable form on a tangible storage medium, e.g. in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer, and where the computer program may be embodied on a computer-readable medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory cards, etc., and do not include propagated signals. The software may be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
This acknowledges that firmware and software can be valuable, separately tradable commodities. It is intended to encompass software that runs on or controls "dumb" or standard hardware to carry out the desired functions. It is also intended to encompass software that "describes" or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips or for configuring universal programmable chips, to carry out desired functions.
The preferred features may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the invention.
Brief description of the drawings
Embodiments of the invention will be described, by way of example, with reference to the following drawings, in which:
Fig. 1 is a schematic diagram of an example out-of-order processor;
Fig. 2 is a flow diagram of an example method of register renaming that may be implemented in the out-of-order processor shown in Fig. 1;
Fig. 3 shows an example of register renaming;
Fig. 4 shows a schematic diagram of pipelined renaming operations over four cycles;
Fig. 5 shows a schematic diagram of pipelined renaming operations over five cycles in which dependency removal is split into two stages, and a schematic diagram of another example out-of-order processor; and
Fig. 6 shows schematic diagrams of two further example out-of-order processors.
Common reference numerals are used throughout the figures to indicate similar features.
Detailed description
Embodiments of the present invention are described below by way of example only. These examples represent the best ways of putting the invention into practice that are currently known to the Applicant, although they are not the only ways in which this could be achieved. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
The use of register renaming in an out-of-order processor can be described with reference to the following example, which comprises two instructions (labeled I1 and I2):
I1:R3=R1+2
I2:R1=R2
Because R1 is the destination register of I2, I2 cannot be evaluated before I1 (in which R1 is a source register); otherwise the value stored in R1 when I1 is evaluated would be incorrect. However, there is no "real" dependency between these instructions, which means that register renaming can be used to remove the dependency. For example, I2 can have its destination register renamed as follows:
I2:R4=R2
Because the destination register has been changed to R4, there is now no dependency between I1 and I2, and the two instructions can be executed out of order. This example illustrates the removal of a write-after-read (WAR) dependency. In other examples there may also be write-after-write (WAW) dependencies, for example if the instruction set further comprises a third instruction (labeled I3):
I3:R1=R5+4
This instruction (I3) writes to the same register (R1) as the earlier instruction (I2), which means that the first write can be ignored unless that operation has some other side effect.
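The effect of renaming on the WAR and WAW examples above can be checked with a small Python model. It renames a group one instruction at a time in program order, using a rename map and a list of free physical registers. The initial map contents and the free-list order are hypothetical, and immediates are omitted because they do not affect renaming. This simple loop is exactly the serial chain (each read depending on earlier writes in the group) that the Background identifies as hard to perform in parallel within one cycle.

```python
def rename_in_order(group, rename_map, free_list):
    """Rename (dest, [sources]) instructions one at a time in program order.

    Each source is looked up in the map as already updated by earlier
    instructions in the group; each destination gets a fresh physical register.
    """
    current = dict(rename_map)
    free = list(free_list)
    renamed = []
    for dest, srcs in group:
        phys_srcs = [current[s] for s in srcs]
        phys_dest = free.pop(0)  # a new register removes WAR/WAW hazards on dest
        current[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    return renamed, current


# I1: R3 = R1 + 2, I2: R1 = R2, I3: R1 = R5 + 4 (immediates omitted)
group = [("R3", ["R1"]), ("R1", ["R2"]), ("R1", ["R5"])]
initial = {"R1": "P1", "R2": "P2", "R3": "P3", "R5": "P5"}  # hypothetical
renamed, final_map = rename_in_order(group, initial, ["P10", "P11", "P12"])
print(renamed)          # [('P10', ['P1']), ('P11', ['P2']), ('P12', ['P5'])]
print(final_map["R1"])  # P12
```

I1 still reads P1 (the old R1) while I2 writes P11, so the WAR hazard is gone. I2 and I3 write different physical registers, and the final map points R1 at I3's register P12, so later readers see only the last write.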
Fig. 1 shows a schematic diagram of an out-of-order processor 100 comprising a fetch stage 102, a decode stage 104, a renaming stage 106 and a plurality of physical registers 107. It will be appreciated that the out-of-order processor may also comprise other elements not shown in Fig. 1 (e.g. a reorder buffer, execution pipelines, etc.). The fetch stage 102 is arranged to fetch instructions from a program (in program order) as indicated by a program counter. The decode stage 104 is arranged to interpret the instructions before register renaming is performed in the renaming stage 106. As described above, a group (or batch) of instructions may be renamed at the same time. Register renaming may be performed by the renaming stage 106 using a mapping between architectural registers and the physical registers 107 on the processor, and an example register renaming map 108 is shown in Fig. 1. The register renaming map 108 is maintained (i.e. updated) by the renaming stage 106; it is a stored data structure showing the mapping between each architectural register and the physical register most recently allocated to that architectural register. Architectural registers are the names/identifiers of registers used in instructions, and in the following description they are labeled A* (where * denotes the register number, e.g. A0, A1, ...). The physical registers 107 are the actual storage locations present in the processor, and they are labeled P* (e.g. P0, P1, ...). There are more physical registers 107 than architectural registers, and the physical registers 107 include a number of unallocated physical registers 109 (indicated by shading in Fig. 1).
In the example of Fig. 1, the register renaming map 108 comprises four entries, which give the physical register identifier (P*) indexed by the architectural register identifier (A*). For example, architectural register 0 (A0) is currently mapped to physical register 6 (P6), architectural register 1 (A1) is currently mapped to physical register 5 (P5), and so on. The rename map 108 may be stored in flip-flops in the processor's hardware logic.
As shown in fig. 1, the renaming stage 106 is divided into two stages: the dependence cancellation stage 110 With renaming 112, although as described in more detail below, it is understood that there may be the stage more than two is (such as, The dependence cancellation stage 110 can be divided into two or more sub stages).In these stages One stage, i.e. dependence cancellation stage 110, eliminate and instructed by one group (or a collection of) of parallel renaming In dependency.By using fixing being mapped in this stage to disappear for the instruction in described one group of instruction Except RAW and WAW dependency, it is entirely predictable for wherein fixing mapping, and independent of any State before is also implemented in hardware logic 114.As described in greater detail below, fix and reflect Penetrate the destination register in described one group of instruction and rely on register mappings to distributor (labelling For N*).By using such fixing mapping, it is only necessary to minimal amount of logic (such as hardware logic) Realize this stage, the physical location of the instruction in wherein said mapping link to group.First rank Section does not use renaming to map, and (it is not fixing mapping that renaming maps, but store can be along with often The dynamic mapping of change of individual cycle), it is not required that perform any lookup (such as at fixing data knot Structure makes a look up).
Second stage in these stages, i.e. renaming 112 (it is also referred to as terminal stage), so Rear use (such as, from distributor to physical register) renaming maps 108 and weighs concurrently Name all of depositor.So, renaming stage pipeline ground performs all readings mapping renaming Take and update (that is, performing to set up while all of reading all of renewal, but until clock These renewals of edge just come into force so that read the effect that can't see current renewal), this makes this final Stage is highly susceptible to extension (a large amount of instructions such as expanding in same period).The renaming used Map and include extra register mappings, as shown in Figure 3 and be described below.
Although method shows that (in square frame 208) updates renaming in each cycle and map, It should be appreciated that: there may be the situation that need not change, and in this case, more The step that new renaming maps will make to map constant.
The renaming stage 106 is divided into two stages by this way to be had the effect that and heavily orders Name operation spends two cycles, compared with monocycle single-stage operation, which increases the waiting time, but It is not reduce handling capacity, this is because two stages are easily pipelined (as with reference to Fig. 4 more Describe in detail).By using the method, (by increasing the quantity of the instruction in one group of instruction) Increase handling capacity and/or increase maximum clock speed is possible.
Dependence cancellation stage and renaming stage 110 and 112 can fully use the hard of processor Part logic realizes.Or, some or all in these method steps can realize with software. Processor can be single-threaded processor or multiline procedure processor.It is multiline procedure processor at processor In the case of, can be that each thread repeats the element shown in Fig. 1 so that each thread has office One group of structure register in portion and renaming stage 106.Interchangeable multiline procedure processor can be shared firmly Some or all of part logic (square frame 106) carry out the renaming of reality, wherein can be in conjunction with depositing Device numbering uses thread number, to be indexed (that is, reflecting in renaming to renaming mapping 108 Penetrate relevant to the thread more than in the case of).Such as, renaming maps and can have and will be used for line The structure register 0 (A0) of journey 0 is mapped to the entry of physical register 6 (P6) and will be used for line The identical structure register (A0) of journey 1 is mapped to the different bar of physical register 26 (P26) Mesh.
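For the shared multi-threaded case, indexing the rename map by the thread number combined with the register number could look like the following Python sketch. The flat-array layout (thread × 8 + register) and the sizes of 8 registers and 2 threads are assumptions for illustration; only the two entries (A0 to P6 for thread 0, A0 to P26 for thread 1) come from the example above.

```python
NUM_ARCH_REGS = 8  # assumed number of architectural registers per thread
NUM_THREADS = 2    # assumed number of hardware threads


def map_index(thread, arch_reg):
    """Combine the thread number with the register number to form an index."""
    return thread * NUM_ARCH_REGS + arch_reg


# One shared rename map holding an entry for every (thread, register) pair.
rename_map = [None] * (NUM_THREADS * NUM_ARCH_REGS)
rename_map[map_index(0, 0)] = "P6"   # thread 0: A0 -> P6
rename_map[map_index(1, 0)] = "P26"  # thread 1: A0 -> P26

print(rename_map[map_index(0, 0)], rename_map[map_index(1, 0)])  # P6 P26
```

With a power-of-two register count, the multiplication amounts to concatenating the thread bits with the register-number bits, which is cheap in hardware.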
Fig. 2 shows a flow diagram of an example method of operation of the renaming stage 106. In the first stage 21, which is performed by the dependency removal stage 110 shown in Fig. 1, all the destination registers and dependent registers are renamed to extra registers using the fixed mapping (block 202). The term "dependent register" is used herein to refer to a register that is read in an instruction and is also written by an earlier instruction in the group (i.e. any source register that is a destination register of an earlier instruction in the group). For the purposes of the following description, destination registers may be labeled OP*, where * denotes the instruction number.
The number of extra registers used (e.g. N extra registers) equals the maximum number of destination registers in a group of instructions. In many examples, each instruction writes to only one destination, and in such examples the number of extra registers used equals the number of instructions in the group being renamed together (e.g. N instructions in a group). For example, consider the following group of instructions:
I1:R3=R1+2
I2:R1=R2
I3:R5=R1+4
In this case, three extra registers are used (N=3). One extra register is used for the destination register (R3) of the first instruction (I1), another for the destination register (R1) of the second instruction (I2), and a third for the destination register (R5) of the third instruction (I3). In this example there is one dependent register, the source register R1 in the third instruction (I3), because this register is written by an earlier instruction in the group (i.e. by the second instruction). In other examples, however, instructions may have more than one destination register, and the number of extra registers used may therefore exceed the number of instructions in the group.
The fixed mapping used in the first stage (block 202) in this example may be as follows, using N* notation:
A0-A7 → N0-N7
OP1 (destination of I1) → N8
OP2 (destination of I2) → N9
OP3 (destination of I3) → N10
Registers N0-N7 are exact representations of architectural registers A0-A7 (8 architectural registers are used by way of example only), and the three extra registers are N8, N9 and N10. These extra registers map to three physical registers in the pool of unallocated (or free) physical registers. In this example, the destination registers (OP1, OP2, ...) are renamed in chronological order, which simplifies the logic, although they may be renamed in any order (though once implemented, the same order is used every cycle, because this is a fixed mapping). The unallocated physical registers may be any registers and need not be adjacent, as indicated in the example shown in Fig. 3 and described below. After this dependency removal, the instructions are written as follows (using the intermediate N* notation):
I1:N8=N1+2
I2:N9=N2
I3:N10=N9+4
It can be seen from this example that the dependent register (R1) in the third instruction (I3) has been renamed (to N9) to correspond to the register written in the earlier instruction (I2).
So that the entry in the rename map for each destination register (R3, R1, R5) can be updated with a new physical register in the renaming stage (i.e. in the next cycle), the original register number of each destination register is tracked (block 204). That is, details identifying which extra register was used to rename each destination register are stored (e.g. in flip-flops between the two renaming stages). Returning to the example above, this involves tracking the following information:
N3→[N8]
N1→[N9]
N5→[N10]
where [N8] denotes the contents of rename map location N8.
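Blocks 202 and 204 can be modeled in software as shown below. This is a Python sketch under stated assumptions, not the patent's hardware. It assumes one destination register per instruction, 8 architectural registers (so N0-N7 stand for A0-A7 and the extra registers start at N8), and registers given as plain integers. The function name `remove_dependencies` is hypothetical. The sketch reproduces the I1-I3 example: the destinations go to N8, N9 and N10, and the dependent read of R1 in I3 becomes N9.

```python
def remove_dependencies(group, num_arch_regs):
    """First stage: rename destinations and dependent sources to extra registers.

    group: list of (dest, [srcs]) using architectural register numbers.
    Returns (renamed_group, updates), where updates holds
    (original destination, extra register) pairs, e.g. N3 -> [N8].
    No rename-map lookup is performed anywhere in this stage.
    """
    latest = {}  # architectural register -> extra register of its latest writer
    renamed, updates = [], []
    for i, (dest, srcs) in enumerate(group):
        # Dependent registers take the extra register of the earlier writer;
        # every other source maps one-to-one (Ak -> Nk).
        new_srcs = [latest.get(s, s) for s in srcs]
        extra = num_arch_regs + i  # fixed mapping: depends only on position i
        latest[dest] = extra
        renamed.append((extra, new_srcs))
        # If two instructions write the same register, the later update must win.
        updates.append((dest, extra))
    return renamed, updates


group = [(3, [1]), (1, [2]), (5, [1])]  # I1: R3=R1+2, I2: R1=R2, I3: R5=R1+4
renamed, updates = remove_dependencies(group, num_arch_regs=8)
print(renamed)  # [(8, [1]), (9, [2]), (10, [9])]
print(updates)  # [(3, 8), (1, 9), (5, 10)]
```

Sources are resolved before the instruction's own destination is recorded, so an instruction that reads and writes the same register (such as "OP A0, A0, A1" in Fig. 3) correctly reads the previous value.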
The final stage 22, performed by the renaming logic 112 shown in Fig. 1, then uses the rename map to perform all the register renaming in parallel (block 206). As described above, the rename map 108 is a stored data structure that is updated (and stored) by the renaming stage 106 every cycle, so the rename map used in any cycle is the map updated in the previous cycle. To perform the renaming, the stored rename map is accessed and used (in block 206) to rename all the registers in parallel, which requires read operations on the rename map. At the same time (e.g. in parallel with the read operations), the rename map is updated (block 208): the updates to the rename map are set up, but they do not take effect until the clock edge, at which point they are written into all the flip-flops that make up the rename map, thereby storing the updated map. There are two sets of writes/updates to the rename map (performed in block 208). First, based on the information tracked in the first stage (in block 204), the rename map is updated so that the mapping at each original destination register number is set to the value currently held in the extra register location associated with that instruction (block 210). Second, the extra register locations (N8-N10 in the example above), which no longer point to unallocated physical registers (because those registers have now been allocated), are updated with a new set of unallocated physical registers from the unallocated physical register pool (block 212). It will be appreciated that these two update steps may be performed in parallel or in either order (e.g. block 210 followed by block 212, or vice versa).
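The timing rule above (updates are set up during the cycle but only written into the flip-flops at the clock edge) can be modeled with a small Python class. The class and method names are hypothetical; the point is only that reads made during a cycle never observe that cycle's writes. The initial entries reuse A0 to P6 and A1 to P5 from Fig. 1, and P9 is an arbitrary example register.

```python
class ClockedRenameMap:
    """Rename map held in flip-flops: reads return the stored state, while
    writes are staged and only take effect at the next clock edge."""

    def __init__(self, entries):
        self._state = list(entries)
        self._pending = {}

    def read(self, index):
        # Never observes writes staged in the current cycle.
        return self._state[index]

    def stage_write(self, index, value):
        # Set up an update; a later write to the same index replaces it.
        self._pending[index] = value

    def clock_edge(self):
        # Commit all staged updates at once, as the flip-flops would.
        for index, value in self._pending.items():
            self._state[index] = value
        self._pending.clear()


rename_map = ClockedRenameMap(["P6", "P5"])  # A0 -> P6, A1 -> P5 (Fig. 1)
rename_map.stage_write(0, "P9")
print(rename_map.read(0))  # P6: the update is not yet visible
rename_map.clock_edge()
print(rename_map.read(0))  # P9
```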
Although it should be appreciated that Fig. 2 shows that square frame 206 occurs before square frame 208, as above Described, but the reading in the two square frame and renewal (or write) operation can be performed in parallel, Wherein, it is established in being written in this cycle and comes into force at clock edge afterwards (i.e. so that write Come into force after a read, and there is not the probability that may read incorrect data).
It is referred to example as shown in Figure 3 to further describe the method.In this illustration, Carrying out renamed instructions according to quaternate mode, so there are four extra registers, being labeled as N8-N11.And in this illustration, distribute initial destination register sequentially in time (OP0-OP3) logic of this step is realized with simplification hardware, as shown in fixing mapping 302. In this illustration, initial instruction 304 is written as the form of " OP Rd, Rs1, Rs2 ", wherein Rd is destination register, and Rs is source register.So as a example by the Article 1 in Fig. 3 instructs, This instruction is write exactly " OP A0, A0, A1 ", and destination register is structure register A0, and source register is Structure register A0 and A1.
In the first stage 21 of renaming operation, fixing mapping 302 is used to carry out all of mesh of renaming Scalar register file and dependence depositor (square frame 202 and arrow 306).Accord with distributor in figure 3 Number (that is, N* symbol being used for all depositors) shows that the renaming required for instruction maps and reads Produced list 308.From this example it can be seen that destination register OP A0, OP A2, OP A1 and OP A4 has been renamed into four extra register N8-N11.Rely on depositor the most Through the identified and suitable extra register of RNTO, i.e. because Article 1 instruction modification A0 Value, so the reading to A0 has been modified to N8 in Article 3 instruction, and because Article 2 instructs Have modified the value of A2, so the reading to A2 has been modified to N9 in Article 4 instruction.Post in source In the case of storage is not dependence depositor, there is the mapping one to one from A* symbol to N* symbol, As shown in fixing mapping 302.
In the first phase in addition to renaming (square frame 202 and arrow 306), required to instruction The list 310 of renaming map updating be identified (square frame 204 and arrow 312).As it has been described above, Symbol [N8] illustrates the content of renaming map unit N8.
Between two renaming stages 21,22, produced renaming can be mapped the row read The list 310 of table 308 and renaming map updating is stored in the trigger in hardware logic.
It can be seen that at the end of the first stage, do not have in the one group of instruction being renamed RAW or WAW dependency.
In order to perform the terminal stage 22 of renaming operation, use two information: for renaming can (physics) depositor list 314 and current renaming map 316.As it has been described above, this is final Stage realizes in the second cycle.In this terminal stage 22, renaming is used to map 316 Concurrently all of depositor to be carried out renaming (square frame 206 and arrow 318), and use physics Depositor symbol (that is, P* symbol) shows the operand being renamed produced by these instructions 320.Term used herein " operand " represents the depositor in instruction.
Figure 3 also shows the update of the renaming map (block 208 and arrow 322). As described above, this update has two parts: updating the entries for the original destination registers (block 210) and updating the extra register entries (block 212).
In the first part of the renaming map update (block 210), four entries in the renaming map are updated (updates 324) using the map update information 310 generated in the first stage and the renaming map 316. For example, the first stage recorded that register N0 is mapped to the content of renaming map entry N8, which is physical register P5 in the renaming map 316. Therefore, when the renaming map is updated to generate the output renaming map 326, the content of entry N0 changes from P3 to P5. Similarly, the contents of entries N2, N1 and N4 change from P11, P2 and P1 to P8, P7 and P0 respectively.
In the other part of the renaming map update (block 212), four further entries in the renaming map are updated (updates 328). The renaming map is updated so that the extra registers N8-N11 are mapped to free registers from the list 314 of available registers. In this example, the contents of entries N8-N11 change from P5, P8, P7 and P0 (which were previously free but have now been allocated) to P6, P10, P13 and P15. In this example the available registers are allocated in order, but in other examples the available physical registers may be mapped to the extra architectural registers in any order. This part resets the extra registers back to free registers, so that the same fixed mapping can be used in each iteration of the dependency removal stage (i.e. for each group of instructions being renamed).
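Both parts of the map update can be sketched as follows. The function and argument names are hypothetical. The example reproduces the values quoted above: N0, N2, N1 and N4 move to P5, P8, P7 and P0, and N8-N11 are refilled with P6, P10, P13 and P15. In this sketch the free registers are taken in list order, as in this example.

```python
from typing import Dict, List


def update_map(rmap: Dict[str, str], updates: Dict[str, str],
               free_list: List[str], extras: List[str]) -> Dict[str, str]:
    """Return the output renaming map; pops one free register per extra register."""
    new_map = dict(rmap)
    # Block 210: each destination entry takes the physical register that its
    # extra register holds in the *old* map, e.g. N0 -> [N8] gives N0 = P5.
    for entry, extra in updates.items():
        new_map[entry] = rmap[extra]
    # Block 212: refill every extra register from the free list, in order,
    # so the same fixed mapping can be reused for the next group.
    for extra in extras:
        new_map[extra] = free_list.pop(0)
    return new_map


# Only the entries that appear in the Figure 3 example are modelled.
rmap = {"N0": "P3", "N1": "P2", "N2": "P11", "N4": "P1",
        "N8": "P5", "N9": "P8", "N10": "P7", "N11": "P0"}
updates = {"N0": "N8", "N2": "N9", "N1": "N10", "N4": "N11"}
free_list = ["P6", "P10", "P13", "P15"]
out = update_map(rmap, updates, free_list, ["N8", "N9", "N10", "N11"])
print(out)
```

Part 1 reads the old map, so it picks up the registers that were assigned to N8-N11 before part 2 overwrites those entries.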
After the renaming map has been updated (in block 208), the updated renaming map (also referred to as the output renaming map) is used to rename the next group of instructions in the following cycle. Figure 4 shows this pipelining of the renaming process as a schematic of the renaming operation over four cycles C1-C4. In the first cycle C1, dependencies are removed from the first group of instructions (I0-I3) in blocks 202-204. In the second cycle C2, the first group of instructions (I0-I3) is renamed using the initial renaming map R0 (in block 206), and this map is updated (in block 208) to generate the updated renaming map R1. In parallel, also in the second cycle C2, dependencies are removed from the second group of instructions (I4-I7) in blocks 202-204. In the third cycle C3, the second group of instructions (I4-I7) is renamed using the renaming map R1 output in the previous cycle (in block 206), and this map is further updated (in block 208) to generate the renaming map R2. In parallel, in the third cycle C3, dependencies are removed from the third group of instructions (I8-I11) in blocks 202-204. This process can be repeated for any remaining groups of instructions.
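The overlap shown in Figure 4 can be tabulated with a small script. The script only models which group occupies which stage in each cycle. It assumes four instructions per group and no stalls, and it does not model the renaming itself.

```python
from typing import List, Optional, Tuple

GROUP_SIZE = 4  # instructions per group, as in the Figure 3 example


def schedule(num_groups: int) -> List[Tuple[Optional[int], Optional[int]]]:
    """Per cycle: (group in dependency removal, group in final renaming stage)."""
    rows = []
    for c in range(num_groups + 1):
        first = c if c < num_groups else None  # stage 1 takes a new group each cycle
        final = c - 1 if c >= 1 else None      # final stage lags by one cycle
        rows.append((first, final))
    return rows


def span(g: int) -> str:
    """Instruction range of group g, e.g. span(1) == 'I4-I7'."""
    return f"I{g * GROUP_SIZE}-I{g * GROUP_SIZE + GROUP_SIZE - 1}"


for cycle, (first, final) in enumerate(schedule(3), start=1):
    work = []
    if first is not None:
        work.append(f"remove dependencies {span(first)}")
    if final is not None:
        work.append(f"rename {span(final)} with R{final}, producing R{final + 1}")
    print(f"C{cycle}: " + "; ".join(work))
# C1: remove dependencies I0-I3
# C2: remove dependencies I4-I7; rename I0-I3 with R0, producing R1
# C3: remove dependencies I8-I11; rename I4-I7 with R1, producing R2
# C4: rename I8-I11 with R2, producing R3
```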
Figure 4 shows that the two stages (dependency removal and renaming) can easily be pipelined. Each stage is separate from the other, so the two stages share no logic and no renaming map. As described above, other two-stage renaming processes instead separate the read operations from the write operations. Compared with them, the method described herein reduces the forwarding caused by dependencies within a group of instructions and between groups of instructions. Within a single cycle, only one group of instructions updates, or reads from, the renaming map. This is because the first stage (dependency removal) does not use the renaming map; it uses the fixed mapping instead.
Figure 4 also shows that the two-stage renaming process adds one cycle to the renaming latency compared with a single-stage renaming block. However, throughput is maintained at one group of instructions (four instructions in this example) per cycle. Because each stage has low complexity, the number of instructions in each group can be increased while keeping the same clock speed as a single-cycle renaming block, and so the overall throughput is higher. Alternatively, the clock speed can be increased for the same throughput as a single-stage renaming block. Where the same throughput is required, the two-stage system can instead be implemented so that it occupies less silicon area, which reduces cost. This smaller area is achievable because the fixed mapping allows the dependency renaming step to be implemented with only a small amount of logic. In other examples, a combination of increased clock speed and increased throughput may be achieved.
The method described above relies on the availability of unallocated physical registers, which can serve as extra registers in the renaming operation. If no more registers are available (for example, at the end of C3 in Figure 4), the method may be allowed to stall, so that the renaming operation stops until registers become available (e.g. the renaming of I8-I11 is delayed). Stalling the method in this way is no more problematic than stalling an existing single-cycle implementation. As shown in Figure 3, the only state that is maintained is the renaming map 316, 326. The renaming map reads 308 and updates 310 are not truly retained as state. Instead, they are passed from one renaming stage to the next, for example by writing the information into flip-flops at the end of the first stage (i.e. at the end of a cycle) and then using the flip-flop values in the final stage (i.e. in the next cycle). The method may also be stalled in other situations, for example where the processor back end lacks available resources.
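The stall condition amounts to checking the free list before the final stage consumes it. The helper below is a hypothetical illustration. In particular, this description does not cover how registers return to the free list; the sketch represents that return with a simple append, standing in for, say, instruction retirement.

```python
from typing import List, Optional


def allocate_or_stall(free_list: List[str], needed: int) -> Optional[List[str]]:
    """Take `needed` free physical registers, or return None to stall.

    On a stall nothing is consumed: the first-stage results stay in their
    flip-flops and the group is retried in a later cycle.
    """
    if len(free_list) < needed:
        return None
    taken = free_list[:needed]
    del free_list[:needed]
    return taken


free = ["P6", "P10", "P13"]        # hypothetical: one register short
print(allocate_or_stall(free, 4))  # None -> stall this cycle
free.append("P15")                 # hypothetically freed in a later cycle
print(allocate_or_stall(free, 4))  # ['P6', 'P10', 'P13', 'P15']
```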
In the example described above with reference to Figure 3, each group of instructions contains four instructions. This is by way of example only: a group of instructions may contain any number of instructions, and in some examples groups may contain a very large number of instructions. Where groups contain many instructions, the first stage 21 can be divided into two or more sub-stages, each of which removes dependencies for a subset of the group of instructions.
In the example shown in Figure 5, the renaming stage 500 in the out-of-order processor 502 contains two instances of the dependency removal logic 110. Timing diagram 504 compares this arrangement with the two-stage process shown in Figure 4. The throughput is unaffected (it remains one group of instructions per cycle), but there is an additional cycle of latency. In this example the renaming operation takes a total of three cycles, compared with the two cycles shown in Figure 4.
In the first dependency removal sub-stage ("dependency removal A"), the first half (or first subset) of the destination registers (e.g. the destination registers of I0-I19) is used to check the dependencies of all the instructions in the group (e.g. I0-I39 for a group of 40 instructions). In the second dependency removal sub-stage ("dependency removal B"), the second half of the destination registers (e.g. the destination registers of I20-I39) is used to check the dependencies of the second half of the instruction sources. The second sub-stage does not need to check the first half of the instruction sources, because they cannot depend on destinations in the second half of the instructions. Any read of a register in the first half of the instructions takes place before any write to the same register in the second half.
The table below shows an example for a group of four instructions. In the first sub-stage, the destination registers of the first two instructions (e.g. A0, A3) are used to check the dependencies of all the instructions (I0-I3). All source registers are renamed, and the names of the original registers are also tracked. The table shows the result in the column headed "after dependency removal of half". The tracked original register names allow the second sub-stage to find any subsequent dependency. One such dependency occurs in the last instruction of this example, where the renaming to N4 is replaced by N10. Alternatively, instead of renaming all the registers and tracking the original register names, the first sub-stage can leave the registers unrenamed and simply track the renaming so that it is applied later (e.g. as part of the last sub-stage).
In the second sub-stage, the destination registers of the second half of the instructions (e.g. A4, A5) are used to check the dependencies of the sources of the second half of the instructions (e.g. the sources of instructions I2 and I3).
Where more than two dependency removal sub-stages are used, for example n sub-stages, the i-th sub-stage uses the destination registers of the i-th subset of instructions to check the dependencies of the instructions in subsets i to n. For example, for n=3, the 2nd sub-stage uses the destination registers of the 2nd subset of instructions to check the dependencies of the instructions in subsets 2 and 3.
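The sub-stage scheme can be checked in software. In this hypothetical sketch, `bounds` lists the subsets as half-open index ranges. Each source carries its original register name so that a later sub-stage can still recognise it. The destinations A0, A3, A4 and A5 and the read of A4 by the last instruction follow the four-instruction example above. The remaining sources are made up, and eight architectural registers are again assumed.

```python
from typing import List, Tuple

NUM_ARCH = 8  # assumption: extra registers start at N8, as in Figure 3


def remove_deps_substages(group: List[Tuple[str, List[str]]],
                          bounds: List[Tuple[int, int]]):
    """Dependency removal split into sub-stages.

    bounds lists the subsets in order as half-open (start, end) index ranges.
    Sub-stage i only looks at destinations in subset i, and only checks the
    sources of instructions in subsets i..n.
    """
    # Each source is tracked as (current renamed name, original name).
    srcs = [[("N" + s[1:], s) for s in ss] for _, ss in group]
    for start, end in bounds:
        for j in range(start, len(group)):
            for k, (_, orig) in enumerate(srcs[j]):
                # Latest writer of `orig` in this subset that precedes instruction j.
                for w in range(min(j, end) - 1, start - 1, -1):
                    if group[w][0] == orig:
                        srcs[j][k] = (f"N{NUM_ARCH + w}", orig)
                        break
    return [(f"N{NUM_ARCH + i}", [cur for cur, _ in s]) for i, s in enumerate(srcs)]


# Destinations A0, A3 | A4, A5; I3 reads A4. The other sources are hypothetical.
group = [("A0", ["A1"]), ("A3", ["A0"]), ("A4", ["A3"]), ("A5", ["A4"])]
print(remove_deps_substages(group, [(0, 2)]))          # after sub-stage A: I3 reads N4
print(remove_deps_substages(group, [(0, 2), (2, 4)]))  # after sub-stage B: I3 reads N10
```

Whatever the partition, the final result matches a single pass over the whole group. This holds because each sub-stage overwrites a source only with a writer that comes later in program order than any writer found by the earlier sub-stages.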
In this way, throughput can be increased at the cost of latency by significantly increasing the number of instructions in a group and using two or more dependency removal sub-stages. Because the final stage 22 is easy to scale, the whole group of instructions (e.g. I0-I39 in the 40-instruction example) can be renamed in parallel, and so there is a single instance of the renaming logic 112.
The methods above show example implementations that use extra registers to perform register renaming. The N unallocated physical registers used in renaming (to update the map and the instructions) can, however, be allocated in different ways (e.g. using a FIFO approach or another approach) without affecting the overall technique described herein. For example, the extra registers may be recycled among themselves, so that not all extra registers are used in a particular cycle. Suppose there are three extra (intermediate) registers N0, N1 and N2, and only N0 and N1 are used. The value of N2 (i.e. the unallocated register corresponding to N2) can then be placed in N0 (N0 → [N2]), and N1 and N2 can obtain new unallocated physical registers. Similarly, if only N0 is used, the value of N1 can be placed in N0 and the value of N2 in N1 (N0 → [N1] and N1 → [N2]), and N2 can obtain a new unallocated physical register.
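One way to realise this recycling is to shift the physical registers of the unused extra registers down to the lowest positions and fill the remaining positions from the free list. The sketch below assumes that approach. The physical register names are invented, and the two calls reproduce the N0 → [N2] case and the N0 → [N1], N1 → [N2] case just described.

```python
from typing import List, Set


def recycle_extras(extras: List[str], used: Set[int], free_list: List[str]) -> List[str]:
    """Reassign physical registers to extra registers N0..N(k-1) after a cycle.

    extras[i] is the physical register N<i> currently holds. Registers held by
    unused extras are still unallocated, so they move down to the lowest
    positions; only as many new registers are taken as were actually used.
    """
    unused = [p for i, p in enumerate(extras) if i not in used]
    fresh = [free_list.pop(0) for _ in range(len(extras) - len(unused))]
    return unused + fresh


extras = ["P20", "P21", "P22"]  # hypothetical contents of N0, N1, N2
print(recycle_extras(extras, {0, 1}, ["P30", "P31"]))  # ['P22', 'P30', 'P31']
print(recycle_extras(extras, {0}, ["P30"]))            # ['P21', 'P22', 'P30']
```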
In the examples above, every group contains the same number of instructions. In other examples, however, different groups may contain different numbers of instructions, and there may be a maximum number of instructions that a group can contain. In some implementations, the number of instructions in a group can vary according to the number of instructions that the decode stage 104 can send to the renaming stage 106, 500 in any particular cycle. Furthermore, where multiple dependency removal sub-stages are used, the subsets of instructions need not contain equal numbers of instructions. For example, where two dependency removal sub-stages are used, the first subset may contain more or fewer than half of the instructions in the group.
In the examples above, all the instructions being renamed have the same number of destination operands (one in the examples above) and the same number of source operands (one in the first example above, and two in the example shown in Figure 3). In a variation of the methods described above, instructions may have a variable but bounded number of operands (e.g. up to X sources and up to Y destinations, where X and Y may be the same or different). In such an implementation, each operand slot (destination or source register) up to the maximum allowed number can have an associated valid bit, which indicates whether that operand is in use. For example, where X=3 and Y=2, each instruction has five associated valid bits, even if the instruction contains fewer than five operands. Where the bit indicates that the operand is in use, the operand is renamed using the methods described above. Where the bit indicates that the operand is not in use, the renaming operation skips (or ignores) it.
Where there is a fixed number of destinations and a variable number of sources, such valid bits can be used for each source operand, or alternatively each source operand can be treated as implicitly valid. Performing the renaming operation on an unused source operand is merely inefficient, whereas performing it on an unused destination operand would be useless. For this reason, some implementations use valid bits only for destination operands and not for source operands.
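The valid-bit variant with X=3 and Y=2 can be modelled as below. This is a sketch, not the logic described here. It assumes eight architectural registers and an invented `Instr` record. It also assumes that destination slot d of instruction i uses extra register N(8 + i*Y + d), which gives the N × Y extra registers mentioned in the claims. Operands whose valid bit is clear are skipped entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

X, Y = 3, 2    # up to 3 sources and 2 destinations per instruction
NUM_ARCH = 8   # assumption: extra registers start at N8


@dataclass
class Instr:
    srcs: List[Optional[str]]  # X source slots
    src_valid: List[bool]      # X valid bits
    dsts: List[Optional[str]]  # Y destination slots
    dst_valid: List[bool]      # Y valid bits


def remove_deps_valid(group: List[Instr]):
    """Fixed-mapping dependency removal that skips operands whose valid bit is clear.

    Destination slot d of instruction i always uses extra register
    N(NUM_ARCH + i*Y + d), giving N*Y extra registers for N instructions.
    """
    last: Dict[str, str] = {}
    out = []
    for i, ins in enumerate(group):
        srcs = [last.get(s, "N" + s[1:]) if v else None
                for s, v in zip(ins.srcs, ins.src_valid)]
        dsts: List[Optional[str]] = []
        for d, (reg, v) in enumerate(zip(ins.dsts, ins.dst_valid)):
            if not v:
                dsts.append(None)  # unused slot: no rename, no map update
                continue
            extra = f"N{NUM_ARCH + i * Y + d}"
            last[reg] = extra
            dsts.append(extra)
        out.append((dsts, srcs))
    updates = {"N" + a[1:]: n for a, n in last.items()}
    return out, updates


group = [
    Instr(["A2", None, None], [True, False, False], ["A0", "A1"], [True, True]),
    Instr(["A1", "A0", "A2"], [True, True, True], ["A3", None], [True, False]),
]
out, updates = remove_deps_valid(group)
print(out)      # I1 reads N9 (A1) and N8 (A0); its second destination slot is skipped
print(updates)  # {'N0': 'N8', 'N1': 'N9', 'N3': 'N10'}
```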
Where only a small number of instructions have more than one destination register, it may be more efficient to split each such instruction into a series of sub-instructions, each with at most one destination register. The methods described above can then rename the group of instructions, including the sub-instructions, without using valid bits.
In some examples, additional renaming optimisation techniques can be added between the two stages of the renaming process, or as part of the first or final stage. Many renaming optimisations become more efficient when an optimisation step can be inserted after dependencies have been removed but before the renaming map is written. The multi-stage renaming process described herein is well suited to inserting such additional operations between its stages. In one example, an instruction moves the value of one architectural register to another (e.g. A0=A1). This move can be implemented by updating the map in the optimisation step rather than by subsequently executing the instruction.
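The move example can be sketched as a pass over the first-stage output. Everything here is hypothetical: the `is_move` flags, the function name and the aliasing approach are one possible realisation, not the method of this description. The pass aliases a move's extra register to the move's source, so later readers in the group and the pending map update refer to the source directly. Because the final stage reads the old map, the update N0 → [N1] gives A0 the physical register that held A1 at the start of the group.

```python
from typing import Dict, List, Optional, Tuple

Read = Tuple[str, List[str]]


def eliminate_moves(reads: List[Read], updates: Dict[str, str], is_move: List[bool]):
    """Optimisation step between the two stages: drop register-to-register moves.

    reads/updates are first-stage results; is_move is hypothetical decode
    information marking moves such as A0 = A1.
    """
    alias: Dict[str, str] = {}  # extra register of a removed move -> its source
    new_reads: List[Optional[Read]] = []
    for (dest, srcs), mv in zip(reads, is_move):
        srcs = [alias.get(s, s) for s in srcs]
        if mv:
            alias[dest] = srcs[0]
            new_reads.append(None)  # removed: nothing left to execute
        else:
            new_reads.append((dest, srcs))
    new_updates = {entry: alias.get(x, x) for entry, x in updates.items()}
    return new_reads, new_updates


# I0: A0 = A1 (move); I1: A2 = A0 + A3. First-stage results:
reads = [("N8", ["N1"]), ("N9", ["N8", "N3"])]
updates = {"N0": "N8", "N2": "N9"}
print(eliminate_moves(reads, updates, [True, False]))
# ([None, ('N9', ['N1', 'N3'])], {'N0': 'N1', 'N2': 'N9'})
```

After this pass, the extra register of an eliminated move (N8 here) goes unused. Its physical register could be handed back, for example with the recycling of extra registers described earlier.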
Figure 6 shows schematic diagrams of two out-of-order processors, each of which includes a loop buffer. The first example processor 600 shows an arrangement in which the loop buffer 602 is located after the fetch stage 102 and decode stage 104 and before the renaming stage 604. In operation, if the start of a loop is detected, the instructions are gathered in the loop buffer 602 before the renaming stage 604. Once the whole loop is in the loop buffer 602, the fetch and decode operations can be stopped, and the instructions can instead be fed from the loop buffer 602 to the renaming stage 604. In this arrangement, execution of the instructions in the loop is limited by the bottleneck in the renaming stage 604.
The second example processor 606 shows an improved arrangement in which the loop buffer 602 is located between the two stages 110 and 112 of the renaming stage 106. In this optimised example, instructions are stored in the loop buffer 602 after their dependencies have been removed (in the dependency removal stage 110) but before the renaming stage 112. Once the whole loop has been stored in the loop buffer 602, the renaming stage 112 can rename the instructions in the loop using fewer operations. As described above, the renaming stage 112 can perform all the renaming operations in parallel (in block 206). It is also very easy to scale, much more so than the dependency removal stage 110, and in some examples it may be able to rename the whole loop in a single operation (i.e. in a single cycle). This structure (i.e. the multi-stage renaming structure described herein) significantly reduces the delay introduced by renaming the loop, because the loop buffer can be placed after the stage whose capacity is most limited.
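The buffering rule can be expressed as a small class: store groups whose dependencies have been removed, and release them only once the whole loop is held. The class, its methods and the `ends_loop` flag are hypothetical, and loop detection itself is not modelled. On each loop iteration, the final stage would rename each released copy using the current renaming map.

```python
from typing import Any, List, Optional


class LoopBuffer:
    """Buffer placed between dependency removal and the final renaming stage.

    Holds groups that have already been through dependency removal and only
    releases them once the whole loop is stored. After that, fetch and decode
    can be stopped and the same groups fed to the final stage on each iteration.
    """

    def __init__(self) -> None:
        self.groups: List[Any] = []
        self.complete = False

    def store(self, group: Any, ends_loop: bool) -> None:
        # ends_loop is a hypothetical signal from loop detection.
        self.groups.append(group)
        self.complete = self.complete or ends_loop

    def release(self) -> Optional[List[Any]]:
        """All stored groups for one loop iteration, or None if the loop is incomplete."""
        return list(self.groups) if self.complete else None


buf = LoopBuffer()
buf.store("deps-removed group 0", ends_loop=False)
print(buf.release())  # None: loop not fully stored yet
buf.store("deps-removed group 1", ends_loop=True)
print(buf.release())  # both groups, ready for the final renaming stage
```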
The methods and renaming apparatus described above provide a more easily scalable renaming operation. This operation increases throughput and/or the maximum clock speed while adding only a small number of cycles (e.g. one or more) of latency. Furthermore, because all dependencies are removed in the first stage, there is no need for complex forwarding paths and latches between operations. This makes the system easier to synthesise than alternative two-stage renaming techniques.
Compared with an equivalent single-stage renaming block, there are fewer logic levels (i.e. fewer cascaded gates), so the renaming block can run at a higher maximum clock speed.
The terms "processor" and "computer" are used herein to refer to any device with processing capability such that it can execute instructions. The term "processor" as used herein includes microprocessors, multi-threaded processors and single-threaded processors. In some examples, such as where a system-on-a-chip architecture is used, a processor may include one or more fixed function blocks (also referred to as accelerators). These blocks implement a particular function (e.g. part of the method implemented by the processor) in hardware rather than in software or firmware. Those skilled in the art will realise that such processing capabilities are incorporated into many different devices. The term "computer" therefore includes set top boxes, media players, digital radios, PCs, servers, mobile telephones, personal digital assistants, game consoles and many other devices.
Those skilled in the art will realise that storage devices used to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realise that, using conventional techniques known to them, all or part of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array or the like.
Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.
The benefits and advantages described above may relate to one embodiment or to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems, or to those that have any or all of the stated benefits and advantages.
Any reference to "an" item refers to one or more of those items. The term "comprising" is used herein to mean including the method blocks or elements identified. Such blocks or elements do not form an exclusive list, and a method or apparatus may contain additional blocks or elements.
The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. The arrows between blocks in the figures show one example sequence of method steps and are not intended to exclude other sequences or the parallel execution of multiple steps. Individual blocks may also be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought. Where elements of the figures are shown connected by arrows, these arrows show just one example flow of communications (including data and control messages) between elements. The flow between elements may be in either direction or in both directions.
The above description of a preferred embodiment is given by way of example only, and those skilled in the art may make various modifications. Various embodiments have been described above with a certain degree of particularity, or with reference to one or more individual embodiments. However, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.

Claims (18)

1. A method of register renaming in an out-of-order processor, comprising:
in a first stage, removing dependencies within a group of instructions using a fixed mapping defined in hardware logic (21), wherein removing the dependencies comprises: using the fixed mapping, renaming all destination registers and any dependent registers in the group of instructions to an extra register of a set of extra registers (202), wherein the fixed mapping is independent of any previous state, and wherein the fixed mapping between destination registers and extra registers is based on the physical position of each destination register within the group of instructions; and
in a final stage, renaming all registers in the group of instructions in parallel using a renaming map (22, 206).
2. The method according to claim 1, wherein removing dependencies within a group of instructions using a fixed mapping defined in hardware logic further comprises:
passing to the final stage information about which extra register was used to rename each destination register (204).
3. The method according to claim 1, wherein the final stage further comprises:
updating the renaming map (208).
4. The method according to claim 3, wherein the renaming map comprises an entry associated with each extra register.
5. The method according to claim 4, wherein updating the renaming map comprises:
updating, based on information passed from the first stage, an entry in the renaming map associated with each destination register (210); and
updating an entry in the renaming map associated with each extra register, to map each extra register to an unallocated physical register (212).
6. The method according to claim 5, further comprising:
accessing a list of unallocated physical registers.
7. The method according to claim 1, further comprising:
performing an optimisation operation between the first stage and the final stage.
8. The method according to claim 1, wherein the group of instructions comprises N instructions and the set of extra registers comprises N extra registers, where N is an integer.
9. The method according to claim 1, wherein each instruction in the group of instructions comprises no more than Y destination registers, and wherein each instruction has an associated set of Y valid bits, each valid bit indicating whether one of the Y destination registers is used in that instruction.
10. The method according to claim 9, wherein the group of instructions comprises N instructions and the set of extra registers comprises N × Y extra registers, where N and Y are integers.
11. The method according to any one of the preceding claims, wherein each instruction in the group of instructions comprises no more than X source registers, and wherein each instruction has an associated set of X valid bits, each valid bit indicating whether one of the X source registers is used in that instruction.
12. An out-of-order processor (100, 500, 606), comprising:
a renaming map (108);
hardware logic (114) defining a fixed mapping between registers;
dependency removal logic (110) arranged to remove dependencies within a group of instructions using the fixed mapping, wherein the dependency removal logic is arranged to remove the dependencies within the group of instructions by: using the fixed mapping, renaming all destination registers and any dependent registers in the group of instructions to an extra register of a set of extra registers, wherein the fixed mapping is independent of any previous state, and wherein the fixed mapping between destination registers and extra registers is based on the physical position of each destination register within the group of instructions;
renaming logic (112) arranged to rename all registers in the group of instructions in parallel using the renaming map; and
a plurality of physical registers (107).
13. The out-of-order processor according to claim 12, wherein the dependency removal logic comprises a plurality of dependency removal logic instances (110), and wherein each dependency removal logic instance is arranged to remove dependencies within a single, non-overlapping subset of the group of instructions.
14. The out-of-order processor according to claim 12, wherein the dependency removal logic is further arranged to remove dependencies within a group of instructions by passing to the renaming logic information about which extra register was used to rename each destination register.
15. The out-of-order processor according to claim 12, wherein the renaming map comprises an entry associated with each extra register.
16. The out-of-order processor according to claim 12, wherein the plurality of physical registers comprises a plurality of unallocated physical registers (109).
17. The out-of-order processor according to claim 12, wherein the renaming logic is further arranged to update the renaming map.
18. The out-of-order processor according to any one of claims 12-17, further comprising a loop buffer (602) between the dependency removal logic and the renaming logic, wherein the loop buffer is arranged to: store instructions within a loop after dependencies have been removed by the dependency removal logic; and release the instructions to the renaming logic only once all instructions in the loop have been stored.
CN201310333130.2A 2012-08-07 2013-08-02 Method and apparatus for multi-stage register renaming using dependency removal Expired - Fee Related CN103577159B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1213994.5 2012-08-07
GB1213994.5A GB2496934B (en) 2012-08-07 2012-08-07 Multi-stage register renaming using dependency removal

Publications (2)

Publication Number Publication Date
CN103577159A (en) 2014-02-12
CN103577159B (en) 2016-11-30

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5996068A (en) * 1997-03-26 1999-11-30 Lucent Technologies Inc. Method and apparatus for renaming registers corresponding to multiple thread identifications
JP2002521762A (en) * 1998-07-31 2002-07-16 Advanced Micro Devices, Inc. A processor configured to selectively free physical registers during instruction retirement
EP1237072A1 (en) * 1999-09-08 2002-09-04 Hajime Seki Register renaming system
CN102566976A (en) * 2010-12-27 2012-07-11 北京国睿中数科技股份有限公司 Register renaming system and method for managing and renaming registers

Similar Documents

Publication Publication Date Title
CN104067282B (en) Counter operation in state machine lattice
CN104040492B (en) Microprocessor accelerated code optimizer and dependency reordering method
CN104040490B (en) Code optimizer for the acceleration of multi engine microprocessor
Mahram et al. Fast and accurate NCBI BLASTP: Acceleration with multiphase FPGA-based prefiltering
CN102566976B (en) Register renaming system and method for managing and renaming registers
CN108268422A (en) For handling the hardware accelerator framework of very sparse and supersparsity matrix data
CN107256156A (en) Method and system for the detection in state machine
CN107851028A (en) The narrow generation value of instruction operands is stored directly in the register mappings in out-of order processor
Perais et al. BeBoP: A cost effective predictor infrastructure for superscalar value prediction
CN107239413A (en) Handle memory requests
GB2496934A (en) Multi-stage register renaming using dependency removal and renaming maps.
CN108027773A (en) The generation and use of memory reference instruction sequential encoding
CN106020778A (en) Restoring a Register Renaming Map
CN107609644A (en) Method and system for the data analysis in state machine
US9612963B2 (en) Store forwarding cache
US10437594B2 (en) Apparatus and method for transferring a plurality of data structures between memory and one or more vectors of data elements stored in a register bank
CN108509270A (en) The high performance parallel implementation method of K-means algorithms on a kind of domestic 26010 many-core processor of Shen prestige
Das et al. A framework for post-silicon realization of arbitrary instruction extensions on reconfigurable data-paths
CN107810486A (en) Lock the value of the operand of the instruction group for atomically performing
CN104615409B (en) The method jumped over the processor of MOV instruction and used by the processor
CN104166539B (en) parallel atomic increment
WO2005106713A1 (en) Information processing method and information processing system
CN103577159B (en) Method and apparatus for multi-stage register renaming using dependency removal
CN108628892A (en) Method, apparatus, electronic equipment and the readable storage medium storing program for executing of ordered data storage
Moscola et al. Hardware-accelerated RNA secondary-structure alignment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Hertfordshire

Patentee after: Mex Technology Co.,Ltd.

Address before: Hertfordshire

Patentee before: Hai Luo Software Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20180717

Address after: California, USA

Patentee after: Imagination Technologies Ltd.

Address before: Hertfordshire

Patentee before: Mex Technology Co.,Ltd.

Effective date of registration: 20180717

Address after: Hertfordshire

Patentee after: Hai Luo Software Co.,Ltd.

Address before: Hertfordshire

Patentee before: Imagination Technologies Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20161130