CN100388187C - Apparatus for predicting multiple branch target addresses - Google Patents


Info

Publication number
CN100388187C
Authority
CN
China
Prior art keywords
cache
branch instruction
cache line
microprocessor
cache memory
Prior art date
Legal status
Active
Application number
CNB2005100919093A
Other languages
Chinese (zh)
Other versions
CN1821953A (en)
Inventor
G. Glenn Henry (G·葛兰亨利)
Thomas McDonald (汤玛斯·麦克唐纳)
Current Assignee
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date
Filing date
Publication date
Application filed by Via Technologies Inc
Publication of CN1821953A
Application granted
Publication of CN100388187C
Legal status: Active
Anticipated expiration

Landscapes

  • Advance Control (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A branch prediction apparatus having two two-way set associative cache memories each indexed by a lower portion of an instruction cache fetch address is disclosed. The index selects a group of four entries, one from each way of each cache. Each entry stores a single target address of a different previously executed branch instruction. For some groups, the four entries cache target addresses for one branch instruction in each of four different cache lines, to obtain four-way group associativity; for other groups, the four entries cache target addresses for one branch instruction in each of two different cache lines and two branch instructions in a third different cache line, to effectively obtain three-way group associativity, depending on the distribution of the branch instructions in the program. The apparatus trades off associativity for number of predictable branches per cache line on an index-by-index basis to efficiently use storage space.

Description

Variable group associativity branch target address cache delivering multiple target addresses per cache line
Technical field
The present invention relates to the field of branch prediction in microprocessors, and in particular to branch target address caches.
Background
Many modern pipelined microprocessors include a branch target address cache (hereafter BTAC), which caches the target addresses of previously executed branch instructions. When a cache line is fetched from the microprocessor's instruction cache, the fetch address is also provided to the BTAC, which uses it to predict whether a branch instruction is present in the fetched cache line and whether the BTAC holds a target address for that branch instruction. If the branch instruction is predicted taken, the processor branches to the target address supplied by the BTAC. Because each cache line stores multiple instructions, an instruction cache line may contain more than one branch instruction. Some BTACs therefore provide fixed storage for caching two target addresses per cache line: because one branch instruction in the line may be taken while another is not taken, this allows the BTAC to make more accurate predictions.
In a conventional BTAC, the storage for the two target addresses is fixed. That is, the space is permanently dedicated regardless of whether the cache line contains two branch instructions or only one. Indeed, in a conventional BTAC integrated into the instruction cache, the space remains dedicated even when the cache line contains no branch instruction at all. Observation shows, however, that only about 20% of cache lines containing a branch instruction contain two branch instructions. The extra space permanently dedicated to a second target address is therefore wasted for the other 80% of cache lines in the BTAC. For example, in a two-way set associative BTAC that provides fixed storage for two target addresses per entry, because only about 20% of cache lines contain two or more branch instructions, only about 60% of the target address storage actually holds valid target addresses.
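The 60% figure follows from a simple count: with fixed two-slot entries, a line with one branch fills one slot and a line with two or more branches fills both. The sketch below reproduces that arithmetic, assuming (as the example does) that every cached line holds at least one branch and that the fraction of lines with two or more branches is 0.2. The function name is hypothetical and chosen for illustration only.

```python
def fixed_two_slot_utilization(p_multi: float) -> float:
    """Fraction of target-address slots holding a valid target when every
    BTAC entry reserves storage for exactly two targets.

    p_multi: fraction of cached lines containing two or more branches
    (only two of which can be recorded); all other lines hold one branch.
    """
    used_slots = (1 - p_multi) * 1 + p_multi * 2  # average slots filled per entry
    return used_slots / 2                          # two slots are always reserved


print(round(fixed_two_slot_utilization(0.2), 3))  # 0.6
```

A line with a single branch leaves half its entry empty, so utilization can never exceed 100% and falls to 50% when no line has a second branch.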
What is needed, therefore, is a more space-efficient mechanism for predicting multiple branch instructions within a fetched cache line.
Summary of the invention
An object of the present invention is to provide a branch prediction apparatus that dynamically determines the associativity of each group of entries, where a group is selected by the index of a given fetch address, according to the number of branch instructions present in the cache lines specified by that index. Indexes whose cache lines contain only a single branch instruction thereby enjoy higher associativity, while indexes whose cache lines contain multiple branch instructions receive lower associativity.
The present invention provides an apparatus in a microprocessor for predicting a target address for a variable number of branch instructions in each cache line fetched from an instruction cache at a fetch address. The apparatus includes first and second two-way set associative cache memories, each having an index input coupled to receive a portion of the instruction cache fetch address. The index selects one of a plurality of groups of four entries. Each group has one entry in each way of each of the first and second cache memories. Each entry caches the target address of one previously executed branch instruction. The apparatus also includes replacement logic, coupled to the first and second caches, that selects one of the entries for replacement in response to resolution of a branch instruction, such that during operation of the microprocessor: (a) for a first subset of the plurality of groups, the four entries cache target addresses for one branch instruction in each of four different cache lines, to obtain four-way group associativity; and (b) for a second subset of the plurality of groups, the four entries cache target addresses for one branch instruction in each of two different cache lines and for two branch instructions in a third different cache line, to obtain three-way group associativity.
The present invention also provides a method for predicting, in a microprocessor, a target address for a variable number of branch instructions in each cache line fetched from an instruction cache at a fetch address. The method includes providing an index to first and second two-way set associative cache memories to select one of a plurality of groups of four entries. Each group includes one entry in each way of each of the first and second cache memories. Each entry caches the target address of one previously executed branch instruction. The index is a portion of the instruction cache fetch address. The method also includes selecting one of the entries for replacement in response to resolution of a branch instruction, such that during operation of the microprocessor: (a) for a first subset of the plurality of groups, the four entries cache target addresses for one branch instruction in each of four different cache lines, to obtain four-way group associativity; and (b) for a second subset of the plurality of groups, the four entries cache target addresses for one branch instruction in each of two different cache lines and for two branch instructions in a third different cache line, to obtain three-way group associativity.
The present invention also provides an apparatus in a microprocessor for predicting a target address for a variable number of branch instructions in each cache line fetched from an instruction cache at a fetch address. The apparatus includes M N-way set associative cache memories, each having an index input coupled to receive a portion of the instruction cache fetch address. The index selects one of a plurality of groups of M×N entries. Each group includes one entry in each way of each of the M cache memories. Each entry caches the target address of one previously executed branch instruction. The apparatus also includes replacement logic, coupled to the M caches, that selects one of the entries for replacement in response to resolution of a branch instruction, such that during operation of the microprocessor: (a) for a first subset of the plurality of groups, the M×N entries cache target addresses for one branch instruction in each of M×N different cache lines, to obtain M×N-way group associativity; and (b) for a second subset of the plurality of groups, the M×N entries cache target addresses for one branch instruction in each of (M×N-2) different cache lines and for two branch instructions in an (M×N-1)th different cache line, to obtain (M×N-1)-way group associativity.
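The associativity in each case follows from counting entries: a line with one cached branch consumes one entry and a line with two consumes two, so a group of M×N entries that holds one two-branch line has room for only M×N-2 single-branch lines, or M×N-1 distinct lines in total. The sketch below enumerates these splits. It assumes, for illustration only, that a line contributes at most two entries to a group, as in the cases described above; `group_configurations` is a hypothetical helper, not part of the patent.

```python
def group_configurations(m: int, n: int) -> list[tuple[int, int, int]]:
    """Enumerate ways to fill one index group of M*N entries.

    Returns (single_lines, double_lines, effective_ways) tuples. A single
    line caches one branch target (one entry); a double line caches two
    (two entries). Effective associativity is the number of distinct
    cache lines represented in the group.
    """
    entries = m * n
    configs = []
    for doubles in range(entries // 2 + 1):
        singles = entries - 2 * doubles  # entries left for one-branch lines
        configs.append((singles, doubles, singles + doubles))
    return configs


for singles, doubles, ways in group_configurations(2, 2):
    print(f"{singles} one-branch + {doubles} two-branch lines: {ways}-way")
```

For M = N = 2 this yields the four-way, three-way, and two-way cases that Fig. 2 illustrates as groups 262A, 262B, and 262C.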
The present invention also provides a method for predicting, in a microprocessor, a target address for a variable number of branch instructions in each cache line fetched from an instruction cache at a fetch address. The method includes providing an index to M N-way set associative cache memories to select one of a plurality of groups of M×N entries. Each group includes one entry in each way of each of the M cache memories. Each entry caches the target address of one previously executed branch instruction. The index is a portion of the instruction cache fetch address. The method also includes selecting one of the entries for replacement in response to resolution of a branch instruction, such that during operation of the microprocessor: (a) for a first subset of the plurality of groups, the M×N entries cache target addresses for one branch instruction in each of M×N different cache lines, to obtain M×N-way group associativity; and (b) for a second subset of the plurality of groups, the M×N entries cache target addresses for one branch instruction in each of (M×N-2) different cache lines and for two branch instructions in an (M×N-1)th different cache line, to obtain (M×N-1)-way group associativity.
The present invention also provides a computer program product for use with a computing device. The computer program product comprises a computer-usable medium having computer-readable program code embodied in the medium for causing an apparatus in a microprocessor to predict a target address for a variable number of branch instructions in each cache line fetched from an instruction cache at a fetch address. The computer-readable program code includes first program code for providing first and second two-way set associative cache memories, each having an index input coupled to receive a portion of the instruction cache fetch address. The index selects one of a plurality of groups of four entries. Each group has one entry in each way of each of the first and second cache memories. Each entry caches the target address of one previously executed branch instruction. The computer-readable program code also includes second program code for providing replacement logic, coupled to the first and second caches, that selects one of the entries for replacement in response to resolution of a branch instruction, such that during operation of the microprocessor: (a) for a first subset of the plurality of groups, the four entries cache target addresses for one branch instruction in each of four different cache lines, to obtain four-way group associativity; and (b) for a second subset of the plurality of groups, the four entries cache target addresses for one branch instruction in each of two different cache lines and for two branch instructions in a third different cache line, to obtain three-way group associativity.
An advantage of the present invention is that it can predict two target addresses for instruction cache lines that contain two branch instructions, while also providing higher associativity for cache line indexes whose cache lines each contain a single target address. The invention achieves this by caching a single target address per entry rather than multiple target addresses per entry, and thereby uses storage more efficiently than a conventional BTAC. Furthermore, if the associativity of the instruction cache is increased, the branch target address prediction apparatus of the invention can increase its effective associativity accordingly to match that of the instruction cache, without a proportional increase in the number of indexes of the branch target address prediction apparatus.
Description of drawings
To further illustrate the technical content of the present invention, embodiments are described in detail below with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a microprocessor according to the present invention;
Fig. 2 is a block diagram of the branch target address prediction unit of Fig. 1;
Fig. 3 is a flowchart illustrating operation of the branch target address prediction unit of Fig. 2 when it is read to generate a predicted target address; and
Fig. 4 is a flowchart illustrating operation of the branch target address prediction unit of Fig. 2 when it is updated in response to a resolved branch instruction.
Embodiment
Referring to Fig. 1, a block diagram of a microprocessor 100 according to the present invention is shown. Microprocessor 100 is a pipelined microprocessor. In one embodiment, microprocessor 100 is a microprocessor whose instruction set conforms substantially to the x86 architecture instruction set.
Microprocessor 100 includes an instruction fetcher 102. Instruction fetcher 102 controls a fetch address multiplexer 136, which outputs a current instruction cache fetch address 162. Current fetch address 162 specifies the address of the next cache line of instruction bytes of the currently executing program to be fetched for execution by microprocessor 100. If fetch address 162 hits in instruction cache 104, instruction cache 104 outputs the cache line of instructions specified by fetch address 162; otherwise, instruction fetcher 102 fetches the missing instructions from memory (for example, a system memory coupled to microprocessor 100), and instruction cache 104 caches the fetched instructions for subsequent use by microprocessor 100. In particular, a cache line fetched from instruction cache 104 may contain zero, one, two, or more branch instructions. In one embodiment, instruction cache 104 is a 64KB four-way set associative level-1 cache; however, the present invention may be used in conjunction with instruction caches of various sizes and associativities.
Microprocessor 100 also includes a branch target address prediction unit 142, discussed in detail below. Branch target address prediction unit 142 caches information about previously executed branch instructions. When instruction fetcher 102 fetches a cache line from instruction cache 104, branch target address prediction unit 142 predicts, based on the information cached in it, whether one or more branch instructions are present in the cache line, and provides a predicted target address 164 for one of the branch instructions to multiplexer 136. If the branch instruction is predicted taken, multiplexer 136 selects predicted target address 164 as fetch address 162 on the next clock cycle, so that microprocessor 100 branches to predicted target address 164.
In particular, for each previously executed branch instruction, branch target address prediction unit 142 caches its target address, the offset of the branch instruction within its cache line, a prediction of whether the branch instruction will be taken, a tag of the cache line containing the branch instruction, and a valid indicator. As described in detail below, branch target address prediction unit 142 comprises multiple set associative branch target address cache memories and replacement logic. The replacement logic controls the replacement of cached entries in a manner that dynamically varies the effective associativity of each index group: for groups whose corresponding instruction cache 104 lines contain multiple branch instructions, the associativity is lower so that the multiple branches can be accommodated; for groups whose corresponding cache lines contain only a single branch instruction, the associativity is higher. An index group, or group, comprises all the entries in all the ways of all the caches selected by the index portion of fetch address 162, as shown in Fig. 2.
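Because each entry holds exactly one branch, the effective associativity of a group at any moment is simply the number of distinct cache lines (tags) among its valid entries. The minimal model below captures the per-entry fields listed above; the class and function names are hypothetical, and the field encodings (plain integers and booleans) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BtacEntry:
    """One entry: a single previously executed branch (fields per the text)."""
    valid: bool    # valid indicator
    tag: int       # tag of the cache line containing the branch
    offset: int    # byte offset of the branch within the fetched line
    taken: bool    # taken / not-taken prediction
    target: int    # cached branch target address


def effective_associativity(group: list[BtacEntry]) -> int:
    """Number of distinct cache lines (tags) the index group currently covers."""
    return len({e.tag for e in group if e.valid})


# Hypothetical group: two branches share the line tagged 0x1
group = [
    BtacEntry(True, 0x1, 2, True, 0x4000),
    BtacEntry(True, 0x2, 8, True, 0x5000),
    BtacEntry(True, 0x3, 0, False, 0x6000),
    BtacEntry(True, 0x1, 12, True, 0x7000),
]
print(effective_associativity(group))  # 3
```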
Like some conventional branch predictors, branch target address prediction unit 142 can provide multiple target addresses when a cache line fetched from instruction cache 104 contains multiple branch instructions. However, unlike conventional branch predictors that handle multiple branches per cache line, each entry in branch target address prediction unit 142 of the present invention includes storage for caching only a single branch target address and its related information, rather than storage for caching multiple branch target addresses as in conventional predictors, in which that storage is wasted for a substantial percentage of cache lines. Consequently, branch target address prediction unit 142 of the present invention uses its storage more efficiently and provides higher associativity, thereby potentially increasing branch prediction accuracy.
It should be understood that the terms cache line, or line, as used herein, unless otherwise indicated, refer to the quantity of instruction bytes fetched from instruction cache 104 by instruction fetcher 102 in each clock cycle, which may be a subset of the bytes actually transferred between instruction cache 104 and main memory. For example, in the embodiment of Fig. 1, microprocessor 100 may transfer 32 bytes of instructions at a time between system memory and instruction cache 104; however, instruction fetcher 102 fetches only 16 bytes from instruction cache 104 each clock cycle. Accordingly, as discussed below, in one embodiment branch target address prediction unit 142 predicts each clock cycle whether one or more branch instructions are present in a cache line, that is, in a 16-byte quantity.
Microprocessor 100 also includes an instruction buffer 106 coupled to instruction cache 104. Instruction buffer 106 receives cache lines of instruction bytes from instruction cache 104 and buffers them until they can be formatted into distinct instructions for execution by microprocessor 100. In one embodiment, instruction buffer 106 has four entries for storing up to four cache lines.
Microprocessor 100 also includes an instruction formatter 108 coupled to instruction buffer 106. Instruction formatter 108 receives instruction bytes from instruction buffer 106 and generates formatted instructions from them. That is, instruction formatter 108 examines a string of instruction bytes in instruction buffer 106, determines which bytes make up the next instruction and its length, and outputs the next instruction and its length. In one embodiment, the formatted instructions are instructions conforming substantially to the x86 architecture instruction set.
Microprocessor 100 also includes a formatted instruction queue 112 coupled to instruction formatter 108. Formatted instruction queue 112 receives formatted instructions from instruction formatter 108 and buffers them until they can be translated into microinstructions. In one embodiment, formatted instruction queue 112 has entries for storing up to twelve formatted instructions.
Microprocessor 100 also includes an instruction translator 114 coupled to formatted instruction queue 112. Instruction translator 114 translates the formatted macroinstructions stored in formatted instruction queue 112 into microinstructions. In one embodiment, microprocessor 100 includes a reduced instruction set computer (RISC) core that executes microinstructions of a reduced, or native, instruction set.
Microprocessor 100 also includes a translated instruction queue 116 coupled to instruction translator 114. Translated instruction queue 116 receives translated microinstructions from instruction translator 114 and buffers them until they can be executed by the remainder of the microprocessor pipeline.
Microprocessor 100 also includes a register stage 118 coupled to translated instruction queue 116. Register stage 118 includes a plurality of registers for storing instruction operands and results. Register stage 118 includes a user-visible register set for storing the user-visible state of microprocessor 100.
Microprocessor 100 also includes an address stage 122 coupled to register stage 118. Address stage 122 includes address generation logic for generating memory addresses for instructions that access memory, such as load and store instructions and branch instructions.
Microprocessor 100 also includes a data stage 124 coupled to address stage 122. Data stage 124 includes logic for loading data from memory and one or more caches for caching data loaded from memory.
Microprocessor 100 also includes an execute stage 126 coupled to data stage 124. Execute stage 126 includes execution units for executing instructions, such as arithmetic logic units for executing arithmetic and logical instructions. In one embodiment, execute stage 126 includes an integer execution unit, a floating-point execution unit, an MMX execution unit, and an SSE execution unit. Execute stage 126 also includes logic for resolving branch instructions; in particular, execute stage 126 determines whether a branch instruction is taken and the actual target address of the branch instruction.
Microprocessor 100 also includes a store stage 128 coupled to execute stage 126. Store stage 128 includes logic for storing data to memory in response to store microinstructions. In addition, in response to execute stage 126 resolving a branch instruction, store stage 128 generates an update request 176 to update branch target address prediction unit 142 with the target address of the resolved branch instruction. Update request 176 includes the address of the resolved branch instruction and its resolved target address, each of which is 32 bits in one embodiment. BTAC update request 176 also includes information (discussed in more detail with respect to Fig. 2) that was obtained when branch target address prediction unit 142 was accessed in parallel with the fetch, from instruction cache 104, of the cache line containing the branch instruction, and that was piped down the pipeline along with the branch instruction.
Microprocessor 100 also includes a write-back stage 132 coupled to store stage 128. Write-back stage 132 includes logic for writing instruction results to register stage 118.
In addition to predicted target address 164, multiplexer 136 also receives fetch address 162 and a next sequential fetch address 166. An adder 134 generates next sequential fetch address 166 by incrementing the current fetch address by the size of a cache line. After a cache line is fetched normally from instruction cache 104, multiplexer 136 selects next sequential fetch address 166 for output as current fetch address 162 on the next clock cycle. If instruction buffer 106 is full, multiplexer 136 selects fetch address 162 rather than next sequential fetch address 166. As described above, if branch target address prediction unit 142 indicates that it is providing a valid predicted target address 164 for a branch instruction in the cache line currently being fetched from instruction cache 104, and the branch instruction is predicted taken, multiplexer 136 selects predicted target address 164 as fetch address 162 on the next clock cycle. Although not shown, multiplexer 136 also receives a correction address from store stage 128. If store stage 128 indicates that a branch instruction was mispredicted, multiplexer 136 selects the correction address to correct the misprediction.
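The selection performed by fetch address multiplexer 136 can be sketched as a small function. This is a behavioral model only: the function and parameter names are hypothetical, the 16-byte line size comes from the embodiment described above, and the priority order among the conditions (misprediction correction first, then a full instruction buffer, then a taken prediction) is an assumption, since the text does not state it.

```python
LINE_SIZE = 16  # bytes fetched per clock in the embodiment described above


def next_fetch_address(current: int,
                       buffer_full: bool,
                       predicted_valid: bool,
                       predicted_taken: bool,
                       predicted_target: int,
                       mispredict: bool = False,
                       correction: int = 0) -> int:
    """Behavioral model of fetch address multiplexer 136.

    The priority among the inputs is assumed, not taken from the text.
    """
    if mispredict:
        return correction                 # correction address from store stage 128
    if buffer_full:
        return current                    # hold fetch address 162
    if predicted_valid and predicted_taken:
        return predicted_target           # predicted target address 164
    return current + LINE_SIZE            # next sequential address 166 (adder 134)


print(hex(next_fetch_address(0x1000, False, True, True, 0x2000)))  # 0x2000
```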
Referring to Fig. 2, a block diagram of branch target address prediction unit 142 of Fig. 1 is shown. Branch target address prediction unit 142 includes control logic 202, which controls various operations of branch target address prediction unit 142, such as reading and writing BTACs 208 and an LRU array 212, described below. Control logic 202 receives an instruction pointer 222 of microprocessor 100, which indicates the address of the program instruction currently being fetched for execution.
Branch target address prediction unit 142 also includes a two-input address multiplexer 216. One input of address multiplexer 216 receives fetch address 162 of instruction cache 104 of Fig. 1, and the other input receives an update address 232 generated by control logic 202. When BTACs 208 and/or LRU array 212 are read, control logic 202 controls address multiplexer 216 to output the fetch address; when BTACs 208 and/or LRU array 212 are written, control logic 202 controls address multiplexer 216 to select update address 232.
Branch target address prediction unit 142 also includes two branch target address cache (BTAC) memories, denoted first BTAC 208A and second BTAC 208B. First BTAC 208A and second BTAC 208B are referred to individually as BTAC 208 and collectively as BTACs 208, and are also referred to herein as the first side and the second side. Each BTAC 208 is coupled to receive an index portion 274 of the address output by multiplexer 216. In one embodiment, index 274 comprises bits 4 through 13 of the address output by multiplexer 216. Each BTAC 208 is two-way set associative. Each unique value of index 274 selects a different set of two ways (denoted way 0 and way 1 in Fig. 2) from each of BTACs 208. Way 0 and way 1 each have an entry 264 that includes: a target address 254 of a previously executed branch instruction; a valid indicator 238 indicating whether entry 264 is valid; an offset 266 specifying the byte offset of the location, or start, of the previously executed branch instruction within the corresponding cache line fetched from instruction cache 104; a taken/not-taken prediction 276 for the previously executed branch instruction; and a tag 242 of the address of the cache line containing the previously executed branch instruction. BTACs 208 can be updated independently; therefore, control logic 202 generates a separate write signal for each of BTACs 208.
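The address fields implied above can be extracted with shifts and masks. In this sketch, index 274 is bits 4 through 13 as stated, and bits 3 through 0 give the byte position within a 16-byte fetch quantum. The text does not give the width of tag 242, so treating all bits above bit 13 of a 32-bit address as the tag is an assumption; the function name is hypothetical.

```python
OFFSET_BITS = 4   # bits 3:0 -> byte within a 16-byte fetch quantum
INDEX_BITS = 10   # bits 13:4 -> index 274 (1024 groups)


def split_fetch_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit fetch address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)  # assumed: all bits above bit 13
    return tag, index, offset


tag, index, offset = split_fetch_address(0x00403A5C)
print(hex(tag), index, offset)  # 0x100 933 12
```

Ten index bits give 2^10 = 1024 distinct index values, one per group.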
The four entries 264 selected by a value of index 274 (two entries 264 from each of the two BTACs 208) are referred to collectively herein as an index group 262, or group 262, as shown in Fig. 2. Fig. 2 illustrates three exemplary groups 262, denoted 262A, 262B, and 262C. In one embodiment, branch target address prediction unit 142 has 1024 groups 262. Each time instruction fetcher 102 fetches a cache line from instruction cache 104, BTACs 208 output the information 252 cached in all four entries of the group 262 selected by index 274 of fetch address 162.
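A group read can be modeled as gathering the two ways from each of the two BTACs at one index and keeping the valid entries whose tag matches the fetch address. How control logic 202 chooses among several hits is described with Fig. 3 and is not modeled here; sorting hits by offset is only a convenience. All names, the nested-list storage layout, and the tag width (bits above bit 13) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    valid: bool
    tag: int
    offset: int
    taken: bool
    target: int


NUM_GROUPS = 1024
WAYS = 2

# btac[side][index][way]; side 0 models BTAC 208A, side 1 models BTAC 208B
btac = [[[Entry(False, 0, 0, False, 0) for _ in range(WAYS)]
         for _ in range(NUM_GROUPS)] for _ in range(2)]


def read_group(fetch_addr: int) -> list[Entry]:
    """Return valid, tag-matching entries of the selected group, by offset."""
    index = (fetch_addr >> 4) & (NUM_GROUPS - 1)  # index 274: bits 13:4
    tag = fetch_addr >> 14                        # assumed tag: bits above 13
    group = [btac[side][index][way] for side in range(2) for way in range(WAYS)]
    hits = [e for e in group if e.valid and e.tag == tag]
    return sorted(hits, key=lambda e: e.offset)


# Two branches in the same line, cached on different sides (like W1/W2)
btac[0][0x3A5][1] = Entry(True, 0x100, 8, True, 0x5000)
btac[1][0x3A5][0] = Entry(True, 0x100, 2, False, 0x4000)
hits = read_group(0x00403A50)
print([hex(e.target) for e in hits])  # ['0x4000', '0x5000']
```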
Group 262A illustrates a subset of groups 262 in branch target address prediction unit 142 that cache a single branch target address and related information for one previously executed branch instruction in each of four different instruction cache lines. The four different target addresses are denoted W, X, Y, and Z in group 262A. That is, the cache tag of each of the four different cache lines is unique. Therefore, although each of the two BTACs 208 is only two-way set associative, viewing the two BTACs 208 as a whole, group 262A is effectively a four-way associative group 262 for these values of index 274, because it caches the target addresses of single branch instructions in four different cache lines.
Group 262B illustrates the subset of groups 262 in branch target address prediction unit 142 that are caching the branch target address and related information of a single previously executed branch instruction in each of two different instruction cache lines, together with the branch target addresses and related information of two previously executed branch instructions in a third instruction cache line. The four different target addresses are denoted W1, X, Y, and W2 in group 262B. W1 and W2 denote the target addresses of two different branch instructions in the same cache line. That is, the cache tags associated with target addresses W1 and W2 are identical, whereas the cache tags associated with target addresses X and Y are unique. Hence, viewing the two BTACs 208 as a whole, group 262B is effectively a three-way associative group 262, because for some values of index 274 it caches the target addresses of single branch instructions in two different cache lines plus the target addresses of two different branch instructions in a third cache line.
Group 262C illustrates the subset of groups 262 in branch target address prediction unit 142 that are caching the branch target addresses and related information of two previously executed branch instructions in each of two different instruction cache lines. The four different target addresses are denoted W1, X1, X2, and W2 in group 262C. W1 and W2 denote the target addresses of two different branch instructions in a first cache line, and X1 and X2 denote the target addresses of two different branch instructions in a second cache line. That is, the cache tags associated with W1 and W2 are identical, and the cache tags associated with X1 and X2 are identical, but the tags associated with W1 and W2 differ from those associated with X1 and X2. Hence, viewing the two BTACs 208 as a whole, group 262C is effectively a two-way associative group 262, because for some values of index 274 it caches the target addresses of two different branch instructions in each of two different cache lines.
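The effective associativity of a group is simply the number of distinct cache lines covered by its valid entries. The Python sketch below models groups 262A, 262B, and 262C that way. The placement of W1/W2 and X1/X2 on opposite sides is an assumption consistent with the replacement policy of Fig. 4, which writes the second branch of a cache line into the other BTAC; the class and function names and the numeric line tags are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    side: str        # "A" (BTAC 208A) or "B" (BTAC 208B)
    way: int         # way 0 or way 1 within that side
    line_tag: int    # tag of the cache line that holds the branch
    valid: bool = True


def effective_associativity(group: list[Entry]) -> int:
    """Count the distinct cache lines covered by the group's valid entries."""
    return len({e.line_tag for e in group if e.valid})


def same_line_on_opposite_sides(group: list[Entry]) -> bool:
    """True if no cache line has two valid entries on the same side."""
    seen = set()
    for e in group:
        if e.valid:
            key = (e.line_tag, e.side)
            if key in seen:
                return False
            seen.add(key)
    return True


# Hypothetical line tags standing in for the cache lines labelled W, X, Y, Z.
W, X, Y, Z = 0x100, 0x200, 0x300, 0x400
group_262a = [Entry("A", 0, W), Entry("A", 1, X), Entry("B", 0, Y), Entry("B", 1, Z)]
group_262b = [Entry("A", 0, W), Entry("A", 1, X), Entry("B", 0, Y), Entry("B", 1, W)]
group_262c = [Entry("A", 0, W), Entry("A", 1, X), Entry("B", 0, X), Entry("B", 1, W)]
print([effective_associativity(g) for g in (group_262a, group_262b, group_262c)])  # [4, 3, 2]
```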
Whether a particular index group 262 in branch target address prediction unit 142 belongs to the two-way, three-way, or four-way associative subset depends on the distribution of previously executed branch instructions in the currently executing program, and in particular on their distribution within the cache lines that hold the program's instructions. Advantageously, when a new branch instruction is executed and resolved by microprocessor 100 and branch target address prediction unit 142 is updated with the target address and related information of the new branch instruction, branch target address prediction unit 142 may replace an existing entry 264 in the selected group 262, thereby changing the associativity of the group 262 as needed. In particular, branch target address prediction unit 142 may reduce the level of associativity at a particular index 274 to accommodate a distribution in which one cache line contains two branch instructions, or even one in which each of two cache lines contains two branch instructions; conversely, it may increase the level of associativity at a particular index 274 to accommodate a distribution in which each cache line contains only a single branch instruction.
Branch target address prediction unit 142 also comprises a least recently used (LRU) memory array 212. LRU array 212 also receives index 274, which selects an entry in LRU array 212. Each entry in LRU array 212 stores replacement information for the corresponding group 262 in BTACs 208 selected by index 274. Thus, LRU array 212 is a resource shared by the two BTACs 208. In one embodiment, the replacement information comprises one bit indicating whether first BTAC 208A or second BTAC 208B is least recently used for the selected group 262; one bit indicating whether way 0 or way 1 of the set in first BTAC 208A selected by index 274 is least recently used; and one bit indicating whether way 0 or way 1 of the set in second BTAC 208B selected by index 274 is least recently used. Each time instruction fetcher 102 fetches a cache line from instruction cache 104, LRU array 212 outputs the replacement information 236 of the entry selected by index 274. Control logic 202 generates update data 234, which is provided as input to BTACs 208 and LRU array 212. When updating BTACs 208 and/or LRU array 212 with update data 234, control logic 202 causes address-select multiplexer 216 to select update address 232. In one embodiment, update data 234 may include updated LRU information, a target address, a tag, a valid bit, a branch instruction offset, and a taken/not-taken (T/NT) prediction. When a branch instruction is resolved and the pipeline generates an update request 176, control logic 202 uses replacement information 236 to decide which entry in a group 262 to replace, as described in more detail below with respect to Fig. 4. Control logic 202 also updates the replacement information in LRU array 212 according to the use of the information stored in BTACs 208. In one embodiment, an entry 264 in BTACs 208 is regarded as used for least-recently-used purposes if it is allocated for replacement, or if its associated branch instruction is valid and predicted taken when BTACs 208 are read.
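The three replacement bits per group can be modeled as a small pseudo-LRU state. In this hedged Python sketch the bit assignment (bit 2 = side A most recently used, bit 1 = way 1 of side A most recently used, bit 0 = way 1 of side B most recently used) follows the LRU-bit comments in the Verilog listing in this section; the class and method names are hypothetical. Under the policy described above, `touch` would be called both when an entry is allocated and when it is read as valid and predicted taken.

```python
class GroupLru:
    """3-bit pseudo-LRU state for one group 262 (hypothetical model).

    bit 2: side A (BTAC 208A) most recently used
    bit 1: way 1 of side A most recently used
    bit 0: way 1 of side B most recently used
    """

    def __init__(self) -> None:
        self.bits = 0b000

    def touch(self, side: str, way: int) -> None:
        """Mark (side, way) as most recently used."""
        if side == "A":
            self.bits |= 0b100
            self.bits = (self.bits | 0b010) if way == 1 else (self.bits & ~0b010)
        else:
            self.bits &= ~0b100
            self.bits = (self.bits | 0b001) if way == 1 else (self.bits & ~0b001)

    def lru_side(self) -> str:
        """The side that was not most recently used."""
        return "B" if self.bits & 0b100 else "A"

    def lru_way(self, side: str) -> int:
        """The way of `side` that was not most recently used."""
        mask = 0b010 if side == "A" else 0b001
        return 0 if self.bits & mask else 1


lru = GroupLru()
lru.touch("A", 0)
print(lru.lru_side(), lru.lru_way("A"))  # B 1
```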
Branch target address prediction unit 142 also comprises four comparators 214 that help detect whether fetch address 162 hits in BTACs 208. Each comparator 214 receives as input the tag 242 output by a respective entry 264 of the group 262 selected by index portion 274 of the fetch address 162 output by multiplexer 216. Each comparator 214 compares its respective tag 242 with the tag portion 272 of fetch address 162 and, if the tag 242 matches the fetch address tag 272, generates a true value on a match indication 244. The match indications 244 are provided to control logic 202.
Control logic 202 also receives the valid indicator 238, branch instruction offset 266, and T/NT prediction 276 output by each respective entry 264 of the group 262 selected by index 274. Control logic 202 generates four hit indications 258 corresponding to the four entries 264 of group 262. Control logic 202 generates a true value on a hit indication 258 if both the corresponding valid indicator 238 and match indication 244 are true. The hit indications 258 are staged down the pipeline of microprocessor 100 along with the branch instruction, so that when the branch instruction is resolved they can be used to decide which entry 264 in a group 262 to replace.
Branch target address prediction unit 142 also comprises a two-input first way-select multiplexer 206A and a two-input second way-select multiplexer 206B. First way-select multiplexer 206A receives the target address 254 from each entry 264 of BTAC 208A in the group 262 selected by index 274. Via hit indications 258, control logic 202 causes first way-select multiplexer 206A to select, as first side target address 256A, the target address 254 of whichever of way 0 or way 1 hit on fetch address 162. Similarly, second way-select multiplexer 206B receives the target address 254 from each entry 264 of BTAC 208B in the group 262 selected by index 274, and control logic 202 causes second way-select multiplexer 206B to select, as second side target address 256B, the target address 254 of whichever of way 0 or way 1 hit on fetch address 162.
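The hit and way-select behavior of one side can be sketched in Python as follows: a hit is a comparator match AND a valid bit, and the way-select multiplexer forwards the target of the way that hit. This is a behavioral model only; the function names are hypothetical, and it assumes that at most one way of a side hits for a given fetch address, since the replacement policy places a second branch of the same cache line on the opposite side.

```python
from typing import Optional


def hits_for_side(fetch_tag: int, tags: tuple[int, int],
                  valids: tuple[bool, bool]) -> tuple[bool, bool]:
    """Hit indications 258 for way 0 and way 1: match 244 AND valid 238."""
    h0, h1 = (t == fetch_tag and v for t, v in zip(tags, valids))
    return h0, h1


def way_select(hits: tuple[bool, bool], targets: tuple[int, int]) -> Optional[int]:
    """Model of a way-select multiplexer (206A or 206B).

    Returns the side target address (256A or 256B), or None if neither way hit.
    Assumes at most one way of a side hits for a given fetch address.
    """
    if hits[0]:
        return targets[0]
    if hits[1]:
        return targets[1]
    return None


hits = hits_for_side(0x48D1, (0x48D1, 0x0042), (True, True))
print(hits, hex(way_select(hits, (0x1000, 0x2000))))  # (True, False) 0x1000
```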
Branch target address prediction unit 142 also comprises a two-input side-select multiplexer 204, which receives first side target address 256A and second side target address 256B from way-select multiplexers 206. Via a select signal 278, control logic 202 causes side-select multiplexer 204 to output, as the predicted target address 164 of Fig. 1, the target address 256 of the first, valid, taken, seen branch instruction in the selected group 262, as described in more detail below with respect to Fig. 3.
Control logic 202 receives update request 176 of Fig. 1. Update request 176 includes information about the resolved branch instruction, such as its address and target address. Update request 176 also includes the valid bits 238, offsets 266, T/NT predictions 276, match indications 244, and LRU information 236 that were output when the branch instruction was fetched from instruction cache 104 and branch target address prediction unit 142 was accessed; this information is staged down the pipeline of microprocessor 100 along with the branch instruction. If the resolved branch instruction is not a new branch instruction (i.e., if branch target address prediction unit 142 already caches prediction information for it), update request 176 also includes an indication of which of the two BTACs 208, and which of its two ways, provided the prediction information for the resolved branch instruction.
In one embodiment, each of the BTACs 208 comprises separate memory arrays for caching the branch prediction information. For example, in one embodiment, branch target address 254 and branch instruction offset 266 are cached in a first memory array; tag 242 and valid bit 238 are cached in a second memory array; and T/NT prediction 276 is stored in a third memory array. In one embodiment, each storage element of the T/NT array is a two-bit saturating up/down counter that indicates a strongly taken, weakly taken, weakly not-taken, or strongly not-taken prediction. In another embodiment, T/NT prediction 276 is generated by an entirely separate branch predictor, such as a branch history table.
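A two-bit saturating up/down counter of the kind mentioned above can be sketched in Python as below. The state numbering (0 = strongly not-taken through 3 = strongly taken) and the initial state of weakly not-taken are assumptions for illustration; the text does not specify either. Consistent with step 312, the branch is predicted taken in the weakly taken and strongly taken states.

```python
class TwoBitCounter:
    """Two-bit saturating up/down counter for one T/NT prediction 276 (sketch)."""

    STATES = ("strongly not-taken", "weakly not-taken",
              "weakly taken", "strongly taken")

    def __init__(self, state: int = 1) -> None:
        self.state = state  # initial state is an assumption

    def predict_taken(self) -> bool:
        """Taken if the prediction is weakly or strongly taken."""
        return self.state >= 2

    def update(self, taken: bool) -> None:
        """Count up on a taken outcome, down on not-taken, saturating at 0 and 3."""
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


c = TwoBitCounter()
c.update(True)
print(TwoBitCounter.STATES[c.state], c.predict_taken())  # weakly taken True
```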
As can be observed from Fig. 2 and the other figures, branch target address prediction unit 142 of the present invention uses storage space more efficiently than a conventional multiple-branch-per-cache-line predictor, because each of its entries contains storage for only a single branch target address and its related information, rather than fixed storage for multiple branch target addresses per entry. However, this storage efficiency is obtained at the cost of more cache tags per BTAC 208: in the embodiment of Fig. 2, there are twice as many tags as in a single conventional multiple-branch-per-cache-line BTAC. Nevertheless, a tag has substantially fewer bits than a branch target address and its related prediction information (in one embodiment, each entry caches a 20-bit tag and 42 bits of branch prediction information); hence, advantageously, the overall size of branch target address prediction unit 142 is smaller. Furthermore, branch target address prediction unit 142 provides variable associativity per group, which potentially improves its performance over a conventional BTAC.
Referring now to Fig. 3, a flowchart is shown illustrating operation of branch target address prediction unit 142 of Fig. 2 when it is read to generate a predicted target address 164. Flow begins at step 302.
In step 302, instruction fetcher 102 generates fetch address 162 to fetch a cache line of instructions from instruction cache 104 of Fig. 1. Fetch address 162 is also provided to access branch target address prediction unit 142 of Fig. 1. In response to fetch address 162, control logic 202 controls address multiplexer 216 to select fetch address 162 for output. Index portion 274 of fetch address 162 selects one of the groups 262 of BTACs 208 of Fig. 2. As described above, a group 262 comprises one entry 264 from each of way 0 and way 1 of each of first BTAC 208A and second BTAC 208B. Flow proceeds to step 304.
In step 304, BTACs 208 output the tag 242, valid bit 238, offset 266, T/NT prediction 276, and target address 254 of Fig. 2 for each entry of the group 262 selected in step 302. Flow proceeds to step 306.
In step 306, comparators 214 compare the tag portion 272 of fetch address 162 with each tag 242 of the selected group 262 to generate the match indication 244 of Fig. 2 for each entry 264 in group 262. Flow proceeds to step 308.
In step 308, control logic 202 generates a hit indication 258 for each entry of the selected group 262 based on its corresponding match indication 244 and valid indicator 238. Control logic 202 also controls way-select multiplexers 206 to select the target address 254 of the way that hit on fetch address 162. Flow proceeds to step 312.
In step 312, side-select multiplexer 204 selects the BTAC 208 holding the first, valid, taken, seen branch instruction, based on instruction pointer 222, hit indications 258, T/NT predictions 276, and offset 266 values. Control logic 202 determines from T/NT prediction 276 whether a branch instruction is taken. In one embodiment, a branch instruction is taken if its T/NT prediction is taken or strongly taken. A branch instruction is seen if its offset 266 value is greater than or equal to the value of the corresponding lower bits of the current instruction pointer 222. A branch instruction is valid if its corresponding valid bit 238 is true. A branch instruction is first in its cache line if it is the earliest branch instruction in the cache line, i.e., if it has the lowest offset 266 value. Thus, if fetch address 162 hits in both first BTAC 208A and second BTAC 208B (i.e., branch target address prediction unit 142 contains a valid target address for each of two branch instructions in the currently fetched cache line), both branch instructions are predicted taken, and the offsets 266 of both branch instructions are greater than or equal to instruction pointer 222 (i.e., both branches are seen), then control logic 202 causes side-select multiplexer 204 to select the target address 256 of the branch instruction with the lower offset 266 value. If fetch address 162 hits in only one of first BTAC 208A and second BTAC 208B (i.e., branch target address prediction unit 142 contains a valid target address for only one branch instruction in the currently fetched cache line), or only one of the branch instructions is predicted taken, or only one of the branch instructions has an offset 266 greater than or equal to instruction pointer 222, then control logic 202 causes side-select multiplexer 204 to select the target address of that valid, taken, seen branch instruction. Flow ends at step 312.
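The selection rule of step 312 can be modeled in software to make the "first, valid, taken, seen" criteria concrete. This is a hedged behavioral sketch, not the hardware: it assumes each side has already been reduced by its way-select multiplexer to at most one candidate, that `hit` already includes the valid bit (as hit indications 258 do), and that `ip_low` is the lower bits of instruction pointer 222 compared against offset 266. The names `SideCandidate` and `predict_target` are hypothetical.

```python
from typing import NamedTuple, Optional


class SideCandidate(NamedTuple):
    hit: bool     # hit indication 258 (match AND valid) for this side
    taken: bool   # derived from T/NT prediction 276
    offset: int   # byte offset 266 of the branch within the cache line
    target: int   # side target address 256A or 256B


def predict_target(ip_low: int, sides: list[SideCandidate]) -> Optional[int]:
    """Target of the first valid, taken, seen branch, or None if there is none."""
    eligible = [c for c in sides
                if c.hit and c.taken and c.offset >= ip_low]  # valid, taken, seen
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.offset).target       # first = lowest offset


a = SideCandidate(hit=True, taken=True, offset=5, target=0x1000)
b = SideCandidate(hit=True, taken=True, offset=2, target=0x2000)
print(hex(predict_target(0, [a, b])), hex(predict_target(3, [a, b])))  # 0x2000 0x1000
```

When the instruction pointer has already moved past the branch at offset 2, that branch is no longer seen, so the branch at offset 5 supplies the prediction.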
Referring now to Fig. 4, a flowchart is shown illustrating operation of branch target address prediction unit 142 of Fig. 1 when it is updated in response to a resolved branch instruction. Flow begins at step 402.
In step 402, the pipeline of microprocessor 100 resolves a branch instruction and, in response, generates update request 176 of Fig. 1. Update request 176 includes the address of the resolved branch instruction, its resolved target address, and the information that branch target address prediction unit 142 generated when the branch instruction was fetched (and its target address potentially predicted), which was staged down the pipeline. Flow proceeds to step 404.
In step 404, control logic 202 examines the information staged down the pipeline in update request 176 to determine whether the resolved branch instruction is a new branch instruction, i.e., whether no BTAC 208 caches valid prediction information for the resolved branch instruction. If the resolved branch instruction is a new branch instruction, flow proceeds to step 408; otherwise, flow proceeds to step 406.
In step 406, control logic 202 updates the way of first BTAC 208A or second BTAC 208B that caches the valid information for the resolved branch instruction, as indicated by the information staged down the pipeline in update request 176. For example, if the staged information indicates that way 1 of second BTAC 208B caches the prediction information for the resolved branch instruction, control logic 202 updates the entry in way 1 of second BTAC 208B of the group 262 selected by index 274 of the branch instruction address in update request 176, which is provided to multiplexer 216 as update address 232 during the update of branch target address prediction unit 142. Flow ends at step 406.
In step 408, control logic 202 examines the information staged down the pipeline in update request 176 to determine whether the fetch address portion of the resolved branch instruction hit only in first BTAC 208A. That is, control logic 202 determines whether branch target address prediction unit 142 was predicting that first BTAC 208A, but not second BTAC 208B, contains valid prediction information for a branch instruction, other than the resolved branch instruction itself, in the cache line containing the resolved branch instruction. If not, flow proceeds to step 414; otherwise, flow proceeds to step 412.
In step 412, control logic 202 replaces the least recently used way of second BTAC 208B in the group 262 selected by index 274 of the branch instruction address in update request 176, which is provided to multiplexer 216 as update address 232. That is, control logic 202 examines the LRU information 236 for the selected group 262 to determine whether way 0 or way 1 is least recently used, and replaces the least recently used way within second BTAC 208B with the prediction information of the resolved branch instruction. Advantageously, the selected group 262 will thus cache the branch prediction information of two branch instructions in the same cache line, making the selected group 262 either a two-way or a three-way associative group 262, depending on the contents of the other two entries 264 in the group 262. Flow ends at step 412.
In step 414, control logic 202 examines the information staged down the pipeline in update request 176 to determine whether the fetch address portion of the resolved branch instruction hit only in second BTAC 208B. That is, control logic 202 determines whether branch target address prediction unit 142 was predicting that second BTAC 208B, but not first BTAC 208A, contains valid prediction information for a branch instruction, other than the resolved branch instruction itself, in the cache line containing the resolved branch instruction. If not, flow proceeds to step 418; otherwise, flow proceeds to step 416.
In step 416, control logic 202 replaces the least recently used way of first BTAC 208A in the group 262 selected by index 274 of the branch instruction address in update request 176, which is provided to multiplexer 216 as update address 232. That is, control logic 202 examines the LRU information 236 for the selected group 262 to determine whether way 0 or way 1 is least recently used, and replaces the least recently used way within first BTAC 208A with the prediction information of the resolved branch instruction. Advantageously, the selected group 262 will thus cache the branch prediction information of two branch instructions in the same cache line, making the selected group 262 either a two-way or a three-way associative group 262, depending on the contents of the other two entries 264 in the group 262. Flow ends at step 416.
In step 418, control logic 202 examines the information staged down the pipeline in update request 176 to determine whether the fetch address portion of the resolved branch instruction hit in both first BTAC 208A and second BTAC 208B. That is, control logic 202 determines whether branch target address prediction unit 142 was predicting that both first BTAC 208A and second BTAC 208B contain valid prediction information for different branch instructions, other than the resolved branch instruction itself, in the cache line containing the resolved branch instruction. If not, flow proceeds to step 424; otherwise, flow proceeds to step 422.
In step 422, control logic 202 replaces the way that hit in the least recently used BTAC 208 of the group 262 selected by index 274 of the branch instruction address in update request 176, which is provided to multiplexer 216 as update address 232. That is, control logic 202 examines the LRU information 236 for the selected group 262 to determine whether first BTAC 208A or second BTAC 208B is least recently used in the selected group 262; control logic 202 then examines the information staged down the pipeline in update request 176 to determine whether way 0 or way 1 hit in the least recently used BTAC 208, and replaces that way within the least recently used BTAC 208 with the prediction information of the resolved branch instruction. Advantageously, the selected group 262 will thus cache the branch prediction information of two branch instructions in the same cache line, making the selected group 262 either a two-way or a three-way associative group 262, depending on the contents of the other two entries 264 in the group 262. Flow ends at step 422.
In step 424, no BTAC 208 hit; that is, the information staged down the pipeline in update request 176 indicates that the fetch address portion of the resolved branch instruction hit in neither first BTAC 208A nor second BTAC 208B. In other words, neither second BTAC 208B nor first BTAC 208A contains valid prediction information for any branch instruction in the cache line containing the resolved branch instruction. Therefore, control logic 202 selects a BTAC 208 and a way to replace based on the number of valid entries in the selected group 262 and on which BTAC 208 is least recently used. In particular, control logic 202 selects the least recently used BTAC 208 of group 262, unless both ways of one BTAC 208 are valid while both ways of the other BTAC 208 are not, in which case control logic 202 replaces an entry in the other BTAC 208, as described in the code below. Flow ends at step 424.
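The replacement decisions of steps 404 through 424 can be summarized as a software model. This is a sketch under stated assumptions rather than the actual control logic: the function and parameter names are hypothetical, the within-side way choice (replace way 0 if it is invalid, or if way 1 is valid and most recently used) is taken from the Verilog listing in this section, and the three LRU bits follow that listing's read-side definitions (side A MRU, A way 1 MRU, B way 1 MRU).

```python
from typing import Optional

Pair = tuple[bool, bool]  # per-way flags: (way 0, way 1)


def lru_way(valids: Pair, way1_mru: bool) -> int:
    """Replace way 0 if it is invalid, or if way 1 is valid and most recently used."""
    return 0 if (way1_mru and valids[1]) or not valids[0] else 1


def choose_update_slot(is_new: bool, staged: Optional[tuple[str, int]],
                       hit_a: Pair, hit_b: Pair, val_a: Pair, val_b: Pair,
                       side_a_mru: bool, a_way1_mru: bool,
                       b_way1_mru: bool) -> tuple[str, int]:
    """Return the (side, way) written for a resolved branch (hypothetical model)."""
    if not is_new:                  # step 406: rewrite the staged side/way
        assert staged is not None
        return staged
    a_hit, b_hit = any(hit_a), any(hit_b)
    if a_hit and not b_hit:         # step 412: LRU way of side B
        return "B", lru_way(val_b, b_way1_mru)
    if b_hit and not a_hit:         # step 416: LRU way of side A
        return "A", lru_way(val_a, a_way1_mru)
    if a_hit and b_hit:             # step 422: hitting way of the LRU side
        side, hits = ("B", hit_b) if side_a_mru else ("A", hit_a)
        return side, 0 if hits[0] else 1
    # Step 424: no hit. Avoid a full side unless both sides are full.
    a_full, b_full = all(val_a), all(val_b)
    if b_full and not a_full:
        side = "A"
    elif a_full and not b_full:
        side = "B"
    else:
        side = "B" if side_a_mru else "A"
    if side == "A":
        return side, lru_way(val_a, a_way1_mru)
    return side, lru_way(val_b, b_way1_mru)


print(choose_update_slot(True, None, hit_a=(False, False), hit_b=(False, False),
                         val_a=(True, True), val_b=(False, False), side_a_mru=False,
                         a_way1_mru=False, b_way1_mru=False))  # ('B', 0)
```

In the example, side A is full and side B is empty, so side B is chosen even though the LRU bit alone would have pointed at side A.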
The following code describes the replacement method used by control logic 202, which summarizes the flowchart of Fig. 4.
```verilog
//
// BTAC update logic
//
// Excerpt: the rregs/rregs_io register primitives and the other input signals
// (for example the _S-stage hit/valid signals) are defined elsewhere in the design.
//
// Define some signals needed below
wire [1:0] xbpBtacRdHitA_W, xbpBtacRdHitB_W;
rregs #(2) rhaw (xbpBtacRdHitA_W, xbpBtacRdHitA_S, clk);
rregs #(2) rhbw (xbpBtacRdHitB_W, xbpBtacRdHitB_S, clk);
wire xcfBtacAHit_W  = |xbpBtacRdHitA_W;
wire xcfBtacBHit_W  = |xbpBtacRdHitB_W;
wire xcfBtacHitAB_W = xcfBtacAHit_W & xcfBtacBHit_W;
wire [1:0] xbpBtacRdValA_W, xbpBtacRdValB_W;
rregs #(2) rvaw (xbpBtacRdValA_W, xbpBtacRdValA_S, clk);
rregs #(2) rvbw (xbpBtacRdValB_W, xbpBtacRdValB_S, clk);
wire xcfBtacAFull_W = &xbpBtacRdValA_W;
wire xcfBtacBFull_W = &xbpBtacRdValB_W;

// Definition of what the 3 bits in the lru mean:
// lru data
//   bit 2 - side A mru
//   bit 1 - A way 1 mru
//   bit 0 - B way 1 mru
//
//                  For this 16B
// New Branch   HitA   HitB   Method
//     0         -      -     Use staged way/side
//     1         0      0     Use 3 b mru
//     1         0      1     Use 1 b A mru
//     1         1      0     Use 1 b B mru
//     1         1      1     Use 1 b side mru to choose side, then replace way that hit
//
// For the case of a new branch with no hits for this 16B, to choose side A vs. B:
//
//       Valids
//   Side A   Side B   Method
//     2        2      A/B mru
//     2        1      Choose B
//     1        2      Choose A
//     2        0      Choose B
//     0        2      Choose A
//     1        1      A/B mru
//     1        0      A/B mru
//     0        1      A/B mru
//     0        0      A/B mru
//
// The mru bit is used for the last four cases for proper behavior in the case
// of 2 branches in the same 16 B seen close together. The btac valid bits staged
// down for the second branch may not include the write of the first branch.
// Using the A/B mru bit allows each branch to be correctly placed on opposite
// btac sides.
//
// Note that if, for instance, side A is marked as having both ways valid, while
// side B has no ways valid, then if the mru bit indicates B was mru, one of 3
// cases has occurred:
// 1) 2 branches in the same 16 B were seen close together. The first branch was
//    written to side B, so the second branch should be written to side A, even
//    though it will displace another branch.
// 2) A branch on side B was mru, but it has since been invalidated due to
//    aliasing or self-modifying code.
// 3) 2 branches with the same index, not in the same 16 B, were seen close
//    together. The first branch was written to side B, but the second branch
//    should also be written to side B, to avoid displacing another branch.
// Case 1 should be more common than case 2, but not more common than case 3.
// So we should choose the side that is not already full.

// lru read addr from E, lru write addr 3 cycles later
//   E - read address to lru
//   S - lru read, capture in xcfetch
//   W - use lru data to determine replacement way, capture new lru write data
//   Z - write lru
wire [2:0] xcfBtacLruRdData_W;
rregs_io #(3) lrurd (xcfBtacLruRdData_W, btacLruRdData_P, clk);
wire xcfBtacSideAMRU_W = xcfBtacLruRdData_W[2];
wire xcfBtacAWay1MRU_W = xcfBtacLruRdData_W[1];
wire xcfBtacBWay1MRU_W = xcfBtacLruRdData_W[0];

// if this 16 B has no hits in either A or B, use normal lru
// (replace way 0 if it is invalid, or if way 1 is valid and mru)
wire xcfBtacAReplaceWay0_W = (xcfBtacAWay1MRU_W & xbpBtacRdValA_W[1]) |
                             ~xbpBtacRdValA_W[0];
wire xcfBtacBReplaceWay0_W = (xcfBtacBWay1MRU_W & xbpBtacRdValB_W[1]) |
                             ~xbpBtacRdValB_W[0];
// Choose side to write based on mru bit and valids
wire xcfBtacLruSelSideA_W = (~xcfBtacAFull_W & xcfBtacBFull_W) |
                            (~xcfBtacSideAMRU_W &
                             ~(xcfBtacAFull_W & ~xcfBtacBFull_W));
wire xcfBtacBaseReplace0_W = xcfBtacLruSelSideA_W ? xcfBtacAReplaceWay0_W
                                                  : xcfBtacBReplaceWay0_W;
// if this 16 B already has a hit in either A or B, must write to opposite side
wire xcfBtacForceSideA_W = ~xcfBtacAHit_W &  xcfBtacBHit_W;
wire xcfBtacForceSideB_W =  xcfBtacAHit_W & ~xcfBtacBHit_W;
// if this 16 B already has a hit in both A and B, must replace one
wire xcfBtacReplaceHitSideA_W = xcfBtacHitAB_W & ~xcfBtacSideAMRU_W;
wire xcfBtacReplaceHitSideB_W = xcfBtacHitAB_W &  xcfBtacSideAMRU_W;
wire xcfBtacUseBaseReplace_W  = ~xcfBtacAHit_W & ~xcfBtacBHit_W;
wire xcfBtacReplaceWay0_W = (xcfBtacForceSideA_W      & xcfBtacAReplaceWay0_W) |
                            (xcfBtacForceSideB_W      & xcfBtacBReplaceWay0_W) |
                            (xcfBtacReplaceHitSideA_W & xbpBtacRdHitA_W[0])    |
                            (xcfBtacReplaceHitSideB_W & xbpBtacRdHitB_W[0])    |
                            (xcfBtacUseBaseReplace_W  & xcfBtacBaseReplace0_W);
wire [1:0] xcfBtacReplaceWay_W = {~xcfBtacReplaceWay0_W, xcfBtacReplaceWay0_W};
wire xcfBtacReplaceA_W = xcfBtacForceSideA_W | xcfBtacReplaceHitSideA_W |
                         (~xcfBtacForceSideB_W & ~xcfBtacHitAB_W &
                          xcfBtacLruSelSideA_W);
//
// Determine if this branch is already in the btac.
// If so, rewrite using the staged way and side, not the lru-chosen victim.
// Choose a replacement side only for real new branches. Must qualify WrNew with
// ~(Valid and MatchAB), which indicates we are actually re-writing an existing
// branch due to a cache miss, bad target, etc. xbpBtacSelA_W handles these cases.
wire xcfBtacValidMatch_W = xbpBtacValid_W & xbpBtacMatch_W;
wire xcfBtacWrNewReal_W  = xcfBtacWrNew_W & ~xcfBtacValidMatch_W;
// Choose replacement side for new branch
wire xcfBtacWrQA_W = xcfBtacWrNewReal_W ? xcfBtacReplaceA_W : xbpBtacSelA_W;
// If the btac was valid for the 16 B containing the instruction, replace the
// same way; else use the lru-chosen victim.
wire [1:0] xcfBtacStagedWay_W = xbpBtacSelA_W ? xbpBtacRdHitA_W : xbpBtacRdHitB_W;
wire [1:0] xcfBtacWrQWay_W = xcfBtacWrNewReal_W ? xcfBtacReplaceWay_W
                                                : xcfBtacStagedWay_W;
// lru write
// lru update on both allocate and use:
// write the lru if the branch was seen and predicted taken, or when initializing
wire xcfBtacLruWrEn_W = xcfBranchT_W | xcfInitBtac_P;
rregs lrup (xcfBtacLruWrEn_P, xcfBtacLruWrEn_W, clk);
// lru data (same encoding as the read side above)
//   bit 2 - side A mru
//   bit 1 - A way 1 mru
//   bit 0 - B way 1 mru
wire [2:0] xcfBtacLruWrData_W;
assign xcfBtacLruWrData_W[2] = xcfBtacWrQA_W;
assign xcfBtacLruWrData_W[1] = ( xcfBtacWrQA_W & ~xcfBtacReplaceWay0_W) |
                               (~xcfBtacWrQA_W &  btacLruRdData_P[1]);
assign xcfBtacLruWrData_W[0] = (~xcfBtacWrQA_W & ~xcfBtacReplaceWay0_W) |
                               ( xcfBtacWrQA_W &  btacLruRdData_P[0]);
// force 0 0 0 when initializing
rregs #(3) lrudp (xcfBtacLruWrData_P,
                  xcfBtacLruWrData_W & {3{~xcfInitBtac_P}}, clk);
```
Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are encompassed by the invention. For example, although an embodiment of a branch prediction apparatus having two sides, each a two-way set associative cache, has been described, other embodiments are contemplated. For example, an embodiment is contemplated in which the apparatus has four sides, each a direct-mapped cache. An advantage of this embodiment is that it enables some groups to predict the target addresses of three branch instructions in the same cache line and of one branch instruction in a different cache line, effectively obtaining two-way associativity for the group, and enables other groups to predict the target addresses of four branch instructions in the same cache line, effectively obtaining one-way associativity for the group. This embodiment may be beneficial for relatively large cache lines. However, a disadvantage of this embodiment is that selecting the first, valid, taken, seen branch instruction among three or four branch instructions in a cache line requires more time in the control logic than selecting among two. This additional time may require a reduction in processor clock frequency or additional pipeline stages. The extra time cost of this embodiment must be weighed against the benefit of accommodating three or four branch instructions in the same cache line, a likelihood that increases as the cache line size increases.
Furthermore, although each described embodiment has four entries per group, other embodiments with other numbers of entries per group are contemplated. For example:

- two sides, each a direct-mapped cache, so that each group contains two entries;
- two sides, each a four-way set-associative cache, so that each group contains eight entries;
- four sides, each a two-way set-associative cache, so that each group contains eight entries.

More generally, embodiments are contemplated in which the apparatus has N sides, each an M-way set-associative cache, so that each group contains MxN entries. Different groups may then effectively obtain different associativities:

- MxN-way associativity, predicting the target address of only a single branch instruction in each of MxN different cache lines;
- (MxN-1)-way associativity, predicting the target address of a single branch instruction in each of (MxN-2) different cache lines and the target addresses of two branch instructions in one further cache line;
- (MxN-2)-way associativity, predicting the target address of a single branch instruction in each of (MxN-4) different cache lines and the target addresses of two branch instructions in each of two further cache lines;
- and so on, down to M-way associativity, predicting the target addresses of N branch instructions in each of M different cache lines.
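The reachable group associativities for a given geometry can be listed with a short Python sketch. It assumes, consistent with claims 17 and 25, that the branches of one cache line occupy at most one entry per side, so a line holds between 1 and `sides` branches. For each number of distinct cache lines, the sketch returns one possible split of the group's entries, spread as evenly as possible. The function name is hypothetical.

```python
from typing import Dict, List


def associativity_levels(sides: int, ways: int) -> Dict[int, List[int]]:
    """Map each reachable group associativity (number of distinct cache lines)
    to one example list of branches per line.

    Assumes a group has one entry per way per side (sides * ways entries) and
    that a cache line's branches use at most one entry per side.
    """
    total = sides * ways
    levels: Dict[int, List[int]] = {}
    # Most lines: one branch each. Fewest lines: `sides` branches each -> `ways` lines.
    for lines in range(total, ways - 1, -1):
        base, extra = divmod(total, lines)
        # Spread entries evenly: `extra` lines get one more branch than the rest.
        levels[lines] = [base + 1] * extra + [base] * (lines - extra)
    return levels
```

For two sides of two ways each, the sketch returns `{4: [1, 1, 1, 1], 3: [2, 1, 1], 2: [2, 2]}`, the four-, three-, and two-way cases of claims 1 and 24. For four direct-mapped sides, it reaches `[4]` at one-way associativity, the case described for that embodiment.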
Moreover, various distributions of branch instructions among cache lines may be achieved at a particular level of group associativity. For example, assume an apparatus having four sides, each a two-way set-associative cache. A group may effectively obtain four-way associativity by predicting any of the following: (1) four branches in a first cache line, two branches in a second cache line, and one branch in each of a third and a fourth cache line; (2) three branches in a first cache line, two branches in each of a second and a third cache line, and one branch in a fourth cache line; (3) three branches in each of a first and a second cache line, and one branch in each of a third and a fourth cache line; or (4) two branches in each of four different cache lines.
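The four combinations listed above are exactly the ways to split a group's eight entries over four cache lines with at most four branches per line (one entry per side). A small Python enumeration reproduces that list. It assumes the per-line limit and checks only branch counts, not which side and way each entry occupies. `distributions` is a hypothetical helper name.

```python
from typing import List


def distributions(total: int, lines: int, max_per_line: int) -> List[List[int]]:
    """All non-increasing lists of `lines` positive branch counts, each at most
    `max_per_line`, that sum to `total`."""
    if lines == 0:
        return [[]] if total == 0 else []
    result: List[List[int]] = []
    # The largest count is capped by the per-line limit and by leaving at least
    # one branch for each remaining line.
    for first in range(min(max_per_line, total - (lines - 1)), 0, -1):
        for rest in distributions(total - first, lines - 1, first):
            result.append([first] + rest)
    return result
```

`distributions(8, 4, 4)` returns `[[4, 2, 1, 1], [3, 3, 1, 1], [3, 2, 2, 1], [2, 2, 2, 2]]`, which matches items (4), (3), (2), and (1) of the example above, in that order. The only other split of eight entries over four lines, `[5, 1, 1, 1]`, is excluded because one line would need five entries on four sides.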
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made without departing from the spirit and scope of the invention.
For example, the invention may be implemented in hardware (e.g., within or coupled to a central processing unit (CPU), microprocessor, microcontroller, digital signal processor, processor core, System on Chip (SOC), or any other programmable device). It may also be implemented in software (e.g., computer-readable code, program code, instructions, and/or data disposed in any form, such as source, object, or machine language) disposed in a computer-usable (e.g., readable) medium configured to store the software. Such software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished, for example, through the use of general programming languages (e.g., C, C++), GDSII databases, or hardware description languages (HDL), including Verilog HDL and VHDL. Other available programs, databases, and/or circuit (i.e., schematic) capture tools may also be used. Such software can be disposed in any known computer-usable medium, including semiconductor memory, magnetic disk, and optical disc (e.g., CD-ROM, DVD-ROM). It can also be disposed as a computer data signal embodied in a computer-usable (e.g., readable) transmission medium (e.g., a carrier wave or any other medium, including digital, optical, or analog-based media). As such, the software can be transmitted over communication networks, including the Internet and intranets.
It is understood that the apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL), and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (71)

1. An apparatus in a microprocessor for predicting a target address for a variable number of branch instructions in each cache line fetched from an instruction cache at a fetch address, characterized in that the apparatus comprises:
a first and a second two-way set-associative cache memory, each having an index input coupled to receive a portion of the instruction cache fetch address, wherein the index selects one of a plurality of groups of four entries, each of the plurality of groups of four entries comprising one entry in each way of each of the first and second cache memories, and wherein each of the entries is configured to cache a target address of a previously executed branch instruction; and
replacement logic, coupled to the first and second cache memories, configured to select one of the entries for replacement in response to resolution of a branch instruction, such that:
(a) during operation of the microprocessor, for a first subset of the plurality of groups of four entries, the four entries of the group cache the target address of one branch instruction in each of four different cache lines, to obtain four-way group associativity; and
(b) during operation of the microprocessor, for a second subset of the plurality of groups of four entries, the four entries of the group cache the target address of one branch instruction in each of two different cache lines and the target addresses of two branch instructions in a third different cache line, to obtain three-way group associativity.
2. The apparatus as claimed in claim 1, wherein, for the second subset of the plurality of groups of four entries, if the instruction cache fetch address hits in both the first and second cache memories, the first and second cache memories provide the target addresses of the two branch instructions in the third different cache line.
3. The apparatus as claimed in claim 2, further comprising:
a side-select multiplexer, coupled to the first and second cache memories, wherein the side-select multiplexer selects one of the target addresses of the two branch instructions in the third different cache line based on the location of each of the two branch instructions within the third different cache line relative to an instruction pointer of the microprocessor.
4. The apparatus as claimed in claim 3, wherein the side-select multiplexer is configured to select the target address of the first valid, taken, and seen branch instruction of the two branch instructions relative to the instruction pointer.
5. The apparatus as claimed in claim 4, wherein each of the two branch instructions is seen if it is located at or after the instruction pointer within the third different cache line.
6. The apparatus as claimed in claim 4, wherein each of the two branch instructions is valid if its corresponding entry indicates that its target address is valid.
7. The apparatus as claimed in claim 4, wherein each of the two branch instructions is taken if the microprocessor generates a prediction of whether the branch instruction will be taken, and the prediction indicates that the branch instruction will be taken rather than not taken.
8. The apparatus as claimed in claim 7, wherein each of the entries is further configured to cache the prediction.
9. The apparatus as claimed in claim 8, wherein each of the first and second cache memories comprises separate storage arrays for caching the target addresses and the predictions.
10. The apparatus as claimed in claim 4, wherein one of the two branch instructions is first if its location within the third different cache line is before that of the other of the two branch instructions.
11. The apparatus as claimed in claim 3, wherein each of the entries is further configured to cache the location of the branch instruction within the cache line.
12. The apparatus as claimed in claim 3, further comprising:
first and second way-select multiplexers, coupled respectively between the side-select multiplexer and the first and second cache memories, wherein the first and second way-select multiplexers select the target address of one of the ways of the first and second cache memories, respectively, based on which of the ways the instruction cache fetch address hits in, and provide the two selected target addresses to the side-select multiplexer.
13. The apparatus as claimed in claim 1, wherein each of the entries is further configured to cache a tag of the cache line containing the branch instruction.
14. The apparatus as claimed in claim 13, wherein the replacement logic is further configured to select one of the entries for replacement in response to resolution of a branch instruction such that, for each group in the second subset of the plurality of groups of four entries, each of the first and second cache memories caches the tag of the third different cache line, the third different cache line containing the two branch instructions.
15. The apparatus as claimed in claim 13, wherein the instruction cache fetch address hits in one of the ways of the first and second cache memories when the tag cached in the entry of that way is valid and matches a tag portion of the fetch address, the tag being that of the cache line containing the branch instruction.
16. The apparatus as claimed in claim 13, wherein each of the first and second cache memories comprises separate storage arrays for caching the target addresses and the tags.
17. The apparatus as claimed in claim 1, wherein, in the second subset of the plurality of groups of four entries, the two entries of the group that cache the target addresses of the two branch instructions in the third different cache line are in different ones of the first and second cache memories.
18. The apparatus as claimed in claim 1, further comprising:
a replacement memory, coupled to the replacement logic, configured to store replacement information associated with each of the plurality of groups of four entries, the replacement information being used by the replacement logic to select one of the entries for replacement in response to resolution of the branch instruction.
19. The apparatus as claimed in claim 18, wherein, if the target address of the resolved branch instruction is not cached in the first and second cache memories, and if the fetch address of the resolved branch instruction hits in only one of the first and second cache memories, the replacement logic selects for replacement one of the ways of the other of the first and second cache memories.
20. The apparatus as claimed in claim 19, wherein the replacement information comprises, for each of the two sets of each of the plurality of groups of four entries, an indication of which of the two ways of the set is least recently used, and wherein the replacement logic is configured to select the least recently used way for replacement.
21. The apparatus as claimed in claim 18, wherein the replacement information comprises an indication of which of the first and second cache memories is least recently used, and wherein, if the target address of the resolved branch instruction is not cached in the first and second cache memories, and if the fetch address of the resolved branch instruction hits in both the first and second cache memories, the replacement logic selects for replacement the least recently used of the first and second cache memories.
22. The apparatus as claimed in claim 18, wherein the replacement information comprises an indication of which of the first and second cache memories is least recently used, and wherein, if the target address of the resolved branch instruction is not cached in the first and second cache memories, and if the fetch address of the resolved branch instruction misses in both the first and second cache memories, the replacement logic selects for replacement the least recently used of the first and second cache memories.
23. The apparatus as claimed in claim 22, wherein, if the target address of the resolved branch instruction is not cached in the first and second cache memories, and if the fetch address of the resolved branch instruction misses in both the first and second cache memories, the replacement logic selects for replacement the least recently used of the first and second cache memories; however, if more of the two ways of one of the first and second cache memories are valid than of the two ways of the other, the replacement logic instead selects for replacement the other of the first and second cache memories.
24. The apparatus as claimed in claim 1, wherein the replacement logic is further configured to select one of the entries for replacement in response to resolution of a branch instruction, such that:
(c) during operation of the microprocessor, for a third subset of the plurality of groups of four entries, the four entries of the group cache the target addresses of two branch instructions in each of two different cache lines, to obtain two-way group associativity.
25. The apparatus as claimed in claim 24, wherein, in the third subset of the plurality of groups of four entries, the two entries of the group that cache the target addresses of the two branch instructions in each of the two different cache lines are in different ones of the first and second cache memories.
26. The apparatus as claimed in claim 24, wherein, for the third subset of the plurality of groups of four entries, if the instruction cache fetch address hits in both the first and second cache memories, the first and second cache memories are configured to provide the target addresses of the two branch instructions in one of the two different cache lines.
27. The apparatus as claimed in claim 24, wherein each of the entries is further configured to cache a tag of the cache line containing the branch instruction, and wherein the replacement logic is further configured to select one of the entries for replacement in response to resolution of a branch instruction such that, for each group in the third subset of the plurality of groups of four entries, each of the first and second cache memories caches the tag of each of the two different cache lines, each of the two different cache lines containing two branch instructions.
28. The apparatus as claimed in claim 1, wherein the replacement logic selects one of the entries for replacement further based on information provided by the first and second two-way set-associative cache memories during their access, which occurs concurrently with the fetch, from the instruction cache, of the cache line containing the resolved branch instruction.
29. The apparatus as claimed in claim 1, wherein the apparatus is provided by a computer program product for use with a computing device, the computer program product comprising a computer-usable medium having computer-readable program code embodied therein.
30. The apparatus as claimed in claim 1, wherein the apparatus is provided by a computer data signal embodied in a transmission medium, the computer data signal comprising computer-readable program code.
31. A method in a microprocessor for predicting a target address for a variable number of branch instructions in each cache line fetched from an instruction cache at a fetch address, characterized in that the method comprises:
providing an index to a first and a second two-way set-associative cache memory to select one of a plurality of groups of four entries, each of the plurality of groups of four entries comprising one entry in each way of each of the first and second cache memories, each of the entries caching a target address of a previously executed branch instruction, the index being a portion of the instruction cache fetch address; and
selecting one of the entries for replacement in response to resolution of a branch instruction, such that:
(a) during operation of the microprocessor, for a first subset of the plurality of groups of four entries, the four entries of the group cache the target address of one branch instruction in each of four different cache lines, to obtain four-way group associativity; and
(b) during operation of the microprocessor, for a second subset of the plurality of groups of four entries, the four entries of the group cache the target address of one branch instruction in each of two different cache lines and the target addresses of two branch instructions in a third different cache line, to obtain three-way group associativity.
32. The method as claimed in claim 31, further comprising:
for the second subset of the plurality of groups of four entries, determining whether the instruction cache fetch address hits in both the first and second cache memories; and
providing, by each of the first and second cache memories, the target addresses of the two branch instructions in the third different cache line.
33. The method as claimed in claim 32, further comprising:
selecting one of the target addresses of the two branch instructions in the third different cache line based on the location of each of the two branch instructions within the third different cache line relative to an instruction pointer of the microprocessor.
34. The method as claimed in claim 33, wherein said selecting one of the target addresses comprises selecting the target address of the first valid, taken, and seen branch instruction of the two branch instructions relative to the instruction pointer.
35. The method as claimed in claim 33, further comprising:
caching the location of the branch instruction within the cache line prior to said selecting.
36. The method as claimed in claim 33, further comprising:
selecting the target address of one of the ways of the first and second cache memories, respectively, based on which of the ways the instruction cache fetch address hits in; and
providing the two selected target addresses for use in said selecting one of the target addresses.
37. The method as claimed in claim 31, further comprising:
caching a tag of the cache line containing the branch instruction.
38. The method as claimed in claim 31, further comprising:
storing replacement information associated with each of the plurality of groups of four entries, for use in said selecting one of the entries for replacement in response to resolution of the branch instruction.
39. The method as claimed in claim 38, wherein, if the target address of the resolved branch instruction is not cached in the first and second cache memories, and if the fetch address of the resolved branch instruction hits in only one of the first and second cache memories, said selecting for replacement comprises selecting for replacement one of the ways of the other of the first and second cache memories.
40. The method as claimed in claim 39, wherein the replacement information comprises, for each of the two sets of each of the plurality of groups of four entries, an indication of which of the two ways of the set is least recently used, and wherein said selecting for replacement comprises selecting the least recently used way for replacement.
41. The method as claimed in claim 38, wherein the replacement information comprises an indication of which of the first and second cache memories is least recently used, and wherein, if the target address of the resolved branch instruction is not cached in the first and second cache memories, and if the fetch address of the resolved branch instruction hits in both the first and second cache memories, said selecting for replacement comprises selecting for replacement the least recently used of the first and second cache memories.
42. The method as claimed in claim 38, wherein the replacement information comprises an indication of which of the first and second cache memories is least recently used, and wherein, if the target address of the resolved branch instruction is not cached in the first and second cache memories, and if the fetch address of the resolved branch instruction misses in both the first and second cache memories, said selecting for replacement comprises selecting for replacement the least recently used of the first and second cache memories.
43. The method as claimed in claim 42, wherein, if the target address of the resolved branch instruction is not cached in the first and second cache memories, and if the fetch address of the resolved branch instruction misses in both the first and second cache memories, said selecting for replacement comprises selecting for replacement the least recently used of the first and second cache memories; however, if more of the two ways of one of the first and second cache memories are valid than of the two ways of the other, said selecting for replacement comprises instead selecting for replacement the other of the first and second cache memories.
44. The method as claimed in claim 31, wherein said selecting for replacement comprises selecting one of the entries for replacement in response to resolution of a branch instruction, such that:
(c) during operation of the microprocessor, for a third subset of the plurality of groups of four entries, the four entries of the group cache the target addresses of two branch instructions in each of two different cache lines, to obtain two-way group associativity.
45. The method as claimed in claim 44, further comprising:
for the third subset of the plurality of groups of four entries, determining whether the instruction cache fetch address hits in both the first and second cache memories; and
providing, by each of the first and second cache memories, the target addresses of the two branch instructions in one of the two different cache lines.
46. An apparatus in a microprocessor for predicting a target address for a variable number of branch instructions in a cache line fetched from an instruction cache at a fetch address, characterized in that the apparatus comprises:
M N-way set-associative cache memories, each having an index input coupled to receive a portion of the instruction cache fetch address, wherein the index selects one of a plurality of groups of MxN entries, each of the plurality of groups of MxN entries comprising one entry in each way of each of the M cache memories, and wherein each of the entries is configured to cache a target address of a previously executed branch instruction; and
replacement logic, coupled to the M cache memories, configured to select one of the entries for replacement in response to resolution of a branch instruction, such that:
(a) during operation of the microprocessor, for a first subset of the plurality of groups of MxN entries, the MxN entries of the group cache the target address of one branch instruction in each of MxN different cache lines, to obtain MxN-way group associativity; and
(b) during operation of the microprocessor, for a second subset of the plurality of groups of MxN entries, the MxN entries of the group cache the target address of one branch instruction in each of (MxN-2) different cache lines and the target addresses of two branch instructions in a further different cache line, to obtain (MxN-1)-way group associativity.
47. The apparatus of claim 46, wherein, for the second subset of the groups, if the instruction cache fetch address hits in two of the M caches, those two caches provide the target addresses of the two branch instructions in the cache line that holds two branch instructions.
48. The apparatus of claim 47, further comprising:
a side-select multiplexer, coupled to the M caches, that selects one of the target addresses of the two branch instructions in that cache line according to the location of each of the two branch instructions within the cache line relative to an instruction pointer of the microprocessor.
49. The apparatus of claim 48, wherein the side-select multiplexer selects the target address of the first valid, taken, and seen branch instruction of the two branch instructions relative to the instruction pointer.
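The selection rule of claim 49 can be sketched as follows. The sketch assumes that a branch counts as "seen" when its position in the cache line is at or after the instruction pointer's offset within that line. It models "valid" and "taken" as simple booleans. The field names and the byte-offset convention are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BranchInfo:
    valid: bool   # entry holds a live prediction
    taken: bool   # predicted taken
    offset: int   # position of the branch within the cache line (assumed bytes)
    target: int   # cached target address


def side_select(branches: Sequence[BranchInfo], ip_offset: int) -> Optional[int]:
    """Return the target of the first valid, taken branch at or after ip_offset.

    Branches located before the instruction pointer are treated as not "seen"
    (already passed). Returning None means no taken branch is predicted.
    """
    seen = [b for b in branches if b.valid and b.taken and b.offset >= ip_offset]
    if not seen:
        return None
    return min(seen, key=lambda b: b.offset).target


pair = [BranchInfo(True, True, 4, 0x1000), BranchInfo(True, True, 20, 0x2000)]
print(hex(side_select(pair, 0)))   # first branch is seen and taken
print(hex(side_select(pair, 8)))   # first branch lies behind the IP
```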
50. The apparatus of claim 48, further comprising:
N-way select multiplexers, each coupled between the side-select multiplexer and a respective one of the M caches, wherein each N-way select multiplexer selects the target address from the way of its cache in which the instruction cache fetch address hits, and provides the selected target address to the side-select multiplexer.
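The following is a behavioral sketch of the two-level selection in claim 50. Each of the M caches has its own N-way select, which picks the way whose tag matches the fetch address. The per-cache winners then feed a side select. The `Way` record and the tag comparison are modeling choices, not details stated in the claim. The sketch also assumes that at most one way per cache matches a given tag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Way:
    valid: bool
    tag: int      # upper fetch-address bits of the line this entry belongs to
    offset: int   # branch position within the cache line (assumed)
    taken: bool   # direction prediction (assumed)
    target: int


def way_select(ways: List[Way], fetch_tag: int) -> Optional[Way]:
    """N-way select for one cache: return the way that hits, if any."""
    for way in ways:
        if way.valid and way.tag == fetch_tag:
            return way
    return None


def predict(indexed_sets: List[List[Way]], fetch_tag: int, ip_offset: int) -> Optional[int]:
    """indexed_sets[m] is the set selected by the index in cache m."""
    hits = [w for w in (way_select(s, fetch_tag) for s in indexed_sets) if w is not None]
    # Side select: earliest taken branch at or after the instruction pointer.
    seen = [w for w in hits if w.taken and w.offset >= ip_offset]
    return min(seen, key=lambda w: w.offset).target if seen else None


# M = 2 caches, N = 2 ways; line 0xC has one branch in each cache.
cache0 = [Way(True, 0xC, 2, True, 0x3000), Way(True, 0xA, 4, True, 0x1000)]
cache1 = [Way(True, 0xB, 8, True, 0x2000), Way(True, 0xC, 30, True, 0x3800)]
print(hex(predict([cache0, cache1], 0xC, 0)))
print(hex(predict([cache0, cache1], 0xC, 10)))
```

A fetch of line 0xC hits in both caches, so the side select chooses between its two branches based on the instruction pointer. A fetch of line 0xA hits in only one cache.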
51. The apparatus of claim 46, further comprising:
a replacement memory, coupled to the replacement logic, for storing replacement information associated with each of the MxN-entry groups, the replacement logic using the replacement information to select one of the entries for replacement in response to resolution of the branch instruction.
52. The apparatus of claim 51, wherein if the target address of the resolved branch instruction is not cached in the M caches, and the fetch address of the resolved branch instruction misses in at least one of the M caches, the replacement logic selects one of the ways of the at least one of the M caches for replacement.
53. The apparatus of claim 52, wherein the replacement information comprises, for each of the M sets of each MxN-entry group, an indication of which of the N ways is least recently used, and wherein the replacement logic selects the least recently used way for replacement.
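Claims 52 and 53 describe a least-recently-used choice among the N ways of the indexed set when the resolved branch's fetch address misses in a cache. A minimal true-LRU model of one set is sketched below; for N = 2 it reduces to a single bit. The class name and the list-based ordering are illustrative assumptions, not the hardware encoding used in the patent.

```python
from typing import List


class SetLRU:
    """True-LRU order for the N ways of one set (hypothetical model)."""

    def __init__(self, n_ways: int) -> None:
        self.order: List[int] = list(range(n_ways))  # order[0] is least recent

    def touch(self, way: int) -> None:
        """Record a hit on, or a fill of, `way`."""
        self.order.remove(way)
        self.order.append(way)

    def victim(self) -> int:
        """Way to replace when the target is not cached and the line misses."""
        return self.order[0]


lru = SetLRU(2)
lru.touch(0)           # way 0 just used
print(lru.victim())    # so way 1 is chosen for replacement
lru.touch(1)
print(lru.victim())
```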
54. The apparatus of claim 51, wherein the replacement information comprises an indication of which of the M caches is least recently used, and wherein if the target address of the resolved branch instruction is not cached in the M caches, and the fetch address of the resolved branch instruction hits in all of the M caches, the replacement logic selects the least recently used one of the M caches for replacement.
55. The apparatus of claim 51, wherein the replacement information comprises an indication of which of the M caches is least recently used, and wherein if the target address of the resolved branch instruction is not cached in the M caches, and the fetch address of the resolved branch instruction misses in all of the M caches, the replacement logic selects the least recently used one of the M caches for replacement.
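Claims 52, 54, and 55 together determine which of the M caches receives a newly resolved target. The sketch below encodes that decision. `hit[m]` indicates whether the resolved branch's fetch address hits in cache m, and `lru_cache` stands in for the cache-level least-recently-used indication. The claims do not say how to break ties when several, but not all, caches miss. The tie-break shown is therefore an assumption: prefer the LRU cache if it is among the missing ones, otherwise take the lowest-numbered missing cache.

```python
from typing import List


def choose_cache(hit: List[bool], lru_cache: int) -> int:
    """Return the index of the cache to write a newly resolved branch target into.

    Assumes the target address itself is not already cached in any of the caches.
    """
    missing = [m for m, h in enumerate(hit) if not h]
    if len(missing) in (0, len(hit)):
        # Hits in all M caches (claim 54) or misses in all of them (claim 55).
        return lru_cache
    # Misses in some caches (claim 52): fill one of those; tie-break is assumed.
    return lru_cache if lru_cache in missing else missing[0]


print(choose_cache([True, True], lru_cache=1))    # all hit -> LRU cache
print(choose_cache([True, False], lru_cache=0))   # only cache 1 misses
```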
56. The apparatus of claim 46, wherein the replacement logic further selects one of the entries for replacement in response to resolution of a branch instruction, such that:
(c) during operation of the microprocessor, for a third subset of the MxN-entry groups, the MxN entries of a group cache target addresses for (MxN-2) different cache lines: the target address of one branch instruction in each of (MxN-4) of those cache lines and the target addresses of two branch instructions in each of the remaining two cache lines, to obtain (MxN-2)-way group associativity.
57. The apparatus of claim 46, wherein the replacement logic further selects one of the entries for replacement in response to resolution of a branch instruction, such that:
(c) during operation of the microprocessor, for a third subset of the MxN-entry groups, the MxN entries of a group cache the target addresses of M branch instructions in each of N different cache lines, to obtain N-way group associativity.
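Claims 46, 56, and 57 describe different ways a single group of MxN entries can be shared among cache lines. Claims 47 and 71 suggest that the branches of one line sit in different caches, so a line contributes at most M entries to a group. Under that assumption, the possible layouts are the partitions of MxN into parts no larger than M, and the number of parts is the group's effective associativity. The helper below is an illustrative enumeration, not part of the claimed apparatus.

```python
from typing import List


def layouts(total: int, max_part: int) -> List[List[int]]:
    """All ways to split `total` entries into per-line branch counts <= max_part.

    Each list is non-increasing; its length is the group's effective associativity.
    """
    if total == 0:
        return [[]]
    result = []
    for first in range(min(total, max_part), 0, -1):
        for rest in layouts(total - first, first):
            result.append([first] + rest)
    return result


M, N = 2, 2
for layout in layouts(M * N, M):
    print(layout, "->", len(layout), "way")
```

For M = N = 2, this lists three arrangements. The 2-way arrangement [2, 2] corresponds to claims 56 and 57. The 3-way arrangement [2, 1, 1] corresponds to claim 46(b). The 4-way arrangement [1, 1, 1, 1] corresponds to claim 46(a).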
58. The apparatus of claim 46, wherein the apparatus is provided by a computer program product for use with a computing device, the computer program product comprising a computer-usable medium having computer-readable program code embodied therein.
59. the method for a destination address of the branch instruction of a variable number in the row is got in the interior prediction one of a microprocessor soon, this gets row soon is to capture an acquisition address of getting soon from an instruction, it is characterized in that this method comprises:
Provide an index to M N channel set combination memory cache to choose one of them group of a plurality of MxN inlet group, each this a plurality of MxN inlet group is the interior inlet of each channel that comprises each this M memory cache, each those go into the outspoken destination address that before had been performed branch instruction of getting, this index is the part that the acquisition address is got in this instruction soon; And
Those enter the mouth one of them to choose replacing, to respond the decision of a branch instruction, make:
(a) this microprocessor is during one first subclass of this a plurality of MxN inlet group of operation, go into the destination address that the outspoken MxN of being taken at different each of getting row are soon got a branch instruction in the row soon for MxN of this a plurality of MxN inlet group, with the combination of acquisition MxN channel group; And
(b) this microprocessor is during one second subclass of this a plurality of MxN inlet group of operation, MxN of this a plurality of MxN inlet group goes into outspokenly to be taken at that this MxN different each of getting soon that row (MxN-1) individual difference wherein gets row are soon got the destination address of a branch instruction in the row soon and in one (MxN) different destination addresses of getting two branch instructions in the row soon, with the combination of acquisition (MxN-1) channel group.
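Claims 46 and 59 index all M caches with the same portion of the instruction cache fetch address, and the abstract specifies a lower portion. The split below assumes, purely for illustration, 64-byte cache lines and 128 sets per cache; neither figure comes from the patent. The sketch also assumes that the line-offset bits are excluded from the index. The offset locates branches within the line relative to the instruction pointer, the index picks the MxN-entry group, and the remaining upper bits would serve as the tag compared in each way.

```python
from typing import Tuple

LINE_BYTES = 64    # hypothetical cache-line size
NUM_SETS = 128     # hypothetical number of sets (MxN-entry groups) per cache

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2(64) = 6
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2(128) = 7


def split_fetch_address(addr: int) -> Tuple[int, int, int]:
    """Split a fetch address into (tag, index, offset) fields."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_fetch_address(0x12345)
print(tag, index, offset)   # 9 13 5
```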
60. the method for a destination address of the branch instruction of a variable number in the row is got in prediction one soon in the microprocessor as claimed in claim 59, it is characterized in that, also comprises:
To this second subclass of this a plurality of MxN inlet group, measure instruction and get the acquisition address soon and whether hit within this M memory cache two memory caches wherein; And
To get the acquisition address soon be when hitting within this M memory cache two memory caches wherein when measuring instruction, is provided at MxN by each memory caches of this M memory cache these two memory caches wherein and gets those destination addresses that are listed as interior this two branch instruction soon.
61. the method for a destination address of the branch instruction of a variable number in the row is got in prediction one soon in the microprocessor as claimed in claim 60, it is characterized in that, also comprises:
Get a position of each this two branch instruction in the row soon according to this MxN relevant with an instruction pointer of this microprocessor, be chosen at this MxN get soon this two branch instruction in the row those destination addresses one of them.
62. the method for a destination address of the branch instruction of a variable number in the row is got in prediction one soon in the microprocessor as claimed in claim 61, it is characterized in that, wherein this choose those destination addresses one of them be comprise choose in this two branch instruction relevant with this instruction pointer first effectively and take place and those destination addresses of visible branch instruction one of them.
63. the method for a destination address of the branch instruction of a variable number in the row is got in prediction one soon in the microprocessor as claimed in claim 60, it is characterized in that, also comprises:
Store with each should be a plurality of the relevant replacing information of MxN inlet groups choose replacing those enter the mouth that one of them enters the mouth to be used for this, to respond the parsing of this branch instruction.
64. get the method for a destination address of the branch instruction of a variable number in the row soon as prediction one in the described microprocessor of claim 63, it is characterized in that, if wherein this its destination address of quilt decision branch instruction is not cached within this M memory cache, if and this acquisition address misses of this quilt decision branch instruction is within least one this M memory cache the time, this choose change be comprise choose this at least one this M memory cache of replacing those channels one of them.
65. get the method for a destination address of the branch instruction of a variable number in the row soon as prediction one in the described microprocessor of claim 64, it is characterized in that, wherein should replacing information be that this N channel of each set that comprises to each M set of this a plurality of MxN inlet group is a least-recently-used indication, wherein this chooses that to change be to comprise to choose the least-recently-used channel of replacing.
66. get the method for a destination address of the branch instruction of a variable number in the row soon as prediction one in the described microprocessor of claim 63, it is characterized in that, wherein should replacing information be that to comprise this M memory cache be a least-recently-used indication, if wherein this its destination address of quilt decision branch instruction is not cached within this M memory cache, if and the acquisition address of this quilt decision branch instruction is that this is chosen and changes is to comprise to choose one of them this least-recently-used memory cache of this M of replacing memory cache when hitting within whole this M memory caches.
67. get the method for a destination address of the branch instruction of a variable number in the row soon as prediction one in the described microprocessor of claim 63, it is characterized in that, wherein should replacing information be that to comprise this M memory cache be a least-recently-used indication, if wherein this its destination address of quilt decision branch instruction is not cached within this M memory cache, if and the acquisition address of this quilt decision branch instruction is when not hitting within this M memory cache, this is chosen and changes is to comprise to choose one of them this least-recently-used memory cache of this M of replacing memory cache.
68. the method for a destination address of the branch instruction of a variable number in the row is got in prediction one soon in the microprocessor as claimed in claim 59, it is characterized in that, wherein this chooses that to change be to comprise to choose to change those one of them inlets that enters the mouth, and to respond the decision of a branch instruction, makes:
(c) this microprocessor is during three subsetss of this a plurality of MxN inlet group of operation, MxN of this a plurality of MxN inlet group goes into outspokenly to be taken at that this MxN different each of getting soon that row (MxN-2) individual difference wherein gets row are soon got the destination address of a branch instruction in the row soon and each gets the destination address of two branch instructions in the row soon in that two differences are got row soon, with the combination of acquisition (MxN-2) channel group.
69. The method of claim 68, further comprising:
for the third subset of the MxN-entry groups, determining whether the instruction cache fetch address hits in two of the M caches; and
when the instruction cache fetch address is determined to hit in two of the M caches, providing, from each of those two caches, the target addresses of the two branch instructions in one of the two cache lines that each hold two branch instructions.
70. the method for a destination address of the branch instruction of a variable number in the row is got in prediction one soon in the microprocessor as claimed in claim 59, it is characterized in that, wherein this chooses that to change be to comprise to choose to change those one of them inlets that enters the mouth, and to respond the decision of a branch instruction, makes:
(c) this microprocessor is during three subsetss of this a plurality of MxN inlet group of operation, go into the destination address that the outspoken N of being taken at different each of getting row are soon got M branch instruction in the row soon for MxN of this a plurality of MxN inlet group, with the combination of acquisition N channel group.
71. The method of claim 70, further comprising:
for the third subset of the MxN-entry groups, determining whether the instruction cache fetch address hits in all of the M caches; and
when the instruction cache fetch address is determined to hit in all of the M caches, providing, from each of the M caches, one of the target addresses of the M branch instructions in one of the N different cache lines.
CNB2005100919093A 2004-08-04 2005-08-04 Apparatus for predicting multiple branch target addresses Active CN100388187C

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US59886804P 2004-08-04 2004-08-04
US60/598,868 2004-08-04

Publications (2)

Publication Number Publication Date
CN1821953A 2006-08-23
CN100388187C 2008-05-14

Family

ID=36923343

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100919093A Active CN100388187C 2004-08-04 2005-08-04 Apparatus for predicting multiple branch target addresses

Country Status (2)

Country Link
CN (1) CN100388187C
TW (1) TWI303777B

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101887358B * 2009-05-19 2014-06-25 VIA Technologies, Inc. Device and method suitable for a microprocessor
CN106406823B * 2016-10-10 2019-07-05 Shanghai Zhaoxin Semiconductor Co., Ltd. Branch predictor and method for operating branch predictor
CN106843812A * 2016-12-23 2017-06-13 Beijing PKUnity Microsystems Technology Co., Ltd. Method and device for realizing the prediction of indirect branch associated software
US11642768B2 (en) * 2020-07-15 2023-05-09 Snap-On Incorporated Dead blow hammer head

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6470438B1 (en) * 2000-02-22 2002-10-22 Hewlett-Packard Company Methods and apparatus for reducing false hits in a non-tagged, n-way cache
CN1397874A * 2001-05-04 2003-02-19 IP-First, LLC Apparatus and method for quick fetching line selecting target address of high speed buffer storage
US20040030838A1 (en) * 2002-08-12 2004-02-12 Van De Waerdt Jan-Willem Instruction cache way prediction for jump targets
US20040139292A1 (en) * 2003-01-14 2004-07-15 Ip-First, Llc. Apparatus and method for resolving deadlock fetch conditions involving branch target address cache
US20040139281A1 (en) * 2003-01-14 2004-07-15 Ip-First, Llc. Apparatus and method for efficiently updating branch target address cache
US20040143709A1 (en) * 2003-01-16 2004-07-22 Ip-First, Llc. Apparatus and method for invalidation of redundant branch target address cache entries

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9158697B2 (en) 2011-12-28 2015-10-13 Realtek Semiconductor Corp. Method for cleaning cache of processor and associated processor
TWI579695B * 2011-12-28 2017-04-21 Realtek Semiconductor Corp. Method for cleaning cache of processor and associated processor

Also Published As

Publication number Publication date
TW200620096A 2006-06-16
TWI303777B 2008-12-01
CN1821953A 2006-08-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant