CN1961285A - TLB correlated branch predictor and method for use thereof

Info

Publication number
CN1961285A
Authority
CN
China
Prior art keywords
branch
shift register
value
translation look
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2004800432090A
Other languages
Chinese (zh)
Other versions
CN1961285B (en
Inventor
C·赖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Publication of CN1961285A
Application granted
Publication of CN1961285B
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G06F9/3844 Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

Embodiments of the present invention relate to an apparatus and method to enable efficient branch prediction in super-scalar and other branching-enabled processors. In accordance with an embodiment of the present invention, a branch predictor may include a branch prediction circuit to predict a branch outcome in an executing instruction in a processor using an input from a translation look-aside buffer.

Description

TLB correlated branch predictor and method for use thereof
Field of the invention
Embodiments of the invention relate to high-performance processors, and in particular to an instruction branch predictor that uses a translation look-aside buffer input and a dynamic-length global branch history.
Background
As the issue rate of branch instructions and the depth of instruction pipelines increase, accurate branch prediction becomes increasingly important to realizing the potential performance of superscalar out-of-order processors. Some prior-art branch predictors are implemented either as branch predictors without global history or as two-level branch predictors with global history.
In some branch predictors, the global history consists of the m most recent branches and is implemented in an m-bit global shift register, each bit of which records whether a branch was taken. Unfortunately, existing global shift registers record only a fixed-length global history. Recent research indicates, however, that different instructions from different programs may see better prediction accuracy when global histories of different lengths are used.
Fig. 1 is a circuit block diagram of a branch predictor known in the art. In Fig. 1, an m-bit history shift register 110 includes a single-bit shift input at bit m and a single-bit shift output at bit 1, where the single-bit shift input receives an indication of whether the branch of a particular instruction was taken. For example, the value "1" indicates that the branch was taken and "0" indicates that it was not. The history shift register 110 stores a fixed-length (that is, m-bit) global branch prediction history, shifts out the most significant bit value, that is, the value of the first bit, and outputs the entire stored m-bit global branch prediction history value.
In Fig. 1, the history shift register 110 is coupled to an exclusive-OR (XOR) gate 120 and outputs the m-bit global branch prediction history value stored in it to a first input of the XOR gate 120. The XOR gate 120 is also coupled to a branch address register 130, which outputs an m-bit branch address to a second input of the XOR gate 120. If the m-bit branch address input from the branch address register 130 matches the m-bit global history input from the history shift register 110, the XOR gate 120 outputs the m-bit global history to a pattern history table 140. Note that the m-bit branch address from the branch address register 130 may be shifted, extended, or truncated before it is output so that its bit count matches that of the output of the history shift register 110. As a result, even if the length of the global branch prediction history value varies, the number of bits in the branch address bit string output from the branch address register 130 always matches the number of bits in the global branch prediction value input from the history shift register 110.
In Fig. 1, the pattern history table 140 consists of 2^m entries, each of which contains a "local history". Local history information is usually stored in 2-bit saturating branch predictors. The m-bit global history output from the XOR gate 120 is then used to select an entry from the pattern history table 140, and that entry is subsequently used to make the prediction. With this design, a stable prediction entry stores valid history information in which different branch instructions are correlated with one another.
In Fig. 1, a 2-bit branch predictor maintains a 2-bit counter. When referenced, it outputs a branch prediction based on its own contents. For example, if "10" is the content of the 2-bit predictor (that is, the pattern history table entry) assigned to a branch, the branch is predicted "taken". At some later time, once the true direction is known, the content is updated. For example, "10" is updated to "11" if the branch was taken and to "01" if it was not. In general, when the 2-bit counter value is greater than or equal to half of its range, that is, 2^(2-1) = 2, the branch is predicted taken. Conversely, if the 2-bit counter value is less than 2, the branch is predicted not taken. In other words, if the 2-bit counter contains "10" (that is, 2) or "11" (that is, 3), the branch is predicted taken, but if it contains "00" (that is, 0) or "01" (that is, 1), the branch is predicted not taken.
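The counter behavior described above can be sketched in a few lines of C++. This is a minimal illustration, not the patent's circuit: the type name `TwoBitCounter`, its member names, and the initial state "01" are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical 2-bit saturating counter; all names are illustrative only.
struct TwoBitCounter {
    std::uint8_t value = 1;  // assumed initial state "01"

    // Predict taken when the counter holds 2 ("10") or 3 ("11").
    bool predictTaken() const { return value >= 2; }

    // Step toward 3 on a taken branch and toward 0 otherwise,
    // saturating at both ends.
    void update(bool taken) {
        if (taken) {
            if (value < 3) ++value;
        } else {
            if (value > 0) --value;
        }
    }
};
```

Because the counter saturates, a single mispredicted outcome in state "11" or "00" does not flip the prediction; two consecutive outcomes in the other direction are needed.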
Local history means that the outcome of a branch depends on its own history, whereas global history means that the outcome of a branch depends on the history of other branches. In the short code example below, if the first branch is taken, the second branch is taken as well. A separate 2-bit branch predictor (that is, the pattern history entry selected by the global history in which the branch d==0 was taken) is then used to hold this information under the global-history, two-level branch prediction scheme.
if (d == 0)   // if d equals 0
    d = 1;    // then set d to 1
if (d == 1)   // if d equals 1
    ;         // then continue with the conditional instructions for d == 1
Unfortunately, because the global history register 110 in Fig. 1 records only a fixed-length global history in all cases, branch prediction based on it is not accurate enough. For example, prediction based on a fixed-length global history cannot accurately distinguish which previous branch instructions are correlated with the current branch instruction. Likewise, a fixed-length global history cannot always accurately predict branch instructions that are not correlated, and a correlation may appear in some cases yet be absent in others where it should exist. For example, in the following code, if the memory operand is X, then Y may have a neighboring value because of data locality. The branch predictor can then operate as described above. However, this relationship breaks down when data locality is lost.
if (d == 0)   // if d equals 0
    d = X;    // then set d to X
if (d == Y)   // if d equals Y
    ;         // then continue with the conditional instructions for d == Y
This situation shows that global correlation sometimes depends not only on the global history or the branch address but also on data locality. As the example above shows, a loss of data locality can occur when d is set equal to X in the second instruction and d is found not to equal Y in the third instruction. As a result, the conditional instructions for d == Y may not execute. This also corrupts the global history. A branch predictor that avoids these deficiencies is therefore desirable.
Brief description of the drawings
Fig. 1 is a circuit block diagram of a branch predictor known in the art.
Fig. 2 is a circuit block diagram of a translation look-aside buffer correlated branch predictor for a processor according to an embodiment of the invention.
Fig. 3 is a flowchart of a method according to an embodiment of the invention.
Fig. 4 is a block diagram of a computer system including one or more processors and memory for use in accordance with an embodiment of the invention.
Detailed description
Embodiments of the invention relate to an apparatus and method for translation look-aside buffer (TLB) correlated branch prediction, including but not limited to TLB correlated branch predictors with or without a dynamic-length global branch history and/or two-level TLB correlated branch predictors. For example, according to one embodiment of the invention, a processor may include a correlated branch predictor with an input line from the translation look-aside buffer to a global branch history shift register. The input line, which indicates when a miss has occurred in the translation look-aside buffer, may be used to clear the global branch history shift register. Because the global branch history stored in the register can be trained by data locality, clearing the register on a TLB miss helps avoid corruption of the global branch history by the loss of data locality that accompanies the miss.
Fig. 2 is a circuit block diagram of a translation look-aside buffer correlated branch predictor for a processor according to an embodiment of the invention. In Fig. 2, a processor 200 may include an m-bit history shift register 210, which may include a first single-bit shift input (similar to the single-bit shift input in Fig. 1), a second single-bit shift input, and a single-bit shift output (similar to the single-bit shift output in Fig. 1), where the first single-bit shift input receives an indication of whether the branch of a particular instruction was taken. The history shift register 210 may store a dynamic-length global branch history for an executing instruction. In general, the most significant bit whose value is "1" identifies the effective history length. For example, if the most significant "1" is at the 5th bit of the m-bit shift register, the global history is determined to be m-5 bits long. Consequently, this most significant "1" does not itself indicate whether a branch was taken. According to an embodiment of the invention, a "1" value may serve as an enable signal indicating that a branch was taken, and a "0" as a disable signal indicating that it was not. The history shift register 210 may store a dynamic-length global branch prediction history with a maximum length of m-1 bits and output the value of the most significant bit, that is, the value of the m-1 bit. Thus the string "0000...01" indicates a global history of length zero, which may indicate that the global history was recently flushed from the history shift register 210. Similarly, according to an embodiment of the invention, the string "0000...00" is considered meaningless because it would indicate a nonexistent global history length, and the string "1X...Y" (where X and Y may each be "0" or "1") is considered to contain the longest global history the register can hold, namely m-1 bits.
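The length encoding can be illustrated with a short C++ sketch. It assumes, as the Fig. 1 description suggests, that bit 1 is the most significant bit, so a leading "1" at bit p encodes a history of m - p bits. The function name `historyLength` and the -1 return value for the all-zero string are assumptions made for the example.

```cpp
#include <cstdint>

// Returns the effective global-history length encoded in an m-bit register,
// or -1 for the all-zero string, which the text treats as meaningless.
// Assumption: bit 1 is the most significant bit, so a leading "1" at bit p
// (counted from the most significant end) encodes a length of m - p.
// Assumes 1 <= m <= 32.
int historyLength(std::uint32_t reg, int m) {
    for (int p = 1; p <= m; ++p) {
        std::uint32_t bit = (reg >> (m - p)) & 1u;  // bit p, counted from the MSB
        if (bit) return m - p;
    }
    return -1;
}
```

With m = 16, the cleared value 0x0001 yields length 0 and any value with the top bit set yields the maximum length of 15.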
In Fig. 2, the history shift register 210 may be coupled to an XOR gate 220 and may output the m-bit global branch prediction history value stored in it to a first input of the XOR gate 220. The XOR gate 220 may also be coupled to a branch address register 230, which outputs an m-bit branch address to a second input of the XOR gate 220. If the m-bit branch address input from the branch address register 230 matches the m-bit global history input from the history shift register 210, the XOR gate 220 may output the m-bit global history to a pattern history table 240. Note that the m-bit branch address from the branch address register 230 may be shifted, extended, or truncated before it is output so that its bit count matches that of the output of the history shift register 210. As a result, even if the length of the global branch prediction history value varies, the number of bits in the branch address bit string output from the branch address register 230 generally still matches the number of bits in the global branch prediction value input from the history shift register 210.
In Fig. 2, the pattern history table 240 may consist of 2^m entries, each of which may contain a "local history". Local history information is usually stored in 2-bit saturating branch predictors. The m-bit global history output from the XOR gate 220 may then be used to select an entry from the pattern history table 240, and that entry may be used to make the prediction. With this design, a stable prediction entry may store valid history information in which different branch instructions are correlated with one another.
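One common way to combine a global history with a branch address through an XOR gate is gshare-style indexing, in which the XOR of the two m-bit values selects one of the 2^m table entries. The sketch below assumes that arrangement; the text does not spell out the index computation, and the function name `phtIndex` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical gshare-style index into a 2^m-entry pattern history table.
// Assumption: the index is the XOR of the m-bit history and the low m bits
// of the branch address. Assumes 1 <= m <= 31.
std::uint32_t phtIndex(std::uint32_t history, std::uint32_t branchAddress, int m) {
    std::uint32_t mask = (1u << m) - 1u;      // keep m bits, so the index is < 2^m
    return (history ^ branchAddress) & mask;  // address truncated to the history width
}
```

Masking plays the role of the truncation step mentioned above: the address is cut down to the same number of bits as the history before the two are combined.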
In general, the history shift register 210 in Fig. 2 shifts as shown in Fig. 1, with two exceptions: when the global branch history is to be flushed, and when the value of the global history string equals "1XYZ...", where X, Y, and Z may each be "0" or "1". First, if the history shift register 210 is to be flushed, the global branch history string in the history shift register 210 may be cleared and set equal to "0000...01". Second, when the history shift register 210 contains a global branch history that is m-1 bits long, meaning that a "1" is stored in the most significant bit of the history shift register 210 (that is, bit 1), the "1" value stored in bit 1 may be retained and the value in bit 2 may be shifted out.
The history shift register 210 may also be coupled to a latch memory 250, for example a tri-state buffer, which receives from a translation look-aside buffer (TLB) (not shown) a signal indicating whether a TLB miss has occurred, and which also receives and stores an m-bit input clear value. The m-bit input clear value may consist of all "0" values except for the rightmost bit, which may be "1"; for example, with m=16 the 16-bit input clear value may equal "0000000000000001". When a TLB miss occurs, the TLB (not shown) may send an enable signal indicating the miss on a TLB miss line 260. When that enable signal arrives at the latch memory 250, the m-bit input clear value stored in the latch memory 250 is read into the history shift register 210. As a result, the history shift register 210 is "cleared": the m-bit value currently stored in it is overwritten by the m-bit value from the latch memory 250, for example "0000000000000001".
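The shifting rule with its two exceptions (retaining the leading "1" when the history is full, and overwriting the register with the clear value on a TLB miss) can be modeled in software. The following is a sketch under stated assumptions rather than the circuit itself: the type `HistoryRegister` and its member names are hypothetical, and bit 1 is taken to be the most significant bit.

```cpp
#include <cstdint>

// Hypothetical software model of the m-bit history shift register.
// Assumes 2 <= m <= 31 and that bit 1 in the text is the most significant bit.
struct HistoryRegister {
    int m;
    std::uint32_t bits;

    explicit HistoryRegister(int width) : m(width), bits(1u) {}  // start at "0...01"

    std::uint32_t mask() const { return (1u << m) - 1u; }
    std::uint32_t msb() const { return 1u << (m - 1); }

    // Shift in one branch outcome (1 = taken, 0 = not taken).
    void shiftIn(bool taken) {
        bool full = (bits & msb()) != 0;  // history is already m-1 bits long
        bits = ((bits << 1) | (taken ? 1u : 0u)) & mask();
        if (full) bits |= msb();          // keep the "1" in bit 1; the old bit 2 is dropped
    }

    // TLB miss: overwrite the register with the clear value "0...01".
    void onTlbMiss() { bits = 1u; }
};
```

For m = 4, shifting in taken, not taken, taken from the cleared state gives 0011, 0110, 1101; a further not-taken outcome gives 1010, with the leading "1" retained and the oldest history bit discarded.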
In Fig. 2, a feedback circuit 270 may be coupled to the bit 1 and bit 2 positions in the history shift register 210. The feedback circuit 270 may include an AND gate 280, which is coupled to the history shift register 210 to receive the output of the most significant bit and is also coupled to an OR gate 290; the OR gate 290 may be coupled to the bit 1 and bit 2 positions in the history shift register 210. The feedback circuit 270 may serve to retain a "1" value in the most significant bit, the m-1 bit position, of the history shift register 210. More specifically, a first input 281 of the AND gate 280 may be coupled to the output of the history shift register 210. A second input 283 of the AND gate 280 may receive a "1" value, which is ANDed with the output value of the history shift register 210, and the result is passed via an output 287 of the AND gate 280 to a first input 291 of the OR gate 290. A second input 293 of the OR gate 290 may be coupled to the bit 2 position in the history shift register 210 and receive the value from that position. An output 297 of the OR gate 290 may be coupled to the bit 1 position in the history shift register 210 and output the ORed value to that position. Because the second input 283 of the AND gate 280 is fixed at "1", only two input combinations are possible, namely (0,1) and (1,1), and the AND gate 280 has only two possible output values. That is, if the output value of the m-1 bit position in the history shift register 210 is "1", the AND gate 280 outputs "1", and if it is "0", the AND gate 280 outputs "0". The OR gate 290 likewise has only two possible output values ("0" or "1"), but because neither its first input 291 nor its second input 293 is restricted to a single value, its result can come from four possible input combinations: (0,0), (0,1), (1,0), and (1,1). As the logical OR table in Table 1 shows, "1" is output for three of the four possible input combinations. Therefore, because the AND gate 280 always outputs "1" when the bit 1 value in the history shift register 210 is "1", the feedback circuit 270 holds the value "1" in the bit 1 position until the history shift register 210 is cleared by a TLB miss.
Table 1
                  AND gate output
Bit 2 output      1       0
1                 1       1
0                 1       0
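The gate-level behavior of the feedback circuit reduces to a two-input function of the current bit 1 and bit 2 values. The sketch below mirrors the AND gate 280 and OR gate 290 described above; the function name `nextBit1` and its parameter names are hypothetical.

```cpp
// Hypothetical gate-level model of feedback circuit 270.
// bit1 is the current most significant bit; bit2 is the bit that would
// otherwise shift into its position.
bool nextBit1(bool bit1, bool bit2) {
    bool andOut = bit1 && true;  // AND gate 280: second input fixed at "1"
    return andOut || bit2;       // OR gate 290: drives the new value of bit 1
}
```

Once bit 1 holds a "1", the function returns true regardless of bit 2, which is why only a clear on a TLB miss can remove the leading "1".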
Embodiments of the invention may be implemented in an out-of-order processor, in which a fetch/decode unit fetches instructions, such as macro-instructions, from a memory location such as an instruction cache and decodes them. For a complex instruction set computer (CISC) architecture, the fetch/decode unit may decode a complex instruction into one or more micro-instructions/operations. In general, these micro-instructions define a load-store type architecture, so that micro-instructions involving memory operations may also be practiced on other architectures such as reduced instruction set computer (RISC) or very long instruction word (VLIW) architectures.
In a typical RISC architecture, instructions are not decoded into micro-instructions. Because the invention may be practiced on both RISC and CISC architectures, no distinction is made between instructions and micro-instructions/operations unless otherwise stated, and both are referred to simply as instructions.
Fig. 3 is a flowchart of a method according to an embodiment of the invention. In Fig. 3, an input from the TLB may be used to select a prediction entry, for example from the pattern history table 240 (310), and whether a branch is taken may be predicted dynamically based on the selected prediction entry and the TLB input (320). The method may receive information on whether the branch was actually taken (330) and update the prediction entry accordingly, for example in the pattern history table 240 (340). Based on whether the branch was actually taken, the global history value in, for example, the history shift register 210 and the pattern history table 240 may be updated to indicate the actual outcome, and the next branch instruction may be fetched (360). In general, the method terminates only when the processor shuts down or no further instructions remain to be processed.
In an alternative embodiment of the invention that is not explicitly shown, if no further branch instruction is immediately available, the method in Fig. 3 may pause and wait for more branch instructions.
Although the method in Fig. 3 may imply a particular order of execution, embodiments of the invention are not limited to that order. Embodiments are contemplated in which some or all elements of the method are performed in any order, including but not limited to being executed wholly or partly in parallel, for example in an out-of-order ("OOO") processor. Similarly, although the method in Fig. 3 is simplified for ease of illustration to reflect only one branch at a time, embodiments that process multiple branches simultaneously are also contemplated, subject of course to any existing data dependencies.
The following simplified pseudocode fragment shows the operation of an implementation of a TLB correlated global history branch predictor according to an embodiment of the invention.
check_and_initialize_predictor(argc, argv, &inTrace, &aPredictor);
while (!inTrace->EndOfTrace()) {
    aPredictor->SelectPredictionEntry(inTrace->GetAddress(), inTrace->TLBMissOrNot());
    // the TLB information is supplied here
    bool pr_taken = aPredictor->prediction(inTrace->ForwardBranchOrNot());
    // allows static prediction
    aPredictor->UpdatePredictor(inTrace->TakenOrNot(), pr_taken);
    // once the actual target of the branch is known, update the pattern history table and shift the global register
    inTrace->read_trace();  // read the next branch instruction in this simulation
}
aPredictor->ShowAccuracy();
For example, the pseudocode above shows that the predictor runs during instruction execution to predict the outcome of each branch in the instruction stream and updates those predictions once the actual target is known. Although the pseudocode example may suggest serial execution, it merely illustrates the general concept; alternative embodiments are contemplated in which branches execute in parallel and/or out of order, subject of course to any data dependencies among them.
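To make the select, predict, and update steps concrete, here is a small self-contained C++ sketch of one possible predictor. It is not the implementation behind the pseudocode: the class `TlbCorrelatedPredictor`, the gshare-style XOR index, and the initial counter value of 1 are all assumptions made for illustration, and static prediction is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical end-to-end sketch; every name here is illustrative.
// Assumes 2 <= m <= 31.
class TlbCorrelatedPredictor {
public:
    explicit TlbCorrelatedPredictor(int m)
        : m_(m), history_(1u),
          table_(std::size_t{1} << m, std::uint8_t{1}), index_(0) {}

    // Select the prediction entry; a TLB miss first clears the history to "0...01".
    void selectPredictionEntry(std::uint32_t address, bool tlbMiss) {
        if (tlbMiss) history_ = 1u;
        index_ = (history_ ^ address) & mask();  // assumed gshare-style index
    }

    // Predict taken when the selected 2-bit counter holds 2 or 3.
    bool prediction() const { return table_[index_] >= 2; }

    // Update the selected counter and the global history with the actual outcome.
    void updatePredictor(bool taken) {
        std::uint8_t& c = table_[index_];
        if (taken && c < 3) ++c;
        if (!taken && c > 0) --c;
        bool full = (history_ & msb()) != 0;
        history_ = ((history_ << 1) | (taken ? 1u : 0u)) & mask();
        if (full) history_ |= msb();  // retain the leading "1" once the history is full
    }

    std::uint32_t history() const { return history_; }

private:
    std::uint32_t mask() const { return (1u << m_) - 1u; }
    std::uint32_t msb() const { return 1u << (m_ - 1); }

    int m_;
    std::uint32_t history_;
    std::vector<std::uint8_t> table_;
    std::size_t index_;
};
```

A trace-driven loop like the one in the pseudocode would call `selectPredictionEntry`, `prediction`, and `updatePredictor` once per branch, passing the TLB miss flag from the trace into the first call.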
Fig. 4 is a block diagram of a computer system including one or more processors and memory for use in accordance with an embodiment of the invention. In Fig. 4, a computer system 400 may include one or more processors 410(1) to 410(n) coupled to a memory bus 420, which may be coupled to system logic 430. Each of the one or more processors 410(1) to 410(n) may be an N-bit processor and may include a decoder (not shown) and one or more N-bit registers (not shown). The system logic 430 may be coupled to a system memory 440 through a bus 450 and to a non-volatile memory 470 and one or more peripheral devices 480(1) to 480(m) through a peripheral bus 460. The peripheral bus 460 may represent, for example, one or more Peripheral Component Interconnect (PCI) buses conforming to the PCI Local Bus Specification, revision 2.2, published by the PCI Special Interest Group (SIG) on December 18, 1998; Industry Standard Architecture (ISA) buses; Extended ISA (EISA) buses conforming to the EISA Specification, version 3.12, published by BCPR Services Inc. in 1992; Universal Serial Bus (USB) conforming to the USB Specification, version 1.1, published on September 23, 1998; and comparable peripheral buses. The non-volatile memory 470 may be a static memory device such as a read-only memory (ROM) or a flash memory. The peripheral devices 480(1) to 480(m) may include, for example, a keyboard; a mouse or other pointing device; mass storage devices such as hard disk drives, compact disc (CD) drives, optical disks, and digital video disc (DVD) drives; displays; and the like.
Although the invention has been disclosed in detail, it should be understood that various changes, substitutions, and alterations can be made herein. Moreover, although software and hardware are described as controlling certain functions, those functions can be performed using software, hardware, or a combination of software and hardware, as is well known in the art. Likewise, in the claims, the term "instruction" may encompass an instruction in a RISC architecture or an instruction in a CISC architecture, as well as instructions used in other computer architectures. Other examples are readily ascertainable by one of ordinary skill in the art and may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (39)

1. A branch predictor, comprising:
a branch prediction circuit that uses an input from a translation look-aside buffer to predict a branch outcome in an executing instruction in a processor.
2. The branch predictor of claim 1, wherein the branch prediction circuit comprises:
a pattern history table; and
a history shift register coupled to the pattern history table and the translation look-aside buffer, the history shift register to clear itself upon receiving a miss signal from the translation look-aside buffer.
3. The branch predictor of claim 2, wherein the branch prediction circuit further comprises:
a memory coupled to the history shift register, the memory to deliver a reset value to the history shift register upon receiving a miss signal from the translation look-aside buffer.
4. The branch predictor of claim 3, wherein the memory comprises:
a tri-state buffer.
5. The branch predictor of claim 3, wherein the branch prediction circuit further comprises:
a feedback loop coupled to the history shift register, the feedback loop to maintain the value of the most significant bit in the history shift register.
6. The branch predictor of claim 5, wherein the feedback loop maintains the value of the most significant bit at 1.
7. The branch predictor of claim 5, wherein the bit position of the most significant 1 value in the history shift register determines the length of the global branch history stored in the history shift register.
8. The branch predictor of claim 7, wherein the length of the global branch history stored in the history shift register is defined by the bit position of the most significant 1 value.
9. The branch predictor of claim 5, wherein the feedback loop comprises:
an AND gate coupled to the history shift register, the AND gate to receive an output bit value of the history shift register and an enable signal; and
an OR gate coupled to the AND gate and the history shift register, the OR gate to receive a first input value from the AND gate and a second input value from the history shift register, and to output a new bit value to the history shift register.
10. The branch predictor of claim 2, wherein the history shift register contains a dynamic-length global branch history.
11. The branch predictor of claim 2, wherein the history shift register comprises m bits and outputs an m-bit pattern history value to the pattern history table via an XOR gate.
12. The branch predictor of claim 11, wherein the XOR gate receives the m-bit pattern history value and an m-bit branch address value, and outputs an m-bit pattern history value to the pattern history table.
13. a branch predictor comprises:
The branch prediction circuit that comprises m position global branch history;
Be coupled to the storer of translation look-aside buffer and described branch prediction circuit, described branch prediction circuit resets during the miss indication of described storer in receiving described translation look-aside buffer; And
Be coupled to the feedback loop of described branch prediction circuit, described feedback loop keeps the value of highest significant position in the described branch prediction circuit when the length of described global branch history equals m-1.
14. branch predictor as claimed in claim 13 is characterized in that, described branch prediction circuit comprises:
Pattern history table; And
Be coupled to the historical shift register of described pattern history table and described translation look-aside buffer, described historical shift register when the miss indication that receives from described translation look-aside buffer with himself zero clearing;
Branch address storer for each branch's memory address of indicating in the described historical shift register.
15. branch predictor as claimed in claim 14 is characterized in that, described storer is coupled to described historical shift register.
16. branch predictor as claimed in claim 13 is characterized in that, described storer comprises:
Three-state buffer.
17. The branch predictor of claim 13, wherein the feedback loop comprises:
an AND gate coupled to the history shift register, the AND gate receiving an output bit value of the history shift register and an enable signal; and
an OR gate coupled to the AND gate and to the history shift register, the OR gate receiving a first input value from the AND gate and a second input value from the history shift register, and outputting a new bit value to the history shift register.
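As an illustration of the gate arrangement recited in claim 17, the following Python sketch models one evaluation of the feedback loop at the bit level. The function and parameter names are hypothetical, and the claim does not state which register positions drive the two inputs, so they are left as plain arguments.

```python
def feedback_bit(register_output_bit: int, enable: int, second_input_bit: int) -> int:
    """Compute the new bit value written back to the history shift register.

    AND gate: register output bit AND enable signal.
    OR gate:  AND-gate output OR a second input from the register.
    All arguments are single bits (0 or 1).
    """
    and_out = register_output_bit & enable   # first input to the OR gate
    return and_out | second_input_bit        # new bit value for the register


# With the enable signal asserted, a 1 on the register output is held
# in the new bit regardless of the second input.
held = feedback_bit(1, 1, 0)
```

With the enable signal deasserted, the AND gate outputs 0 and the new bit simply follows the second input.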
18. A processor, comprising:
a translation look-aside buffer;
a branch prediction circuit comprising an m-bit global branch history;
a memory coupled to the translation look-aside buffer and to the branch prediction circuit, the memory resetting the branch prediction circuit upon receiving an indication of a miss in the translation look-aside buffer; and
a feedback loop coupled to the branch prediction circuit, the feedback loop maintaining the value of the most significant bit in the branch prediction circuit when the length of the global branch history equals m-1.
19. The processor of claim 18, wherein the branch prediction circuit comprises:
a pattern history table;
a history shift register coupled to the pattern history table and to the translation look-aside buffer, the history shift register clearing itself upon receiving a miss indication from the translation look-aside buffer; and
a branch address memory to store the address of each branch indicated in the history shift register.
20. The processor of claim 19, wherein the memory is coupled to the history shift register.
21. The processor of claim 18, wherein the memory comprises:
a tri-state buffer.
22. The processor of claim 18, wherein the feedback loop comprises:
an AND gate coupled to the history shift register, the AND gate receiving an output bit value of the history shift register and an enable signal; and
an OR gate coupled to the AND gate and to the history shift register, the OR gate receiving a first input value from the AND gate and a second input value from the history shift register, and outputting a new bit value to the history shift register.
23. A computing system, comprising:
a memory; and
a processor coupled to the memory, the processor comprising:
a translation look-aside buffer;
a branch prediction circuit comprising an m-bit global branch history;
a memory coupled to the translation look-aside buffer and to the branch prediction circuit, the memory resetting the branch prediction circuit upon receiving an indication of a miss in the translation look-aside buffer; and
a feedback loop coupled to the branch prediction circuit, the feedback loop maintaining the value of the most significant bit in the branch prediction circuit when the length of the global branch history equals m-1.
24. The computing system of claim 23, wherein the branch prediction circuit comprises:
a pattern history table;
a history shift register coupled to the pattern history table and to the translation look-aside buffer, the history shift register clearing itself upon receiving a miss indication from the translation look-aside buffer; and
a branch address memory to store the address of each branch indicated in the history shift register.
25. The computing system of claim 24, wherein the memory is coupled to the history shift register.
26. A method, comprising:
using input from a translation look-aside buffer to predict the branch outcomes of a plurality of executed instructions in a processor.
27. The method of claim 26, wherein using input from the translation look-aside buffer to predict the branch outcomes of the plurality of executed instructions in the processor comprises:
predicting the branch outcome of each of the plurality of executed instructions;
maintaining the predicted branch outcome of each of the plurality of executed instructions; and
clearing the global branch history upon receiving an indication that a miss has occurred in the translation look-aside buffer for data associated with one of the plurality of executed instructions.
28. The method of claim 27, wherein clearing the global branch history upon receiving an indication that a miss has occurred in the translation look-aside buffer comprises:
replacing the global branch history with a predetermined clear value.
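As an illustration of claims 27 and 28, the following Python sketch keeps an m-bit global branch history and replaces it with a predetermined clear value when a translation look-aside buffer miss is signalled. The class name, method names, and the default clear value of zero are assumptions made for this sketch; the claims do not specify the clear value.

```python
class GlobalHistory:
    """m-bit global branch history that is cleared on a TLB miss."""

    def __init__(self, m: int, clear_value: int = 0):
        self.mask = (1 << m) - 1
        # Assumption: the predetermined clear value defaults to all zeros.
        self.clear_value = clear_value & self.mask
        self.value = self.clear_value

    def record(self, taken: bool) -> None:
        """Shift in one branch outcome (1 = taken) at the low-order end."""
        self.value = ((self.value << 1) | int(taken)) & self.mask

    def on_tlb_miss(self) -> None:
        """Replace the history with the predetermined clear value."""
        self.value = self.clear_value


history = GlobalHistory(m=4)
for outcome in (True, False, True):
    history.record(outcome)
before_miss = history.value   # 0b101
history.on_tlb_miss()
after_miss = history.value    # back to the clear value
```

Outcomes recorded after the miss start from the clear value, so the history reflects only branches seen since that miss.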
29. A machine-readable medium having stored thereon executable instructions for performing a method, the method comprising:
using input from a translation look-aside buffer to predict the branch outcomes of a plurality of executed instructions in a processor.
30. The machine-readable medium of claim 29, wherein using input from the translation look-aside buffer to predict the branch outcomes of the plurality of executed instructions in the processor comprises:
predicting the branch outcome of each of the plurality of executed instructions;
maintaining the predicted branch outcome of each of the plurality of executed instructions; and
clearing the global branch history upon receiving an indication that a miss has occurred in the translation look-aside buffer for data associated with one of the plurality of executed instructions.
31. The machine-readable medium of claim 30, wherein clearing the global branch history upon receiving an indication that a miss has occurred in the translation look-aside buffer comprises:
replacing the global branch history with a predetermined clear value.
32. A method, comprising:
selecting a prediction entry using input from a translation look-aside buffer;
predicting, based on the prediction entry and the input, whether a branch will be taken;
receiving information on whether the branch was actually taken;
updating the prediction entry using the information on whether the branch was actually taken;
updating a global history value to indicate whether the branch was actually taken; and
fetching a next branch instruction.
33. The method of claim 32, wherein selecting the prediction entry using input from the translation look-aside buffer comprises:
selecting a prediction entry from a pattern history table using the input from the translation look-aside buffer.
34. The method of claim 32, wherein updating the prediction entry comprises:
updating the prediction entry in a pattern history table.
35. The method of claim 32, wherein updating the global history value to indicate whether the branch was actually taken comprises:
updating the global history value in a global shift register to indicate whether the branch was actually taken.
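As an illustration of the method of claims 32 to 35, the following Python sketch walks through one select, predict, update cycle. Several details are assumptions not found in the claims: the class and method names are hypothetical, the translation look-aside buffer input is modelled as a miss flag that clears the global history, the entry is selected by XORing the history with the low-order address bits, and each pattern history table entry is a 2-bit saturating counter.

```python
class TlbCorrelatedPredictor:
    """Sketch of a predictor whose global history is reset on a TLB miss."""

    def __init__(self, m: int = 4):
        self.mask = (1 << m) - 1
        self.history = 0                 # global history value (m bits)
        self.pht = [1] * (1 << m)        # assumed 2-bit counters, weakly not taken

    def predict(self, branch_address: int, tlb_miss: bool = False):
        """Select a prediction entry and predict whether the branch is taken."""
        if tlb_miss:                     # assumed form of the TLB input
            self.history = 0
        index = (self.history ^ branch_address) & self.mask
        return index, self.pht[index] >= 2

    def update(self, index: int, taken: bool) -> None:
        """Update the selected entry, then the global history value."""
        counter = self.pht[index]
        self.pht[index] = min(3, counter + 1) if taken else max(0, counter - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = TlbCorrelatedPredictor(m=4)
index, predicted_taken = predictor.predict(0x8)   # select entry and predict
predictor.update(index, taken=True)               # actual outcome arrives
```

After `update` returns, the caller would fetch the next branch instruction and repeat the cycle.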
36. A machine-readable medium having stored thereon executable instructions for performing a method, the method comprising:
selecting a prediction entry using input from a translation look-aside buffer;
predicting, based on the prediction entry and the input, whether a branch will be taken;
receiving information on whether the branch was actually taken;
updating the prediction entry using the information on whether the branch was actually taken;
updating a global history value to indicate whether the branch was actually taken; and
fetching a next branch instruction.
37. The machine-readable medium of claim 36, wherein selecting the prediction entry using input from the translation look-aside buffer comprises:
selecting a prediction entry from a pattern history table using the input from the translation look-aside buffer;
updating the global history value to indicate whether the branch was actually taken; and
fetching a next branch instruction.
38. The machine-readable medium of claim 36, wherein updating the prediction entry comprises:
updating the prediction entry in the pattern history table.
39. The machine-readable medium of claim 36, wherein updating the global history value to indicate whether the branch was actually taken comprises:
updating the global history value in a global shift register to indicate whether the branch was actually taken.
CN2004800432090A 2004-06-02 2004-06-02 TLB correlated branch predictor and method for use therof Expired - Fee Related CN1961285B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2004/000583 WO2005119428A1 (en) 2004-06-02 2004-06-02 Tlb correlated branch predictor and method for use therof

Publications (2)

Publication Number Publication Date
CN1961285A (en) 2007-05-09
CN1961285B (en) 2011-05-25

Family

ID=35463053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2004800432090A Expired - Fee Related CN1961285B (en) 2004-06-02 2004-06-02 TLB correlated branch predictor and method for use therof

Country Status (4)

Country Link
JP (1) JP4533432B2 (en)
CN (1) CN1961285B (en)
DE (1) DE112004002877T5 (en)
WO (1) WO2005119428A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7984279B2 (en) * 2006-11-03 2011-07-19 Qualcomm Incorporated System and method for using a working global history register
GB2577708B (en) * 2018-10-03 2022-09-07 Advanced Risc Mach Ltd An apparatus and method for monitoring events in a data processing system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04337833A (en) * 1991-05-15 1992-11-25 Koufu Nippon Denki Kk Data processor
IE940855A1 (en) * 1993-12-20 1995-06-28 Motorola Inc Data processor with speculative instruction fetching and¹method of operation
US6079003A (en) * 1997-11-20 2000-06-20 Advanced Micro Devices, Inc. Reverse TLB for providing branch target address in a microprocessor having a physically-tagged cache
US5938761A (en) * 1997-11-24 1999-08-17 Sun Microsystems Method and apparatus for branch target prediction
US6427206B1 (en) * 1999-05-03 2002-07-30 Intel Corporation Optimized branch predictions for strongly predicted compiler branches
US6546481B1 (en) * 1999-11-05 2003-04-08 Ip - First Llc Split history tables for branch prediction
US6681345B1 (en) * 2000-08-15 2004-01-20 International Business Machines Corporation Field protection against thread loss in a multithreaded computer processor

Also Published As

Publication number Publication date
JP4533432B2 (en) 2010-09-01
JP2008501166A (en) 2008-01-17
CN1961285B (en) 2011-05-25
DE112004002877T5 (en) 2007-05-03
WO2005119428A1 (en) 2005-12-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110525

Termination date: 20150602

EXPY Termination of patent right or utility model