US20020124162A1 - Computer system and method for fetching a next instruction - Google Patents

Computer system and method for fetching a next instruction

Info

Publication number
US20020124162A1
US20020124162A1 (application US09/927,346)
Authority
US
United States
Prior art keywords
branch
instruction
instructions
nfapd
predictive
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US09/927,346
Inventor
Robert Yung
Kit Tam
Alfred Yeung
William Joy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Microsystems Inc
Original Assignee
Sun Microsystems Inc
Application filed by Sun Microsystems Inc filed Critical Sun Microsystems Inc
Priority to US09/927,346
Publication of US20020124162A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3836 - Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F 9/3842 - Speculative instruction execution
    • G06F 9/3844 - Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems


Abstract

N instruction class (ICLASS) fields, m branch prediction (BRPD) fields, and k next fetch address prediction (NFAPD) fields are added to each set of n instructions of a cache line of an instruction cache, where m and k are less than or equal to n. The BRPD and NFAPD fields of a cache line are initialized in accordance with a pre-established initialization policy of a branch and next fetch address prediction algorithm when the cache line is first brought into the instruction cache. The sets of ICLASSes, BRPDs, and NFAPDs of a cache line are accessed concurrently with the corresponding sets of instructions of the cache line. One BRPD and one NFAPD are selected from the sets of BRPDs and NFAPDs corresponding to the selected set of instructions. The selected BRPD and NFAPD are updated in accordance with a pre-established update policy of the branch and next fetch address prediction algorithm when the actual branch direction and next fetch address are resolved. Additionally, in one embodiment, m and k are equal to 1, and the selected NFAPD is stored immediately into the NFA register of the instruction prefetch and dispatch unit, allowing the selected NFAPD to be used as the fetch address for the next instruction cache access to achieve zero fetch latency for both control transfer and sequential next fetch.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This is a division of application Ser. No. 08/800,367, filed Feb. 14, 1997, which is a continuation of application Ser. No. 08/363,107, filed Dec. 22, 1994, which is a continuation application of Ser. No. 07/938,371, filed Aug. 31, 1992, all of which are incorporated herein by reference.[0001]
  • DESCRIPTION OF THE INVENTION
  • 1. Field of the Invention [0002]
  • The present invention relates to the field of computer systems. More specifically, the present invention relates to a computer system having a minimum latency cache which stores instructions decoded to determine class, branch prediction and next fetch address prediction information. [0003]
  • 2. Background of the Invention [0004]
  • Historically, when a branch instruction was dispatched in a computer system, instruction fetching and dispatching were stalled until the branch direction and the target address were resolved. Since this approach results in lower system performance, it is rarely used in modern high performance computers. To obtain higher system performance, various techniques have been developed to allow instruction fetching and dispatching to continue in an efficient manner without waiting for the resolution of the branch direction. Central to the efficiency of continuing instruction prefetching and dispatching is the ability to predict the correct branch direction. There are several common approaches to predicting branch direction: [0005]
  • 1. Static prediction: Under this approach, the higher probability direction for a particular branch instruction is ascertained. When the branch instruction is fetched, the ascertained direction is always taken. For example, a direction for a branch instruction may be set to “Branch Taken”, or alternatively, set to “Branch Not Taken”. [0006]
  • 2. Dynamic software prediction: Under this approach, a branch prediction algorithm predicts the branch direction. [0007]
  • 3. Dynamic hardware prediction: Under this approach, a branch prediction algorithm predicts the branch direction based on the branch history information maintained in a branch prediction table. [0008]
  • The static prediction approach is simple to implement; however, its prediction hit rate is generally less than 75%. Such a prediction hit rate is generally too low for high performance computers. The dynamic software prediction approach generally works quite well when used in conjunction with a compilation technique known as trace scheduling. Without trace scheduling, the prediction hit rate is generally very low. Unfortunately, trace scheduling is difficult to apply to some programs and implementations. The dynamic hardware prediction approach generally provides an adequate prediction hit rate. However, it increases the complexity of the processor design and requires additional hardware to maintain the separate branch prediction table. Further, if the size of a cache is enlarged in a redesign, the size of the table would also have to be increased, complicating the redesign process. [0009]
  • SUMMARY OF THE INVENTION
  • The present invention relates to a novel computer system. The computer system includes a low latency cache that stores instructions decoded to determine class, branch prediction information, and next address fetch information. [0010]
  • The present invention includes a cache having a plurality of cache lines. Each cache line includes (n) instructions and (n) instruction class (ICLASS) fields for storing the decoded class information of the instructions respectively. Each cache line also includes one or more branch prediction (BRPD) fields and one or more next fetch address prediction (NFAPD) fields. [0011]
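  • For illustration only, the cache line organization just described can be modeled as a C structure. This is a minimal sketch, not the patented hardware: the type and field names, the 32-bit instruction width, and the n = m = k = 4 sizing are all assumptions made for the example.

      #include <stdint.h>

      #define N_INSTR 4               /* (n) instructions per cache line     */
      #define M_BRPD  4               /* (m) BRPD fields per line, m <= n    */
      #define K_NFAPD 4               /* (k) NFAPD fields per line, k <= n   */

      typedef struct {
          uint32_t instr[N_INSTR];    /* instruction array entries           */
          uint32_t tag[N_INSTR];      /* tag and associated control info     */
          uint8_t  iclass[N_INSTR];   /* decoded class of each instruction   */
          uint8_t  brpd[M_BRPD];      /* branch prediction: 1 = Branch Taken */
          uint32_t nfapd[K_NFAPD];    /* predicted next fetch address(es)    */
      } cache_line_t;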
  • When an instruction is fetched, the corresponding ICLASS field, BRPD field information and the NFAPD information are all provided to the prefetch and dispatch unit of the computer system. The ICLASS information informs the prefetch unit if the fetched instruction is a branch. Since the instruction has already been decoded to determine its class, the need to perform a partial decode in the prefetch and dispatch unit to determine if an instruction is a branch instruction is avoided. If the instruction is a branch instruction, the BRPD field provides a prediction of either “Branch Taken” or “Branch Not Taken”. For non-branch instructions, the BRPD field is ignored. For non-branch instructions, the NFAPD typically contains the next sequential address. For branch instructions, the NFAPD contains either the next sequential address or the target address of the branch instruction. If the BRPD field contains a “Branch Taken” prediction, the corresponding NFAPD field typically contains the target address for the branch instruction. Alternatively, if the BRPD field contains a “Branch Not Taken” status, the corresponding NFAPD field typically contains the next sequential address. In any event, the NFAPD information is used to define the next line from the cache to be fetched, thereby avoiding the need to calculate the next fetch address in the prefetch unit. The prefetch and dispatch unit needs to calculate the next fetch address only when a misprediction of a branch instruction occurs. An update policy is used to correct the BRPD and the NFAPD values in the event the predictions turn out to be wrong. [0012]
  • The number of BRPD fields and NFAPD fields per cache line varies depending on the specific embodiment of the present invention. In one embodiment, a specific BRPD field and an NFAPD field are provided for each instruction per cache line. If there is more than one branch instruction per cache line, each branch instruction enjoys the benefit of a dedicated branch prediction and next fetch address prediction. In a simplified embodiment, one BRPD field and one NFAPD field are shared among all the instructions per cache line. Under these circumstances, only a dominant instruction in the cache line makes use of the BRPD and the NFAPD information. A dominant instruction is defined as the first branch instruction with a “Branch Taken” status in the cache line. For example, with a dominant instruction, the BRPD field is set to “Branch Taken”, and the NFAPD typically contains the target address for the dominant branch instruction. When the instruction is fetched, control is typically transferred to the target address of the dominant instruction. Since the dominant instruction is the first instruction in a cache line to cause a control transfer, it is not necessary for the other instructions in the cache line to have their own BRPD fields and NFAPD fields respectively. [0013]
  • The present invention represents a significant improvement over the prior art. The need to perform a partial decode or a next fetch address calculation in the prefetch and dispatch unit is eliminated for the vast majority of the fetched instructions. As such, fetch latency is significantly reduced and processor throughput is greatly enhanced. [0014]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The objects, features and advantages of the system of the present invention will be apparent from the following detailed description of the invention with references to the drawings in which: [0015]
  • FIG. 1 is a block diagram of a computer system according to the present invention. [0016]
  • FIG. 2 illustrates a block diagram of an instruction cache in the computer system of the present invention. [0017]
  • FIG. 3 illustrates a block diagram of an instruction prefetch and dispatch unit used in the computer system of the present invention. [0018]
  • FIGS. 4 a - 4 b are two flow diagrams illustrating the operation of the instruction prefetch and dispatch unit. [0019]
  • FIG. 5 is a flow diagram illustrating the operation of the instruction cache. [0020]
  • FIG. 6 illustrates exemplary line entries in the instruction cache used in the computer system of the present invention. [0021]
  • DESCRIPTION OF THE EMBODIMENTS
  • Referring to FIG. 1, a functional block diagram illustrating a computer system of the present invention is shown. The computer system 10 includes an instruction prefetch and dispatch unit 12, execution units 14, an instruction cache 16, a data cache 18, a memory unit 20 and a memory management unit 22. The instruction cache 16 and data cache 18 are coupled to the instruction prefetch and dispatch unit 12, the execution units 14, and the memory management unit 22 respectively. The prefetch and dispatch unit 12 is coupled to the execution units 14 and the memory management unit 22. The data cache 18 is coupled to memory 20. The instruction cache 16 is coupled to memory 20. [0022]
  • Cooperatively, the memory management unit 22 and the prefetch and dispatch unit 12 fetch instructions from instruction cache 16 and data from the data cache 18 respectively and dispatch them as needed to the execution units 14. The results of the executed instructions are then stored in the data cache 18 or main memory 20. Except for the instruction prefetch and dispatch unit 12 and the instruction cache 16, the other elements, 14 and 18 through 22, are intended to represent a broad category of these elements found in most computer systems. The components and the basic functions of these elements 14, and 18 through 22 are well known and will not be described further. It will be appreciated that the present invention may be practiced with other computer systems having different architectures. In particular, the present invention may be practiced with a computer system having no memory management unit 22. Furthermore, the present invention may be practiced with a unified instruction/data cache or an instruction cache only. [0023]
  • Referring now to FIG. 2, a block diagram illustrating the instruction cache 16 of the present invention is shown. The instruction cache 16 includes an instruction array 24, a tag array 26, an ICLASS array 27, a predictive annotation array 28, and selection logic 30. The cache is segmented into a plurality of cache lines 34 1 through 34 x. Each cache line 34 includes (n) instructions in the instruction array 24, (m) branch prediction (BRPD) fields 40 and (k) next fetch address prediction (NFAPD) fields 42 in the predictive annotation array 28, (n) ICLASS fields 44 in the ICLASS array 27, and (n) tags in the tag array 26. It also should be noted that the instruction cache 16 may be set associative. With such an embodiment, individual arrays 24 through 28 are provided for each set in the instruction cache 16. [0024]
  • Each of the (n) instructions per cache line 34 contained in the instruction cache 16 is decoded to determine its class. In one embodiment, the instruction class encodings are stored in the appropriate ICLASS field 44 when the cache line 34 is being brought into the instruction cache 16. In an alternative embodiment, the instruction class encodings are stored before the cache line 34 is brought into the instruction cache 16. Examples of instruction classes are the program counter (PC) relative branch, register indirect branch, memory access, arithmetic, and floating point. [0025]
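  • Continuing the illustrative C model from above, the class encodings could be represented as a small enumeration mirroring the five example classes named in the preceding paragraph. The decode itself is ISA-specific, so the classifier body below is only a labeled placeholder, not a real opcode decoder.

      typedef enum {
          IC_PC_REL_BRANCH,           /* program counter (PC) relative branch */
          IC_REG_IND_BRANCH,          /* register indirect branch             */
          IC_MEM_ACCESS,              /* memory access (load/store)           */
          IC_ARITHMETIC,              /* integer arithmetic                   */
          IC_FLOAT                    /* floating point                       */
      } iclass_t;

      /* Predecode runs once, as (or before) a line is brought into the
       * cache, so the prefetch and dispatch unit never has to decode an
       * instruction just to learn whether it is a branch. */
      iclass_t predecode(uint32_t instr) {
          /* Placeholder: a real implementation would inspect the opcode
           * bits defined by the target instruction set. */
          (void)instr;
          return IC_ARITHMETIC;
      }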
  • When the instruction cache 16 receives a next fetch address from the instruction prefetch and dispatch unit 12, the appropriate cache line 34 is accessed. The (n) instructions, the (m) BRPD fields 40, the (k) NFAPD fields 42, the (n) ICLASS fields 44, and the corresponding tag information of the cache line are provided to the selection logic 30. In the event the instruction cache 16 includes more than one set, the selection logic 30 selects the proper line from the plurality of sets. With embodiments having only a single set, the selection logic 30 simply passes the accessed line 34 to the instruction prefetch and dispatch unit 12. The set selection logic 30 is intended to represent a broad category of selection logic found in most computer systems, including the selection logic described in U.S. patent application Ser. No. 07/906,699, filed on Jun. 30, 1992, now U.S. Pat. No. 5,392,414, entitled “Rapid Data Retrieval From A Data Storage Using Prior Access Predictive Annotation”, assigned to the same assignee as the present invention. [0026]
  • The BRPD fields 40 and NFAPD fields 42 are initialized in accordance with a pre-established policy when a cache line 34 is brought into the cache 16. When an instruction is fetched, the corresponding ICLASS field 44 information, BRPD field 40 information and the NFAPD field 42 information are all provided to the prefetch and dispatch unit 12. Since the instruction has already been decoded to determine class, the need to perform a full decode in the prefetch and dispatch unit 12 to determine if an instruction is a branch instruction is avoided. If the instruction is a non-branch instruction, the BRPD information is ignored. The NFAPD information, however, provides the next address to be fetched, which is typically the sequential address of the next line in the instruction cache 16. If a predecoded instruction is a branch instruction, the corresponding BRPD field 40 contains either a “Branch Taken” or a “Branch Not Taken” prediction and the NFAPD field 42 contains a prediction of either the target address of the branch instruction or the sequential address of the next line 34 in the instruction cache 16. Regardless of the type of instruction, the predicted next address is used to immediately fetch the next instruction. [0027]
  • After a branch instruction is fetched, an update policy is used to update the entries in the corresponding BRPD field 40 and the NFAPD field 42 when the actual direction of the branch instruction and the actual next fetch address are resolved in the execution units 14. If the branch prediction and next fetch address prediction were correct, execution continues and the BRPD field 40 and the NFAPD field 42 are not altered. On the other hand, if either prediction is wrong, the BRPD field 40 and the NFAPD field 42 are updated as needed by the prefetch and dispatch unit 12. If the misprediction caused the execution of instructions down an incorrect branch path, execution is stopped and the appropriate execution units 14 are flushed. Execution of instructions thereafter resumes along the correct path. The next time the same instruction is fetched, a branch prediction decision is made based on the updated branch prediction information in the BRPD field 40 and the next prefetch address is based on the updated contents of NFAPD field 42. [0028]
  • During operation, the BRPD fields 40 and NFAPD fields 42 are updated in accordance with a specified update policy. For the sake of simplicity, only a single bit of information is used for the BRPD field 40. This means that the BRPD field 40 can assume one of two states, either “Branch Taken” or “Branch Not Taken”. One possible update policy is best described using a number of examples, as provided below. [0029]
  • 1. If the BRPD predicts “Branch Taken” and the NFAPD field contains the target address, and the actual branch is not taken, then the BRPD is updated to “Branch Not Taken” and the NFAPD is updated to the next sequential address. [0030]
  • 2. If the BRPD predicts “Branch Taken”, and the actual branch is taken, but the NFAPD misses, then the NFAPD is updated to the target address of the branch instruction. [0031]
  • 3. If the BRPD predicts “Branch Not Taken” and the NFAPD field contains the next sequential address, and the actual branch is taken, then the BRPD is updated to “Branch Taken” and the NFAPD is updated to the target address of the branch instruction. [0032]
  • 4. If the BRPD predicts “Branch Not Taken”, and the actual branch is not taken, but the NFAPD misses, the NFAPD is updated to the sequential address. [0033]
  • 5. If the BRPD predicts “Branch Not Taken”, and the actual branch is not taken, and the NFAPD provides the next sequential address, then the BRPD and NFAPD fields are not updated. [0034]
  • 6. If the BRPD predicts “Branch Taken” and the actual branch is taken and the NFAPD provides the target address, then the BRPD and NFAPD fields are not updated. [0035]
  • In summary, the BRPD field and the NFAPD field are updated to the actual branch direction and the actual next fetch address. In alternative embodiments, more sophisticated branch prediction algorithms may be used. For example, multiple bits may be used for the BRPD field 40, thereby providing finer granularity and more information about each branch prediction. [0036]
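  • Viewed in code, the six cases above collapse to a single rule: on any miss, write back what actually happened. A sketch under the single-bit BRPD assumption follows; the function and parameter names are illustrative, not taken from the patent.

      /* Apply the update policy after a branch resolves in the execution
       * units. Cases 1 through 4 above all reduce to copying the actual
       * outcome into the fields; in cases 5 and 6 (both predictions hit)
       * the stored values are rewritten unchanged. */
      void update_prediction(uint8_t *brpd, uint32_t *nfapd,
                             int actual_taken, uint32_t actual_nfa) {
          *brpd  = actual_taken ? 1 : 0;  /* 1 = "Branch Taken"           */
          *nfapd = actual_nfa;            /* target or sequential address */
      }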
  • In one embodiment, a specific BRPD field 40 and a corresponding NFAPD field 42 are provided for each instruction per cache line 34 (i.e., n=m=k). As such, each branch instruction per cache line 34 enjoys the benefit of a dedicated branch prediction and next fetch address prediction as stored in BRPD field 40 and corresponding NFAPD field 42 respectively. In a simplified embodiment, one BRPD field 40 (i.e., m=1) and one NFAPD field 42 (i.e., k=1) are shared among all the instructions per cache line 34. With this embodiment, only the dominant instruction in the cache line 34 makes use of the branch prediction information and the next fetch address information. A dominant instruction is defined as the first branch instruction with a “Branch Taken” status in the cache line 34. For such an instruction, the BRPD contains a “Branch Taken” prediction and the corresponding NFAPD typically contains the target address for the dominant instruction. Since the dominant instruction is the first instruction in the cache line to cause a control transfer, it is not necessary for the other instructions to have their own BRPD fields 40 and NFAPD fields 42. [0037]
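  • A sketch of how the shared (m = k = 1) annotation could be derived: scan the line for the first branch predicted taken. The per-instruction prediction input is a stand-in for whatever prediction state the annotation logic consults and is an assumption of the example.

      /* Return the index of the dominant instruction -- the first branch in
       * the line with a "Branch Taken" prediction -- or -1 if no branch is
       * predicted taken (the shared BRPD then stays "Branch Not Taken"). */
      int find_dominant(const uint8_t iclass[N_INSTR],
                        const uint8_t predicted_taken[N_INSTR]) {
          for (int i = 0; i < N_INSTR; i++) {
              int is_branch = (iclass[i] == IC_PC_REL_BRANCH ||
                               iclass[i] == IC_REG_IND_BRANCH);
              if (is_branch && predicted_taken[i])
                  return i;
          }
          return -1;
      }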
  • It will be appreciated that the number of BRPD fields 40 and NFAPD fields 42 is design dependent. As the number of BRPD fields 40 (m) and NFAPD fields 42 (k) increases toward the number of instructions (n) per cache line 34, the likelihood of branch and next fetch address prediction hits will increase. In contrast, as the number of BRPD fields 40 and NFAPD fields 42 approaches one, the likelihood of mispredictions increases, but the structure of cache 16 is simplified. [0038]
  • Referring to FIG. 3, a block diagram of the pertinent sections of the prefetch and dispatch unit 12 is shown. The prefetch and dispatch unit 12 includes a comparator 68, a next fetch address (NFA) register 70, an instruction queue 72, an update unit 74, and a dispatch unit 76. For each instruction, the comparator 68 is coupled to receive the BRPD field 40 and the NFAPD field 42 information from instruction cache 16 and the actual branch direction and next fetch address from the execution units 14. It should be noted that the actual branch and next fetch address typically arrive at the comparator 68 at a later point in time, since a certain period of time is needed for the actual branch to resolve in the execution units 14. The comparator 68 determines if the BRPD and the NFAPD are respectively correct, i.e., a hit. If the comparison yields a miss, the BRPD field and/or the NFAPD field 42 information is updated by update unit 74 in accordance with the update policy described above. The updated BRPD and/or NFAPD information is then returned to the instruction cache 16. The actual NFA also is placed in the NFA register 70. [0039]
  • Referring now to FIG. 4 a and FIG. 4 b, two flow diagrams illustrating the operation of the prefetch and dispatch unit 12 are shown. In FIG. 4 a, the instruction prefetch and dispatch unit 12 determines if a fetch/prefetch should be initiated (block 94). If a fetch/prefetch should be initiated, the instruction prefetch and dispatch unit 12 uses the address stored in the NFA register 70 to fetch the next instruction from instruction cache 16 (block 96). In response, the instruction cache 16 provides the instruction prefetch and dispatch unit 12 with the requested instruction. The instruction is then placed into the instruction queue 72. Thereafter, the instruction is dispatched by dispatch unit 76. It should be noted that with each fetched instruction, the corresponding NFAPD value is placed in the NFA register 70 and is used to fetch the next instruction. When the comparator 68 determines that the NFAPD is incorrect, the actual NFA is placed into the NFA register 70, and the fetching of instructions resumes at the actual NFA. The instruction prefetch and dispatch unit repeats the above process steps until the instruction queue 72 is empty or the computer system is shut down. [0040]
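  • The loop of FIG. 4 a amounts to this: every cache access both supplies instructions and primes the NFA register for the next access, so no adder or partial decode sits on the fetch path. A toy sketch of that flow, assuming the m = k = 1 layout and a hypothetical direct-mapped indexing scheme:

      static cache_line_t icache[1024];   /* toy direct-mapped cache      */
      static uint32_t nfa_register;       /* next fetch address register  */

      void fetch_step(void) {
          /* Index with the line address; the real design also checks the
           * tag, and a miss triggers the fill procedure of FIG. 5. */
          cache_line_t *line = &icache[(nfa_register >> 4) & 1023];
          /* ... instructions from *line enter the instruction queue 72
           * and are dispatched by dispatch unit 76 (omitted) ... */
          nfa_register = line->nfapd[0];  /* predicted address becomes the
                                             very next fetch address      */
      }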
  • As shown in FIG. 4 b, the instruction prefetch and dispatch unit 12 also receives a branch resolution signal 200 (actual branch) as the branch instruction completes execution in the execution units 14 (block 108). The instruction prefetch and dispatch unit 12 then determines if the branch prediction is correct (diamond 110). If the predicted branch is incorrect, the instruction prefetch and dispatch unit 12 updates the selected BRPD field 40 and the NFAPD field 42 in accordance with the above-defined update policy (block 114). If the selected BRPD predicted the branch direction correctly, the instruction prefetch and dispatch unit 12 determines if the next address in the NFAPD field is correct (block 112). If the selected NFAPD predicted the next fetch address incorrectly, the instruction prefetch and dispatch unit 12 updates the NFAPD (block 116). If the NFAPD is correct, its status remains unchanged. [0041]
  • Referring now to FIG. 5, a flow diagram illustrating the operation of the instruction cache 16 is shown. The instruction cache 16 receives the fetch address from the instruction prefetch and dispatch unit 12 (block 74). In response, the instruction cache 16 determines if there is a cache hit (block 76). If there is a cache hit, selection logic 30, if necessary, selects and provides the appropriate set of instructions and the corresponding ICLASS field 44, BRPD field 40 and NFAPD field 42 information to the instruction prefetch and dispatch unit 12. [0042]
  • If there is a cache miss, the instruction cache 16 initiates a cache fill procedure (block 80). In one embodiment, the instructions accessed from memory 20 are provided directly to prefetch and dispatch unit 12. Alternatively, the instructions may be provided to the instruction prefetch and dispatch unit 12 after the cache line is filled in cache 16. As described earlier, the instructions are decoded to determine their class prior to being stored in the instruction cache 16. Additionally, the BRPD field 40 and NFAPD field 42 are initialized in accordance with the initialization policy of the branch and next fetch address prediction algorithm (block 86). [0043]
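  • A sketch of the fill-time work implied by blocks 80 through 86, using the m = k = 1 layout of the earlier model and the initialization recited in claims 3 and 8 below: predict “branch will not be taken” and the next sequential fetch address. The 4-byte instruction size is an assumption.

      #define LINE_BYTES (N_INSTR * 4)    /* fetch block size in bytes */

      /* Fill a line from memory: predecode every instruction into its
       * ICLASS field and initialize the shared BRPD/NFAPD fields. */
      void fill_line(cache_line_t *line, uint32_t line_addr,
                     const uint32_t from_mem[N_INSTR]) {
          for (int i = 0; i < N_INSTR; i++) {
              line->instr[i]  = from_mem[i];
              line->iclass[i] = predecode(from_mem[i]);
          }
          line->brpd[0]  = 0;                      /* "Branch Not Taken"   */
          line->nfapd[0] = line_addr + LINE_BYTES; /* next sequential line */
      }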
  • Operation
  • For the purpose of describing the operation of the present invention, several examples are provided. In the provided examples, there is only one (1) BRPD field 40 and NFAPD field 42 provided per cache line (i.e., m=k=1). For the purpose of simplifying the examples, the BRPD field 40 contains only 1 bit of information, and therefore can assume only two states: “Branch Taken” and “Branch Not Taken”. [0044]
  • Referring to FIG. 6, several lines 34 1-34 7 of the instruction cache 16 are shown. In this example, there are four instructions (n=4) per cache line 34. The four instructions are labeled, from left to right, 4, 3, 2, 1, respectively, as illustrated in column 101 of the cache 16. A “1” bit indicates that the instruction in that position is a branch instruction. A “0” bit indicates that the instruction is some other type of instruction, but not a branch instruction. In column 103, the BRPD fields 40 for the cache lines 34 are provided. A single BRPD field 40 (m=1) is provided for the four instructions per cache line 34. In the BRPD field 40, a “0” value indicates a “Branch Not Taken” prediction and a “1” value indicates a “Branch Taken” prediction. With this embodiment, the BRPD information provides the branch prediction only for the dominant instruction in the cache line. The column 105 contains the next fetch address in the NFAPD field 42. A single NFAPD field 42 (k=1) is provided for the four instructions per cache line 34. If the BRPD field 40 is set to “0”, then the corresponding NFAPD field 42 contains the address of the next sequential instruction. On the other hand, if the BRPD field 40 contains a “1”, then the corresponding NFAPD field 42 contains the target address of the dominant instruction in the cache line 34. [0045]
  • In the first cache line 34 1, the four instructions are all non-branch instructions, as indicated by the four “0”s in column 101. As such, the corresponding BRPD field 40 is set to “0” (“Branch Not Taken”) and the NFAPD field 42 is set to the sequential address. [0046]
  • The second and third cache lines 34 2 and 34 3 each include one branch instruction. In the cache line 34 2, the branch instruction is located in the first position, as indicated by the “1” in the first position of column 101. The corresponding BRPD field is set to “0”, and NFAPD is set to “next sequ addr 1”. Accordingly, the branch prediction is “Branch Not Taken”, and the NFAPD is the next sequential address (i.e., 34 3). In the third cache line 34 3, the first instruction is a branch instruction. The corresponding BRPD field is set to “1”, and NFAPD is set to “target addr 1”. The branch prediction algorithm thus predicts “Branch Taken”, and the next fetch address is the “target address 1” of the first instruction. [0047]
  • The fourth cache line 34 4 and fifth cache line 34 5 provide examples of cache lines 34 having two branch instructions. In both lines 34 4 and 34 5, the branch instructions are located in the first and third positions in column 101. With cache line 34 4, both branch instructions have a branch prediction set to “Branch Not Taken”, i.e., there is no dominant instruction. The corresponding BRPD field is therefore set to “0”, and NFAPD is set to “next sequ addr”. [0048]
  • In contrast, with the fifth cache line 34 5, the branch prediction algorithm predicts “Branch Taken” for the first branch instruction. The first instruction in cache line 34 5 is therefore the dominant instruction of the cache line. The corresponding BRPD field is set to “1”, and NFAPD is set to “target addr 1”. Since the dominant instruction will cause a control transfer, the branch prediction and next fetch address for the third instruction are not necessary. The sixth 34 6 and seventh 34 7 cache lines provide two more examples of cache lines having two branch instructions. In both cache lines, the first and third instructions are branch instructions. In the sixth cache line 34 6, the branch prediction for the first branch instruction is “Branch Not Taken”, but the prediction for the second branch instruction is “Branch Taken”. Accordingly, the third instruction is considered the dominant instruction and the NFAPD field contains the target address for the third instruction of the line. Thus, BRPD is set to “1”, and NFAPD is set to “target address 3”. In the seventh cache line 34 7, the branch prediction for both branch instructions is “Branch Taken”. Since the first instruction is the dominant instruction of the line, the BRPD field is set to “Branch Taken” (“1”) and the NFAPD field is set to “target addr 1”. [0049]
  • In embodiments where the number of BRPD fields 40 and NFAPD fields 42 equals the number of instructions per cache line 34 (i.e., m=k=n), the operation of the present invention is straightforward. The BRPD field 40 and the NFAPD field 42 for each branch instruction are used to predict the branch direction and next fetch address. Further, the BRPD field 40 and the NFAPD field 42 are updated in accordance with the outcome of the respective branch instruction when executed. [0050]
  • While the invention has been described in relationship to the embodiments shown in the accompanying figures, other alternatives, embodiments and modifications will be apparent to those skilled in the art. It is intended that the specification be only exemplary, and that the true scope and spirit of the invention be indicated by the following claims. [0051]

Claims (20)

What is claimed is:
1. In a computer system comprising at least one execution unit for executing instructions, a method for rapidly dispatching instructions to said at least one execution unit for execution, said method comprising the steps of:
a) storing a plurality of sets of instructions in a plurality of cache lines of an instruction cache array;
b) storing a plurality of corresponding sets of tag and associated control information in a plurality of corresponding tag entries of a corresponding tag array;
c) storing a plurality of corresponding sets of instruction classes in a plurality of corresponding instruction class entries of a corresponding instruction class array, each of said set of instruction classes comprising a plurality of instruction classes for said instructions of said corresponding set of instructions;
d) storing a plurality of corresponding sets of predictive annotations in a plurality of corresponding predictive annotation entries of a corresponding predictive annotation array, each of said set of predictive annotations comprising at least one branch prediction for said instructions of said corresponding set of instructions; and
e) fetching and prefetching repeatedly selected ones of said stored sets of instructions for dispatch to said at least one execution unit for execution using said stored corresponding instruction classes and branch predictions.
2. The method as set forth in claim 1, wherein, said instruction class and predictive annotation entries are stored into said corresponding instruction class and predictive annotation arrays in said steps c) and d) one instruction class and corresponding predictive annotation entry at a time, each of said instruction class and corresponding predictive annotation entries being stored into said instruction class and predictive annotation arrays when their corresponding cache line of instructions is stored into said instruction cache array, each of said branch predictions of said predictive annotation entries being initialized in accordance with an initialization policy of a branch prediction algorithm when its predictive annotation entry is stored into said predictive annotation array.
3. The method as set forth in claim 2, wherein, each of said at least one branch prediction of each of said sets of predictive annotations is initialized to predict “branch will not be taken”.
4. The method as set forth in claim 1, wherein, each of said fetchings and prefetchings in said step e) comprises the steps of:
e.1) accessing one of said cache lines of instructions and its corresponding tag, instruction class and predictive annotation entries concurrently using a fetch address;
e.2) selecting one of said sets of instructions from said accessed cache line and a branch prediction from said selected set of instructions' corresponding set of predictive annotations in said accessed cache line's corresponding predictive annotation entry;
e.3) determining a next fetch address from said selected branch prediction;
e.4) determining subsequently whether said selected branch prediction predicts correctly; and
e.5) updating said selected branch prediction in accordance with an update policy of a branch prediction algorithm based on said prediction correctness determination.
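Steps e.1 through e.3 amount to reading the line and its annotations in one access, selecting a prediction for the fetched set, and deriving the next fetch address from it; steps e.4 and e.5 resolve the branch later and retrain the prediction (the retraining rules are the tables in claims 5 and 9). A sketch of the prediction half, with branch_target as a hypothetical target-computation helper:

    /* Hypothetical target computation for the selected branch. */
    uint32_t branch_target(uint32_t instr_word, uint32_t pc);

    /* e.1-e.3: line and annotations are accessed together; the selected
     * BRPD decides between the branch target and the fall-through path. */
    uint32_t predict_next_fetch(const annotated_line_t *line, int slot, uint32_t pc)
    {
        if (line->brpd[slot])                             /* predicted "Branch Taken" */
            return branch_target(line->instr[slot], pc);
        return pc + INSTRS_PER_LINE * sizeof(uint32_t);   /* fall through             */
    }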
5. The method as set forth in claim 4, wherein, said update policy in said step e.5) updates each of said selected branch predictions as follows:
Branch Class                         Prediction, Actual    Update Policy
PC-relative branch                   PT, ANT               PNT -> BRPD[A]
PC-relative branch                   PNT, AT               PT -> BRPD[A]
PC-relative branch                   PT, AT                No Action
PC-relative branch                   PNT, ANT              No Action
Register indirect control transfer   PNT, AT               PT -> BRPD[A]
Register indirect control transfer   PT, AT                No Action
Unconditional PC control transfer    PNT, AT               PT -> BRPD[A]
Unconditional PC control transfer    PT, AT                No Action
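In the table, PT/PNT abbreviate predicted taken/not taken, AT/ANT actually taken/not taken, and BRPD[A] is the branch prediction stored for fetch address A. One compact reading of the eight rows in C: PC-relative branches retrain in both directions, while register-indirect and unconditional transfers, which the table only ever shows as actually taken, are corrected only toward "taken". A sketch under those assumptions, reusing the types above:

    /* Step e.5: update BRPD[A] per the claim 5 table. */
    void update_brpd(annotated_line_t *line, int slot, bool actually_taken)
    {
        bool predicted_taken = line->brpd[slot];
        switch (line->klass[slot]) {
        case CLASS_PC_RELATIVE_BRANCH:
            if (predicted_taken != actually_taken)
                line->brpd[slot] = actually_taken;   /* PT,ANT and PNT,AT rows */
            break;                                   /* agreement: No Action   */
        case CLASS_REGISTER_INDIRECT:
        case CLASS_UNCONDITIONAL_PC:
            if (!predicted_taken && actually_taken)
                line->brpd[slot] = true;             /* PNT,AT: PT -> BRPD[A]  */
            break;                                   /* PT,AT: No Action       */
        default:
            break;                                   /* non-branch: nothing    */
        }
    }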
6. The method as set forth in claim 1, wherein,
each of said sets of predictive annotations in said step d) further comprises at least one next fetch address prediction for said instructions of said corresponding set of instructions; and
said fetchings and prefetchings in said step e) use said stored corresponding next fetch address predictions as well as said instruction classes and branch predictions.
7. The method as set forth in claim 6, wherein, said next fetch address predictions are initialized, accessed, selected, and updated in substantially the same manner as said branch predictions.
8. The method as set forth in claim 7, wherein, each of said at least one next fetch address prediction of each of said sets of predictive annotations is initialized to predict an address equal to the sum of a program counter and a next sequential fetch block size, said program counter indicating a current fetch address and said next sequential fetch block size indicating a next sequential fetch block's block size.
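As a sketch, the claim 8 initialization is simply the sequential successor of the current fetch block; the parameter names here are illustrative:

    /* Claim 8: NFAPD starts out predicting straight-line fetch. */
    uint32_t nfapd_initial(uint32_t program_counter, uint32_t next_seq_block_size)
    {
        return program_counter + next_seq_block_size;   /* PC + block size */
    }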
9. The method as set forth in claim 7, wherein, each of said selected next fetch address predictions is updated as follows:
Branch Type                          Prediction, Actual   Next Fetch Addr Hit/Miss   Update Policy
PC relative branch                   PT, ANT              --                         A + FS -> NFAPD[A]
PC relative branch                   PT, AT               Miss                       TA -> NFAPD[A]
PC relative branch                   PNT, AT              --                         TA -> NFAPD[A]
PC relative branch                   PNT, ANT             Miss                       A + FS -> NFAPD[A]
PC relative branch                   PT, AT               Hit                        No Action
PC relative branch                   PNT, ANT             Hit                        No Action
Register indirect control transfer   PNT, AT              --                         TA -> NFAPD[A]
Register indirect control transfer   PT, AT               Miss                       TA -> NFAPD[A]
Register indirect control transfer   PT, AT               Hit                        No Action
Unconditional PC control transfer    PNT, AT              --                         TA -> NFAPD[A]
Unconditional PC control transfer    PT, AT               Miss                       TA -> NFAPD[A]
Unconditional PC control transfer    PT, AT               Hit                        No Action
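Here A + FS is the sequential successor (fetch address plus fetch size), TA is the actual target address, and Hit/Miss records whether the stored NFAPD matched the address actually fetched next. Read that way, all twelve rows collapse to one rule: overwrite NFAPD[A] with the actual next fetch address on any mismatch, and do nothing on a hit. A sketch of that collapsed reading, with illustrative parameter names:

    /* Claim 9 update policy: retrain NFAPD[A] toward the actual next
     * fetch address on any direction or address misprediction. */
    void update_nfapd(annotated_line_t *line, int slot, bool actually_taken,
                      uint32_t seq_addr /* A + FS */, uint32_t target_addr /* TA */)
    {
        uint32_t actual_next = actually_taken ? target_addr : seq_addr;
        if (line->nfapd[slot] != actual_next)   /* NFAPD miss or wrong direction */
            line->nfapd[slot] = actual_next;    /* TA or A + FS -> NFAPD[A]      */
        /* NFAPD hit: No Action */
    }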
10. The method as set forth in claim 7, wherein,
each set of said predictive annotations comprises one branch prediction and one next fetch address prediction, said one branch and next fetch address predictions predicting branch direction and next fetch address for a dominant instruction of its corresponding set of instructions;
said method further comprises the step of f) storing each of said selected next fetch address predictions in a register, said register being used for storing a next fetch address for a next set of instructions to be fetched, said stored next fetch address being also used for selecting said branch prediction and said next fetch address prediction for said next set of instructions to be fetched.
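The register of step f) closes the fetch loop: the NFAPD selected in one cycle is latched and serves both as the next fetch address and as the index that selects the next cycle's predictions. A sketch under those assumptions; icache_lookup, dominant_slot, and dispatch_set are hypothetical helpers, not elements of the claims:

    /* Hypothetical helpers: index the cache by fetch address, pick the
     * dominant instruction whose annotations cover the set, dispatch. */
    annotated_line_t *icache_lookup(uint32_t fetch_addr);
    int dominant_slot(const annotated_line_t *line, uint32_t fetch_addr);
    void dispatch_set(const annotated_line_t *line, uint32_t fetch_addr);

    void fetch_loop(uint32_t reset_vector)
    {
        uint32_t next_fetch_reg = reset_vector;        /* the step f) register */
        for (;;) {
            annotated_line_t *line = icache_lookup(next_fetch_reg);
            int slot = dominant_slot(line, next_fetch_reg);
            dispatch_set(line, next_fetch_reg);        /* toward execution     */
            next_fetch_reg = line->nfapd[slot];        /* latch selected NFAPD */
        }
    }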
11. In a computer system comprising at least one execution unit for executing instructions, an apparatus for rapidly dispatching instructions to said at least one execution unit for execution, said apparatus comprising:
a) instruction array means comprising a plurality of cache lines for storing a plurality of sets of instructions;
b) tag array means comprising a plurality of tag entries for storing a plurality of corresponding sets of tag and associated control information;
c) instruction class array means comprising a plurality of instruction class entries for storing a plurality of corresponding sets of instruction classes, each of said sets of instruction classes comprising a plurality of instruction classes for said instructions of said corresponding set of instructions;
d) predictive annotation array means comprising a plurality of predictive annotation entries for storing a plurality of corresponding sets of predictive annotations, each of said sets of predictive annotations comprising at least one branch prediction for said instructions of said corresponding set of instructions; and
e) fetching and prefetching means coupled to said instruction array means, said tag array means, said instruction class array means, and said predictive annotation array means for fetching and prefetching repeatedly selected ones of said stored sets of instructions for dispatch to said at least one execution unit for execution using said stored corresponding instruction classes and branch predictions.
12. The apparatus as set forth in claim 11, wherein,
said instruction class and said predictive annotation array means store each of said instruction class and corresponding predictive annotation entries one instruction class and corresponding predictive annotation entry at a time,
said instruction class and predictive annotation array means store each of said instruction class and corresponding predictive annotation entries into said instruction class and predictive annotation array means when said instruction array means stores its corresponding cache line of instructions into itself,
said predictive annotation array means initializes each of said branch predictions of said predictive annotation entries in accordance with an initialization policy of a branch prediction algorithm when said predictive annotation array means stores its predictive annotation entries into itself.
13. The apparatus as set forth in claim 12, wherein, said predictive annotation array means initializes each of said at least one branch prediction of each of said sets of predictive annotations to predict “branch will not be taken”.
14. The apparatus as set forth in claim 11, wherein, said fetching and prefetching means comprises:
e.1) accessing means for accessing one of said cache lines of instructions and its corresponding tag, instruction class and predictive annotation entries stored in said instruction, tag, instruction class and predictive annotation array means concurrently using a fetch address;
e.2) selection means coupled to said instruction, tag, instruction class and predictive annotation array means for selecting one of said sets of instructions from said accessed cache line and a branch prediction from said selected set of instructions' corresponding set of predictive annotations in said accessed cache line's corresponding predictive annotation entry;
e.3) first determination means coupled to said selection means for determining a next fetch address from said selected branch prediction;
e.4) second determination means coupled to said selection means and said at least one execution unit for determining subsequently whether said selected branch prediction predicts correctly; and
e.5) update means coupled to said second determination means and said instruction, tag and predictive annotation array means for updating said selected branch prediction in accordance with an update policy of a branch prediction algorithm based on said prediction correctness determination.
15. The apparatus as set forth in claim 14, wherein, said update means updates each of said selected branch predictions as follows:
Branch Type                          Prediction, Actual    Update Policy
PC relative branch                   PT, ANT               PNT -> BRPD[A]
PC relative branch                   PNT, AT               PT -> BRPD[A]
PC relative branch                   PT, AT                No Action
PC relative branch                   PNT, ANT              No Action
Register Indirect control transfer   PNT, AT               PT -> BRPD[A]
Register Indirect control transfer   PT, AT                No Action
Unconditional PC control transfer    PNT, AT               PT -> BRPD[A]
Unconditional PC control transfer    PT, AT                No Action
16. The apparatus as set forth in claim 11, wherein,
each of said sets of predictive annotations further comprises at least one next fetch address prediction for said instructions of said corresponding set of instructions; and
said fetching and prefetching means uses said stored corresponding next fetch address predictions as well as said instruction classes and branch predictions.
17. The apparatus as set forth in claim 16, wherein,
said fetching and prefetching means comprises accessing means, selection means and update means for accessing, selecting and updating said branch predictions,
said predictive annotation array means, said accessing means, said selection means, and said update means initialize, access, select, and update said next fetch address predictions in substantially the same manner as said branch predictions.
18. The apparatus as set forth in claim 17, wherein, said predictive annotation array means initializes each of said at least one next fetch address prediction of each of said sets of predictive annotations to predict an address equal to the sum of a program counter and a next sequential fetch block size, said program counter indicating a current fetch address and said next sequential fetch block size indicating a next sequential fetch block's block size.
19. The apparatus as set forth in claim 17, wherein, said update means updates each of said selected next fetch address predictions as follows:
Branch Type                          Prediction, Actual   Next Fetch Addr Hit/Miss   Update Policy
PC relative branch                   PT, ANT              --                         A + FS -> NFAPD[A]
PC relative branch                   PT, AT               Miss                       TA -> NFAPD[A]
PC relative branch                   PNT, AT              --                         TA -> NFAPD[A]
PC relative branch                   PNT, ANT             Miss                       A + FS -> NFAPD[A]
PC relative branch                   PT, AT               Hit                        No Action
PC relative branch                   PNT, ANT             Hit                        No Action
Register Indirect control transfer   PNT, AT              --                         TA -> NFAPD[A]
Register Indirect control transfer   PT, AT               Miss                       TA -> NFAPD[A]
Register Indirect control transfer   PT, AT               Hit                        No Action
Unconditional PC control transfer    PNT, AT              --                         TA -> NFAPD[A]
Unconditional PC control transfer    PT, AT               Miss                       TA -> NFAPD[A]
Unconditional PC control transfer    PT, AT               Hit                        No Action
20. The apparatus as set forth in claim 17, wherein,
each set of said predictive annotations comprises one branch prediction and one next fetch address prediction, said one branch and next fetch address predictions predicting branch direction and next fetch address for a dominant instruction of its corresponding set of instructions;
said apparatus further comprises f) register means coupled to said fetching and prefetching means for storing each of said selected next fetch address predictions, said register means being used for storing a next fetch address for a next set of instructions to be fetched, said stored next fetch address being also used for selecting said branch prediction and said next fetch address prediction for said next set of instructions to be fetched.
US09/927,346 1992-08-31 2001-08-13 Computer system and method for fetching a next instruction Abandoned US20020124162A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/927,346 US20020124162A1 (en) 1992-08-31 2001-08-13 Computer system and method for fetching a next instruction

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US93837192A 1992-08-31 1992-08-31
US36310794A 1994-12-22 1994-12-22
US08/800,367 US6304961B1 (en) 1992-08-31 1997-02-14 Computer system and method for fetching a next instruction
US09/927,346 US20020124162A1 (en) 1992-08-31 2001-08-13 Computer system and method for fetching a next instruction

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US08/800,367 Division US6304961B1 (en) 1992-08-31 1997-02-14 Computer system and method for fetching a next instruction

Publications (1)

Publication Number Publication Date
US20020124162A1 true US20020124162A1 (en) 2002-09-05

Family

ID=25471319

Family Applications (2)

Application Number Title Priority Date Filing Date
US08/800,367 Expired - Fee Related US6304961B1 (en) 1992-08-31 1997-02-14 Computer system and method for fetching a next instruction
US09/927,346 Abandoned US20020124162A1 (en) 1992-08-31 2001-08-13 Computer system and method for fetching a next instruction

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US08/800,367 Expired - Fee Related US6304961B1 (en) 1992-08-31 1997-02-14 Computer system and method for fetching a next instruction

Country Status (5)

Country Link
US (2) US6304961B1 (en)
EP (1) EP0586057B1 (en)
JP (1) JP3518770B2 (en)
KR (1) KR100287628B1 (en)
DE (1) DE69327927T2 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5367703A (en) * 1993-01-08 1994-11-22 International Business Machines Corporation Method and system for enhanced branch history prediction accuracy in a superscalar processor system
US5742805A (en) * 1996-02-15 1998-04-21 Fujitsu Ltd. Method and apparatus for a single history register based branch predictor in a superscalar microprocessor
US5774710A (en) * 1996-09-19 1998-06-30 Advanced Micro Devices, Inc. Cache line branch prediction scheme that shares among sets of a set associative cache
US6253316B1 (en) 1996-11-19 2001-06-26 Advanced Micro Devices, Inc. Three state branch history using one bit in a branch prediction mechanism
US5978906A (en) 1996-11-19 1999-11-02 Advanced Micro Devices, Inc. Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions
US5954816A (en) * 1996-11-19 1999-09-21 Advanced Micro Devices, Inc. Branch selector prediction
US5995749A (en) * 1996-11-19 1999-11-30 Advanced Micro Devices, Inc. Branch prediction mechanism employing branch selectors to select a branch prediction
US5974538A (en) * 1997-02-21 1999-10-26 Wilmot, Ii; Richard Byron Method and apparatus for annotating operands in a computer system with source instruction identifiers
US6108774A (en) * 1997-12-19 2000-08-22 Advanced Micro Devices, Inc. Branch prediction with added selector bits to increase branch prediction capacity and flexibility with minimal added bits
US6112299A (en) * 1997-12-31 2000-08-29 International Business Machines Corporation Method and apparatus to select the next instruction in a superscalar or a very long instruction word computer having N-way branching
US6636959B1 (en) 1999-10-14 2003-10-21 Advanced Micro Devices, Inc. Predictor miss decoder updating line predictor storing instruction fetch address and alignment information upon instruction decode termination condition
US6647490B2 (en) * 1999-10-14 2003-11-11 Advanced Micro Devices, Inc. Training line predictor for branch targets
US6546478B1 (en) 1999-10-14 2003-04-08 Advanced Micro Devices, Inc. Line predictor entry with location pointers and control information for corresponding instructions in a cache line
US6502188B1 (en) 1999-11-16 2002-12-31 Advanced Micro Devices, Inc. Dynamic classification of conditional branches in global history branch prediction
US7769983B2 (en) * 2005-05-18 2010-08-03 Qualcomm Incorporated Caching instructions for a multiple-state processor
US7590825B2 (en) 2006-03-07 2009-09-15 Intel Corporation Counter-based memory disambiguation techniques for selectively predicting load/store conflicts
US7711927B2 (en) 2007-03-14 2010-05-04 Qualcomm Incorporated System, method and software to preload instructions from an instruction set other than one currently executing
US9455743B2 (en) * 2014-05-27 2016-09-27 Qualcomm Incorporated Dedicated arithmetic encoding instruction
CN112540795A (en) * 2019-09-23 2021-03-23 阿里巴巴集团控股有限公司 Instruction processing apparatus and instruction processing method

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4228498A (en) 1977-10-12 1980-10-14 Dialog Systems, Inc. Multibus processor for increasing execution speed using a pipeline effect
US4437149A (en) * 1980-11-17 1984-03-13 International Business Machines Corporation Cache memory architecture with decoding
US4435756A (en) 1981-12-03 1984-03-06 Burroughs Corporation Branch predicting computer
US4894772A (en) 1987-07-31 1990-01-16 Prime Computer, Inc. Method and apparatus for qualifying branch cache entries
US5142634A (en) * 1989-02-03 1992-08-25 Digital Equipment Corporation Branch prediction
US5129067A (en) * 1989-06-06 1992-07-07 Advanced Micro Devices, Inc. Multiple instruction decoder for minimizing register port requirements
US5136697A (en) * 1989-06-06 1992-08-04 Advanced Micro Devices, Inc. System for reducing delay for execution subsequent to correctly predicted branch instruction using fetch information stored with each block of instructions in cache
US5226130A (en) 1990-02-26 1993-07-06 Nexgen Microsystems Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency
US5230068A (en) 1990-02-26 1993-07-20 Nexgen Microsystems Cache memory system for dynamically altering single cache memory line as either branch target entry or pre-fetch instruction queue based upon instruction sequence
DE69130588T2 (en) * 1990-05-29 1999-05-27 Nat Semiconductor Corp Partially decoded instruction cache and method therefor
JPH04111127A (en) * 1990-08-31 1992-04-13 Toshiba Corp Arithmetic processor
WO1992006426A1 (en) * 1990-10-09 1992-04-16 Nexgen Microsystems Method and apparatus for parallel decoding of instructions with branch prediction look-up
US5265213A (en) 1990-12-10 1993-11-23 Intel Corporation Pipeline system for executing predicted branch target instruction in a cycle concurrently with the execution of branch instruction
WO1993017385A1 (en) * 1992-02-27 1993-09-02 Intel Corporation Dynamic flow instruction cache memory

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010030353A2 (en) * 2008-09-10 2010-03-18 Vns Portfolio Llc Method and apparatus for reducing latency associated with executing multiple instruction groups
WO2010030353A3 (en) * 2008-09-10 2010-06-10 Vns Portfolio Llc Method and apparatus for reducing latency associated with executing multiple instruction groups
US20140115263A1 (en) * 2012-10-19 2014-04-24 Lsi Corporation CHILD STATE PRE-FETCH IN NFAs
US9268881B2 (en) * 2012-10-19 2016-02-23 Intel Corporation Child state pre-fetch in NFAs
US10133982B2 (en) 2012-11-19 2018-11-20 Intel Corporation Complex NFA state matching method that matches input symbols against character classes (CCLS), and compares sequence CCLS in parallel
US9665664B2 (en) 2012-11-26 2017-05-30 Intel Corporation DFA-NFA hybrid
US9304768B2 (en) 2012-12-18 2016-04-05 Intel Corporation Cache prefetch for deterministic finite automaton instructions
US9268570B2 (en) 2013-01-23 2016-02-23 Intel Corporation DFA compression and execution
US10185568B2 (en) 2016-04-22 2019-01-22 Microsoft Technology Licensing, Llc Annotation logic for dynamic instruction lookahead distance determination

Also Published As

Publication number Publication date
KR100287628B1 (en) 2001-06-01
JP3518770B2 (en) 2004-04-12
JPH06208463A (en) 1994-07-26
EP0586057B1 (en) 2000-03-01
EP0586057A2 (en) 1994-03-09
EP0586057A3 (en) 1994-06-22
DE69327927D1 (en) 2000-04-06
US6304961B1 (en) 2001-10-16
DE69327927T2 (en) 2000-10-12
KR940004436A (en) 1994-03-15

Similar Documents

Publication Publication Date Title
US6304961B1 (en) Computer system and method for fetching a next instruction
JP2531495B2 (en) Method and system for improving branch history prediction accuracy in a superscalar processor system
US5623614A (en) Branch prediction cache with multiple entries for returns having multiple callers
US7493480B2 (en) Method and apparatus for prefetching branch history information
USRE35794E (en) System for reducing delay for execution subsequent to correctly predicted branch instruction using fetch information stored with each block of instructions in cache
US5941981A (en) System for using a data history table to select among multiple data prefetch algorithms
US5790823A (en) Operand prefetch table
US6601161B2 (en) Method and system for branch target prediction using path information
US6065115A (en) Processor and method for speculatively executing instructions from multiple instruction streams indicated by a branch instruction
US6611910B2 (en) Method for processing branch operations
US6442707B1 (en) Alternate fault handler
EP0180725B1 (en) Instruction prefetch operation for branch instructions
US6351796B1 (en) Methods and apparatus for increasing the efficiency of a higher level cache by selectively performing writes to the higher level cache
US20050125632A1 (en) Transitioning from instruction cache to trace cache on label boundaries
EP0927394B1 (en) A cache line branch prediction scheme that shares among sets of a set associative cache
US5935238A (en) Selection from multiple fetch addresses generated concurrently including predicted and actual target by control-flow instructions in current and previous instruction bundles
CN1790256A (en) Branch lookahead prefetch for microprocessors
US5964869A (en) Instruction fetch mechanism with simultaneous prediction of control-flow instructions
US5784711A (en) Data cache prefetching under control of instruction cache
US7107437B1 (en) Branch target buffer (BTB) including a speculative BTB (SBTB) and an architectural BTB (ABTB)
US6289444B1 (en) Method and apparatus for subroutine call-return prediction
US5918044A (en) Apparatus and method for instruction fetching using a multi-port instruction cache directory
US5796998A (en) Apparatus and method for performing branch target address calculation and branch prediction in parallel in an information handling system
EP0798632B1 (en) Branch prediction method in a multi-level cache system
US6978361B2 (en) Effectively infinite branch prediction table mechanism

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION